Exploiting a Cross-mmap Overflow in Firefox
This post will explore how CVE-2016-9066, a simple but quite interesting (from an exploitation perspective) vulnerability in Firefox, can be exploited to gain code execution.
The following code is used for loading the data for (external) script tags:
The code will be invoked by
OnIncrementalData whenever new data has arrived from the server. The bug is a simple integer overflow, happening when the server sends more than 4GB of data. In that case,
capacity will wrap around and the following call to
mBuffer.reserve will not modify the buffer in any way.
mDecode->Convert then writes data past the end of an 8GB buffer (data is stored as char16_t in the browser), which will be backed by an mmap chunk (a common practice for very large chunks).
The patch is also similarly simple:
The bug doesn’t look too promising at first. For one, it requires sending and allocating multiple gigabytes of memory. As we will see however, the bug can be exploited fairly reliably and the exploit completes within about a minute after opening the page on my 2015 MacBook Pro. We will now first explore how this bug can be exploited to pop a calculator on macOS, then improve the exploit to be more reliable and use less bandwidth afterwards (spoiler: we will use HTTP compression).
Since the overflow happens past the end of an mmap region, our first concern is whether it is possible to reliably allocate something behind the overflown buffer. In contrast to some heap allocators, mmap (which can be thought of as a memory allocator provided by the kernel) is very deterministic: calling mmap twice will result in two consecutive mappings if there are no existing holes that could satisfy either of the two requests. You can try this for yourself using the following piece of code. Note that the result will be different depending on whether the code is run on Linux or macOS. The mmap region grows towards lower addresses on Linux while it grows towards higher ones on macOS. For the rest of this post we will focus on macOS. A similar exploit should be possible on Linux and Windows though.
The output of the above program tells us that we should be able to allocate something behind the overflowing buffer by simply mmap’ing memory until all existing holes are filled, then allocating one more memory chunk through mmap. To verify this we will do the following:
When the browser requests payload.js, have the server reply with a Content-Length of 0x100000001 but only send the first 0xffffffff bytes of the data
Have the webserver send the remaining two bytes of payload.js
Check the first few bytes of every ArrayBuffer, one should contain the data sent by the webserver
/sync looks as follows:
On the client side, I used synchronous XMLHttpRequests to block script execution until the server has finished its part:
With that we can implement the above scenario and we will see that indeed one of the ArrayBuffer objects now contains our payload byte at the start of its buffer. There is one small limitation though: we can only overflow with valid UTF-16, as that is what Firefox uses internally. We’ll have to keep this in mind. What remains now is to find something more interesting to allocate instead of the ArrayBuffer that was overflown into.
Hunting for Target Objects
malloc (and thus the
The tenured heap. Longer lived objects as well as a few selected object types are allocated here. This is a fairly classical heap that keeps track of free spots which it then reuses for future allocations.
The Nursery. This is a memory region that contains short-lived objects. Most JSObjects are first allocated here, then moved into the tenured heap if they are still alive during the next GC cycle (this includes updating all pointers to them and thus requires that the gargabe collector knows about all pointers to its objects). The nursery requires no free list or similar: after a GC cycle the nursery is simply declared free since all alive objects have been moved out of it.
For a more in depth discussion of Spidermonkey internals see this phrack article.
Objects in the tenured heap are stored in containers called Arenas:
firstFreeSpan member of the Arena class: it is the very first member of an Arena object (and thus lies at the beginning of an mmap-ed region), and essentially indicates the index of the first free cell inside this Arena. This is how a FreeSpan instance looks like:
last are byte indices into the Arena, indicating the head of the freelist. This opens up an interesting way to exploit this bug: by overflowing into the
firstFreeSpan field of an Arena, we may be able to allocate an object inside another object, preferably inside some kind of accessible inline data. We would then be able to modify the “inner” object arbitrarily.
This technique has a few benefits:
We only need to overflow 4 bytes of the following chunk and thus won’t corrupt any pointers or other sensitive data
As it turns out, ArrayBuffer objects up to a size of 96 bytes will have their data stored inline after the object header. They will also skip the nursery and thus be located inside an Arena. This makes them ideal for our exploit. We will thus
Allocate lots of ArrayBuffers with 96 bytes of storage
Overflow and create a fake free cell inside the Arena following our buffer
Allocate more ArrayBuffer objects of the same size and see if one of them is placed inside another ArrayBuffer’s data (just scan all “old” ArrayBuffers for non-zero content)
The Need for GC
Unfortunately, it’s not quite that easy: in order for Spidermonkey to allocate an object in our target (corrupted) Arena, the Arena must have previously been marked as (partially) free. This means that we need to free at least one slot in each Arena. We can do this by deleting every 25th ArrayBuffer (since there are 25 per Arena), then forcing garbage collection.
Spidermonkey triggers garbage collection for a variety of reasons. It seems the easiest one to trigger is
TOO_MUCH_MALLOC: it is simply triggered whenever a certain number of bytes have been allocated through malloc. Thus, the following code suffices to trigger a garbage collection:
Afterwards, our target arena will be put onto the free list and our subsequent overwrite will corrupt it. The next allocation from the corrupted arena will then return a (fake) cell inside the inline data of an ArrayBuffer object.
(Optional Reading) Compacting GC
Actually, it’s a little bit more complicated. There exists a GC mode called compacting GC, which will move objects from multiple partially filled arenas to fill holes in other arenas. This reduces internal fragmentation and helps freeing up entire Chunks so they can be returned to the OS. For us however, a compacting GC would be troublesome since it might fill the hole we created in our target arena. The following code is used to determine whether a compacting GC should be run:
Looking at the code there should be ways to prevent a compacting GC from happening (e.g. by performing some animations). It seems we are lucky though: our
gc function from above will trigger the following code path in Spidermonkey, thus preventing a compacting GC since the invocationKind will be
GC_NORMAL instead of
Writing an Exploit
At this point we have all the pieces together and can actually write an exploit. Once we have created the fake free cell and allocated an ArrayBuffer inside of it, we will see that one of the previously allocated ArrayBuffers now contains data. An ArrayBuffer object in Spidermonkey looks roughly as follows:
XXX_SLOT constants determine the offset of the corresponding value from the start of the object. As such, the data pointer (
DATA_SLOT) will be stored at
addrof(ArrayBuffer) + sizeof(ArrayBuffer).
We can now construct the following exploit primitives:
Reading from an absolute memory address: we set the
DATA_SLOTto the desired address and read from the inner ArrayBuffer
Writing to an absolute memory address: same as above, but this time we write to the inner ArrayBuffer
slots_pointer through our existing read primitive
To avoid crashing the browser process during the next GC cycle, we have to repair a few things:
The ArrayBuffer following the outer ArrayBuffer in our exploit, as that one will have been corrupted by the inner ArrayBuffer’s data. To fix this, We can simply copy another ArrayBuffer object into that location
The Cell that was originally freed in our Arena now looks like a used Cell and will be treated as such by the collector, leading to a crash since it has been overwritten with other data (e.g. a FreeSpan instance). We can fix this by restoring the original firstFreeSpan field of our Arena to mark that Cell as free again.
This suffices to keep the browser alive after the exploit finishes.
Putting everything together, the following steps will award us with an arbitrary read/write primitive:
Insert a script tag to load the payload and eventually trigger the bug.
Wait for the server to send up to 2GB + 1 bytes of data. The browser will now have allocated the final chunk that we will later overflow from. We try to fill the existing mmap holes using ArrayBuffer objects like we did for the very first PoC.
Have the server send everything up to 0xffffffff bytes in total, completely filling the current chunk
Free one ArrayBuffer in every arena and try to trigger gargabe collection so the arenas are inserted into the free list.
Have the server send the remaining data. This will trigger the overflow and corrupt the internal free list (indicating which cells are unused) of one of the arenas. The freelist is modified such that the first free cell lies within the inline data of one of the ArrayBuffers contained in the Arena.
Allocate more ArrayBuffers. If everything worked so far, one of them will be allocated inside the inline data of another ArrayBuffer. Search for that ArrayBuffer.
If found, construct an arbitrary memory read/write primitive. We can now modify the data pointer of the inner ArrayBuffer, so this is quite easy.
Repair the corrupted objects to keep the process alive after our exploit is finished.
What remains now is to somehow pop a calculator.
A simple way to run custom code is to abuse the JIT region, however, this technique is (partially) mitigated in Firefox. This can be bypassed given our exploitation primitives (e.g. by writing a small ROP chain and transferring control there), but this seemed to complicated for a simple PoC.
I instead ended up using some standard CTF tricks to finish the exploit: looking for cross references to libc functions that accept a string as first argument (in this case strcmp), I found the implementation of
Date.toLocalFormat and noticed that it converts its first argument from a JSString to a C-string, which it then uses as first argument for strcmp. So we can simply replaced the GOT entry for
strcmp with the address of
system and execute
data_obj.toLocaleFormat("open -a /Applications/Calculator.app");. Done :)
Improving the Exploit
At this point the basic exploit is already finished. This section will now describe how to make it more reliable and less bandwidth hungry.
Up until now our exploit simply allocated a few very large ArrayBuffer instances (1GB each) to fill existing mmap holes, then allocated roughly another GB of js::Arena instances to overflow into. It thus assumed that the browsers heap operations are more or less deterministic during exploitation. Since this isn’t necessarily the case, we’d like to make our exploit a little more robust.
A quick look at then implementation of the mozilla::Vector class (which is used to hold the script buffer) shows us that it uses
realloc to double the size of its buffer when needed. Since jemalloc directly uses mmap for larger chunks, this leaves us with the following allocation pattern:
- mmap 1MB
- mmap 2MB, munmap previous chunk
- mmap 4MB, munmap previous chunk
- mmap 8MB, munmap previous chunk
- mmap 8GB, munmap previous chunk
Because the current chunk size will always be larger than the sum of all previous chunks sizes, this will result in a lot of free space preceding our final buffer. In theory, we could simply calculate the total sum of the free space, then allocate a large ArrayBuffer afterwards. In practice, this doesn’t quite work since there will be other allocations after the server starts sending data and before the browser finishes decompressing the last chunk. Also jemalloc holds back a part of the freed memory for later usage. Instead we’ll try to allocate a chunk as soon as it is freed by the browser. Here’s what we’ll do:
The server sends all data up to the next power of two (in MB) and thus triggers exactly one call to realloc at the end. The browser will now free a chunk of a known size
The server sets the
This is repeated until we have send 0x80000001 bytes (thus forced the allocation of the final buffer)
This simple algorithm is implemented on the server side here and on the client in step 1. Using this algorithm, we can fairly reliably get an allocation behind our target buffer by spraying only a few megabytes of ArrayBuffer instances instead of multiple gigabytes.
Reducing Network Load
Our current exploit requires sending 4GB of data over the network. That’s easy to fix though: we’ll use HTTP compression. The nice part here is that e.g. zlip supports “streamed” compression, which makes it possible to incrementally compress the payload. With this we just have to add each part of the payload to the zlib stream, then call flush on it to obtain the next compressed chunk of the payload and send that to the server. The server will uncompress this chunk upon receiving it and perform the desired action (e.g. perform one realloc step).
This is implemented in the
construct_payload method in poc.py and manages to reduce the size of the payload to about 18MB.
About Resource Usage
At least in theory, the exploit requires quite a lot of memory:
around 256 MB of ArrayBuffers
However, since many of the buffers are never written to, they don’t necessarily consume any physical memory. Moreover, during the final realloc, only 4GB of the new buffer will be written to before the old buffer is freed, so really “only” 8 GB are required there.
That’s still a lot of memory though. However, there are some technologies that will help reduce that number if physical memory becomes low:
Memory compression (macOS): large memory regions can be compressed and swapped out. This is perfect for our use case since the 8GB buffer will be completely filled with zeroes. This effect can be observed in the Activity Monitor.app, which at some point shows more than 6 GB of memory as “compressed” during the exploit.
Page deduplication (Windows, Linux): pages containing the same content are mapped copy-on-write (COW) and point to the same physical page (essentially reducing memory usage to 4KB).
CPU usage will also be quite high during peek times (decompression). However, CPU pressure could further be reduced by sending the payload in smaller chunks with delays in between (which would obviously increase the time it takes for the exploit to work though). This would also give the OS more time to compress and/or deduplicate the large memory buffers.
Further Possible Improvements
There are a few sources of unreliability in the current exploit, mostly dealing with timing:
If a garbage collection cycle runs after we have corrupted the FreeSpan but before we have fixed it, we crash
If a compacting gargabe collection cycle runs after we have freed some of the ArrayBuffers but before we have triggered the overflow, the exploit fails as the Arena will be filled up again.
If the fake free Cell happens to be placed inside the freed ArrayBuffer’s cell, then our exploit will fail and the browser will crash during the next gargabe collection cycle. With 25 cells per arena this gives us a theoretical 1/25 chance to fail. However, in my experiments, the free cells were always located at the same offset (1216 bytes into the Arena), indicating that the state of the engine at the beginning of the exploit is fairly deterministic (at least regarding the state of the Arenas holding objects of size 160 bytes).
From my experience, the exploit runs pretty reliable (>95%) if the browser is not under heavy usage. The exploit still works if 10+ other tabs are open, but might fail if for example a large web application is currently loading.
While the bug isn’t ideal from an attacker’s perspective, it still can be exploited fairly reliably and without too much bandwidth usage. It is interesting to see how various technologies (compression, same page merging, …) can make a bug such as this one easier to exploit.
Thinking of ways to prevent exploitability of such a bug, a few things come to mind. One fairly generic mitigation are guard pages (a page leading to a segfault whenever accessed in some way). These would have to be allocated before or after every mmap allocated region and would thus protect against exploitation of linear overflows such as this one. They would, however, not protect against non-linear overflows such as this bug. Another possibility would be to introduce internal mmap randomization to scatter the allocated regions throughout the address space (likely only effective on 64-bit systems). This would best be performed by the kernel, but could also be done in userspace.