At –1– there is no check if the thisObject’s buffer has been transferred (detached/neutered) while executing a species constructor. Also note that the default constructor (the lambda expression) creates an uninitialized array. It is possible to detach the array buffer during speciesConstruct while also invoking the default constructor, for example by setting up the target array as follows:
JSGenericTypedArrayView::set then does the following:
here, other will be the original array which is detached by now. Its length will be zero and memmove becomes a nop.
This results in an uninitialized array being returned to the caller, potentially leaking addresses and thus allowing for an ASLR bypass.
Here is a complete single-page application ;) to trigger the bug and dump the leaked data:
The original report will eventually be available here.
In this post we’ll take a look at how to exploit the load function in Lua.
The Lua interpreter provides an interesting function to Lua code: load. It makes it possible to load (and subsequently execute) precompiled Lua bytecode at runtime (which one can obtain either through luac or by using string.dump on a function). This is interesting since not only does it require a parser for a binary format, but also allows execution of arbitrary bytecode. In fact, using afl to fuzz the bytecode loader will yield hundreds of crashes within a few minutes. Even the documentation explicitly warns about this function: Lua does not check the consistency of binary chunks. Maliciously crafted binary chunks can crash the interpreter.
Unsurprisingly, it turned out that malicious bytecode cannot just crash the interpreter, but also allows for native code execution within the interpreter process. This was the motivation for the “read-eval-pwn loop” CTF challenge of 33C3 CTF.
Let’s dig a bit deeper and find out what’s causing the interpreter to crash.
The Lua virtual machine is fairly simple, supporting only 47 different opcodes. The Lua interpreter uses a register-based virtual machine. As such many opcodes have register indices as operands. Other resources (e.g. constants) are also referenced by index. Take for example the implementation of the “LOADK” opcode, which is used to load a Lua value from a function’s constant table (to which the variable k points in the following code):
We can see that there are no kinds of bounds checks. This is true for other opcodes as well (and also isn’t the only source of crashes). This is of course a known fact, but there also doesn’t seem to be a good solution for this (maybe apart from completely disabling load). See also this email to the Lua mailing list that I wrote some time ago and the replies, in particular this one.
Anyway, this looks like an interesting “feature” to write an exploit for.. ;)
Our plan will be to abuse the out-of-bounds indexing in the LOADK handler to inject custom objects into the virtual machine. The basic idea here is to allocate lots of strings containing fake Lua values through the constant table of the serialized function, hoping one would be placed behind the constants array itself during deserialization. Afterwards we use a the LOADK opcode with an out-of-bounds index to load a fake value in one of those strings.
Note that this approach has one drawback though: it relies on the heap layout since it indexes behind the heap chunk that holds the constants. This is a source of unreliability. It may be possible to avoid this by scanning (in the bytecode) for a particular value (e.g. a certain integer value) which marks the start of the fake values, but this is left as an exercise for the reader… ;)
At this point there is a very simple exploit in certain scenarios: assuming the Lua interpreter was modified such that e.g. os.execute was present in the binary but not available to Lua code (which happens if one just comments out this line in the source code), then we can simply create a fake function value that points to the native implementation and call it with whatever shell command we want to execute. If required, we can obtain the address of the interpreter code itself through tostring(load) (or any other native function for that matter):
So what if we removed those functions entirely, and, to make it more interesting, also used clang’s control flow integrity on the binary so we couldn’t immediately gain RIP control through a fake function object? How can we exploit that?
Let’s start with an arbitrary read/write primitive:
We’ll create a fake string object with its length set to 0x7fffffffffffffff, allowing us to leak memory behind the string object itself (unfortunately, Lua treats the index into the string as unsigned long, so reading before the string isn’t possible, but also not necessary)
Since strings are immutable, we’ll also set up a fake table object (a combination of dict and list if you’re familiar with python), allowing us to write Lua values to anywhere in memory by setting the array pointer to the desired address
Next, we notice that the interpreter makes use of the setjmp mechanism to implement exceptions and yielding inside coroutines. The setjmp API is an easy way to bypass CFI protection since it directly loads various registers, including the instruction pointer, from a memory chunk.
To finish our exploit we will thus allocate coroutines until one of them is placed after our faked string. We can then leak the address of the jmpbuf structure, modify it from inside the coroutine and call yield, causing the interpreter to jump to an arbitrary address with fully controlled rsp register and a few others. A short ROP chain will do the rest.
Find the full exploit, together with all files necessary to reproduce the CTF challenge on my github.
This post will explore how CVE-2016-9066, a simple but quite interesting (from an exploitation perspective) vulnerability in Firefox, can be exploited to gain code execution.
The following code is used for loading the data for (external) script tags:
The code will be invoked by OnIncrementalData whenever new data has arrived from the server. The bug is a simple integer overflow, happening when the server sends more than 4GB of data. In that case, capacity will wrap around and the following call to mBuffer.reserve will not modify the buffer in any way. mDecode->Convert then writes data past the end of an 8GB buffer (data is stored as char16_t in the browser), which will be backed by an mmap chunk (a common practice for very large chunks).
The patch is also similarly simple:
The bug doesn’t look too promising at first. For one, it requires sending and allocating multiple gigabytes of memory. As we will see however, the bug can be exploited fairly reliably and the exploit completes within about a minute after opening the page on my 2015 MacBook Pro. We will now first explore how this bug can be exploited to pop a calculator on macOS, then improve the exploit to be more reliable and use less bandwidth afterwards (spoiler: we will use HTTP compression).
Since the overflow happens past the end of an mmap region, our first concern is whether it is possible to reliably allocate something behind the overflown buffer. In contrast to some heap allocators, mmap (which can be thought of as a memory allocator provided by the kernel) is very deterministic: calling mmap twice will result in two consecutive mappings if there are no existing holes that could satisfy either of the two requests. You can try this for yourself using the following piece of code. Note that the result will be different depending on whether the code is run on Linux or macOS. The mmap region grows towards lower addresses on Linux while it grows towards higher ones on macOS. For the rest of this post we will focus on macOS. A similar exploit should be possible on Linux and Windows though.
The output of the above program tells us that we should be able to allocate something behind the overflowing buffer by simply mmap’ing memory until all existing holes are filled, then allocating one more memory chunk through mmap. To verify this we will do the following:
When the browser requests payload.js, have the server reply with a Content-Length of 0x100000001 but only send the first 0xffffffff bytes of the data
Have the webserver send the remaining two bytes of payload.js
Check the first few bytes of every ArrayBuffer, one should contain the data sent by the webserver
On the client side, I used synchronous XMLHttpRequests to block script execution until the server has finished its part:
With that we can implement the above scenario and we will see that indeed one of the ArrayBuffer objects now contains our payload byte at the start of its buffer. There is one small limitation though: we can only overflow with valid UTF-16, as that is what Firefox uses internally. We’ll have to keep this in mind. What remains now is to find something more interesting to allocate instead of the ArrayBuffer that was overflown into.
Hunting for Target Objects
The tenured heap. Longer lived objects as well as a few selected object types are allocated here. This is a fairly classical heap that keeps track of free spots which it then reuses for future allocations.
The Nursery. This is a memory region that contains short-lived objects. Most JSObjects are first allocated here, then moved into the tenured heap if they are still alive during the next GC cycle (this includes updating all pointers to them and thus requires that the gargabe collector knows about all pointers to its objects). The nursery requires no free list or similar: after a GC cycle the nursery is simply declared free since all alive objects have been moved out of it.
For a more in depth discussion of Spidermonkey internals see this phrack article.
Objects in the tenured heap are stored in containers called Arenas:
Both first and last are byte indices into the Arena, indicating the head of the freelist. This opens up an interesting way to exploit this bug: by overflowing into the firstFreeSpan field of an Arena, we may be able to allocate an object inside another object, preferably inside some kind of accessible inline data. We would then be able to modify the “inner” object arbitrarily.
This technique has a few benefits:
We only need to overflow 4 bytes of the following chunk and thus won’t corrupt any pointers or other sensitive data
As it turns out, ArrayBuffer objects up to a size of 96 bytes will have their data stored inline after the object header. They will also skip the nursery and thus be located inside an Arena. This makes them ideal for our exploit. We will thus
Allocate lots of ArrayBuffers with 96 bytes of storage
Overflow and create a fake free cell inside the Arena following our buffer
Allocate more ArrayBuffer objects of the same size and see if one of them is placed inside another ArrayBuffer’s data (just scan all “old” ArrayBuffers for non-zero content)
The Need for GC
Unfortunately, it’s not quite that easy: in order for Spidermonkey to allocate an object in our target (corrupted) Arena, the Arena must have previously been marked as (partially) free. This means that we need to free at least one slot in each Arena. We can do this by deleting every 25th ArrayBuffer (since there are 25 per Arena), then forcing garbage collection.
Spidermonkey triggers garbage collection for a variety of reasons. It seems the easiest one to trigger is TOO_MUCH_MALLOC: it is simply triggered whenever a certain number of bytes have been allocated through malloc. Thus, the following code suffices to trigger a garbage collection:
Afterwards, our target arena will be put onto the free list and our subsequent overwrite will corrupt it. The next allocation from the corrupted arena will then return a (fake) cell inside the inline data of an ArrayBuffer object.
(Optional Reading) Compacting GC
Actually, it’s a little bit more complicated. There exists a GC mode called compacting GC, which will move objects from multiple partially filled arenas to fill holes in other arenas. This reduces internal fragmentation and helps freeing up entire Chunks so they can be returned to the OS. For us however, a compacting GC would be troublesome since it might fill the hole we created in our target arena. The following code is used to determine whether a compacting GC should be run:
Looking at the code there should be ways to prevent a compacting GC from happening (e.g. by performing some animations). It seems we are lucky though: our gc function from above will trigger the following code path in Spidermonkey, thus preventing a compacting GC since the invocationKind will be GC_NORMAL instead of GC_SHRINK.
Writing an Exploit
At this point we have all the pieces together and can actually write an exploit. Once we have created the fake free cell and allocated an ArrayBuffer inside of it, we will see that one of the previously allocated ArrayBuffers now contains data. An ArrayBuffer object in Spidermonkey looks roughly as follows:
The XXX_SLOT constants determine the offset of the corresponding value from the start of the object. As such, the data pointer (DATA_SLOT) will be stored at addrof(ArrayBuffer) + sizeof(ArrayBuffer).
We can now construct the following exploit primitives:
Reading from an absolute memory address: we set the DATA_SLOT to the desired address and read from the inner ArrayBuffer
Writing to an absolute memory address: same as above, but this time we write to the inner ArrayBuffer
To avoid crashing the browser process during the next GC cycle, we have to repair a few things:
The ArrayBuffer following the outer ArrayBuffer in our exploit, as that one will have been corrupted by the inner ArrayBuffer’s data.
To fix this, We can simply copy another ArrayBuffer object into that location
The Cell that was originally freed in our Arena now looks like a used Cell and will be treated as such by the collector, leading to a crash since it has been overwritten with other data (e.g. a FreeSpan instance).
We can fix this by restoring the original firstFreeSpan field of our Arena to mark that Cell as free again.
This suffices to keep the browser alive after the exploit finishes.
Putting everything together, the following steps will award us with an arbitrary read/write primitive:
Insert a script tag to load the payload and eventually trigger the bug.
Wait for the server to send up to 2GB + 1 bytes of data. The browser will
now have allocated the final chunk that we will later overflow from.
We try to fill the existing mmap holes using ArrayBuffer objects like
we did for the very first PoC.
(largest size so the data is still allocated inline behind the object) and hope
one of them is placed right after the buffer we are about to overflow.
Mmap allocates contiguous regions, so this can only fail if we don’t allocate
enough memory or if something else is allocated there.
Have the server send everything up to 0xffffffff bytes in total, completely
filling the current chunk
Free one ArrayBuffer in every arena and try to trigger gargabe collection
so the arenas are inserted into the free list.
Have the server send the remaining data. This will trigger the overflow
and corrupt the internal free list (indicating which cells are unused) of
one of the arenas. The freelist is modified such that the first free cell lies
within the inline data of one of the ArrayBuffers contained in the Arena.
Allocate more ArrayBuffers. If everything worked so far, one of them will be
allocated inside the inline data of another ArrayBuffer. Search for that
If found, construct an arbitrary memory read/write primitive. We can
now modify the data pointer of the inner ArrayBuffer, so this is quite easy.
Repair the corrupted objects to keep the process alive after our exploit is finished.
What remains now is to somehow pop a calculator.
A simple way to run custom code is to abuse the JIT region, however, this technique is (partially) mitigated in Firefox. This can be bypassed given our exploitation primitives (e.g. by writing a small ROP chain and transferring control there), but this seemed to complicated for a simple PoC.
I instead ended up using some standard CTF tricks to finish the exploit: looking for cross references to libc functions that accept a string as first argument (in this case strcmp), I found the implementation of Date.toLocalFormat and noticed that it converts its first argument from a JSString to a C-string, which it then uses as first argument for strcmp. So we can simply replaced the GOT entry for strcmp with the address of system and execute data_obj.toLocaleFormat("open -a /Applications/Calculator.app");. Done :)
Improving the Exploit
At this point the basic exploit is already finished. This section will now describe how to make it more reliable and less bandwidth hungry.
Up until now our exploit simply allocated a few very large ArrayBuffer instances (1GB each) to fill existing mmap holes, then allocated roughly another GB of js::Arena instances to overflow into. It thus assumed that the browsers heap operations are more or less deterministic during exploitation. Since this isn’t necessarily the case, we’d like to make our exploit a little more robust.
A quick look at then implementation of the mozilla::Vector class (which is used to hold the script buffer) shows us that it uses realloc to double the size of its buffer when needed. Since jemalloc directly uses mmap for larger chunks, this leaves us with the following allocation pattern:
mmap 2MB, munmap previous chunk
mmap 4MB, munmap previous chunk
mmap 8MB, munmap previous chunk
mmap 8GB, munmap previous chunk
Because the current chunk size will always be larger than the sum of all previous chunks sizes, this will result in a lot of free space preceding our final buffer. In theory, we could simply calculate the total sum of the free space, then allocate a large ArrayBuffer afterwards. In practice, this doesn’t quite work since there will be other allocations after the server starts sending data and before the browser finishes decompressing the last chunk. Also jemalloc holds back a part of the freed memory for later usage. Instead we’ll try to allocate a chunk as soon as it is freed by the browser. Here’s what we’ll do:
The server sends all data up to the next power of two (in MB) and thus triggers exactly one call to realloc at the end. The browser will now free a chunk of a known size
This is repeated until we have send 0x80000001 bytes (thus forced the allocation of the final buffer)
This simple algorithm is implemented on the server side here and on the client in step 1. Using this algorithm, we can fairly reliably get an allocation behind our target buffer by spraying only a few megabytes of ArrayBuffer instances instead of multiple gigabytes.
Reducing Network Load
Our current exploit requires sending 4GB of data over the network. That’s easy to fix though: we’ll use HTTP compression. The nice part here is that e.g. zlip supports “streamed” compression, which makes it possible to incrementally compress the payload. With this we just have to add each part of the payload to the zlib stream, then call flush on it to obtain the next compressed chunk of the payload and send that to the server. The server will uncompress this chunk upon receiving it and perform the desired action (e.g. perform one realloc step).
This is implemented in the construct_payload method in poc.py and manages to reduce the size of the payload to about 18MB.
About Resource Usage
At least in theory, the exploit requires quite a lot of memory:
Actually, it’s more like 12 GB, since during the final realloc, the content of a 4GB buffer must be copied to a new 8GB buffer
around 256 MB of ArrayBuffers
However, since many of the buffers are never written to, they don’t necessarily consume any physical memory. Moreover, during the final realloc, only 4GB of the new buffer
will be written to before the old buffer is freed, so really “only” 8 GB are required there.
That’s still a lot of memory though. However, there are some technologies that will help reduce that number if physical memory becomes low:
Memory compression (macOS): large memory regions can be compressed and swapped out. This is perfect for our use case since the 8GB buffer will be completely filled with zeroes. This effect can be observed in the Activity Monitor.app, which at some point shows more than 6 GB of memory as “compressed” during the exploit.
Page deduplication (Windows, Linux): pages containing the same content are mapped copy-on-write (COW) and point to the same physical page (essentially reducing memory usage to 4KB).
CPU usage will also be quite high during peek times (decompression). However, CPU pressure could further be reduced by sending the payload in smaller chunks with delays in between (which would obviously increase the time it takes for the exploit to work though). This would also give the OS more time to compress and/or deduplicate the large memory buffers.
Further Possible Improvements
There are a few sources of unreliability in the current exploit, mostly dealing with timing:
If a garbage collection cycle runs after we have corrupted the FreeSpan but before we have fixed it, we crash
If a compacting gargabe collection cycle runs after we have freed some of the ArrayBuffers but before we have triggered the overflow, the exploit fails as the Arena will be filled up again.
If the fake free Cell happens to be placed inside the freed ArrayBuffer’s cell, then our exploit will fail and the browser will crash during the next gargabe collection cycle. With 25 cells per arena this
gives us a theoretical 1/25 chance to fail. However, in my experiments, the free cells were always located at the same offset (1216 bytes into the Arena), indicating that the state of the engine at the beginning of the exploit is fairly deterministic (at least regarding the state of the Arenas holding objects of size 160 bytes).
From my experience, the exploit runs pretty reliable (>95%) if the browser is not under heavy usage. The exploit still works if 10+ other tabs are open, but might fail if for example a large web application is currently loading.
While the bug isn’t ideal from an attacker’s perspective, it still can be exploited fairly reliably and without too much bandwidth usage. It is interesting to see how various technologies (compression, same page merging, …) can make a bug such as this one easier to exploit.
Thinking of ways to prevent exploitability of such a bug, a few things come to mind. One fairly generic mitigation are guard pages (a page leading to a segfault whenever accessed in some way). These would have to be allocated before or after every mmap allocated region and would thus protect against exploitation of linear overflows such as this one. They would, however, not protect against non-linear overflows such as this bug. Another possibility would be to introduce internal mmap randomization to scatter the allocated regions throughout the address space (likely only effective on 64-bit systems). This would best be performed by the kernel, but could also be done in userspace.