
The quantum state of Linux kernel garbage collection CVE-2021-0920 (Part I)

10 August 2022 at 23:00

A deep dive into an in-the-wild Android exploit

Guest Post by Xingyu Jin, Android Security Research

This is part one of a two-part guest blog post, where first we'll look at the root cause of the CVE-2021-0920 vulnerability. In the second post, we'll dive into the in-the-wild 0-day exploitation of the vulnerability and post-compromise modules.

Overview of in-the-wild CVE-2021-0920 exploits

A surveillance vendor named Wintego developed an exploit for CVE-2021-0920, a 0-day in the Linux kernel's socket subsystem, and used it in the wild from at least November 2020 (based on the earliest captured sample) until the issue was fixed in November 2021. Combined with Chrome and Samsung browser exploits, the vendor was able to remotely root Samsung devices. The fix was released with the November 2021 Android Security Bulletin, and applied to Samsung devices in Samsung's December 2021 security update.

Google's Threat Analysis Group (TAG) discovered Samsung browser exploit chains being used in the wild. TAG then performed root cause analysis and discovered that this vulnerability, CVE-2021-0920, was being used to escape the sandbox and elevate privileges. CVE-2021-0920 was reported to Linux/Android anonymously. The Google Android Security Team performed the full deep-dive analysis of the exploit.

This issue was initially discovered in 2016 by a Red Hat kernel developer and disclosed in a public email thread, but the Linux kernel community did not patch it until it was re-reported in 2021.

Various Samsung devices were targeted, including the Samsung S10 and S20. By abusing an ephemeral race condition in Linux kernel garbage collection, the exploit code was able to obtain a use-after-free (UAF) in a kernel sk_buff object. The in-the-wild sample could effectively circumvent CONFIG_ARM64_UAO, achieve arbitrary read / write primitives and bypass Samsung RKP to elevate to root. Other Android devices were also vulnerable, but we did not find any exploit samples against them.

Text extracted from captured samples dubbed the vulnerability “quantum Linux kernel garbage collection”, which appears to be a fitting title for this blogpost.

Introduction

CVE-2021-0920 is a use-after-free (UAF) due to a race condition in the garbage collection system for SCM_RIGHTS. SCM_RIGHTS is a control message that allows unix-domain sockets to transmit an open file descriptor from one process to another. In other words, the sender transmits a file descriptor and the receiver then obtains a file descriptor from the sender. This passing of file descriptors adds complexity to reference-counting file structs. To account for this, the Linux kernel community designed a special garbage collection system. CVE-2021-0920 is a vulnerability within this garbage collection system. By winning a race condition during the garbage collection process, an adversary can exploit the UAF on the socket buffer, sk_buff object. In the following sections, we’ll explain the SCM_RIGHTS garbage collection system and the details of the vulnerability. The analysis is based on the Linux 4.14 kernel.

What is SCM_RIGHTS?

Linux developers can share file descriptors (fd) from one process to another using the SCM_RIGHTS datagram with the sendmsg syscall. When a process passes a file descriptor to another process, SCM_RIGHTS will add a reference to the underlying file struct. This means that the process that is sending the file descriptors can immediately close the file descriptor on their end, even if the receiving process has not yet accepted and taken ownership of the file descriptors. When the file descriptors are in the “queued” state (meaning the sender has passed the fd and then closed it, but the receiver has not yet accepted the fd and taken ownership), specialized garbage collection is needed. This LWN article does a great job explaining SCM_RIGHTS reference counting, and it's recommended reading before continuing on with this blogpost.
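To make this concrete, here is a minimal userspace sketch (mine, not from the kernel or the exploit) of a sender passing a file descriptor over a unix domain socket with SCM_RIGHTS:

#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Sketch: pass fd_to_send over the connected unix domain socket sock.
 * The descriptor travels in an SCM_RIGHTS control message; at least
 * one byte of normal data must accompany it on a stream socket. */
ssize_t send_fd(int sock, int fd_to_send)
{
        char dummy = 'x';
        struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
        union {
                char buf[CMSG_SPACE(sizeof(int))];
                struct cmsghdr align;
        } u;
        struct msghdr msg = {
                .msg_iov = &iov,
                .msg_iovlen = 1,
                .msg_control = u.buf,
                .msg_controllen = sizeof(u.buf),
        };
        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

        /* on the kernel side this ends up in unix_stream_sendmsg */
        return sendmsg(sock, &msg, 0);
}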

Sending

As stated previously, a unix domain socket uses the syscall sendmsg to send a file descriptor to another socket. To explain the reference counting that occurs during SCM_RIGHTS, we’ll start from the sender’s point of view. We start with the kernel function unix_stream_sendmsg, which implements the sendmsg syscall. To implement the SCM_RIGHTS functionality, the kernel uses the structure scm_fp_list for managing all the transmitted file structures. scm_fp_list stores the list of file pointers to be passed.

struct scm_fp_list {
        short                   count;
        short                   max;
        struct user_struct      *user;
        struct file             *fp[SCM_MAX_FD];
};

unix_stream_sendmsg invokes scm_send (af_unix.c#L1886) to initialize the scm_fp_list structure, which is linked from a scm_cookie structure on the stack.

struct scm_cookie {
        struct pid              *pid;           /* Skb credentials */
        struct scm_fp_list      *fp;            /* Passed files         */
        struct scm_creds        creds;          /* Skb credentials      */
#ifdef CONFIG_SECURITY_NETWORK
        u32                     secid;          /* Passed security ID   */
#endif
};

To be more specific, scm_send → __scm_send → scm_fp_copy (scm.c#L68) reads the file descriptors from the userspace and initializes scm_cookie->fp which can contain SCM_MAX_FD file structures.

Since the Linux kernel uses the sk_buff (also known as socket buffers or skb) object to manage all types of socket datagrams, the kernel also needs to invoke the unix_scm_to_skb function to link the scm_cookie->fp to a corresponding skb object. This occurs in unix_attach_fds (scm.c#L103):

/*
 * Need to duplicate file references for the sake of garbage
 * collection.  Otherwise a socket in the fps might become a
 * candidate for GC while the skb is not yet queued.
 */
UNIXCB(skb).fp = scm_fp_dup(scm->fp);
if (!UNIXCB(skb).fp)
        return -ENOMEM;

The scm_fp_dup call in unix_attach_fds increases the reference count of the file descriptor that’s being passed so the file is still valid even after the sender closes the transmitted file descriptor later:

struct scm_fp_list *scm_fp_dup(struct scm_fp_list *fpl)
{
        struct scm_fp_list *new_fpl;
        int i;

        if (!fpl)
                return NULL;

        new_fpl = kmemdup(fpl, offsetof(struct scm_fp_list, fp[fpl->count]),
                          GFP_KERNEL);
        if (new_fpl) {
                for (i = 0; i < fpl->count; i++)
                        get_file(fpl->fp[i]);
                new_fpl->max = new_fpl->count;
                new_fpl->user = get_uid(fpl->user);
        }
        return new_fpl;
}

Let’s examine a concrete example. Assume we have sockets A and B, and A attempts to pass itself to B. After the SCM_RIGHTS datagram is sent, the newly allocated skb from the sender is appended to B’s sk_receive_queue, which stores received datagrams:

[Figure: unix_stream_sendmsg creates an sk_buff carrying the scm_fp_list structure, whose fp pointer points to the transmitted file (A); the sk_buff is appended to B’s receive queue.]

The reference count of A is incremented to 2 and the reference count of B is still 1.

Receiving

Now, let’s take a look at the receiver side in unix_stream_read_generic (we will not discuss the MSG_PEEK flag yet, and focus on the normal routine). First, the kernel grabs the current skb from sk_receive_queue using skb_peek. Second, since scm_fp_list is attached to the skb, the kernel calls unix_detach_fds to parse the transmitted file structures from the skb and clear the skb from sk_receive_queue:

/* Mark read part of skb as used */
if (!(flags & MSG_PEEK)) {
        UNIXCB(skb).consumed += chunk;

        sk_peek_offset_bwd(sk, chunk);

        if (UNIXCB(skb).fp)
                unix_detach_fds(&scm, skb);

        if (unix_skb_len(skb))
                break;

        skb_unlink(skb, &sk->sk_receive_queue);
        consume_skb(skb);

        if (scm.fp)
                break;

The function scm_detach_fds iterates over the list of passed file descriptors (scm->fp) and installs the new file descriptors accordingly for the receiver:

for (i=0, cmfptr=(__force int __user *)CMSG_DATA(cm); i<fdmax;
     i++, cmfptr++)
{
        struct socket *sock;
        int new_fd;

        err = security_file_receive(fp[i]);
        if (err)
                break;
        err = get_unused_fd_flags(MSG_CMSG_CLOEXEC & msg->msg_flags
                                  ? O_CLOEXEC : 0);
        if (err < 0)
                break;
        new_fd = err;
        err = put_user(new_fd, cmfptr);
        if (err) {
                put_unused_fd(new_fd);
                break;
        }
        /* Bump the usage count and install the file. */
        sock = sock_from_file(fp[i], &err);
        if (sock) {
                sock_update_netprioidx(&sock->sk->sk_cgrp_data);
                sock_update_classid(&sock->sk->sk_cgrp_data);
        }
        fd_install(new_fd, get_file(fp[i]));
}

/*
 * All of the files that fit in the message have had their
 * usage counts incremented, so we just free the list.
 */
__scm_destroy(scm);

Once the file descriptors have been installed, __scm_destroy cleans up the allocated scm->fp and decrements the file reference count for every transmitted file structure:

void __scm_destroy(struct scm_cookie *scm)
{
        struct scm_fp_list *fpl = scm->fp;
        int i;

        if (fpl) {
                scm->fp = NULL;
                for (i=fpl->count-1; i>=0; i--)
                        fput(fpl->fp[i]);
                free_uid(fpl->user);
                kfree(fpl);
        }
}

Reference Counting and Inflight Counting

As mentioned above, when a file descriptor is passed using SCM_RIGHTS, its reference count is immediately incremented. Once the recipient socket has accepted and installed the passed file descriptor, the reference count is then decremented. The complication comes from the “middle” of this operation: after the file descriptor has been sent, but before the receiver has accepted and installed the file descriptor.

Let’s consider the following scenario:

  1. The process creates sockets A and B.
  2. A sends socket A to socket B.
  3. B sends socket B to socket A.
  4. Close A.
  5. Close B.

[Figure: sockets A and B form a reference count cycle.]

Both sockets are closed prior to accepting the passed file descriptors. The reference counts of A and B are both 1 and cannot be decremented further because the descriptors were removed from the kernel fd table when the respective processes closed them. Therefore the kernel is unable to release the two skbs and sock structures, and an unbreakable cycle is formed. The Linux kernel garbage collection system is designed to prevent memory exhaustion in this particular scenario. The inflight count was implemented to identify potential garbage: each time the reference count is increased because an SCM_RIGHTS datagram is sent, the inflight count is incremented as well.
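For illustration, here is a minimal userspace sketch (mine, not from the exploit) that builds exactly the A/B cycle from the scenario above, reusing the send_fd helper sketched earlier:

#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

ssize_t send_fd(int sock, int fd_to_send); /* SCM_RIGHTS helper from the earlier sketch */

/* Sketch: build the unbreakable reference cycle described above. */
void make_cycle(void)
{
        int sv[2];

        socketpair(AF_UNIX, SOCK_STREAM, 0, sv); /* A = sv[0], B = sv[1] */
        send_fd(sv[0], sv[0]); /* A sends A; the skb lands in B's receive queue */
        send_fd(sv[1], sv[1]); /* B sends B; the skb lands in A's receive queue */
        close(sv[0]);          /* drop the fd-table references: only the queued */
        close(sv[1]);          /* skbs now keep the two sock objects alive      */
}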

When a file descriptor is sent by SCM_RIGHTS datagram, the Linux kernel puts its unix_sock into a global list gc_inflight_list. The kernel increments unix_tot_inflight which counts the total number of inflight sockets. Then, the kernel increments u->inflight which tracks the inflight count for each individual file descriptor in the unix_inflight function (scm.c#L45) invoked from unix_attach_fds:

void unix_inflight(struct user_struct *user, struct file *fp)
{
        struct sock *s = unix_get_socket(fp);

        spin_lock(&unix_gc_lock);

        if (s) {
                struct unix_sock *u = unix_sk(s);

                if (atomic_long_inc_return(&u->inflight) == 1) {
                        BUG_ON(!list_empty(&u->link));
                        list_add_tail(&u->link, &gc_inflight_list);
                } else {
                        BUG_ON(list_empty(&u->link));
                }
                unix_tot_inflight++;
        }
        user->unix_inflight++;
        spin_unlock(&unix_gc_lock);
}

Thus, here is what the sk_buff looks like when transferring a file descriptor within sockets A and B:

[Figure: A sends itself to B. A’s reference count is 2 and its inflight count is 1; B’s reference count is 1 and its inflight count is 0.]

When the socket file descriptor is received from the other side, the unix_sock.inflight count will be decremented.

Let’s revisit the reference count cycle scenario before the close syscalls. This cycle is breakable, because either socket can still receive the transmitted file and break the reference cycle:

[Figure: breakable cycle before closing A and B. A and B send themselves to each other; both have an inflight count of 1 and a reference count of 2.]

After closing both of the file descriptors, the reference count equals the inflight count for each of the socket file descriptors, which is a sign of possible garbage:

[Figure: unbreakable cycle after closing A and B. The reference count equals the inflight count for both A and B.]

Now, let’s check another example. Assume we have sockets A, B and 𝛼:

  1. A sends socket A to socket B.
  2. B sends socket B to socket A.
  3. B sends socket B to socket 𝛼.
  4. 𝛼 sends socket 𝛼 to socket B.
  5. Close A.
  6. Close B.

[Figure: A, B and 𝛼 form a breakable cycle.]

The cycle is breakable, because we can get the newly installed file descriptor B’ from the socket file descriptor 𝛼, and then the newly installed file descriptor A’ from B’.

Garbage Collection

A high level view of garbage collection is available from lwn.net:

"If, instead, the two counts are equal, that file structure might be part of an unreachable cycle. To determine whether that is the case, the kernel finds the set of all in-flight Unix-domain sockets for which all references are contained in SCM_RIGHTS datagrams (for which f_count and inflight are equal, in other words). It then counts how many references to each of those sockets come from SCM_RIGHTS datagrams attached to sockets in this set. Any socket that has references coming from outside the set is reachable and can be removed from the set. If it is reachable, and if there are any SCM_RIGHTS datagrams waiting to be consumed attached to it, the files contained within that datagram are also reachable and can be removed from the set.

At the end of an iterative process, the kernel may find itself with a set of in-flight Unix-domain sockets that are only referenced by unconsumed (and unconsumable) SCM_RIGHTS datagrams; at this point, it has a cycle of file structures holding the only references to each other. Removing those datagrams from the queue, releasing the references they hold, and discarding them will break the cycle."

To be more specific, the SCM_RIGHTS garbage collection system was developed in order to handle the unbreakable reference cycles. To identify which file descriptors are a part of unbreakable cycles:

  1. Add any unix_sock objects whose reference count equals its inflight count to the gc_candidates list.
  2. Determine if the socket is referenced by any sockets outside of the gc_candidates list. If it is then it is reachable, remove it and any sockets it references from the gc_candidates list. Repeat until no more reachable sockets are found.
  3. After this iterative process, only sockets who are solely referenced by other sockets within the gc_candidates list are left.

Let’s take a closer look at how this garbage collection process works. First, the kernel finds all the unix_sock objects whose reference count equals their inflight count and puts them into the gc_candidates list (garbage.c#L242):

list_for_each_entry_safe(u, next, &gc_inflight_list, link) {
        long total_refs;
        long inflight_refs;

        total_refs = file_count(u->sk.sk_socket->file);
        inflight_refs = atomic_long_read(&u->inflight);

        BUG_ON(inflight_refs < 1);
        BUG_ON(total_refs < inflight_refs);

        if (total_refs == inflight_refs) {
                list_move_tail(&u->link, &gc_candidates);
                __set_bit(UNIX_GC_CANDIDATE, &u->gc_flags);
                __set_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags);
        }
}

Next, the kernel removes any sockets that are referenced by other sockets outside of the current gc_candidates list. To do this, the kernel invokes scan_children (garbage.c#138) along with the function pointer dec_inflight to iterate through each candidate’s sk->receive_queue. It decreases the inflight count for each of the passed file descriptors that are themselves candidates for garbage collection (garbage.c#L261):

/* Now remove all internal in-flight reference to children of

 * the candidates.

 */

list_for_each_entry(u, &gc_candidates, link)

        scan_children(&u->sk, dec_inflight, NULL);

After iterating through all the candidates, if a gc candidate still has a positive inflight count, it means that it is referenced by objects outside of the gc_candidates list and is therefore reachable. These candidates should not be included in the gc_candidates list, so the related inflight counts need to be restored.

To do this, the kernel moves the candidate to not_cycle_list instead, iterates through the receive queue of each transmitted file in the gc_candidates list (garbage.c#L281), and increments the inflight counts back. The entire process is done recursively, so that the garbage collection avoids purging reachable sockets:

/* Restore the references for children of all candidates,
 * which have remaining references.  Do this recursively, so
 * only those remain, which form cyclic references.
 *
 * Use a "cursor" link, to make the list traversal safe, even
 * though elements might be moved about.
 */
list_add(&cursor, &gc_candidates);
while (cursor.next != &gc_candidates) {
        u = list_entry(cursor.next, struct unix_sock, link);

        /* Move cursor to after the current position. */
        list_move(&cursor, &u->link);

        if (atomic_long_read(&u->inflight) > 0) {
                list_move_tail(&u->link, &not_cycle_list);
                __clear_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags);
                scan_children(&u->sk, inc_inflight_move_tail, NULL);
        }
}
list_del(&cursor);

Now gc_candidates contains only “garbage”. The kernel restores original inflight counts from gc_candidates, moves candidates from not_cycle_list back to gc_inflight_list and invokes __skb_queue_purge for cleaning up garbage (garbage.c#L306).

/* Now gc_candidates contains only garbage.  Restore original
 * inflight counters for these as well, and remove the skbuffs
 * which are creating the cycle(s).
 */
skb_queue_head_init(&hitlist);
list_for_each_entry(u, &gc_candidates, link)
        scan_children(&u->sk, inc_inflight, &hitlist);

/* not_cycle_list contains those sockets which do not make up a
 * cycle.  Restore these to the inflight list.
 */
while (!list_empty(&not_cycle_list)) {
        u = list_entry(not_cycle_list.next, struct unix_sock, link);
        __clear_bit(UNIX_GC_CANDIDATE, &u->gc_flags);
        list_move_tail(&u->link, &gc_inflight_list);
}

spin_unlock(&unix_gc_lock);

/* Here we are. Hitlist is filled. Die. */
__skb_queue_purge(&hitlist);

spin_lock(&unix_gc_lock);

__skb_queue_purge clears every skb from the receiver queue:

/**
 *      __skb_queue_purge - empty a list
 *      @list: list to empty
 *
 *      Delete all buffers on an &sk_buff list. Each buffer is removed from
 *      the list and one reference dropped. This function does not take the
 *      list lock and the caller must hold the relevant locks to use it.
 */
void skb_queue_purge(struct sk_buff_head *list);

static inline void __skb_queue_purge(struct sk_buff_head *list)
{
        struct sk_buff *skb;

        while ((skb = __skb_dequeue(list)) != NULL)
                kfree_skb(skb);
}

There are two ways to trigger the garbage collection process:

  1. wait_for_unix_gc is invoked at the beginning of the sendmsg function if there are more than 16,000 inflight sockets
  2. When a socket file is released by the kernel (i.e., a file descriptor is closed), the kernel will directly invoke unix_gc.

Note that unix_gc is not preemptive. If garbage collection is already in process, the kernel will not perform another unix_gc invocation.
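For reference, this is roughly what wait_for_unix_gc looks like in the 4.14 kernel (garbage.c); the 16,000 threshold is the UNIX_INFLIGHT_TRIGGER_GC constant, and the wait_event call is what makes a concurrent sendmsg wait for an in-progress collection rather than starting a second one:

#define UNIX_INFLIGHT_TRIGGER_GC 16000

void wait_for_unix_gc(void)
{
        /* If number of inflight sockets is insane,
         * force a garbage collect right now.
         */
        if (unix_tot_inflight > UNIX_INFLIGHT_TRIGGER_GC && !gc_in_progress)
                unix_gc();

        /* block until any in-progress collection has completed */
        wait_event(unix_gc_wait, gc_in_progress == false);
}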

Now, let’s check this example (a breakable cycle) with a pair of sockets f00 and f01, and a single socket 𝛼:

  1. Socket f00 sends socket f00 to socket f01.
  2. Socket f01 sends socket f01 to socket 𝛼.
  3. Close f00.
  4. Close f01.

Before starting the garbage collection process, the status of the socket file descriptors is:

  • f00: ref = 1, inflight = 1
  • f01: ref = 1, inflight = 1
  • 𝛼: ref = 1, inflight = 0

[Figure: f00, f01 and 𝛼 form a breakable cycle.]

During the garbage collection process, f00 and f01 are considered garbage candidates. The inflight count of f00 is dropped to zero, but the count of f01 is still 1 because 𝛼 is not a candidate. Thus, the kernel will restore the inflight count from f01’s receive queue. As a result, f00 and f01 are no longer treated as garbage.

CVE-2021-0920 Root Cause Analysis

When a user receives an SCM_RIGHTS message from recvmsg without the MSG_PEEK flag, the kernel will wait until the garbage collection process finishes if it is in progress. However, if the MSG_PEEK flag is on, the kernel will increment the reference count of the transmitted file structures without synchronizing with any ongoing garbage collection process. This may lead to inconsistency of the internal garbage collection state, making the garbage collector mark a non-garbage sock object as garbage to purge.

recvmsg without MSG_PEEK flag

The kernel function unix_stream_read_generic (af_unix.c#L2290) parses the SCM_RIGHTS message and manages the file inflight count when the MSG_PEEK flag is NOT set. It calls unix_detach_fds, which decrements the inflight count and clears the list of passed file descriptors (scm_fp_list) from the skb:

static void unix_detach_fds(struct scm_cookie *scm, struct sk_buff *skb)
{
        int i;

        scm->fp = UNIXCB(skb).fp;
        UNIXCB(skb).fp = NULL;

        for (i = scm->fp->count-1; i >= 0; i--)
                unix_notinflight(scm->fp->user, scm->fp->fp[i]);
}

unix_notinflight, called from unix_detach_fds, reverses the effect of unix_inflight by decrementing the inflight count:

void unix_notinflight(struct user_struct *user, struct file *fp)
{
        struct sock *s = unix_get_socket(fp);

        spin_lock(&unix_gc_lock);

        if (s) {
                struct unix_sock *u = unix_sk(s);

                BUG_ON(!atomic_long_read(&u->inflight));
                BUG_ON(list_empty(&u->link));
                if (atomic_long_dec_and_test(&u->inflight))
                        list_del_init(&u->link);
                unix_tot_inflight--;
        }
        user->unix_inflight--;
        spin_unlock(&unix_gc_lock);
}

Later, skb_unlink and consume_skb are invoked from unix_stream_read_generic (af_unix.c#2451) to destroy the current skb. Following the call chain kfree_skb → __kfree_skb, the kernel invokes the function pointer skb->destructor (code), which redirects to unix_destruct_scm:

static void unix_destruct_scm(struct sk_buff *skb)
{
        struct scm_cookie scm;

        memset(&scm, 0, sizeof(scm));
        scm.pid  = UNIXCB(skb).pid;
        if (UNIXCB(skb).fp)
                unix_detach_fds(&scm, skb);

        /* Alas, it calls VFS */
        /* So fscking what? fput() had been SMP-safe since the last Summer */
        scm_destroy(&scm);
        sock_wfree(skb);
}

In fact, unix_detach_fds will not be invoked again here from unix_destruct_scm, because UNIXCB(skb).fp was already cleared by the earlier unix_detach_fds call. Finally, fd_install(new_fd, get_file(fp[i])) from scm_detach_fds is invoked to install a new file descriptor.

recvmsg with MSG_PEEK flag

The recvmsg process is different if the MSG_PEEK flag is set. The MSG_PEEK flag is used during receive to “peek” at the message, but the data is treated as unread. unix_stream_read_generic will invoke scm_fp_dup instead of unix_detach_fds. This increases the reference count of the inflight file (af_unix.c#2149):

/* It is questionable, see note in unix_dgram_recvmsg.
 */
if (UNIXCB(skb).fp)
        scm.fp = scm_fp_dup(UNIXCB(skb).fp);

sk_peek_offset_fwd(sk, chunk);

if (UNIXCB(skb).fp)
        break;

Because the data should be treated as unread, the skb is not unlinked and consumed when the MSG_PEEK flag is set. However, the receiver will still get a new file descriptor for the inflight socket.
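The receiving side can be sketched in userspace as well (again mine, not from the exploit); passing MSG_PEEK as flags exercises the peek path described above, while 0 exercises the normal path that detaches the fds and consumes the skb:

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Sketch: receive a descriptor passed via SCM_RIGHTS on sock.
 * flags = MSG_PEEK leaves the skb on the receive queue; flags = 0
 * consumes it. Either way the caller gets a newly installed fd. */
int recv_fd(int sock, int flags)
{
        char dummy;
        struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
        union {
                char buf[CMSG_SPACE(sizeof(int))];
                struct cmsghdr align;
        } u;
        struct msghdr msg = {
                .msg_iov = &iov,
                .msg_iovlen = 1,
                .msg_control = u.buf,
                .msg_controllen = sizeof(u.buf),
        };
        struct cmsghdr *cmsg;
        int fd;

        if (recvmsg(sock, &msg, flags) < 0)
                return -1;
        cmsg = CMSG_FIRSTHDR(&msg);
        if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
            cmsg->cmsg_type != SCM_RIGHTS)
                return -1;
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
        return fd; /* installed by fd_install on the kernel side */
}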

recvmsg Examples

Let’s see a concrete example. Assume there are the following socket pairs:

  • f00, f01
  • f10, f11

Now, the program does the following operations:

  • f00 → [f00] → f01 (means f00 sends [f00] to f01)
  • f10 → [f00] → f11
  • Close(f00)

[Figure: f00, f01, f10 and f11 form a breakable cycle.]

Here is the status:

  • inflight(f00) = 2, ref(f00) = 2
  • inflight(f01) = 0, ref(f01) = 1
  • inflight(f10) = 0, ref(f10) = 1
  • inflight(f11) = 0, ref(f11) = 1

If the garbage collection process happens now, before any recvmsg calls, the kernel will choose f00 as the garbage candidate. However, f00 will not have its inflight count altered and the kernel will not purge any garbage.

If f01 then calls recvmsg with the MSG_PEEK flag, the receive queue doesn’t change and the inflight counts are not decremented. f01 gets a new file descriptor f00’ which increments the reference count on f00:

[Figure: MSG_PEEK increments the reference count of f00 while the receive queue of f01 is not cleared.]

Status:

  • inflight(f00) = 2, ref(f00) = 3
  • inflight(f01) = 0, ref(f01) = 1
  • inflight(f10) = 0, ref(f10) = 1
  • inflight(f11) = 0, ref(f11) = 1

Then, f01 calls recvmsg without the MSG_PEEK flag; f01’s receive queue is removed. f01 also fetches a new file descriptor f00’’:

[Figure: the receive queue of f01 is cleared and f00’’ is obtained.]

Status:

  • inflight(f00) = 1, ref(f00) = 3
  • inflight(f01) = 0, ref(f01) = 1
  • inflight(f10) = 0, ref(f10) = 1
  • inflight(f11) = 0, ref(f11) = 1

UAF Scenario

From a very high level perspective, the internal state of Linux garbage collection can be non-deterministic because MSG_PEEK is not synchronized with the garbage collector. There is a race condition where the garbage collector can treat an inflight socket as a garbage candidate while the file reference is incremented at the same time during the MSG_PEEK receive. As a consequence, the garbage collector may purge the candidate, freeing the socket buffer, while a receiver may install the file descriptor, leading to a UAF on the skb object.

Let’s see how the captured 0-day sample triggers the bug step by step (a simplified version; in reality more threads may need to work together, but it should demonstrate the core idea). First of all, the sample allocates the following socket pairs and a single socket 𝛼:

  • f00, f01
  • f10, f11
  • f20, f21
  • f30, f31
  • sock 𝛼 (in reality there might be thousands of 𝛼 sockets, used to prolong the garbage collection process in order to evade a BUG_ON check that will be introduced later).

Now, the program closes the following file descriptors, prior to any recvmsg calls:

  • Close(f00)
  • Close(f01)
  • Close(f11)
  • Close(f10)
  • Close(f30)
  • Close(f31)
  • Close(𝛼)

Here is the status:

  • inflight(f00) = N + 1, ref(f00) = N + 1
  • inflight(f01) = 2, ref(f01) = 2
  • inflight(f10) = 3, ref(f10) = 3
  • inflight(f11) = 1, ref(f11) = 1
  • inflight(f20) = 0, ref(f20) = 1
  • inflight(f21) = 0, ref(f21) = 1
  • inflight(f31) = 1, ref(f31) = 1
  • inflight(𝛼) = 1, ref(𝛼) = 1

If the garbage collection process happens now, the kernel will do the following scrutiny:

  • List f00, f01, f10, f11, f31 and 𝛼 as garbage candidates. Decrease the inflight count for the candidate children in each receive queue.
  • Since f21 is not considered a candidate, f11’s inflight count is still above zero.
  • Recursively restore the inflight counts.
  • Nothing is considered garbage.

A potential skb UAF by race condition can be triggered as follows:

  1. Call recvmsg with the MSG_PEEK flag from f21 to get f11’.
  2. Call recvmsg with the MSG_PEEK flag from f11 to get f10’.
  3. Concurrently do the following operations:
     a. Call recvmsg without the MSG_PEEK flag from f11 to get f10’’.
     b. Call recvmsg with the MSG_PEEK flag from f10’.

How is it possible? Let’s see a case where the race condition is not hit so there is no UAF:

  • Thread 0: calls unix_gc.
  • Thread 0, stage 0: lists f00, f01, f10, f11, f31 and 𝛼 as garbage candidates.
  • Thread 1: calls recvmsg with MSG_PEEK from f21 to get f11’. This increases the reference count: scm.fp = scm_fp_dup(UNIXCB(skb).fp);
  • Thread 0, stage 0: decreases the inflight count of every garbage candidate's children. Status after stage 0: inflight(f00) = 0, inflight(f01) = 0, inflight(f10) = 0, inflight(f11) = 1, inflight(f31) = 0, inflight(𝛼) = 0.
  • Thread 0, stage 1: recursively restores the inflight count of any candidate that still has a positive inflight count. All inflight counts are restored.
  • Thread 0, stage 2: no garbage, return.
  • Thread 1: calls recvmsg with MSG_PEEK from f11 to get f10’.
  • Thread 1: calls recvmsg without MSG_PEEK from f11 to get f10’’.
  • Thread 2: calls recvmsg with MSG_PEEK from f10’.
  • Everyone is happy.

However, if the second recvmsg occurs just after stage 1 of the garbage collection process, the UAF is triggered:

  • Thread 0: calls unix_gc.
  • Thread 0, stage 0: lists f00, f01, f10, f11, f31 and 𝛼 as garbage candidates.
  • Thread 1: calls recvmsg with MSG_PEEK from f21 to get f11’. This increases the reference count: scm.fp = scm_fp_dup(UNIXCB(skb).fp);
  • Thread 0, stage 0: decreases the inflight count of every garbage candidate's children. Status after stage 0: inflight(f00) = 0, inflight(f01) = 0, inflight(f10) = 0, inflight(f11) = 1, inflight(f31) = 0, inflight(𝛼) = 0.
  • Thread 0, stage 1: starts restoring the inflight counts.
  • Thread 1: calls recvmsg with MSG_PEEK from f11 to get f10’.
  • Thread 1: calls recvmsg without MSG_PEEK from f11 to get f10’’. unix_detach_fds sets UNIXCB(skb).fp = NULL, then the thread is blocked by spin_lock(&unix_gc_lock).
  • Thread 0, stage 1: scan_inflight cannot find candidate children from f11, so the inflight count accidentally remains the same.
  • Thread 0, stage 2: f00, f01, f10, f31 and 𝛼 are garbage.
  • Thread 0, stage 2: starts purging the garbage.
  • Thread 2: starts calling recvmsg with MSG_PEEK from f10’, expecting to receive f00’.
  • Thread 2: gets skb = skb_peek(&sk->sk_receive_queue); this skb is about to be freed by thread 0.
  • Thread 0, stage 2: the purge loop calls __skb_unlink and, later, kfree_skb.
  • Thread 2: state->recv_actor(skb, skip, chunk, state): UAF.
  • Thread 0: GC finished.
  • Garbage collection starts again.
  • Thread 1: unblocks and gets f10’’.

Therefore, the race condition causes a UAF of the skb object. At first glance, we should blame the second recvmsg syscall because it clears skb.fp, the passed file list. However, if the first recvmsg syscall doesn’t set the MSG_PEEK flag, the UAF can be avoided, because unix_notinflight is serialized with the garbage collection. In other words, the kernel makes sure the garbage collection either has not started or has finished before decrementing the inflight count and removing the skb. After unix_notinflight, the receiver obtains f11' and the inflight sockets no longer form an unbreakable cycle.

Since MSG_PEEK is not serialized with the garbage collection, when recvmsg is called with MSG_PEEK set the kernel still considers f11 a garbage candidate. For this reason, the next recvmsg eventually triggers the bug due to the inconsistent state of the garbage collection process.

 

Patch Analysis

CVE-2021-0920 was found in 2016

The vulnerability was initially reported to the Linux kernel community in 2016. The researcher also provided the correct patch advice but it was not accepted by the Linux kernel community:

Linux kernel developers: Why would I apply a patch that's an RFC, doesn't have a proper commit message, lacks a proper signoff, and also lacks ACK's and feedback from other knowledgable developers?

[Screenshot: the patch was not applied in 2016.]

In theory, anyone who saw this patch might come up with an exploit against the faulty garbage collector.

Patch in 2021

Let’s check the official patch for CVE-2021-0920. For the MSG_PEEK branch, it requests the garbage collection lock unix_gc_lock before performing sensitive actions and immediately releases it afterwards:

+       spin_lock(&unix_gc_lock);
+       spin_unlock(&unix_gc_lock);

The patch looks confusing: it's rare to see a lock acquired and immediately released with nothing in between. Regardless, the MSG_PEEK path now waits for the completion of the garbage collector, so the UAF issue is resolved.
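For context, the fix ("af_unix: fix garbage collect vs MSG_PEEK") moves the peek path's scm_fp_dup call into a small helper; the version below is my lightly abridged reading of the patch, not a verbatim quote:

static void unix_peek_fds(struct scm_cookie *scm, struct sk_buff *skb)
{
        scm->fp = scm_fp_dup(UNIXCB(skb).fp);

        /* Taking and immediately dropping unix_gc_lock cannot complete
         * while unix_gc() holds the lock, so this pair acts as a barrier
         * that waits for any in-progress garbage collection to finish
         * before the peeked file references are handed to the receiver.
         */
        spin_lock(&unix_gc_lock);
        spin_unlock(&unix_gc_lock);
}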

BUG_ON Added in 2017

Andrey Ulanov from Google in 2017 found another issue in unix_gc and provided a fix commit. Additionally, the patch added a BUG_ON for the inflight count:

void unix_notinflight(struct user_struct *user, struct file *fp)
        if (s) {
                struct unix_sock *u = unix_sk(s);

+               BUG_ON(!atomic_long_read(&u->inflight));
                BUG_ON(list_empty(&u->link));

                if (atomic_long_dec_and_test(&u->inflight))

At first glance, it seems that the BUG_ON could prevent CVE-2021-0920 from being exploitable. However, if the exploit code delays garbage collection by crafting a large amount of fake garbage, it can evade the BUG_ON check via heap spray.

New Garbage Collection Discovered in 2021

CVE-2021-4083 deserves an honorable mention: when I discussed CVE-2021-0920 with Jann Horn and Ben Hawkes, Jann found another issue in the garbage collection, described in the Project Zero blog post Racing against the clock -- hitting a tiny kernel race window.


Part I Conclusion

To recap, we have discussed the kernel internals of SCM_RIGHTS and the design and implementation of the Linux kernel garbage collector. We have also analyzed the behavior of the MSG_PEEK flag with the recvmsg syscall and how it leads to a kernel UAF through a subtle and arcane race condition.

The bug was publicly spotted in 2016, but unfortunately the Linux kernel community did not accept the patch at that time. Any threat actor who saw the public email thread had a chance to develop an LPE exploit against the Linux kernel.

In part two, we'll look at how the vulnerability was exploited and the functionalities of the post compromise modules.

2022 0-day In-the-Wild Exploitation…so far

30 June 2022 at 13:00

Posted by Maddie Stone, Google Project Zero

This blog post is an overview of a talk, “0-day In-the-Wild Exploitation in 2022…so far”, that I gave at the FIRST conference in June 2022. The slides are available here.

For the last three years, we’ve published annual year-in-review reports of 0-days found exploited in the wild. The most recent of these reports is the 2021 Year in Review report, which we published just a few months ago in April. While we plan to stick with that annual cadence, we’re publishing a little bonus report today looking at the in-the-wild 0-days detected and disclosed in the first half of 2022.        

As of June 15, 2022, there have been 18 0-days detected and disclosed as exploited in-the-wild in 2022. When we analyzed those 0-days, we found that at least nine of the 0-days are variants of previously patched vulnerabilities. At least half of the 0-days we’ve seen in the first six months of 2022 could have been prevented with more comprehensive patching and regression tests. On top of that, four of the 2022 0-days are variants of 2021 in-the-wild 0-days. Just 12 months from the original in-the-wild 0-day being patched, attackers came back with a variant of the original bug.  

Product: 2022 ITW 0-day (variant)

  • Windows win32k: CVE-2022-21882 (variant of CVE-2021-1732, 2021 itw)
  • iOS IOMobileFrameBuffer: CVE-2022-22587 (variant of CVE-2021-30983, 2021 itw)
  • Windows: CVE-2022-30190 (“Follina”) (variant of CVE-2021-40444, 2021 itw)
  • Chromium property access interceptors: CVE-2022-1096 (variant of CVE-2016-5128 and CVE-2021-30551, 2021 itw; CVE-2022-1232 addresses the incomplete CVE-2022-1096 fix)
  • Chromium v8: CVE-2022-1364 (variant of CVE-2021-21195)
  • WebKit: CVE-2022-22620 (“Zombie”) (bug was originally fixed in 2013, the patch was regressed in 2016)
  • Google Pixel: CVE-2021-39793* (Linux: same bug in a different subsystem; *while this CVE says 2021, the bug was patched and disclosed in 2022)
  • Atlassian Confluence: CVE-2022-26134 (variant of CVE-2021-26084)
  • Windows: CVE-2022-26925 (“PetitPotam”) (variant of CVE-2021-36942, patch regressed)

So, what does this mean?

When people think of 0-day exploits, they often think that these exploits are so technologically advanced that there’s no hope to catch and prevent them. The data paints a different picture. At least half of the 0-days we’ve seen so far this year are closely related to bugs we’ve seen before. Our conclusion and findings in the 2020 year-in-review report were very similar.

Many of the 2022 in-the-wild 0-days are due to a previous vulnerability not being fully patched. In the case of the Windows win32k and the Chromium property access interceptor bugs, the execution flow that the proof-of-concept exploits took was patched, but the root cause issue was not addressed: attackers were able to come back and trigger the original vulnerability through a different path. And in the case of the WebKit and Windows PetitPotam issues, the original vulnerability had previously been patched, but at some point regressed so that attackers could exploit the same vulnerability again. In the iOS IOMobileFrameBuffer bug, a buffer overflow was addressed by checking that a size was less than a certain number, but it didn’t check a minimum bound on that size. For more detailed explanations of three of the 0-days and how they relate to their variants, please see the slides from the talk.

When 0-day exploits are detected in-the-wild, it’s the failure case for an attacker. It’s a gift for us security defenders to learn as much as we can and take actions to ensure that that vector can’t be used again. The goal is to force attackers to start from scratch each time we detect one of their exploits: they’re forced to discover a whole new vulnerability, they have to invest the time in learning and analyzing a new attack surface, they must develop a brand new exploitation method. To do that effectively, we need correct and comprehensive fixes.

This is not to minimize the challenges faced by security teams responsible for responding to vulnerability reports. As we said in our 2020 year in review report:

Being able to correctly and comprehensively patch isn't just flicking a switch: it requires investment, prioritization, and planning. It also requires developing a patching process that balances both protecting users quickly and ensuring it is comprehensive, which can at times be in tension. While we expect that none of this will come as a surprise to security teams in an organization, this analysis is a good reminder that there is still more work to be done.

Exactly what investments are likely required depends on each unique situation, but we see some common themes around staffing/resourcing, incentive structures, process maturity, automation/testing, release cadence, and partnerships.

 

Practically, some of the following efforts can help ensure bugs are correctly and comprehensively fixed. Project Zero plans to continue to help with the following efforts, but we hope and encourage platform security teams and other independent security researchers to invest in these types of analyses as well:

  • Root cause analysis

Understanding the underlying vulnerability that is being exploited, and also how that vulnerability may have been introduced. Performing a root cause analysis can help ensure that a fix addresses the underlying vulnerability rather than just breaking the proof-of-concept. Root cause analysis is generally a prerequisite for successful variant and patch analysis.

  • Variant analysis

Looking for other vulnerabilities similar to the reported vulnerability. This can involve looking for the same bug pattern elsewhere, more thoroughly auditing the component that contained the vulnerability, modifying fuzzers to understand why they didn’t find the vulnerability previously, etc. Most researchers find more than one vulnerability at the same time. By finding and fixing the related variants, attackers are not able to simply “plug and play” with a new vulnerability once the original is patched.

  • Patch analysis

Analyzing the proposed (or released) patch for completeness compared to the root cause vulnerability. I encourage vendors to share how they plan to address the vulnerability with the vulnerability reporter early so the reporter can analyze whether the patch comprehensively addresses the root cause of the vulnerability, alongside the vendor’s own internal analysis.

  • Exploit technique analysis

Understanding the primitive gained from the vulnerability and how it’s being used. While it’s generally industry-standard to patch vulnerabilities, mitigating exploit techniques doesn’t happen as frequently. While not every exploit technique will always be able to be mitigated, the hope is that it will become the default rather than the exception. Exploit samples will need to be shared more readily in order for vendors and security researchers to be able to perform exploit technique analysis.

Transparently sharing these analyses helps the industry as a whole as well. We publish our analyses at this repository. We encourage vendors and others to publish theirs as well. This allows developers and security professionals to better understand what the attackers already know about these bugs, which hopefully leads to even better solutions and security overall.  

The curious tale of a fake Carrier.app

23 June 2022 at 16:01

Posted by Ian Beer, Google Project Zero


NOTE: This issue, CVE-2021-30983, was fixed in iOS 15.2 in December 2021.


Towards the end of 2021 Google's Threat Analysis Group (TAG) shared an iPhone app with me:

[Figure: app splash screen showing the Vodafone carrier logo and the text "My Vodafone" (not the legitimate Vodafone app).]



Although this looks like the real My Vodafone carrier app available in the App Store, it didn't come from the App Store and is not the real application from Vodafone. TAG suspects that a target receives a link to this app in an SMS, after the attacker asks the carrier to disable the target's mobile data connection. The SMS claims that in order to restore mobile data connectivity, the target must install the carrier app and includes a link to download and install this fake app.


This sideloading works because the app is signed with an enterprise certificate, which can be purchased for $299 via the Apple Enterprise developer program. This program allows an eligible enterprise to obtain an Apple-signed embedded.mobileprovision file with the ProvisionsAllDevices key set. An app signed with the developer certificate embedded within that mobileprovision file can be sideloaded on any iPhone, bypassing Apple's App Store review process. While we understand that the Enterprise developer program is designed for companies to push "trusted apps" to their staff's iOS devices, in this case, it appears that it was being used to sideload this fake carrier app.


In collaboration with Project Zero, TAG has published an additional post with more details around the targeting and the actor. The rest of this blogpost is dedicated to the technical analysis of the app and the exploits contained therein.

App structure

The app is broken up into multiple frameworks. InjectionKit.framework is a generic privilege escalation exploit wrapper, exposing the primitives you'd expect (kernel memory access, entitlement injection, amfid bypasses) as well as higher-level operations like app installation, file creation and so on.


Agent.framework is partially obfuscated but, as the name suggests, seems to be a basic agent able to find and exfiltrate interesting files from the device like the Whatsapp messages database.


Six privilege escalation exploits are bundled with this app. Five are well-known, publicly available N-day exploits for older iOS versions. The sixth is not like those others at all.


This blog post is the story of the last exploit and the month-long journey to understand it.

Something's missing? Or am I missing something?

Although all the exploits were different, five of them shared a common high-level structure. An initial phase where the kernel heap was manipulated to control object placement. Then the triggering of a kernel vulnerability followed by well-known steps to turn that into something useful, perhaps by disclosing kernel memory then building an arbitrary kernel memory write primitive.


The sixth exploit didn't have anything like that.


Perhaps it could be triggering a kernel logic bug like Linus Henze's Fugu14 exploit, or a very bad memory safety issue which gave fairly direct kernel memory access. But neither of those seemed very plausible either. It looked, quite simply, like an iOS kernel exploit from a decade ago, except one which first quite carefully checked that it was only running on an iPhone 12 or 13.


It contained log messages like:


  printf("Failed to prepare fake vtable: 0x%08x", ret);


which seemed to happen far earlier than the exploit could possibly have defeated mitigations like KASLR and PAC.


Shortly after that was this log message:


  printf("Waiting for R/W primitives...");


Why would you need to wait?


Then shortly after that:


  printf("Memory read/write and callfunc primitives ready!");


Up to that point the exploit made only four IOConnectCallMethod calls and there were no other obvious attempts at heap manipulation. But there was another log message which started to shed some light:


  printf("Unexpected data read from DCP: 0x%08x", v49);

DCP?

In October 2021 Adam Donenfeld tweeted this:


[Screenshot: tweet by @AmarSaar, 11 October 2021: "So, another IOMFB vulnerability was exploited ITW (15.0.2). I bindiffed the patch and built a POC. And, because it's a great bug, I just finished writing a short blogpost with the tech details, to share this knowledge :) Check it out! https://saaramar.github.io/IOMFB_integer_overflow_poc/"; retweeted by @doadam with the comment: "This has been moved to the display coprocessor (DCP) starting from 15, at least on iPhone 12 (and most probably other ones as well)".]

DCP is the "Display Co-Processor" which ships with iPhone 12 and above and all M1 Macs.


There's little public information about the DCP; the most comprehensive comes from the Asahi linux project which is porting linux to M1 Macs. In their August 2021 and September 2021 updates they discussed their DCP reverse-engineering efforts and the open-source DCP client written by @alyssarzg. Asahi describe the DCP like this:


On most mobile SoCs, the display controller is just a piece of hardware with simple registers. While this is true on the M1 as well, Apple decided to give it a twist. They added a coprocessor to the display engine (called DCP), which runs its own firmware (initialized by the system bootloader), and moved most of the display driver into the coprocessor. But instead of doing it at a natural driver boundary… they took half of their macOS C++ driver, moved it into the DCP, and created a remote procedure call interface so that each half can call methods on C++ objects on the other CPU!

https://asahilinux.org/2021/08/progress-report-august-2021/


The Asahi linux project reverse-engineered the API to talk to the DCP but they are restricted to using Apple's DCP firmware (loaded by iBoot) - they can't use a custom DCP firmware. Consequently their documentation is limited to the DCP RPC API with few details of the DCP internals.

Setting the stage

Before diving into DCP internals it's worth stepping back a little. What even is a co-processor in a modern, highly integrated SoC (System-on-a-Chip) and what might the consequences of compromising it be?


Years ago a co-processor would likely have been a physically separate chip. Nowadays a large number of these co-processors are integrated along with their interconnects directly onto a single die, even if they remain fairly independent systems. We can see in this M1 die shot from Tech Insights that the CPU cores in the middle and right hand side take up only around 10% of the die:

[Figure: M1 die shot from techinsights.com with the possible location of the DCP highlighted.]

https://www.techinsights.com/blog/two-new-apple-socs-two-market-events-apple-a14-and-m1


Companies like SystemPlus perform very thorough analysis of these dies. Based on their analysis the DCP is likely the rectangular region indicated on this M1 die. It takes up around the same amount of space as the four high-efficiency cores seen in the centre, though it seems to be mostly SRAM.


With just this low-resolution image it's not really possible to say much more about the functionality or capabilities of the DCP and what level of system access it has. To answer those questions we'll need to take a look at the firmware.

My kingdom for a .dSYM!

The first step is to get the DCP firmware image. iPhones (and now M1 macs) use .ipsw files for system images. An .ipsw is really just a .zip archive and the Firmware/ folder in the extracted .zip contains all the firmware for the co-processors, modems etc. The DCP firmware is this file:



  Firmware/dcp/iphone13dcp.im4p


The im4p in this case is just a 25 byte header which we can discard:


  $ dd if=iphone13dcp.im4p of=iphone13dcp bs=25 skip=1
  $ file iphone13dcp
  iphone13dcp: Mach-O 64-bit preload executable arm64


It's a Mach-O! Running nm -a to list all symbols shows that the binary has been fully stripped:


  $ nm -a iphone13dcp
  iphone13dcp: no symbols


Function names make understanding code significantly easier. From looking at the handful of strings in the exploit some of them looked like they might be referencing symbols in a DCP firmware image ("M3_CA_ResponseLUT read: 0x%08x" for example) so I thought perhaps there might be a DCP firmware image where the symbols hadn't been stripped.


Since the firmware images are distributed as .zip files and Apple's servers support range requests, with a bit of Python and the partialzip tool we can relatively easily and quickly get every beta and release DCP firmware. I checked over 300 distinct images; every single one was stripped.


Guess we'll have to do this the hard way!

Day 1; Instruction 1

$ otool -h raw_fw/iphone13dcp
raw_fw/iphone13dcp:
Mach header
magic      cputype   cpusubtype caps filetype ncmds sizeofcmds flags
0xfeedfacf 0x100000C 0          0x00 5        5     2240       0x00000001


That cputype is plain arm64 (ArmV8) without pointer authentication support. The binary is fairly large (3.7MB) and IDA's autoanalysis detects over 7000 functions.


With any brand new binary I usually start with a brief look through the function names and the strings. The binary is stripped so there are no function name symbols but there are plenty of C++ function names as strings:


[Figure: a short list of C++ prototypes, such as IOMFB::UPBlock_ALSS::init(IOMFB::UPPipe *).]

The cross-references to those strings look like this:


log(0x40000001LL,
    "UPBlock_ALSS.cpp",
    341,
    "%s: capture buffer exhausted, aborting capture\n",
    "void IOMFB::UPBlock_ALSS::send_data(uint64_t, uint32_t)");


This is almost certainly a logging macro which expands __FILE__, __LINE__ and __PRETTY_FUNCTION__. This allows us to start renaming functions and finding vtable pointers.
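The macro itself is not present in the stripped binary, but from the call sites it plausibly looks something like this (the name dcp_log is made up for illustration):

/* Hypothetical reconstruction of the logging macro, inferred from its
 * call sites; a call such as
 *   dcp_log("%s: capture buffer exhausted, aborting capture\n");
 * would expand to the five-argument log() invocation shown above. */
#define dcp_log(fmt, ...) \
        log(0x40000001LL, __FILE__, __LINE__, fmt, \
            __PRETTY_FUNCTION__, ##__VA_ARGS__)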

Object structure

From the Asahi linux blog posts we know that the DCP is using an Apple-proprietary RTOS called RTKit for which there is very little public information. There are some strings in the binary with the exact version:


ADD  X8, X8, #aLocalIphone13d@PAGEOFF ; "local-iphone13dcp.release"
ADD  X9, X9, #aRtkitIos182640@PAGEOFF ; "RTKit_iOS-1826.40.9.debug"


The code appears to be predominantly C++. There appear to be multiple C++ object hierarchies; those involved with this vulnerability look a bit like IOKit C++ objects. Their common base class looks like this:


struct __cppobj RTKIT_RC_RTTI_BASE
{
  RTKIT_RC_RTTI_BASE_vtbl *__vftable /*VFT*/;
  uint32_t refcnt;
  uint32_t typeid;
};


(These structure definitions are in the format IDA uses for C++-like objects)


The RTKit base class has a vtable pointer, a reference count and a four-byte Run Time Type Information (RTTI) field - a 4-byte ASCII identifier like BLHA, WOLO, MMAP, UNPI, OSST, OSBO and so on. These identifiers look a bit cryptic but they're quite descriptive once you figure them out (and I'll describe the relevant ones as we encounter them.)
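These identifiers are just 32-bit integers; a small helper (mine, not from the firmware) shows one plausible packing, with the first character in the most significant byte (the actual in-memory byte order depends on endianness):

#include <stdint.h>
#include <stdio.h>

/* Pack a four-character type identifier like "UNPI" into a u32,
 * first character in the most significant byte (an assumption). */
static uint32_t rtti_fourcc(const char s[4])
{
        return ((uint32_t)s[0] << 24) | ((uint32_t)s[1] << 16) |
               ((uint32_t)s[2] << 8)  |  (uint32_t)s[3];
}

int main(void)
{
        printf("UNPI = 0x%08x\n", rtti_fourcc("UNPI")); /* 0x554e5049 */
        return 0;
}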


The base type has the following associated vtable:


struct /*VFT*/ RTKIT_RC_RTTI_BASE_vtbl
{
  void (__cdecl *take_ref)(RTKIT_RC_RTTI_BASE *this);
  void (__cdecl *drop_ref)(RTKIT_RC_RTTI_BASE *this);
  void (__cdecl *take_global_type_ref)(RTKIT_RC_RTTI_BASE *this);
  void (__cdecl *drop_global_type_ref)(RTKIT_RC_RTTI_BASE *this);
  void (__cdecl *getClassName)(RTKIT_RC_RTTI_BASE *this);
  void (__cdecl *dtor_a)(RTKIT_RC_RTTI_BASE *this);
  void (__cdecl *unk)(RTKIT_RC_RTTI_BASE *this);
};

Exploit flow

The exploit running in the app starts by opening an IOKit user client for the AppleCLCD2 service. AppleCLCD seems to be the application-processor version of IOMobileFrameBuffer, and AppleCLCD2 the DCP version.


The exploit only calls 3 different external method selectors on the AppleCLCD2 user client: 68, 78 and 79.


The one with the largest and most interesting-looking input is 78, which corresponds to this user client method in the kernel driver:


IOReturn
IOMobileFramebufferUserClient::s_set_block(
  IOMobileFramebufferUserClient *this,
  void *reference,
  IOExternalMethodArguments *args)
{
  const unsigned __int64 *extra_args;
  u8 *structureInput;

  structureInput = args->structureInput;
  if ( structureInput && args->scalarInputCount >= 2 )
  {
    if ( args->scalarInputCount == 2 )
      extra_args = 0LL;
    else
      extra_args = args->scalarInput + 2;
    return this->framebuffer_ap->set_block_dcp(
             this->task,
             args->scalarInput[0],
             args->scalarInput[1],
             extra_args,
             args->scalarInputCount - 2,
             structureInput,
             args->structureInputSize);
  } else {
    return 0xE00002C2;
  }
}


This unpacks the IOConnectCallMethod arguments and passes them to:


IOMobileFramebufferAP::set_block_dcp(
        IOMobileFramebufferAP *this,
        task *task,
        int first_scalar_input,
        int second_scalar_input,
        const unsigned __int64 *pointer_to_remaining_scalar_inputs,
        unsigned int scalar_input_count_minus_2,
        const unsigned __int8 *struct_input,
        unsigned __int64 struct_input_size)

This method uses some autogenerated code to serialise the external method arguments into a buffer like this:


arg_struct:
{
  struct task* task
  u64 scalar_input_0
  u64 scalar_input_1
  u64[] remaining_scalar_inputs
  u64 cntExtraScalars
  u8[] structInput
  u64 CntStructInput
}


which is then passed to UnifiedPipeline2::rpc along with a 4-byte ASCII method identifier ('A435' here):


UnifiedPipeline2::rpc(
      'A435',
      arg_struct,
      0x105Cu,
      &retval_buf,
      4u);


UnifiedPipeline2::rpc calls DCPLink::rpc which calls AppleDCPLinkService::rpc to perform one more level of serialisation which packs the method identifier and a "stream identifier" together with the arg_struct shown above.


AppleDCPLinkService::rpc then calls rpc_caller_gated to allocate space in a shared memory buffer, copy the message there and then signal to the DCP that a message is available.


Effectively the implementation of the IOMobileFramebuffer user client has been moved on to the DCP and the external method interface is now a proxy shim, via shared memory, to the actual implementations of the external methods which run on the DCP.

Exploit flow: the other side

The next challenge is to find where the messages start being processed on the DCP. Looking through the log strings there's a function which is clearly called rpc_callee_gated - quite likely that's the receive side of the function rpc_caller_gated we saw earlier.


rpc_callee_gated unpacks the wire format then has an enormous switch statement which maps all the 4-letter RPC codes to function pointers:


      switch ( rpc_id )
      {
        case 'A000':
          goto LABEL_146;
        case 'A001':
          handler_fptr = callback_handler_A001;
          break;
        case 'A002':
          handler_fptr = callback_handler_A002;
          break;
        case 'A003':
          handler_fptr = callback_handler_A003;
          break;
        case 'A004':
          handler_fptr = callback_handler_A004;
          break;
        case 'A005':
          handler_fptr = callback_handler_A005;
          break;


At the bottom of this switch statement is the invocation of the callback handler:


ret = handler_fptr(
          meta,
          in_struct_ptr,
          in_struct_size,
          out_struct_ptr,
          out_struct_size);


in_struct_ptr points to a copy of the serialised IOConnectCallMethod arguments we saw being serialised earlier on the application processor:


arg_struct:
{
  struct task* task
  u64 scalar_input_0
  u64 scalar_input_1
  u64[] remaining_scalar_inputs
  u32 cntExtraScalars
  u8[] structInput
  u64 cntStructInput
}


The callback unpacks that buffer and calls a C++ virtual function:


unsigned int
callback_handler_A435(
  u8* meta,
  void *args,
  uint32_t args_size,
  void *out_struct_ptr,
  uint32_t out_struct_size)
{
  int64 instance_id;
  uint64_t instance;
  int err;
  int retval;
  unsigned int result;
  instance_id = meta->instance_id;
  instance =
    global_instance_table[instance_id].IOMobileFramebufferType;
  if ( !instance ) {
    log_fatal(
      "IOMFB: %s: no instance for instance ID: %u\n",
      "static T *IOMFB::InstanceTracker::instance"
        "(IOMFB::InstanceTracker::tracked_entity_t, uint32_t)"
        " [T = IOMobileFramebuffer]",
      instance_id);
  }
  err = (instance-16)->vtable_0x378( // virtual call
         (instance-16),
         args->task,
         args->scalar_input_0,
         args->scalar_input_1,
         args->remaining_scalar_inputs,
         args->cntExtraScalars,
         args->structInput,
         args->cntStructInput);
  retval = convert_error(err);
  result = 0;
  *(_DWORD *)out_struct_ptr = retval;
  return result;
}


The challenge here is to figure out where that virtual call goes. The object is being looked up in a global table based on the instance id. We can't just set a breakpoint, and whilst emulating the firmware is probably possible, that would likely be a long project in itself. I took a hackier approach: we know that the vtable needs to be at least 0x380 bytes large, so just go through all those vtables, decompile them and see if the prototypes look reasonable!


There's one clear match in the vtable for the UNPI type:

UNPI::set_block(
        UNPI* this,
        struct task* caller_task_ptr,
        unsigned int first_scalar_input,
        int second_scalar_input,
        int *remaining_scalar_inputs,
        uint32_t cnt_remaining_scalar_inputs,
        uint8_t *structure_input_buffer,
        uint64_t structure_input_size)


Here's my reversed implementation of UNPI::set_block:

UNPI::set_block(
        UNPI* this,
        struct task* caller_task_ptr,
        unsigned int first_scalar_input,
        int second_scalar_input,
        int *remaining_scalar_inputs,
        uint32_t cnt_remaining_scalar_inputs,
        uint8_t *structure_input_buffer,
        uint64_t structure_input_size)
{
  struct block_handler_holder *holder;
  struct metadispatcher metadisp;
  if ( second_scalar_input )
    return 0x80000001LL;
  holder = this->UPPipeDCP_H13P->block_handler_holders;
  if ( !holder )
    return 0x8000000BLL;
  metadisp.address_of_some_zerofill_static_buffer = &unk_3B8D18;
  metadisp.handlers_holder = holder;
  metadisp.structure_input_buffer = structure_input_buffer;
  metadisp.structure_input_size = structure_input_size;
  metadisp.remaining_scalar_inputs = remaining_scalar_inputs;
  metadisp.cnt_remaining_scalar_input = cnt_remaining_scalar_inputs;
  metadisp.some_flags = 0x40000000LL;
  metadisp.dispatcher_fptr = a_dispatcher;
  metadisp.offset_of_something_which_looks_serialization_related = &off_1C1308;
  return metadispatch(holder, first_scalar_input, 1, caller_task_ptr, structure_input_buffer, &metadisp, 0);
}


This method wraps up the arguments into another structure I've called metadispatcher:


struct __attribute__((aligned(8))) metadispatcher
{
 uint64_t address_of_some_zerofill_static_buffer;
 uint64_t some_flags;
 __int64 (__fastcall *dispatcher_fptr)(struct metadispatcher *, struct BlockHandler *, __int64, _QWORD);
 uint64_t offset_of_something_which_looks_serialization_related;
 struct block_handler_holder *handlers_holder;
 uint64_t structure_input_buffer;
 uint64_t structure_input_size;
 uint64_t remaining_scalar_inputs;
 uint32_t cnt_remaining_scalar_input;
};


That metadispatcher object is then passed to this method:


return metadispatch(holder, first_scalar_input, 1, caller_task_ptr, structure_input_buffer, &metadisp, 0);


In there we reach this code:


  block_type_handler = lookup_a_handler_for_block_type_and_subtype(
                         a1,
                         first_scalar_input, // block_type
                         a3);                // subtype


The exploit calls this set_block external method twice, passing two different values for first_scalar_input: 7 and 19. We can see that those correspond to looking up two different block handler objects.


The lookup function searches a linked list of block handler structures; the head of the list is stored at offset 0x1448 in the UPPipeDCP_H13P object, and handlers are registered dynamically by a method I've named add_handler_for_block_type:


add_handler_for_block_type(struct block_handler_holder *handler_list,
                           struct BlockHandler *handler)
The logging code tells us that this is in a file called IOMFBBlockManager.cpp. IDA finds 44 cross-references to this method, indicating that there are probably that many different block handlers. The structure of each registered block handler is something like this:


struct __cppobj BlockHandler : RTKIT_RC_RTTI_BASE
{
  uint64_t field_16;
  struct handler_inner_types_entry *inner_types_array;
  uint32_t n_inner_types_array_entries;
  uint32_t field_36;
  uint8_t can_run_without_commandgate;
  uint32_t block_type;
  uint64_t list_link;
  uint64_t list_other_link;
  uint32_t some_other_type_field;
  uint32_t some_other_type_field2;
  uint32_t expected_structure_io_size;
  uint32_t field_76;
  uint64_t getBlock_Impl;
  uint64_t setBlock_Impl;
  uint64_t field_96;
  uint64_t back_ptr_to_UPPipeDCP_H13P;
};


The RTTI type is BLHA (BLock HAndler.)
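Given that layout, the lookup we saw earlier is presumably just a walk of this list comparing block types - a minimal sketch of my understanding (the holder's head field name is my guess; the rest comes from the structure above):

struct BlockHandler *
lookup_a_handler_for_block_type_and_subtype(struct block_handler_holder *holder,
                                            uint32_t block_type,
                                            uint32_t subtype)
{
  for (struct BlockHandler *h = holder->head;
       h != NULL;
       h = (struct BlockHandler *)h->list_link) {
    if (h->block_type == block_type)
      return h; // subtype validation happens against inner_types_array (see below)
  }
  return NULL;
}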


For example, here's the codepath which builds and registers block handler type 24:


BLHA_24 = (struct BlockHandler *)CXXnew(112LL);
BLHA_24->__vftable = (BlockHandler_vtbl *)BLHA_super_vtable;
BLHA_24->block_type = 24;
BLHA_24->refcnt = 1;
BLHA_24->can_run_without_commandgate = 0;
BLHA_24->some_other_type_field = 0LL;
BLHA_24->expected_structure_io_size = 0xD20;
typeid = typeid_BLHA();
BLHA_24->typeid = typeid;
modify_typeid_ref(typeid, 1);
BLHA_24->__vftable = vtable_BLHA_subclass_type_24;
BLHA_24->inner_types_array = 0LL;
BLHA_24->n_inner_types_array_entries = 0;
BLHA_24->getBlock_Impl = BLHA_24_getBlock_Impl;
BLHA_24->setBlock_Impl = BLHA_24_setBlock_Impl;
BLHA_24->field_96 = 0LL;
BLHA_24->back_ptr_to_UPPipeDCP_H13P = a1;
add_handler_for_block_type(list_holder, BLHA_24);


Each block handler optionally has getBlock_Impl and setBlock_Impl function pointers which appear to implement the actual setting and getting operations.


We can go through all the callsites which add block handlers; tell IDA the type of the arguments and name all the getBlock and setBlock implementations:


The IDA Pro Names window showing a list of symbols like BLHA_15_getBlock_Impl.

You can perhaps see where this is going: that's looking like really quite a lot of reachable attack surface! Each of those setBlock_Impl functions is reachable by passing a different value for the first scalar argument to IOConnectCallMethod 78.
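From userspace, exercising that fan-out would look something like this (a sketch; the structure input contents are placeholders):

#include <IOKit/IOKitLib.h>

// Reach a given block handler's setBlock_Impl by selecting it with the
// first scalar argument to external method 78 (s_set_block).
kern_return_t set_block(io_connect_t conn, uint64_t block_type,
                        const void *struct_input, size_t struct_input_size) {
  uint64_t scalars[2] = { block_type, 0 }; // second scalar must be 0 for UNPI::set_block
  uint32_t retval = 0;
  size_t retval_size = sizeof(retval);
  return IOConnectCallMethod(conn, 78,
                             scalars, 2,
                             struct_input, struct_input_size,
                             NULL, NULL,
                             &retval, &retval_size);
}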


There's a little bit more reversing though to figure out how exactly to get controlled bytes to those setBlock_Impl functions:

Memory Mapping

The raw "block" input to each of those setBlock_Impl methods isn't passed inline in the IOConnectCallMethod structure input. There's another level of indirection: each individual block handler structure has an array of supported "subtypes" which contains metadata detailing where to find the (userspace) pointer to that subtype's input data in the IOConnectCallMethod structure input. The first dword in the structure input is the id of this subtype - in this case for the block handler type 19 the metadata array has a single entry:


<2, 0, 0x5F8, 0x600>


The first value (2) is the subtype id, and 0x5f8 and 0x600 tell the DCP the offsets in the structure input data from which to read a pointer and a size. The DCP then requests a memory mapping from the AP for that memory from the calling task:


return wrap_MemoryDescriptor::withAddressRange(
  *(void*)(structure_input_buffer + addr_offset),
  *(unsigned int *)(structure_input_buffer + size_offset),
  caller_task_ptr);
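Putting those constants together, each handler_inner_types_entry plausibly looks something like this (the field names and widths are my assumptions, inferred from the <2, 0, 0x5F8, 0x600> entry above):

struct handler_inner_types_entry
{
  uint32_t subtype_id;  // 2: matched against the first dword of the structure input
  uint32_t unknown;     // 0
  uint32_t ptr_offset;  // 0x5F8: offset of the userspace pointer in the structure input
  uint32_t size_offset; // 0x600: offset of the mapping size in the structure input
};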


We saw earlier that the AP sends the DCP the struct task pointer of the calling task; when the DCP requests a memory mapping from a user task it sends those raw task struct pointers back to the AP such that the kernel can perform the mapping from the correct task. The memory mapping is abstracted as an MDES object on the DCP side; the implementation of the mapping involves the DCP making an RPC to the AP:


make_link_call('D453', &req, 0x20, &resp, 0x14);


which corresponds to a call to this method on the AP side:


IOMFB::MemDescRelay::withAddressRange(unsigned long long, unsigned long long, unsigned int, task*, unsigned long*, unsigned long long*)


The DCP calls ::prepare and ::map on the returned MDES object (exactly like an IOMemoryDescriptor object in IOKit), gets the mapped pointer and size to pass via a few final levels of indirection to the block handler:


a_descriptor_with_controlled_stuff->dispatcher_fptr(
  a_descriptor_with_controlled_stuff,
  block_type_handler,
  important_ptr,
  important_size);


where the dispatcher_fptr looks like this:


a_dispatcher(
        struct metadispatcher *disp,
        struct BlockHandler *block_handler,
        __int64 controlled_ptr,
        unsigned int controlled_size)
{
  return block_handler->BlockHandler_setBlock(
           block_handler,
           disp->structure_input_buffer,
           disp->structure_input_size,
           disp->remaining_scalar_inputs,
           disp->cnt_remaining_scalar_input,
           disp->handlers_holder->gate,
           controlled_ptr,
           controlled_size);
}


You can see here just how useful it is to keep making structure definitions while reversing; there are so many levels of indirection that it's pretty much impossible to keep it all in your head.


BlockHandler_setBlock is a virtual method on BLHA. This is the implementation for BLHA 19:


BlockHandler19::setBlock(
  struct BlockHandler *this,
  void *structure_input_buffer,
  int64 structure_input_size,
  int64 *remaining_scalar_inputs,
  unsigned int cnt_remaining_scalar_inputs,
  struct CommandGate *gate,
  void* mapped_mdesc_ptr,
  unsigned int mapped_mdesc_length)


This uses a Command Gate (GATI) object (like a command gate in IOKit, used to serialise calls) to finally get close to actually calling the setBlock_Impl function.


We need to reverse the gate_context structure to follow the controlled data through the gate:


struct __attribute__((aligned(8))) gate_context
{
 struct BlockHandler *the_target_this;
 uint64_t structure_input_buffer;
 void *remaining_scalar_inputs;
 uint32_t cnt_remaining_scalar_inputs;
 uint32_t field_28;
 uint64_t controlled_ptr;
 uint32_t controlled_length;
};


The callgate object uses that context object to finally call the BLHA setBlock handler:


callback_used_by_callgate_in_block_19_setBlock(
  struct UnifiedPipeline *parent_pipeline,
  struct gate_context *context,
  int64 a3,
  int64 a4,
  int64 a5)
{
  return (context->the_target_this->setBlock_Impl)(
           context->the_target_this->back_ptr_to_UPPipeDCP_H13P,
           context->structure_input_buffer,
           context->remaining_scalar_inputs,
           context->cnt_remaining_scalar_inputs,
           context->controlled_ptr,
           context->controlled_length);
}

SetBlock_Impl

And finally we've made it through the whole callstack following the controlled data from IOConnectCallMethod in userspace on the AP to the setBlock_Impl methods on the DCP!


The prototype of the setBlock_Impl methods looks like this:


setBlock_Impl(struct UPPipeDCP_H13P *pipe_parent,
              void *structure_input_buffer,
              int *remaining_scalar_inputs,
              int cnt_remaining_scalar_inputs,
              void* ptr_via_memdesc,
              unsigned int len_of_memdesc_mapped_buf)


The exploit calls two setBlock_Impl methods: 7 and 19. 7 is fairly simple and seems to just be used to put controlled data in a known location. 19 is the buggy one. From the log strings we can tell that the block type 19 handler is implemented in a file called UniformityCompensator.cpp.


Uniformity Compensation is a way to correct for inconsistencies in brightness and colour reproduction across a display panel. Block type 19 sets and gets a data structure containing this correction information. The setBlock_Impl method calls UniformityCompensator::set and reaches the following code snippet where controlled_size is a fully-controlled u32 value read from the structure input and indirect_buffer_ptr points to the mapped buffer, the contents of which are also controlled:

uint8_t* pages = compensator->inline_buffer; // +0x24
for (int pg_cnt = 0; pg_cnt < 3; pg_cnt++) {
  uint8_t* this_page = pages;
  for (int i = 0; i < controlled_size; i++) {
    memcpy(this_page, indirect_buffer_ptr, 4 * controlled_size);
    indirect_buffer_ptr += 4 * controlled_size;
    this_page += 0x100;
  }
  pages += 0x4000;
}

There's a distinct lack of bounds checking on controlled_size. Based on the structure of the code it looks like it should be restricted to be less than or equal to 64 (as that would result in the input being completely copied to the output buffer.) The compensator->inline_buffer buffer is inline in the compensator object. The structure of the code makes it look like that buffer is probably 0xc000 (three 16k pages) large. To verify this we need to find the allocation site of this compensator object.


It's read from the pipe_parent object and we know that at this point pipe_parent is a UPPipeDCP_H13P object.


There's only one write to that field, here in UPPipeDCP_H13P::setup_tunables_base_target:


compensator = CXXnew(0xC608LL);
...
this->compensator = compensator;


The compensator object is a 0xc608 byte allocation; the 0xc000 sized buffer starts at offset 0x24 so the allocation has enough space for 0xc608-0x24=0xC5E4 bytes before corrupting neighbouring objects.
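For what it's worth, a bounds check along these lines (a sketch of the obvious fix, not Apple's actual patch) would have kept the copy inside the inline buffer:

// Sketch: clamp controlled_size before the copy loops run. 64 rows of
// 4*64 = 0x100 bytes exactly fill each 0x100-byte stride and each
// 0x4000-byte page, so anything larger necessarily writes out of bounds.
if (controlled_size > 64)
  return kErrorBadSize; // error code name is a placeholder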


The structure input provided by the exploit for the block handler 19 setBlock call looks like this:


struct_input_for_block_handler_19[0x5F4] = 70; // controlled_size
struct_input_for_block_handler_19[0x5F8] = address;
struct_input_for_block_handler_19[0x600] = a_size;


This leads to a value of 70 (0x46) for controlled_size in the UniformityCompensator::set snippet shown earlier. (0x5f8 and 0x600 correspond to the offsets we saw earlier in the subtype's table:  <2, 0, 0x5F8, 0x600>)


The inner loop increments the destination pointer by 0x100 each iteration so 0x46 loop iterations will write 0x4618 bytes.


The outer loop writes to three subsequent 0x4000 byte blocks so the third (final) iteration starts writing at 0x24 + 0x8000 and writes a total of 0x4618 bytes, meaning the object would need to be 0xC63C bytes; but we can see that it's only 0xc608, meaning that it will overflow the allocation size by 0x34 bytes. The RTKit malloc implementation looks like it adds 8 bytes of metadata to each allocation so the next object starts at 0xc610.


How much input is consumed? The input is fully consumed with no "rewinding" so it's 3*0x46*0x46*4 = 0xe5b0 bytes. Working backwards from the end of that buffer we know that the final 0x34 bytes of it go off the end of the 0xc608 allocation, which means +0xe57c in the input buffer will be the first byte which corrupts the 8 metadata bytes and +0xe584 will be the first byte to corrupt the next object:


A diagram showing that the end of the overflower object overlaps with the metadata and start of the target object.
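The arithmetic is easy to double-check (a standalone snippet reproducing the offsets above):

#include <stdio.h>

int main(void) {
  unsigned n = 0x46;                              // controlled_size from the exploit
  unsigned per_page = (n - 1) * 0x100 + 4 * n;    // bytes written per 0x4000-byte page
  unsigned last_write = 0x24 + 0x8000 + per_page; // end of the third page's writes
  unsigned consumed = 3 * n * n * 4;              // total input bytes consumed
  printf("per page:    0x%x\n", per_page);            // 0x4618
  printf("write end:   0x%x\n", last_write);          // 0xc63c (vs the 0xc608 allocation)
  printf("consumed:    0x%x\n", consumed);            // 0xe5b0
  printf("metadata at: 0x%x\n", consumed - 0x34);     // 0xe57c
  printf("next obj at: 0x%x\n", consumed - 0x34 + 8); // 0xe584
  return 0;
}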

This matches up exactly with the overflow object which the exploit builds:


  v24 = address + 0xE584;
  v25 = *(_DWORD *)&v54[48];
  v26 = *(_OWORD *)&v54[32];
  v27 = *(_OWORD *)&v54[16];
  *(_OWORD *)(address + 0xE584) = *(_OWORD *)v54;
  *(_OWORD *)(v24 + 16) = v27;
  *(_OWORD *)(v24 + 32) = v26;
  *(_DWORD *)(v24 + 48) = v25;


The destination object seems to be allocated very early and the DCP RTKit environment appears to be very deterministic with no ASLR. Almost certainly they are attempting to corrupt a neighbouring C++ object with a fake vtable pointer.


Unfortunately for our analysis the trail goes cold here and we can't fully recreate the rest of the exploit. The bytes for the fake DCP C++ object are read from a file in the app's temporary directory (base64 encoded inside a JSON file under the exploit_struct_offsets key) and I don't have a copy of that file. But based on the flow of the rest of the exploit it's pretty clear what happens next:

sudo make me a DART mapping

The DCP, like other coprocessors on iPhone, sits behind a DART (Device Address Resolution Table.) This is like an SMMU (IOMMU in the x86 world) which forces an extra layer of physical address lookup between the DCP and physical memory. DART was covered in great detail in Gal Beniamini's Over The Air - Vol. 2, Pt. 3 blog post.


The DCP clearly needs to access lots of buffers owned by userspace tasks as well as memory managed by the kernel. To do this the DCP makes RPC calls back to the AP which modifies the DART entries accordingly. This appears to be exactly what the DCP exploit does: the D45X family of DCP->AP RPC methods appear to expose an interface for requesting arbitrary physical as well as virtual addresses to be mapped into the DCP DART.


The fake C++ object is most likely a stub which makes such calls on behalf of the exploit, allowing the exploit to read and write kernel memory.

Conclusions

Segmentation and isolation are in general a positive thing when it comes to security. However, splitting up an existing system into separate, intercommunicating parts can end up exposing unexpected code in unexpected ways.


We've had discussions within Project Zero about whether this DCP vulnerability is interesting at all. After all, if the UniformityCompensator code was going to be running on the Application Processors anyway then the Display Co-Processor didn't really introduce or cause this bug.


Whilst that's true, it's also the case that the DCP certainly made exploitation of this bug significantly easier and more reliable than it would have been on the AP. Apple has invested heavily in memory corruption mitigations over the last few years, so moving an attack surface from a "mitigation heavy" environment to a "mitigation light" one is a regression in that sense.


Another perspective is that the DCP just isn't isolated enough; perhaps the intention was to try to isolate the code on the DCP such that even if it's compromised it's limited in the effect it could have on the entire system. For example, there might be models where the DCP to AP RPC interface is much more restricted.


But again there's a tradeoff: the more restrictive the RPC API, the more the DCP code has to be refactored - a significant investment. Currently, the codebase relies on being able to map arbitrary memory and the API involves passing userspace pointers back and forth.


I've discussed in recent posts how attackers tend to be ahead of the curve. As the curve slowly shifts towards memory corruption exploitation getting more expensive, attackers are likely shifting too. We saw that in the logic-bug sandbox escape used by NSO and we see that here in this memory-corruption-based privilege escalation that side-stepped kernel mitigations by corrupting memory on a co-processor instead. Both are quite likely to continue working in some form in a post-memory tagging world. Both reveal the stunning depth of attack surface available to the motivated attacker. And both show that defensive security research still has a lot of work to do.

An Autopsy on a Zombie In-the-Wild 0-day

14 June 2022 at 16:00

Posted by Maddie Stone, Google Project Zero

Whenever there’s a new in-the-wild 0-day disclosed, I’m very interested in understanding the root cause of the bug. This allows us to then understand if it was fully fixed, look for variants, and brainstorm new mitigations. This blog is the story of a “zombie” Safari 0-day and how it came back from the dead to be disclosed as exploited in-the-wild in 2022. CVE-2022-22620 was initially fixed in 2013, reintroduced in 2016, and then disclosed as exploited in-the-wild in 2022. If you’re interested in the full root cause analysis for CVE-2022-22620, we’ve published it here.

In the 2020 Year in Review of 0-days exploited in the wild, I wrote how 25% of all 0-days detected and disclosed as exploited in-the-wild in 2020 were variants of previously disclosed vulnerabilities. Almost halfway through 2022 and it seems like we’re seeing a similar trend. Attackers don’t need novel bugs to effectively exploit users with 0-days, but instead can use vulnerabilities closely related to previously disclosed ones. This blog focuses on just one example from this year because it’s a little bit different from other variants that we’ve discussed before. Most variants we’ve discussed previously exist due to incomplete patching. But in this case, the variant was completely patched when the vulnerability was initially reported in 2013. However, the variant was reintroduced 3 years later during large refactoring efforts. The vulnerability then continued to exist for 5 years until it was fixed as an in-the-wild 0-day in January 2022.

Getting Started

In the case of CVE-2022-22620 I had two pieces of information to help me figure out the vulnerability: the patch (thanks to Apple for sharing with me!) and the description from the security bulletin stating that the vulnerability is a use-after-free. The primary change in the patch was to change the type of the second argument (stateObject) to the function FrameLoader::loadInSameDocument from a raw pointer, SerializedScriptValue*, to a reference-counted pointer, RefPtr<SerializedScriptValue>.

trunk/Source/WebCore/loader/FrameLoader.cpp

  // This does the same kind of work that didOpenURL does, except it relies on the fact
  // that a higher level already checked that the URLs match and the scrolling is the right thing to do.
- void FrameLoader::loadInSameDocument(const URL& url, SerializedScriptValue* stateObject, bool isNewNavigation)
+ void FrameLoader::loadInSameDocument(URL url, RefPtr<SerializedScriptValue> stateObject, bool isNewNavigation)

Whenever I’m doing a root cause analysis on a browser in-the-wild 0-day, along with studying the code, I also usually search through commit history and bug trackers to see if I can find anything related. I do this to try and understand when the bug was introduced, but also to try and save time. (There’s a lot of 0-days to be studied! 😀)

The Previous Life

In the case of CVE-2022-22620, I was scrolling through the git blame view of FrameLoader.cpp, specifically looking at the definition of loadInSameDocument. The commit that last touched this line prior to our patch is a very interesting one: it changed the stateObject argument from a reference-counted pointer, PassRefPtr<SerializedScriptValue>, to a raw pointer, SerializedScriptValue*. This change from December 2016 introduced CVE-2022-22620. The Changelog even states:


(WebCore::FrameLoader::loadInSameDocument): Take a raw pointer for the
serialized script value state object. No one was passing ownership.
But pass it along to statePopped as a Ref since we need to pass ownership
of the null value, at least for now.

Now I was intrigued and wanted to track down the previous commit that had changed the stateObject argument to PassRefPtr<SerializedScriptValue>. I was in luck and only had to go back in the history two more steps. There was a commit from 2013 that changed the stateObject argument from the raw pointer, SerializedScriptValue*, to a reference-counted pointer, PassRefPtr<SerializedScriptValue>. This commit from February 2013 was doing the same thing that our commit in 2022 was doing. The commit was titled “Use-after-free in SerializedScriptValue::deserialize” and included a good description of how that use-after-free worked.

The commit also included a test:

Added a test that demonstrated a crash due to use-after-free
of SerializedScriptValue.

Test: fast/history/replacestate-nocrash.html

The trigger from this test is:

Object.prototype.__defineSetter__("foo",function(){history.replaceState("", "")});
history.replaceState({foo:1,zzz:"a".repeat(1<<22)}, "");
history.state.length;

My hope was that the test would crash the vulnerable version of WebKit and I’d be done with my root cause analysis and could move on to the next bug. Unfortunately, it didn’t crash.

The commit description included the comment to check out a Chromium bug. (During this time Chromium still used the WebKit rendering engine. Chromium forked the Blink rendering engine in April 2013.) I saw that my now Project Zero teammate, Sergei Glazunov, originally reported the Chromium bug back in 2013, so I asked him for the details.

The use-after-free from 2013 (no CVE was assigned) was a bug in the implementation of the History API. This API allows access to (and modification of) a stack of the pages visited in the current frame, and these page states are stored as a SerializedScriptValue. The History API exposes a getter for state, and a method replaceState which allows overwriting the "most recent" history entry.

The bug was that in the implementation of the getter for state, SerializedScriptValue::deserialize was called on the current "most recent" history entry value without increasing its reference count. As SerializedScriptValue::deserialize could trigger a callback into user JavaScript, the callback could call replaceState to drop the only reference to the history entry value by replacing it with a new value. When the callback returned, the rest of SerializedScriptValue::deserialize ran with a free'd this pointer.
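In code, the shape of the 2013 bug is roughly this (a minimal sketch of the pattern, not WebKit's actual implementation):

// Sketch of the vulnerable getter pattern (names simplified):
SerializedScriptValue* state = currentEntry->stateObject(); // raw pointer, no ref taken
// deserialize() can re-enter user JavaScript (e.g. via a defined setter)...
JSValue result = state->deserialize(...);
// ...and if that JS calls history.replaceState(), the history entry drops the
// last reference to |state|, so the rest of deserialize() runs on a freed object.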

In order to fix this bug, it appears that the developers decided to change every caller of SerializedScriptValue::deserialize to increase the reference count on the stateObject by changing the argument types from a raw pointer to PassRefPtr<SerializedScriptValue>.  While the originally reported trigger called deserialize on the stateObject through the V8History::stateAccessorGetter function, the developers’ fix also caught and patched the path to deserialize through loadInSameDocument.

The timeline of the changes impacting the stateObject is:

  • HistoryItem.m_stateObject is type RefPtr<SerializedScriptValue>
  • HistoryItem::stateObject() returns SerializedScriptValue*
  • FrameLoader::loadInSameDocument takes stateObject argument as SerializedScriptValue*
  • HistoryItem::stateObject returns a PassRefPtr<SerializedScriptValue>
  • FrameLoader::loadInSameDocument takes stateObject argument as PassRefPtr<SerializedScriptValue>
  • HistoryItem::stateObject returns RefPtr instead of PassRefPtr
  • HistoryItem::stateObject() is changed to return raw pointer instead of RefPtr
  • FrameLoader::loadInSameDocument changed to take stateObject as a raw pointer instead of PassRefPtr<SerializedScriptValue>
  • FrameLoader::loadInSameDocument changed to take stateObject as a RefPtr<SerializedScriptValue>

The Autopsy

When we look at the timeline of changes for FrameLoader::loadInSameDocument it seems that the bug was re-introduced in December 2016 due to refactoring. The question is: why did the patch author think that loadInSameDocument would not need to hold a reference? From the December 2016 commit ChangeLog: "Take a raw pointer for the serialized script value state object. No one was passing ownership."

My assessment is that it’s due to the October 2016 changes in HistoryItem:stateObject. When the author was evaluating the refactoring changes needed in the dom directory in December 2016, it would have appeared that the only calls to loadInSameDocument passed in either a null value or the result of stateObject() which as of October 2016 now passed a raw SerializedScriptValue* pointer. When looking at those two options for the type of an argument, then it’s potentially understandable that the developer thought that loadInSameDocument did not need to share ownership of stateObject.

So why then was HistoryItem::stateObject’s return value changed from a RefPtr to a raw pointer in October 2016? That I’m struggling to find an explanation for.

According to the description, the patch in October 2016 was intended to “Replace all uses of ExceptionCodeWithMessage with WebCore::Exception”. However when we look at the ChangeLog it seems that the author decided to also do some (seemingly unrelated) refactoring to HistoryItem. These are some of the only changes in the commit whose descriptions aren’t related to exceptions. As an outsider looking at the commits, it seems that the developer by chance thought they’d do a little “clean-up” while working through the required refactoring on the exceptions. If this was simply an additional ad-hoc step while in the code, rather than the goal of the commit, it seems plausible that the developer and reviewers may not have further traced the full lifetime of HistoryItem::stateObject.

While the change to HistoryItem in October 2016 was not sufficient to introduce the bug, it seems that that change likely contributed to the developer in December 2016 thinking that loadInSameDocument didn’t need to increase the reference count on the stateObject.

Both the October 2016 and the December 2016 commits were very large. The commit in October changed 40 files with 900 additions and 1225 deletions. The commit in December changed 95 files with 1336 additions and 1325 deletions. It seems untenable for any developers or reviewers to understand the security implications of each change in those commits in detail, especially since they’re related to lifetime semantics.

The Zombie

We’ve now tracked down the evolution of changes to fix the 2013 vulnerability…and then revert those fixes… so I got back to identifying the 2022 bug. It’s the same bug, but triggered through a different path. That’s why the 2013 test case wasn’t crashing the version of WebKit that should have been vulnerable to CVE-2022-22620:

  1. The 2013 test case triggers the bug through the V8History::stateAccessorGetter path instead of FrameLoader::loadInSameDocument, and
  2. As a part of Sergei’s 2013 bug report there were additional hardening measures put into place that prevented user-code callbacks being processed during deserialization. 

Therefore we needed to figure out how to call loadInSameDocument and, instead of relying on deserialization, find another event in the loadInSameDocument function that would trigger a callback to user JavaScript.

To quickly figure out how to call loadInSameDocument, I modified the WebKit source code to trigger a test failure if loadInSameDocument was ever called and then ran all the tests in the fast/history directory. 5 out of the 80 tests called loadInSameDocument.
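The modification itself can be as crude as making the function fail loudly (a sketch; any equally noisy marker would do):

// Temporary tracer at the top of FrameLoader::loadInSameDocument:
void FrameLoader::loadInSameDocument(URL url, RefPtr<SerializedScriptValue> stateObject, bool isNewNavigation)
{
    RELEASE_ASSERT_NOT_REACHED(); // any test that reaches here now fails loudly
    ...
}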

The tests history-back-forward-within-subframe-hash.html and fast/history/history-traversal-is-asynchronous.html were the most helpful. We can trigger the call to loadInSameDocument by setting the history stack with an entry whose location is the same page but includes a hash, then calling history.back() to go back to that state. loadInSameDocument is responsible for scrolling to that location.

history.pushState("state1", "", location + "#foo");
history.pushState("state2", ""); // current state
history.back(); // goes back to state1, triggering loadInSameDocument

 

Now that I knew how to call loadInSameDocument, I teamed up with Sergei to identify how we could get user code execution sometime during the loadInSameDocument function, but prior to the call to statePopped (FrameLoader.cpp#1158):

m_frame.document()->statePopped(stateObject ? Ref<SerializedScriptValue> { *stateObject } : SerializedScriptValue::nullValue());

The callback to user code would have to occur prior to the call to statePopped because stateObject was cast to a reference there and thus would now be reference-counted. We assumed that this would be the place where the “freed” object was “used”.

If you go down the rabbit hole of the calls made in loadInSameDocument, we find that there is a path to the blur event being dispatched. We could have also used a tool like CodeQL to see if there was a path from loadInSameDocument to dispatchEvent, but in this case we just used manual auditing. The call tree to the blur event is:

FrameLoader::loadInSameDocument
  FrameLoader::scrollToFragmentWithParentBoundary
    FrameView::scrollToFragment
      FrameView::scrollToFragmentInternal
        FocusController::setFocusedElement
          FocusController::setFocusedFrame
            dispatchWindowEvent(Event::create(eventNames().blurEvent, Event::CanBubble::No, Event::IsCancelable::No));

The blur event fires on an element whenever focus is moved from that element to another element. In our case loadInSameDocument is triggered when we need to scroll to a new location within the current page. If we’re scrolling and therefore changing focus to a new element, the blur event is fired on the element that previously had the focus.

The last piece for our trigger is to free the stateObject in the onblur event handler. To do that we call replaceState, which overwrites the current history state with a new object. This causes the final reference to be dropped on the stateObject and it’s therefore free’d. loadInSameDocument still uses the free’d stateObject in its call to statePopped.

input = document.body.appendChild(document.createElement("input"));
a = document.body.appendChild(document.createElement("a"));
a.id = "foo";
history.pushState("state1", "", location + "#foo");
history.pushState("state2", "");
setTimeout(() => {
        input.focus();
        input.onblur = () => history.replaceState("state3", "");
        setTimeout(() => history.back(), 1000);
}, 1000);

In both the 2013 and 2022 cases, the root vulnerability is that the stateObject is not correctly reference-counted. In 2013, the developers did a great job of patching all the different paths to trigger the vulnerability, not just the one in the submitted proof-of-concept. This meant that they had also killed the vulnerability in loadInSameDocument. The refactoring in December 2016 then revived the vulnerability to enable it to be exploited in-the-wild and re-patched in 2022.

Conclusion

Usually when we talk about variants, they exist due to incomplete patches: the vendor doesn’t correctly and completely fix the reported vulnerability. However, for CVE-2022-22620 the vulnerability was correctly and completely fixed in 2013. Its fix was just regressed in 2016 during refactoring. We don’t know how long an attacker was exploiting this vulnerability in-the-wild, but we do know that the vulnerability existed (again) for 5 years: December 2016 until January 2022.

There’s no easy answer for what should have been done differently. The developers responding to the initial bug report in 2013 followed a lot of best-practices:

  • Patched all paths to trigger the vulnerability, not just the one in the proof-of-concept. This meant that they patched the variant that would become CVE-2022-22620.
  • Submitted a test case with the patch.
  • Detailed commit messages explaining the vulnerability and how they were fixing it.
  • Additional hardening measures during deserialization.

As an offensive security research team, we can make assumptions about what we believe to be the core challenges facing modern software development teams: legacy code, short reviewer turn-around expectations, under-appreciated and under-rewarded refactoring and security efforts, and a lack of memory safety mitigations. Developers and security teams need time to review patches, especially for security issues, and rewarding these efforts will make a difference. It will also save the vendor resources in the long run. In this case, 9 years after a vulnerability was initially triaged, patched, tested, and released, the whole process had to be duplicated again, but this time under the pressure of in-the-wild exploitation.

While this case study was a 0-day in Safari/WebKit, this is not an issue unique to Safari. Already in 2022, we’ve seen in-the-wild 0-days that are variants of previously disclosed bugs targeting Chromium, Windows, Pixel, and iOS as well. It’s a good reminder that as defenders we all need to stay vigilant in reviewing and auditing code and patches.
