Normal view

There are new articles available, click to refresh the page.
Before yesterdayZero Day Initiative - Blog

But You Told Me You Were Safe: Attacking the Mozilla Firefox Sandbox (Part 2)

23 August 2022 at 16:34

In the first part of this series, we reviewed how Pwn2Own contestant Manfred Paul was able to compromise the Mozilla Firefox renderer process via a prototype pollution vulnerability in the await implementation. In modern browser architecture design, compromising the renderer gets us just half the way there, since the sandbox prevents further damage. In this blog post, we discuss a second prototype pollution vulnerability that allowed the execution of attacker-controlled JavaScript in the privileged parent process, escaping the sandbox. This vulnerability is known as CVE-2022-1529 and is tracked as ZDI-22-798 on the Zero Day Initiative advisory page. Mozilla fixed this vulnerability along with the first one in Firefox 100.0.2 via Mozilla Foundation Security Advisory 2022-19.

Root Cause

As described in the previous post, the exploit compromised the renderer by leveraging a prototype pollution vulnerability in some built-in JavaScript code that executes in the renderer process. For the sandbox escape part of the exploit, the researcher used a second prototype pollution vulnerability. This second vulnerability exists in built-in JavaScript code that runs in the fully privileged parent process, also known as the chrome process (not to be confused with Google’s Chrome browser).

How can the sandboxed renderer process affect JavaScript running in the chrome process? The answer is that the renderer can communicate with the chrome process via various interfaces. In fact, some of these interfaces can be reached directly from JavaScript when running in a “privileged” JavaScript context (not to be confused with any OS-level concept of privilege). As we will see, achieving “privileged” JavaScript execution will be the exploit’s first step.

After achieving privileged JavaScript execution, the exploit can reach out to various endpoints for communication with the chrome process. One of the endpoints is called NotificationDB. It is implemented almost entirely in JavaScript. It processes various messages, which it receives via the content process message manager. In the case of a “Notification:Save” message, a “save” task is queued:

After the “save” task is put on the queue, it is handled in the chrome process, in the “taskSave” function:

At [1], both origin and notification.id are taken directly from the message data sent by the renderer, without any validation. This means we can set either of these to any serializable JavaScript value. More specifically, we can set them to any values supported by the structured clone algorithm, since this is the algorithm used to marshal data from the renderer to the chrome process. If we set origin to the string "__proto__", then this.notifications[origin] will not access a normal data property. Instead, it will access the object’s prototype. This prototype is Object.prototype, since this.notifications is a plain Object. This gives us a prototype pollution primitive. It allows us to write any serializable JavaScript value to any property of Object.prototype with only one restriction: the value we write must have an id property that matches the property name we are writing to.

Using this prototype pollution, we can corrupt the global JavaScript state in the chrome process. This affects all JavaScript that runs in the chrome process, far beyond NotificationDB.jsm itself. Since JavaScript execution contexts are largely shared, all chrome-level JavaScript modules are now exposed to unexpected properties in Object.prototype. The exploit will use this corruption to gain chrome-level XSS during tab restoration, leading to native code execution outside the sandbox.

Now that we have a complete picture of what we want to do, let’s begin.

Achieving Privileged JavaScript Execution

As mentioned above, before we can invoke NotificationDB, we need to access a privileged JavaScript context. In particular, what we need is access to an object called components. This is a different object than a much more limited object confusingly also named Components, which is intended to be exposed to untrusted script.

To gain access to components, the attacker script performs the following steps. Note that all this is made possible because the attacker script has already gained full native code execution within the renderer sandbox, as detailed in part one of this series:

        1 -- Mark the current JavaScript compartment as system by setting the corresponding flag in memory.
        2 -- Patch CanCreateWrapper to always return NS_OK. This prevents further security checks on the calling context.
        3 -- Call the GetComponents method to add the components object to the scope.

Triggering the Prototype Pollution Primitive

Once we have obtained the components object, we are nearly ready to trigger the prototype pollution. One obstacle remains: due to the details of Firefox's “cross-compartment” handling of JavaScript objects, the ContentProcessMessageManager object we want to access is hidden behind an opaque proxy object. This can be circumvented by reading the proxy’s underlying object pointer and using a “fakeObj” to convert it to a JavaScript object. We can now call the vulnerable NotificationDB interface:

Remember that a limitation applies to the way that we can overwrite properties of Object.prototype: we can set any property name to any value val, but val.id must equal name. For our purposes, the exact value of val will not matter. Only its string representation is important (more precisely, the result of running the ECMAScript ToString algorithm). The loose type system of JavaScript helps us here. Consider the following array object:

This object has its id property set to the arbitrary string "foo", but ToString will represent the object by just the string "bar". Therefore, as long as we only care about the string representation, we can set any property of Object.prototype to any value we desire.

Leveraging the Prototype Pollution for Sandbox Escape

Consider the following code in browser/components/sessionstore/TabAttributes.jsm, which executes in the chrome process:

Note that a for ... in loop will traverse all properties found in the prototype chain, and not only the properties found on the object itself. Therefore, by invoking the code shown above after we have polluted Object.prototype, we can cause tab.setAttribute to be called with arbitrary parameters. This will set an arbitrary HTML (technically XUL) attribute of a tab.

How can we cause this function to run? It turns out that the only time it is called is during the restoration of tabs. There are multiple ways to trigger this functionality:

        1 -- Session restoration after restarting the browser.
        2 -- Use of the “reopen closed tab” feature (Ctrl+Shift+T).
        3 -- Reactivating a tab after “Tab Unloading”, which occurs when Firefox starts to run out of memory.
        4 -- Automatically restoring a tab after it has crashed.

The first choice is not an option, since restarting the browser would not preserve the polluted prototype. In the real world, waiting for option #2 might work, but it requires user interaction, making it unsuitable for Pwn2Own. It’s also possible to force option #3 by allocating large chunks of memory. However, by default, it takes at least 10 minutes of inactivity before unloading will happen, which exceeds the Pwn2Own time constraint. This leaves just option #4. Fortunately, crashing the renderer process is trivial: we have already achieved memory corruption, and we can simply write to an invalid address to force a segmentation fault.

So far, the sandbox escape exploit proceeds as follows:

        1 -- Trigger the prototype pollution, adding a property and value to Object.prototype in the chrome process. The name/value pair we add corresponds to the parameters we want to pass to tab.setAttribute. For example, if we add a property named "a" with string value "b", then tab.setAttribute will ultimately be invoked with parameters ("a", "b").
        2 -- Open a new background tab. Note that a simple window.open method call without prior user interaction is blocked by the popup blocker. However, the check is entirely renderer-side, and the services.ww.openWindow API obtained from the components object has no such restriction.
        3 -- In this background tab, crash the renderer. The chrome process will immediately restore the background tab. The polluted prototype will cause the tab restoration logic to set our chosen attribute on the tab.

Next, we must consider: what parameters do we want to pass to tab.setAttribute? As the browser UI that contains the tab element is written not in HTML but rather the similar XUL markup language, attributes such as “onload” or “onerror” that are commonly used for XSS do not seem to work. Going through a list of XUL event handlers, there are only two that seem to work without any direct user interaction: “onoverflow” and “onunderflow”. These are triggered when the tab’s title text starts to exceed or no longer exceeds the available space. We can trigger the former by setting a style attribute with the value text-indent: 500px.

Once we have achieved JavaScript execution within the chrome process, there are many ways to complete the sandbox escape. For example, we could disable all sandboxing in the future by setting a preference:

  Services.prefs.setIntPref("security.sandbox.content.level", 0);

Afterward, the exploit could run script in a new tab, which will be created without any sandbox protections. Alternatively, it could run script directly in the chrome process. Either way, the file and process APIs that are available in chrome-level JavaScript can be used to gain native code execution not constrained by any sandbox:

Here is a short video demonstrating running the full exploit against Mozilla Firefox 100.0.1 (64-bit):

Final Notes

Modern browsers process large volumes of data coming from numerous untrusted sources. Modern browser architecture goes a long way towards containing damage in cases where the renderer process is compromised. However, there remain multiple security checks that are performed on the renderer side. We have seen how these checks could be bypassed, ultimately leading to full compromise of the main browser process. In general, it is wise to reduce renderer-side security checks and move them to the main process wherever it is practical.

You can find me on Twitter at @hosselot and follow the team on Twitter or Instagram for the latest in exploit techniques and security patches.

But You Told Me You Were Safe: Attacking the Mozilla Firefox Sandbox (Part 2)

But You Told Me You Were Safe: Attacking the Mozilla Firefox Renderer (Part 1)

18 August 2022 at 15:31

Vulnerabilities and exploits in common targets like browsers are often associated with memory safety issues. Typically this involves either a direct error in memory management or a way to corrupt internal object state in the JavaScript engine. One way to eliminate such memory safety issues is to use a memory-safe language such as Rust or even JavaScript itself. At Pwn2Own Vancouver 2022, Manfred Paul compromised the Mozilla Firefox browser using a full chain exploit that broke the mold. Although his exploit used some memory corruptions, the vulnerable code was written in a memory-safe programming language: JavaScript! In fact, both vulnerabilities used in the chain were related to one rather notorious language aspect of JavaScript – prototypes. In this blog, we will look at the first vulnerability in the chain, which was used to compromise the Mozilla Firefox renderer process. This vulnerability, known as CVE-2022-1802, is a prototype pollution vulnerability in the await implementation. You can find more information about this vulnerability on the Zero Day Initiative advisory page tracked as ZDI-22-799. Mozilla fixed this vulnerability in Firefox 100.0.2 via Mozilla Foundation Security Advisory 2022-19.

Note: this blog series is heavily reliant on the details provided by Manfred Paul at the Pwn2Own competition.

 Compromising The Renderer Process

Modern JavaScript features the module syntax, which allows developers to split code into individual files. An even newer feature is the support of asynchronous modules, or, more precisely, the feature known as top level await. In Firefox’s JavaScript engine, SpiderMonkey, large parts of this feature are implemented using built-in JavaScript code. Consider the following function from the SpiderMonkey codebase, in /js/src/builtin/Module.js:

There are three facts we must note the code shown above:

      1 -- This function runs in the same JavaScript context as the user’s code. This is true for most JavaScript-based functions in Firefox. This means that global state, including prototypes of global objects, is shared between this built-in code and untrusted website code.

      2 -- The function has a default argument of execList = []. In practice, the function is called without specifying this argument (except for the recursive call in the function itself). Therefore, a new empty array object is constructed and used for this argument. Like any other ordinary array, this array object has the unique object Array.prototype as its prototype.

      3 -- The function invokes std_Array_push on this array object. The std_Array_push function leads to a call to the Array.prototype.push JavaScript method. While the usage of std_Array_push function instead of Array.prototype.push helps prevent side effects up to a certain point, the function still can interact with the object’s prototype. (Note that in various other places within this same built-in JavaScript file /js/src/builtin/Module.js, a different function is used to assign array values: DefineDataProperty. In contrast to std_Array_push, DefineDataProperty is safe and will not interact in any way with the object’s prototype.)

The semantics of Array.prototype.push with a single argument are very roughly equivalent to the following:

Notably, the assignment is not just the definition of a data property on the object itself. Instead, it searches the object’s prototype chain for existing properties as per usual JavaScript semantics. If the imported module defines a getter/setter for property 0 on the Array prototype (Array.prototype), this assignment operation will trigger the setter function. This call technically violates the ECMAScript specification that defines GatherAsyncParentCompletions in terms of abstract lists and not actual JavaScript arrays. Crucially, this has yet another effect: it leaks the value that is assigned to our setter, so we recover the value “m” representing a module! This object is not the same as the module namespace returned by import(), but rather, it is an internal type of the JavaScript engine not meant to be accessible to untrusted script. It exposes some unsafe methods via its prototype, such as GatherAsyncParentCompletions. Calling GatherAsyncParentCompletions results in a call to the UnsafeSetReservedSlot method, which can be used to achieve memory corruption if we pass in a non-module object.

Triggering The Vulnerability

It is easy to trigger the vulnerability and obtain a Module object:

As described, we simply need to attach a setter to the 0 property of Array.prototype and wait for it to be called. Note that this snippet will only work when imported as a module from another file. The last line exists solely to mark the module as asynchronous, which is needed to trigger the bug.

Achieving Memory Corruption

To achieve memory corruption, we can now call mod.gatherAsyncParentCompletions with an object of the form {asyncParentModules:[obj]}, resulting in a call to UnsafeSetReservedSlot . This will attempt to write the value obj.pendingAsyncDependencies-1 to the internal object slot with number MODULE_OBJECT_PENDING_ASYNC_DEPENDENCIES_SLOT=20. In SpiderMonkey, objects have space for up to 16 so-called fixed slots which are for internal use only. This number is defined by the MAX_FIXED_SLOTS constant. Slots with a higher index are indexed from an array pointed to by the slots_ field. This means our write will be directed to the array pointed to by slots_. No bounds checking exists to make sure that the slots_ array is large enough to accommodate the specified index, because the UnsafeSetReservedSlot function assumes, as the name implies, that the caller will pass only suitable objects.

The general idea now is to:

       1 -- Create a new array object.

       2 -- Set some named properties of the object to force the allocation of a slots_ array for the object. Among these properties, we should create one with the name pendingAsyncDependencies.

       3 -- Write to a few numbered elements of the object to ensure the allocation of elements_ (the backing store for array elements).

By getting the alignment right, slots_[4] will then point to the capacity value of elements_, which we can then overwrite. This is not trivial. Fortunately, the heap allocator is very simple and deterministic. All of the allocations so far will take place in the nursery heap, which is a special area for small short-lived objects. Memory in that area will be allocated by a simple bump allocator. After increasing the capacity, we can write out-of-bounds of the object’s elements_ array and corrupt other nearby objects. From here, arbitrary read and write primitives are easily constructed by overwriting the data pointer of a typed array. Note that corruption in objects in the nursery heap cannot be used for very long since the objects created there will be soon moved to the tenured heap. The best way to proceed is to use corruption in the nursery heap as a first stage only, and immediately use it to produce corruption in the tenured heap. For example, this can be done by corrupting ArrayBuffer objects.

Executing Shellcode

Firefox uses W^X JIT, which means all JIT-produced executable pages are non-writable. This prevents us from overwriting executable JIT code with our shellcode. There is an already well-known method to force JIT to emit arbitrary ROP gadgets by embedding chosen floating-point constants into a JIT-compiled JavaScript function. This results in the appearance of arbitrary short byte sequences in an executable page. Manfred Paul further enhanced this technique. Now it does not even need ROP at all! Instead of using a JavaScript function, the floating-point constants are embedded into a WebAssembly method, so they are compiled into consecutive memory in order of appearance. This makes it possible to insert not just ROP gadgets, but even somewhat longer stretches of shellcode by encoding them in the floating-point constants. There are still some restrictions, though: no 8-byte block may appear twice, or the constant will only be emitted once. Also, due to ambiguity in representation, byte sequences that are equal to NaN might not be encoded correctly. Therefore, Manfred Paul opted for a minimal first-stage shellcode that offers just the following two pieces of functionality:

       1 -- The ability to read a pointer from the Windows PEB structure.

       2 -- The ability to invoke a function given the function’s address.

The attacker, from ordinary JavaScript, triggers execution of the shellcode’s first function to leak a value from the PEB. Next, the JavaScript uses this value together with the arbitrary read primitive to locate kernel32.dll and its functions in memory. Once it has located the address for VirtualProtect, it invokes the shellcode’s second function to mark the backing store of an ArrayBuffer object as executable, making it possible to run a second-stage shellcode without constraints and compromise the renderer process.

Now that we have code execution inside the renderer, it is time to prepare to attack the sandbox. This will be covered in the second blog, coming next week.

Final Notes

For a long time, developers have tried to fight memory corruption vulnerabilities by introducing various mitigations, and they have succeeded in making it more difficult for attackers to fully compromise applications. However, attackers have also come up with their own creative methods to bypass mitigations. Using a memory-safe programming language is a critical move. If the introduction of memory corruption vulnerabilities can be avoided in the first place, it would not be necessary to rely upon the strength of mitigations. This post looked at a great vulnerability demonstrating that even if you replace existing code with JavaScript, you could still be prone to memory corruption.

Stay tuned to this blog for part two of this series coming next week. Until then, you can find me on Twitter at @hosselot and follow the team on Twitter or Instagram for the latest in exploit techniques and security patches.

But You Told Me You Were Safe: Attacking the Mozilla Firefox Renderer (Part 1)

CVE-2022-26381: Gone by others! Triggering a UAF in Firefox

7 April 2022 at 15:51

Memory corruption vulnerabilities have been well known for a long time and programmers have developed various methods to prevent them. One type of memory corruption that is very hard to prevent is the use-after-free and the reason is that it has too many faces! Since it cannot be associated with any specific pattern in source code, it is not trivial to eliminate this vulnerability class. In this blog, a use-after-free vulnerability in Mozilla Firefox will be explained which has been assigned CVE-2022-26381. The Mozilla bug entry 1756793 is still closed to the public as of this writing, but the Zero Day Initiative advisory page ZDI-22-502 can provide a bit more information.

What Is a Use-After-Free Vulnerability?

A use-after-free (UAF) vulnerability happens when a pointer to a freed object is accessed. It does not make sense! Why would a programmer free an object and afterward access it again?

It happens due to the complexity of today’s software. A browser, for example, has many components and each of them may allocate different objects. They may even pass these objects to each other for processing. A component may free an object when it is done using it, while other components still have a pointer to that object. Any dereference of that pointer can lead to a use-after-free vulnerability.

Proof-of-Concept

Let’s start quickly by having a look at the minimized proof-of-concept:

When running this on the latest vulnerable release version of Mozilla Firefox, which is 97.0.1, it gives a very promising crash:

This is what the crash point looks like in IDA. It happens inside a loop:

It dereferences a value from memory and then makes an indirect call (a virtual function call) using the fetched value. Thus, this is rated as a remote code execution vulnerability. The value of the “rax” register which is used during dereferencing is particularly interesting: 0xE5E5E5E5E5E5E5E5. This is a magic value that Firefox uses to “poison” the memory of a freed object so that a dereference of a value fetched from that freed object will cause a crash, as this value is never a valid memory address. This helps greatly to catch use-after-free conditions.

To analyze a use-after-free vulnerability, it is always desired to have more information about the freed object: its type, size, where it is allocated, where it is freed, and where it is subsequently used. On Windows, this is usually done by enabling advanced debugging features using the GFlags tool to enable various global flags. Specifically, it can be used to enable pageheap and create a﷟ user-mode stack trace to capture the stack trace at the time a particular object is allocated. Unfortunately, this will not help us on Mozilla Firefox, because Firefox has its own memory management mechanism called jemalloc. The way we can get more information about the object is to run the PoC on an ASAN version of Firefox. You can see the result below:

We got lots of information. Let’s break it down a bit by checking where the object is allocated:

Let’s further check this by looking at the source code (line 1164 of /builds/worker/checkouts/gecko/layout/svg/SVGObserverUtils.cpp). You can download the source code of Firefox 97.0.1 or use the online version (note that line numbers of the online version may not match, as it gets updated constantly):

And this is how it looks in the compiled release version:

So the object size is 0x70 (112) bytes and it is used to store and track properties of frames during reflow triggered by scrolling.

Then we want to know where it is freed and reused. ASAN provides a long stack trace. A closer look gives a good hint. Let’s first check the stack trace when the object is freed:

And now the stack trace when the object is subsequently used:

We can see the “mozilla::SVGRenderingObserverSet::InvalidateAll” function in the stack trace when the crash happens and when the object free is initiated. This also matches the crash point of the release version which is inside the OnNonDOMMutationRenderingChange function (it says it is inlined in xul!mozilla::SVGRenderingObserverSet::InvalidateAll). We can now make an initial educated guess: while an object was being processed in a loop in the “mozilla::SVGRenderingObserverSet::InvalidateAll” function, a code path was reached that freed the object being processed, leading to a use-after-free vulnerability.

Now that we have all the details, we can validate this hypothesis step-by-step by running the PoC on the released version of Firefox.

First, we want to know the address of an allocated object so we can monitor it. This can easily be achieved by setting a breakpoint that prints the address of the object upon allocation:

Then, let’s see how the objects are processed in the loop we saw in IDA inside the “mozilla::SVGRenderingObserverSet::InvalidateAll” function. We will print the address of the object that is going to be processed. We also set a breakpoint on the subsequent virtual function call:

We run the PoC, and the debugger stops before calling the virtual function. As you can see, two objects are allocated and these two are going to be processed in the loop. First, one object is processed and a call to the “SVGTextPathObserver::OnRenderingChange” function is made, which eventually frees various allocated objects including the second object which is awaiting processing!

We can see this clearly in the picture below, which is taken immediately after the return from the call. As you can see, the second object has been freed (and poisoned with 0xe5) during the processing of the first object:

In the second iteration, the freed object is loaded for processing, leading to a load of the poison value and resulting in a crash:

Release Versus ASAN Behavior

When running the PoC against the release version, we got a crash during a dereference of 0xE5E5E5E5E5E5E5E5. However, in the ASAN version, it crashed when writing to memory. Why is there a difference? The reason is as follows:

In a release (non-ASAN) build, when freeing an object, its memory remains accessible (not unmapped), and thus any read and write to that memory will still succeed without triggering an immediate crash. That is why the instruction “mov byte ptr [rcx+8], 0” in the above picture executed without error. A crash is likely to occur further along, though. As in our case, if a value is fetched value from a freed object and then dereferenced, the dereference may cause a crash. This is especially true if the freed object content is overwritten by “poison” values as seen above. Note that there is a chance that there will be no crash at all, for example, if there are only reads and writes to the freed object without any dereference operations on fetched values, or if the poison value becomes overwritten with unrelated data. This means that if we fuzz a release version, there is a chance we could miss a vulnerability.

ASAN, on the other hand, monitors all read, write, and dereferences on memory and can catch such vulnerabilities as soon as possible. That is why it is recommended to use an ASAN version for fuzzing.

The Patch

Use-after-free vulnerabilities are often fixed by converting raw pointers to smart pointers or by correcting the management of the object reference count. Here, it was fixed by changing how continuations frame reflows are handled in the engine:

Final Notes

Developers have expended a great deal of effort to eliminate vulnerabilities associated with known patterns in source code, and they have mostly succeeded in decreasing their prevalence. However, there are some classes of vulnerabilities that are harder to prevent, and use-after-free is one of them. Assuring perfect management of object lifecycles in software with a million lines of code is extremely difficult. This is one of the main motivations behind languages like Rust that enforce proper object ownership and lifetime management.

You can find me on Twitter at @hosselot and follow the team for the latest in exploit techniques and security patches.

CVE-2022-26381: Gone by others! Triggering a UAF in Firefox

Exploitation of CVE-2021-21220 – From Incorrect JIT Behavior to RCE

16 December 2021 at 14:38

In this third and final blog in the series, ZDI Vulnerability Researcher Hossein Lotfi looks at the method of exploiting CVE-2021-21220 for code execution. This bug was used by Bruno Keith (@bkth_) and Niklas Baumstark (@_niklasb) of Dataflow Security (@dfsec_com) during Pwn2Own Vancouver 2021 to exploit both Chrome and Edge (Chromium) to earn $100,000 at the event. Today’s blog looks at the exploitation technique used at the contest.

You can find Part One of this series here and Part Two here.


Exploiting Incorrect Numeric Results in JIT

In the second blog in this series, we discussed how CVE-2021-21220 can be used to make the JIT generate code that produces an incorrect numeric result. We now need to explain how this can be leveraged to produce an effect that has a security impact, such as an out-of-bounds memory access.

In the past, turning an incorrect numeric result into an OOB memory access was often accomplished by abusing array bounds check elimination. This method was effective for a long time. Take a look at the following simplified sample:

The length of array arr is 4, and we are returning an element of this array. V8 will perform run-time bounds checking to make sure that the last statement does not access memory outside the bounds of the array. During optimization of such a function, V8 might remove array bounds checking if it concluded that typer_index is always zero (or, in general, if typer_index * 10 is provably always inside the bounds of the array). This saves a few more CPU cycles during execution of the optimized function. In the event that JITted code produces an erroneous numeric result, though, it may be possible fool the V8 engine into thinking typer_index must be zero, while in actuality it will be set to a different (erroneous) value. Then, when the array access is performed, it will trigger an out-of-bounds memory access.

This method was so successful that the V8 developers eventually decided to remove array-bounds-check elimination. See this blog for more information about this exploitation technique, as well as this blog for further discussion.

Since V8 mitigated the array bounds elimination exploitation technique, a new technique is necessary. At Pwn2Own, the contestants used a technique that produces out-of-bounds access via ArrayPrototypePop and ArrayPrototypeShift. I was able to trace this method back to late 2020 by searching the Chromium bug tracking system. It was mitigated a week after the Pwn2Own competition by adding a new CheckBounds node. Here I provide you with a quick analysis of this method:

When a function undergoing optimization has calls to the Array.shift method, the execution flow eventually reaches the function JSCallReducer::ReduceArrayPrototypeShift function (see src/compiler/js-call-reducer.cc). Since a call to the built-in shift JavaScript method is relatively slow, the optimizer replaces the call with a series of operations that can be performed at the assembly level. As you may know, "Array.shift" removes the first element from an array and returns that removed element. After removing that element, the JIT-produced code computes the new array length by subtracting 1 from the original array length:

After subtracting 1, the JIT-produced code stores the result as the new array length. How can this be exploited? Well, it turns out that if we can abuse a JIT vulnerability to fool the engine into thinking that the array length is zero where it is not, it blindly subtracts one from zero. The integer underflow sets the array length to -1, which allows a subsequent OOB memory access to occur (array bounds checks are unsigned). This Chromium bug entry provides more information if you are interested.

Although the two exploitation techniques described above have now both been mitigated, new methods are still coming out using JIT vulnerabilities to cause side effects and achieve out-of-bounds memory access.

From Out-of-Bounds Access to Code Execution

The method of V8 exploitation after obtaining an OOB read/write primitive is well known. Here are the steps:

1 - Trigger the vulnerability and the side effect to get a “relative” out-of-bounds memory access to corrupt the length of one or more arrays sitting next to the original array.

2 - Make addrof/fakeobj primitives. The addrof primitive leaks the address of an arbitrary JavaScript object. The fakeobj primitive performs the reverse action: it injects into the engine an arbitrary value that the engine will interpret as a pointer to a JavaScript object.

3 - Use fakeobj to forge a JavaScript array object whose data buffer field is an arbitrary attacker-specified address. The attacker can then use the forged array to read or write arbitrary memory addresses. (Compare with the OOB access of step 1 above, which only permits access to arbitrary specified offsets past the start of the original array.)

4 - Use the addrof primitive to leak the address of a wasm function. This will be where we copy our shellcode. A wasm function is a good choice because the memory it occupies is marked with RWX (Read-Write-Execute) permissions.

5 - Use the fakeobj primitive to copy shellcode to the RWX page. To make copying the shellcode easier, an ArrayBuffer that has an uncompressed backing_store pointer is often used. This overwrites the wasm function instructions with our shellcode.

6 - Execute the shellcode by calling the wasm function.

Here is how it was actually done at Pwn2Own. The exploit starts by defining some helper functions to convert between floats and integers:

It then triggers the JIT vulnerability:

After triggering the vulnerability, the value of the “bad” variable is huge, and thus it goes into a series of Math.max calls to achieve a smaller value (1). This confused value is then used to create an array, and a shift on this array is used to produce an array having length -1. This allows the exploit to access memory at arbitrary offsets past the end of the array.

Setting up the wasm RWX memory is the next step:

Note that the contents of the wasm function is not important, as its instructions will be replaced with shellcode.

Next, the exploit allocates 3 arrays:

• A PACKED_DOUBLE_ELEMENTS array (after_dbl)
• This is followed in memory by a PACKED_ELEMENTS array (after_obj)
• This is followed in memory by another PACKED_DOUBLE_ELEMENTS (after_dbl2)

Using the out-of-bounds access via the array with length -1, it then increases the length of the after_dbl and after_obj:

After the lengths have been altered, some of the data of after_dbl overlaps with some of the data of after_obj. Similarly, some of the data of after_obj overlaps with some of the data of after_dbl2. This will allow the exploit to perform type confusions.

Now the exploit is all ready to create the addrof and fakeobj primitives, which is done as follows:

• The addrof primitive: To leak the address of an object, it first assigns it into index 0x2f of the after_obj array. As mentioned above, after_obj now partially overlaps with after_dbl2. The exploit then read the pointer from after_dbl. It is returned as a double, allowing the exploit to learn the numeric value of the object’s address.

• The fakeobj primitive: To inject an arbitrary pointer value, the exploit assigns it into after_dbl. In a way similar to the operation of addrof explained above, the data can then be read as a different type by reading it from a different (overlapping) array, in this case after_obj. By fetching it from after_obj, the exploit obtains a reference to a “fake” JavaScript object at the specified address.

From here, all that remains is to copy the shellcode to the leaked address of the wasm function and execute it.

After the shellcode is run, the page is idle and will be subject to garbage collection. This may cause a crash of the renderer process. To handle this, the exploit developers tried to smooth over corruptions as much as possible to prevent a crash:

Here is a demo video:

Conclusion

JIT vulnerabilities tend to be powerful, providing strong primitives and reliable exploitation methods. The inherent complexity of JIT compilation makes it very challenging for engine developers to correctly handle all corner cases, despite their impressive efforts. However, incorrect JIT behavior can impact security only if a technique is available to achieve an effect such as out-of-bounds memory access. This is one area where engine developers can focus by introducing additional hardening.

You can find me on Twitter at @hosselot and follow the team for the latest in exploit techniques and security patches.

Exploitation of CVE-2021-21220 – From Incorrect JIT Behavior to RCE

Understanding the Root Cause of CVE-2021-21220 – A Chrome Bug from Pwn2Own 2021

9 December 2021 at 16:59

In this second blog in the series, ZDI Vulnerability Researcher Hossein Lotfi looks at the root cause of CVE-2021-21220. This bug was used by Bruno Keith (@bkth_) and Niklas Baumstark (@_niklasb) of Dataflow Security (@dfsec_com) during Pwn2Own Vancouver 2021 to exploit both Chrome and Edge (Chromium) to earn $100,000 at the event. Today’s blog starts with a look at how to trigger the vulnerability and goes on to describe why the bug occurs.


I begin Part 2 of this blog series with a discussion of how to trigger the vulnerability. For clarity, I modified the PoC slightly and came up with the following:

I covered lines 3 through 5 in our first blog. Lines 4 and 6 simply use “console.log” to print data. Let’s see what happens in the first and second line:

Line 1: Constructs a Uint32Array (a typed array that can hold 32-bit unsigned integers). The array contains just one element, having the value 231 (2,147,483,648 in decimal or 0x80000000 in hex). The array is assigned to variable arr.

Line 2: A function called “foo” will take the first element of arr (which is 231), XOR it with a constant integer 0, add a constant integer 1, and return the result.

There are some interesting points in these two lines:

        1 - 0x80000000 has its most significant bit set. This is known as the sign bit when handling signed integers.
        2 - XORing any value with zero will return the original value unchanged. If this XOR does not have any effect, then why was it necessary to include it? We will answer this soon.

Save this PoC as “poc.js” and run it with the following command:

$ ./d8 --allow_natives_syntax '/home/lab/Desktop/poc.js'

It should print the following output:

Interesting! Results of the interpreted and JITted versions are different, which should not happen. JIT supposed to speed up the function but should never change the results.

 Now that we are here, let’s have a look at the patch as it may give us some hints as to why this is happening:

The only change is inside the function InstructionSelector::VisitChangeInt32ToInt64, found within the file src/compiler/backend/x64/instruction-selector-x64.cc. There is also a nice comment, which can provide us an educated guess. As mentioned in the first blog, a JITted function will be compiled to assembly to achieve maximum speed. Before the patch, on the x64 platform, if there was a load of a signed int32 into a 64-bit register, the kX64Movsxlq opcode would be selected. Conversely, when an unsigned int32 was loaded into a 64-bit register, the kX64Movl opcode would be used. This choice between two opcodes is intended to ensure that the upper 32 bits of the destination register are set properly by the load: When loading an unsigned 32-bit value, the upper 32 bits in the destination should be set to all zeros, whereas when loading a signed 32-bit value, the upper 32 bits in the destination should all be set to match the sign bit of the source value. After the patch, the kX64Movsxlq opcode is used in all cases. As the function name denotes, it expects a signed int32 input, so the kX64Movsxlq opcode is always the correct choice.

Apparently, though, the PoC somehow managed to provide an unsigned input to this function! How is this possible? This is what we must investigate next.

Deep Blue Sea of Nodes

To find the root cause of this vulnerability, we can pass the “--trace-turbo-graph” argument to d8 to see generated turbofan graphs:

./d8 --allow_natives_syntax --trace_turbo_graph '/home/lab/Desktop/poc.js'

As this vulnerability has something to do with the type of input, it seems like a good idea to first check how the typer assigned types the nodes. For this purpose, we need to find “Graph after V8.TFTyper” in the graph and check its data:

This is what we see:

LoadTypedElement: This shows loading the element from our typed array. The type is Unsigned32.
SpeculativeNumberBitwiseXor: For the XOR operation. The type is Signed32.
NumberConstant[1]: For the constant number 1.
SpeculativeNumberAdd: For adding 1 to the result of the XOR.

All types make sense. Let’s move on to a later phase called “simplified lowering”:

After the simplified lowering phase this becomes:

LoadTypedElement: Type is still Unsigned32.
Word32Xor: Type is still Signed32.
ChangeInt32ToInt64 (#31:Word32Xor): This node is new. It takes the result of the XOR and converts it to Int64. Remember that the patch fixed this vulnerability by changing the InstructionSelector::VisitChangeInt32ToInt64 function. That means this node will be important in our analysis. For now, it seems OK as this node takes a Word32Xor node that is signed.
Int64Constant[1]: For the constant number 1.
Int64Add: For adding 1 to the result of the XOR.

The “--trace-turbo-graph” output shows how the engine optimizes the graph by performing numerous transformations. During the early optimization phase, the execution flow reaches a function called MachineOperatorReducer::ReduceWordNXor within v8/src/compiler/machine-operator-reducer.cc to deal with the XOR operation in our PoC:

Let’s have a quick look at the XOR in our PoC again. We XOR arr[0] by 0, and we know that XOR by 0 has no effect and returns arr[0]. Now check the highlighted section in the picture above. Here the engine checks if the right operand is provably equal to 0 and, if so, it replaces the XOR operation with the left node (arr[0]). In this way, the engine removes the no-op XOR to achieve better speed. How cool! Unfortunately, there is a small problem: the replaced XOR operation had an output type of Signed32, but arr[0] has a types of Unsigned32. The EarlyOptimization phase output shows this clearly:

The nodes now are:

When you compare this output with output of simplified lowering phase, we can see 2 major changes:

         1 - The Word32Xor node is not available anymore. It has been replaced.
         2 - The ChangeInt32ToInt64 (#31:Word32Xor) node has been changed to ChangeInt32ToInt64 (#45:LoadTypedElement). This is where the vulnerability occurs. ChangeInt32ToInt64 needs a Signed32 node. This was ok before, because Word32Xor was signed, but now it gets a LoadTypedElement node, which is unsigned.

As a side note: Now that we know the root cause of this vulnerability, we can develop some variants. For example, we can replace the XOR with a SAR using the “>>” operand (check the “MachineOperatorReducer::ReduceWord64Sar” function) or a SHL using the “<<” data-preserve-html-node="true" data-preserve-html-node="true" operand (check the “MachineOperatorReducer::ReduceWord64Shl” function).

Later, execution reaches the vulnerable function InstructionSelector::VisitChangeInt32ToInt64:

It checks if it is a signed load, but we changed the type to unsigned, and thus kX64Movl is chosen.

How can this cause a problem? The kX64Movsxlq opcode translates to an Intel movsxd instruction, while the kX64Movl opcode translates to an intel mov instruction. For a 32-bit source value with the most significant bit not set, there are no differences between these two. However, if the source has a 1 as the most significant bit, these deliver two very different results. Recall that the value stored in the array is 0x80000000, which has the most significant bit set. Let’s illustrate the difference between movsxd and mov by doing a small experiment in x64dbg. We will perform a ‘movsxd’ of a 32-bit value 0x80000000 to ‘rbx” and ‘mov’ of the same 32-bit value 0x80000000 to rcx. Here are the registers before the move instructions:

And here are the results after the moves:

As you can see, the value of rbx is very different than rcx. As opposed to the mov instruction, the movsxd instruction sign-extended the value. Now if the engine chooses the wrong instruction, it may load incorrect value into registers causing various problems.

Before finishing this blog, I would like to clarify one more point. Why is it needed to have an “add 1”? In fact, if you remove it, this vulnerability is not triggered anymore, and the PoC does not reach to the vulnerable function! Why is that?

To answer this question, we can remove the “add 1” from the PoC and examine the effect on the graph.

First, the graph if the “add 1” is removed:

When the “add 1” is removed, there is no need for a “ChangeInt32ToInt64” node in the graph anymore. Instead, a “ChangeInt32ToTagged” node is used to directly convert the result of the XOR to a tagged value and return.

Compare with the graph of the PoC including the “add 1”:

By including an “add 1” operation, the result of XOR (which is Signed32) needs to be first converted to int64 using a ChangeInt32ToInt64 node in preparation for the addition. Note that 1 is an Int64Constant. After the add, the result is changed to a tagged value and returned.

Therefore, we conclude that the “add 1” is needed to trigger insertion of a “ChangeInt32ToInt64” node.

Conclusion

In this blog post we identified the root cause of the vulnerability used at Pwn2Own and saw how the contestants chained a series of clever values and operations to trigger an incorrect behavior in the JIT engine. In the final blog in this series, we will explore how this issue was exploited. That blog will be published one week from today.

Until then, you can find me on Twitter at @hosselot and follow the team for the latest in exploit techniques and security patches.

Understanding the Root Cause of CVE-2021-21220 – A Chrome Bug from Pwn2Own 2021

Two Birds with One Stone: An Introduction to V8 and JIT Exploitation

7 December 2021 at 17:30

In this special blog series, ZDI Vulnerability Researcher Hossein Lotfi looks at the exploitation of V8 – Google’s open-source high-performance JavaScript and WebAssembly engine – through the lens of a bug used during Pwn2Own Vancouver 2021. The contest submission from Bruno Keith (@bkth_) and Niklas Baumstark (@_niklasb) of Dataflow Security (@dfsec_com) exploited both Google Chrome and Microsoft Edge (Chromium) with the same bug, which earned them $100,000 during the event. This bug was subsequently found in the wild prior to being patched by Google. This blog series provides an introduction to V8, a look at the root cause of the bug, and details on exploitation during the contest and beyond. @bkth_ @_niklasb @dfsec_com


At our Pwn2Own Vancouver contest this year, the web browser category included the Google Chrome and Microsoft Edge (Chromium) browsers as targets. For this year’s event, a successful demonstration no longer required a sandbox escape. There was also a special bonus for exploits that worked against both Chrome and Edge. On Day Two of the event, Bruno Keith and Niklas Baumstark successfully demonstrated their V8 JIT vulnerability on both the Chrome and Microsoft Edge renderers with a single exploit. This earned them $100,000 USD and 10 Master of Pwn points.

In this blog series, we’ll be covering this exploit in three separate entries:

1 - Two Birds with One Stone: An Introduction to V8 and JIT Exploitation

2 - Understanding the Root Cause of CVE-2021-21220 – A Chrome Bug from Pwn2Own 2021

3 - Exploitation of CVE-2021-21220 – From Pwn2Own to Active Exploit

We’ll begin with the basics of V8 and JIT exploitation.

Gathering Information

This vulnerability has been addressed by Google. More information about the bug can be found on the ZDI advisory page as ZDI-21-411, where there is a link to the Google fix:

This provides us with the Chromium bug entry amongst other details. There are some details provided by the researchers and the actual exploit tested on Chrome 89.0.4389.114 and Edge version 89.0.774.63 (which we will cover in-depth in the final blog in this series). You can see the developers fixed this issue in a commit by making changes in just one file. There is also a proof of concept (PoC) for us to review. Great! Now that we have a PoC, we can have a deeper look at the vulnerability, but we need to set up our analysis environment first.

Setting Up the Environment

It was possible to exploit both the Google Chrome and Microsoft Edge (Chromium) renderer processes with one exploit since both are using V8 as the JavaScript and WebAssembly engine. V8 is developed by Google in C++ and runs on Windows 7 or later, macOS 10.12 and newer, and Linux systems that use x64, IA-32, ARM, or MIPS processors.

V8 is an open-source project. This means you can compile it from the source code. Usually, it is easier to compile such projects on Linux. Thus, I am going to use Ubuntu 18.04.5 to compile V8 (see below):

You can use any other supported operating system you want. The official build document is pretty good and provides abundant detail.

To begin, we need to install a package of scripts called depot_tools to manage checkouts and other tasks:

We then add “depot_tools” path to the list of available paths:

It is now time to download the V8 source code, which may take a while based on your internet speed. After the download is complete, there will be a new folder called v8. You will need to navigate to this directory to make it the working directory:

This gives us the latest version of V8. However, for this blog series, we need the vulnerable version of V8. We need to first find the affected version of Google Chrome which was available in the Chromium bug entry: 89.0.4389.114.

Cool. Now that we have an affected version of Google Chrome, we can look up information about that version in a service called omahaproxy. Just enter 89.0.4389.114 in the lookup field and press enter:

It gives us some information, including the affected V8 commit:

Now that we have the affected V8 commit, we can checkout that version. You may want to take a snapshot of the latest version of V8 first:

Now it is time to build V8. You can have a release or a debug build. A release build will give you a clean, optimized build that is faster but provides fewer details when running commands. A debug build is an unoptimized, slower build. However, it provides a lot of debug information that can help us to understand this vulnerability. Thus, we are going to choose the debug build:

If all went well, there will be an executable called “d8” in the “out/x64.debug” directory:

You should see this:

V8 is an astonishing piece of engineering that has tons of documentation and details. We can’t go too much into all these details of course, but some concepts need to be covered as they are relevant to this blog series.

Like many other Linux executables, you can pass “--help” to the compiled “d8” to provide you with a long list of all supported options. For this blog post, we are interested in just two of them:

        1 - allow_natives_syntax: By adding this as an argument when running d8, you can access special runtime functions that can be called from JavaScript using the % prefix. To find all supported runtime functions, just go to the “src/runtime” directory and grep for the string “RUNTIME_FUNCTION”. We are just interested in two of them, both of which are available in the “src/runtime/runtime-test.cc” file:

        PrepareFunctionForOptimization: Prepares a specified JavaScript function for JIT optimization. As we will explain below, JIT optimization has certain prerequisites: the function being optimized must first have been translated to bytecodes, and the engine must have collected data regarding runtime type informtion..

        OptimizeFunctionOnNextCall: This function marks the target function so that the JIT engine will compile the function into an optimized form immediately before the next execution of the target function.

We will detail how these two are used in our next blog. If you do not want to use these two runtime functions, it is usually enough to call the target function many times in a loop.

        2 - trace-turbo-graph: This argument can be used to trace the generated graph (see below) when it goes through various optimizations. We will see this in action in the second blog.

When the V8 engine loads a JavaScript file, it parses the input and builds an Abstract Syntax Tree (AST). The V8 engine interpreter called “Ignition” generates bytecode from this syntax tree. Check the header file “bytecodes.h” (located inside the “src/interpreter” directory) for a complete list of V8 bytecodes. These bytecodes are then executed (interpreted) by Ignition handlers (check src/interpreter/interpreter-generator.cc). The interpreter has little to do with our vulnerability and thus we do not discuss it any further. There are lots of resources available if you want to study this topic more.

If a function is called many times, or optimization is explicitly requested using runtime functions as described above, the V8 engine will optimize (compile) that function. Optimization is heavily dependent upon information that the engine has previously collected during interpreted executions of the function, especially concerning the data types found in variables. Note that variables in JavaScript are not strongly typed, and to achieve meaningful optimizations, the engine needs to speculate that the types that were encountered in variables during interpreted execution will usually be the same as the types encountered in the future.

The optimizing compiler’s first step is to convert the bytecode into an intermediate representation, which has the form of a graph. This step is performed in PipelineImpl::CreateGraph, found within src/compiler/pipeline.cc:

As you can see, the graph creation has 3 main phases:

         1 - GraphBuilderPhase: A graph is generated by visiting bytecodes previously generated.

         2 - InliningPhase: An initial attempt is made to optimize the generated graph by eliminating dead code, reducing calls, inlining, etc.

         3 - EarlyGraphTrimmingPhase: This phase removes dead->live edges from the graph.

More sophisticated optimizations are performed by PipelineImpl::OptimizeGraph, found in src/compiler/pipeline.cc:

Discussing all the optimizations implemented by V8 is out of scope for this blog series. Instead, we’ll just cover some of the ones we will see in the second blog in this series:

1 -   Typer: The nodes in the graph will get a type which covers possible values of that node. For example, a variable that has values like false or true is typed as a Boolean. As another example, a numeric value that is known to always equal 1 will have a type of range(1, 1).

2 -   Simplified lowering: Some operations are lowered (reduced) to a simplified series of nodes. The example below shows how the Math.abs operation is lowered:

3 -  Early Optimization:  Various optimizations are done in this stage, which is clear when looking at the EarlyOptimizationPhase struct:

As you can see, further optimizations are done in this phase including dead code elimination, redundancy elimination, and something called the MachineOperatorReducer. In the next blog, we will detail how the MachineOperatorReducer plays a major role in this vulnerability.

After all optimizations are completed on the graph, the compiler translates the graph to assembler. All future calls to the optimized function will invoke the assembly version and not the interpreted (bytecode) version. As explained above, though, optimization is performed using speculated assumptions. As a result, the assembly version of the function must contain guards to detect all possible situations where an assumption has been violated. In that circumstance, the assembly version falls back to the interpreter again. This is known as a “bailout”.

This way the V8 engine can run any (optimized) function much faster. Please note this blog is a simplification of the process, and the whole procedure is much more complex. The V8 turbofan documentation is a good starting point if you want to explore it any further.

Conclusion of Part One

In this blog, we set up the V8 environment and played a bit with some of its features. In the next blog, we will analyze the vulnerability used at Pwn2Own. Expect to see that blog in just two days from now.

Until then, you can find me on Twitter at @hosselot and follow the team for the latest in exploit techniques and security patches.

Two Birds with One Stone: An Introduction to V8 and JIT Exploitation

❌
❌