
5 reasons to strive for better disclosure processes

15 April 2024 at 13:00

By Max Ammann

This blog post showcases five examples of real-world vulnerabilities that we disclosed to maintainers over the past year but have not written about publicly until now. We also share the frustrations we faced in disclosing them to illustrate the need for effective disclosure processes.

Here are the five bugs:

  • Undefined behavior in the borsh-rs Rust library
  • DoS vector in Rust libraries for parsing the Ethereum ABI
  • Missing limit on authentication tag length in Expo
  • DoS vector in the num-bigint Rust library
  • Insertion of the MMKV database encryption key into the Android system log with react-native-mmkv

Discovering a vulnerability in an open-source project necessitates a careful approach, as publicly reporting it (also known as full disclosure) can alert attackers before a fix is ready. Coordinated vulnerability disclosure (CVD) uses a safer, structured reporting framework to minimize risks. Our five example cases demonstrate how the lack of a CVD process unnecessarily complicated reporting these bugs and ensuring their remediation in a timely manner.

In the Takeaways section, we show you how to set up your project for success by providing a basic security policy you can use and walking you through a streamlined disclosure process called GitHub private reporting. GitHub’s feature has several benefits:

  • Discreet and secure alerts to developers: no need for PGP-encrypted emails
  • Streamlined process: no playing hide-and-seek with company email addresses
  • Simple CVE issuance: no need to file a CVE form at MITRE

Time for action: If you own well-known projects on GitHub, use private reporting today! Read more on Configuring private vulnerability reporting for a repository, or skip to the Takeaways section of this post.

Case 1: Undefined behavior in borsh-rs Rust library

The first case, and reason for implementing a thorough security policy, concerned a bug in a cryptographic serialization library called borsh-rs that was not fixed for two years.

During an audit, I discovered unsafe Rust code that could cause undefined behavior if used with zero-sized types that don’t implement the Copy trait. Even though somebody else reported this bug previously, it was left unfixed because it was unclear to the developers how to avoid the undefined behavior in the code and keep the same properties (e.g., resistance against a DoS attack). During that time, the library’s users were not informed about the bug.

The whole process could have been streamlined using GitHub’s private reporting feature. If project developers cannot address a vulnerability when it is reported privately, they can still notify Dependabot users about it with a single click. Releasing an actual fix is optional when reporting vulnerabilities privately on GitHub.

I reached out to the borsh-rs developers about notifying users while there was no fix available. The developers decided that it was best to notify users because only certain uses of the library caused undefined behavior. We filed the notification RUSTSEC-2023-0033, which created a GitHub advisory. A few months later, the developers fixed the bug, and the major release 1.0.0 was published. I then updated the RustSec advisory to reflect that it was fixed.

The following code contained the bug that caused undefined behavior:

impl<T> BorshDeserialize for Vec<T>
where
    T: BorshDeserialize,
{
    #[inline]
    fn deserialize<R: Read>(reader: &mut R) -> Result<Self, Error> {
        let len = u32::deserialize(reader)?;
        if size_of::<T>() == 0 {
            let mut result = Vec::new();
            result.push(T::deserialize(reader)?);

            let p = result.as_mut_ptr();
            unsafe {
                forget(result);
                let len = len as usize;
                let result = Vec::from_raw_parts(p, len, len);
                Ok(result)
            }
        } else {
            // TODO(16): return capacity allocation when we can safely do that.
            let mut result = Vec::with_capacity(hint::cautious::<T>(len));
            for _ in 0..len {
                result.push(T::deserialize(reader)?);
            }
            Ok(result)
        }
    }
}

Figure 1: Use of unsafe Rust (borsh-rs/borsh-rs/borsh/src/de/mod.rs#123–150)

The code in figure 1 deserializes bytes to a vector of some generic data type T. If the type T is a zero-sized type, then unsafe Rust code is executed. The code first reads the requested length for the vector as u32. After that, the code allocates an empty Vec type. Then it pushes a single instance of T into it. Later, it temporarily leaks the memory of the just-allocated Vec by calling the forget function and reconstructs it by setting the length and capacity of Vec to the requested length. As a result, the unsafe Rust code assumes that T is copyable.

The unsafe Rust code protects against a DoS attack where the deserialized in-memory representation is significantly larger than the serialized on-disk representation. The attack works by setting the vector length to a large number and using zero-sized types. An instance of this bug is described in our blog post Billion times emptiness.
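
To see why the Copy requirement matters, consider a zero-sized type that is not Copy and has a Drop implementation. The following contrived sketch (our own illustration, not borsh-rs code) mirrors the pattern in figure 1: only one value is ever constructed, yet the reconstructed Vec claims to own many of them, so Drop runs for values that never existed, which is undefined behavior under the Vec::from_raw_parts safety contract.

use std::mem::forget;

struct Token; // zero-sized, but not Copy

impl Drop for Token {
    fn drop(&mut self) {
        println!("dropping a Token");
    }
}

fn main() {
    let mut result = vec![Token]; // exactly one Token is ever constructed
    let p = result.as_mut_ptr();
    forget(result);
    // Undefined behavior: this Vec claims 1,000 initialized elements, so Drop
    // will run 1,000 times for values that were never created.
    let fake: Vec<Token> = unsafe { Vec::from_raw_parts(p, 1000, 1000) };
    drop(fake);
}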

Case 2: DoS vector in Rust libraries for parsing the Ethereum ABI

In July, I disclosed multiple DoS vulnerabilities in four Ethereum ABI–parsing libraries, which were difficult to report because I had to reach out to multiple parties.

The bug affected four GitHub-hosted projects. Only the Python project eth_abi had GitHub private reporting enabled. For the other three projects (ethabi, alloy-rs, and ethereumjs-abi), I had to research who was maintaining them, which can be error-prone. For instance, I had to resort to the trick of getting email addresses from maintainers by appending the suffix .patch to GitHub commit URLs. The following link shows the non-work email address I used for committing:

https://github.com/trailofbits/publications/commit/a2ab5a1cab59b52c4fa71b40dae1f597bc063bdf.patch

In summary, as the group of affected vendors grows, the burden on the reporter grows as well. Because you typically need to synchronize between vendors, the effort does not grow linearly but exponentially. Having more projects use the GitHub private reporting feature, a security policy with contact information, or simply an email in the README file would streamline communication and reduce effort.

Read more about the technical details of this bug in the blog post Billion times emptiness.

Case 3: Missing limit on authentication tag length in Expo

In late 2022, Joop van de Pol, a security engineer at Trail of Bits, discovered a cryptographic vulnerability in expo-secure-store. In this case, the vendor, Expo, failed to follow up with us about whether they acknowledged or had fixed the bug, which left us in the dark. Even worse, trying to follow up with the vendor consumed a lot of time that could have been spent finding more bugs in open-source software.

When we initially emailed Expo about the vulnerability through the email address listed on its GitHub, [email protected], an Expo employee responded within one day and confirmed that they would forward the report to their technical team. However, after that response, we never heard back from Expo despite two gentle reminders over the course of a year.

Unfortunately, Expo did not allow private reporting through GitHub, so the email was the only contact address we had.

Now to the specifics of the bug: on Android above API level 23, SecureStore uses AES-GCM keys from the KeyStore to encrypt stored values. During encryption, the tag length and initialization vector (IV) are generated by the underlying Java crypto library as part of the Cipher class and are stored with the ciphertext:

/* package */ JSONObject createEncryptedItem(
    Promise promise, String plaintextValue, Cipher cipher, GCMParameterSpec gcmSpec,
    PostEncryptionCallback postEncryptionCallback)
    throws GeneralSecurityException, JSONException {

  byte[] plaintextBytes = plaintextValue.getBytes(StandardCharsets.UTF_8);
  byte[] ciphertextBytes = cipher.doFinal(plaintextBytes);
  String ciphertext = Base64.encodeToString(ciphertextBytes, Base64.NO_WRAP);

  String ivString = Base64.encodeToString(gcmSpec.getIV(), Base64.NO_WRAP);
  int authenticationTagLength = gcmSpec.getTLen();

  JSONObject result = new JSONObject()
    .put(CIPHERTEXT_PROPERTY, ciphertext)
    .put(IV_PROPERTY, ivString)
    .put(GCM_AUTHENTICATION_TAG_LENGTH_PROPERTY, authenticationTagLength);

  postEncryptionCallback.run(promise, result);

  return result;
}

Figure 2: Code for encrypting an item in the store, where the tag length is stored next to the cipher text (SecureStoreModule.java)

For decryption, the ciphertext, tag length, and IV are read and then decrypted using the AES-GCM key from the KeyStore.

An attacker with access to the storage can change an existing AES-GCM ciphertext to have a shorter authentication tag. Depending on the underlying Java cryptographic service provider implementation, the minimum tag length is 32 bits in the best case (this is the minimum allowed by the NIST specification), but it could be even lower (e.g., 8 bits or even 1 bit) in the worst case. So in the best case, the attacker has a small but non-negligible probability that the same tag will be accepted for a modified ciphertext, but in the worst case, this probability can be substantial. In either case, the success probability grows depending on the number of ciphertext blocks. Also, both repeated decryption failures and successes will eventually disclose the authentication key. For details on how this attack may be performed, see Authentication weaknesses in GCM from NIST.

From a cryptographic point of view, this is an issue. However, due to the required storage access, it may be difficult to exploit this issue in practice. Based on our findings, we recommended fixing the tag length to 128 bits instead of writing it to storage and reading it from there.
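
One way to implement that recommendation is to hard-code the tag length during decryption instead of trusting the value read from storage. The following Java sketch shows the general shape; the method and variable names are ours, not Expo’s.

import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Fragment of a hypothetical helper class: pin the GCM tag length to 128 bits
// rather than reading it from storage alongside the ciphertext.
private static final int GCM_TAG_LENGTH_BITS = 128;

byte[] decryptItem(SecretKey key, byte[] iv, byte[] ciphertext) throws GeneralSecurityException {
  Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
  GCMParameterSpec gcmSpec = new GCMParameterSpec(GCM_TAG_LENGTH_BITS, iv);
  cipher.init(Cipher.DECRYPT_MODE, key, gcmSpec);
  // Throws AEADBadTagException if the ciphertext or tag has been tampered with.
  return cipher.doFinal(ciphertext);
}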

The story would have ended here since we didn’t receive any responses from Expo after the initial exchange. But in our second email reminder, we mentioned that we were going to publicly disclose this issue. One week later, the bug was silently fixed by limiting the minimum tag length to 96 bits. Practically, 96 bits offers sufficient security. However, there is also no reason not to go with the higher 128 bits.

The fix was created exactly one week after our last reminder. We suspect that our previous email reminder led to the fix, but we don’t know for sure. Unfortunately, we were never credited appropriately.

Case 4: DoS vector in the num-bigint Rust library

In July 2023, Sam Moelius, a security engineer at Trail of Bits, encountered a DoS vector in the well-known num-bigint Rust library. Even though the disclosure through email worked very well, users were never informed about this bug through, for example, a GitHub advisory or CVE.

The num-bigint project is hosted on GitHub, but GitHub private reporting is not set up, so there was no quick way for the library author or us to create an advisory. Sam reported this bug to the developer of num-bigint by sending an email. But finding the developer’s email is error-prone and takes time. Instead of sending the bug report directly, you must first confirm that you’ve reached the correct person via email and only then send out the bug details. With GitHub private reporting or a security policy in the repository, the channel to send vulnerabilities through would be clear.

But now let’s discuss the vulnerability itself. The library implements very large integers that no longer fit into primitive data types like i128. On top of that, the library can also serialize and deserialize those data types. The vulnerability Sam discovered was hidden in that serialization feature. Specifically, deserialization can crash the program through excessive memory consumption, or the requested memory allocation can be so large that it fails outright.

The num-bigint types implement traits from Serde. This means that any type in the crate can be serialized and deserialized using an arbitrary file format like JSON or the binary format used by the bincode crate. The following example program shows how to use this deserialization feature:

use num_bigint::BigUint;
use std::io::Read;

fn main() -> std::io::Result<()> {
    let mut buf = Vec::new();
    let _ = std::io::stdin().read_to_end(&mut buf)?;
    let _: BigUint = bincode::deserialize(&buf).unwrap_or_default();
    Ok(())
}

Figure 3: Example program using the deserialization feature

It turns out that certain inputs cause the above program to crash. This is because the Visitor trait implementation uses untrusted user input to choose the capacity of a vector allocation. The following figure shows the lines that can cause the program to crash with the message memory allocation of 2893606913523067072 bytes failed.

impl<'de> Visitor<'de> for U32Visitor {
    type Value = BigUint;

    {...omitted for brevity...}

    #[cfg(not(u64_digit))]
    fn visit_seq<S>(self, mut seq: S) -> Result<Self::Value, S::Error>
    where
        S: SeqAccess<'de>,
    {
        let len = seq.size_hint().unwrap_or(0);
        let mut data = Vec::with_capacity(len);

        {...omitted for brevity...}
    }

    #[cfg(u64_digit)]
    fn visit_seq<S>(self, mut seq: S) -> Result<Self::Value, S::Error>
    where
        S: SeqAccess<'de>,
    {
        use crate::big_digit::BigDigit;
        use num_integer::Integer;

        let u32_len = seq.size_hint().unwrap_or(0);
        let len = Integer::div_ceil(&u32_len, &2);
        let mut data = Vec::with_capacity(len);

        {...omitted for brevity...}
    }
}

Figure 4: Code that allocates memory based on user input (num-bigint/src/biguint/serde.rs#61–108)
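
A common defensive pattern for this class of bug is to treat the size hint as untrusted: cap the pre-allocated capacity and let the vector grow as elements actually arrive. The sketch below only illustrates the idea; it is not the actual fix from commit 44c87c1, and the bound is arbitrary.

const MAX_PREALLOCATED: usize = 4096; // arbitrary illustrative bound

fn visit_seq<S>(mut seq: S) -> Result<Vec<u64>, S::Error>
where
    S: serde::de::SeqAccess<'static>,
{
    // Use the untrusted size hint only up to a fixed bound.
    let hint = seq.size_hint().unwrap_or(0);
    let mut data = Vec::with_capacity(hint.min(MAX_PREALLOCATED));
    while let Some(digit) = seq.next_element::<u64>()? {
        data.push(digit);
    }
    Ok(data)
}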

We initially contacted the author on July 20, 2023, and the bug was fixed in commit 44c87c1 on August 22, 2023. The fixed version was released the next day as 0.4.4.

Case 5: Insertion of MMKV database encryption key into Android system log with react-native-mmkv

The last case concerns the disclosure of a plaintext encryption key in the react-native-mmkv library, which was fixed in September 2023. During a secure code review for a client, I discovered a commit that fixed an untracked vulnerability in a critical dependency. Because there was no security advisory or CVE ID, neither I nor the client had been informed about the vulnerability. This lack of vulnerability management created a situation in which attackers could learn about the vulnerability from the public fix while users were left in the dark.

During the client engagement, I wanted to validate how the encryption key was used and handled. The commit “fix: Don’t leak encryption key in logs” in the react-native-mmkv library caught my attention. The following code shows the problematic log statement:

MmkvHostObject::MmkvHostObject(const std::string& instanceId, std::string path,
                               std::string cryptKey) {
  __android_log_print(ANDROID_LOG_INFO, "RNMMKV",
                      "Creating MMKV instance \"%s\"... (Path: %s, Encryption-Key: %s)",
                      instanceId.c_str(), path.c_str(), cryptKey.c_str());
  std::string* pathPtr = path.size() > 0 ? &path : nullptr;
  {...omitted for brevity...}

Figure 5: Code that initializes MMKV and also logs the encryption key

Before that fix, the encryption key I was investigating was printed in plaintext to the Android system log. This breaks the threat model because this encryption key should not be extractable from the device, even with Android debugging features enabled.
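
The fix is conceptually simple: keep the log line but drop the key from it. A sketch of that kind of change (ours, not the library’s exact diff) looks like this:

__android_log_print(ANDROID_LOG_INFO, "RNMMKV",
                    "Creating MMKV instance \"%s\"... (Path: %s)",
                    instanceId.c_str(), path.c_str());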

With the client’s agreement, I notified the author of react-native-mmkv, and the author and I concluded that the library users should be informed about the vulnerability. So the author enabled private reporting and together we published a GitHub advisory. The ID CVE-2024-21668 was assigned to the bug. The advisory now alerts developers if they use a vulnerable version of react-native-mmkv when running npm audit or npm install.

This case highlights that there is basically no way around GitHub advisories when it comes to npm packages. The only way to get a vulnerability into the output of the npm audit command is to create a GitHub advisory. Using private reporting streamlines that process.

Takeaways

GitHub’s private reporting feature contributes to securing the software ecosystem. If used correctly, the feature saves time for vulnerability reporters and software maintainers. The biggest impact of private reporting is that it is linked to the GitHub advisory database—a link that is missing, for example, when using confidential issues in GitLab. With GitHub’s private reporting feature, there is now a process for security researchers to publish to that database (with the approval of the repository maintainers).

The disclosure process also becomes clearer with a private report on GitHub. When using email, it is unclear whether you should encrypt the email and who you should send it to. If you’ve ever encrypted an email, you know that there are endless pitfalls.

However, you may still want to send an email notification to developers or a security contact, as maintainers might miss GitHub notifications. A basic email with a link to the created advisory is usually enough to raise awareness.

Step 1: Add a security policy

Publishing a security policy is the first step towards owning a vulnerability reporting process. To avoid confusion, a good policy clearly defines what to do if you find a vulnerability.

GitHub has two ways to publish a security policy. Either you can create a SECURITY.md file in the repository root, or you can create a user- or organization-wide policy by creating a .github repository and putting a SECURITY.md file in its root.

We recommend starting with a policy generated using the Policymaker by disclose.io (see this example), but replace the Official Channels section with the following:

We have multiple channels for receiving reports:

* If you discover any security-related issues with a specific GitHub project, click the *Report a vulnerability* button on the *Security* tab in the relevant GitHub project: https://github.com/[YOUR_ORG]/[YOUR_PROJECT].
* Send an email to [email protected]

Always make sure to include at least two points of contact. If one fails, the reporter still has another option before falling back to messaging developers directly.

Step 2: Enable private reporting

Now that the security policy is set up, check out the referenced GitHub private reporting feature, a tool that allows discreet communication of vulnerabilities to maintainers so they can fix the issue before it’s publicly disclosed. It also notifies the broader community, such as npm, Crates.io, or Go users, about potential security issues in their dependencies.

Enabling and using the feature is easy and requires almost no maintenance. The key is to make sure you have configured your GitHub notifications correctly: reports are sent via email only if you have email notifications enabled. Private reporting is not enabled by default because it requires active monitoring of your GitHub notifications; otherwise, reports may not get the attention they require.

After configuring the notifications, go to the “Security” tab of your repository and click “Enable vulnerability reporting.”

Emails about reported vulnerabilities have the subject line “(org/repo) Summary (GHSA-0000-0000-0000).” If you use website notifications, you will see a similar notification there.

If you want to enable private reporting for your whole organization, then check out this documentation.

A benefit of using private reporting is that vulnerabilities are published in the GitHub advisory database (see the GitHub documentation for more information). If dependent repositories have Dependabot enabled, then their dependencies on your project can be updated automatically.

On top of that, GitHub can also automatically issue a CVE ID that can be used to reference the bug outside of GitHub.

This private reporting feature is still officially in beta on GitHub. We encountered minor issues like the lack of message templates and the inability of reporters to add collaborators. We reported the latter as a bug to GitHub, but they claimed that this was by design.

Step 3: Get notifications via webhooks

If you want notifications in a messaging platform of your choice, such as Slack, you can create a repository- or organization-wide webhook on GitHub and enable the repository advisories event type for it.

After creating the webhook, repository_advisory events will be sent to the set webhook URL. The event includes the summary and description of the reported vulnerability.

How to make security researchers happy

If you want to increase your chances of getting high-quality vulnerability reports from security researchers and are already using GitHub, then set up a security policy and enable private reporting. Simplifying the process of reporting security bugs is important for the security of your software. It also helps avoid researchers becoming annoyed and deciding not to report a bug or, even worse, deciding to turn the vulnerability into an exploit or release it as a 0-day.

If you use GitHub, this is your call to action to prioritize security, protect the public software ecosystem’s security, and foster a safer development environment for everyone by setting up a basic security policy and enabling private reporting.

If you’re not a GitHub user, similar features also exist on other issue-tracking systems, such as confidential issues in GitLab. However, not all systems have this option; for instance, Gitea is missing such a feature. The reason we focused on GitHub in this post is because the platform is in a unique position due to its advisory database, which feeds into, for example, the npm package repository. But regardless of which platform you use, make sure that you have a visible security policy and reliable channels set up.

Introducing Ruzzy, a coverage-guided Ruby fuzzer

29 March 2024 at 13:30

By Matt Schwager

Trail of Bits is excited to introduce Ruzzy, a coverage-guided fuzzer for pure Ruby code and Ruby C extensions. Fuzzing helps find bugs in software that processes untrusted input. In pure Ruby, these bugs may result in unexpected exceptions that could lead to denial of service, and in Ruby C extensions, they may result in memory corruption. Notably, the Ruby community has been missing a tool it can use to fuzz code for such bugs. We decided to fill that gap by building Ruzzy.

Ruzzy is heavily inspired by Google’s Atheris, a Python fuzzer. Like Atheris, Ruzzy uses libFuzzer for its coverage instrumentation and fuzzing engine. Ruzzy also supports AddressSanitizer and UndefinedBehaviorSanitizer when fuzzing C extensions.

This post will go over our motivation behind building Ruzzy, provide a brief overview of installing and running the tool, and discuss some of its interesting implementation details. Ruby revelers rejoice, Ruzzy* is here to reveal a new era of resilient Ruby repositories.

* If you’re curious, Ruzzy is simply a portmanteau of Ruby and fuzz, or fuzzer.

Bringing fuzz testing to Ruby

The Trail of Bits Testing Handbook provides the following definition of fuzzing:

Fuzzing represents a dynamic testing method that inputs malformed or unpredictable data to a system to detect security issues, bugs, or system failures. We consider it an essential tool to include in your testing suite.

Fuzzing is an important testing methodology when developing high-assurance software, even in Ruby. Consider AFL’s extensive trophy case, rust-fuzz’s trophy case, and OSS-Fuzz’s claim that it’s helped find and fix over 10,000 security vulnerabilities and 36,000 bugs with fuzzing. As mentioned previously, Python has Atheris. Java has Jazzer. The Ruby community deserves a high-quality, modern fuzzing tool too.

This isn’t to say that Ruby fuzzers haven’t been built before. They have: kisaten, afl-ruby, FuzzBert, and perhaps some we’ve missed. However, all these tools appear to be either unmaintained, difficult to use, lacking features, or all of the above. To address these challenges, Ruzzy is built on three principles:

  1. Fuzz pure Ruby code and Ruby C extensions
  2. Make fuzzing easy by providing a RubyGems installation process and simple interface
  3. Integrate with the extensive libFuzzer ecosystem

With that, let’s give this thing a test drive.

Installing and running Ruzzy

The Ruzzy repository is well documented, so this post will provide an abridged version of installing and running the tool. The goal here is to provide a quick overview of what using Ruzzy looks like. For more information, check out the repository.

First things first, Ruzzy requires a Linux environment and a recent version of Clang (we’ve tested back to version 14.0.0). Releases of Clang can be found on its GitHub releases page. If you’re on a Mac or Windows computer, you can use Docker Desktop as your Linux environment and then use Ruzzy’s Docker development environment to run the tool. With that out of the way, let’s get started.

Run the following command to install Ruzzy from RubyGems:

MAKE="make --environment-overrides V=1" \
CC="/path/to/clang" \
CXX="/path/to/clang++" \
LDSHARED="/path/to/clang -shared" \
LDSHAREDXX="/path/to/clang++ -shared" \
    gem install ruzzy

These environment variables ensure the tool is compiled and installed correctly. They will be explored in greater detail later in this post. Make sure to update the /path/to portions to point to your clang installation.

Fuzzing Ruby C extensions

To facilitate testing the tool, Ruzzy includes a “dummy” C extension with a heap-use-after-free bug. This section will demonstrate using Ruzzy to fuzz this vulnerable C extension.

First, we need to configure Ruzzy’s required sanitizer options:

export ASAN_OPTIONS="allocator_may_return_null=1:detect_leaks=0:use_sigaltstack=0"

(See the Ruzzy README for why these options are necessary in this context.)

Next, start fuzzing:

LD_PRELOAD=$(ruby -e 'require "ruzzy"; print Ruzzy::ASAN_PATH') \
    ruby -e 'require "ruzzy"; Ruzzy.dummy'

LD_PRELOAD is required for the same reason that Atheris requires it. That is, it uses a special shared object that provides access to libFuzzer’s sanitizers. Now that Ruzzy is fuzzing, it should quickly produce a crash like the following:

INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 2527961537
...
==45==ERROR: AddressSanitizer: heap-use-after-free on address 0x50c0009bab80 at pc 0xffff99ea1b44 bp 0xffffce8a67d0 sp 0xffffce8a67c8
...
SUMMARY: AddressSanitizer: heap-use-after-free /var/lib/gems/3.1.0/gems/ruzzy-0.7.0/ext/dummy/dummy.c:18:24 in _c_dummy_test_one_input
...
==45==ABORTING
MS: 4 EraseBytes-CopyPart-CopyPart-ChangeBit-; base unit: 410e5346bca8ee150ffd507311dd85789f2e171e
0x48,0x49,
HI
artifact_prefix='./'; Test unit written to ./crash-253420c1158bc6382093d409ce2e9cff5806e980
Base64: SEk=

Fuzzing pure Ruby code

Fuzzing pure Ruby code requires two Ruby scripts: a tracer script and a fuzzing harness. The tracer script is required due to an implementation detail of the Ruby interpreter. Every tracer script will look nearly identical. The only difference will be the name of the Ruby script you’re tracing.

First, the tracer script. Let’s call it test_tracer.rb:

require 'ruzzy'

Ruzzy.trace('test_harness.rb')

Next, the fuzzing harness. A fuzzing harness wraps a fuzzing target and passes it to the fuzzing engine. In this case, we have a simple fuzzing target that crashes when it receives the input “FUZZ.” It’s a contrived example, but it demonstrates Ruzzy’s ability to find inputs that maximize code coverage and produce crashes. Let’s call this harness test_harness.rb:

require 'ruzzy'

def fuzzing_target(input)
  if input.length == 4
    if input[0] == 'F'
      if input[1] == 'U'
        if input[2] == 'Z'
          if input[3] == 'Z'
            raise
          end
        end
      end
    end
  end
end

test_one_input = lambda do |data|
  fuzzing_target(data) # Your fuzzing target would go here
  return 0
end

Ruzzy.fuzz(test_one_input)

You can start the fuzzing process with the following command:

LD_PRELOAD=$(ruby -e 'require "ruzzy"; print Ruzzy::ASAN_PATH') \
    ruby test_tracer.rb

This should quickly produce a crash like the following:

INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 2311041000
...
/app/ruzzy/bin/test_harness.rb:12:in `block in <main>': unhandled exception
    from /var/lib/gems/3.1.0/gems/ruzzy-0.7.0/lib/ruzzy.rb:15:in `c_fuzz'
    from /var/lib/gems/3.1.0/gems/ruzzy-0.7.0/lib/ruzzy.rb:15:in `fuzz'
    from /app/ruzzy/bin/test_harness.rb:35:in `<main>'
    from bin/test_tracer.rb:7:in `require_relative'
    from bin/test_tracer.rb:7:in `<main>'
...
SUMMARY: libFuzzer: fuzz target exited
MS: 1 CopyPart-; base unit: 24b4b428cf94c21616893d6f94b30398a49d27cc
0x46,0x55,0x5a,0x5a,
FUZZ
artifact_prefix='./'; Test unit written to ./crash-aea2e3923af219a8956f626558ef32f30a914ebc
Base64: RlVaWg==

Ruzzy used libFuzzer’s coverage-guided instrumentation to discover the input (“FUZZ”) that produces a crash. This is one of Ruzzy’s key contributions: coverage-guided support for pure Ruby code. We will discuss coverage support and more in the next section.

Interesting implementation details

You don’t need to understand this section to use Ruzzy, but fuzzing can often be more art than science, so we wanted to share some details to help demystify this dark art. We certainly learned a lot from the blog posts describing Atheris and Jazzer, so we figured we’d pay it forward. Of course, there are many interesting details that go into creating a tool like this but we’ll focus on three: creating a Ruby fuzzing harness, compiling Ruby C extensions with libFuzzer, and adding coverage support for pure Ruby code.

Creating a Ruby fuzzing harness

One of the first things you need when embarking on a fuzzing campaign is a fuzzing harness. The Trail of Bits Testing Handbook defines a fuzzing harness as follows:

A harness handles the test setup for a given target. The harness wraps the software and initializes it such that it is ready for executing test cases. A harness integrates a target into a testing environment.

When fuzzing Ruby code, naturally we want to write our fuzzing harness in Ruby, too. This speaks to goal number 2 from the beginning of this post: make fuzzing Ruby simple and easy. However, a problem arises when we consider that libFuzzer is written in C/C++. When using libFuzzer as a library, we need to pass a C function pointer to LLVMFuzzerRunDriver to initiate the fuzzing process. How can we pass arbitrary Ruby code to a C/C++ library?

Using a foreign function interface (FFI) like Ruby-FFI is one possibility. However, FFIs are generally used to go the other direction: calling C/C++ code from Ruby. Ruby C extensions seem like another possibility, but we still need to figure out a way to pass arbitrary Ruby code to a C extension. After much digging around in the Ruby C extension API, we discovered the rb_proc_call function. This function allowed us to use Ruby C extensions to bridge the gap between Ruby code and the libFuzzer C/C++ implementation.

In Ruby, a Proc is “an encapsulation of a block of code, which can be stored in a local variable, passed to a method or another Proc, and can be called. Proc is an essential concept in Ruby and a core of its functional programming features.” Perfect, this is exactly what we needed. In Ruby, all lambda functions are also Procs, so we can write fuzzing harnesses like the following:

require 'json'
require 'ruzzy'

json_target = lambda do |data|
  JSON.parse(data)
  return 0
end

Ruzzy.fuzz(json_target)

In this example, the json_target lambda function is passed to Ruzzy.fuzz. Behind the scenes Ruzzy uses two language features to bridge the gap between Ruby code and a C interface: Ruby Procs and C function pointers. First, Ruzzy calls LLVMFuzzerRunDriver with a function pointer. Then, every time that function pointer is invoked, it calls rb_proc_call to execute the Ruby target. This allows the C/C++ fuzzing engine to repeatedly call the Ruby target with fuzzed data. Considering the example above, since all lambda functions are Procs, this accomplishes the goal of calling arbitrary Ruby code from a C/C++ library.
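
In C extension terms, the bridge looks roughly like the following sketch (heavily simplified, with function and variable names of our own choosing):

#include <stddef.h>
#include <stdint.h>
#include <ruby.h>

// The Ruby Proc passed to Ruzzy.fuzz, stashed so the C callback can reach it.
static VALUE stored_proc;

// libFuzzer repeatedly invokes this callback with fuzzed bytes; we wrap the
// bytes in a Ruby string and forward them to the Proc via rb_proc_call.
static int c_test_one_input(const uint8_t *data, size_t size) {
    VALUE args = rb_ary_new3(1, rb_str_new((const char *)data, (long)size));
    return NUM2INT(rb_proc_call(stored_proc, args));
}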

As with all good, high-level overviews, this is an oversimplification of how Ruzzy works. You can see the exact implementation in cruzzy.c.

Compiling Ruby C extensions with libFuzzer

Before we proceed, it’s important to understand that there are two Ruby C extensions we are considering: the Ruzzy C extension that hooks into the libFuzzer fuzzing engine and the Ruby C extensions that become our fuzzing targets. The previous section discussed the Ruzzy C extension implementation. This section discusses Ruby C extension targets. These are third-party libraries that use Ruby C extensions that we’d like to fuzz.

To fuzz a Ruby C extension, we need a way to compile the extension with libFuzzer and its associated sanitizers. Compiling C/C++ code for fuzzing requires special compile-time flags, so we need a way to inject these flags into the C extension compilation process. Dynamically adding these flags is important because we’d like to install and fuzz Ruby gems without having to modify the underlying code.

The mkmf, or MakeMakefile, module is the primary interface for compiling Ruby C extensions. The gem install process calls a gem-specific Ruby script, typically named extconf.rb, which calls the mkmf module. The process looks roughly like this:

gem install -> extconf.rb -> mkmf -> Makefile -> gcc/clang/CC -> extension.so

Unfortunately, by default mkmf does not respect common C/C++ compilation environment variables like CC, CXX, and CFLAGS. However, we can force this behavior by setting the following environment variable: MAKE="make --environment-overrides". This tells make that environment variables override Makefile variables. With that, we can use the following command to install Ruby gems containing C extensions with the appropriate fuzzing flags:

MAKE="make --environment-overrides V=1" \
CC="/path/to/clang" \
CXX="/path/to/clang++" \
LDSHARED="/path/to/clang -shared" \
LDSHAREDXX="/path/to/clang++ -shared" \
CFLAGS="-fsanitize=address,fuzzer-no-link -fno-omit-frame-pointer -fno-common -fPIC -g" \
CXXFLAGS="-fsanitize=address,fuzzer-no-link -fno-omit-frame-pointer -fno-common -fPIC -g" \
    gem install msgpack

The gem we’re installing is msgpack, an example of a gem containing a C extension component. Since it deserializes binary data, it makes a great fuzzing target. From here, if we wanted to fuzz msgpack, we would create an msgpack fuzzing harness and initiate the fuzzing process.
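
Such a harness follows the same Proc-based pattern shown earlier. A sketch might look like the following (the rescue clause is illustrative; tune it to the exceptions you consider expected for your target):

require 'msgpack'
require 'ruzzy'

msgpack_target = lambda do |data|
  begin
    MessagePack.unpack(data)
  rescue MessagePack::MalformedFormatError
    # Malformed input is expected while fuzzing; we only care about crashes
    # and sanitizer reports from the C extension.
  end
  return 0
end

Ruzzy.fuzz(msgpack_target)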

If you’d like to find more fuzzing targets, searching GitHub for extconf.rb files is one of the best ways we’ve found to identify good C extension candidates.

Adding coverage support for pure Ruby code

Instead of Ruby C extensions, what if we want to fuzz pure Ruby code? That is, Ruby projects that do not contain a C extension component. If modifying install-time functionality via lengthy, not-officially-supported environment variables is a hacky solution, then what follows is not for the faint of heart. But, hey, a working solution with a little artistic freedom is better than no solution at all.

First, we need to cover the motivation for coverage support. Fuzzers derive some of their “smarts” from analyzing coverage information. This is a lot like code coverage information provided by unit and integration tests. While fuzzing, most fuzzers prioritize inputs that unlock new code branches. This increases the likelihood that they will find crashes and bugs. When fuzzing Ruby C extensions, Ruzzy can punt coverage instrumentation for C code to Clang. With pure Ruby code, we have no such luxury.

While implementing Ruzzy, we discovered one supremely useful piece of functionality: the Ruby Coverage module. The problem is that it cannot easily be called in real time by C extensions. If you recall, Ruzzy uses its own C extension to pass fuzz harness code to LLVMFuzzerRunDriver. To implement our pure Ruby coverage “smarts,” we need to pass in Ruby coverage information to libFuzzer in real time as the fuzzing engine executes. The Coverage module is great if you have a known start and stop point of execution, but not if you need to continuously gather coverage information and pass it to libFuzzer. However, we know the Coverage module must be implemented somehow, so we dug into the Ruby interpreter’s C implementation to learn more.

Enter Ruby event hooking. The TracePoint module is the official Ruby API for listening for certain types of events like calling a function, returning from a routine, executing a line of code, and many more. When these events fire, you can execute a callback function to handle the event however you’d like. So, this sounds great, and exactly like what we need. When we’re trying to track coverage information, what we’d really like to do is listen for branching events. This is what the Coverage module is doing, so we know it must exist under the hood somewhere.

Fortunately, the public Ruby C API provides access to this event hooking functionality via the rb_add_event_hook2 function. This function takes a list of events to hook and a callback function to execute whenever one of those events fires. By digging around in the source code a bit, we find that the list of possible events looks very similar to the list in the TracePoint module:

#define RUBY_EVENT_NONE      0x0000
#define RUBY_EVENT_LINE      0x0001
#define RUBY_EVENT_CLASS     0x0002
#define RUBY_EVENT_END       0x0004
...

Ruby event hook types

If you keep digging, you’ll notice a distinct lack of one type of event: coverage events. But why? The Coverage module appears to be handling these events. If you continue digging, you’ll find that there are in fact coverage events, and that is how the Coverage module works, but you don’t have access to them. They’re defined as part of a private, internal-only portion of the Ruby C API:

/* #define RUBY_EVENT_RESERVED_FOR_INTERNAL_USE 0x030000 */ /* from vm_core.h */
#define RUBY_EVENT_COVERAGE_LINE                0x010000
#define RUBY_EVENT_COVERAGE_BRANCH              0x020000

Private coverage event hook types

That’s the bad news. The good news is that we can define the RUBY_EVENT_COVERAGE_BRANCH event hook ourselves and set it to the correct, constant value in our code, and rb_add_event_hook2 will still respect it. So we can use Ruby’s built-in coverage tracking after all! We can feed this data into libFuzzer in real time and it will fuzz accordingly. Discussing how to feed this data into libFuzzer is beyond the scope of this post, but if you’d like to learn more, we use SanitizerCoverage’s inline 8-bit counters, PC-Table, and data flow tracing.

There’s just one more thing.

During our testing, even though we added the correct event hook, we still weren’t successfully hooking coverage events. The Coverage module must be doing something we’re not seeing. If we call Coverage.start(branches: true), per the Coverage documentation, then things work as expected. The details here involve a lot of sleuthing in the Ruby interpreter source code, so we’ll cut to the chase. As best we can tell, it appears that calling Coverage.start, which effectively calls Coverage.setup, initializes some global state in the Ruby interpreter that allows for hooking coverage events. This initialization functionality is also part of a private, internal-only API. The easiest solution we could come up with was calling Coverage.setup(branches: true) before we start fuzzing. With that, we began successfully hooking coverage events as expected.
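
Concretely, this means the fuzzing setup performs roughly the following steps before handing control to libFuzzer (a simplification of what ruzzy.rb actually does internally):

require 'coverage'

# Initialize the interpreter's internal coverage state so that branch
# coverage events fire for our registered event hook.
Coverage.setup(branches: true)

# From here, Ruzzy loads the harness and starts the libFuzzer driver.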

Having coverage events included in the standard library made our lives a lot easier. Without it, we may have had to resort to much more invasive and cumbersome solutions like modifying the Ruby code the interpreter sees in real time. However, it would have made our lives even easier if hooking coverage events were part of the official, public Ruby C API. We’re currently tracking this request at trailofbits/ruzzy#9.

Again, the information presented here is a slight oversimplification of the implementation details; if you’d like to learn more, then cruzzy.c and ruzzy.rb are great places to start.

Find more Ruby bugs with Ruzzy

We faced some interesting challenges while building this tool and attempted to hide much of the complexity behind a simple, easy to use interface. When using the tool, the implementation details should not become a hindrance or an annoyance. However, discussing them here in detail may spur the next fuzzer implementation or step forward in the fuzzing community. As mentioned previously, the Atheris and Jazzer posts were a great inspiration to us, so we figured we’d pay it forward.

Building the tool is just the beginning. The real value comes when we start using the tool to find bugs. Like Atheris for Python, and Jazzer for Java before it, Ruzzy is an attempt to bring a higher level of software assurance to the Ruby community. If you find a bug using Ruzzy, feel free to open a PR against our trophy case with a link to the issue.

If you’d like to read more about our work on fuzzing, check out the following posts:

Contact us if you’re interested in custom fuzzing for your project.

Why fuzzing over formal verification?

22 March 2024 at 13:00

By Tarun Bansal, Gustavo Grieco, and Josselin Feist

We recently introduced our new offering, invariant development as a service. A recurring question that we are asked is, “Why fuzzing instead of formal verification?” And the answer is, “It’s complicated.”

We use fuzzing for most of our audits but have used formal verification methods in the past. In particular, we found symbolic execution useful in audits such as Sai, Computable, and Balancer. However, we realized through experience that fuzzing tools produce similar results but require significantly less skill and time.

In this blog post, we will examine why the two principal arguments in favor of formal verification often fall short in practice: proving the absence of bugs is typically unattainable, and the bugs formal verification uncovers can usually also be found by a fuzzer.

Proving the absence of bugs

One of the key selling points of formal verification over fuzzing is its ability to prove the absence of bugs. To do that, formal verification tools use mathematical representations to check whether a given invariant holds for all input values and states of the system.

While such a claim can be attainable on a simple codebase, it’s not always achievable in practice, especially with complex codebases, for the following reasons:

  • The code may need to be rewritten to be amenable to formal verification. This leads to the verification of a pseudo-copy of the target instead of the target itself. For example, the Runtime Verification team verified the pseudocode of the deposit contract for the ETH2.0 upgrade, as mentioned in this excerpt from their blog post:

    Specifically, we first rigorously formalized the incremental Merkle tree algorithm. Then, we extracted a pseudocode implementation of the algorithm employed in the deposit contract, and formally proved the correctness of the pseudocode implementation.

  • Complex code may require a custom summary of some functionality to be analyzed. In these situations, the verification relies on the custom summary to be correct, which shifts the responsibility of correctness to that summary. To build such a summary, users might need to use an additional custom language, such as CVL, which increases the complexity.
  • Loops and recursion may require adding manual constraints (e.g., unrolling the loop for only a given amount of time) to help the prover. For example, the Certora prover might unroll some loops for a fixed number of iterations and report any additional iteration as a violation, forcing further involvement from the user.
  • The solver can time out. If the tool relies on a solver for equations, finding a solution in a reasonable time may not be possible. In particular, proving code with a high number of nonlinear arithmetic operations or updates to storage or memory is challenging. If the solver times out, no guarantee can be provided.

So while proving the absence of bugs is a benefit of formal verification methods in theory, it may not be the case in practice.

Finding bugs

When formally verifying the code is not possible, formal verification tools can still be used as bug finding tools. However, the question remains, “Can formal verification find real bugs that cannot be found by a fuzzer?” At this point, wouldn’t it just be easier to use a fuzzer?

To answer this question, we looked at two bugs found using formal verification in MakerDAO and Compound and then attempted to find these same bugs with only a fuzzer. Spoiler alert: we succeeded.

We selected these two bugs because they were widely advertised as having been discovered through formal verification, and they affected two popular protocols. To our surprise, it was difficult to find public issues discovered solely through formal verification, in contrast with the many bugs found by fuzzing (see our security reviews).

Our fuzzer found both bugs in a matter of minutes, running on a typical development laptop. The bugs we evaluated, as well as the formal verification and fuzz testing harnesses we used to discover them, are available on our GitHub page about fuzzing formally verified contracts to reproduce popular security issues.

Fundamental invariant of DAI

MakerDAO found a bug in its live code after four years. You can read more about the bug in When Invariants Aren’t: DAI’s Certora Surprise. Using the Certora prover, MakerDAO found that the fundamental invariant of DAI, which is that the sum of all collateral-backed debt and unbacked debt should equal the sum of all DAI balances, could be violated in a specific case. The core issue is that calling the init function when a vault’s rate state variable is zero and its Art state variable is nonzero changes the vault’s total debt, which violates the invariant relating the total debt to the total DAI supply. The MakerDAO team concluded that calling the init function after calling the fold function is a path to break the invariant.

function sumOfDebt() public view returns (uint256) {
    uint256 length = ilkIds.length;
    uint256 sum = 0;
    for (uint256 i=0; i < length; ++i){
        sum = sum + ilks[ilkIds[i]].Art * ilks[ilkIds[i]].rate;
    }
    return sum;
}

function echidna_fund_eq() public view returns (bool) {
    return debt == vice + sumOfDebt();
}

Figure 1: Fundamental equation of DAI invariant in Solidity

We implemented the same invariant in Solidity, as shown in figure 1, and checked it with Echidna. To our surprise, Echidna violated the invariant and found a unique path to trigger the violation. Our implementation is available in the Testvat.sol file of the repository. Implementing the invariant was easy because the source code under test was small and required only logic to compute the sum of all debts. Echidna took less than a minute on an i5 Linux machine with 12 GB of RAM to violate the invariant.
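
For reference, running such a harness boils down to a single Echidna invocation along the lines of the following (the contract name and any extra configuration are assumptions; check the repository for the exact command):

echidna Testvat.sol --contract TestVat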

Liquidation of collateralized account in Compound V3 Comet

The Certora team used their Certora Prover to identify an interesting issue in the Compound V3 Comet smart contracts that allowed a fully collateralized account to be liquidated. The root cause of this issue was using an 8-bit mask for a 16-bit vector. The mask remains zero for the higher bits in the vector, which skips assets while calculating total collateral and results in the liquidation of the collateralized account. More on this issue can be found in the Formal Verification Report of Compound V3 (Comet).
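
The following Solidity sketch is our own simplified reconstruction of the root cause, not Comet’s actual code:

// The mask is computed in a uint8, so for asset offsets 8 through 15 the shift
// wraps to zero and those assets are silently skipped.
function isInAsset(uint16 assetsIn, uint8 assetOffset) internal pure returns (bool) {
    return (assetsIn & (uint8(1) << assetOffset)) > 0;
}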

function echidna_used_collateral() public view returns (bool) {
    for (uint8 i = 0; i < assets.length; ++i) {
        address asset = assets[i].asset;
        uint256 userColl = sumUserCollateral(asset, true);
        uint256 totalColl = comet.getTotalCollateral(asset);
        if (userColl != totalColl) {
            return false;
        }
    }
    return true;
}

function echidna_total_collateral_per_asset() public view returns (bool) {
    for (uint8 i = 0; i < assets.length; ++i) {
        address asset = assets[i].asset;
        uint256 userColl = sumUserCollateral(asset, false);
        uint256 totalColl = comet.getTotalCollateral(asset);
        if (userColl != totalColl) {
            return false;
        }
    }
    return true;
}

Figure 2: Compound V3 Comet invariant in Solidity

Echidna discovered the issue with the implementation of the invariant in Solidity, as shown in figure 2. This implementation is available in the TestComet.sol file in the repository. Implementing the invariant was easy; it required limiting the number of users interacting with the test contract and adding a method to calculate the sum of all user collateral. Echidna broke the invariant within minutes by generating random transaction sequences to deposit collateral and checking invariants.

Is formal verification doomed?

Formal verification tools require a lot of domain-specific knowledge to be used effectively and require significant engineering efforts to apply. Grigore Rosu, Runtime Verification’s CEO, summarized it as follows:

Figure 3: A tweet from the founder of Runtime Verification Inc.

While formal verification tools are constantly improving, which reduces the engineering effort, none of the existing tools reach the ease of use of existing fuzzers. For example, the Certora Prover makes formal verification more accessible than ever, but it is still far less user-friendly than a fuzzer for complex codebases. With the rapid development of these tools, we hope for a future where formal verification tools become as accessible as other dynamic analysis tools.

So does that mean we should never use formal verification? Absolutely not. In some cases, formally verifying a contract can provide additional confidence, but these situations are rare and context-specific.

Consider formal verification for your code only if the following are true:

  • You are following an invariant-driven development approach.
  • You have already tested many invariants with fuzzing.
  • You have a good understanding of which remaining invariants and components would benefit from formal methods.
  • You have solved all the other issues that would decrease your code maturity.

Writing good invariants is the key

Over the years, we have observed that the quality of invariants is paramount. Writing good invariants is 80% of the work; the tool used to check/verify them is important but secondary. Therefore, we recommend starting with the easiest and most effective technique—fuzzing—and relying on formal verification methods only when appropriate.

If you’re eager to refine your approach to invariants and integrate them into your development process, contact us to leverage our expertise.

Streamline your static analysis triage with SARIF Explorer

20 March 2024 at 13:30

By Vasco Franco

Today, we’re releasing SARIF Explorer, the VSCode extension that we developed to streamline how we triage static analysis results. We make heavy use of static analysis tools during our audits, but the process of triaging them was always a pain. We designed SARIF Explorer to provide an intuitive UI inside VSCode, with features that make this process less painful:

  • Open multiple SARIF files: Triage all your results at once.
  • Browse results: Browse results by clicking on them to open their associated location in VSCode. You can also browse a result’s dataflow steps, if present.
  • Classify results: Add metadata to each result by classifying it as a “bug,” “false positive,” or “TODO” and adding a custom text comment. Keyboard shortcuts are supported.
  • Filter results: Filter results by keyword, path (to include or exclude), level (“error,” “warning,” “note,” or “none”), and status (“bug,” “false positive,” or “TODO”).
  • Open GitHub issues: Copy GitHub permalinks to locations associated with results and create GitHub issues directly from SARIF Explorer.
  • Send bugs to weAudit: Send all bugs to weAudit once you’ve finished triaging them and continue with the weAudit workflow.
  • Collaborate: Share the .sarifexplorer file with your colleagues (e.g., on GitHub) to share your comments and classified results.

You can install it through the VSCode marketplace and find its code in our vscode-sarif-explorer repo.

Why we built SARIF Explorer

Have you ever had to triage hundreds of static analysis results, many of which were likely to be false positives? At Trail of Bits, we extensively use static analysis tools such as Semgrep and CodeQL, sometimes with rules that produce many false positives, so this is an experience we’re all too familiar with. As security engineers, we use these low-precision rules because if there’s a bug we can detect automatically, we want to know about it, even if it means sieving through loads of false positive results.

Long ago, you would have found me triaging these results by painstakingly going over a text file or looking into a tiny terminal window. This was grueling work that I did not enjoy at all. You read the result’s description, you copy the path to the code, you go to that file, and you analyze the code. Then, you annotate your conclusions in some other text file, and you repeat.

A few years ago, we started using SARIF Viewer at Trail of Bits. This was a tremendous improvement, as it allowed us to browse a neat list of results organized by rule and click on each one to jump to the corresponding code. Still, it lacked several features that we wanted:

  • The ability to classify results as bugs or false positives directly in the UI
  • Better result filtering
  • The ability to export results as GitHub issues
  • Better integration with weAudit—our tool for bookmarking code regions, marking files as reviewed, and more (check out our recent blog post announcing the release of this tool!)

This is why we built SARIF Explorer!

SARIF Explorer was designed with user efficiency in mind, providing an intuitive interface so that users can easily access all of the features we built into it, as well as support for keyboard shortcuts to move through and classify results.

The SARIF Explorer static analysis workflow

But why did we want all these new features, and how do we use them? At Trail of Bits, we follow this workflow when using static analysis tools:

  1. Run all static analysis tools (configured to output SARIF files; see the example command after this list).
  2. Open SARIF Explorer and open all of the SARIF files generated in step 1.
  3. Filter out the noisy results.
    • Are there rules that you are not interested in seeing? Hide them!
    • Are there folders for which you don’t care about the results (e.g., the ./third_party folder)? Filter them out!
  4. Classify the results.
    • Determine if each result is a false positive or a bug.
    • Swipe left or right accordingly (i.e., click the left or right arrow).
    • Add additional context with a comment if necessary.
  5. Working with other team members? Share your progress by committing the .sarifexplorer file to GitHub.
  6. Send all results marked as bugs to weAudit and proceed with the weAudit workflow.
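
For step 1, most analysis tools can emit SARIF directly. For example, with Semgrep (an illustrative command; adjust the rules and output path to your project):

semgrep scan --config auto --sarif --output results.sarif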

SARIF Explorer features

Now, let’s take a closer look at the SARIF Explorer features that enable this workflow:

  • Open multiple SARIF files: You can open and browse the results of multiple SARIF files simultaneously. Use the “Sarif files” tab to browse the list of opened SARIF files and to close or reload any of them. If you open a SARIF file in your workspace, SARIF Explorer will also automatically open it.

  • Browse results: You can navigate to the locations of the results by clicking on them in the “Results” tab. The detailed view of the result, among other data, includes dataflow information, which you can navigate from source to sink (if available). In the GIF below, the user follows the XSS vulnerability from the source (an event message) to the sink (a DOM parser).

GIF showing how to browse results

  • Classify results: You can add metadata to each result by classifying it as a “bug,” “false positive,” or “TODO” and adding a custom text comment. You can use either the mouse or keyboard to do this:
    • Using the mouse: With a result selected, click one of the “bug,” “false positive,” or “TODO” buttons to classify it as such. These buttons appear next to the result and in the result’s detailed view.
    • Using the keyboard: With a result selected, press the right arrow key to classify it as a bug, the left arrow key to classify it as a false positive, and the backspace key to reset the classification to a TODO. This method is more efficient.

  • Filter results: You can filter results by keyword, path (to include or exclude), level (“error,” “warning,” “note,” or “none”), and status (“bug,” “false positive,” or “TODO”). You can also hide all results from a specific SARIF file or from a specific rule. For example, if you want to remove all results from the test and extensions folders and to see only results classified as TODOs, you should:
    • Set “Exclude Paths Containing” to “/test/, /extensions/”
    • Check the “Todo” box and uncheck the “Bug” and “False Positive” boxes in the “Status” section

  • Copy GitHub permalinks: You can copy a GitHub permalink to the location associated with a result. This requires having weAudit installed.

  • Create GitHub issues: You can create formatted GitHub issues for a specific result or for all unfiltered results under a given rule. This requires having weAudit installed.

  • Send bugs to weAudit: You can send all results classified as bugs to weAudit (results are automatically de-duplicated if you send them twice). This requires having weAudit installed.

  • Collaborate: You can share the .sarifexplorer file with your colleagues (e.g., on GitHub) to share your comments and classified results. The file is a prettified JSON file, which helps resolve conflicts if more than one person writes to the file in parallel.

You can find even more details about these features in our README.

Try it!

SARIF Explorer and weAudit greatly improved our efficiency when auditing code, and we hope it improves yours too.

Go try both of these tools out and let us know what you think! We welcome any bug reports, feature requests, and contributions in our vscode-sarif-explorer and vscode-weaudit repos.

If you’re interested in VSCode extension security, check out our “Escaping misconfigured VSCode extensions” and “Escaping well-configured VSCode extensions (for profit)” blog posts.

Contact us if you need help securing your VSCode extensions or any other application.

Read code like a pro with our weAudit VSCode extension

19 March 2024 at 13:30

By Filipe Casal

Today, we’re releasing weAudit, the collaborative code-reviewing tool that we use during our security audits. With weAudit, we review code more efficiently by taking notes and tracking bugs in a codebase directly inside VSCode, reducing our reliance on external tools, ensuring we never lose track of bugs we find, and enabling us to share that information with teammates.

We designed weAudit with features that are crucial to our auditing process:

  • Bookmarks for findings and notes: Bookmark code regions to identify findings or add audit notes.
  • Tracking of audited files: Mark entire files as reviewed.
  • Collaboration: View and share findings with multiple users.
  • Creation of GitHub issues: Fill in detailed information about a finding and create a preformatted GitHub issue right from weAudit.

You can install it through the VSCode marketplace and find its code in our vscode-weaudit repo.

Why we built weAudit

When we review complex codebases, we often compile detailed notes about both the high-level structure and specific low-level implementation details to share with our project team. For high-level notes, standard document sharing tools more than suffice. But those tools are not ideal for sharing low-level, code-specific notes. For those, we need a tool that allows us to share notes that are more tightly coupled with the codebase itself, almost like using post-it notes to navigate through a complex book. Specifically, we need a tool that allows us to do the following:

  • Quickly navigate through areas of interest in the codebase
  • Visually highlight significant areas of the code
  • Add audit notes to certain parts of the codebase

For some time, I used a very simple extension for VSCode called “Bookmarks”, which allowed me to add basic notes to lines of code. However, I was never satisfied with this extension, as it was missing crucial features:

  • The highlighted code did not display the notes I had written next to the code.
  • I had no way of sharing code coverage information with my client or fellow engineers auditing the codebase.
  • I had no way of sharing my notes and bookmarks. During an audit with a team of engineers, I need to be able to share these things with my team so that my knowledge is their knowledge, and vice versa.

All of us engineers at Trail of Bits agreed that we needed a better tool for this purpose. We realized that if we wanted an extension tailored to our needs, we would need to create it. That is why we built weAudit.

weAudit’s main features

The features we built into weAudit streamline our process of bookmarking, annotating, and tracking code files under audit, sharing our notes, and creating GitHub issues for findings we discover.

Bookmarks

The extension supports two types of bookmarks: findings, which represent buggy or suspicious regions of code, and notes, which represent personal annotations about the code.

You can add findings and notes to the current code snippet selection by running the corresponding VSCode commands or using the keyboard shortcuts:

  • “weAudit: New Finding from Selection” (shortcut: Cmd + J)
  • “weAudit: New Note from Selection” (shortcut: Cmd + K)

These commands will highlight the code in the editor and create a new bookmark in the “List of Findings” view in the sidebar.

By clicking on an item in the “List of Findings” view, you can navigate to the corresponding region of code.

Files with a finding will have a “!” annotation next to the file name in both the file tree of VSCode’s default “Explorer” view and in the tab above the editor, making it immediately clear which files have findings.

The highlight colors can be customized in the extension settings.

Tracking audited files

After reviewing a file, you can mark it as audited by running the “weAudit: Mark File as Reviewed” command or its keyboard shortcut, Cmd + 7. The whole file will be highlighted, and the file name in both the file tree and the tab above the editor will be annotated with a ✓.

The highlight color can be customized in the extension settings.

Daily log

Have you ever had trouble remembering which files you reviewed the previous week? Or do you just really like meaningless statistics such as the number of lines of code you read in a single day? You can see these stats by showing the daily log, accessible from the “List of Findings” panel.

You can also view the daily log by running the “weAudit: Show Daily Log” command in the command palette.

Collaboration with multiple users

You can share weAudit files (located in the .vscode folder) with your co-auditors to share findings and notes about the code. In the “weAudit Files” panel, you can toggle to show or hide the findings from each user by clicking on each entry. The colors for other users’ findings and notes and for your own findings and notes are customizable in the extension settings.

Detailed findings

You can fill in detailed information about a finding by clicking on it in the “List of Findings” view in the sidebar, where you can add all the information we include in our audit reports: title, severity, difficulty, description, exploit scenario, and recommendations for resolving the issue.

This information is then used to prefill a template, allowing you to quickly open a GitHub issue with all of the relevant details for the finding.

You can find more details and information about other features in our README.

Try it out for yourself!

If you use VSCode to navigate through large codebases, we invite you to try weAudit—even if you are not looking for bugs—and let us know what you think!

We welcome any bug reports, feature requests, and contributions in our vscode-weaudit repo.

If you’re interested in VSCode extension security, check out our “Escaping misconfigured VSCode extensions” and “Escaping well-configured VSCode extensions (for profit)” blog posts.

Contact us if you need help securing your VSCode extensions or any other application.

Releasing the Attacknet: A new tool for finding bugs in blockchain nodes using chaos testing

18 March 2024 at 13:00

By Benjamin Samuels (@thebensams)

Today, Trail of Bits is publishing Attacknet, a new tool that addresses the limitations of traditional runtime verification tools, built in collaboration with the Ethereum Foundation. Attacknet is intended to augment the EF’s current test methods by subjecting their execution and consensus clients to some of the most challenging network conditions imaginable.

Blockchain nodes must be held to the highest level of security assurance possible. Historically, the primary tools used to achieve this goal have been exhaustive specification, tests, client diversity, manual audits, and testnets. While these tools have traditionally done their job well, they collectively have serious limitations that can lead to critical bugs manifesting in a production environment, such as the May 2023 finality incident that occurred on Ethereum mainnet. Attacknet addresses these limitations by subjecting devnets to a much wider range of network conditions and misconfigurations than is possible on a conventional testnet.

How Attacknet works

Attacknet uses chaos engineering, a testing methodology that proactively injects faults into a production environment to verify that the system is tolerant to certain failures. These faults reproduce real-world problem scenarios and misconfigurations, and can be used to create exaggerated scenarios to test the boundary conditions of the blockchain.

Attacknet uses Chaos Mesh to inject faults into a devnet environment generated by Kurtosis. By building on top of Kurtosis and Chaos Mesh, Attacknet can create various network topologies with ensembles of different kinds of faults to push a blockchain network to its most extreme edge cases.

Some of the faults include:

  • Clock skew, where a node’s clock is skewed forwards or backwards for a specific duration. Trail of Bits was able to reproduce the Ethereum finality incident using a clock skew fault, as detailed in our TrustX talk last year.
  • Network latency, where a node’s connection to the network (or its corresponding EL/CL client) is delayed by a certain amount of time. This fault can help reproduce global latency conditions or help detect unintentional synchronicity assumptions in the blockchain’s consensus.
  • Network partition, where the network is split into two or more segments that cannot communicate with each other. This fault can test the network’s fork choice rule, ability to re-org, and other edge cases.
  • Network packet drop/corruption, where gossip packets are dropped or have their contents corrupted by a certain amount. This fault can test a node’s gossip validation and test the robustness of the network under hostile network conditions.
  • Forced node crashes/offlining, where a certain client or type of client is ungracefully shut down. This fault can test the network’s resilience to validator inactivity, and test the ability of clients to re-sync to the network.
  • I/O disk faults/latency, where a certain amount of latency or error rate is applied to all I/O operations a node makes. This fault can help profile nodes to understand their resource requirements, as I/O is often the largest limiting factor of node performance.

Once the fault concludes, Attacknet performs a battery of health checks against each node in the network to verify that they were able to recover from the fault. If all nodes recover from the fault, Attacknet moves on to the next configured fault. If one or more nodes fail health checks, Attacknet will generate an artifact of logs and test information to allow debugging.

Future work

In this first release, Attacknet supports two run modes: one with a manually configured network topology and fault parameters, and a “planner mode” where a range of faults are run against a specific client with loosely defined topology parameters. In the future, we plan on adding an “Exploration mode” that will dynamically define fault parameters, inject them, and monitor network health repeatedly, similar to a fuzzer.

Attacknet is currently being used to test the Dencun hard fork, and is being regularly updated to improve coverage, performance, and debugging UX. However, Attacknet is not an Ethereum-specific tool, and was designed to be modular and easily extended to support other types of chains with drastically different designs and topologies. In the future, we plan on extending Attacknet to target other chains, including other types of blockchain systems such as L2s.

If you’re interested in integrating Attacknet with your chain/L2’s testing process, please contact us.

Secure your blockchain project from the start

13 March 2024 at 13:00

Systemic security issues in blockchain projects often appear early in development. Without an initial focus on security, projects may choose flawed architectures or make insecure design or development choices that result in hard-to-maintain or vulnerable solutions. Traditional security reviews can be used to identify some security issues, but by the time they are complete, it may be too late to fix some of the issues that could have been addressed at the design and development stages.

To help clients identify and address potential security issues earlier in the project, Trail of Bits is rolling out a new service: Early Stage Security Review. The service, already requested by many of our clients, is ideal for early-stage projects seeking feedback, where code, documentation, testing, and technical solutions are still evolving. As part of the service, Trail of Bits engineers will perform a thorough review of a project, including:

  • Architectural components review
  • Risk mitigation analysis
  • Identification of gaps in security practices
  • Code maturity evaluation
  • Tailored design recommendations
  • Lightweight code review of critical project areas
  • Actionable advice, recommendations, and next steps to improve the project’s security

Fix potential issues before they become real problems

An early-stage security review provides an all-encompassing assessment of your project’s design and structure, guiding developers and informing security decisions throughout the project’s lifecycle. We leverage years of code review experience accumulated across various domains—including smart contracts, bridges, decentralized finance, and gaming applications—to guide your project’s development with security as a primary focus. We’ll also apply our deep expertise in blockchain nodes (L1 and L2), especially those based on geth.

Our early-stage review of your project will focus on identifying areas of improvement that will include:

  • Architectural components review. We will assess architectural choices for risks, review access controls for proper privilege separation, propose changes to simplify code complexity, ensure the advertised degree of decentralization is accurate, recommend on-chain/off-chain logic separation, and evaluate the upgradeability process, including migration and pausable mechanisms.
  • Risk mitigation analysis. We will identify existing risks and suggest mitigations, ensuring that MEV and Oracle risks are considered. We will assess the protocol’s reliance on blockchain risks (e.g., reorgs). We will examine the handling of common ERCs, and evaluate third-party component integration risks.
  • Identification of gaps in security practices. We will pinpoint security practice gaps, including issues identified in documentation, and assess whether the project’s testing is sufficient for the long-term health of the project. We will evaluate the monitoring plan, and recommend improvements in automated security tool usage.
  • Code maturity evaluation. Through our reviews, we will evaluate the maturity of the protocol and offer actionable security improvement recommendations.
  • Tailored design recommendations. We will adapt our review based on the project’s unique needs and requirements and provide recommendations tailored toward the protocol business logic.
  • Lightweight code review of critical project areas. We will review the code to understand and assess the technical solution for potential security issues or concerns. However, we won’t look for in-depth vulnerabilities during an early-stage review, as the code review is intended to identify surface-level bugs.

Clients using our Early Stage Security Review will get preferential scheduling and pricing for blockchain and other Trail of Bits services. Insights from the initial review will also help reduce the effort required for a comprehensive review once substantial development is complete.

Get ahead of security issues

The early-stage security review service will enable you to:

  • Set a strong security foundation. Early feedback sets your solutions on a path to success, minimizing potential security oversights.
  • Receive expert recommendations earlier. Tailored guidance for your unique codebase empowers you to make informed decisions and enhance your protocol’s security.
  • Reduce cost by preventing late refactoring. A proactive security approach from inception avoids costly late-stage refactoring and streamlines the development cycle.

Don’t wait until your project is code complete to prioritize security. Contact us to take advantage of our experience to help you secure your project from the start.

DARPA awards $1 million to Trail of Bits for AI Cyber Challenge

11 March 2024 at 17:46

By Michael D. Brown

We’re excited to share that Trail of Bits has been selected as one of the seven teams to participate in the small business track for DARPA’s AI Cyber Challenge (AIxCC). Our team will receive a $1 million award to create a Cyber Reasoning System (CRS) and compete in the AIxCC Semifinal Competition later this summer. This recognition highlights our dedication to advancing cybersecurity and marks a significant milestone in our commitment to pushing the boundaries of what’s possible: a future where cybersecurity challenges are met with innovative, AI-powered solutions.

It’s official: Trail of Bits was selected as one of the seven exclusive teams for the AIxCC small business track.

As we move beyond the initial phase of the competition, we’re eager to offer a sneak peek into the driving forces behind our approach, without spilling all of our secrets, of course. In a field where competitors often hold their cards close to their chests, we at Trail of Bits believe in the value of openness and sharing. Our motivation stems from more than just the desire to compete; it’s about contributing to a broader understanding and development within the cybersecurity community. While we navigate through this challenge with an eye on victory, our aim is also to foster a culture of transparency and collaboration, aligning with our deep-rooted open-source ethos.

For background on the challenge, see our two previous posts on the AIxCC.

*** Disclaimer: Information about AIxCC’s rules, structure, and events referenced in this document are subject to change. This post is NOT an authoritative document. Please refer to DARPA’s website and official documents for first-hand information. ***

Congrats to the 7 companies that will receive $1 million each to develop AI-enabled cyber reasoning systems that automatically find and fix software vulnerabilities as part of the #AIxCC Small Business Track! Full announcement: https://t.co/SC6yEFsooy

— DARPA (@DARPA) March 11, 2024

The guiding principles for building our CRS

In addition to competing in the AIxCC’s spiritual predecessor, the Cyber Grand Challenge (CGC), our team at Trail of Bits has been working to apply AI/ML techniques to critical cybersecurity problems for many years. These experiences have heavily influenced our approach to the AIxCC. While we’ll be waiting until later in the competition to share specific details, we would like to share the guiding principles for building our AI/ML-driven CRS that have come from this work:

CRS architecture is key to achieving scalability, resiliency, and versatility

DARPA’s CGC, like the AIxCC, tasked competitors with developing CRSs that find vulnerabilities at scale (i.e., that scan many challenge programs in a limited period of time) without any human intervention. The CRS Trail of Bits created to compete in the CGC, Cyberdyne, addressed these problems with a distributed system architecture. Cyberdyne provisioned many independent nodes, each capable of performing key tasks such as fuzzing and symbolic execution. Each node was tasked with one or more challenge problems, and could even cooperate with other nodes on the same challenge.

This design had several advantages. First, the CRS maximized coverage of the 131 challenges via parallel processing. This allowed the CRS to both achieve the scale needed to succeed in the competition and avoid being bogged down with particularly challenging problems. Second, the CRS was resilient to localized failures. If nodes experienced a catastrophic error while analyzing a challenge problem, the operation of other independent nodes was not affected, limiting the damage to the CRS’s overall score. The care taken in this design paid off in the competition: Cyberdyne ranked second among all CRSs in terms of the total number of verified bugs found!

The format of the AIxCC bears a strong resemblance to that of the CGC, so the CRS we build for the AIxCC will also need to be scalable and resilient to failures. However, the AIxCC has an additional wrinkle—challenge diversity. The AIxCC’s challenge problem set will include programs written in languages other than C/C++, including many interpreted languages such as Java and Python. This will require a successful CRS to be highly versatile. Fortunately, the distributed architecture used in Cyberdyne can be adapted for the AIxCC to address versatility in a manner similar to scalability and resiliency. The key difference is that problem-solving nodes used for AIxCC challenges will need to be specialized for different types of challenge problems.

AI/ML is best for complementing conventional techniques, not replacing them

I, along with my co-authors from Georgia Tech, recently presented work at the USENIX Security Symposium on an ML-based static analysis tool we built called VulChecker. VulChecker uses graph-based ML algorithms to locate and classify vulnerabilities in program source code. We evaluated VulChecker against a commercial static analysis tool and found that VulChecker outperformed the commercial tool at detecting certain vulnerability types that rule-based tools typically struggle with, such as integer overflow/underflow vulnerabilities. However, for vulnerabilities that are amenable to rule-based checks (e.g., stack buffer overflow vulnerabilities), VulChecker was effective but did not outperform conventional static analysis.

Considering that rule-based checks are generally less costly to implement than ML models, it doesn’t make sense to replace conventional analysis entirely with AI/ML. Rather, AI/ML is best suited to complement conventional approaches by addressing the problem instances that they struggle with. In the context of the AIxCC, our experience suggests that an AI/ML-only approach is a losing proposition due to high compute costs and the effect of compounding false positives, inaccuracies, and/or confabulations at each step. With that in mind, we plan to use AI/ML in our CRS only where it is best suited or where no conventional options exist. For now, we are planning to use AI/ML approaches primarily for vulnerability detection/classification, patch generation, and input generation tasks in our CRS.

Use the right AI/ML models for the job!

LLMs have been demonstrated to have many emergent capabilities due to the sheer size of their training sets. Among the tasks a CRS must complete in the AIxCC that are suitable for AI/ML, several are tailor-made for LLMs, such as generating code snippets and seed inputs for fuzzing. However, based on our past research, we’ve found that LLMs may not actually be the best option for such tasks.

Last fall, our team supported the United Kingdom’s Frontier AI Taskforce’s efforts to evaluate the risks posed by frontier AI models. We created a framework for rigorously assessing the offensive cyber capabilities of LLMs, which allowed us to 1) rate the model’s independent capabilities relative to human skill levels (i.e., novice, intermediate, expert) and 2) rate the model’s ability to upskill a novice or intermediate human operator. We used this framework to assess different LLMs’ abilities to handle several distinct tasks, including those highly relevant to AIxCC (e.g., vulnerability discovery and contextualization).

We found that LLMs could perform only as well as experts or significantly upskill novices for tasks that were reducible to natural language processing, such as writing phishing emails and conducting misinformation campaigns. For other cyber tasks (including those relevant to the AIxCC) such as creating malicious software, finding vulnerabilities in source code, and creating exploits, current-generation LLMs had novice-like capabilities and could only marginally upskill novice users. These results speak to the lack of reasoning and planning capabilities in LLMs, which has been well documented.

Because LLMs will struggle greatly with reasoning-intensive tasks, such as identifying novel instances of vulnerabilities in source code or classifying vulnerabilities, we’ll avoid using them for those tasks in our CRS; other types of AI/ML models with narrower scopes are a better option. Expecting LLMs to perform well on these tasks risks high levels of inaccuracy or false positives that can derail later tasks (e.g., generating patches).

What’s next?

Next month, DARPA will hold its AIxCC kickoff event where we should learn more about the infrastructure DARPA will provide for the competition. Once released, we expect this information will allow us (and other competing teams) to make more concrete progress toward building our CRS.

Out of the kernel, into the tokens

8 March 2024 at 14:00

By Max Ammann and Emilio López

We’re digging up the archives of vulnerabilities that Trail of Bits has reported over the years. This post shares the story of two such issues: a denial-of-service (DoS) vulnerability hidden in JSON Web Tokens (JWTs), and an oversight in the Linux kernel that could enable circumvention of critical kernel security mechanisms (KASLR).

Unraveling a DoS vulnerability in JOSE libraries

JWT and JSON Object Signing and Encoding (JOSE) are expansive standards that describe the creation and use of encrypted and/or signed JSON-based tokens. While these standards are widely used and represent a significant improvement over previous solutions for identity claims, they are not without drawbacks, and have several well-known footguns, like the JWT “none” signature algorithm.

Our finding concerns an attack that was part of a lineup of new JWT attacks presented by Tom Tervoort at Black Hat USA 2023: “Three New Attacks Against JSON Web Tokens.” The “billion hashes attack,” which results in a denial of service due to a lack of validation in JWT key encryption, caught our colleague Matt Schwager’s attention. Upon further examination, he discovered that it applied to several more libraries in the Go and Rust ecosystems: go-jose, jose2go, square/go-jose, and josekit-rs.

These libraries all support key encryption with PBES2, a feature meant to allow for password-based encryption of the Content Encryption Key (CEK) in JSON Web Encryption (JWE). A key is first derived from a password using a PBES2 scheme, which executes a number of PBKDF2 iterations. That derived key is then used to encrypt (and later decrypt) the CEK, which in turn protects the token contents.

This wouldn’t normally be an issue, but unfortunately, the number of iterations is carried in the token itself, in the p2c header parameter, which an attacker can easily manipulate. Consider, for example, the token header shown below:

Figure 1: A JWE token header indicating PBES2 key encryption with a large number of iterations
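
As an illustrative reconstruction of such a header (the alg, enc, and p2s values below are our own assumptions; only the p2c value matches the attack described next), the decoded protected header could look like this, built here in Python:

import base64
import json

# Hypothetical JWE protected header requesting PBES2 key wrapping with an
# attacker-chosen PBKDF2 iteration count. The alg/enc/p2s values are
# illustrative; p2c is the attacker-controlled iteration count.
header = {
    "alg": "PBES2-HS512+A256KW",
    "enc": "A256GCM",
    "p2s": base64.urlsafe_b64encode(b"attacker-chosen-salt").rstrip(b"=").decode(),
    "p2c": 2147483647,
}

print(json.dumps(header, indent=2))
# The base64url-encoded form is what actually appears in the token.
print(base64.urlsafe_b64encode(json.dumps(header).encode()).rstrip(b"=").decode())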

By using a very large iteration count in the p2c field, an attacker can cause a DoS on any application that attempts to process this token. Whoever receives and attempts to verify this token will first need to perform 2,147,483,647 PBKDF2 iterations to derive the CEK before they can even verify if the token is valid, costing significant amounts of compute time.
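
To get a rough sense of that cost (a back-of-the-envelope benchmark of our own, not a figure from the original research; timings vary by machine), you can time a modest PBKDF2 run and extrapolate linearly to the attacker-supplied count:

import hashlib
import time

# Time a modest PBKDF2-HMAC-SHA512 run and extrapolate to the 2,147,483,647
# iterations an attacker can request via p2c.
baseline_iterations = 100_000
start = time.perf_counter()
hashlib.pbkdf2_hmac("sha512", b"password", b"salt", baseline_iterations)
elapsed = time.perf_counter() - start

attacker_p2c = 2_147_483_647
estimate_seconds = elapsed * attacker_p2c / baseline_iterations
print(f"{baseline_iterations} iterations took {elapsed:.3f}s")
print(f"estimated cost at p2c={attacker_p2c}: ~{estimate_seconds / 60:.0f} minutes per token")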

We reported the issue to the go-jose, jose2go, and josekit-rs library maintainers, and it has been fixed by limiting the maximum value usable for p2c in go-jose/go-jose on version 3.0.1 (commit 65351c27657d); on dvsekhvalnov/jose2go on version 1.6.0 (commits a4584e9dd712 and 8e9e0d1c6b39); and on hidekatsu-izuno/josekit-rs on version 0.8.5 (commits 1f3278a33f0e, 8b60bd0ea8ce, and 7e448ce66c1c). square/go-jose remains unfixed, as the library is deprecated, and users are encouraged to migrate to go-jose/go-jose.

Alternatively, the risk can be mitigated by not blindly trusting the token’s alg parameter: if your application does not expect to receive tokens that use PBES2 or any other lesser-used algorithm, there is no reason to try to process one. jose2go already allows opt-in stricter validation of the alg and enc parameters, and go-jose’s next major version will require passing a list of acceptable algorithms when processing a token, allowing developers to explicitly list the algorithms they expect.
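
As a minimal sketch of these defenses (the constants and helper below are hypothetical, not the actual API of go-jose, jose2go, or josekit-rs), an application can reject unexpected algorithms and oversized p2c values before deriving any key:

# Hypothetical pre-validation of a decoded JWE protected header: allow only the
# algorithms the application expects, and cap the PBKDF2 iteration count.
ALLOWED_ALGS = {"PBES2-HS512+A256KW"}  # whatever your application actually expects
MAX_PBES2_ITERATIONS = 100_000         # example cap; choose a value that fits your use case

def validate_jwe_header(header: dict) -> None:
    alg = header.get("alg", "")
    if alg not in ALLOWED_ALGS:
        raise ValueError(f"unexpected key management algorithm: {alg}")
    if alg.startswith("PBES2") and header.get("p2c", 0) > MAX_PBES2_ITERATIONS:
        raise ValueError("p2c exceeds the allowed PBKDF2 iteration count")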

KASLR bypass in privilege-less containers

Next is a vulnerability that has been fixed since 2020 but never got a CVE assigned by the Linux kernel maintainers. In the following paragraphs, we’ll go into the details of a previously undisclosed but since-fixed KASLR bypass.

Back in 2020, Trail of Bits engineer Dominik Czarnota (aka disconnect3d) discovered a vulnerability in the Linux kernel that could expose internal pointer addresses within unprivileged Docker containers, allowing a malicious actor to bypass Kernel Address Space Layout Randomization (KASLR) for kernel modules.

KASLR is an important operating system defense that deters exploitation by randomizing kernel memory addresses on every boot. For the mitigation to be meaningful, kernel addresses must also be hidden from userspace; otherwise, a single address disclosure effectively bypasses KASLR.

While there are places where kernel addresses are shown to userspace programs, on many systems they should be available only when the user has the CAP_SYSLOG Linux capability. (Capabilities split root user privileges so it is possible to be the root user, or a user with uid 0, while having a limited set of privileges.) In particular, the manual page for the CAP_SYSLOG capability reads: “View kernel addresses exposed via /proc and other interfaces when /proc/sys/kernel/kptr_restrict has the value 1.” This means that only processes that are executed with the capability CAP_SYSLOG should be able to read kernel addresses.

However, Dominik discovered that this was not the case inside Docker containers, where processes running as root without CAP_SYSLOG were still able to observe kernel addresses. By default, Docker containers are unprivileged, which means that root users are restricted in what they can do (e.g., they cannot perform actions that require CAP_SYSLOG). The same behavior can be demonstrated without Docker by using the capsh tool as root to drop the CAP_SYSLOG capability and then checking whether kernel addresses are still exposed.
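
As a rough stand-in for a terminal demonstration (a minimal sketch that assumes a Linux host; the capsh invocation in the comment is illustrative, not the original command), the following checks whether module load addresses are exposed:

# Compare kptr_restrict with what /proc/modules actually exposes. Run as root
# with CAP_SYSLOG dropped, for example:
#   capsh --drop=cap_syslog -- -c 'python3 check_kptr.py'
# With the kernel fix in place, the load addresses should print as
# 0x0000000000000000 when kptr_restrict is 1 and CAP_SYSLOG is absent.
from pathlib import Path

restrict = Path("/proc/sys/kernel/kptr_restrict").read_text().strip()
print(f"kptr_restrict = {restrict}")

for line in Path("/proc/modules").read_text().splitlines()[:3]:
    name, *_rest, load_address = line.split()
    print(f"{name}: {load_address}")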

The underlying cause of the issue was that the credentials were checked incorrectly. The sysctl toggle kernel.kptr_restrict indicates whether restrictions are placed on exposing kernel addresses: the value “2” means that the addresses are always hidden; “1” means that they are shown only if the user has CAP_SYSLOG; and “0” means that they are always shown. Instead of ensuring that the user had the CAP_SYSLOG capability before showing the addresses, only the value of kptr_restrict was being considered to decide whether to show or hide the addresses. The addresses were always exposed if kptr_restrict was 1, while they should have been hidden if the user did not have CAP_SYSLOG. The issue was fixed in commit b25a7c5af905.

After discovering this vulnerability, we followed a coordinated disclosure process with Docker and the Linux kernel security team. Dominik initially notified the Docker team, since he thought the vulnerability originated in Docker, and also reported other sysfs filesystem leaks (where other sysfs paths leaked information such as the names of services running outside the container, other container IDs, and information about devices). The disclosure timeline is provided at the end of this post.

Although we received only silence from Docker despite multiple requests for updates, the Linux community swiftly fixed the issue in the kernel. The KASLR bypass fix was backported to various Ubuntu LTS versions, while the other sysfs leaks in Docker were not fixed at all. However, Linux kernel releases before 4.19 remain vulnerable to the KASLR bypass; Ubuntu 18.04, which uses kernel 4.15, is still vulnerable because the fix was not backported.

Disclosure timeline for KASLR bypass in privilege-less containers

  • June 6, 2020: Reported the vulnerability to Docker.
  • June 11, 2020: Docker replied that they would probably block the sysfs paths that leak information via the “masked paths” feature, and that the memory address disclosure should be reported to the Linux kernel developers.
  • June 11, 2020: Informed Docker of our intent to contact the Linux kernel security team (security@kernel.org) about the KASLR bypass.
  • June 11 to June 18, 2020: Performed a deeper analysis of the KASLR bypass.
  • June 18, 2020: Reported the bug to security@kernel.org.
  • June 18, 2020: Bug confirmed by Kees Cook.
  • June 19 to June 21, 2020: Kernel developers discussed how to patch the issue.
  • June 30, 2020: Requested an update from Docker.
  • July 3 to July 14, 2020: Patches that fix the issue land in the Linux kernel.
  • July 11, 2020: Requested an update from Docker again about the other sysfs leaks, and informed them that the KASLR bypass had been fixed in the Linux 4.19, 5.4, and 5.7 kernels.
  • December 3, 2020: Requested an update from Docker once again and informed them of our intent to disclose the issues publicly. Docker did not reply.

Do you need audits in 2024?

These two vulnerabilities are quite different: The DoS issue relates to parsing and interpreting user input, while the kernel vulnerability is an information leak (strictly speaking, it is an access control vulnerability). These differences affect the detectability of bugs: if you cause a DoS, you’ll likely notice right away because the availability of your service will be compromised. By contrast, if an attacker exploits an access control vulnerability, you probably won’t notice when your service is exploited.

This difference in detectability is important for automated testing. For instance, fuzzing, as showcased in the Trail of Bits Testing Handbook, typically requires the program to crash or hang. Therefore, we mostly find DoS bugs in the memory-safe programs we fuzz. Automatically finding access control bugs through fuzzing is more challenging because it requires the implementation of fuzzing invariants.
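
As a toy illustration of what such an invariant looks like (entirely our own example, unrelated to either disclosure above), the harness below asserts a property so that a silent access control bug becomes a crash that a fuzzer can detect:

from dataclasses import dataclass

@dataclass
class Request:
    is_admin: bool
    admin_only: bool

def access_granted(req: Request) -> bool:
    # Deliberately buggy target: forgets to check is_admin for admin-only resources.
    return True

def fuzz_one(data: bytes) -> None:
    if len(data) < 2:
        return
    req = Request(is_admin=bool(data[0] & 1), admin_only=bool(data[1] & 1))
    granted = access_granted(req)
    # Invariant: non-admins must never be granted admin-only resources.
    # Without this assertion, the bug never crashes and the fuzzer sees nothing.
    assert not (granted and req.admin_only and not req.is_admin)

An input as small as bytes([0, 1]) trips the assertion, so the fuzzer reports a concrete failing case instead of silently missing the logic bug.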

Security audits are still indispensable tools for finding vulnerabilities, just like fuzzing is! Our audits integrate fuzzing whenever possible, and we look for opportunities to enforce invariants to catch nasty logic bugs.

Cryptographic design review of Ockam

5 March 2024 at 14:00

By Marc Ilunga, Jim Miller, Fredrik Dahlgren, and Joop van de Pol

In October 2023, Ockam hired Trail of Bits to review the design of its product, a set of protocols that aims to enable secure communication (i.e., end-to-end encrypted and mutually authenticated channels) across various heterogeneous networks. A secure system starts at the design phase, which lays the foundation for secure implementation and deployment, particularly in cryptography, where a secure design can prevent entire vulnerabilities.

In this blog post, we give some insight into our cryptographic design review of Ockam’s protocols, highlight several positive aspects of the initial design, and describe the recommendations we made to further strengthen the system’s security. For anyone considering working with us to improve their design, this blog post also gives a general behind-the-scenes look at our cryptographic design review offerings, including how we use formal modeling to prove that a protocol satisfies certain security properties.

Here is what Ockam’s CTO, Mrinal Wadhwa, had to say about working with Trail of Bits:

Trail of Bits brought tremendous protocol design expertise, careful scrutiny, and attention to detail to our review. In depth and nuanced discussions with them helped us further bolster our confidence in our design choices, improve our documentation, and ensure that we’ve carefully considered all risks to our customers’ data.

Overview of the Ockam system and Ockam Identities

Ockam is a set of protocols and managed infrastructure enabling secure communication. Users may also deploy Ockam on their premises, removing the need to trust Ockam’s infrastructure completely. Our review was based on two use cases of Ockam:

  1. TCP portals: secure TCP communication spanning various networks and traversing NATs
  2. Kafka portals: secure data streaming through Apache Kafka

A key design feature of Ockam is that secure channels are established using an instantiation of the Noise framework’s XX pattern in a way that is agnostic to the networking layer (i.e., the channels can be established for both TCP and Kafka networking, as well as others).

A major component of an Ockam deployment is the concept of Ockam Identities. Identities uniquely identify a node in an Ockam deployment. Each node has a self-generated identifier and an associated primary key pair that is rotated over time. Each rotation is cryptographically attested to with the current and next primary keys, thereby creating a change history. An identity is therefore defined by an identifier and the associated signed change history. The concrete constructions are shown in figure 1.

Diagram of an Ockam Identity showing an example of a signed change history with three blocks

Figure 1: Ockam Identities

Primary keys are not used directly for authentication or session key establishment in the Noise protocol. Rather, they are used to attest to purpose keys used for secure channel establishment and credential issuance. These credentials play a role akin to certificates in traditional PKI systems to enable mutual trust and enforce attribute-based access control policies.
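
To make the structure above concrete, here is a conceptual sketch (our own illustration using Ed25519 from the cryptography package; it is not Ockam’s wire format or API) of a change history in which each rotation is attested by both the previous and the new primary key:

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def append_change(history: list, prev_key, payload: bytes):
    # Each block carries a freshly generated primary key, a signature by that
    # new key, and, for every block after the first, a signature by the
    # previous primary key authorizing the rotation.
    new_key = Ed25519PrivateKey.generate()
    history.append({
        "payload": payload,
        "public_key": new_key.public_key(),
        "self_signature": new_key.sign(payload),
        "prev_signature": prev_key.sign(payload) if prev_key else None,
    })
    return new_key

history = []
key0 = append_change(history, None, b"initial block: binds the first key to the identifier")
key1 = append_change(history, key0, b"rotation: attests to the next primary key")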

The manual assessment process

We conducted a manual review of the Ockam design specification, including the secure channels, routing and transports, identities, and credentials, focusing on the cryptographic threats that we typically see in similar communication protocols. The manual review identified five issues, mostly related to insufficient documentation of assumptions and expected security guarantees. These findings indicate that missing information in the specifications, such as a threat model, may lead Ockam users to make security-critical decisions based on an incomplete understanding of the protocol.

We also raised a few issues related to discrepancies between the specifications and the implementation that we identified from a cursory review of the implementation. Even though the implementation was not in scope for this review, we often find that it serves as a ground truth in cases when the design documentation is unclear and can be interpreted in different ways.

Formal verification with Verifpal and CryptoVerif

In addition to reviewing the Ockam design manually, we used formal modeling tools to verify specific security properties automatically. Our formal modeling efforts primarily focused on Ockam Identities, a critical element of the Ockam system. To achieve comprehensive automated analysis, we used the protocol analyzers Verifpal and CryptoVerif.

Verifpal works in the symbolic model, whereas CryptoVerif works in the computational model, making them a complementary set of tools. Verifpal finds potential high-level attacks against protocols, enabling quick iterations on a protocol until a secure design is found, while CryptoVerif provides more low-level analysis and can more precisely relate the security of the protocol to the cryptographic security guarantees of the individual primitives used in the implementation.

Using Verifpal’s convenient modeling capabilities and built-in primitives, we modeled a (simplified) scenario for Ockam Identities where Alice proves to Bob that she owns the primary key associated with the peer identifier Bob is currently trying to verify. We also modeled a scenario where Bob verifies a new change initiated by Alice.

Modeling the protocol using Verifpal shows that the design of Ockam Identities achieves the expected security guarantees. For a given identifier, only the primary key holder may produce a valid initial change block that binds the public key to the identifier. Any subsequent changes are guaranteed to be generated by an entity holding the previous and current primary keys. Despite the ease of modeling, proving security guarantees with Verifpal requires a few tricks to prevent the tool from identifying trivial or invalid attacks. We discuss these considerations in our comprehensive report.

The current implementation of Ockam Identities can be instantiated with either of two signature schemes, ECDSA or Ed25519, which have different security properties. CryptoVerif highlighted that ECDSA and Ed25519 will not necessarily provide the same security guarantees, depending on what is expected from the protocol. However, this is not explicitly mentioned in the documentation.

Ed25519 is the preferred scheme, but ECDSA is also accepted because it is currently supported by the majority of cloud hardware security modules (HSMs). For the current design of Ockam Identities, ECDSA and Ed25519 theoretically offer the same guarantees. However, future changes to Ockam Identities may require other security guarantees that are provided only by Ed25519.

Occasionally, protocols require stronger properties than what is usually expected from the signature schemes’ properties (see Seems Legit: Automated Analysis of Subtle Attacks on Protocols that Use Signatures). Therefore, from a design perspective, it is desirable that properties expected from a protocol’s building blocks be well understood and explicitly stated.

Our recommendations for strengthening Ockam

Our review did not uncover any issues in the in-scope use cases that would pose an immediate risk to the confidentiality and integrity of data handled by Ockam. But we made several recommendations to strengthen the security of Ockam’s protocols. Our recommendations aim at enabling defense in depth, future-proofing the protocols, improving threat modeling, expanding documentation, and clearly defining the security guarantees of Ockam’s protocols. For example, one of our recommendations describes important considerations for protecting against “store now, decrypt later” attacks from future quantum computers.

We also worked with the Ockam team to flesh out information missing from the specification, such as documenting the exact meaning of certain primary key fields and creating a formal threat model. This information is important to allow Ockam users to make sound decisions when deploying Ockam’s protocols.

Generally, we recommended that Ockam explicitly document the assumptions made about cryptographic protocols and the expected security guarantees of each component of the Ockam system. Doing so will ensure that future development of the protocols builds upon well-understood and explicit assumptions. Good examples of what should be documented are the theoretical ECDSA-versus-EdDSA distinction that we identified with CryptoVerif and the reasoning for why using primitives with lower security margins will not significantly impact security.

Ockam’s CTO responded to the above recommendations with the following statement:

We believe that easy to understand and open documentation of Ockam’s protocols and implementation is essential to continuously improve the security and privacy offered by our products. Trail of Bits’ thorough third-party review of our protocol documentation and formal modeling of our protocols has helped make our documentation much more approachable for continuous scrutiny and improvement by our open source community.

Lastly, we strongly recommended an (internal or external) assessment of the Ockam protocols implementation, as a secure design does not imply a secure implementation. Issues in the deployment of a protocol may arise from discrepancies between the design and the implementation, or from specific implementation choices that violate the assumptions in the design.

Security is an ongoing process

At the start of the assessment, we observed that the Ockam design follows best practices, such as using robust primitives that are well accepted in the industry (e.g., the Noise XX protocol with AES-GCM and ChaCha20-Poly1305 as AEADs and with Ed25519 and ECDSA for signatures). Furthermore, the design reflects that Ockam considered many aspects of the system’s security and reliability, including, for instance, various relevant threat models and the root of trust for identities. Moreover, by open-sourcing its implementation and publishing the assessment result, the Ockam team creates a transparent environment and invites further scrutiny from the community.

Our review identified some areas for improvement, and we provided recommendations to strengthen the security of the product, which already stands on a good foundation. You can find more detailed information about the assessment, our findings, and our recommendations in the comprehensive report.

This project also demonstrates that security is an ongoing process, and including security considerations early in the design phase establishes a strong footing that the implementation can safely rely on. But it is always necessary to continuously work on improving the system’s security posture while responding adequately to newer threats. Assessing the design and the implementation are two of the most crucial steps in ensuring a system’s security.

Please contact us if you want to work with our cryptography team to help improve your design—we’d love to work with you!

Relishing new Fickling features for securing ML systems

4 March 2024 at 14:00

By Suha S. Hussain

We’ve added new features to Fickling to offer enhanced threat detection and analysis across a broad spectrum of machine learning (ML) workflows. Fickling is a decompiler, static analyzer, and bytecode rewriter for the Python pickle module that can help you detect, analyze, or create malicious pickle files.

While the ML community has seen the rise of safer serialization methods such as the safetensors file format, the security risk posed by the prevalence of pickle is far from resolved. The persistent widespread adoption of pickle in the ML ecosystem allows ML model files to be attack vectors for backdoors, ransomware, reverse shells, and other malicious payloads, making it important that we effectively identify and mitigate this issue.

To that end, we’ve added the following new features:

  1. Modular analysis API: Generate detailed results analyzing pickle files for malicious behaviors, with convenient JSON outputs.
  2. PyTorch module: Statically analyze and inject code into PyTorch files.
  3. Polyglot module: Differentiate, identify, and create polyglots for the different PyTorch file formats.

Pickle overlaying Python code snippet for the fickling tool

ICYMI: To our knowledge, Fickling was the first pickle security tool tailored for ML use cases. Our original blog post detailed why ML pickle files are exploitable and how Fickling specifically addresses this issue. We highlighted that Fickling is safe to run on potentially malicious files because it symbolically executes code using its own implementation of the Pickle Machine (PM). This enables Fickling to be used and deployed by incident response and ML infrastructure engineers to integrate novel ML threat detection and analysis into their pipelines. For instance, Fickling has been used to analyze malicious ML models found in the wild.

Modular analysis API

Malicious pickle files can incorporate obfuscation mechanisms to bypass direct scanning. However, Fickling facilitates a thorough analysis of such files by performing static analysis on the decompiled representations using its modular analysis API.

This API offers a detailed, systematic approach that dissects the analysis into specific categories of malicious behavior so that it’s easy to determine how and why a file was flagged. This makes Fickling an effective tool for inspecting and evaluating model artifacts whether you want to examine a model before using it in a project or investigate artifacts post-compromise.

The analysis is encapsulated in an easy-to-use JSON output format, accessible from both the CLI and Python API. The output details the severity of the file, provides a rationale for its assessment, and pinpoints specific analysis classes that were triggered, along with any relevant artifacts. This unified output format improves the usability of the modular analysis API, making it easy to customize and integrate the detection process across different tools and workflows.

Take, for example, the output from a sample malicious pickle file, generated by Fickling’s Numpy PoC (figure 1):

  • The severity field indicates that Fickling has labeled this file LIKELY_OVERTLY_MALICIOUS.
  • The analysis field explains why: Fickling detected both an unsafe import and an unused variable. The former is a much stronger determinant of severity than the latter. Still, detecting the unused variable provides more insight into how the unsafe import is used, and including such granular elements in the analysis is especially useful for artifacts that are designed to evade detection.
  • The detailed_results field, expanding upon the analysis field, clearly indicates that the UnsafeImports and UnusedVariables analysis classes were triggered by this file and includes the artifact that triggered both classes. This information can help users make informed decisions based on Fickling’s analysis.
{
    "severity": "LIKELY_OVERTLY_MALICIOUS",
    "analysis": "`from posix import system` is suspicious and indicative of an overtly malicious pickle file. Variable `_var0` is assigned value `system(...)` but unused afterward; this is suspicious and indicative of a malicious pickle file",
    "detailed_results": {
        "AnalysisResult": {
            "UnsafeImports": "from posix import system",
            "UnusedVariables": [
                "_var0",
                "system(...)"
            ]
        }
    }
}

Figure 1: The JSON output of Fickling’s analysis of the malicious pickle file from numpy_poc.py

PyTorch module

PyTorch, one of the most popular frameworks for ML, is an integral component of ML workflows. This framework is dependent on pickle, which makes Fickling an excellent choice for carrying the torch. Fickling’s PyTorch module can help you dill with these files. More concretely, this module extends Fickling’s decompilation, static analysis, and injection capabilities to PyTorch files so you can apply the modular analysis API and other features. This broadens Fickling’s capacity to assess the impact of pickles in production systems.

In figure 2, we demonstrate how this PyTorch module can be used. An ML model saved as a PyTorch file is transformed and serialized into a malicious file using Fickling. This example illustrates just one of the many use cases made possible by this module—injections.

import torch
import torchvision.models as models

from fickling.pytorch import PyTorchModelWrapper

# Load example PyTorch model
model = models.mobilenet_v2()
torch.save(model, "mobilenet.pth")

# Wrap model file into fickling
result = PyTorchModelWrapper("mobilenet.pth")

# Inject payload, overwriting the existing file instead of creating a new one
temp_filename = "temp_filename.pt"
result.inject_payload(
    "print('!!!!!!Never trust a pickle!!!!!!')",
    temp_filename,
    injection="insertion",
    overwrite=True,
)

# Load file with injected payload
# This outputs “!!!!!!Never trust a pickle!!!!!!”. 
torch.load("mobilenet.pth")

Figure 2: Fickling injects arbitrary code into a PyTorch model file.

Polyglot module

What are PyTorch files?

Before we dive into the Polyglot module, let’s talk a bit more about PyTorch files. PyTorch files encompass multiple different file formats. It is a common misconception, however, that a PyTorch file refers to only one specific file format. Improper differentiation between formats hampers detection and analysis efforts and aids exploits that use these files. Fickling can differentiate these formats so that they can be effectively analyzed when used in real-world deployments.

Fickling can identify the following file formats:

  1. PyTorch v0.1.1: Tar file with the sys_info, pickle, storages, and tensors directories
  2. PyTorch v0.1.10: Stacked pickle files
  3. TorchScript v1.0: ZIP file with the model.json file
  4. TorchScript v1.1: ZIP file with the model.json and attributes.pkl files (one pickle file)
  5. TorchScript v1.3: ZIP file with the data.pkl and constants.pkl files (two pickle files)
  6. TorchScript v1.4: ZIP file with the data.pkl, constants.pkl, and version files set at 2 or higher (two pickle files)
  7. PyTorch v1.3: ZIP file containing data.pkl (one pickle file)
  8. PyTorch model archive format [ZIP]: ZIP file that includes Python code files and pickle files

This list is subject to change and we’re continually adding more file formats as needed. If you’re interested in exploring the space of ML file formats beyond PyTorch files, check out our comprehensive list of ML file formats.

The PyTorch file formats differ both in structure and in the contexts where they appear. The Polyglot module’s file format identification feature can help you ensure that the correct files are being used in the correct contexts:

  • The torch.load function parses PyTorch v1.3, TorchScript v1.4, PyTorch v0.1.10, and PyTorch v0.1.1 files. The PyTorch v1.3 file format is the most common format of these and is typically deemed the canonical file format.
  • Meanwhile, TorchServe systems rely on the PyTorch model archive format.
  • Deprecated file formats such as TorchScript v1.1 are deliberately included in Fickling because these formats can still be compatible with external parsers and potentially exploitable.

Figure 3 showcases how Fickling can identify different PyTorch file formats. We used torch.save to serialize a PyTorch model as a PyTorch v1.3 file and a PyTorch v0.1.10 file. Fickling can clearly distinguish these two different formats.

> import torch
> import torchvision.models as models
> import fickling.polyglot as polyglot
> model = models.mobilenet_v2()
> torch.save(model, "mobilenet.pth")
> polyglot.identify_pytorch_file_format("mobilenet.pth", print_results=True)
Your file is most likely of this format:  PyTorch v1.3 
> torch.save(model, "legacy_mobilenet.pth", _use_new_zipfile_serialization=False)
> polyglot.identify_pytorch_file_format("legacy_mobilenet.pth",
print_results=True)
Your file is most likely of this format:  PyTorch v0.1.10

Figure 3: Fickling distinguishes between a PyTorch v1.3 file and a PyTorch v0.1.10 file.

Polyglots? In my PyTorch? It’s more likely than you think

Polyglot files are files that can be validly interpreted as more than one file format. They have been used to bypass code-signing checks and distribute malware, among many other unwanted behaviors. You can learn more about polyglot files and other byproducts of unruly parsers in our blog post on PolyFile and PolyTracker. Fickling’s identification of PyTorch file formats is polyglot-aware because you can make polyglots between these files. This raises the question: Why should we care about polyglots for ML model files?

Polyglot ML model files can bypass checks in ML tools and infiltrate model hubs to mislead consumers of that model. Specifically, in the context of ML model files, polyglot files can be a vector for backdoored ML models. You can construct a polyglot file so that it is a benign model when parsed as one file format but a backdoored model when parsed as another file format. During our audit of safetensors, a now resolved finding allowed us to create multiple polyglots with safetensors files (fun fact: the report itself is a PDF/ZIP polyglot with the ZIP file containing the polyglots from the audit).

It’s important to be able to identify this threat, whether you’re analyzing a model artifact for polyglottery after a compromise or building strict, well-defined parsers for MLOps tools that handle model files. Broadly, Fickling’s Polyglot module can help us begin to determine the potential impact of polyglot files on the ML ecosystem.

Fickling also supports the creation of these polyglot files for testing and demonstration. For instance, we can use Fickling to make a file that can be validly interpreted as both a PyTorch v0.1.10 file and a PyTorch model archive (MAR) file.

Since pickle is a streaming format that stops parsing as soon as it reaches the STOP opcode, we can append arbitrary data to a pickle file without disrupting valid parsing. In a similar vein, many ZIP parsers don’t enforce the specified magic to start at offset 0, which allows us to prepend data to a ZIP file while preserving valid parsing. These two capabilities, when combined, allow us to construct a file that is both a valid pickle file and a valid ZIP file—a pickle/ZIP polyglot!
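
To see how little machinery this requires, here is a minimal standalone sketch (independent of Fickling; the file names and contents are hypothetical) that builds a pickle/ZIP polyglot by plain concatenation:

import pickle
import zipfile

# Write a small pickle payload and a small ZIP archive.
with open("payload.pkl", "wb") as f:
    pickle.dump({"hello": "world"}, f)
with zipfile.ZipFile("archive.zip", "w") as zf:
    zf.writestr("note.txt", "ZIP side of the polyglot")

# Concatenate them: pickle parsing stops at the STOP opcode, and Python's
# zipfile (like many ZIP parsers) tolerates prepended data.
with open("payload.pkl", "rb") as p, open("archive.zip", "rb") as z:
    blob = p.read() + z.read()
with open("polyglot.bin", "wb") as out:
    out.write(blob)

# Both interpretations still succeed.
print(pickle.loads(blob))                  # {'hello': 'world'}
with zipfile.ZipFile("polyglot.bin") as zf:
    print(zf.read("note.txt"))             # b'ZIP side of the polyglot'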

Recall that a PyTorch v0.1.10 file is composed of stacked pickles. The PyTorch MAR parser is one of many ZIP parsers that accepts files with prepended data. This means that we can build on the pickle/ZIP polyglot to make a PyTorch v0.1.10 / PyTorch MAR polyglot by appending the MAR file to the PyTorch v0.1.10 file. This process is captured in Fickling, as shown in this example:

> import fickling.polyglot as polyglot 
> polyglot.create_polyglot("mar_example.mar","legacy_example.pt") 
Making a PyTorch v0.1.10/PyTorch MAR polyglot
The polyglot is contained in polyglot.mar.pt

Figure 4: Fickling creates a PyTorch v0.1.10 / PyTorch MAR polyglot.

The resulting file can be accurately identified using Fickling, as shown below:

> import fickling.polyglot as polyglot 
> polyglot.identify_pytorch_file_format('polyglot.mar.pt',print_results=True)
Your file is most likely of this format: PyTorch v0.1.10
It is also possible that your file can be validly interpreted as: ['PyTorch model archive format']

Figure 5: Fickling identifies a PyTorch v0.1.10 / PyTorch MAR polyglot.

Contribute to Fickling

We are actively maintaining and adding new capabilities to Fickling, including new injection methods, analysis classes, and polyglot combinations. We want Fickling to be a usable tool for both offensive and defensive security, so we invite you to share your feedback by raising an issue on our GitHub or reaching out directly on our Contact us page.

Beyond Fickling

While Fickling can help you identify threats to ML systems caused by malicious pickle files, we recommend moving away from pickle entirely. Restricted unpicklers may seem useful, but they are not a foolproof solution. To help the ecosystem move forward from pickles, we’ve audited a safer alternative, safetensors; reported pickle vulnerabilities in open-source codebases; and written Semgrep rules to catch instances of pickling under the hood in ML libraries.

We’re dedicated to improving the overall security and integrity of the ML ecosystem. Keep an eye out for upcoming blog posts on securing ML systems.

How we applied advanced fuzzing techniques to cURL

1 March 2024 at 14:30

By Shaun Mirani

Near the end of 2022, Trail of Bits was hired by the Open Source Technology Improvement Fund (OSTIF) to perform a security assessment of the cURL file transfer command-line utility and its library, libcurl. The scope of our engagement included a code review, a threat model, and the subject of this blog post: an engineering effort to analyze and improve cURL’s fuzzing code.

We’ll discuss several elements of this process, including how we identified important areas of the codebase that lacked coverage and then modified the fuzzing code to hit those missed areas. For example, by setting certain libcurl options during fuzzer initialization and introducing new seed files, we doubled the line coverage of the HTTP Strict Transport Security (HSTS) handling code and quintupled it for the Alt-Svc header. We also expanded the set of fuzzed protocols to include WebSocket and enabled the fuzzing of many new libcurl options. We’ll conclude this post by explaining some more sophisticated fuzzing techniques the cURL team could adopt to increase coverage even further, bring fuzzing to the cURL command line, and reduce inefficiencies intrinsic to the current test case format.

How is cURL fuzzed?

OSS-Fuzz, a free service provided by Google for open-source projects, serves as the continuous fuzzing infrastructure for cURL. It supports C/C++, Rust, Go, Python, and Java codebases, and uses the coverage-guided libFuzzer, AFL++, and Honggfuzz fuzzing engines. OSS-Fuzz adopted cURL on July 1, 2017, and the incorporated code lives in the curl-fuzzer repository on GitHub, which was our focus for this part of the engagement.

The repository contains the code (setup scripts, test case generators, harnesses, etc.) and corpora (the sets of initial test cases) needed to fuzz cURL and libcurl. It’s designed to fuzz individual targets, which are protocols supported by libcurl, such as HTTP(S), WebSocket, and FTP. curl-fuzzer downloads the latest copy of cURL and its dependencies, compiles them, and builds binaries for these targets against them.

Each target takes a specially structured input file, processes it using the appropriate calls to libcurl, and exits. Associated with each target is a corpus directory that contains interesting seed files for the protocol to be fuzzed. These files are structured using a custom type-length-value (TLV) format that encodes not only the raw protocol data, but also specific fields and metadata for the protocol. For example, the fuzzer for the HTTP protocol includes options for the version of the protocol, custom headers, and whether libcurl should follow redirects.
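
As a rough sketch of how such a seed might be assembled (the record type names, numbers, and byte order below are placeholders rather than curl-fuzzer’s actual definitions; the 2-byte type and 4-byte length fields match the layout described later in this post):

import struct

# Hypothetical record types; curl-fuzzer defines its own set.
TLV_TYPE_URL = 1
TLV_TYPE_RESPONSE = 2

def tlv(record_type: int, data: bytes) -> bytes:
    # A 2-byte type and a 4-byte length, followed by the raw data.
    return struct.pack(">HI", record_type, len(data)) + data

# A seed encoding a URL to fetch and a canned server response.
seed = (
    tlv(TLV_TYPE_URL, b"http://example.com/")
    + tlv(TLV_TYPE_RESPONSE, b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n")
)

with open("http_seed", "wb") as f:
    f.write(seed)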

First impressions: HSTS and Alt-Svc

We’d been tasked with analyzing and improving the fuzzer’s coverage of libcurl, the library providing curl’s internals. The obvious first question that came to mind was: what does the current coverage look like? To answer this, we wanted to peek at the latest coverage data given in the reports periodically generated by OSS-Fuzz. After some poking around at the URL for the publicly accessible oss-fuzz-coverage Google Cloud Storage bucket, we were able to find the coverage reports for cURL (for future reference, you can get there through the OSS-Fuzz introspector page). Here’s a report from September 28, 2022, at the start of our engagement.

Reading the report, we quickly noticed that several source files were receiving almost no coverage, including some files that implemented security features or were responsible for handling untrusted data. For instance, hsts.c, which provides functions for parsing and handling the Strict-Transport-Security response header, had only 4.46% line coverage, 18.75% function coverage, and 2.56% region coverage after over five years on OSS-Fuzz:

The file responsible for processing the Alt-Svc response header, altsvc.c, was similarly coverage-deficient:

An investigation of the fuzzing code revealed why these numbers were so low. The first problem was that the corpora directory was missing test cases that included the Strict-Transport-Security and Alt-Svc headers, which meant there was no way for the fuzzer to quickly jump into testing these regions of the codebase for bugs; it would have to use coverage feedback to construct these test cases by itself, which is usually a slow(er) process.

The second issue was that the fuzzer never set the CURLOPT_HSTS option, which instructs libcurl to use an HSTS cache file. As a result, HSTS was never enabled during runs of the fuzzer, and most code paths in hsts.c were never hit.

The final impediment to achieving good coverage of HSTS stemmed from its specification, which tells user agents to ignore the Strict-Transport-Security header when it is sent over unencrypted HTTP. However, this creates a problem in the context of fuzzing: from the perspective of our fuzzing target, which never stood up an actual TLS connection, every connection was unencrypted, and Strict-Transport-Security was always ignored. For Alt-Svc, libcurl already included a workaround to relax the HTTPS requirement for debug builds when a certain environment variable was set (although curl-fuzzer did not set this variable). So, resolving this issue was just a matter of adding a similar feature for HSTS to libcurl and ensuring that curl-fuzzer set all necessary environment variables.

Our changes to address these issues were as follows:

  1. We added seed files for Strict-Transport-Security and Alt-Svc to curl-fuzzer (ee7fad2).
  2. We enabled CURLOPT_HSTS in curl-fuzzer (0dc42e4).
  3. We added a check to allow debug builds of libcurl to bypass the HTTPS restriction for HSTS when the CURL_HSTS_HTTP environment variable is set, and we set the CURL_HSTS_HTTP and CURL_ALTSVC_HTTP environment variables in curl-fuzzer (6efb6b1 and 937597c).

The day after our changes were merged upstream, OSS-Fuzz reported a significant bump in coverage for both files:

A little over a year of fuzzing later (on January 29, 2024), our three fixes had doubled the line coverage for hsts.c and nearly quintupled it for altsvc.c:

Sowing the seeds of bugs

Exploring curl-fuzzer further, we saw a number of other opportunities to boost coverage. One low-hanging fruit we spotted was the set of seed files found in the corpora directory. While libcurl supports numerous protocols (some of which surprised us!) and features, not all of them were represented as seed files in the corpora. This is important: as we alluded to earlier, a comprehensive set of initial test cases, touching on as much major functionality as possible, acts as a shortcut to attaining coverage and significantly cuts down on the time spent fuzzing before bugs are found.

The functionality we created new seed files for, with the hope of promoting new coverage, included (ee7fad2):

  • CURLOPT_LOGIN_OPTIONS: Sets protocol-specific login options for IMAP, LDAP, POP3, and SMTP
  • CURLOPT_XOAUTH2_BEARER: Specifies an OAuth 2.0 Bearer Access Token to use with HTTP, IMAP, LDAP, POP3, and SMTP servers
  • CURLOPT_USERPWD: Specifies a username and password to use for authentication
  • CURLOPT_USERAGENT: Specifies the value of the User-Agent header
  • CURLOPT_SSH_HOST_PUBLIC_KEY_SHA256: Sets the expected SHA256 hash of the remote server for an SSH connection
  • CURLOPT_HTTPPOST: Sets POST request data. curl-fuzzer had been using only the CURLOPT_MIMEPOST option to achieve this, while the similar but deprecated CURLOPT_HTTPPOST option wasn’t exercised. We also added support for this older method.

Certain other CURLOPTs, as with CURLOPT_HSTS in the previous section, made more sense to set globally in the fuzzer’s initialization function. These included:

  • CURLOPT_COOKIEFILE: Points to a filename to read cookies from. It also enables fuzzing of the cookie engine, which parses cookies from responses and includes them in future requests.
  • CURLOPT_COOKIEJAR: Allows fuzzing the code responsible for saving in-memory cookies to a file
  • CURLOPT_CRLFILE: Specifies the certificate revocation list file to read for TLS connections

Where to go from here

As we started to understand more about curl-fuzzer’s internals, we drew up several strategic recommendations to improve the fuzzer’s efficacy that the timeline of our engagement didn’t allow us to implement ourselves. We presented these recommendations to the cURL team in our final report, and expand on a few of them below.

Dictionaries

Dictionaries are a feature of libFuzzer that can be especially useful for the text-based protocols spoken by libcurl. The dictionary for a protocol is a file enumerating the strings that are interesting in the context of the protocol, such as keywords, delimiters, and escape characters. Providing a dictionary to libFuzzer may increase its search speed and lead to the faster discovery of new bugs.

curl-fuzzer already takes advantage of this feature for the HTTP target, but currently supplies no dictionaries for the numerous other protocols supported by libcurl. We recommend that the cURL team create dictionaries for these protocols to boost the fuzzer’s speed. This may be a good use case for an LLM; ChatGPT can generate a starting point dictionary in response to the following prompt (replace <PROTOCOL> with the name of the target protocol):

A dictionary can be used to guide the fuzzer. A dictionary is passed as a file to the fuzzer. The simplest input accepted by libFuzzer is an ASCII text file where each line consists of a quoted string. Strings can contain escaped byte sequences like "\xF7\xF8". Optionally, a key-value pair can be used like hex_value="\xF7\xF8" for documentation purposes. Comments are supported by starting a line with #. Write me an example dictionary file for a <PROTOCOL> parser.

argv fuzzing

During our first engagement with curl, one of us joked, “Have we tried curl AAAAAAAAAA… yet?” There turned out to be a lot of wisdom behind this quip; it spurred us to fuzz curl’s command-line interface (CLI), which yielded multiple vulnerabilities (see our blog post, cURL audit: How a joke led to significant findings).

This CLI fuzzing was performed using AFL++’s argv-fuzz-inl.h header file. The header defines macros that allow a target program to build the argv array containing command-line arguments from fuzzer-provided data on standard input. We recommend that the cURL team use this feature from AFL++ to continuously fuzz cURL’s CLI (implementation details can be found in the blog post linked above).

Structure-aware fuzzing

One of curl-fuzzer’s weaknesses is intrinsic to the way it currently structures its inputs: a custom type-length-value (TLV) format. A TLV scheme (or something similar) can be useful for fuzzing a project like libcurl, which supports a wealth of global and protocol-specific options and parameters that need to be encoded in test cases.

However, the brittleness of this binary format makes the fuzzer inefficient. This is because libFuzzer has no idea about the structure that inputs are supposed to adhere to. curl-fuzzer expects input data in a strict format: a 2-byte field for the record type (of which only 52 were valid at the time of our engagement), a 4-byte field for the length of the data, and finally the data itself. Because libFuzzer doesn’t take this format into account, most of the mutations it generates wind up being invalid at the TLV-unpacking stage and have to be thrown out. Google’s fuzzing guidance warns about using TLV inputs for this reason.

As a result, the coverage feedback used to guide mutations toward interesting code paths performs much worse than it would if we dealt only with raw data. In fact, libcurl may contain bugs that will never be found with the current naive TLV strategy.

So, how can the cURL team address this issue while keeping the flexibility of a TLV format? Enter structure-aware fuzzing.

The idea with structure-aware fuzzing is to assist libFuzzer by writing a custom mutator. At a high level, the custom mutator’s job comprises just three steps:

  1. Try to unpack the input data coming from libFuzzer as a TLV.
  2. If the data can’t be parsed into a valid TLV, instead of throwing it away, return a syntactically correct dummy TLV. This can be anything, as long as it can be successfully unpacked.
  3. If the data does constitute a valid TLV, mutate the fields parsed out in step 1 by calling the LLVMFuzzerMutate function. Then, serialize the mutated fields and return the resultant TLV.

With this approach, no time is wasted discarding inputs because every input is valid; the mutator only ever creates correctly structured TLVs. Performing mutations at the level of the decoded data (rather than at the level of the encoding scheme) allows better coverage feedback, which leads to a faster and more effective fuzzer.
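
Here is a rough sketch of those three steps, written in Python for readability (curl-fuzzer’s actual mutator would be C++ registered through libFuzzer’s LLVMFuzzerCustomMutator hook, and the record layout below is simplified):

import random
import struct

HEADER = struct.Struct(">HI")  # 2-byte type, 4-byte length (layout simplified)
DUMMY = HEADER.pack(1, 0)      # a syntactically valid, empty record

def unpack_tlv(data: bytes):
    # Step 1: try to parse the input into (type, value) records.
    records, offset = [], 0
    while offset < len(data):
        if len(data) - offset < HEADER.size:
            return None
        rtype, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        if len(data) - offset < length:
            return None
        records.append((rtype, data[offset:offset + length]))
        offset += length
    return records

def mutate_bytes(value: bytes) -> bytes:
    # Stand-in for the fuzzing engine's own mutator (e.g., LLVMFuzzerMutate).
    buf = bytearray(value or b"\x00")
    buf[random.randrange(len(buf))] = random.randrange(256)
    return bytes(buf)

def custom_mutator(data: bytes) -> bytes:
    records = unpack_tlv(data)
    if records is None:
        # Step 2: invalid input; return a well-formed dummy instead of discarding it.
        return DUMMY
    # Step 3: mutate the decoded fields, then re-serialize them.
    mutated = [(rtype, mutate_bytes(value)) for rtype, value in records]
    return b"".join(HEADER.pack(rtype, len(value)) + value for rtype, value in mutated)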

An open issue on curl-fuzzer proposes several changes, including an implementation of structure-aware fuzzing, but there hasn’t been any movement on it since 2019. We strongly recommend that the cURL team revisit the subject, as it has the potential to significantly improve the fuzzer’s ability to find bugs.

Our 2023 follow-up

At the end of 2023, we had the chance to revisit cURL and its fuzzing code in another audit supported by OSTIF. Stay tuned for the highlights of our follow-up work in a future blog post.

When try, try, try again leads to out-of-order execution bugs

1 March 2024 at 12:00

By Troy Sargent

Have you ever wondered how a rollup and its base chain—the chain that the rollup commits state checkpoints to—communicate and interact? How can a user with funds only on the base chain interact with contracts on the rollup?

In Arbitrum Nitro, one way to call a method on a contract deployed on the rollup from the base chain is by using retryable transactions (a.k.a. retryable tickets). While this feature enables these interactions, it does not come without its pitfalls. During our reviews of Arbitrum and contracts integrating with it, we identified footguns in the use of retryable tickets that are not widely known and should be considered when creating such transactions. In this post, we’ll share how using retryable tickets may allow unexpected race conditions and result in out-of-order execution bugs. What’s more, we’ve created a new Slither detector for this issue. Now you’ll be able to not only recognize these footguns in your code, but test for them too.

Retryable tickets

In Arbitrum Nitro, retryable tickets facilitate communication between the Ethereum mainnet, or Layer 1 (L1), and the Arbitrum Nitro rollup, or Layer 2 (L2). To create retryable tickets, users can call createRetryableTicket on the L1 Inbox contract of the Arbitrum rollup, as shown in the code snippet below. When retryable tickets are created and queued, ArbOS will attempt to automatically “redeem” them by executing them one after another on L2.

/**
 * @notice Put a message in the L2 inbox that can be reexecuted for some fixed amount of time if it reverts
 * @dev all msg.value will deposited to callValueRefundAddress on L2
 * @dev Gas limit and maxFeePerGas should not be set to 1 as that is used to trigger the RetryableData error
 * @param to destination L2 contract address
 * @param l2CallValue call value for retryable L2 message
 * @param maxSubmissionCost Max gas deducted from user's L2 balance to cover base submission fee
 * @param excessFeeRefundAddress gasLimit x maxFeePerGas - execution cost gets credited here on L2 balance
 * @param callValueRefundAddress l2Callvalue gets credited here on L2 if retryable txn times out or gets cancelled
 * @param gasLimit Max gas deducted from user's L2 balance to cover L2 execution. Should not be set to 1 (magic value used to trigger the RetryableData error)
 * @param maxFeePerGas price bid for L2 execution. Should not be set to 1 (magic value used to trigger the RetryableData error)
 * @param data ABI encoded data of L2 message
 * @return unique message number of the retryable transaction
 */
function createRetryableTicket(
    address to,
    uint256 l2CallValue,
    uint256 maxSubmissionCost,
    address excessFeeRefundAddress,
    address callValueRefundAddress,
    uint256 gasLimit,
    uint256 maxFeePerGas,
    bytes calldata data
) external payable returns (uint256);

The createRetryableTicket function interface

Assuming the gas costs are covered by the sender and no failures occur, the transactions will be executed sequentially, and the final state results from applying transaction B immediately following transaction A.

Figure 1: The happy path is when the transactions are all executed in order.

Wait, what does “retryable” mean?

Because any transaction may fail (e.g., the L2 gas price rises significantly following the creation of a transaction, and the user has insufficient gas to cover the new cost), Arbitrum created these types of transactions so that users can “retry” them by supplying additional gas. Failing retryable tickets will be persisted in memory and may be re-executed by any user who manually calls the redeem method of the ArbRetryableTx precompiled contract, sponsoring the gas costs. A retryable ticket that fails is different from a normal transaction that reverts, in that it does not require a new transaction to be signed to be executed again.

Additionally, retryable tickets in memory can be redeemed up to one week after they are created. A retryable ticket’s lifetime can be extended for another week by paying an additional fee for storing it; otherwise, it will be discarded after its expiration date.

Where things go wrong

While these types of transactions are useful—in that they facilitate L2-to-L1 communication and allow users to retry their transactions if failures occur—they come with pitfalls, risks that users and developers may not be aware of. Specifically, retryable tickets are expected to execute in the order they are submitted, but this is not always guaranteed to happen.

Consider the three scenarios below in which two retryable tickets are created within the same transaction.

In scenario 1, both transactions A and B fail and enter the memory region. The state of the application is left unchanged.

Figure 2: Two retryable tickets are created in the same transaction, but both fail and enter the memory region.

However, anyone can manually redeem transaction B before transaction A, which means that the transactions will be executed out of order unexpectedly.

Figure 3: Anyone can manually redeem transactions in the memory region out of order.

In scenario 2, transaction A fails and enters the memory region, but transaction B succeeds. Once again, the transactions are executed out of order (i.e., transaction A is not executed at all), and the final state is not what was expected.

Figure 4: Only transaction B is included in the final state.

In scenario 3, transaction A succeeds, but transaction B does not. That means transaction B must be re-executed manually. Transactions can be created more than once, which means that a second set of transactions A and B could be submitted before the first transaction B is re-executed. If developers of a protocol using the Arbitrum rollup system don’t account for the possibility that the protocol could receive a second transaction A prior to transaction B’s success, the protocol may not handle this case correctly.

Figure 5: Only transaction A is included in the final state.

The out-of-order execution vulnerability

In light of these scenarios, developers should consider that transactions may execute out of order. For instance, if the second transaction in a queue relies on the completion of the first but executes ahead of it (e.g., because the first failed due to insufficient gas), it may revert or behave incorrectly. It’s important that the callee, or message recipient, on the rollup can robustly handle situations such as the receipt of transactions in a different order than they were created and smaller subsets of transactions due to failures. If a protocol does not anticipate reorderings and failures of retryable tickets, the protocol could break or be hacked.

Let’s consider the following L2 contract, which users can call to claim rewards based on some staked tokens. When they decide to unstake their tokens, any rewards that they haven’t yet claimed are lost:

function claim_rewards(address user) public onlyFromL1 {
    // rewards is computed based on balance and staking period
    uint unclaimed_rewards = _compute_and_update_rewards(user);
    token.safeTransfer(user, unclaimed_rewards);
}


// Call claim_rewards before unstaking, otherwise you lose your rewards
function unstake(address user) public onlyFromL1 {
    _free_rewards(user); // clean up rewards related variables
    uint256 user_balance = balance[user];
    balance[user] = 0;
    staked_token.safeTransfer(user, user_balance);
}

Users can submit retryable tickets for such operations with the following logic in the L1 handler:

// Retryable A
IInbox(inbox).createRetryableTicket({
    to: l2contract,
    l2CallValue: 0,
    maxSubmissionCost: maxSubmissionCost,
    excessFeeRefundAddress: msg.sender,
    callValueRefundAddress: msg.sender,
    gasLimit: gasLimit,
    maxFeePerGas: maxFeePerGas,
    data: abi.encodeCall(l2contract.claim_rewards, (msg.sender))
});
// Retryable B
IInbox(inbox).createRetryableTicket({
    to: l2contract,
    l2CallValue: 0,
    maxSubmissionCost: maxSubmissionCost,
    excessFeeRefundAddress: msg.sender,
    callValueRefundAddress: msg.sender,
    gasLimit: gasLimit,
    maxFeePerGas: maxFeePerGas,
    data: abi.encodeCall(l2contract.unstake, (msg.sender))
});

Here it is expected that claim_rewards will be called before unstake. However, as we’ve seen, the claim_rewards transaction is not guaranteed to execute before the unstake transaction. As covered in scenario 1 and shown in figure 3, an attacker can make it so that unstake is executed before claim_rewards if both transactions fail, causing the user to lose their rewards. It’s also possible that only the second transaction, unstake, succeeds, as shown in scenario 2.

To mitigate such risks, it’s essential to design protocols so that retryable tickets are order-independent: the success of each ticket should not depend on the order or outcome of the others. How independent ordering is implemented depends on the protocol and the given operations. In this example, claim_rewards could be called within unstake.

Slither to the rescue

As security researchers, we always look for ways to automatically detect these sorts of issues and flag them early in the development cycle, such as during code review. To that end, we’ve written a Slither detector that flags functions that create multiple retryable tickets via the Arbitrum Nitro Inbox contract, alerting developers to this pitfall. Following its release, you can use this detector by installing Slither and running the following command in the root of a Solidity project: python3 -m pip install slither-analyzer==0.10.1 && slither . --detect out-of-order-retryable. On our example contract, Slither provides the following diagnostic:

Multiple retryable tickets created in the same function:
         -IInbox(inbox).createRetryableTicket({to:address(l2contract),l2CallValue:0,maxSubmissionCost:maxSubmissionCost,excessFeeRefundAddress:msg.sender,callValueRefundAddress:msg.sender,gasLimit:gasLimit,maxFeePerGas:maxFeePerGas,data:abi.encodeCall(l2contract.claim_rewards,(msg.sender))}) (out_of_order_retryable.sol#25-34)
         -IInbox(inbox).createRetryableTicket({to:address(l2contract),l2CallValue:0,maxSubmissionCost:maxSubmissionCost,excessFeeRefundAddress:msg.sender,callValueRefundAddress:msg.sender,gasLimit:gasLimit,maxFeePerGas:maxFeePerGas,data:abi.encodeCall(l2contract.unstake,(msg.sender))}) (out_of_order_retryable.sol#36-45)
Reference: https://github.com/crytic/slither/wiki/Detector-Documentation#out-of-order-retryable-transactions
INFO:Slither:out_of_order_retryable.sol analyzed (3 contracts with 1 detectors), 1 result(s) found

Conclusion

If you are developing a protocol that uses retryable tickets, ensure that your protocol is equipped to handle the scenarios we’ve outlined here. Specifically, the use of retryable tickets shouldn’t rely on their order or on successful execution. You can spot potential out-of-order execution bugs using our new Slither detector!

If your application interacts with Arbitrum Nitro components or you’re building software that features rollup–base chain communication, contact us to see how we help.

Our response to the US Army’s RFI on developing AIBOM tools

28 February 2024 at 16:30

By Michael Brown and Adelin Travers

The US Army’s Program Executive Office for Intelligence, Electronic Warfare and Sensors (PEO IEW&S) recently issued a request for information (RFI) on methods to implement and automate production of an artificial intelligence bill of materials (AIBOM) as part of Project Linchpin. The RFI describes the AIBOM as a detailed list of the components necessary to build, train, validate, and configure AI models and their supply chain relationships. As with the software bill of materials (SBOM) concept, the goal of the AIBOM concept is to allow providers and consumers of AI models to effectively address supply chain vulnerabilities. In this blog post, we summarize our response, which includes our recommendations for improving the concept, ensuring AI model security, and effectively implementing an AIBOM tool.

Background details and initial impressions

While the US Army is leading research efforts into adopting this technology, our responses to this RFI could be useful to any organization that is using AI/ML models and is looking to assess the security of these models, their components and architecture, and their supply chains.

Project Linchpin is a PEO IEW&S initiative to create an operational pipeline for developing and deploying AI/ML capabilities to intelligence, cyber, and electronic warfare systems. The proposed AIBOM concept for Project Linchpin will detail the components and supply chain relationships involved in creating AI/ML models and will be used to assess such models for vulnerabilities. As currently proposed, the US Army’s AIBOM concept includes the following:

  1. An SBOM detailing the components used to build and validate the given AI model
  2. A component for detailing the model’s properties, architecture, training data, hyperparameters, and intended use
  3. A component for detailing the lineage and pedigree of the data used to create the model

AIBOMs are a natural extension of bill of materials (BOM) concepts used to document and audit the software and hardware components that make up complex systems. As AI/ML models become more widespread, developing effective AIBOM tools presents an opportunity to proactively ensure the security and performance of such systems before they become pervasive. However, we argue that the currently proposed AIBOM concept has drawbacks that need to be addressed to ensure that the unique aspects of AI/ML systems are taken into account when implementing AIBOM tools.

Pros and cons of the AIBOM concept

An AIBOM would be ideal for enumerating an AI/ML model’s components that SBOM tools would miss, such as raw data sets, interfaces to ML frameworks and traditional software, and AI/ML model types, hyperparameters, algorithms, and loss functions. However, the proposed AIBOM concept has some significant shortcomings.

First, it would not be able to provide a complete security audit of the given AI model because certain aspects of model training and usage cannot be captured statically; the AIBOM tool would have to be complemented by other security auditing approaches. (We cover these approaches in more detail in the next section.) Second, the proposed concept does not account for AI/ML-specific hardware components via a hardware bill of materials (HBOM). Like the rest of the ML supply chain, specialized hardware components commonly used in deployed AI/ML systems, such as GPUs, may have unique vulnerabilities (e.g., data leakage) and should thus be captured by the AIBOM.

Additionally, the AIBOM tool would miss important AI/ML-specific downstream system dependencies and supply chain paradigms like machine-learning-as-a-service prediction APIs (common with LLMs). For instance, an AI model provider may be subject to attack vectors that would be difficult or impossible to detect, such as poisoning of web-scale training datasets and “sleeper agents” within LLMs.

Ensuring AI/ML model security

Many aspects of AI/ML model training and use cannot be captured statically and thus would limit the proposed AIBOM concept’s ability to provide a complete security audit. For example, it would not capture whether attackers had control over the order in which a model ingests training data, a potential data poisoning attack vector. To ensure that the given AI model has strong supply chain security, the AIBOM concept should be complemented by other security techniques, such as data cleaning/normalization tools, anomaly detection and integrity checks, and verification of training and inference environment configurations.

Additionally, we recommend extending the AIBOM concept to account for data and model transformation components in AI/ML models. For example, the AIBOM concept should be able to obtain detailed information about the data labels and labeling procedure in use, data transformation procedures in the model pipeline, model construction process, and infrastructure security configuration for the data pipeline. Capturing such items could help detect and address vulnerabilities in the AI/ML model supply chain.

Implementing the AIBOM concept

There are several barriers to building effective and automated tools to conduct AIBOM-based security audits today. First, a robust database of weaknesses and vulnerabilities specific to AI/ML models and their data pipelines (e.g., model hyperparameters, data transformation procedures) is sorely needed. Proposed databases do not provide a strong definition of what an AI/ML vulnerability is and thus do not provide the ground truth needed for security auditing. This database should define a unique abstraction for AI/ML weaknesses and enforce a machine-readable format so that the abstraction can be used as a data source for AIBOM security auditing.

AIBOM tools must be used during the data collection/transformation and model configuration/creation stages of the AI/ML operations pipeline. In many cases (e.g., tools built on ChatGPT), these stages may be controlled by a third party. We advocate for third-party AI/ML-as-a-service providers to adopt transparent, open-source principles for their models to help ensure the safety and security of tools built using their platforms.

Finally, further research and development is needed to create tools for automatically tracing data lineage and provenance. Security and safety concerns with advanced AI/ML models have started to highlight the need for such capabilities, but practical tools are still years away.

Once these key research problems are solved, we anticipate that implementing AIBOM tools and auditing programs will require similar effort to implementing SBOM tools and programs. There will, however, be several key differences that will require specialized knowledge and skills. Today’s developers, security engineers, and IT teams will need to upskill in technical domains such as data science, data management, and AI/ML-specific frameworks and hardware.

Final thoughts

We’re excited to continue discussing and developing techniques and automated tools that support high-fidelity AIBOM-based security auditing. We plan to continue engaging with the community and invite you to read our full response for more details.

Circomspect has been integrated into the Sindri CLI

26 February 2024 at 14:00

By Jim Miller

Our tool Circomspect is now integrated into the Sindri command-line interface (CLI)! We designed Circomspect to help developers build Circom circuits more securely, particularly given the limited tooling support available for this novel programming framework. Integrating this tool into a development environment like that provided by Sindri is a significant step toward more widespread use of Circomspect and thus better support for developers writing Circom circuits.

Developing zero-knowledge proof circuits is a difficult task. Even putting aside technical complexities, running non-trivial circuits for platforms like Circom is extremely computationally intensive: running basic tests can take several minutes (or longer), which could massively increase development time. Sindri aims to help alleviate this problem by giving users access to dedicated hardware that significantly accelerates the execution of these circuits. Their simple API and CLI tool allow developers to integrate their circuits with this dedicated hardware without having to manage any of their own infrastructure.

Stasia Carson, the CEO of Sindri Labs, had this to say about the announcement:

Our ongoing focus with the Sindri CLI is to make it more generally and widely useful for circuit developers independent of whether or not they use the Sindri service. The key to this is a unified cross-framework interface over tools for static analysis, linting, compiling, and proving coupled with installation-free tool distribution using optimized Docker containers. Circomspect is a crucial tool for developing secure Circom circuits, and honestly probably the best such tool across all of the frameworks, so we see it as one of the most vital integrations.

Being integrated into the Sindri CLI is an important step for Circomspect. With now even more users, we plan to extend Circomspect with more analysis ideas, which we will reveal throughout the year. Stay tuned to our blog for future updates about Circomspect and zero-knowledge circuit development generally!

Continuously fuzzing Python C extensions

23 February 2024 at 14:30

By Matt Schwager

Deserializing, decoding, and processing untrusted input are telltale signs that your project would benefit from fuzzing. Yes, even Python projects. Fuzzing helps reduce bugs in high-assurance software developed in all programming languages. Fortunately for the Python ecosystem, Google has released Atheris, a coverage-guided fuzzer for both pure Python code and Python C extensions. When it comes to Python projects, Atheris is really the only game in town if you’re looking for a mature fuzzer. Fuzzing pure Python code typically uncovers unexpected exceptions, which can ultimately lead to denial of service. Fuzzing Python C extensions may uncover memory errors, data races, undefined behavior, and other classes of bugs. Side effects include: memory corruption, remote code execution, and, more generally, all the headaches we’ve come to know and love about C. This post will focus on fuzzing Python C extensions.

We’ll walk you through using Atheris to fuzz Python C extensions, adding a Python project to OSS-Fuzz, and setting up continuous fuzzing through OSS-Fuzz’s integrated CIFuzz tool. OSS-Fuzz is Google’s continuous fuzzing service for open-source projects, making it a valuable tool for open-source developers; as of August 2023, it has helped find and fix over 10,000 vulnerabilities and 36,000 bugs. We will target the cbor2 Python library in our fuzzing campaign. This library is the perfect target because it performs serialization and deserialization of a JSON-like, binary format and has an optional C extension implementation for improved performance. Additionally, Concise Binary Object Representation (CBOR) is used heavily within the blockchain community, which tends to have high assurance and security requirements.

In the end, we found multiple memory corruption bugs in cbor2 that could become security vulnerabilities under the right circumstances.

Fuzzing Python C extensions

Under the hood, Atheris uses libFuzzer to perform its fuzzing. Since libFuzzer is built on top of LLVM and Clang, we will need a Clang installation to fuzz our target. To simplify the installation process, I wrote a Dockerfile to package up all the necessary components into a single Docker image. This creates a repeatable process for fuzzing the current target and an easily extensible artifact for fuzzing future targets. The resulting Docker image includes a Python fuzzing harness to initiate the fuzzing process.

First, we’ll discuss some interesting parts of this Dockerfile, then we’ll investigate the fuzz.py fuzzing harness, and finally we’ll build and run the Docker image and find some memory corruption bugs!

Fuzzing environment

Dockerfiles are a great way to create a self-documenting, reproducible environment. Since fuzzing can often be more art than science, this section will also include some discussion on interesting and non-obvious bits in the Dockerfile. The following Dockerfile was used to fuzz cbor2:

FROM debian:12-slim

RUN apt update && apt install -y \
    git \
    python3-full \
    python3-pip \
    wget \
    xz-utils \
    && rm -rf /var/lib/apt/lists/*

RUN python3 --version

ENV APP_DIR "/app"
ENV CLANG_DIR "$APP_DIR/clang"
RUN mkdir $APP_DIR
RUN mkdir $CLANG_DIR
WORKDIR $APP_DIR

ENV VIRTUAL_ENV "/opt/venv"
RUN python3 -m venv $VIRTUAL_ENV
ENV PATH "$VIRTUAL_ENV/bin:$PATH"

ARG CLANG_URL=https://github.com/llvm/llvm-project/releases/download/llvmorg-17.0.6/clang+llvm-17.0.6-aarch64-linux-gnu.tar.xz
ARG CLANG_CHECKSUM=6dd62762285326f223f40b8e4f2864b5c372de3f7de0731cb7cd55ca5287b75a

ENV CLANG_FILE clang.tar.xz
RUN wget -q -O $CLANG_FILE $CLANG_URL && \
    echo "$CLANG_CHECKSUM  $CLANG_FILE" | sha256sum -c - && \
    tar xf $CLANG_FILE -C $CLANG_DIR --strip-components 1 && \
    rm $CLANG_FILE

# https://github.com/google/atheris#building-from-source
RUN LIBFUZZER_LIB=$($CLANG_DIR/bin/clang -print-file-name=libclang_rt.fuzzer_no_main.a) \
    python3 -m pip install --no-binary atheris atheris

# https://github.com/google/atheris/blob/master/native_extension_fuzzing.md#step-1-compiling-your-extension
ENV CC "$CLANG_DIR/bin/clang"
ENV CFLAGS "-fsanitize=address,undefined,fuzzer-no-link"
ENV CXX "$CLANG_DIR/bin/clang++"
ENV CXXFLAGS "-fsanitize=address,undefined,fuzzer-no-link"
ENV LDSHARED "$CLANG_DIR/bin/clang -shared"

ARG BRANCH=master

# https://github.com/agronholm/cbor2
ENV CBOR2_BUILD_C_EXTENSION "1"
RUN git clone --branch $BRANCH https://github.com/agronholm/cbor2.git
RUN python3 -m pip install cbor2/

# Allow Atheris to find fuzzer sanitizer shared libs
# https://github.com/google/atheris/blob/master/native_extension_fuzzing.md#option-a-sanitizerlibfuzzer-preloads
ENV LD_PRELOAD "$VIRTUAL_ENV/lib/python3.11/site-packages/asan_with_fuzzer.so"

# Subject to change by upstream, but it's just a sanity check
RUN nm $(python3 -c "import _cbor2; print(_cbor2.__file__)") | grep asan \
    && echo "Found ASAN" \
    || echo "Missing ASAN"

# 1. Skip allocation failures and memory leaks for now, they are common, and low impact (DoS)
# 2. https://github.com/google/atheris/blob/master/native_extension_fuzzing.md#leak-detection
# 3. Provide the symbolizer to turn virtual addresses to file/line locations
ENV ASAN_OPTIONS "allocator_may_return_null=1,detect_leaks=0,external_symbolizer_path=$CLANG_DIR/bin/llvm-symbolizer"

COPY fuzz.py fuzz.py

ENTRYPOINT ["python3", "fuzz.py"]
CMD ["-help=1"]

The following bits of the Dockerfile are relevant for customizations or future projects and are worth discussing further:

  1. Installing Clang from the llvm-project repository
  2. Customizing the image at build-time using Docker build arguments (e.g., ARG)
  3. Installing the cbor2 project
  4. Sanity checking the compiled cbor2 C extension for AddressSanitizer (ASan) symbols using nm
  5. Using ASAN_OPTIONS to customize the fuzzing process

First, installing Clang from the llvm-project repository:

ENV APP_DIR "/app"
ENV CLANG_DIR "$APP_DIR/clang"
...
RUN mkdir $CLANG_DIR
...
ARG CLANG_URL=https://github.com/llvm/llvm-project/releases/download/llvmorg-17.0.6/clang+llvm-17.0.6-aarch64-linux-gnu.tar.xz
ARG CLANG_CHECKSUM=6dd62762285326f223f40b8e4f2864b5c372de3f7de0731cb7cd55ca5287b75a
...
ENV CLANG_FILE clang.tar.xz
RUN wget -q -O $CLANG_FILE $CLANG_URL && \
    echo "$CLANG_CHECKSUM  $CLANG_FILE" | sha256sum -c - && \
    tar xf $CLANG_FILE -C $CLANG_DIR --strip-components 1 && \
    rm $CLANG_FILE

This code installs the 17.0.6-aarch64-linux-gnu tarball of Clang. There is nothing particularly special about this tarball other than the fact that it is built for AArch64 and Linux. If you are running this Docker container on a different architecture, you will need to use the corresponding release tarball. You can then specify the CLANG_URL and CLANG_CHECKSUM build arguments as necessary or simply modify the Dockerfile according to your system’s requirements.

The Dockerfile also provides a BRANCH build argument. This allows the builder to specify a Git branch or tag that they would like to fuzz against. For example, if you’re working on a pull request and want to fuzz its corresponding branch, you can use this build argument to do so.

Next up, installing the cbor2 project:

ENV CBOR2_BUILD_C_EXTENSION "1"
RUN git clone --branch $BRANCH https://github.com/agronholm/cbor2.git
RUN python3 -m pip install cbor2/

This installs the cbor2 package from GitHub rather than from PyPI. This is necessary because we need to compile the underlying C extension. We could install the package from the PyPI source distribution, but using Git provides us more control over which branch, tag, or commit we install.

The CBOR2_BUILD_C_EXTENSION environment variable instructs setup.py to ensure the C extension is built:

 30    cpython = platform.python_implementation() == "CPython"
 31    windows = sys.platform.startswith("win")
 32    use_c_ext = os.environ.get("CBOR2_BUILD_C_EXTENSION", None)
 33    if use_c_ext == "1":
 34        build_c_ext = True
 35    elif use_c_ext == "0":
 36        build_c_ext = False
 37    else:
 38        build_c_ext = cpython and (windows or check_libc())

The environment flag for building the C extension (setup.py#30–38)

This is a common pattern for Python packages with C extensions. Investigating a project’s setup.py is a great way to better understand how a C extension is built. For more information, see the setuptools documentation on building extension modules.

On to sanity checking the compiled C extension:

RUN nm $(python3 -c "import _cbor2; print(_cbor2.__file__)") | grep asan \
    && echo "Found ASAN" \
    || echo "Missing ASAN"

This command searches the compiled C extension symbol table for ASan symbols. If they exist, then we know the C extension was compiled correctly. It is interesting to note that the __file__ attribute also works for shared objects in Python and thus enables this check:

$ python3 -c "import _cbor2; print(_cbor2.__file__)"
/opt/venv/lib/python3.11/site-packages/_cbor2.cpython-311-aarch64-linux-gnu.so

Finally, let’s dig into ASAN_OPTIONS:

ENV ASAN_OPTIONS "allocator_may_return_null=1,detect_leaks=0,external_symbolizer_path=$CLANG_DIR/bin/llvm-symbolizer"

We are specifying three options:

  1. allocator_may_return_null=1: We set this flag so that oversized allocation requests surface as Python MemoryError exceptions (which our harness ignores) rather than as ASan crashes. We’re only looking for C memory corruption bugs, not Python exceptions.
  2. detect_leaks=0: This option is recommended by the Atheris documentation.
  3. external_symbolizer_path=$CLANG_DIR/bin/llvm-symbolizer: This enables the LLVM symbolizer to turn virtual addresses to file/line locations in fuzzing output.

You can find the full list of ASan sanitizer flags and common sanitizer options in Google’s sanitizers repository.

Fuzzing harness

The fuzzing harness used for cbor2 was largely inspired by the harness used by ujson in Google’s oss-fuzz repository. There are hundreds of projects being fuzzed in this repository. Reading through their fuzzing harnesses is a great way to gather ideas for your fuzzing project.

The following is the Python code used as the fuzzing harness:

#!/usr/bin/python3

import sys
import atheris

# _cbor2 ensures the C library is imported
from _cbor2 import loads

def test_one_input(data: bytes):
    try:
        loads(data)
    except Exception:
        # We're searching for memory corruption, not Python exceptions
        pass

def main():
    atheris.Setup(sys.argv, test_one_input)
    atheris.Fuzz()

if __name__ == "__main__":
    main()

Remember, we are fuzzing only the C extension, not the Python code. Two features of the harness enable that behavior: importing _cbor2 instead of cbor2, and the try/except block around the loads call. Looking again at setup.py, we see that _cbor2 is the Python module name for the C extension:

 47    if build_c_ext:
 48        _cbor2 = Extension(
 49            "_cbor2",
 50            # math.h routines are built-in to MSVCRT
 51            libraries=["m"] if not windows else [],
 52            extra_compile_args=["-std=c99"] + gnu_flag,
 53            sources=[
 54                "source/module.c",
 55                "source/encoder.c",
 56                "source/decoder.c",
 57                "source/tags.c",
 58                "source/halffloat.c",
 59            ],
 60            optional=True,
 61        )
 62        kwargs = {"ext_modules": [_cbor2]}
 63    else:
 64        kwargs = {}

The _cbor2 Python module name (setup.py#47–64)

That is how we know to import _cbor2 instead of cbor2. In addition to the import, the try/except block effectively ignores crashes caused by Python exceptions.

With the fuzzing environment provided by the Docker image and the fuzzing harness provided by the Python code, we are ready to do some fuzzing!

Running the fuzzer

First, copy the Dockerfile and Python code to files named Dockerfile and fuzz.py, respectively. You can then build the Docker image with the following command:

$ docker build --build-arg BRANCH=5.5.1 -t cbor2-fuzz -f Dockerfile .

Note that the APT packages and Clang installation require large downloads, so the build may take a while. Since version 5.5.1 was the latest cbor2 release when these bugs were found, we are building against that Git tag to reproduce the crashes. When the build is done, you can start the fuzzing process with the following command:

$ docker run -v $(pwd):/tmp/output/ cbor2-fuzz -artifact_prefix=/tmp/output/

Specifying /tmp/output as both a Docker volume and the libFuzzer artifact_prefix will cause any crash output files to persist to the host’s filesystem rather than the container’s ephemeral filesystem. See the libFuzzer options documentation for more information on flags that can be passed at runtime.

Running the fuzzer should quickly produce the following crash:

/usr/include/python3.11/object.h:537:15: runtime error: member access within null pointer of type 'PyObject' (aka 'struct _object')
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /usr/include/python3.11/object.h:537:15 in 
AddressSanitizer:DEADLYSIGNAL
=================================================================
==1==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0xffff921a94b4 bp 0xffffe8dc8ce0 sp 0xffffe8dc8ca0 T0)
==1==The signal is caused by a READ memory access.
==1==Hint: address points to the zero page.
    #0 0xffff921a94b4 in Py_DECREF /usr/include/python3.11/object.h:537:9
    #1 0xffff921a94b4 in decode_definite_string /app/cbor2/source/decoder.c:653:9
    #2 0xffff921a94b4 in decode_string /app/cbor2/source/decoder.c:718:15
    #3 0xffff921a5cc8 in decode /app/cbor2/source/decoder.c:1735:27
    #4 0xffff921b1d98 in CBORDecoder_decode_stringref_ns /app/cbor2/source/decoder.c:1456:15
    #5 0xffff921ab90c in decode_semantic /app/cbor2/source/decoder.c:973:31
    #6 0xffff921a5d48 in decode /app/cbor2/source/decoder.c:1738:27
    #7 0xffff921aac90 in decode_map /app/cbor2/source/decoder.c:909:27
    #8 0xffff921a5d28 in decode /app/cbor2/source/decoder.c:1737:27
    #9 0xffff921d4e28 in CBOR2_load /app/cbor2/source/module.c:318:19
    #10 0xffff921d4e28 in CBOR2_loads /app/cbor2/source/module.c:367:19
    ...
==1==ABORTING
MS: 1 ChangeByte-; base unit: 096adbe21e6ccdcdaf3b466eae0eecc042a4ce48
0xa9,0xd9,0x1,0x0,0x67,0x0,0xfa,0xfa,0x0,0x0,0x4,0x4,
\251\331\001\000g\000\372\372\000\000\004\004
artifact_prefix='/tmp/output/'; Test unit written to /tmp/output/crash-092ce4a82026ba5ca35d4ee4ef5c9ba41623d61d
Base64: qdkBAGcA+voAAAQE

The output gives us the full stack trace and a crash file to reproduce the issue:

$ python -m cbor2.tool -p crash-092ce4a82026ba5ca35d4ee4ef5c9ba41623d61d 
Segmentation fault: 11

The crash happens in the Py_DECREF call in decode_definite_string:

 640    PyObject *ret = NULL;
 641    char *buf;
 642    
 643    buf = PyMem_Malloc(length);
 644    if (!buf)
 645       return PyErr_NoMemory();
 646    
 647    if (fp_read(self, buf, length) == 0)
 648       ret = PyUnicode_DecodeUTF8(
 649               buf, length, PyBytes_AS_STRING(self->str_errors));
 650    PyMem_Free(buf);
 651    
 652    if (string_namespace_add(self, ret, length) == -1) {
 653       Py_DECREF(ret);
 654       return NULL;
 655    }
 656    return ret;

The Py_DECREF call (source/decoder.c#640–656)

A NULL pointer dereference in the Python standard library produces the crash. Since the Py_DECREF documentation states that the passed object must not be NULL, the cbor2 developers fixed this bug by adding code that will detect a NULL pointer and return an error before Py_DECREF is reached.

Integrating a project into OSS-Fuzz

Google created OSS-Fuzz to improve the state of security for open-source projects. The service describes itself as “… a free service that runs fuzzers for open source projects and privately alerts developers to the bugs detected.” Integrating a project into OSS-Fuzz is a straightforward process. However, be aware that acceptance into OSS-Fuzz is ultimately at the discretion of the OSS-Fuzz team. There is no guarantee that a project will be accepted. OSS-Fuzz gives each new project proposal a criticality score and uses this value to determine if a project should be accepted.

Integrating a project into OSS-Fuzz requires four files:

  1. project.yaml: This file contains metadata about your project like contact information, repository location, programming language, and fuzzing engine.
  2. Dockerfile: This file clones your project and copies any necessary fuzzing resources like corpora or dictionaries into a Docker image. OSS-Fuzz will then run the Docker image as part of the fuzzing process.
  3. build.sh: This file installs your project and any of its dependencies into the Docker image fuzzing environment.
  4. A fuzzing harness file: This initiates the fuzzing process against a target. For example, to fuzz a specific Python function, the harness would be a Python script that initializes the fuzzing process with the target function.

If you would like to learn more about any of these files and their respective options, see the OSS-Fuzz documentation on setting up a new project. Once your project has been accepted to OSS-Fuzz, you will be granted access to the ClusterFuzz web interface, which provides access to crashes, coverage information, and fuzzer statistics. OSS-Fuzz will then fuzz your project in the background and notify you when it produces findings.

As part of our work fuzzing the cbor2 project, we integrated it into OSS-Fuzz in this pull request: google/oss-fuzz#11444. cbor2 will now be continuously fuzzed for bugs as development proceeds. To get a better idea of what this looks like in practice, see the cbor2 project in OSS-Fuzz.

Continuous fuzzing with CIFuzz

There’s continuous, and then there’s continuous. OSS-Fuzz fuzzes your project about once a day. If you need something more continuous than that, like, say, on every commit, then you will have to reach for another tool. Fortunately, Google and the OSS-Fuzz ecosystem have you covered with CIFuzz. CIFuzz integrates into the OSS-Fuzz ecosystem to fuzz your project on every commit. It does require a project to already be accepted and integrated in OSS-Fuzz, but non-OSS-Fuzz projects can use ClusterFuzzLite.

To take our cbor2 fuzzing one step further, we added a CIFuzz job to the project’s GitHub Actions. This will fuzz the project on every commit and every pull request. Using OSS-Fuzz and CIFuzz allows for both faster fuzz feedback on proposed changes and deeper fuzz testing as part of a scheduled nightly job. The best of both worlds. Think of it like the testing pyramid: unit tests are fast and run on every commit, whereas end-to-end tests are slow and may be run only as part of a lengthier, nightly CI job.

Once your project is integrated into OSS-Fuzz, adding CIFuzz is as simple as adding a GitHub Actions workflow to your project. This workflow file specifies metadata similar to that in the project’s project.yaml file, such as the project’s programming language, the libFuzzer sanitizers to use, and the fuzzing duration.

You may be asking yourself, “how long should I be fuzzing my project for?” The answer often ends up being more art than science. CIFuzz’s default duration is 600 seconds, or 10 minutes. This is a great starting point. In this situation, bigger is not always better. Remember, you could be waiting for this job to complete on every commit. How long would you and your teammates like to wait for a CI job? A good rule of thumb is that continuous fuzzing on every commit should be run for minutes, not hours or days, and that scheduled, nightly fuzzing should be run for hours, or even days. Start with something reasonable and be prepared to tweak it as necessary.

As part of our work fuzzing the cbor2 project, we added a CIFuzz workflow in this pull request: agronholm/cbor2#212. This should complement the scheduled OSS-Fuzz job nicely.

Build your own trophy case with fuzzing

Fuzzing is a great testing methodology for uncovering hard-to-find bugs and security vulnerabilities. It is particularly useful for projects performing decoding or deserialization functionality or taking in untrusted input. It has a proven track record, considering AFL’s extensive trophy case, rust-fuzz’s trophy case, and OSS-Fuzz’s claim of over 10,000 security vulnerabilities and 36,000 bugs found. Fuzzing is an advanced testing methodology, so it is not the first tool you should reach for when looking to improve your project’s robustness, but it is unquestionably a useful tool when you are looking to go to the next level.

In this post, we walked you through setting up a fuzzing environment and harness for Python C extensions and then went over the process of integrating a project into OSS-Fuzz and adding a CIFuzz GitHub Actions workflow. In the end, we found some interesting memory corruption bugs in the cbor2 Python library and made the open-source software community a little bit more secure.

If you’d like to read more about our work on fuzzing, we have used its capabilities in several ways, such as fuzzing x86_64 instruction decoders, breaking the Solidity compiler with a fuzzer, and fuzzing wolfSSL with tlspuffin.

Contact us if you’re interested in custom fuzzing for your project.

Breaking the shared key in threshold signature schemes

20 February 2024 at 14:30

By Fredrik Dahlgren

Today we are disclosing a denial-of-service vulnerability that affects the Pedersen distributed key generation (DKG) phase of a number of threshold signature scheme implementations based on the Frost, DMZ21, GG20, and GG18 protocols. The vulnerability allows a single malicious participant to surreptitiously raise the threshold required to reconstruct the shared key, which could cause signatures generated using the shared key to be invalid.

We first became aware of this vulnerability on a client engagement with Chainflip last year. When we reviewed Chainflip’s implementation of the Frost threshold signature scheme, we noticed it was doing something unusual—something that we had never seen before. Usually, these kinds of observations are an indication that there is a weakness or vulnerability in the codebase, but in this case, Chainflip’s defensive coding practices actually ended up protecting its implementation from a vulnerability. By being extra cautious, Chainflip also avoided introducing a vulnerability into the codebase that could be used by a single party to break the shared key created during the key-generation phase of the protocol. When we realized this, we became curious if other implementations were vulnerable to this issue. This started a long investigation that resulted in ten separate vulnerability disclosures.

What is the Pedersen DKG protocol?

The vulnerability is actually very easy to understand, but to be able to explain it we need to go through some of the mathy details behind the Pedersen DKG protocol. Don’t worry—if you understand what a polynomial is, you should be fine, and if you’ve heard about Shamir’s secret sharing before, you’re most of the way there already.

The Pedersen DKG protocol is based on Feldman’s verifiable secret sharing (VSS) scheme, which is an extension of Shamir’s secret sharing scheme. Shamir’s scheme allows n parties to share a key that can then be reconstructed by t + 1 parties. (Here, we assume that the group has somehow agreed on the threshold t and group size n in advance.) Shamir’s scheme assumes a trusted dealer and is not suitable for multi-party computation schemes where participants may be compromised and act maliciously. This is where Feldman’s VSS scheme comes in. Building on Shamir’s scheme, it allows participants to verify that shares are generated honestly.

Let G be a commutative group where the discrete logarithm problem is hard, and let g be a generator of G. In a (t, n)-Feldman VSS context, the dealer generates a random degree-t polynomial p(x) = a_0 + a_1 x + … + a_t x^t, where a_0 represents the original secret to be shared. She then computes the individual secret shares as s_1 = p(1), s_2 = p(2), …, s_n = p(n). This part is exactly identical to Shamir’s scheme. To allow other participants to verify their secret shares, the dealer publishes the values A_0 = g^{a_0}, A_1 = g^{a_1}, …, A_t = g^{a_t}. Participants can then use the coefficient commitments (A_0, A_1, …, A_t) to verify their secret share s_i by recomputing p(i) “in the exponent” as follows:

  • Compute V = g^{s_i} from the received share s_i.
  • Compute V’ = g^{p(i)} = g^{a_0 + a_1 i + … + a_t i^t} = ∏_k (g^{a_k})^{i^k} = ∏_k A_k^{i^k} from the published commitments.
  • Check that V = V’.

As in Shamir’s secret sharing, the secret s = a_0 can be recovered with t + 1 shares using Lagrange interpolation.
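To make the share-verification step concrete, here is a toy Python sketch of Feldman’s VSS. It is illustrative only: it uses a tiny multiplicative group and Python’s random module, whereas real implementations use elliptic-curve groups and a cryptographically secure source of randomness.

    import random

    # Toy parameters: p = 2q + 1 is a safe prime, and g generates the order-q subgroup.
    q = 11            # order of the subgroup (exponents live mod q)
    p = 2 * q + 1     # 23
    g = 4             # generator of the order-q subgroup of Z_p*


    def deal(secret: int, t: int, n: int):
        """Feldman VSS: share `secret` among n parties with threshold t."""
        coeffs = [secret % q] + [random.randrange(q) for _ in range(t)]
        shares = {i: sum(a * pow(i, k, q) for k, a in enumerate(coeffs)) % q
                  for i in range(1, n + 1)}
        commitments = [pow(g, a, p) for a in coeffs]   # A_k = g^{a_k}
        return shares, commitments


    def verify_share(i: int, s_i: int, commitments: list[int]) -> bool:
        """Check g^{s_i} == prod_k A_k^{i^k}, i.e., the share lies on the committed polynomial."""
        v = pow(g, s_i, p)
        v_prime = 1
        for k, a_k_commitment in enumerate(commitments):
            v_prime = (v_prime * pow(a_k_commitment, pow(i, k), p)) % p
        return v == v_prime


    shares, commitments = deal(secret=7, t=2, n=5)
    assert all(verify_share(i, s, commitments) for i, s in shares.items())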

In Feldman’s VSS scheme, the shared secret is known to the dealer. To generate a shared key that is unknown to all participants of the protocol, the Pedersen DKG protocol essentially runs n instances of Feldman’s VSS scheme in parallel. The result is a (t, n)-Shamir’s secret sharing of a value that is unknown to all participants: each participant P_i starts by generating a random polynomial p_i(x) = a_{i,0} + a_{i,1} x + … + a_{i,t} x^t of degree t. She publishes the coefficient commitments (A_{i,0} = g^{a_{i,0}}, A_{i,1} = g^{a_{i,1}}, …, A_{i,t} = g^{a_{i,t}}) and then sends the secret share s_{i,j} = p_i(j) to P_j. (Note that the index j must start at 1; otherwise, P_i ends up revealing her secret value a_{i,0} = p_i(0).) P_j can check that the secret share s_{i,j} was computed correctly by computing V and V’ as above and checking that they agree. To obtain their secret share s_j, each participant P_j simply sums the secret shares obtained from the other participants. That is, they compute their secret share as

s_j = s_{1,j} + s_{2,j} + … + s_{n,j} = p_1(j) + p_2(j) + … + p_n(j)

Notice that if we define p(x) as the polynomial p(x) = p_1(x) + p_2(x) + … + p_n(x), it is easy to see that what we obtain in the end is a Shamir’s secret sharing of the constant term of p(x), s = p(0) = a_{1,0} + a_{2,0} + … + a_{n,0}. Since the degree of each polynomial p_i(x) is t, the degree of p(x) is also t, and we can recover the secret s with t + 1 shares using Lagrange interpolation as before.

(There are a few more considerations that need to be made when implementing the Pedersen DKG protocol, but they are not relevant here. For more detail, refer to any of the papers linked in the introduction section.)

Moving the goalposts in the Pedersen DKG

Now, we are ready to come back to the engagement with Chainflip that started all of this. While reviewing Chainflip’s implementation of the Frost signature scheme, we noticed that the implementation was summing the commitments for the highest coefficient, A_{1,t} + A_{2,t} + … + A_{n,t}, and checking whether the result was equal to the identity element in G, which would mean that the highest coefficient of the resulting polynomial p(x) was 0. This is clearly undesirable since it would allow fewer than t + 1 participants to recover the shared key, but the probability of this happening is cryptographically negligible (even with actively malicious participants). By performing this check, Chainflip reduced that probability to 0.

This made us wonder, what would happen if a participant used a polynomial p_i(x) of a different degree than t in the Pedersen DKG protocol? In particular, what would happen if a participant used a polynomial p_i(x) of degree T greater than t? Since p(x) is equal to the sum p_1(x) + p_2(x) + … + p_n(x), the degree of p(x) would then be T rather than t, meaning that the signing protocol would require T + 1 rather than t + 1 participants to complete successfully. If this change were not detected by other participants, it would allow any of the participants to surreptitiously render the shared key unusable by choosing a threshold that was strictly greater than the total number of participants. If the DKG protocol were used to generate a shared key as part of a threshold signature scheme (like one of the schemes referenced in the introduction), any attempt to sign a message with t + 1 participants would fail. Depending on the implementation, this could also cause the system to misattribute malicious behavior to honest participants when the failure is detected. More seriously, this attack could also be used to render the shared key unusable and unrecoverable in most key-resharing schemes based on Feldman’s VSS. This includes the key resharing schemes described in CGGMP21 and earlier versions of Lindell22. In this case, the shared key may already control large sums of money or tokens, which would then be irrevocably lost.

Clearly, this type of malicious behavior could be prevented by simply checking the length of the coefficient commitment vector (A_{i,0}, A_{i,1}, …, A_{i,T}) published by each participant and aborting if any of the lengths is found to be different from t + 1. It turned out that Chainflip already checked for this, but we were curious if other implementations did as well. All in all, we found ten implementations that were vulnerable to this attack in the sense that they allowed a single participant to raise the threshold of the shared key generated using the Pedersen DKG without detection. (We did not find any vulnerable implementations of key-resharing schemes.)
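As a sketch of what that defensive check might look like in code (the function and variable names are hypothetical; each receiving participant would run it on every peer’s commitment vector before accepting any shares):

    def check_commitment_vector(commitments: list, t: int) -> None:
        """Abort the DKG if a participant published commitments for a polynomial
        whose degree differs from the agreed-upon threshold t."""
        # A degree-t polynomial has exactly t + 1 coefficients, so exactly
        # t + 1 coefficient commitments (A_{i,0}, ..., A_{i,t}) are expected.
        if len(commitments) != t + 1:
            raise ValueError(
                f"expected {t + 1} coefficient commitments, got {len(commitments)}; "
                "aborting key generation"
            )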

Disclosure process

We reached out to the maintainers of the following vulnerable codebases on January 3, 2024:

Seven of the maintainers responded to acknowledge that they had received the disclosure. Four of those maintainers (Chelsea Komlo, Jesse Possner, Safeheron, and the ZCash Foundation) also reported that they either already have, or are planning to resolve the issue.

We reached out again to the three unresponsive maintainers (Toposware, Trust Machines, and LatticeX) on February 7, 2024. Following this, Toposware also responded to acknowledge that they had received our disclosure.

A few notes on AWS Nitro Enclaves: Images and attestation

16 February 2024 at 14:30

By Paweł Płatek (GrosQuildu)

AWS Nitro Enclaves are locked-down virtual machines with support for attestation. They are Trusted Execution Environments (TEEs), similar to Intel SGX, making them useful for running highly security-critical code.

However, the AWS Nitro Enclaves platform lacks thorough documentation and mature tooling. So we decided to do some deep research into it to fill in some of the documentation gaps and, most importantly, to find security footguns and offer some advice for avoiding them.

This blog post focuses specifically on enclave images and the attestation process.

First, here’s a tl;dr on our recommendations to avoid security footguns while building and signing an enclave:

Running an enclave

To run an enclave, use SSH to connect to an AWS EC2 instance and use the nitro-cli tool to do the following:

  1. Build an enclave image from a Docker image and a few pre-compiled files.
     • Docker is used to create an archive of files for the enclave’s user space.
     • The pre-compiled binaries are described later in this blog post.
  2. Start the enclave from the enclave image.
     • The enclave image is a binary blob in the enclave image file (EIF) format.

    Figure 1: The flow of building an enclave

    This is what’s happening under the hood when an enclave is started:

    1. Memory and CPUs are freed from the EC2 instance and reserved for the enclave.
    2. The EIF is copied to the newly reserved memory.
    3. The EC2 instance asks the Nitro Hypervisor to start the enclave.

    The Nitro Hypervisor is responsible for securing the enclave (e.g., clearing memory before it’s returned to the EC2 instance). The enclave is attached to its parent EC2 instance and cannot be moved between EC2 instances. All of the code that is executed inside the enclave is provided in the EIF. So what does the EIF look like?

    The EIF format

    The best “specification” for the EIF format that we have is the code in the aws-nitro-enclaves-image-format repo. The EIF format is rather simple: a header and an array of sections. Each section is a header and a binary blob.

    Figure 2: The header and sections of an EIF

    The CRC32 checksum is computed over the header (minus 4 bytes reserved for the checksum itself) and all of the sections (including the headers).

    There are five types of EIF sections:

    Section type | Format | Description
    Kernel       | Binary | A bzImage file
    Cmdline      | String | The boot command line for the kernel
    Metadata     | JSON   | The build information, such as the kernel configuration and the Cargo and Docker versions used
    Ramdisk      | cpio   | The bootstrap ramfs, which includes the NSM driver and init file, and the user space ramfs, which includes files from the Docker image
    Signature    | CBOR   | A vector of tuples in the form (certificate, signature)

    So with an EIF, we have all that’s needed to run a VM: a kernel image, a command line for it, bootstrap binaries (the NSM driver and init executable), and a user space filesystem.

    But where does this data come from, and can you trust it?

    Who do you trust?

    Before we get into the details, you should know that there are quite a few implicit trust relationships involved in the data that flows into an EIF when it is created. For that reason, it is important to verify how data gets into your EIF images.

    To verify dataflows into an EIF image, we need to look into the enclave_build package that is used by the nitro-cli tool.

    A kernel image (which is a bzImage file), the init executable, and the NSM driver are pre-compiled and stored in the /usr/share/nitro_enclaves/blobs/ folder (on an EC2 instance). They are pulled to the instance when the aws-nitro-enclaves-cli-devel package is installed.

    Figure 3: Part of the Nitro Enclaves CLI installation documentation

The pre-compiled binaries of the kernel image, the init executable, and the NSM driver are generated by the code in the aws-nitro-enclaves-sdk-bootstrap repo, according to the repo’s README (though we have no way to verify this claim). That code builds those binaries from source.

    The binaries can also be found in the aws-nitro-enclaves-cli repo. We can compare SHA-384 hashes of the pre-compiled binaries from the three sources—the EC2 instance, the aws-nitro-enclaves-cli repo, and those generated by the aws-nitro-enclaves-sdk-bootstrap repo (for nitro-cli version 1.2.2):

    Component     | In the EC2 instance | In aws-nitro-enclaves-cli | Built with aws-nitro-enclaves-sdk-bootstrap
    Kernel        | 127b32...9821c4     | 127b32...9821c4           | 4b3719...016c58
    Kernel config | e9704c...7d9d35     | e9704c...7d9d35           | 9e634d...663f99
    Cmdline       | cefb92...ab0b0f     | cefb92...ab0b0f           | N/A
    init          | 7680fd...a435bb     | e23a90...4272ea           | 601ec5...d4b25e
    NSM driver    | 2357cb...8192c      | 993d1f...657b50           | 96d0df...4f5306
    linuxkit      | 31ed3c...035664     | 581ddc...2ee024           | N/A

    The kernel source code is obtained securely and the hashes are consistent. A manually built kernel has a different hash than that of the pre-compiled kernel probably because its configuration is different. We can manually verify the kernel’s configuration and boot command line, so their hashes are not so important.

    Interestingly, the hashes of the init and the NSM driver are completely off. To ensure that these executables were not maliciously modified, we would have to build them from the source code and debug the differences between the freshly built and pre-compiled versions (with a tool like GDB or Ghidra). Alternatively, we have to trust that the pre-compiled files are safe to use.

    Next, there are the ramdisk sections, which are simply cpio archives that store binary files. There are at least two ramdisks in every EIF:

    • The first ramdisk contains the init executable and the NSM driver.
    • The second ramdisk is created from the Docker image provided to the nitro-cli command.
      • It stores a command that init uses to pivot (in the .cmd file), environment variables (in the .env file), and all files from the Docker image (in the rootfs/ directory).
      • The command and environment variables are parsed from the Dockerfile.

    To construct cpio archives for ramdisks, the nitro-cli tool uses the linuxkit tool, which is downloaded along with the other pre-compiled files. AWS uses “a slightly modified” version of the tool (that’s why the hashes don’t match). linuxkit downloads the Docker image and extracts files from it, trying to make identical, reproducible copies of them. Notably, nitro-cli uses version 0.8 of linuxkit, which is outdated.

    Figure 4: A depiction of how an EIF is created

    Here’s how nitro-cli gets the Docker image used to build an EIF:

    1. nitro-cli builds the image locally if the --docker-dir command line option is provided.
    2. Otherwise, nitro-cli checks if the image is locally available.
    3. If it’s not, then it pulls the image using the shiplift library and credentials from a local file.
    4. linuxkit also tries to use locally available images; if images are not locally available, it pulls them from a remote registry using credentials obtained through the docker login command.

    Producing enclaves from Docker files in a reproducible, transparent, and easy-to-audit way is tricky—you can read more about that fact in Artur Cygan’s “Enhancing trust for SGX enclaves” blog post. When building EIFs, you should at least make sure that nitro-cli uses the right image. To do so, consult the Docker build logs (as Docker images and the daemon do not store information about image origin).

    What do you attest?

    The main feature of AWS Nitro Enclaves is cryptographic attestation. A running enclave can ask the Nitro Hypervisor to compute (measure) hashes of the enclave’s code and sign them with AWS’s private key, or more precisely with a certificate that is signed by a certificate that is signed by a certificate… that is signed by the AWS root certificate.

    You can use the cryptographic attestation feature to establish trust between an enclave’s source code and the code that is actually executed. Just make sure to get the AWS root certificate from a trusted source and to verify its hash.

    What’s important is the fact that AWS owns both the attestation key and the infrastructure. This means that you must completely trust AWS. If AWS is compromised or acts maliciously, it’s game over. This security model is different from the SGX architecture, where trust is divided between Intel (the attestation key owner) and a cloud provider.

    When the Hypervisor signs an enclave’s hashes, it’s specifically signing a CBOR-encoded document specified in the aws-nitro-enclaves-nsm-api repo. There are a few items in the document, but for now we are interested in the platform configuration registers (PCRs), which are measurements (cryptographic hashes) associated with the enclave. The first three PCRs are the hashes of the enclave’s code.

    Figure 5: The first three PCRs of an enclave

    PCRs 0 through 2 are just SHA-384 hashes over the sections’ data:

    • PCR-0: sha384(‘\0’*48 | sha384(Kernel | Cmdline | Ramdisk[:]))
    • PCR-1: sha384(‘\0’*48 | sha384(Kernel | Cmdline | Ramdisk[0]))
    • PCR-2: sha384(‘\0’*48 | sha384(Ramdisk[1:]))

    As you can see, there is no domain separation between the sections’ data—sections are simply concatenated. Moreover, PCR hashes do not include the section headers. This means that we can move bytes between adjacent sections without changing PCRs. For example, if we strip bytes from the beginning of the second ramdisk and append them to the first one, the PCR-0 measurement won’t change. That’s a ticking pipe bomb, but it is currently not exploitable. Regardless, we recommend checking PCR-1 and PCR-2 in addition to PCR-0 whenever possible.
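Based on the formulas above, here is a sketch of how PCR-0 through PCR-2 can be recomputed from an EIF’s raw section payloads. It assumes you have already extracted the section data yourself; it mirrors the hashing scheme described in this post and is not an official AWS tool.

    import hashlib


    def extend(data: bytes) -> str:
        """PCR = sha384('\\0' * 48 || sha384(data)), as described above."""
        inner = hashlib.sha384(data).digest()
        return hashlib.sha384(b"\x00" * 48 + inner).hexdigest()


    def compute_pcrs(kernel: bytes, cmdline: bytes, ramdisks: list) -> dict:
        # Section payloads are simply concatenated: no headers, no domain separation.
        return {
            "PCR0": extend(kernel + cmdline + b"".join(ramdisks)),
            "PCR1": extend(kernel + cmdline + ramdisks[0]),
            "PCR2": extend(b"".join(ramdisks[1:])),
        }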

    One more observation is that the metadata section of the EIF is not attested. It’s unspecified how and when users should use that section, so it’s hard to imagine an exploit scenario for this property. Just make sure your system’s security doesn’t depend on content from that section.

    Where do you sign?

    Finally, we’ll discuss the signature section of the EIF. This section contains a CBOR-encoded vector of tuples, each of which is a certificate-signature pair. The signature is a CBOR-encoded COSE_Sign1 structure that contains the encoded payload (tuples of PCR index-value pairs), the actual signature over the payload, and some metadata. The certificate is in PEM format.

    Section = [(certificate, COSE structure), (certificate, COSE structure), …]
    COSE structure = COSE_Sign1([(PCR index, PCR value), (PCR index, PCR value), …])
    COSE_Sign1(payload) = structure {
        payload = payload
        signature = sign(payload)
        metadata = signing algorithm (etc)
    }
    

    In the current version of the EIF format, the section contains only the signature for PCR-0, the hash of the entire enclave image. (But note that you can make an EIF with many signature elements; it will still be run by the Hypervisor, but it won’t validate signatures after the first one.)

    The signing code is implemented by the aws-nitro-enclaves-cose library.

PCR-8 is a hash of the EIF file’s signing certificate and is computed as follows: the certificate is first decoded from its original PEM format and re-encoded as DER.

    PCR-8 = sha384(‘\0’*48 | sha384(SignatureSection[0].certificate))
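A sketch of the corresponding PCR-8 computation, assuming a recent version of the pyca/cryptography library for the PEM-to-DER conversion:

    import hashlib

    from cryptography import x509
    from cryptography.hazmat.primitives.serialization import Encoding


    def compute_pcr8(signing_cert_pem: bytes) -> str:
        # The certificate is re-encoded as DER before hashing.
        cert = x509.load_pem_x509_certificate(signing_cert_pem)
        der = cert.public_bytes(Encoding.DER)
        inner = hashlib.sha384(der).digest()
        return hashlib.sha384(b"\x00" * 48 + inner).hexdigest()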
    

    Now, how do you validate the signature? The documentation instructs users to decrypt the payload from the COSE_Sign1 object to get the PCR index-value pair and compare the PCR value with the expected PCR. We think there is a terminology issue here and that they mean to verify the actual signature, and then extract the PCR from the payload and compare it with the expected one. However, we instead recommend reconstructing the COSE_Sign1 payload from the expected PCR and verifying the signature against that. That should save you from encountering bugs due to invalid parsing. (We discuss such bugs in the next section.)

    The official way to sign an enclave is to use the nitro-cli tool on an EC2 instance (figure 6). That forces you to push a private key to the instance (figure 7). That’s really not an ideal way to handle private keys. Even worse, the AWS documentation doesn’t instruct users to protect their keys with passphrases…

But there’s nothing stopping you from running nitro-cli outside of an EC2 instance, or even from running it in an offline environment. After all, the EIF is just a bunch of headers and binary blobs—the Nitro Hypervisor is not required to build and sign the image. The AWS repository even has an example of building an EIF in a Docker container. Moreover, there is a pending PR in the aws-nitro-enclaves-cli repository that will enable EIFs to be signed with KMS once merged.

    Figure 6: The AWS documentation states that nitro-cli must be run on an EC2 instance.

    nitro-cli build-enclave --docker-uri hello-world:latest --output-file 
    hello-signed.eif --private-key key_name.pem --signing-certificate certificate.pem
    

    Figure 7: Private keys must be stored in a local file.

    Overall, we recommend not following the AWS documentation when it comes to signing EIFs. Instead, here are a few options to ensure that EIFs are signed securely (in order of recommendation):

    • Push your private key and Docker image to an offline environment and sign the EIF there.
    • Modify nitro-cli to enable more secure signing (with HSM, KMS, keyring, etc.).
    • Wait for the nitro-cli PR that will enable EIFs to be signed with KMS to be merged; that way, you won’t have to modify nitro-cli yourself to do so.
    • Push your private key to your EC2 instance and sign the EIF there, as AWS recommends, but protect the key with a passphrase first. (nitro-cli will ask for the passphrase while building the EIF.)

    How do you parse?

    Now that we know what an enclave image looks like, we’ll discuss how it is parsed. If you are familiar with security bugs in file format parsers, you’ve probably already spotted ambiguities and potential issues in the parsing process.

    There are two EIF parsers:

    1. Public one: The nitro-cli describe-eif command
    2. Private one: Used by the Nitro Hypervisor to start an enclave

    The parser we care about is the private one—it provides the Hypervisor with an actual view of the EIF. However, it is not open sourced, and there is no specification on the EIF format, so we don’t have any insight into how the private parser actually works. To get some understanding of the private parser’s behavior, we have to treat it as a black box and run experiments on it. By modifying valid EIFs and trying to run them on the Hypervisor, I came up with some answers to the following questions, some of which I included in an issue I submitted to the aws-nitro-enclaves-image-format repo:

    • Is the CRC32 checksum verified? Yes. The enclave does not boot if the CRC32 checksum is invalid.
    • Can an EIF have more than two ramdisk sections? Yes. All ramdisk sections are just concatenated together.
    • Can you truncate (corrupt) a cpio archive in a ramdisk section? Yes! Some cpio errors are ignored by the Hypervisor.
    • Can an EIF have more than a single kernel or cmdline section? Probably not, but it’s hard to ensure that something is not possible.
    • Can you swap sections of different types (e.g., put the cmdline section before the kernel section)? Yes. Doing so changes the PCR-0 measurement.
    • Are the section sizes indicated in the EIF header metadata validated against the sizes indicated in the sections’ actual headers? Yes.
    • Can an EIF contain data between its sections? Yes. If so, the CRC32 checksum is also computed over that data.
    • Is an EIF header’s num_sections field validated against items in the section sizes and section offsets? No. Items after num_sections are ignored.
    • Do the sizes in the section_sizes array include section headers? No. The array stores data lengths only.
    • Can an EIF have more than one PCR index-value tuple in the signature section? No.
    • Can an EIF use an empty PCR index-value vector? No.
    • Can you sign a PCR other than PCR-0? It’s complicated, but no. The PCR index can be arbitrary data (not even a number), but the value must be a PCR-0 value.
    • Can an EIF store more than one certificate-signature pair in the signature section? Yes.
    • Are all certificate-signature pairs validated? No. Only the first pair is validated.

If you compare the findings above with the nitro-cli parser code, you will see that the two parsers work differently. Perhaps the most important difference is that the nitro-cli parser does not respect the header metadata like num_sections and the section offsets. Therefore, the nitro-cli parser may produce different measurements than the Hypervisor parser. We recommend not using the nitro-cli describe-eif command to learn the PCRs of untrusted EIFs. Instead, build your EIFs from sources, or run them and use the nitro-cli describe-enclaves command, which consults the Hypervisor for measurements.

    Why is this relevant?

    We run code in TEEs like AWS Nitro Enclaves when that code is highly security-critical, so we have to get the details right. But the documentation on AWS Nitro Enclaves is severely lacking, making it hard to understand those details. The feature also lacks mature tooling and contains several security footguns. So if you’re going to use AWS Nitro Enclaves, be sure to follow the checklist provided in the beginning of this post! And if you need further guidance, our AppSec team holds regular office hours. Contact us to schedule a meeting where you can ask our experts any questions.

    To learn more about AWS, check out Scott Arciszewski’s blog post “Cloud cryptography demystified: Amazon Web Services” and Joop van de Pol’s blog post “A trail of flipping bits” about TEE-specific issues.

    Cloud cryptography demystified: Amazon Web Services

    14 February 2024 at 14:00

    By Scott Arciszewski

    This post, part of a series on cryptography in the cloud, provides an overview of the cloud cryptography services offered within Amazon Web Services (AWS): when to use them, when not to use them, and important usage considerations. Stay tuned for future posts covering other cloud services.

    At Trail of Bits, we frequently encounter products and services that make use of cloud providers’ cryptography offerings to satisfy their security goals. However, some cloud providers’ cryptography tools and services have opaque names or non-obvious use cases. This is particularly true for AWS, whose huge variety of services are tailored for a multitude of use cases but can be overwhelming for developers with limited experience. This guide—informed by Trail of Bits’ extensive auditing experience as well as my own experience as a developer at AWS—dives into the differences between these services and explains important considerations, helping you choose the right solution to enhance your project’s security.

    Introduction

    The cryptography offered by cloud computing providers can be parceled into two broad categories with some overlap: cryptography services and client-side cryptography software. In the case of AWS, the demarcation between the two is mostly clear.

By client-side, we mean that the cryptography runs in your application (the client) rather than inside the AWS service in question. This doesn’t mean that the code necessarily runs in a web browser or on your users’ devices. Even if the client is running on a virtual machine in EC2, the cryptography is not happening at the back-end service level and is therefore client-side.

    Some examples of AWS cryptography services include the Key Management Service (KMS) and Cloud Hardware Security Module (CloudHSM). In the other corner, AWS’s client-side cryptography software (i.e., tools) includes the AWS Encryption SDK, the AWS Database Encryption SDK, and the S3 Encryption Client.

    One product from AWS that blurs the line between both categories is the Cryptographic Computing for Clean Rooms (C3R): a client-side tool tightly integrated into the AWS Clean Rooms service. Another is Secrets Manager, which runs client-side but is its own service. (Some powerful features that use cryptography, such as AWS Nitro, will be explored in detail in a future blog post.)

    Let’s explore some of these AWS offerings, including when they’re the most useful and some sharp edges that we often discover in our audits.

    AWS cryptography services

    AWS CloudHSM

    You want to use CloudHSM: If industry or government regulations require you to use an HSM directly for a specific use case. Otherwise, prioritize KMS.

    You don’t want to use CloudHSM: If KMS is acceptable instead.

    CloudHSM is simply an AWS-provisioned HSM accessible in your cloud environment. If you don’t have a legal requirement to use an HSM directly in your architecture, you can skip CloudHSM entirely.

    AWS KMS

    You want to use KMS: Any time you use Amazon’s services (even non-cryptographic services) or client-side libraries.

    You don’t want to use KMS: For encrypting or decrypting large messages (use key-wrapping with KMS instead).

AWS KMS can be thought of as a usability wrapper around FIPS-validated HSMs. It offers digital signatures, symmetric HMAC, and encryption/decryption capabilities with keys that never leave the HSM. However, KMS encryption is intended for key-wrapping in an envelope encryption setup, rather than for encrypting or decrypting the data itself.
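As a rough sketch of that envelope-encryption pattern, using boto3 and the pyca/cryptography library (the key ARN is hypothetical, and in practice the AWS Encryption SDK described below handles this for you):

    import os

    import boto3
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    kms = boto3.client("kms")
    KEY_ID = "arn:aws:kms:us-east-1:111122223333:key/example"  # hypothetical key ARN


    def envelope_encrypt(plaintext: bytes) -> dict:
        # Ask KMS for a fresh data key: Plaintext for local use, CiphertextBlob to store.
        data_key = kms.generate_data_key(KeyId=KEY_ID, KeySpec="AES_256")
        nonce = os.urandom(12)
        ciphertext = AESGCM(data_key["Plaintext"]).encrypt(nonce, plaintext, None)
        # Only the wrapped key is persisted; the plaintext key is discarded.
        return {
            "wrapped_key": data_key["CiphertextBlob"],
            "nonce": nonce,
            "ciphertext": ciphertext,
        }


    def envelope_decrypt(blob: dict) -> bytes:
        data_key = kms.decrypt(CiphertextBlob=blob["wrapped_key"])["Plaintext"]
        return AESGCM(data_key).decrypt(blob["nonce"], blob["ciphertext"], None)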

    One important, but under-emphasized, feature of KMS is Encryption Context. When you pass Encryption Context to KMS during an Encrypt call, it logs the Encryption Context in CloudTrail, and the encrypted data is valid only if the identical Encryption Context is provided on the later Decrypt call.

    It’s important to note that the Encryption Context is not stored as part of the encrypted data in KMS. If you’re working with KMS directly, you’re responsible for storing and managing this additional data.
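A minimal sketch of how Encryption Context is supplied, again assuming boto3 and a hypothetical key ARN; the caller must remember the context and pass the identical dictionary when decrypting:

    import boto3

    kms = boto3.client("kms")
    KEY_ID = "arn:aws:kms:us-east-1:111122223333:key/example"  # hypothetical key ARN
    context = {"tenant": "acme", "purpose": "session-token"}   # logged in CloudTrail

    encrypted = kms.encrypt(KeyId=KEY_ID, Plaintext=b"secret", EncryptionContext=context)

    # Decryption fails unless the same context is provided.
    decrypted = kms.decrypt(
        CiphertextBlob=encrypted["CiphertextBlob"],
        EncryptionContext=context,
    )
    assert decrypted["Plaintext"] == b"secret"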

Both considerations are addressed by the client-side cryptography software for AWS discussed below.

    Recently, KMS added support for external key stores, where KMS will call an HSM in your data center as part of its normal operation. This feature exists to comply with some countries’ data sovereignty requirements, and should be used only if legally required. What you gain in compliance with this feature, you lose in durability, availability, and performance. It’s generally not worth the trade-off.

    AWS client-side cryptography software

    AWS Encryption SDK

    You want to use the AWS Encryption SDK: For encrypting arbitrary-length secrets in a cloud-based application.

    You don’t want to use the AWS Encryption SDK: If you’re working with encrypting data for relational or NoSQL databases. The AWS Database Encryption SDK should be used instead.

    The AWS Encryption SDK is a general-purpose encryption utility for applications running in the cloud. Its feature set can be as simple as “wraps KMS to encrypt blobs of text” with no further considerations, if that’s all you need, or as flexible as supporting hierarchical key management to minimize network calls to KMS in a multi-keyring setup.

    Regardless of how your cryptographic materials are managed, the AWS Encryption SDK stores the Encryption Context passed to KMS in the encrypted message header, so you don’t need to remember to store it separately.
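For illustration, here is a sketch of the common usage pattern with the Python AWS Encryption SDK. The key ARN is hypothetical, and the exact API may vary between SDK versions, so check the documentation for the version you use.

    import aws_encryption_sdk
    from aws_encryption_sdk.identifiers import CommitmentPolicy

    client = aws_encryption_sdk.EncryptionSDKClient(
        commitment_policy=CommitmentPolicy.REQUIRE_ENCRYPT_REQUIRE_DECRYPT
    )
    # Specify wrapping keys explicitly; avoid KMS Discovery mode.
    key_provider = aws_encryption_sdk.StrictAwsKmsMasterKeyProvider(
        key_ids=["arn:aws:kms:us-east-1:111122223333:key/example"]  # hypothetical
    )

    ciphertext, encrypt_header = client.encrypt(
        source=b"attack at dawn",
        key_provider=key_provider,
        encryption_context={"purpose": "demo"},  # carried in the message header
    )

    plaintext, decrypt_header = client.decrypt(source=ciphertext, key_provider=key_provider)
    # The context round-trips with the message, so it can be checked after decryption.
    assert decrypt_header.encryption_context["purpose"] == "demo"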

    Additionally, if you use an Algorithm Suite that includes ECDSA, it will generate an ephemeral keypair for each message, and the public key will be stored in the Encryption Context. This has two implications:

    1. Because Encryption Context is logged in CloudTrail by KMS, service operators can track the flow of messages through their fleet without ever decrypting them.
    2. Because each ECDSA keypair is used only once and then the secret key discarded, you can guarantee that a given message was never mutated after its creation, even if multiple keyrings are used.

    One important consideration for AWS Encryption SDK users is to ensure that you’re specifying your wrapping keys and not using KMS Discovery. Discovery is an anti-pattern that exists only for backwards compatibility.

    If you’re not using the hierarchical keyring, you’ll also want to look at data key caching to reduce the number of KMS calls and reduce latency in your cloud applications.

    AWS Database Encryption SDK

    You want to use the AWS Database Encryption SDK: If you’re storing sensitive data in a database, and would prefer to never reveal plaintext to the database.

    You don’t want to use the AWS Database Encryption SDK: If you’re not doing the above.

    As of this writing, the AWS Database Encryption SDK exists only for DynamoDB in Java. The documentation implies that support for more languages and database back ends is coming in the future.

    The AWS Database Encryption SDK (DB-ESDK) is the successor to the DynamoDB Encryption Client. Although it is backwards compatible, the new message format offers significant improvements and the ability to perform queries against encrypted fields without revealing your plaintext to the database service, using a mechanism called Beacons.

    At their core, Beacons are a truncated instance of the HMAC function. Given the same key and plaintext, HMAC is deterministic. If you truncate the output of the HMAC to a few bits, you can reduce the lookup time from a full table scan to a small, tolerable number of false positives.
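Conceptually, a beacon is computed like this (an illustration of the idea, not the DB-ESDK’s actual beacon construction):

    import hashlib
    import hmac


    def beacon(key: bytes, plaintext: bytes, bits: int) -> int:
        """Truncate an HMAC to `bits` bits: deterministic per (key, plaintext),
        but short enough that many plaintexts collide into the same beacon."""
        digest = hmac.new(key, plaintext, hashlib.sha384).digest()
        return int.from_bytes(digest, "big") >> (len(digest) * 8 - bits)


    # The same value always maps to the same beacon, so equality queries work...
    assert beacon(b"k", b"alice@example.com", 8) == beacon(b"k", b"alice@example.com", 8)
    # ...while unrelated values may collide, which limits what an attacker can infer.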

    Extra caution should be taken when using Beacons. If you cut them too short, you can waste a lot of resources on false positive rejection. If you don’t cut them short enough, an attacker with access to your encrypted database may be able to infer relationships between the beacons—and, in turn, the plaintext values they were calculated from. (Note that the risk of relationship leakage isn’t unique to Beacons, but to any techniques that allow an encrypted database to be queried.)

    AWS provides guidance for planning your Beacons, based on the birthday bound of PRFs to ensure a healthy distribution of false positives in a dataset.

    Disclaimer: I designed the cryptography used by the AWS Database Encryption SDK while employed at Amazon.

    Other libraries and services

    AWS Secrets Manager

    You want to use AWS Secrets Manager: If you need to manage and rotate service passwords (e.g., to access a relational database).

    You don’t want to use AWS Secrets Manager: If you’re looking to store your online banking passwords.

    AWS Secrets Manager can be thought of as a password manager like 1Password, but intended for cloud applications. Unlike consumer-facing password managers, Secrets Manager’s security model is predicated on access to AWS credentials rather than a master password or other client-managed secret. Furthermore, your secrets are versioned to prevent operational issues during rotation.

    Secrets Manager can be configured to automatically rotate some AWS passwords at a regular interval.

    In addition to database credentials, AWS Secrets Manager can be used for API keys and other sensitive values that might otherwise be committed into source code.
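A typical usage sketch with boto3 (the secret name is hypothetical):

    import boto3

    secrets = boto3.client("secretsmanager")

    # Fetch the current version of a secret at runtime instead of hard-coding it.
    response = secrets.get_secret_value(SecretId="prod/billing/db-password")
    db_password = response["SecretString"]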

    AWS Cryptographic Computing for Clean Rooms (C3R)

    You want to use AWS C3R: If you and several industry partners want to figure out how many database entries you have in common without revealing the contents of your exclusive database entries to each other.

    You don’t want to use AWS C3R: If you’re not doing that.

    C3R uses server-aided Private Set Intersection to allow multiple participants to figure out how many records they have in common, without revealing unrelated records to each other.

    For example: If two or more medical providers wanted to figure out if they have any patients in common (i.e., because they provide services that are not clinically safe together, but are generally safe separately), they could use C3R to calculate the intersection of their private sets and not violate the privacy of the patients that only one provider services.

    The main downside of C3R is that it has a rather narrow use-case.

    Wrapping up

    We hope that this brief overview has clarified some of AWS’s cryptography offerings and will help you choose the best one for your project. Stay tuned for upcoming posts in this blog series that will cover other cloud cryptography services!

    In the meantime, if you’d like a deeper dive into these products and services to evaluate whether they’re appropriate for your security goals, feel free to contact our cryptography team. We regularly hold office hours, where we schedule around an hour to give you a chance to meet with our cryptographers and ask any questions.

    Why Windows can’t follow WSL symlinks

    12 February 2024 at 14:30

    By Yarden Shafir

    Did you know that symbolic links (or symlinks) created through Windows Subsystem for Linux (WSL) can’t be followed by Windows?

    I recently encountered this rather frustrating issue as I’ve been using WSL for my everyday work over the last few months. No doubt others have noticed it as well, so I wanted to document it for anyone who may be seeking answers.

    Let’s look at an example of the issue. I’ll use Ubuntu as my Linux client with WSL2 and create a file followed by a symlink to a file in the same directory (via ln -s):

    echo "this is a symlink test" > test_symlink.txt
    ln -s test_symlink.txt targetfile.txt
    

    In WSL, I can easily read both the original file (test_symlink.txt) and the symlink (targetfile.txt). But when I try to open the symlink from the Windows file explorer, an error occurs:

    The Windows file explorer error

    The same error occurs when I try to access targetfile.txt from the command line:

    The command line error

    Looking at the directory, I can see the target file, but it has a size of 0 KB:

    The symlink in the directory with a size of 0 KB

    And when I run dir, I can see that Windows recognizes targetfile.txt as an NTFS junction but can’t find where the link points to, like it would for a native Windows symlink:

    Windows can’t find where the link points to.

    When I asked about this behavior on Twitter, Bill Demirkapi had an answer—the link that is created by WSL is an “LX symlink,” which isn’t recognized by Windows. That’s because symlinks on Linux are implemented differently than symlinks on Windows: on Windows, a symlink is an object, implemented and interpreted by the kernel. On Linux, a symlink is simply a file with a special flag, whose content is a path to the destination. The path doesn’t even have to be valid!

    Using FileTest, we can easily verify that this is a Linux symlink, not a Windows link. If you look carefully, you can even see the path to the destination file in the file’s DataBuffer:

    FileTest verifies the link as a Linux symlink.

    FileTest can also provide a more specific error message regarding the file open failure:

    FileTest’s file open failure error message

    It turns out that trying to open this file with NtCreateFile fails with an STATUS_IO_REPARSE_TAG_NOT_HANDLED error, meaning that Windows recognizes this file as a reparse point but can’t identify the LX symlink tag and can’t follow it. Windows knows how to handle some parts of the Linux filesystem, as explained by Microsoft, but that doesn’t include the Linux symlink format.

    If I go back to WSL, the symlink works just fine—the system can see the symlink target and open the file as expected:

    The symlink works in WSL.

    It’s interesting to note that symlinks created on Windows work normally on WSL. I can create a new file in the same directory and create a symlink for it using the Windows command line (cmd.exe):

    echo "this is a test for windows symlink" > test_win_symlink.txt
    mklink win_targetfile.txt test_win_symlink.txt
    

    Now Windows treats this as a regular symlink that it can identify and follow:

    Windows can follow symlinks created on Windows.

    But the Windows symlink works just as well if we access it from within WSL:

    The Windows symlink can also be accessed from WSL.

    We get the same result if we create a file junction using the Windows command line and try to open it with WSL:

    echo "this is a test for windows junctions" > test_win_junction.txt
    mklink /J junction_targetfile.txt test_win_junction.txt
    

    This is how the directory now looks from Windows’s point of view:

    The directory from Windows’s point of view

    And this is how it looks from WSL’s point of view:

    The directory from WSL’s point of view

    Hard links created by WSL do work normally on Windows, so this issue applies only to symlinks.

    To summarize, Windows handles only symlinks that were created by Windows, using its standard tags, and fails to process WSL symlinks of the “LX symlink” type. However, WSL handles both types of symlinks with no issues. If you use Windows and WSL to access the same files, it’s worth paying attention to your symlinks and how they are created to avoid the same issues I ran into.

    One last thing to point out is that when Bill Demirkapi tested this behavior, he noticed that Windows could follow WSL’s symlinks when they were created with a relative path but not with an absolute path. On all systems I tested, Windows couldn’t follow any symlinks created by WSL. So there is still some mystery left here to investigate.

    Master fuzzing with our new Testing Handbook chapter

    9 February 2024 at 14:00

    Our latest addition to the Trail of Bits Testing Handbook is a comprehensive guide to fuzzing: an essential, effective, low-effort method to find bugs in software that involves repeatedly running a program with random inputs to cause unexpected results.

    At Trail of Bits, we don’t just rely on standard static analysis. We tailor our approach to each project, fine-tuning our methods to rigorously fuzz critical code segments. We’ve seen how challenging it can be to start with fuzzing; it’s a field with diverse methodologies and no one-size-fits-all solution. We believe that distilling our knowledge into this handbook will help those seeking to integrate fuzzing into their methodology do so quickly and easily, with better results.

    Designed for developers eager to integrate fuzzing into their workflow, this chapter demystifies the fuzzing process. Within a jungle of fuzzer forks, each with numerous variations, it’s easy to get lost. Our guide focuses on the most proven and widely used fuzzers, providing a solid foundation to get you results.

    This chapter focuses on how to fuzz C/C++ and Rust projects. We describe how to install and start using three of the most mature fuzzers commonly used for C/C++ and Rust projects: libFuzzer, AFL++, and cargo-fuzz. We discuss common challenges when fuzzing, using an example C/C++ project. One of the challenges of starting your fuzzing is that there is no uniform way to set up fuzzing; some developers use CMake, while others use Autotools or plain Makefiles. We will also go through several real-world examples that use different build systems to demonstrate how to fuzz real projects.

    For every language and technology stack, and throughout the chapter, we will show you how to discover the following exemplary bug using each of the discussed fuzzers.

    void check_buf(char *buf, size_t buf_len) {
        if(buf_len > 0 && buf[0] == 'a') {
            if(buf_len > 1 && buf[1] == 'b') {
                if(buf_len > 2 && buf[2] == 'c') {
                    abort();
                }
            }
        }
    }
    

We also describe more advanced techniques, like using AddressSanitizer, a memory sanitizer that detects memory corruption bugs, with each fuzzer, as well as how to use fuzzing dictionaries efficiently and how to write good fuzzing harnesses.

    Our goal is to continuously update the handbook—including this chapter— so that it remains a key resource for security practitioners and developers in configuring, deploying, and automating the tools we use at Trail of Bits. We plan on keeping this chapter updated to reflect future changes to the fuzzing ecosystem and to include the most advanced fuzzing techniques.

    Binary type inference in Ghidra

    7 February 2024 at 14:00

    By Ian Smith

    Trail of Bits is releasing BTIGhidra, a Ghidra extension that helps reverse engineers by inferring type information from binaries. The analysis is inter-procedural, propagating and resolving type constraints between functions while consuming user input to recover additional type information. This refined type information produces more idiomatic decompilation, enhancing reverse engineering comprehension. The figures below demonstrate how BTIGhidra improves decompilation readability without any user interaction:

    Figure 1: Default Ghidra decompiler output

    Figure 2: Ghidra output after running BTIGhidra

    Precise typing information transforms odd pointer arithmetic into field accesses and void* into the appropriate structure type; introduces array indexing where appropriate; and reduces the clutter of void* casts and dereferences. While type information is essential to high-quality decompilation, the recovery of precise type information unfortunately presents a major challenge for decompilers and reverse engineers. Information about a variable’s type is spread throughout the program wherever the variable is used. For reverse engineers, it is difficult to keep a variable’s dispersed usages in their heads while reasoning about a local type. We created BTIGhidra in an effort to make this challenge a thing of the past.

    A simple example

    Let’s see how BTIGhidra can improve decompiler output for an example binary taken from a CTF challenge called mooosl (figure 3). (Note: Our GitHub repository has directions for using the plugin to reproduce this demo.) The target function, called lookup, iterates over nodes in a linked list until it finds a node with a matching key in a hashmap stored in list_heads.1 This function hashes the queried key, then selects the linked list that stores all nodes that have a key equal to that hash. Next, it traverses the linked list looking for a key that is equal to the key parameter.

    Figure 3: Linked-list lookup function from mooosl

    The structure for linked list nodes (figure 4) is particularly relevant to this example. The structure has buffers for the key and value stored in the node, along with sizes for each buffer. Additionally, each node has a next pointer that is either null or points to the next node in the linked list.

    Figure 4: Linked list node structure definition

    Figure 5 shows Ghidra’s initial decompiler output for the lookup function (FUN_001014fb). The overall decompilation quality is low due to poor type information across the function. For example, the recursive pointer next in the source code causes Ghidra to emit a void** type for the local variable (local_18), and the return type. Also, the type of the key_size function parameter, referred to as param_2 in the output, is treated as a void* type despite not being loaded from. Finally, the access to the global variable that holds linked list head nodes, referred to as DAT_00104010, is not treated as an array indexing operation.

    Figure 5: Ghidra decompiler output for the lookup function without type inference.
    Highlighted red text is changed after running type inference.

    Figure 6 shows a diff against the code in figure 5 after running BTIGhidra. Notice that the output now captures the node structure and the recursive type for the next pointer, typed as struct_for_node_0_9* instead of void**. BTIGhidra also resolves the return type to the same type. Additionally, the key_size parameter (param_2) is no longer treated as a pointer. Finally, the type of the global variable is updated to a pointer to linked list node pointers (PTR_00104040), causing Ghidra to treat the load as an array indexing operation.

    Figure 6: Ghidra decompiler output for the lookup function with type inference.
    Highlighted green text was added by type inference.

BTIGhidra infers types by collecting a set of subtyping constraints and then solving those constraints. Usages of known function signatures act as sources for type constraints. For instance, the call to memcmp in figure 5 results in a constraint declaring that param_2 must be a subtype of size_t. Notice in the figure that BTIGhidra also successfully identifies the four fields used in this function, while also recovering the additional fields used elsewhere in the binary.

    Additionally, users can supply a known function signature to provide additional type information for the type inference algorithm to propagate across the decompiled program. Figure 6 demonstrates how new type information from a known function signature (value_dump in this case) flows from a call site to the return type from the lookup function (referred to as FUN_001014fb in the decompiled output) in figure 5. The red line depicts how the user-defined function signature for value_dump is used to infer the types of field_at_8 and field_at_24 for the returned struct_for_node_0_9 from the original function FUN_001014fb. The type information derived from this call is combined with all other call sites to FUN_001014fb in order to remain conservative in the presence of polymorphism.

    Figure 7: Back-propagation of type information derived from value_dump function signature

    Ultimately, BTIGhidra fills in the type information for the recovered structure’s used fields, shown in figure 8. Here, we see that the types for field_at_8 and field_at_24 are inferred via the invocation of value_dump. However, the fields with type undefined8 indicate that the field was not sufficiently constrained by the added function signature to derive an atomic type for the field (i.e., there are no usages that relate the field to known type information); the inference algorithm has determined only that the field must be eight bytes.

    Figure 8: Struct type information table for decompiled linked list nodes

    Ghidra’s decompiler does perform some type propagation using known function signatures provided by its predefined type databases that cover common libraries such as libc. When decompiling the binary’s functions that call known library functions, these type signatures are used to guess likely types for the variables and parameters of the calling function. This approach has several limitations. Ghidra does not attempt to synthesize composite types (i.e., structs and unions) without user intervention; it is up to the user to define when and where structs are created. Additionally, this best-effort type propagation approach has limited inter-procedural power. As shown in figure 9, Ghidra’s default type inference results in conflicting types for FUN_1014fb and FUN_001013db (void* versus long and ulong), even though parameters are passed directly between the two functions.

    Figure 9: Default decompiler output using Ghidra’s basic type inference

    Our primary motivation for developing BTIGhidra is the need for a type inference algorithm in Ghidra that can propagate user-provided type information inter-procedurally. For such an algorithm to be useful, it should not guess a “wrong” type. If the user submits precise and correct type information, then the type inference algorithm should not derive conflicting type information that prevents user-provided types from being used. For instance, if the user provides a correct type float and we infer a type int, then these types will conflict resulting in a type error (represented formally by a bottom lattice value). Therefore, inferred types must be conservative; the algorithm should not derive a type for a program variable that conflicts with its source-level type. In a type system with subtyping, this property can be phrased more precisely as “an inferred type for a program variable should always be a supertype of the actual type of the program variable.”

    In addition to support for user-provided types, BTIGhidra overcomes many other shortcomings of Ghidra’s built-in type inference algorithm. Namely, BTIGhidra can operate over stripped binaries, synthesize composite types, ingest user-provided type constraints, derive conservative typing judgments, and collect a well-defined global view of a binary’s types.

    Bringing type-inference to binaries

    At the source level, type inference algorithms work by collecting type constraints on program terms that are expressed in the program text, which are then solved to produce a type for each term. BTIGhidra operates on similar principles, but needs to compensate for information loss introduced by compilation and C’s permissive types. BTIGhidra uses an expressive type system that supports subtyping, polymorphism, and recursive types to reason about common programming idioms in C that take advantage of the language’s weak types to emulate these type system features. Also, subtyping, when combined with reaching definitions analysis, allows the type inference algorithm to handle compiler-introduced behavior, such as register and stack variable reuse.

    Binary type inference proceeds similarly, but information lost during compilation increases the difficulty of collecting type constraints. To meet this challenge, BTIGhidra runs various flow-sensitive data-flow analyses (e.g., value-set analysis) provided by and implemented using FKIE-CAD’s cwe_checker to track how values flow between program variables. These flows inform which variables or memory objects must be subtypes of other objects. Abstractly, if a value flows from a variable x into a variable y, then we can conservatively conclude that x is a subtype of y.
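As a toy illustration of that idea (this is not BTIGhidra’s actual constraint engine), data-flow facts can be turned into subtyping constraints and closed under transitivity:

    from itertools import product

    # Each data-flow fact "x flows into y" becomes a constraint "x <: y".
    # The variable names here are hypothetical.
    flows = [("param_2", "memcmp.arg2"), ("memcmp.arg2", "size_t")]
    subtype_of = set(flows)

    # Close the relation under transitivity: x <: y and y <: z imply x <: z.
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(subtype_of), repeat=2):
            if b == c and (a, d) not in subtype_of:
                subtype_of.add((a, d))
                changed = True

    assert ("param_2", "size_t") in subtype_of  # param_2 must be usable as a size_t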

    Using this data-flow information, BTIGhidra independently generates subtyping constraints for each strongly connected component (SCC)2 of functions in the binary’s call graph. Next, BTIGhidra simplifies signatures by using a set of proof rules to solve for all derivable relationships between interesting variables (i.e., type constants like int and size_t, functions, and global variables) within an SCC. These signatures act as a summary of the function’s typing effects when it is called. Finally, BTIGhidra solves for the type sketch of each SCC, using the signatures of called SCCs as needed.

    Type sketches are our representation of recursively constrained types. They represent a type as a directed graph, with edges labeled by fields that represent the capabilities of a type and nodes labeled by a bound [lb,ub]. Figure 10 shows an example of a type sketch for the value_dump function signature. As an example, the path from node 3 to 8 can be read as “the type with ID 3 is a function that has a second in parameter which is an atomic type that is a subtype of size_t and a supertype of bottom.” These sketches provide a convenient representation of types when lowering to C types through a fairly straightforward graph traversal. Type sketches also form a lattice with a join and meet operator defined by language intersection and union, respectively. These operations are useful for manipulating types while determining the most precise polymorphic type we can infer for each function in the binary. Join allows the algorithm to determine the least supertype of two sketches, and meet allows the algorithm to determine the greatest subtype of two sketches.

    Figure 10: Type sketch for the value_dump function signature
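
    The sketch in figure 10 can be read as a labeled directed graph. Purely as an illustration (this is not BTIGhidra’s internal representation, and the "in_1" edge label is our own shorthand for “second in parameter”), a type sketch could be modeled in Go like this, with a bound on each node and capability-labeled edges:

    package main

    import "fmt"

    // bound is the [lb, ub] lattice interval attached to each sketch node,
    // e.g., lb = "bottom" and ub = "size_t" for an atomic parameter type.
    type bound struct {
        lb, ub string
    }

    // sketchNode is a node in the type sketch graph; edges are labeled with
    // capabilities such as "in_1" (second in parameter) or "load".
    type sketchNode struct {
        id    int
        bound bound
        edges map[string]*sketchNode
    }

    func main() {
        // A fragment of a function sketch: the function's second in parameter
        // is an atomic type bounded above by size_t and below by bottom.
        param := &sketchNode{id: 8, bound: bound{lb: "bottom", ub: "size_t"}}
        fn := &sketchNode{id: 3, edges: map[string]*sketchNode{"in_1": param}}
        p := fn.edges["in_1"]
        fmt.Printf("node %d.in_1 -> node %d with bound [%s, %s]\n",
            fn.id, p.id, p.bound.lb, p.bound.ub)
    }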

    The importance of polymorphic type inference

    Using a type system that supports polymorphism may seem odd for inferring C types when C has no explicit support for polymorphism. However, polymorphism is critical for maintaining conservative types in the presence of C idioms, such as handling multiple types in a function by dispatching over a void pointer. Perhaps the most canonical examples of polymorphic functions in C are malloc and free.

    Figure 11: Example program that uses free polymorphically

    In the example above, we consider a simple (albeit contrived) program that passes two structs to free. We access the fields of both foo and bar to reveal field information to the type inference algorithm. To demonstrate the importance of polymorphism, I modified the constraint generation phase of type inference to generate a single formal type variable for each function, rather than a type variable per call site. This change has the effect of unifying all constraints on free, regardless of the calling context.

    The resulting unsound decompilation is as follows:

    struct_for_node_0_13 * produce(struct_for_node_0_13 *param_1, struct_for_node_0_13 *param_2)
    
    {
      param_1->field_at_0 = param_2->field_at_8;
      param_1->field_at_8 = param_2->field_at_0;
      param_1->field_at_16 = param_2->field_at_0;
      free(param_1);
      free(param_2);
      return param_1;
    }
    

    Figure 12: Unsound inferred type for the parameters to produce

    The assumption that function calls are non-polymorphic leads to inferring an over-precise type for the function’s parameters (shown in figure 12), causing both parameters to have the same type with three fields.

    Instead of unifying all call sites of a function, BTIGhidra generates a type variable per call site and unifies the actual parameter type with the formal parameter type only if the inferred type is structurally equal after a refinement pass. This conservative assumption allows BTIGhidra to remain sound and derive the two separate types for the parameters to the function in figure 11:

    struct_for_node_0_16 * produce(struct_for_node_0_16 *param_1, struct_for_node_0_20 *param_2)
    
    {
      param_1->field_at_0 = param_2->field_at_8;
      param_1->field_at_8 = param_2->field_at_0;
      param_1->field_at_16 = param_2->field_at_0;
      free(param_1);
      free(param_2);
      return param_1;
    } 
    

    Evaluating BTIGhidra

    Inter-procedural type inference on binaries operates over a vast set of information collected on the target program. Each analysis involved is a hard computational problem in its own right. Ghidra and our flow-sensitive analyses use heuristics related to control flow, ABI information, and other constructs. These heuristics can lead to incorrect type constraints, which can have wide-ranging effects when propagated.

    Mitigating these issues requires a strong testing and validation strategy. In addition to BTIGhidra itself, we also released BTIEval, a tool for evaluating the precision of type inference on binaries with known ground-truth types. BTIEval takes a binary with debug information and compares the types recovered by BTIGhidra to those in the debug information (the debug info is ignored during type inference). The evaluation utility aggregates soundness and precision metrics. Utilizing BTIEval more heavily and over more test binaries will help us provide better correctness guarantees to users. BTIEval also collects timing information, allowing us to evaluate the performance impacts of changes.

    Give BTIGhidra a try

    The pre-built Ghidra plugin is located here or can be built from the source. The walkthrough instructions are helpful for learning how to run the analysis and update it with new type signatures. We look forward to getting feedback on the tool and welcome any contributions!

    Acknowledgments

    BTIGhidra’s underlying type inference algorithm was inspired by and is based on an algorithm proposed by Noonan et al. The methods described in the paper are patented under process patent US10423397B2 held by GrammaTech, Inc. Any opinions, findings, conclusions, or recommendations expressed in this blog post are those of the author(s) and do not necessarily reflect the views of GrammaTech, Inc.

    We would also like to thank the team at FKIE-CAD behind CWE Checker. Their static analysis platform over Ghidra PCode provided an excellent base set of capabilities in our analysis.

    This research was conducted by Trail of Bits based upon work supported by DARPA under Contract No. HR001121C0111 (Distribution Statement A, Approved for Public Release: Distribution Unlimited). Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the United States Government or DARPA.

    1Instructions for how to use the plugin to reproduce this demo are available here.
2A strongly connected component of a graph is a set of nodes in a directed graph where there exists a path from each node in the set to every other node in the set. Conceptually, computing SCCs separates the call graph into groups of mutually recursive functions; functions in different groups do not recursively call each other.

    Improving the state of Cosmos fuzzing

    5 February 2024 at 14:00

    By Gustavo Grieco

    Cosmos is a platform enabling the creation of blockchains in Go (or other languages). Its reference implementation, Cosmos SDK, leverages strong fuzz testing extensively, following two approaches: smart fuzzing for low-level code, and dumb fuzzing for high-level simulation.

    In this blog post, we explain the differences between these approaches and show how we added smart fuzzing on top of the high-level simulation framework. As a bonus, our smart fuzzer integration led us to identify and fix three minor issues in Cosmos SDK.

    Laying low

    The first approach to Cosmos code fuzzing leverages well-known smart fuzzers such as AFL, go-fuzz, or Go native fuzzing for specific parts of the code. These tools rely on source code instrumentation to extract useful information to guide a fuzzing campaign. This is essential to explore the input space of a program efficiently.

    Using fuzzing for low-level testing of Go functions in Cosmos SDK is very straightforward. First, we select a suitable target function, usually stateless code, such as testing the parsing of normalized coins:

    func FuzzTypesParseCoin(f *testing.F) {
        f.Fuzz(func(t *testing.T, data []byte) {
            _, _ = types.ParseCoinNormalized(string(data))
        })
    }

    Figure 1: A small fuzz test for testing the parsing of normalized coins

Smart fuzzers can quickly find issues in stateless code like this; however, applying them only to low-level code will not uncover the more complex and interesting issues that arise during cosmos-sdk execution.

    Moving up!

    If we want to catch more interesting bugs, we need to go beyond low-level fuzz testing in Cosmos SDK. Fortunately, there is already a high-level approach for testing: this works from the top down, instead of the bottom up. Specifically, cosmos-sdk provides the Cosmos Blockchain Simulator, a high-level, end-to-end transaction fuzzer, to uncover issues in Cosmos applications.

This tool executes transactions built from random operations, starting either from a random genesis state or a predefined one. To get it to work, application developers must implement several important functions that generate both a random genesis state and random transactions. Fortunately for us, these functions are already implemented for all the cosmos-sdk features.

    For instance, to test the MsgSend operation from the x/nft module, the developers defined the SimulateMsgSend function to generate a random NFT transfer:

    // SimulateMsgSend generates a MsgSend with random values.
    func SimulateMsgSend(
            cdc *codec.ProtoCodec,
            ak nft.AccountKeeper,
            bk nft.BankKeeper,
            k keeper.Keeper,
    ) simtypes.Operation {
            return func(
                    r *rand.Rand, app *baseapp.BaseApp, ctx sdk.Context, accs []simtypes.Account, chainID string,
            ) (simtypes.OperationMsg, []simtypes.FutureOperation, error) {
                    sender, _ := simtypes.RandomAcc(r, accs)
                    receiver, _ := simtypes.RandomAcc(r, accs)
                    …

    Figure 2: Header of the SimulateMsgSend function from the x/nft module

While the simulator can produce end-to-end executions of transaction sequences, there is an important difference from smart fuzzers such as go-fuzz: when the simulator is invoked, it uses only a single source of randomness for producing values. This source is configured when the simulation starts:

    func SimulateFromSeed(
            tb testing.TB,
            w io.Writer,
            app *baseapp.BaseApp,
            appStateFn simulation.AppStateFn,
            randAccFn simulation.RandomAccountFn,
            ops WeightedOperations,
            blockedAddrs map[string]bool,
            config simulation.Config,
            cdc codec.JSONCodec,
    ) (stopEarly bool, exportedParams Params, err error) {
            // in case we have to end early, don't os.Exit so that we can run cleanup code.
            testingMode, _, b := getTestingMode(tb)
    
            fmt.Fprintf(w, "Starting SimulateFromSeed with randomness created with seed %d\n", int(config.Seed))
            r := rand.New(rand.NewSource(config.Seed))
            params := RandomParams(r)
            …

    Figure 3: Header of the SimulateFromSeed function

    Since the simulation mode will only loop through a number of purely random transactions, it is pure random testing (also called dumb fuzzing).

    Why don’t we have both?

    It turns out, there is a simple way to combine these approaches, allowing the native Go fuzzing engine to randomly explore the cosmos-sdk genesis, the generation of transactions, and the block creation. The first step is to create a fuzz test that invokes the simulator. We based this code on the unit tests in the same file:

    func FuzzFullAppSimulation(f *testing.F) {
        f.Fuzz(func(t *testing.T, input [] byte) {
           …
           config.ChainID = SimAppChainID
    
           appOptions := make(simtestutil.AppOptionsMap, 0)
           appOptions[flags.FlagHome] = DefaultNodeHome
           appOptions[server.FlagInvCheckPeriod] = simcli.FlagPeriodValue
            db := dbm.NewMemDB()
           logger := log.NewNopLogger()
    
           app := NewSimApp(logger, db, nil, true, appOptions, interBlockCacheOpt(), baseapp.SetChainID(SimAppChainID))
           require.Equal(t, "SimApp", app.Name())
    
           // run randomized simulation
           _,_, err := simulation.SimulateFromSeed(
                   t,
                   os.Stdout,
                   app.BaseApp,
                   simtestutil.AppStateFn(app.AppCodec(), app.SimulationManager(), app.DefaultGenesis()),
                   simtypes.RandomAccounts,
                   simtestutil.SimulationOperations(app, app.AppCodec(), config),
                   BlockedAddresses(),
                   config,
                   app.AppCodec(),
           )
    
           if err != nil {
                   panic(err)
           }
    })
}

    Figure 4: Template of a Go fuzz test running a full simulation of cosmos-sdk

    We still need a way to let the fuzzer control possible inputs. A simple approach would be to let the smart fuzzer directly control the seed of the random value generator:

    func FuzzFullAppSimulation(f *testing.F) {
        f.Fuzz(func(t *testing.T, input [] byte) {
           config.Seed = IntFromBytes(input)
           …

    Figure 5: A fuzz test that receives a single seed as input

    func SimulateFromSeed(
            …
            config simulation.Config,
            … 
    ) (stopEarly bool, exportedParams Params, err error) {
            …
            r := rand.New(rand.NewSource(config.Seed))
            …

    Figure 6: Lines modified in SimulateFromSeed to load a seed from the fuzz test

However, there is an important flaw in this: changing the seed directly gives the fuzzer only a very limited amount of control over the input, so its smart mutations will be very ineffective. Instead, we need to allow the fuzzer to control the values produced by the random number generator, but without refactoring every simulated function in every module. 😱

    Against all odds

The Go standard library already ships a variety of general-purpose functions and data structures. In that sense, Go has “batteries included.” In particular, it provides a random number generator in the math/rand package:

    // A Rand is a source of random numbers.
    type Rand struct {
        src Source
        s64 Source64 // non-nil if src is source64
    
        // readVal contains remainder of 63-bit integer used for bytes
        // generation during most recent Read call.
        // It is saved so next Read call can start where the previous
        // one finished.
        readVal int64
        // readPos indicates the number of low-order bytes of readVal
        // that are still valid.
        readPos int8
    }
    
    … 
    // Seed uses the provided seed value to initialize the generator to a deterministic state.
    // Seed should not be called concurrently with any other Rand method.
    func (r *Rand) Seed(seed int64) {
        if lk, ok := r.src.(*lockedSource); ok {
            lk.seedPos(seed, &r.readPos)
            return
        }
    
        r.src.Seed(seed)
        r.readPos = 0
    }
    
    // Int63 returns a non-negative pseudo-random 63-bit integer as an int64.
    func (r *Rand) Int63() int64 { return r.src.Int63() }
    
    // Uint32 returns a pseudo-random 32-bit value as a uint32.
    func (r *Rand) Uint32() uint32 { return uint32(r.Int63() >> 31) }
    …

    Figure 7: Rand data struct and some of its implementation code

However, we can’t easily provide an alternative implementation of Rand itself, because it was declared as a concrete type and not as an interface. But we can still provide our own implementation of its randomness source (Source/Source64):

    // A Source64 is a Source that can also generate
    // uniformly-distributed pseudo-random uint64 values in
    // the range [0, 1<<64) directly.
    // If a Rand r's underlying Source s implements Source64,
    // then r.Uint64 returns the result of one call to s.Uint64
    // instead of making two calls to s.Int63.
    type Source64 interface {
        Source
        Uint64() uint64
    }

    Figure 8: Source64 data type

    Let’s replace the default Source with a new one that uses the input from the fuzzer (e.g., an array of int64) as a deterministic source of randomness (arraySource):

    type arraySource struct {
        pos int
        arr []int64
        src *rand.Rand
    }

    // Uint64 returns a non-negative pseudo-random 64-bit integer as an uint64,
    // taking values from the fuzzer-provided array while any remain and
    // falling back to the wrapped random source afterwards.
    func (rng *arraySource) Uint64() uint64 {
        if rng.pos >= len(rng.arr) {
            return rng.src.Uint64()
        }
        val := rng.arr[rng.pos]
        rng.pos = rng.pos + 1
        if val < 0 {
            // The array stores signed integers; flip negative values so the
            // result is non-negative before converting.
            return uint64(-val)
        }
        return uint64(val)
    }


Figure 9: An implementation of Uint64() that draws values from our deterministic source of randomness (an array of signed integers)

    This new type of source either pops a number from the array or produces a random value from a standard random source if the array was fully consumed. This allows the fuzzer to continue even if all the deterministic values were consumed.
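
    For arraySource to satisfy math/rand’s Source64 interface, it also needs Int63 and Seed methods, and we need a way to turn the raw fuzzer input into the int64 array. The sketch below (which assumes the "encoding/binary" and "math/rand" imports) shows one way to do this; the newArrayRand helper and the little-endian decoding scheme are our own illustrative choices rather than the exact code from our patch.

    // Int63 derives a non-negative 63-bit integer from Uint64, mirroring how
    // math/rand's built-in sources behave.
    func (rng *arraySource) Int63() int64 {
        return int64(rng.Uint64() & (1<<63 - 1))
    }

    // Seed is required by rand.Source but is a no-op here: the output is fully
    // determined by the fuzzer-provided array plus the fallback source.
    func (rng *arraySource) Seed(seed int64) {}

    // newArrayRand decodes the fuzzer's []byte input into int64 values (8 bytes
    // each, little-endian) and wraps them in a *rand.Rand for the simulator.
    func newArrayRand(input []byte, fallbackSeed int64) *rand.Rand {
        vals := make([]int64, 0, len(input)/8)
        for len(input) >= 8 {
            vals = append(vals, int64(binary.LittleEndian.Uint64(input[:8])))
            input = input[8:]
        }
        src := &arraySource{
            arr: vals,
            src: rand.New(rand.NewSource(fallbackSeed)),
        }
        return rand.New(src)
    }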

    Ready, Set, Go!

    Once we have modified the code to properly control the random source, we can leverage Go fuzzing like this:

    $ go test -mod=readonly -run=_ -fuzz=FuzzFullAppSimulation -GenesisTime=1688995849 -Enabled=true -NumBlocks=2 -BlockSize=5 -Commit=true -Seed=0 -Period=1 -Verbose=1 -parallel=15
    fuzz: elapsed: 0s, gathering baseline coverage: 0/1 completed
    fuzz: elapsed: 1s, gathering baseline coverage: 1/1 completed, now fuzzing with 15 workers
    fuzz: elapsed: 3s, execs: 16 (5/sec), new interesting: 0 (total: 1)
    fuzz: elapsed: 6s, execs: 22 (2/sec), new interesting: 0 (total: 1)
    …
    fuzz: elapsed: 54s, execs: 23 (0/sec), new interesting: 0 (total: 1)
    fuzz: elapsed: 57s, execs: 23 (0/sec), new interesting: 0 (total: 1)
    fuzz: elapsed: 1m0s, execs: 23 (0/sec), new interesting: 0 (total: 1)
    fuzz: elapsed: 1m3s, execs: 23 (0/sec), new interesting: 5 (total: 6)
    fuzz: elapsed: 1m6s, execs: 30 (2/sec), new interesting: 10 (total: 11)
    fuzz: elapsed: 1m9s, execs: 38 (3/sec), new interesting: 11 (total: 12)

    Figure 10: A short fuzzing campaign using the new approach

After running this code for a few hours, we had collected a small trophy case of low-severity bugs: the three minor issues in Cosmos SDK mentioned at the beginning of this post.

We provided the Cosmos SDK team with our patch for improving the simulation tests, and we are in the process of discussing how to better integrate it into the master branch.

    Chaos Communication Congress (37C3) recap

    2 February 2024 at 14:00

    Last month, two of our engineers attended the 37th Chaos Communication Congress (37C3) in Hamburg, joining thousands of hackers who gather each year to exchange the latest research and achievements in technology and security. Unlike other tech conferences, this annual gathering focuses on the interaction of technology and society, covering such topics as politics, entertainment, art, sustainability—and, most importantly, security. At the first Congress in the 80s, hackers showcased weaknesses in banking applications over the German BTX system; this year’s theme, “Unlocked,” highlighted breaking technological barriers and exploring new frontiers in digital rights and privacy.

In this blog post, we will review our contributions to 37C3—spanning binary exploitation, binary analysis, and fuzzing—before highlighting several talks we attended that we recommend listening to.

    PWNing meetups

    Trail of Bits engineer Dominik Czarnota self-organized two sessions about PWNing, also known as binary exploitation. These meetups showcased Pwndbg and Pwntools, popular tools used during CTF competitions, reverse engineering, and vulnerability research work.

At the first session, Dominik presented Pwndbg, a plugin for GDB that enhances the debugging of low-level code by displaying useful context on each program stop. This context includes the state of the debugged program (its registers, executable code, and stack memory) and dereferenced pointers, which help the user understand the program’s behavior. The presentation showed some of Pwndbg’s features and commands, such as listing memory mappings (vmmap), displaying process information (procinfo), searching memory (search), finding pointers to specific memory mappings (p2p), identifying stack canary values (canary), and controlling the process execution (nextsyscall, stepuntilasm, etc.). The presentation concluded with a release of Pwndbg cheatsheets and details on upcoming features, such as tracking GOT function executions and glibc heap use-after-free analysis. These features have been developed as part of Trail of Bits’s winternship program, now in its thirteenth year of welcoming interns who spend time working and doing research on the industry’s most challenging problems.



At the second session, Arusekk and Peace-Maker showcased advanced features of Pwntools, a Swiss-army-knife Python library useful for exploit development. They demonstrated expert methods for receiving and sending data (e.g., io.recvregex or io.recvpred); command-line tricks when running exploit scripts (cool environment variables or arguments like DEBUG, NOASLR, or LOG_FILE that set certain config options); and other neat features like the libcdb command-line tool, the shellcraft module, and the ROP (return-oriented programming) helper. For those who missed it, the slides can be found here.

    Next generation fuzzing

In Fuzz Everything, Everywhere, All at Once, the AFL++ and LibAFL team showcased new features in the LibAFL fuzzer. They presented QEMU-based instrumentation to fuzz binary-only targets and used QEMU hooks to enable sanitizers that help find bugs. In addition to QASan—the team’s QEMU-based AddressSanitizer implementation—the team developed an injection sanitizer that goes beyond finding just memory corruption bugs. Using QEMU hooks, injections such as SQL, LDAP, XSS, or OS command injection can be detected by defining certain rules in a TOML configuration file. Examination of the config file suggests it should be easily extensible to other injections; we just need to know which functions to hook and which payloads to look for.

    Although memory corruption bugs will decline with the deployment of memory-safe languages like Rust, fuzzing will continue to play an important role in uncovering other bug classes like injections or logic bugs, so it’s great to see new tools created to detect them.

    This presentation’s Q&A session reminded us that oss-fuzz already has a SystemSanitizer that leverages the ptrace syscall, which helped to find a command injection vulnerability in the past.

    In the past, Trail of Bits has used LibAFL in our collaboration with Inria on an academic research project called tlspuffin. The goal of the project was to fuzz various TLS implementations, which uncovered several bugs in wolfSSL.

    Side channels everywhere

    In a talk titled Full AACSess: Exposing and exploiting AACSv2 UHD DRM for your viewing pleasure, Adam Batori presented a concept for side-channel attacks on Intel SGX. Since Trail of Bits frequently conducts audits on projects that use trusted execution environments like Intel SGX (e.g., Mobilecoin), this presentation was particularly intriguing to us.

After providing an overview of the history of DRM for physical media, Adam went into detail on how the team of researchers behind sgx.fail extracted cryptographic key material from the SGX enclave to break DRM on UHD Blu-ray disks, proving the feasibility of real-world side-channel attacks on secure enclaves. Along the way, he discussed many technological features of SGX.

    The work and talk prompted discussion about Intel’s decision to discontinue SGX on consumer hardware. Due to the high risk of side channels on low-cost consumer devices, we believe that using Intel SGX for DRM purposes is already dead on arrival. Side-channel attacks are just one example of the often-overlooked challenges that accompany the secure use of enclaves to protect data.

    New challenges: Reverse-engineering Rust

Trail of Bits engineers frequently audit software written in Rust. In Rust Binary Analysis, Feature by Feature, Ben Herzog discussed the compilation output of the Rust compiler. Understanding how Rust builds binaries is important, for example, to optimize Rust programs or to understand the interaction between safe and unsafe Rust code. The talk focused on the debug compilation mode to showcase how the Rust compiler generates code for iterating over ranges, how it handles iterators, and how it optimizes the layout of Rust enums. The presenter also noted that strings in Rust are not null-terminated, which can cause some reverse-engineering tools like Ghidra to produce hard-to-understand output.

    The talk author posed four questions that should be answered when encountering function calls related to traits:

    • What is the name of the function being called (e.g., next)?
    • On what type is the function defined (e.g., Values<String, Person>)?
    • Which type is returned from the function (e.g., Option)?
    • What trait is the function part of (e.g., Iterator<Type=Person>)?

    More details can be found in the blog post by Ben Herzog.

    Proprietary cryptography is considered harmful

Under the name TETRA:BURST, researchers disclosed multiple vulnerabilities in the TETRA radio protocol, which is used by government agencies, police, the military, and critical infrastructure across Europe and other areas. It is striking how proprietary cryptography is still the default in some industries: hiding the specification from security researchers by requiring them to sign an NDA greatly limits a system’s reviewability.

Due to export controls, several classes of algorithms exist in TETRA. One of the older ones, TEA1, uses a key length of only 32 bits. Even though the specifiers no longer recommend using it, TEA1 is still actively deployed in the field, which is especially problematic given that these weak algorithms are counted on to protect critical infrastructure.

    The researchers demonstrated the exploitability of the vulnerabilities by acquiring radio hardware from online resellers.

    Are you sure you own your train? Do you own your car?

    In Breaking “DRM” in Polish trains, researchers reported the challenges they encountered after they were recruited by an independent train repair company to determine why some trains no longer operated after being serviced.
Using reverse engineering, the researchers uncovered several anti-features in the trains that made them stop working in various situations (e.g., after they didn’t move for a certain time or when they were located at GPS locations of competitors’ service shops). The talk covers interesting technical details about train software and how the researchers reverse-engineered the firmware, and it questions the extent to which users should have control over the vehicles or devices they own.

    What can we learn from hackers as developers and auditors?

Hackers possess a unique problem-solving mindset, showing developers and auditors the importance of creative and unconventional thinking in cybersecurity. The event highlighted the necessity of securing systems correctly, starting with a well-understood threat model. Incorrect or proprietary approaches that rely on obfuscation do not adequately protect the end products. Controls such as hiding cryptographic primitives behind an NDA only obfuscate how the protocol works; they do not make the system more secure, and they make security researchers’ jobs harder.

    Emphasizing continuous learning, the congress demonstrated the ever-evolving nature of cybersecurity, urging professionals to stay abreast of the latest threats and technologies. Ethical considerations were a focal point, stressing the responsibility of developers and auditors to respect user privacy and data security in their work.

    The collaborative spirit of the hacker community, as seen at 37C3, serves as a model for open communication and mutual learning within the tech industry. At Trail of Bits, we are committed to demonstrating these values by sharing knowledge publicly through publishing blog posts like this one, resources like the Testing Handbook that help developers secure their code, and documentation about our research into zero-knowledge proofs.

    Closing words

We highly recommend attending 37C3 in person, even though the event is unfortunately scheduled between Christmas and New Year’s and most talks are live-streamed and available online. The congress includes many self-organized sessions, workshops, and assemblies, making it especially helpful for security researchers. We had initially planned to disclose our recently published LeftoverLocals bug, a vulnerability that affects notable GPU vendors like AMD, Qualcomm, and Apple, at 37C3, but we held off our release date to give GPU vendors more time to fix the bug. The bug disclosure was finally published on January 16; we may report our experience finding and disclosing the bug at next year’s 38C3!

    Introducing DIFFER, a new tool for testing and validating transformed programs

    31 January 2024 at 14:30

    By Michael Brown

    We recently released a new differential testing tool, called DIFFER, for finding bugs and soundness violations in transformed programs. DIFFER combines elements from differential, regression, and fuzz testing to help users find bugs in programs that have been altered by software rewriting, debloating, and hardening tools. We used DIFFER to evaluate 10 software debloating tools, and it discovered debloating failures or soundness violations in 71% of the transformed programs produced by these tools.

    DIFFER fills a critical need in post-transformation software validation. Program transformation tools usually leave this task entirely to users, who typically have few (if any) tools beyond regression testing via existing unit/integration tests and fuzzers. These approaches do not naturally support testing transformed programs against their original versions, which can allow subtle and novel bugs to find their way into the modified programs.

    We’ll provide some background research that motivated us to create DIFFER, describe how it works in more detail, and discuss its future.

    If you prefer to go straight to the code, check out DIFFER on GitHub.

    Background

    Software transformation has been a hot research area over the past decade and has primarily been motivated by the need to secure legacy software. In many cases, this must be done without the software’s source code (binary only) because it has been lost, is vendor-locked, or cannot be rebuilt due to an obsolete build chain. Among the more popular research topics that have emerged in this area are binary lifting, recompiling, rewriting, patching, hardening, and debloating.

    While tools built to accomplish these goals have demonstrated some successes, they carry significant risks. When compilers lower source code to binaries, they discard contextual information once it is no longer needed. Once a program has been lowered to binary, the contextual information necessary to safely modify the original program generally cannot be fully recovered. As a result, tools that modify program binaries directly may inadvertently break them and introduce new bugs and vulnerabilities.

    While DIFFER is application-agnostic, we originally built this tool to help us find bugs in programs that have had unnecessary features removed with a debloating tool (e.g., Carve, Trimmer, Razor). In general, software debloaters try to minimize a program’s attack surface by removing unnecessary code that may contain latent vulnerabilities or be reused by an attacker using code-reuse exploit patterns. Debloating tools typically perform an analysis pass over the program to map features to the code necessary to execute them. These mappings are then used to cut code that corresponds to features the user doesn’t want. However, these cuts will likely be imprecise because generating the mappings relies on imprecise analysis steps like binary recovery. As a result, new bugs and vulnerabilities can be introduced into debloated programs during cutting, which is exactly what we have designed DIFFER to detect.

    How does DIFFER work?

    At a high level, DIFFER (shown in figure 1) is used to test an unmodified version of the program against one or more modified variants of the program. DIFFER allows users to specify seed inputs that correspond to both unmodified and modified program behaviors and features. It then runs the original program and the transformed variants with these inputs and compares the outputs. Additionally, DIFFER supports template-based mutation fuzzing of these seed inputs. By providing mutation templates, DIFFER can maximize its coverage of the input space and avoid missing bugs (i.e., false negatives).

    DIFFER expects to see the same outputs for the original and variant programs when given inputs that correspond to unmodified features. Conversely, it expects to see different outputs when it executes the programs with inputs corresponding to modified features. If DIFFER detects unexpected matching, differing, or crashing outputs, it reports them to the user. These reports help the user identify errors in the modified program resulting from the transformation process or its configuration.

    Figure 1: Overview of DIFFER
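
    DIFFER itself is implemented in Python and driven by configuration files; purely to illustrate the core differential loop, here is a minimal Go sketch (the binary paths, the stdin-based input model, and the function names are our own assumptions, not DIFFER’s API):

    package main

    import (
        "bytes"
        "fmt"
        "os"
        "os/exec"
    )

    // runOnce executes a binary with the given input on stdin and returns its
    // stdout and exit code.
    func runOnce(path string, input []byte) ([]byte, int, error) {
        cmd := exec.Command(path)
        cmd.Stdin = bytes.NewReader(input)
        var out bytes.Buffer
        cmd.Stdout = &out
        err := cmd.Run()
        if exitErr, ok := err.(*exec.ExitError); ok {
            return out.Bytes(), exitErr.ExitCode(), nil
        }
        return out.Bytes(), 0, err
    }

    func main() {
        // Hypothetical paths: the original binary and a transformed variant.
        input := []byte("some seed input\n")
        origOut, origCode, err1 := runOnce("./target-original", input)
        varOut, varCode, err2 := runOnce("./target-variant", input)
        if err1 != nil || err2 != nil {
            fmt.Fprintln(os.Stderr, "execution failed:", err1, err2)
            os.Exit(1)
        }
        // For an unmodified feature we expect identical outputs; a mismatch
        // is reported for the user to triage.
        if origCode != varCode || !bytes.Equal(origOut, varOut) {
            fmt.Println("divergence detected: possible transformation bug")
        }
    }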

    When configuring DIFFER, the user selects one or more comparators to use when comparing outputs. While DIFFER provides many built-in comparators that check basic outputs such as return codes, console text, and output files, more advanced comparators are often needed. For this purpose, DIFFER allows users to add custom comparators for complex outputs like packet captures. Custom comparators are also useful for reducing false-positive reports by defining allowable differences in outputs (such as timestamps in console output). Our open-source release of DIFFER contains many useful comparator implementations to help users easily write their own comparators.
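
    Custom comparators essentially define what counts as an allowable difference. As a toy illustration of the idea (in Go rather than DIFFER’s Python comparator API), a console comparator that masks timestamps before comparing could look like this:

    package main

    import (
        "fmt"
        "regexp"
    )

    // timestampRe matches HH:MM:SS timestamps, a common source of benign
    // differences in console output.
    var timestampRe = regexp.MustCompile(`\b\d{2}:\d{2}:\d{2}\b`)

    // consoleEqual treats two console outputs as equivalent if they match
    // after masking timestamps, avoiding false-positive reports.
    func consoleEqual(a, b string) bool {
        mask := func(s string) string { return timestampRe.ReplaceAllString(s, "<TS>") }
        return mask(a) == mask(b)
    }

    func main() {
        fmt.Println(consoleEqual("12:01:13 server started", "12:04:59 server started")) // true
        fmt.Println(consoleEqual("server started", "server crashed"))                   // false
    }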

    However, DIFFER does not and cannot provide formal guarantees of soundness in transformation tools or the modified programs they produce. Like other dynamic analysis testing approaches, DIFFER cannot exhaustively test the input space for complex programs in the general case.

    Use case: evaluating software debloaters

    In a recent research study we conducted in collaboration with our friends at GrammaTech, we used DIFFER to evaluate debloated programs created by 10 different software debloating tools. We used these tools to remove unnecessary features from 20 different programs of varying size, complexity, and purpose. Collectively, the tools created 90 debloated variant programs that we then validated with DIFFER. DIFFER discovered that 39 (~43%) of these variants still had features that debloating tools failed to remove. Even worse, DIFFER found that 25 (~28%) of the variants either crashed or produced incorrect outputs in retained features after debloating.

    By discovering these failures, DIFFER has proven itself as a useful post-transformation validation tool. Although this study was focused on debloating transformations, we want to emphasize that DIFFER is general enough to test other transformation tools such as those used for software hardening (e.g., CFI, stack protections), translation (e.g., C-to-Rust transformers), and surrogacy (e.g., ML surrogate generators).

    What’s next?

    With DIFFER now available as open-source software, we invite the security research community to use, extend, and help maintain DIFFER via pull requests. We have several specific improvements planned as we continue to research and develop DIFFER, including the following:

    • Support running binaries in Docker containers to reduce environmental burdens.
    • Add new built-in comparators.
    • Add support for targets that require superuser privileges.
    • Support monitoring multiple processes that make up distributed systems.
    • Add runtime comparators (via instrumentation, etc.) for “deep” equivalence checks.

    Acknowledgements

    This material is based on work supported by the Office of Naval Research (ONR) under Contract No. N00014-21-C-1032. Any opinions, findings and conclusions, or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the ONR.

    Enhancing trust for SGX enclaves

    26 January 2024 at 14:00

    By Artur Cygan

    Creating reproducible builds for SGX enclaves used in privacy-oriented deployments is a difficult task that lacks a convenient and robust solution. We propose using Nix to achieve reproducible and transparent enclave builds so that anyone can audit whether the enclave is running the source code it claims, thereby enhancing the security of SGX systems.

    In this blog post, we will explain how we enhanced trust for SGX enclaves through the following steps:

    • Analyzed reproducible builds of Signal and MobileCoin enclaves
    • Analyzed a reproducible SGX SDK build from Intel
    • Packaged the SGX SDK in Nixpkgs
    • Prepared a reproducible-sgx repository demonstrating how to build SGX enclaves with Nix

    Background

    Introduced in 2015, Intel SGX is an implementation of confidential (or trusted) computing. More specifically, it is a trusted execution environment (TEE) that allows users to run confidential computations on a remote computer owned and maintained by an untrusted party. Users trust the manufacturer (Intel, in this case) of a piece of hardware (CPU) to protect the execution environment from tampering, even by the highest privilege–level code, such as kernel malware. SGX code and data live in special encrypted and authenticated memory areas called enclaves.

During my work at Trail of Bits, I observed a poorly addressed trust gap in systems that use SGX, where the user of an enclave doesn’t necessarily trust the enclave author. Instead, the user is free to audit the enclave’s open-source code to verify its functionality and security. This setting can be observed, for instance, in privacy-oriented deployments such as Signal’s contact discovery service or MobileCoin’s consensus protocol. To validate trust, the user must check whether the enclave was built from trusted code. Unfortunately, this turns out to be a difficult task because the builds tend to be difficult to reproduce and rely on a substantial amount of precompiled binary code. In practice, hardly anyone verifies the builds, so users have no option but to trust the enclave author.

    To give another perspective—a similar situation happens in the blockchain world, where smart contracts are deployed as bytecode. For instance, Etherscan will try to reproduce on-chain EVM bytecode to attest that it was compiled from the claimed Solidity source code. Users are free to perform the same operation if they don’t trust Etherscan.

    A solution to this problem is to build SGX enclaves in a reproducible and transparent way so that multiple parties can independently arrive at the same result and audit the build for any supply chain–related issues. To achieve this goal, I helped port Intel’s SGX SDK to Nixpkgs, which allows building SGX enclaves with the Nix package manager in a fully reproducible way so any user can verify that the build is based on trusted code.

    To see how reproducible builds complete the trust chain, it is important to first understand what guarantees SGX provides.

    How does an enclave prove its identity?

Apart from the above-mentioned TEE protection (nothing leaks out and execution can’t be altered), SGX can remotely prove the enclave’s identity, including its code hash, signature, and runtime configuration. This feature is called remote attestation and can be a bit foreign for someone unfamiliar with this type of technology.

When an enclave is loaded, its initial state (including code) is hashed by the CPU into a measurement hash, also known as MRENCLAVE. The hash changes only if the enclave’s code changes. This hash, along with other data such as the signer and environment details, is placed in a special memory area accessible only to the SGX implementation. The enclave can ask the CPU to produce a report containing all this data, including a piece of enclave-defined data (called report_data), and then pass it to the special Intel-signed quoting enclave, which signs the report (called a quote from now on) so that it can be delivered to the remote party and verified.

    Next, the verifier checks the quote’s authenticity with Intel and the relevant information from the quote. Although there are a few additional checks and steps at this point, in our case, the most important thing to check is the measurement hash, which is a key component of trust verification.

    What do we verify the hash against? The simplest solution is to hard code a trusted MRENCLAVE value into the client application itself. This solution is used, for instance, by Signal, where MRENCLAVE is placed in the client’s build config and verified against the hash from the signed quote sent by the Signal server. Bundling the client and MRENCLAVE makes sense; after all, we need to audit and trust the client application code too. The downside is that the client application has to be rebuilt and re-released when the enclave code changes. If the enclave modifications are expected to be frequent or if it is important to quickly move clients to another enclave—for instance, in the event of a security issue—clients can use a more dynamic approach and fetch MRENCLAVE values from trusted third parties.

    Secure communication channel

SGX can prove the identity of an enclave and a piece of report_data that was produced by it, but it’s up to the enclave and verifier to establish a trusted and secure communication channel. Since SGX enclaves are flexible and can freely communicate with the outside world over the network through ECALLs and OCALLs, SGX itself doesn’t impose any specific protocol or implementation for the channel. The enclave developer is free to decide, as long as the channel is encrypted, is authenticated, and terminates inside the enclave.

    For instance, the SGX SDK implements an example of an authenticated key exchange scheme for remote attestation. However, the scheme assumes a DRM-like system where the enclave’s signer is trusted and the server’s public key is hard coded in the enclave’s code, so it’s unsuitable for use in a privacy-oriented deployment of SGX such as Signal.

    If we don’t trust the enclave’s author, we can leverage the report_data to establish such a channel. This is where the SGX guarantees essentially end, and from now on, we have to trust the enclave’s source code to do the right thing. This fact is not obvious at first but becomes evident if we look, for instance, at the RA-TLS paper on how to establish a secure TLS channel that terminates inside an enclave:

    The enclave generates a new public-private RA-TLS key pair at every startup. The RA-TLS key need not be persisted since generating a fresh key on startup is reasonably cheap. Not persisting the key reduces the key’s exposure and avoids common problems related to persistence such as state rollback protection. Interested parties can inspect the source code to convince themselves that the key is never exposed outside of the enclave.

    To maintain the trust chain, RA-TLS uses the report_data from the quote that commits to the enclave’s public key hash. A similar method can be observed in the Signal protocol implementing Noise Pipes and committing to the handshake hash in the report_data.
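
    Putting the two checks together, the verifier’s job (after the quote’s signature has been validated with Intel) reduces to comparing the measurement against a trusted value and checking that report_data commits to the channel key. Below is a minimal conceptual sketch in Go, assuming a SHA-256 commitment as in RA-TLS; the constant and function names are placeholders of ours, not an SGX SDK API.

    package attestation

    import (
        "bytes"
        "crypto/sha256"
        "errors"
    )

    // trustedMRENCLAVE would be the measurement hash baked into the client at
    // build time (a zeroed placeholder here, not a real measurement).
    var trustedMRENCLAVE = make([]byte, 32)

    // verifyQuote checks that (1) the enclave code is the audited one and
    // (2) report_data commits to the public key used for the secure channel.
    func verifyQuote(mrenclave, reportData, channelPubKey []byte) error {
        if !bytes.Equal(mrenclave, trustedMRENCLAVE) {
            return errors.New("unexpected enclave measurement")
        }
        digest := sha256.Sum256(channelPubKey)
        if len(reportData) < len(digest) || !bytes.Equal(reportData[:len(digest)], digest[:]) {
            return errors.New("report_data does not commit to the channel key")
        }
        return nil
    }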

    SGX encrypts and authenticates the enclave’s memory, but it’s up to the code running in the enclave to protect the data. Nothing stops the enclave code from disclosing any information to the outside world. If we don’t know what code runs in the enclave, anything can happen.

    Fortunately, we know the code because it’s open source, but how do we make sure that the code at a particular Git commit maps to the MRENCLAVE hash an enclave is presenting? We have to reproduce the enclave build, calculate its MRENCLAVE hash, and compare it with the hash obtained from the quote. If the build can’t be reproduced, our remaining options are either to trust someone who confirmed the enclave is safe to use or to audit the enclave’s binary code.

    Why are reproducible builds hard?

    The reproducibility type we care about is bit-for-bit reproducibility. Some software might be semantically identical despite minor differences in their artifacts. SGX enclaves are built into .dll or .so files using the SGX SDK and must be signed with the author’s RSA key. Since we calculate hashes of artifacts, even a one-bit difference will produce a different hash. We might get away with minor differences, as the measurement process omits some details from the enclave executable file (such as the signer), but having full file reproducibility is desirable. This is a non-trivial task and can be implemented in multiple ways.

    Both Signal and MobileCoin treat this task seriously and aim to provide a reproducible build for their enclaves. For example, Signal claims the following:

    The enclave code builds reproducibly, so anyone can verify that the published source code corresponds to the MRENCLAVE value of the remote enclave.

    The initial version of Signal’s contact discovery service build (archived early 2023) is based on Debian and uses a .buildinfo file to lock down the system dependencies; however, locking is done based on versions rather than hashes. This is a limitation of Debian, as we read on the BuildinfoFiles page. The SGX SDK and a few other software packages are built from sources fetched without checking the hash of downloaded data. While those are not necessarily red flags, more trust than necessary is placed in third parties (Debian and GitHub).

    From the README, it is unclear how the .buildinfo file is produced because there is no source for the mentioned derebuild.pl script. Most likely, the .buildinfo file is generated during the original build of the enclave’s Debian package and checked into the repository. It is unclear whether this mechanism guarantees capture of all the build inputs and doesn’t let any implicit dependencies fall through the cracks. Unfortunately, I couldn’t reproduce the build because both the Docker and Debian instructions from the README failed, and shortly after that, I noticed that Signal moved to a new iteration of the contact discovery service.

    The current version of Signal’s contact discovery service build is slightly different. Although I didn’t test the build, it’s based on a Docker image that suffers from similar issues such as installing dependencies from a system package manager with network access, which doesn’t guarantee reproducibility.

    Another example is MobileCoin, which provides a prebuilt Docker image with a build environment for the enclave. Building the same image from Dockerfile most likely won’t result in a reproducible hash we can validate, so the image provided by MobileCoin must be used to reproduce the enclave. The problem here is that it’s quite difficult to audit Docker images that are hundreds of megabytes large, and we essentially need to trust MobileCoin that the image is safe.

    Docker is a popular choice for reproducing environments, but it doesn’t come with any tools to support bit-for-bit reproducibility and instead focuses on delivering functionally similar environments. A complex Docker image might reproduce the build for a limited time, but the builds will inevitably diverge, if no special care is taken, due to filesystem timestamps, randomness, and unrestricted network access.

    Why Nix can do it better

    Nix is a cross-platform source-based package manager that features the Nix language to describe packages and a large collection of community-maintained packages called Nixpkgs. NixOS is a Linux distribution built on top of Nix and Nixpkgs, and is designed from the ground up to focus on reproducibility. It is very different from the conventional package managers. For instance, it doesn’t install anything into regular system paths like /bin or /usr/lib. Instead, it uses its own /nix/store directory and symlinks to the packages installed there. Every package is prefixed with a hash capturing all the build inputs like dependency graph or compilation options. This means that it is possible to have the same package installed in multiple variants differing only by build options; from Nix’s perspective, it is a different package.

    Nix does a great job at surfacing most of the issues that could render the build unreproducible. For example, a Nix build will most likely break during development when an impurity (i.e., a dependency that is not explicitly declared as input to the build) is encountered, forcing the developer to fix it. Impurities are often captured from the environment, which includes environment variables or hard-coded system-wide directories like /usr/lib. Nix aims to address all those issues by sandboxing the builds and fixing the filesystem timestamps. Nix also requires all inputs that are fetched from the network to be pinned. On top of that, Nixpkgs contain many patches (gnumake, for instance) to fix reproducibility issues in common software such as compilers or build systems.

    Reducing impurities increases the chance of build reproducibility, which in turn increases the trust in source-to-artifact correspondence. However, ultimately, reproducibility is not something that can be proven or guaranteed. Under the hood, a typical Nix build runs compilers that could rely on some source of randomness that could leak into the compiled artifacts. Ideally, reproducibility should be tracked on an ongoing basis. An example of such a setup is the r13y.com site, which tracks reproducibility of the NixOS image itself.

    Apart from strong reproducibility properties, Nix also shines when it comes to dependency transparency. While Nix caches the build outputs by default, every package can be built from source, and the dependency graph is rooted in an easily auditable stage0 bootstrap, which reduces trust in precompiled binary code to the minimum.

    Issues in Intel’s use of Nix

    Remember the quoting enclave that signs attestation reports? To deliver all SGX features, Intel needed to create a set of privileged architectural enclaves, signed by Intel, that perform tasks too complex to implement in CPU microcode. The quoting enclave is one of them. These enclaves are a critical piece of SGX because they have access to hardware keys burned into the CPU and perform trusted tasks such as remote attestation. However, a bug in the quoting enclave’s code could invalidate the security guarantees of the whole remote attestation protocol.

    Being aware of that, Intel prepared a reproducible Nix-based build that builds SGX SDK (required to build any enclave) and all architectural enclaves. The solution uses Nix inside a Docker container. I was able to reproduce the build, but after a closer examination, I identified a number of issues with it.

    First, the build doesn’t pin the Docker image or the SDK source hashes. The SDK can be built from source, but the architectural enclaves build downloads a precompiled SDK installer from Intel and doesn’t even check the hash. Although Nix is used, there are many steps that happen outside the Nix build.

    The Nix part of the build is unfortunately incorrect and doesn’t deliver much value. The dependencies are hand picked from the prebuilt cache, which circumvents the build transparency Nix provides. The build runs in a nix-shell that should be used only for development purposes. The shell doesn’t provide the same sandboxing features as the regular Nix build and allows different kinds of impurities. In fact, I discovered some impurities when porting the SDK build to Nixpkgs. Some of those issues were also noticed by another researcher but remain unaddressed.

    Bringing SGX SDK to Nixpkgs

    I concluded that the SGX SDK should belong to Nixpkgs to achieve truly reproducible and transparent enclave builds. It turned out there was already an ongoing effort, which I joined and helped finish. The work has been expanded and maintained by the community since then. Now, any SGX enclave can be easily built with Nix by using the sgx-sdk package. I hope that once this solution matures, Nixpkgs maintainers can maintain it together with Intel and bring it into the official SGX SDK repository.

    We prepared the reproducible-sgx GitHub repository to show how to build Intel’s sample enclaves with Nix and the ported SDK. While this shows the basics, SGX enclaves can be almost arbitrarily complex and use different libraries and programming languages. If you wish to see another example, feel free to open an issue or a pull request.

    In this blog post, we discussed only a slice of the possible security issues concerning SGX enclaves. For example, numerous security side-channel attacks have been demonstrated on SGX, such as the recent attack on Blu-ray DRM. If you need help with security of a system that uses SGX or Nix, don’t hesitate to contact us.


    We build X.509 chains so you don’t have to

    25 January 2024 at 14:00

    By William Woodruff

    For the past eight months, Trail of Bits has worked with the Python Cryptographic Authority to build cryptography-x509-verification, a brand-new, pure-Rust implementation of the X.509 path validation algorithm that TLS and other encryption and authentication protocols are built on. Our implementation is fast, standards-conforming, and memory-safe, giving the Python ecosystem a modern alternative to OpenSSL’s misuse- and vulnerability-prone X.509 APIs for HTTPS certificate verification, among other protocols. This is a foundational security improvement that will benefit every Python network programmer and, consequently, the internet as a whole.

    Our implementation has been exposed as a Python API and is included in Cryptography’s 42.0.0 release series, meaning that Python developers can take advantage of it today! Here’s an example usage, demonstrating its interaction with certifi as a root CA bundle:

    As part of our design we also developed x509-limbo, a test vector and harness suite for evaluating the standards conformance and consistent behavior of various X.509 path validation implementations. x509-limbo is permissively licensed and reusable, and has already found validation differentials across Go’s crypto/x509, OpenSSL, and two popular pre-existing Rust X.509 validators.

    X.509 path validation

    X.509 and path validation are both too expansive to reasonably summarize in a single post. Instead, we’ll grossly oversimplify X.509 to two basic facts:

    1. X.509 is a certificate format: it binds a public key and some metadata for that key (what it can be used for, the subject it identifies) to a signature, which is produced by a private key. The subject of a certificate can be a domain name, or some other relevant identifier.
    2. Verifying an X.509 certificate entails obtaining the public key for its signature, using that public key to check the signature, and (finally) validating the associated metadata against a set of validity rules (sometimes called an X.509 profile). In the context of the public web, there are two profiles that matter: RFC 5280 and the CA/B Forum Baseline Requirements (“CABF BRs”).

    These two facts make X.509 certificates chainable: an X.509 certificate’s signature can be verified by finding the parent certificate containing the appropriate public key; the parent, in turn, has its own parent. This chain building process continues until an a priori trusted certificate is encountered, typically because of trust asserted in the host OS itself (which maintains a pre-configured set of trusted certificates).

    Chain building (also called “path validation”) is the cornerstone of TLS’s authentication guarantees: it allows a web server (like x509-limbo.com) to serve an untrusted “leaf” certificate along with zero or more untrusted parents (called intermediates), which must ultimately chain to a root certificate that the connecting client already knows and trusts.

    As a visualization, here is a valid certificate chain for x509-limbo.com, with arrows representing the “signed by” relationship:

    In this scenario, x509-limbo.com serves us two initially untrusted certificates: the leaf certificate for x509-limbo.com itself, along with an intermediate (Let’s Encrypt R3) that signs for the leaf.

    The intermediate in turn is signed for by a root certificate (ISRG Root X1) that’s already trusted (by virtue of being in our OS or runtime trust store), giving us confidence in the complete chain, and thus the leaf’s public key for the purposes of TLS session initiation.
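
    Our new verifier is exposed through a Python API, but the chain-building step is conceptually the same across implementations. For a concrete feel, here is how this scenario looks with Go’s crypto/x509 (one of the implementations we compare against with x509-limbo), assuming the leaf and intermediate certificates have already been parsed; the function name is ours:

    package chains

    import "crypto/x509"

    // buildChains asks crypto/x509 to construct all valid chains from the
    // served leaf and intermediates up to the trusted roots (the system trust
    // store when VerifyOptions.Roots is left nil).
    func buildChains(leaf *x509.Certificate, intermediates []*x509.Certificate) ([][]*x509.Certificate, error) {
        pool := x509.NewCertPool()
        for _, c := range intermediates {
            pool.AddCert(c)
        }
        return leaf.Verify(x509.VerifyOptions{
            DNSName:       "x509-limbo.com", // bind the chain to the expected subject
            Intermediates: pool,
        })
    }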

    What can go wrong?

    The above explanation of X.509 and path validation paints a bucolic picture: to build the chain, we simply iterate through our parent candidates at each step, terminating on success once we reach a root of trust or with failure upon exhausting all candidates. Simple, right?

    Unfortunately, the reality is far messier:

    • The abstraction above (“one certificate, one public key”) is a gross oversimplification. In reality, a single public key (corresponding to a single “logical” issuing authority) may have multiple “physical” certificates, for cross-issuance purposes.
    • Because the trusted set is defined by the host OS or language runtime, there is no “one true” chain for a given leaf certificate. In reality, most (leaf, [intermediates]) tuples have several candidate solutions, of which any is a valid chain.
      • This is the “why” for the first bullet: a web server can’t guarantee that any particular client has any particular set of trusted roots, so intermediate issuers typically have multiple certificates for a single public key to maximize the likelihood of a successfully built chain.
    • Not all certificates are made equal: certificates (including different “physical” certificates for the same “logical” issuing authority) can contain constraints that prevent otherwise valid paths: name restrictions, overall length restrictions, usage restrictions, and so forth. In other words, a correct path building implementation must be able to backtrack after encountering a constraint that eliminates the current candidate chain.
    • The X.509 profile itself can impose constraints on both the overall chain and its constituent members: the CABF BRs, for example, forbid known-weak signature algorithms and public key types, and many path validation libraries additionally allow users to constrain valid chain constructions below a configurable maximum length.

    In practice, these (non-exhaustive) complications mean that our simple recursive linear scan for chain building is really a depth-first graph search with both static and dynamic constraints. Failing to treat it as such has catastrophic consequences:

    • Failing to implement a dynamic search typically results in overly conservative chain constructions, sometimes with Internet-breaking outcomes. OpenSSL 1.0.x’s inability to build the “chain of pain” in 2020 is one recent example of this.
    • Failing to honor the interior constraints and profile-wide certificate requirements can result in overly permissive chain constructions. CVE-2021-3450 is one recent example of this, causing some configurations of OpenSSL 1.1.x to accept chains built with non-CA certificates.

    Consequently, building an X.509 path validator that is both correct and maximal (in the sense of finding any valid chain) is of the utmost importance, both for availability and security.
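
    To make that search structure concrete, here is a deliberately simplified sketch of chain building as a depth-first search with backtracking, written in Python. The Cert type, the single “CA bit” check, and the toy depth limit are stand-ins of our own invention; a real validator verifies signatures and applies the full set of RFC 5280 and CABF BR constraints at every step.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Cert:
        subject: str
        issuer: str          # subject of the certificate that signed this one
        is_ca: bool = True   # toy stand-in for a single static constraint

    def build_chain(cert, untrusted, roots, depth=0, max_depth=8):
        """Depth-first search with backtracking: return one chain ending in a
        trusted root, or None once every candidate path has been exhausted."""
        if depth > max_depth:
            return None
        # Success: the current certificate chains directly to a trusted root.
        for root in roots:
            if cert.issuer == root.subject:
                return [cert, root]
        # Otherwise, try each untrusted parent candidate in turn. A candidate
        # that fails a constraint is skipped; a dead end falls through to the
        # next candidate -- that fall-through is the backtracking step.
        for parent in untrusted:
            if cert.issuer == parent.subject and parent.is_ca:
                rest = build_chain(parent, untrusted - {parent}, roots, depth + 1, max_depth)
                if rest is not None:
                    return [cert] + rest
        return None

    leaf = Cert("x509-limbo.com", "Let's Encrypt R3", is_ca=False)
    intermediate = Cert("Let's Encrypt R3", "ISRG Root X1")
    root = Cert("ISRG Root X1", "ISRG Root X1")
    print(build_chain(leaf, frozenset({intermediate}), {root}))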

    Quirks, surprises, and ambiguities

    Despite underpinning the Web PKI and other critical pieces of Internet infrastructure, there are relatively few independent implementations of X.509 path validation: most platforms and languages reuse one of a small handful of common implementations (OpenSSL and its forks, NSS, Go’s crypto/x509, GnuTLS, etc.) or the host OS’s implementation (CryptoAPI on Windows, Security on macOS). This manifests as a few recurring quirks and ambiguities:

    • A lack of implementation diversity means that mistakes and design decisions (such as overly or insufficiently conservative profile checks) leak into other implementations: users complain when a PKI deployment that was only tested on OpenSSL fails to work against crypto/x509, so implementations frequently bend their specification adherence to accommodate real-world certificates.
    • The specifications often mandate surprising behavior that (virtually) no client implements correctly. RFC 5280, for example, stipulates that path length and name constraints do not apply to self-issued intermediates, but this is widely ignored in practice.
    • Because the specifications themselves are so infrequently interpreted, they contain still-unresolved ambiguities: treating roots as “trust anchors” versus policy-bearing certificates, handling of serial numbers that are 20 bytes long but DER-encoded with 21 bytes, and so forth.

    Our implementation needed to handle each of these families of quirks. To do so consistently, we leaned on four basic strategies:

    • Test first, then implement: To give ourselves confidence in our designs, we built x509-limbo and pre-validated it against other implementations. This gave us both a coverage baseline for our own implementation, and empirical justification for relaxing various policy-level checks, where necessary.
    • Keep everything in Rust: Rust’s performance, strong type system and safety properties meant that we could make rapid iterations to our design while focusing on algorithmic correctness rather than memory safety. It certainly didn’t hurt that PyCA Cryptography’s X.509 parsing is already done in Rust, of course.
    • Obey Sleevi’s Laws: Our implementation treats path construction and path validation as a single unified step with no “one” true chain, meaning that the entire graph is always searched before giving up and returning a failure to the user.
    • Compromise where necessary: As mentioned above, implementations frequently maintain compatibility with OpenSSL, even where doing so violates the profiles defined in RFC 5280 and the CABF BRs. This situation has improved dramatically over the years (and improvements have accelerated in pace, as certificate issuance periods have shortened on the Web PKI), but some compromises are still necessary.

    Looking forward

    Our initial implementation is production-ready and comes in at around 2,500 lines of Rust, not counting the relatively small Python-only API surfaces or x509-limbo.
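
    For a sense of the user-facing surface, here is a minimal sketch of driving the Python-level verification API. The module and method names follow the cryptography 42.x x509.verification API as we understand it; the file paths, hostname, and chain depth below are placeholders.

    from cryptography.x509 import DNSName, load_pem_x509_certificates
    from cryptography.x509.verification import PolicyBuilder, Store

    # Placeholder inputs: a PEM bundle of trusted roots, and the server-provided
    # leaf followed by its untrusted intermediates.
    with open("roots.pem", "rb") as f:
        store = Store(load_pem_x509_certificates(f.read()))
    with open("peer-chain.pem", "rb") as f:
        leaf, *untrusted_intermediates = load_pem_x509_certificates(f.read())

    # The builder exposes the knobs discussed below: the expected subject, the
    # validation time (defaulting to the current time), and the maximum chain depth.
    verifier = (
        PolicyBuilder()
        .store(store)
        .max_chain_depth(8)
        .build_server_verifier(DNSName("x509-limbo.com"))
    )

    # verify() returns a validated chain (leaf through a trusted root), or
    # raises if no valid chain can be built.
    chain = verifier.verify(leaf, untrusted_intermediates)
    print([cert.subject.rfc4514_string() for cert in chain])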

    From here, there’s much that could be done. Some ideas we have include:

    • Expose APIs for client certificate path validation. To expedite things, we’ve focused the initial implementation on server validation (verifying that a leaf certificate attesting to a specific DNS name or IP address chains up to a root of trust). This ignores client validation, wherein the client side of a connection presents its own certificate for the server to verify against a set of known principals. Client path validation shares the same fundamental chain building algorithm as server validation, but has a slightly different ideal public API (since the client’s identity needs to be matched against a potentially arbitrary number of identities known to the server).
    • Expose different X.509 profiles (and more configuration knobs). The current APIs expose very little configuration; the only things a user of the Python API can change are the certificate subject, the validation time, and the maximum chain depth. Going forward, we’ll look into exposing additional knobs, including pieces of state that will allow users to perform verifications with the RFC 5280 certificate profile and other common profiles (like Microsoft’s Authenticode profile). Long term, this will help bespoke (such as corporate) PKI use cases to migrate to Cryptography’s X.509 APIs and lessen their dependency on OpenSSL.
    • Carcinize existing C and C++ X.509 users. One of Rust’s greatest strengths is its native, zero-cost compatibility with C and C++. Given that C and C++ implementations of X.509 and path validation have historically been significant sources of exploitable memory corruption bugs, we believe that a thin “native” wrapper around cryptography-x509-verification could have an outsized positive impact on the security of major C and C++ codebases.
    • Spread the gospel of x509-limbo. x509-limbo was an instrumental component in our ability to confidently ship an X.509 path validator. We’ve written it in such a way that should make integration into other path validation implementations as simple as downloading and consuming a single JSON file. We look forward to helping other implementations (such as rustls-webpki) integrate it directly into their own testing regimens!
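
    As a sketch of what that integration could look like, the following consumes a downloaded test-vector file and compares verdicts. The URL, the field names, and the validate() stub are all assumptions for illustration; consult the x509-limbo repository for the authoritative schema.

    import json
    import urllib.request

    LIMBO_JSON = "https://x509-limbo.com/limbo.json"  # assumed location

    def validate(case: dict) -> bool:
        """Stub: run the implementation under test against one case and return
        True if it accepts the chain. Replace with a real integration."""
        return False

    with urllib.request.urlopen(LIMBO_JSON) as resp:
        limbo = json.load(resp)

    for case in limbo.get("testcases", []):                   # assumed field name
        expected = case.get("expected_result") == "SUCCESS"   # assumed field name
        if validate(case) != expected:
            print("mismatch:", case.get("id"))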

    If any of these ideas interests you (or you have any of your own), please get in touch! Open source is key to our mission at Trail of Bits, and we’d love to hear about how we can help you and your team take the fullest advantage of and further secure the open-source ecosystem.

    Acknowledgments

    This work required the coordination of multiple independent parties. We would like to express our sincere gratitude to each of the following groups and individuals:

    • The Sovereign Tech Fund, whose vision for OSS security and funding made this work possible.
    • The PyCA Cryptography maintainers (Paul Kehrer and Alex Gaynor), who scoped this work from the very beginning and offered constant feedback and review throughout the development process.
    • The BetterTLS development team, who both reviewed and merged patches that enabled x509-limbo to vendor and reuse their (extensive) testsuite.

    Celebrating our 2023 open-source contributions

    24 January 2024 at 14:00

    At Trail of Bits, we pride ourselves on making our best tools open source, such as Slither, PolyTracker, and RPC Investigator. But while this post is about open source, it’s not about our tools…

    In 2023, our employees submitted over 450 pull requests (PRs) that were merged into non-Trail of Bits repositories. This demonstrates our commitment to securing the software ecosystem as a whole and to improving software quality for everyone. A representative list of contributions appears at the end of this post, but here are some highlights:

    • Sigstore-conformance, a vital component of our Sigstore work in open-source engineering, is an integration test suite for diverse Sigstore client implementations. It exercises overall client behavior across critical scenarios, tracks the ongoing effort to establish an official Sigstore client specification, and drops into a client's CI workflow with minimal configuration.
    • Protobuf-specs is another of our open-source engineering initiatives: a collaborative repository of standardized data models and protocols shared across the various Sigstore clients, housing the specifications for Sigstore messages. To update the protobuf definitions, contributors use Docker to regenerate the protobuf stubs by running $ make all, which produces Go and Python files under the gen/ directory.
    • pyOpenSSL is the predominant Python library for accessing OpenSSL functionality: a thin wrapper around a subset of OpenSSL in which many object methods simply invoke the corresponding OpenSSL functions. Over roughly the past nine months, we have been doing cleanup and maintenance on pyOpenSSL as part of our contract with the Sovereign Tech Fund.
    • Homebrew-core is the central repository for the default Homebrew tap, containing the formulas behind “brew install” on macOS and Linux. Emilio Lopez, an application security engineer, contributed several pull requests that added new formulas or updated existing ones, focusing predominantly on Trail of Bits tools such as crytic-compile, solc-select, and Caracal. As a result, anyone can install these tools with a straightforward “brew install” command.
    • Ghidra, a National Security Agency Research Directorate creation, is a powerful software reverse engineering (SRE) framework. It offers advanced tools for code analysis on Windows, macOS, and Linux, including disassembly, decompilation, and scripting. Supporting various processor instruction sets, Ghidra serves as a customizable SRE research platform, aiding in the analysis of malicious code for cybersecurity purposes. We fixed numerous bugs to enhance its functionality, particularly in support of our work on DARPA’s AMP (Assured Micropatching) program.

    We would like to acknowledge that submitting a PR is only a tiny part of the open-source experience. Someone has to review the PR. Someone has to maintain the code after the PR is merged. And submitters of earlier PRs have to write tests to ensure the functionality of their code is preserved.

    We contribute to these projects in part because we love the craft, but also because we find these projects useful. For this, we offer the open-source community our most sincere thanks and wish everyone a happy, safe, and productive 2024!

    Some of Trail of Bits’ 2023 open-source contributions

    • AI/ML
    • Cryptography
    • Languages and compilers
    • Libraries
    • Tech infrastructure
    • Software analysis tools
    • Blockchain software
    • Reverse engineering tools
    • Software analysis/transformational tools
    • Packaging ecosystem/supply chain

    Our thoughts on AIxCC’s competition format

    18 January 2024 at 14:00

    By Michael Brown

    Late last month, DARPA officially opened registration for their AI Cyber Challenge (AIxCC). As part of the festivities, DARPA also released some highly anticipated information about the competition: a request for comments (RFC) that contained a sample challenge problem and the scoring methodology. Prior rules documents and FAQs released by DARPA painted the competition’s broad strokes, but with this release, some of the finer details are beginning to emerge.

    For those who don’t have time to pore over the 50+ pages of information made available to date, here’s a quick overview of the competition’s structure and our thoughts on it, including areas where we think improvements or clarifications are needed.

    The AIxCC is a grand challenge from DARPA in the tradition of the Cyber Grand Challenge and Driverless Grand Challenge.

    *** Disclaimer: AIxCC’s rules and scoring methods are subject to change. This summary is for our readership’s awareness and is NOT an authoritative document. Those interested in participating in AIxCC should refer to DARPA’s website and official documents for firsthand information. ***

    The competition at a high level

    Competing teams are tasked with building AI-driven, fully automated cyber reasoning systems (CRSs) that can identify and patch vulnerabilities in programs. The CRS cannot receive any human assistance while discovering and patching vulnerabilities in challenge projects. Challenge projects are modified versions of critical real-world software like the Linux kernel and the Jenkins automation server. CRSs must submit a proof of vulnerability (PoV) and a proof of understanding (PoU) and may submit a patch for each vulnerability they discover. These components are scored individually and collectively to determine the winning CRS.

    The competition has four stages:

    • Registration (January–April 2024): The Open and Small Business tracks are open for registration. Up to seven small businesses that submit concept white papers will be selected for a $1 million prize to fund their participation in AIxCC.
    • Practice Rounds (March–July 2024): Practice and familiarization rounds allow competitors to realistically test their systems.
    • Semifinals (August 2024 at DEF CON): In the first competition round, the top seven teams advance to the final round, each receiving a $2 million prize.
    • Finals (August 2025 at DEF CON): In the grand finale, the top three performing CRSs receive prizes of $4 million, $3 million, and $1.5 million, respectively.

    Figure 1: AIxCC events overview

    The challenge projects

    The challenge projects that each team’s CRS must handle are modeled after real-world software and are very diverse. Challenge problems may include source code written in Java, Rust, Go, JavaScript, TypeScript, Python, Ruby, or PHP, but at least half of them will be C/C++ programs that contain memory corruption vulnerabilities. Other types of vulnerabilities that competitors should expect to see will be drawn from MITRE’s Top 25 Most Dangerous Software Weaknesses.

    Challenge problems include source code, a modifiable build process and environment, test harnesses, and a public functionality test suite. Using APIs for these resources, competing CRSs must employ various types of AI/ML and conventional program analysis techniques to discover, locate, trigger, and patch vulnerabilities in the challenge problem. To score points, the CRS must submit a PoV and PoU and may submit a patch. The PoV is an input that will trigger the vulnerability via one of the provided test harnesses. The PoU must specify which sanitizers and harnesses (i.e., vulnerability type, perhaps a CWE number) the PoV will trigger and the lines of code that make up the vulnerability.

    The RFC contains a sample challenge problem that reintroduces a vulnerability that was disclosed in 2021 back into the Linux kernel. The challenge problem example provided is a single function written in C with a heap-based buffer overflow vulnerability and an accompanying sample patch. Unfortunately, this example does not come with example fuzzing harnesses, a test suite, or a build harness. DARPA is planning to release more examples with more details in the future, starting with a new example challenge problem from the Jenkins automation server.

    Scoring

    Each competing CRS will be given an overall score calculated as a function of four components:

    • Vulnerability Discovery Score: Points are awarded for each PoV that triggers the AIxCC sanitizer specified in the accompanying PoU.
    • Program Repair Score: Points are awarded if a patch accompanying the PoV/PoU prevents AIxCC sanitizers from triggering and does not break expected functionality. A small bonus is applied if the patch passes a code linter without error.
    • Accuracy Multiplier: This multiplies the overall score to award CRSs with high accuracy (i.e., minimizing invalid or rejected PoVs, PoUs, and patches).
    • Diversity Multiplier: This multiplies the overall score to award CRSs that handle diverse sets of CWEs and source code languages.

    There are a number of intricacies in how the scoring algorithm combines these components. For example, successfully patching a discovered vulnerability is weighted heavily to prevent competitors from focusing solely on vulnerability discovery and ignoring patching. If you're interested in the detailed math, check out the scoring section of the RFC.
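
    For intuition only, here is a toy sketch of that multiplicative structure. The function and the numbers are our own illustration of how the components described above interact, not DARPA's official scoring math.

    # Toy illustration: per-challenge points for discovery and repair, scaled by
    # the accuracy and diversity multipliers. Invented for intuition only; this
    # is NOT the official AIxCC scoring algorithm.
    def overall_score(discovery_points, repair_points,
                      accuracy_multiplier, diversity_multiplier):
        return (discovery_points + repair_points) * accuracy_multiplier * diversity_multiplier

    # A CRS that patches what it finds, keeps accuracy high, and covers diverse
    # CWEs and languages outscores one that only racks up discovery points.
    print(overall_score(10.0, 8.0, accuracy_multiplier=0.95, diversity_multiplier=1.2))  # 20.52
    print(overall_score(14.0, 0.0, accuracy_multiplier=0.70, diversity_multiplier=1.0))  # 9.8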

    General thoughts on AIxCC’s format RFC

    In general, we think AIxCC will help significantly advance the state of the art in automated vulnerability detection and remediation. This competition format is a major step beyond the Cyber Grand Challenge in terms of realism for several reasons—namely, the challenge problems 1) are made from real-world software and vulnerabilities, 2) include source code and are compiled to real-world binary formats, and 3) come in many different source languages for many different computing stacks.

    Additionally, we think the focus on AI/ML–driven CRSs for this competition will help create new research areas by encouraging new approaches to software analysis problems that conventional approaches have been unable to solve (due to fundamental limits like the halting problem).

    Concerns we’ve raised in our RFC response

    DARPA has solicited feedback on their scoring algorithm and exemplar challenges by releasing them as an RFC. We responded to their RFC earlier this month and highlighted several concerns that are front of mind for us as we start building our system. We hope that the coming months bring clarifications or changes to address these concerns.

    Construction of challenge problems

    We have two primary concerns related to the challenge problems. First, it appears that the challenges will be constructed by reinjecting previously disclosed vulnerabilities into recent versions of an open-source project. These vulnerabilities, especially ones that have been explained in detail in blog posts, are almost certainly contained in the training data of commercial large language models (LLMs) such as ChatGPT and Claude.

    Given their high bandwidth for memorization, CRSs based on these models will be unfairly advantaged when detecting and patching these vulnerabilities compared to other approaches. Combined with the fact that LLMs are known to perform significantly worse on novel instances of problems, this strongly suggests that LLM-based CRSs that score highly in AIxCC will likely struggle when used outside the competition. As a result, we recommend that DARPA not use historic vulnerabilities that were disclosed before the training epoch for partner-provided commercial models to create challenge problems for the competition.

    Second, it appears that all challenge problems will be created using open-source projects that will be known to competitors in advance of the competition. This will allow teams to conduct large-scale pre-analysis and specialize their LLMs, fuzzers, and static analyzers to the known source projects and their historical vulnerabilities. These CRSs would be too specific to the competition and may not be usable on different source projects without significant manual effort to retarget the CRSs. To address this potential problem, we recommend that at least 65% of challenge problems be made for source projects that are kept secret prior to each stage of the competition.

    PoU granularity

    We are concerned about the potential for the scoring algorithm to reject valid PoVs/PoUs if AIxCC sanitizers are overly granular. For example, CWE-787 (out-of-bounds write), CWE-125 (out-of-bounds read), and CWE-119 (out-of-bounds buffer operation) are all listed in the MITRE top 25 weaknesses report. All three could be valid to describe a single vulnerability in a challenge problem and are cross-listed in the CWE database. If multiple sanitizers are provided for each of these CWEs but only one is considered correct, it is possible for otherwise valid submissions to be rejected for failing to properly distinguish between three very closely related sanitizers. We recommend that AIxCC sanitizers be sufficiently coarse-grained to avoid unfair penalization of submitted PoUs.

    Scoring

    As currently designed, performance metrics (e.g., CPU runtime, memory overhead, etc.) are not directly addressed by the competition’s areas of excellence, nor are they factored into functionality scores for patches. Performance is a critical nonfunctional software requirement and an important aspect of patch effectiveness and patch acceptability. We think it’s important for patches generated by competing CRSs to maintain the program’s performance within an acceptable threshold. Without this consideration in scoring, it is possible for teams to submit patches that are valid and correct but ultimately so nonperforming that they would not be used in a real-world scenario. We recommend the competition’s functionality score be augmented with a performance component.

    What’s next?

    Although we’ve raised some concerns in our RFC response, we’re very excited for the official kickoff in March and the actual competition later this year in August. Look out for our next post in this series, where we will talk about how our prior work in this area has influenced our high-level approach and discuss the technical areas of this competition we find most fascinating.
