
Cranim: A Toolkit for Cryptographic Visualization

By: Eli Sohl
24 May 2024 at 19:30

Let’s kick this off with some examples. Here’s a seamless loop illustrating CBC-mode encryption:

Here’s a clip showing a code block being rewritten to avoid leaking padding information in error messages:

Here’s an illustration of a block cipher operating in CTS mode:

You may be surprised to learn that each of these illustrations was generated from ≤30 lines of code (30, 9, and 23 lines, respectively), without any golfing. The exact code used can be seen in the Cranim example gallery, along with many other examples of what this toolkit can do.

But let’s take a step back. You may be familiar with the Cryptopals Guided Tour. These longform videos discuss various topics from cryptography, loosely following the path laid out by the Cryptopals challenges. Starting with set 2, I began to bring in custom-made visual aids to support the discussion as the concepts involved grew more abstract.

To create these visuals, the tool I reached for was Manim, a math visualization library best known for its use in 3Blue1Brown’s videos (in fact, he is also Manim’s original author). But while this library is very powerful (seriously, check out their example gallery), it is biased towards math, not computer science. It lacks support for such basic tasks as visualizing (or rewriting) a buffer, drawing a wire diagram, or modifying a code snippet. To adapt this library to my use case, I had to write an extensive plugin adding all this functionality and more. Today I am releasing this plugin, cranim, in the hope that it will be useful to other computer science educators. You can find installation and usage guidelines in the GitHub repo: https://github.com/nccgroup/manim-cranim
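To give a flavor of what building on Manim looks like, here is a rough sketch in plain Manim (not Cranim’s actual API, which is documented in the example gallery) of one of the primitives described above: rendering a buffer as a row of labeled byte cells.

from manim import Create, Scene, Square, Text, VGroup, RIGHT

class BufferScene(Scene):
    # A plain-Manim sketch (not Cranim's API) of one primitive Cranim
    # provides out of the box: drawing a buffer as a row of labeled cells.
    def construct(self):
        data = ["6b", "c1", "be", "e2"]
        cells = VGroup(*[
            VGroup(Square(side_length=1.0), Text(byte, font_size=28))
            for byte in data
        ]).arrange(RIGHT, buff=0)  # pack the cells into a contiguous buffer
        self.play(Create(cells))
        self.wait()

Cranim wraps patterns like this (plus wire diagrams, code rewriting, and more) into reusable mobjects, which is why a full CBC diagram fits in a few dozen lines.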

The default color scheme is optimized for accessibility; contrast between colors should be clear even to colorblind viewers. This color palette was originally published for use by data scientists in multicolor figures. The default background color, a warm and pleasant off-white, is similarly meant to promote legibility: studies have shown that dark text on light backgrounds scans faster and more accurately than the inverse. The precise tone of the background is intended to evoke a poorly-cleaned whiteboard, a familiar sight to any computer science student.

While the toolkit is oriented towards animations, Manim is equally capable of producing static images such as the illustration of CTS mode above; in cases where vector graphics are preferred, Manim can both consume and produce SVG files. The subset of Manim used by Cranim exclusively uses vector representations internally, making it a good fit for this use case.

Cranim is still under active development (as is the Guided Tour), so I have not yet written API docs; they will come as the API stabilizes. However, I keep the Example Gallery up to date, so you can turn to it for simple examples of idiomatic usage. If you’re interested in a less trivial example, the full source code for the animations used in the 17th Guided Tour video can be found in this gist (though note that parts of it are hacky, as it was written quickly and has not been reviewed or edited; in this sense it closely models the sort of code the average Cranim user might write).

If you make something with Cranim, please feel free to send it my way! I’m curious to see what uses people find for this tool, and I’m happy to take feature requests (or bug reports) on GitHub as well.

Announcing the Cryptopals Guided Tour Video 17: Padding Oracles!

By: Eli Sohl
24 May 2024 at 18:59

Hello and welcome back to the Cryptopals Guided Tour (previously, previously)! Today we are taking on Challenge 17, the famous padding oracle attack.

For those who don’t know, Cryptopals is a series of eight sets of challenges covering common cryptographic constructs and common attacks on them. You can read more about Cryptopals at https://cryptopals.com/.

There’s a lot of practical knowledge wrapped up in these challenges, and working through them is an excellent way for programmers to learn more about cryptography – or for cryptographers to learn more about programming. We strongly encourage you to give them a try and to see how far you can get on your own.

The Guided Tour is here for you to check your work after completing a challenge, or to see how else you might’ve solved it – or for when you get stuck, can’t get yourself unstuck, and are looking for a nudge in the right direction. We strongly encourage you to try “learning by doing” before watching the videos. You’ll get more out of them that way!

These problems are complex, and if you take the shortest path to the solution, you’re sure to miss a lot of the sights along the way. It can be hard to know what you’re missing or where to look for it; that’s where these videos come in. From the start, we’ve prioritized detailed discussion; for set 2, we augmented these discussions with detailed animations showing exactly what’s going on under the hood for each attack; for set 3, we’re maintaining these high production values, integrating more research, and open-sourcing the tool used to generate these animations: cranim, a powerful toolkit for cryptographic (and generic computer science) animations with an emphasis on visualizing buffers and data flows. Cranim was developed to support the Guided Tour but is built with flexibility in mind; I hope that other educators will find it useful.

We’re also accelerating the release schedule, favoring individual video releases over dropping an entire set at once. When videos take this long to make, it only makes sense to release them as soon as they’re ready.

If you’re just joining the Guided Tour, here’s a playlist of the full series so far. Each video comes with a timestamped index of content so you can skip around as desired. Check the video descriptions, too; most of them also contain lists of links for further reading.

And now, at long last, here is the next installment of the Cryptopals Guided Tour. We hope you find this helpful and educational, and we look forward to bringing the next videos to you as soon as they’re ready.

Set 3, Challenge 17: The CBC padding oracle

Direct video link: https://youtu.be/6yHM19rQjDo

Challenge link: https://cryptopals.com/sets/3/challenges/17

00:00 – Intro
00:53 – Big-picture view
01:47 – Padding oracles in the wild
02:33 – What happens if we provide an invalid token?
03:33 – Ruining a developer’s night
05:53 – Let’s take a look at the attack
06:48 – Single block case
09:02 – Confirming padding has length 1
09:28 – XOR algebra, and the full search
10:57 – Multi-block case
11:53 – How can you prevent this attack?
13:20 – Timing side-channels
16:57 – Bolting a MAC onto it
17:45 – Note on deniability
18:10 – MACing ciphertext vs MACing plaintext
19:55 – Recapping layers of defense
20:13 – Breaking each layer of defense
21:03 – As our side channel gets less reliable, how does the attack change?
22:28 – Tracking confidences
24:00 – False negatives and false positives
25:12 – Bayes’ Theorem
26:42 – Entropy
27:25 – Adding chart for expected information gained
28:14 – Heuristics
31:15 – Getting into trouble with MACs
33:00 – Time to write some code!
35:44 – Obligatory CSPRNG disclaimer
36:50 – Sketching out the script’s functions
39:30 – Implementing the multi-block case
40:43 – Implementing the easy functions
42:18 – Implementing the single-block case
49:10 – Testing the solution
49:49 – “I could just call it done here, but…”
51:40 – Reading the plaintext
52:27 – Implementing the noisy oracle case and signing off

Further reading:

https://redis.io/blog/json-web-tokens-jwt-are-dangerous-for-user-sessions/
https://portswigger.net/web-security/jwt
https://portswigger.net/web-security/jwt/algorithm-confusion
https://fly.io/blog/api-tokens-a-tedious-survey/
https://securitycryptographywhatever.buzzsprout.com/1822302/9020991-what-do-we-do-about-jwt-feat-jonathan-rudenberg
(all of the above are just on JWTs, per the note at 02:33)
https://youtu.be/sthXs4zJ5XU?t=5498
https://iacr.org/submit/files/slides/2023/rwc/rwc2023/72/slides.pdf
https://crypto.stackexchange.com/questions/202/should-we-mac-then-encrypt-or-encrypt-then-mac
https://en.wikipedia.org/wiki/HMAC
https://en.wikipedia.org/wiki/CBC-MAC
https://en.wikipedia.org/wiki/Off-the-record_messaging
https://en.wikipedia.org/wiki/Acoustic_cryptanalysis
https://en.wikipedia.org/wiki/Power_analysis
https://en.wikipedia.org/wiki/Van_Eck_phreaking
https://crypto.stackexchange.com/questions/75371/how-are-side-channel-attacks-executed-what-does-an-attacker-need-to-execute-a-s
https://www.usenix.org/conference/usenixsecurity20/presentation/van-goethem
https://eprint.iacr.org/2023/1441 (“We were able to detect side channels of single-digit CPU cycles over regular gigabit Ethernet.”)
https://www.youtube.com/watch?v=HZGCoVF3YvM
https://research.nccgroup.com/2021/02/17/cryptopals-exploiting-cbc-padding-oracles/
https://research.nccgroup.com/2023/06/23/exploiting-noisy-oracles-with-bayesian-inference/
https://en.wikipedia.org/wiki/Information_theory#Entropy_of_an_information_source

Thank you!

Before wrapping up this post, I’d like to take a moment to thank Gerald Doussot and Javed Samuel for their continued patience, encouragement, and support with this very large undertaking. I’d also like to thank my teammates in Cryptography Services for their thoughtful and attentive review, particularly Marie-Sarah Lacharite, Thomas Pornin, and Elena Bakos Lang, whose feedback has measurably improved this video (though of course I take full responsibility if any mistakes are found in it). On the logistical side of things, Ristin Rivera has also been invaluable throughout the publication process for this entire series.

I would also like to take a moment to thank the developers of Manim, without which these videos would not be possible in their current form. (By the way, if you want to make videos like these, my Manim plugin Cranim – which I developed to support this series – has now been publicly released!)

Finally, once again I’d like to thank the authors of the Cryptopals challenges. I’ve spent a lot of time with their work and I appreciate the effort they’ve put into it.

A peek into build provenance for Homebrew

14 May 2024 at 13:00

By Joe Sweeney and William Woodruff

Last November, we announced our collaboration with Alpha-Omega and OpenSSF to add build provenance to Homebrew.

Today, we are pleased to announce that the core of that work is live and in public beta: homebrew-core is now cryptographically attesting to all bottles built in the official Homebrew CI. You can verify these attestations with our (currently external, but soon upstreamed) brew verify command, which you can install from our tap (see the commands under “Verifying provenance today” below).

This means that, from now on, each bottle built by Homebrew will come with a cryptographically verifiable statement binding the bottle’s content to the specific workflow and other build-time metadata that produced it. This metadata includes (among other things) the git commit and GitHub Actions run ID for the workflow that produced the bottle, making it a SLSA Build L2-compatible attestation.

In effect, this injects greater transparency into the Homebrew build process, and diminishes the threat posed by a compromised or malicious insider by making it impossible to trick ordinary users into installing non-CI-built bottles.

This work is still in early beta, and involves features and components still under active development within both Homebrew and GitHub. As such, we don’t recommend that ordinary users begin to verify provenance attestations quite yet.

For the adventurous, however, read on!

A quick Homebrew recap

Homebrew is an open-source package manager for macOS and Linux. Homebrew’s crown jewel is homebrew-core, a default repository of over 7,000 curated open-source packages that ship by default with the rest of Homebrew. homebrew-core’s packages are downloaded hundreds of millions of times each year, and form the baseline tool suite (node, openssl, python, go, etc.) for programmers using macOS for development.

One of Homebrew’s core features is its use of bottles: precompiled binary distributions of each package that speed up brew install and ensure its consistency between individual machines. When a new formula (the machine-readable description of how the package is built) is updated or added to homebrew-core, Homebrew’s CI (orchestrated through BrewTestBot) automatically triggers a process to create these bottles.

After a bottle is successfully built and tested, it’s time for distribution. BrewTestBot takes the compiled bottle and uploads it to GitHub Packages, Homebrew’s chosen hosting service for homebrew-core. This step ensures that users can access and download the latest software version directly through Homebrew’s command-line interface. Finally, BrewTestBot updates the changed formula to reference the latest bottle builds, ensuring that users receive the updated bottle upon their next brew update.

In sum: Homebrew’s bottle automation increases the reliability of homebrew-core by removing humans from the software building process. In doing so, it also eliminates one specific kind of supply chain risk: by lifting bottle builds away from individual Homebrew maintainers into the Homebrew CI, it reduces the likelihood that a maintainer’s compromised development machine could be used to launch an attack against the larger Homebrew user base.[1]

At the same time, there are other aspects of this scheme that an attacker could exploit: an attacker with sufficient permissions could upload malicious builds directly to homebrew-core’s bottle storage, potentially leveraging alert fatigue to trick users into installing despite a checksum mismatch. More concerningly, a compromised or rogue Homebrew maintainer could surreptitiously replace both the bottle and its checksum, resulting in silently compromised installs for all users onwards.

This scenario is a singular but nonetheless serious weakness in the software supply chain, one that is well addressed by build provenance.

Build provenance

In a nutshell, build provenance provides cryptographically verifiable evidence that a software package was actually built by the expected “build identity” and not tampered with or secretly inserted by a privileged attacker. In effect, build provenance offers the integrity properties of a strong cryptographic digest, combined with an assertion that the artifact was produced by a publicly auditable piece of build infrastructure.

In the case of Homebrew, that “build identity” is a GitHub Actions workflow, meaning that the provenance for every bottle build attests to valuable pieces of metadata like the GitHub owner and repository, the branch that the workflow was triggered from, the event that triggered the workflow, and even the exact git commit that the workflow ran from.

This data (and more!) is encapsulated in a machine-readable in-toto statement, giving downstream consumers the ability to express complex policies over individual attestations:
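For illustration, such a statement has roughly the following shape (an abridged sketch with placeholder values, not real attestation data; see the in-toto and SLSA specifications for the authoritative schemas):

# Abridged, illustrative shape of an in-toto provenance statement.
# All values below are placeholders, not real attestation data.
statement = {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [
        {
            # The bottle being attested to, bound by its digest.
            "name": "example--1.0.arm64_sonoma.bottle.tar.gz",  # placeholder
            "digest": {"sha256": "..."},
        }
    ],
    "predicateType": "https://slsa.dev/provenance/v1",
    "predicate": {
        # Build metadata: the repository, workflow, triggering event,
        # git commit, and GitHub Actions run ID that produced the bottle.
    },
}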

Build provenance and provenance more generally are not panaceas: they aren’t a substitute for application-level protections against software downgrades or confusion attacks, and they can’t prevent “private conversation with Satan” scenarios where the software itself is malicious or compromised.

Despite this, provenance is a valuable building block for auditable supply chains: it forces attackers into the open by committing them to public artifacts on a publicly verifiable timeline, and reduces the number of opaque format conversions that an attacker can hide their payload in. This is especially salient in cases like the recent xz-utils backdoor, where the attacker used a disconnect between the upstream source repository and backdoored tarball distribution to maintain their attack’s stealth. Or in other words: build provenance won’t stop a fully malicious maintainer, but it will force their attack into the open for review and incident response.

Our implementation

Our implementation of build provenance for Homebrew is built on GitHub’s new artifact attestations feature. We were given early (private beta) access to the feature, including the generate-build-provenance action and gh attestation CLI, which allowed us to iterate rapidly on a design that could be easily integrated into Homebrew’s pre-existing CI.

This gives us build provenance for all current and future bottle builds, but we were left with a problem: Homebrew has a long “tail” of pre-existing bottles that are still referenced in formulae, including bottles built on (architecture, OS version) tuples that are no longer supported by GitHub Actions.[2] This tail is used extensively, leaving us with a dilemma:

  1. Attempt to rebuild all old bottles. This is technically and logistically infeasible, both due to the changes in GitHub Actions’ own supported runners and significant toolchain changes between macOS versions.
  2. Only verify a bottle’s build provenance if present. This would effectively punch a hole in the intended security contract for build provenance, allowing an attacker to downgrade to a lower degree of integrity simply by stripping off any provenance metadata.

Neither of these solutions was workable, so we sought a third. Instead of either rebuilding the world or selectively verifying, we decided to create a set of backfilled build attestations, signed by a completely different repository (our tap) and workflow. With a backfilled attestation behind each bottle, verification looks like a waterfall:

  1. We first check for build provenance tied to the “upstream” repository with the expected workflow, i.e. Homebrew/homebrew-core with publish-commit-bottles.yml.
  2. If the “upstream” provenance is not present, we check for a backfilled attestation before a specified cutoff date from the backfill identity, i.e. trailofbits/homebrew-brew-verify with backfill_signatures.yml.
  3. If neither is present, then we produce a hard failure.
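In code terms, the waterfall looks roughly like the following sketch (the helper names are hypothetical stand-ins, not the actual brew implementation):

# A sketch of the verification waterfall. find_attestation() and
# ProvenanceError are hypothetical, not the actual brew implementation.
UPSTREAM = ("Homebrew/homebrew-core", "publish-commit-bottles.yml")
BACKFILL = ("trailofbits/homebrew-brew-verify", "backfill_signatures.yml")

class ProvenanceError(Exception):
    pass

def verify_bottle(bottle, cutoff_date):
    # 1. Prefer provenance from the upstream repository and workflow.
    if find_attestation(bottle, *UPSTREAM):
        return
    # 2. Fall back to a backfilled attestation, but only if it was
    #    created before the cutoff date.
    backfill = find_attestation(bottle, *BACKFILL)
    if backfill and backfill.created_at < cutoff_date:
        return
    # 3. No acceptable provenance: hard failure (no silent downgrades).
    raise ProvenanceError(f"no valid attestation for {bottle}")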

This gives us the best of both worlds: the backfill allows us to uniformly fail if no provenance or attestation is present (eliminating downgrades), without having to rebuild every old homebrew-core bottle. The cutoff date then adds an additional layer of assurance, preventing an attacker from attempting to use the backfill attestation to inject an unexpected bottle.

We expect the tail of backfilled bottle attestations to decrease over time, as formulae turn over towards newer versions. Once all reachable bottles are fully turned over, Homebrew will be able to remove the backfill check entirely and assert perfect provenance coverage!

Verifying provenance today

As mentioned above: this feature is in an early beta. We’re still working out known performance and UX issues; as such, we do not recommend that ordinary users try it yet.

With that being said, adventuresome early adopters can give it a try with two different interfaces:

  1. A dedicated brew verify command, available via our third-party tap
  2. An early upstream integration into brew install itself.

For brew verify, simply install our third-party tap. Once installed, the brew verify subcommand will become usable:

brew update
brew tap trailofbits/homebrew-brew-verify
brew verify --help
brew verify bash

Going forward, we’ll be working with Homebrew to upstream brew verify directly into brew as a developer command.

For brew install itself, set HOMEBREW_VERIFY_ATTESTATIONS=1 in your environment:

brew update
export HOMEBREW_VERIFY_ATTESTATIONS=1
brew install cowsay

Regardless of how you choose to experiment with this new feature, certain caveats apply:

  • Both brew verify and brew install wrap the gh CLI internally, and will bootstrap gh locally if it isn’t already installed. We intend to replace our use of gh attestation with a pure-Ruby verifier in the medium term.
  • The build provenance beta depends on authenticated GitHub API endpoints, meaning that gh must have access to a suitable access credential. If you experience initial failures with brew verify or brew install, try running gh auth login or setting HOMEBREW_GITHUB_API_TOKEN to a personal access token with minimal permissions.

If you hit a bug or unexpected behavior while experimenting with brew install, please report it! Similarly, for brew verify: please send any reports directly to us.

Looking forward

Everything above concerns homebrew-core, the official repository of Homebrew formulae. But Homebrew also supports third-party repositories (“taps”), which provide a minority (but significant) share of overall bottle installs. These repositories also deserve build provenance, and we have ideas for accomplishing that!

Further out, we plan to take a stab at source provenance as well: Homebrew’s formulae already hash-pin their source artifacts, but we can go a step further and additionally assert that source artifacts are produced by the repository (or other signing identity) that’s latent in their URL or otherwise embedded into the formula specification. This will compose nicely with GitHub’s artifact attestations, enabling a hypothetical DSL for asserting source provenance directly in a formula.

Stay tuned for further updates in this space and, as always, don’t hesitate to contact us! We’re interested in collaborating on similar improvements for other open-source packaging ecosystems, and would love to hear from you.

Last but not least, we’d like to offer our gratitude to Homebrew’s maintainers for their development and review throughout the process. We’d also like to thank Dustin Ingram for his authorship and design on the original proposal, the GitHub Package Security team, as well as Michael Winser and the rest of Alpha-Omega for their vision and support for a better, more secure software supply chain.

[1] In the not-too-distant past, Homebrew’s bottles were produced by maintainers on their own development machines and uploaded to a shared Bintray account. Mike McQuaid’s 2023 talk provides an excellent overview on the history of Homebrew’s transition to CI/CD builds.
[2] Or easy to provide with self-hosted runners, which Homebrew uses for some builds.

Announcing two new LMS libraries

26 April 2024 at 13:00

By Will Song

The Trail of Bits cryptography team is pleased to announce the open-sourcing of our pure Rust and Go implementations of Leighton-Micali Hash-Based Signatures (LMS), a well-studied NIST-standardized post-quantum digital signature algorithm. If you or your organization are looking to transition to post-quantum support for digital signatures, both of these implementations have been engineered and reviewed by several of our cryptographers, so please give them a try!

For the Rust codebase, we’ve worked with the RustCrypto team to integrate our implementation into the RustCrypto/signatures repository so that it can immediately be used with their ecosystem once the crate is published.

Our Go implementation was funded by Hewlett Packard Enterprise (HPE), as part of a larger post-quantum readiness effort within the Sigstore ecosystem. We’d like to thank HPE and Tim Pletcher in particular for supporting and collaborating on this high-impact work!

LMS: A stateful post-quantum signature scheme

LMS is a stateful hash-based signature scheme that was standardized in 2019 with RFC 8554 and subsequently adopted into the federal information processing standards in 2020. These algorithms are carefully designed to resist quantum computer attacks, which could threaten conventional algebraic signature schemes like RSA and ECDSA. Unlike other post-quantum signature designs, LMS was standardized before NIST’s large post-quantum cryptography standardization program was completed. LMS has been studied for years and its security bounds are well understood, so it was not surprising that these schemes were selected and standardized in a relatively short time frame (at least compared to the other standards).

Like other post-quantum signature schemes, LMS is a hash-based scheme, relying only on the security of a collision-resistant hash function such as SHA256. Hash-based signature schemes have much longer signatures than lattice-based signature schemes, which were recently standardized by NIST, but they are simpler to implement and require fewer novel cryptographic assumptions. This is the primary reason we chose to develop hash-based signatures first.

Unlike any signature algorithm in common usage today, LMS is a stateful scheme. The signer must track how many messages have been signed with a key, incrementing the counter with each new signature. If the private key is used more than once with the same counter value, an attacker can combine the two signatures to forge signatures on new messages. This is analogous to a nonce-reuse attack in encryption schemes like AES-GCM.
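The discipline this imposes on implementations is that the updated state must be durably persisted before a signature is ever released. A minimal sketch of that pattern (illustrative only; these names are hypothetical and not the API of either of our libraries):

# Illustrative sketch of stateful-signature hygiene (hypothetical names,
# not the API of either library): reserve and persist the counter *before*
# releasing the signature, so no counter value can ever be used twice.
class StatefulSigner:
    def __init__(self, private_key, store):
        self.private_key = private_key
        self.store = store  # durable, atomic storage for the leaf counter

    def sign(self, message: bytes) -> bytes:
        # Atomically claim the next one-time counter and persist the update.
        counter = self.store.claim_next_counter()
        # Crashing after the claim wastes a leaf; it never reuses one.
        return lms_sign(self.private_key, counter, message)  # hypothetical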

If it’s not immediately obvious, this state requirement severely limits these schemes’ usability and security. For instance, persisting your private key (and its state) to storage, as is typical for secret keys, becomes incredibly risky, since it introduces the possibility of an old state being reused, especially in multi-threaded applications. This is why NIST makes the following warning in their standard:

Stateful hash-based signature schemes are secure against the development of quantum computers, but they are not suitable for general use because their security depends on careful state management. They are most appropriate for applications in which the use of the private key may be carefully controlled and where there is a need to transition to a post-quantum secure digital signature scheme before the post-quantum cryptography standardization process has completed.

The main benefit of a stateful algorithm like LMS over a stateless hash-based signature like SLH-DSA (SPHINCS+) is significantly shorter signature sizes: a signature with LMS is around 4KB, while a signature with SLH-DSA at a similar security level is closer to 40KB. The downside is that stateful schemes like LMS cannot easily be plugged into existing applications. Managing the private state in a signature scheme makes integration into higher-level applications complex and prone to subtle and dangerous security flaws. However, a carefully managed environment for code signing is an excellent place to test stateful post-quantum signatures in the real world, and we feel that Sigstore effectively meets the NIST requirement.

RustCrypto implementation

Our Rust implementation is no-std capable and does not require heap allocations, in addition to being fully compatible with the currently available digest and signature crates. In particular, we implement the SignerMut and Verifier traits on private and public keys, respectively.

The goal of our work is to provide a more strongly typed alternative to the pre-existing implementation while also not over-allocating memory. While ArrayVec is a suitable alternative to the headaches caused by generics and GenericArray (at the cost of slightly higher memory requirements for certain signature types), it introduces an additional crate dependency that did not previously exist, which we wanted to avoid. Currently, in our implementation, both signatures and keys must know their LMS parameters before signatures can be deserialized and verified. This should be sufficient for most use cases, but if unknown parameters must be supported, it is not too difficult to hack together an enum that covers all potential algorithm types and uses the correct TryFrom implementation once the algorithm type is parsed.

Go implementation

Our Go implementation, on the other hand, is less picky. We were asked to build an LMS implementation for Sigstore, which is a more controlled environment and does not have the same restrictions that the general RustCrypto implementation assumes. Because of this, our implementation uses some small heap allocations to keep track of some variable length data, such as the number of hashes in a private key. Go is a less-clever language than Rust, which means we cannot really parameterize it over the various LMS modes, so some additional work needs to be done at a few call sites to re-specify the LMS parameters.

More post-quantum engineering is coming soon!

Like the rest of the world, we are still in the early days of post-quantum cryptography development and deployment. We’re always exploring opportunities to help teams adopt more secure cryptography, with or without the threat of quantum computers in the mix.

Our cryptography team is currently working on another post-quantum standard in Rust, so look out for another open-source codebase soon! If your team needs a post-quantum cryptography module (or any other cryptographic library that is not widely supported in the open-source community) tailored to your exact needs, contact us!

Our team is well-equipped to design and build a codebase incorporating all of your design requirements, with ownership transferred over to you at the end of the project. We will even perform an internal code audit of the same quality we give standard secure code reviews. Get in touch with our sales team to start your next project with Trail of Bits.

Cryptographic design review of Ockam

5 March 2024 at 14:00

By Marc Ilunga, Jim Miller, Fredrik Dahlgren, and Joop van de Pol

In October 2023, Ockam hired Trail of Bits to review the design of its product, a set of protocols that aims to enable secure communication (i.e., end-to-end encrypted and mutually authenticated channels) across various heterogeneous networks. A secure system starts at the design phase, which lays the foundation for secure implementation and deployment. This is particularly true in cryptography, where a secure design can prevent entire classes of vulnerabilities.

In this blog post, we give some insight into our cryptographic design review of Ockam’s protocols, highlight several positive aspects of the initial design, and describe the recommendations we made to further strengthen the system’s security. For anyone considering working with us to improve their design, this blog post also gives a general behind-the-scenes look at our cryptographic design review offerings, including how we use formal modeling to prove that a protocol satisfies certain security properties.

Here is what Ockam’s CTO, Mrinal Wadhwa, had to say about working with Trail of Bits:

Trail of Bits brought tremendous protocol design expertise, careful scrutiny, and attention to detail to our review. In depth and nuanced discussions with them helped us further bolster our confidence in our design choices, improve our documentation, and ensure that we’ve carefully considered all risks to our customers’ data.

Overview of the Ockam system and Ockam Identities

Ockam is a set of protocols and managed infrastructure enabling secure communication. Users may also deploy Ockam on their premises, removing the need to trust Ockam’s infrastructure completely. Our review was based on two use cases of Ockam:

  1. TCP portals: secure TCP communication spanning various networks and traversing NATs
  2. Kafka portals: secure data streaming through Apache Kafka

A key design feature of Ockam is that secure channels are established using an instantiation of the Noise framework’s XX pattern in a way that is agnostic to the networking layer (i.e., the channels can be established for both TCP and Kafka networking, as well as others).

A major component of an Ockam deployment is the concept of Ockam Identities. Identities uniquely identify a node in an Ockam deployment. Each node has a self-generated identifier and an associated primary key pair that is rotated over time. Each rotation is cryptographically attested to with the current and next primary keys, thereby creating a change history. An identity is therefore defined by an identifier and the associated signed change history. The concrete constructions are shown in figure 1.

Figure 1: Ockam Identities (an example of a signed change history with three blocks)

Primary keys are not used directly for authentication or session key establishment in the Noise protocol. Rather, they are used to attest to purpose keys used for secure channel establishment and credential issuance. These credentials play a role akin to certificates in traditional PKI systems to enable mutual trust and enforce attribute-based access control policies.

The manual assessment process

We conducted a manual review of the Ockam design specification, including the secure channels, routing and transports, identities, and credentials, focusing on potential cryptographic threats that we see in similar communication protocols. The manual review process identified five issues, mostly related to insufficient documentation of assumptions and expected security guarantees. These findings indicate that missing information in the specifications, such as threat modeling, may lead Ockam users to make security-critical decisions based on an incomplete understanding of the protocol.

We also raised a few issues related to discrepancies between the specifications and the implementation that we identified from a cursory review of the implementation. Even though the implementation was not in scope for this review, we often find that it serves as a ground truth in cases when the design documentation is unclear and can be interpreted in different ways.

Formal verification with Verifpal and CryptoVerif

In addition to reviewing the Ockam design manually, we used formal modeling tools to verify specific security properties automatically. Our formal modeling efforts primarily focused on Ockam Identities, a critical element of the Ockam system. To achieve comprehensive automated analysis, we used the protocol analyzers Verifpal and CryptoVerif.

Verifpal works in the symbolic model, whereas CryptoVerif works in the computational model, making them a complementary set of tools. Verifpal finds potential high-level attacks against protocols, enabling quick iterations on a protocol until a secure design is found, while CryptoVerif provides more low-level analysis and can more precisely relate the security of the protocol to the cryptographic security guarantees of the individual primitives used in the implementation.

Using Verifpal’s convenient modeling capabilities and built-in primitives, we modeled a (simplified) scenario for Ockam Identities where Alice proves to Bob that she owns the primary key associated with the peer identifier Bob is currently trying to verify. We also modeled a scenario where Bob verifies a new change initiated by Alice.

Modeling the protocol using Verifpal shows that the design of Ockam Identities achieves the expected security guarantees. For a given identifier, only the primary key holder may produce a valid initial change block that binds the public key to the identifier. Any subsequent changes are guaranteed to be generated by an entity holding the previous and current primary keys. Despite the ease of modeling, proving security guarantees with Verifpal requires a few tricks to prevent the tool from identifying trivial or invalid attacks. We discuss these considerations in our comprehensive report.

The current implementation of Ockam Identities can be instantiated with either of two signature schemes, ECDSA or Ed25519, which have different security properties. CryptoVerif highlighted that ECDSA and Ed25519 will not necessarily provide the same security guarantees, depending on what is expected from the protocol. However, this is not explicitly mentioned in the documentation.

Ed25519 is the preferred scheme, but ECDSA is also accepted because it is currently supported by the majority of cloud hardware security modules (HSMs). For the current design of Ockam Identities, ECDSA and Ed25519 theoretically offer the same guarantees. However, future changes to Ockam Identities may require other security guarantees that are provided only by Ed25519.

Occasionally, protocols require stronger properties than what is usually expected from the signature schemes’ properties (see Seems Legit: Automated Analysis of Subtle Attacks on Protocols that Use Signatures). Therefore, from a design perspective, it is desirable that properties expected from a protocol’s building blocks be well understood and explicitly stated.

Our recommendations for strengthening Ockam

Our review did not uncover any issues in the in-scope use cases that would pose an immediate risk to the confidentiality and integrity of data handled by Ockam. But we made several recommendations to strengthen the security of Ockam’s protocols. Our recommendations aim at enabling defense in depth, future-proofing the protocols, improving threat modeling, expanding documentation, and clearly defining the security guarantees of Ockam’s protocols. For example, one of our recommendations describes important considerations for protecting against “store now, decrypt later” attacks from future quantum computers.

We also worked with the Ockam team to flesh out information missing from the specification, such as documenting the exact meaning of certain primary key fields and creating a formal threat model. This information is important to allow Ockam users to make sound decisions when deploying Ockam’s protocols.

Generally, we recommended that Ockam explicitly document the assumptions made about cryptographic protocols and the expected security guarantees of each component of the Ockam system. Doing so will ensure that future development of the protocols builds upon well-understood and explicit assumptions. Good examples of what should be documented are the theoretical difference between ECDSA and EdDSA that we identified with CryptoVerif, and the reasoning for why using primitives with lower security margins does not significantly impact security.

Ockam’s CTO responded to the above recommendations with the following statement:

We believe that easy to understand and open documentation of Ockam’s protocols and implementation is essential to continuously improve the security and privacy offered by our products. Trail of Bits’ thorough third-party review of our protocol documentation and formal modeling of our protocols has helped make our documentation much more approachable for continuous scrutiny and improvement by our open source community.

Lastly, we strongly recommended an (internal or external) assessment of the Ockam protocols implementation, as a secure design does not imply a secure implementation. Issues in the deployment of a protocol may arise from discrepancies between the design and the implementation, or from specific implementation choices that violate the assumptions in the design.

Security is an ongoing process

At the start of the assessment, we observed that the Ockam design follows best practices, such as using robust primitives that are well accepted in the industry (e.g., the Noise XX protocol with AES-GCM and ChaCha20-Poly1305 as AEADs, and with Ed25519 and ECDSA for signatures). Furthermore, the design reflects that Ockam considered many aspects of the system’s security and reliability, including, for instance, various relevant threat models and the root of trust for identities. Moreover, by open-sourcing its implementation and publishing the assessment result, the Ockam team creates a transparent environment and invites further scrutiny from the community.

Our review identified some areas for improvement, and we provided recommendations to strengthen the security of the product, which already stands on a good foundation. You can find more detailed information about the assessment, our findings, and our recommendations in the comprehensive report.

This project also demonstrates that security is an ongoing process, and including security considerations early in the design phase establishes a strong footing that the implementation can safely rely on. But it is always necessary to continuously work on improving the system’s security posture while responding adequately to newer threats. Assessing the design and the implementation are two of the most crucial steps in ensuring a system’s security.

Please contact us if you want to work with our cryptography team to help improve your design—we’d love to work with you!

Circomspect has been integrated into the Sindri CLI

26 February 2024 at 14:00

By Jim Miller

Our tool Circomspect is now integrated into the Sindri command-line interface (CLI)! We designed Circomspect to help developers build Circom circuits more securely, particularly given the limited tooling support available for this novel programming framework. Integrating this tool into a development environment like that provided by Sindri is a significant step toward more widespread use of Circomspect and thus better support for developers writing Circom circuits.

Developing zero-knowledge proof circuits is a difficult task. Even putting aside technical complexities, running non-trivial circuits for platforms like Circom is extremely computationally intensive: running basic tests can take several minutes (or longer), which can massively increase development time. Sindri aims to alleviate this problem by giving users access to dedicated hardware that significantly accelerates the execution of these circuits. Their simple API and CLI tool allow developers to integrate their circuits with this dedicated hardware without having to manage any of their own infrastructure.

Stasia Carson, the CEO of Sindri Labs, had this to say about the announcement:

Our ongoing focus with the Sindri CLI is to make it more generally and widely useful for circuit developers independent of whether or not they use the Sindri service. The key to this is a unified cross-framework interface over tools for static analysis, linting, compiling, and proving coupled with installation-free tool distribution using optimized Docker containers. Circomspect is a crucial tool for developing secure Circom circuits, and honestly probably the best such tool across all of the frameworks, so we see it as one of the most vital integrations.

Being integrated into the Sindri CLI is an important step for Circomspect. Now that it is reaching even more users, we plan to extend Circomspect with new analyses, which we will reveal throughout the year. Stay tuned to our blog for future updates about Circomspect and zero-knowledge circuit development generally!

Breaking the shared key in threshold signature schemes

20 February 2024 at 14:30

By Fredrik Dahlgren

Today we are disclosing a denial-of-service vulnerability that affects the Pedersen distributed key generation (DKG) phase of a number of threshold signature scheme implementations based on the Frost, DMZ21, GG20, and GG18 protocols. The vulnerability allows a single malicious participant to surreptitiously raise the threshold required to reconstruct the shared key, which could cause signatures generated using the shared key to be invalid.

We first became aware of this vulnerability on a client engagement with Chainflip last year. When we reviewed Chainflip’s implementation of the Frost threshold signature scheme, we noticed it was doing something unusual—something that we had never seen before. Usually, these kinds of observations are an indication that there is a weakness or vulnerability in the codebase, but in this case, Chainflip’s defensive coding practices actually ended up protecting its implementation from a vulnerability. By being extra cautious, Chainflip also avoided introducing a vulnerability into the codebase that could be used by a single party to break the shared key created during the key-generation phase of the protocol. When we realized this, we became curious if other implementations were vulnerable to this issue. This started a long investigation that resulted in ten separate vulnerability disclosures.

What is the Pedersen DKG protocol?

The vulnerability is actually very easy to understand, but to be able to explain it we need to go through some of the mathy details behind the Pedersen DKG protocol. Don’t worry—if you understand what a polynomial is, you should be fine, and if you’ve heard about Shamir’s secret sharing before, you’re most of the way there already.

The Pedersen DKG protocol is based on Feldman’s verifiable secret sharing (VSS) scheme, which is an extension of Shamir’s secret sharing scheme. Shamir’s scheme allows n parties to share a key that can then be reconstructed by t + 1 parties. (Here, we assume that the group has somehow agreed on the threshold t and group size n in advance.) Shamir’s scheme assumes a trusted dealer and is not suitable for multi-party computation schemes where participants may be compromised and act maliciously. This is where Feldman’s VSS scheme comes in. Building on Shamir’s scheme, it allows participants to verify that shares are generated honestly.

Let G be a commutative group where the discrete logarithm problem is hard, and let g be a generator of G. In a (t, n)-Feldman VSS context, the dealer generates a random degree-t polynomial p(x) = a_0 + a_1 x + … + a_t x^t, where a_0 represents the original secret to be shared. She then computes the individual secret shares as s_1 = p(1), s_2 = p(2), …, s_n = p(n). This part is exactly identical to Shamir’s scheme. To allow other participants to verify their secret shares, the dealer publishes the values A_0 = g^{a_0}, A_1 = g^{a_1}, …, A_t = g^{a_t}. Participants can then use the coefficient commitments (A_0, A_1, …, A_t) to verify their secret share s_i by recomputing p(i) “in the exponent” as follows:

  • Compute V = g^{p(i)} = g^{s_i} from the received share s_i.
  • Compute V’ = g^{p(i)} = g^{a_0 + a_1 i + … + a_t i^t} = ∏_k (g^{a_k})^{i^k} = ∏_k A_k^{i^k} from the published commitments.
  • Check that V = V’.

As in Shamir’s secret sharing, the secret s = a_0 can be recovered with t + 1 shares using Lagrange interpolation.
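For concreteness, here is a minimal sketch of that recovery step in Python, assuming a prime group order q:

# Minimal sketch: recover s = p(0) from t + 1 shares (x_i, y_i) via Lagrange
# interpolation at x = 0, working modulo a prime group order q.
def recover_secret(shares, q):
    secret = 0
    for x_i, y_i in shares:
        num, den = 1, 1
        for x_j, _ in shares:
            if x_j != x_i:
                num = num * -x_j % q          # numerator:   prod (0 - x_j)
                den = den * (x_i - x_j) % q   # denominator: prod (x_i - x_j)
        secret = (secret + y_i * num * pow(den, -1, q)) % q
    return secret

# Example: p(x) = 5 + 3x + 2x^2 mod 11, so the secret is p(0) = 5.
assert recover_secret([(1, 10), (2, 8), (3, 10)], q=11) == 5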

In Feldman’s VSS scheme, the shared secret is known to the dealer. To generate a shared key that is unknown to all participants of the protocol, the Pedersen DKG protocol essentially runs n instances of Feldman’s VSS scheme in parallel. The result is a (t, n)-Shamir’s secret sharing of a value that is unknown to all participants: each participant P_i starts by generating a random polynomial p_i(x) = a_{i,0} + a_{i,1} x + … + a_{i,t} x^t of degree t. She publishes the coefficient commitments (A_{i,0} = g^{a_{i,0}}, A_{i,1} = g^{a_{i,1}}, …, A_{i,t} = g^{a_{i,t}}) and then sends the secret share s_{i,j} = p_i(j) to P_j. (Note that the index j must start at 1; otherwise, P_i ends up revealing her secret value a_{i,0} = p_i(0).) P_j can check that the secret share s_{i,j} was computed correctly by computing V and V’ as above and checking that they agree. To obtain their secret share s_j, each participant P_j simply sums the secret shares obtained from the other participants. That is, they compute their secret share as

s_j = s_{1,j} + s_{2,j} + … + s_{n,j} = p_1(j) + p_2(j) + … + p_n(j)

Notice that if we define p(x) as the polynomial p(x) = p_1(x) + p_2(x) + … + p_n(x), it is easy to see that what we obtain in the end is a Shamir’s secret sharing of the constant term of p(x), s = p(0) = a_{1,0} + a_{2,0} + … + a_{n,0}. Since the degree of each polynomial p_i(x) is t, the degree of p(x) is also t, and we can recover the secret s with t + 1 shares using Lagrange interpolation as before.

(There are a few more considerations that need to be made when implementing the Pedersen DKG protocol, but they are not relevant here. For more detail, refer to any of the papers linked in the introduction section.)
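To make the share-verification step concrete, here is a toy sketch of one Feldman VSS instance in Python. The parameters are deliberately tiny and insecure; a real implementation would use an elliptic-curve group, and the Pedersen DKG runs n such instances and sums the resulting shares.

import secrets

# Toy, insecure parameters: g = 2 generates the order-q subgroup of Z_p*
# (p = 2q + 1). Real implementations use an elliptic-curve group.
p, q, g = 23, 11, 2
t, n = 2, 5  # threshold t, number of participants n

def poly_eval(coeffs, x):
    """Evaluate a_0 + a_1 x + ... + a_t x^t mod q (Horner's rule)."""
    acc = 0
    for a in reversed(coeffs):
        acc = (acc * x + a) % q
    return acc

# Dealer: random degree-t polynomial; a_0 is the secret being shared.
coeffs = [secrets.randbelow(q) for _ in range(t + 1)]
shares = {i: poly_eval(coeffs, i) for i in range(1, n + 1)}
commitments = [pow(g, a, p) for a in coeffs]

def verify_share(i, s_i, commitments):
    # Checking the commitment vector's length is exactly the defense
    # against the threshold-raising attack described in the next section.
    if len(commitments) != t + 1:
        return False
    V = pow(g, s_i, p)                       # g^{s_i} from the share
    V_prime = 1
    for k, A_k in enumerate(commitments):
        V_prime = V_prime * pow(A_k, pow(i, k, q), p) % p  # prod A_k^{i^k}
    return V == V_prime

assert all(verify_share(i, s, commitments) for i, s in shares.items())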

Moving the goalposts in the Pedersen DKG

Now, we are ready to come back to the engagement with Chainflip that started all of this. While reviewing Chainflip’s implementation of the Frost signature scheme, we noticed that the implementation was summing the commitments for the highest coefficient, A_{1,t} + A_{2,t} + … + A_{n,t}, and checking whether the result was equal to the identity element in G, which would mean that the highest coefficient of the resulting polynomial p(x) was 0. This is clearly undesirable, since it would allow fewer than t + 1 participants to recover the shared key, but the probability of this happening is cryptographically negligible (even with actively malicious participants). By checking, Chainflip reduced this probability to 0.

This made us wonder, what would happen if a participant used a polynomial p_i(x) of a different degree than t in the Pedersen DKG protocol? In particular, what would happen if a participant used a polynomial p_i(x) of degree T greater than t? Since p(x) is equal to the sum p_1(x) + p_2(x) + … + p_n(x), the degree of p(x) would then be T rather than t, meaning that the signing protocol would require T + 1 rather than t + 1 participants to complete successfully. If this change were not detected by other participants, it would allow any of the participants to surreptitiously render the shared key unusable by choosing a threshold that was strictly greater than the total number of participants. If the DKG protocol were used to generate a shared key as part of a threshold signature scheme (like one of the schemes referenced in the introduction), any attempt to sign a message with t + 1 participants would fail. Depending on the implementation, this could also cause the system to misattribute malicious behavior to honest participants when the failure is detected. More seriously, this attack could also be used to render the shared key unusable and unrecoverable in most key-resharing schemes based on Feldman’s VSS. This includes the key-resharing schemes described in CGGMP21 and earlier versions of Lindell22. In this case, the shared key may already control large sums of money or tokens, which would then be irrevocably lost.

Clearly, this type of malicious behavior could be prevented by simply checking the length of the coefficient commitment vector (A_{i,0}, A_{i,1}, …, A_{i,T}) published by each participant and aborting if any of the lengths is found to be different from t + 1. It turned out that Chainflip already checked for this, but we were curious if other implementations did as well. All in all, we found ten implementations that were vulnerable to this attack in the sense that they allowed a single participant to raise the threshold of the shared key generated using the Pedersen DKG without detection. (We did not find any vulnerable implementations of key-resharing schemes.)

Disclosure process

We reached out to the maintainers of the vulnerable codebases on January 3, 2024.

Seven of the maintainers responded to acknowledge that they had received the disclosure. Four of those maintainers (Chelsea Komlo, Jesse Possner, Safeheron, and the ZCash Foundation) also reported that they either already have, or are planning to resolve the issue.

We reached out again to the three unresponsive maintainers (Toposware, Trust Machines, and LatticeX) on February 7, 2024. Following this, Toposware also responded to acknowledge that they had received our disclosure.

Cloud cryptography demystified: Amazon Web Services

14 February 2024 at 14:00

By Scott Arciszewski

This post, part of a series on cryptography in the cloud, provides an overview of the cloud cryptography services offered within Amazon Web Services (AWS): when to use them, when not to use them, and important usage considerations. Stay tuned for future posts covering other cloud services.

At Trail of Bits, we frequently encounter products and services that make use of cloud providers’ cryptography offerings to satisfy their security goals. However, some cloud providers’ cryptography tools and services have opaque names or non-obvious use cases. This is particularly true for AWS, whose huge variety of services are tailored for a multitude of use cases but can be overwhelming for developers with limited experience. This guide—informed by Trail of Bits’ extensive auditing experience as well as my own experience as a developer at AWS—dives into the differences between these services and explains important considerations, helping you choose the right solution to enhance your project’s security.

Introduction

The cryptography offered by cloud computing providers can be parceled into two broad categories with some overlap: cryptography services and client-side cryptography software. In the case of AWS, the demarcation between the two is mostly clear.

By client-side, we mean that the cryptography runs in your application (the client), rather than in the AWS service in question. This doesn’t mean that it necessarily runs in a web browser or on your users’ devices. Even if the client is running on a virtual machine in EC2, the cryptography is not happening at the back-end service level, and is therefore client-side.

Some examples of AWS cryptography services include the Key Management Service (KMS) and Cloud Hardware Security Module (CloudHSM). In the other corner, AWS’s client-side cryptography software (i.e., tools) includes the AWS Encryption SDK, the AWS Database Encryption SDK, and the S3 Encryption Client.

One product from AWS that blurs the line between both categories is the Cryptographic Computing for Clean Rooms (C3R): a client-side tool tightly integrated into the AWS Clean Rooms service. Another is Secrets Manager, which runs client-side but is its own service. (Some powerful features that use cryptography, such as AWS Nitro, will be explored in detail in a future blog post.)

Let’s explore some of these AWS offerings, including when they’re the most useful and some sharp edges that we often discover in our audits.

AWS cryptography services

AWS CloudHSM

You want to use CloudHSM: If industry or government regulations require you to use an HSM directly for a specific use case. Otherwise, prioritize KMS.

You don’t want to use CloudHSM: If KMS is acceptable instead.

CloudHSM is simply an AWS-provisioned HSM accessible in your cloud environment. If you don’t have a legal requirement to use an HSM directly in your architecture, you can skip CloudHSM entirely.

AWS KMS

You want to use KMS: Any time you use Amazon’s services (even non-cryptographic services) or client-side libraries.

You don’t want to use KMS: For encrypting or decrypting large messages (use key-wrapping with KMS instead).

AWS KMS can be thought of as a usability wrapper around FIPS-validated HSMs. It offers digital signatures, symmetric HMAC, and encryption/decryption capabilities with keys that never leave the HSM. However, KMS encryption is intended for key-wrapping in an envelope encryption setup, rather than for directly encrypting or decrypting your actual data.

One important, but under-emphasized, feature of KMS is Encryption Context. When you pass Encryption Context to KMS during an Encrypt call, KMS logs the Encryption Context in CloudTrail, and the later Decrypt call succeeds only if the identical Encryption Context is provided.

It’s important to note that the Encryption Context is not stored as part of the encrypted data in KMS. If you’re working with KMS directly, you’re responsible for storing and managing this additional data.
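A minimal boto3 sketch of this round trip (the key ARN and context values are placeholders):

import boto3

kms = boto3.client("kms")
context = {"tenant": "example-tenant", "purpose": "user-data"}  # placeholders

# Encrypt: the context is logged to CloudTrail, but it is NOT stored in the
# returned ciphertext blob.
ciphertext = kms.encrypt(
    KeyId="arn:aws:kms:us-east-1:111122223333:key/...",  # placeholder ARN
    Plaintext=b"attack at dawn",
    EncryptionContext=context,
)["CiphertextBlob"]

# Decrypt fails unless the identical context is supplied again, so the
# caller must store the context alongside the ciphertext.
plaintext = kms.decrypt(
    CiphertextBlob=ciphertext,
    EncryptionContext=context,
)["Plaintext"]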

Both considerations are addressed by the client-side software for AWS discussed below.

Recently, KMS added support for external key stores, where KMS will call an HSM in your data center as part of its normal operation. This feature exists to comply with some countries’ data sovereignty requirements, and should be used only if legally required. What you gain in compliance with this feature, you lose in durability, availability, and performance. It’s generally not worth the trade-off.

AWS client-side cryptography software

AWS Encryption SDK

You want to use the AWS Encryption SDK: For encrypting arbitrary-length secrets in a cloud-based application.

You don’t want to use the AWS Encryption SDK: If you’re working with encrypting data for relational or NoSQL databases. The AWS Database Encryption SDK should be used instead.

The AWS Encryption SDK is a general-purpose encryption utility for applications running in the cloud. Its feature set can be as simple as “wraps KMS to encrypt blobs of text” with no further considerations, if that’s all you need, or as flexible as supporting hierarchical key management to minimize network calls to KMS in a multi-keyring setup.

Regardless of how your cryptographic materials are managed, the AWS Encryption SDK stores the Encryption Context passed to KMS in the encrypted message header, so you don’t need to remember to store it separately.
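For example, with the Python version of the Encryption SDK, a round trip looks roughly like this (a sketch; the key ARN is a placeholder):

# A sketch using the AWS Encryption SDK for Python; the key ARN is a
# placeholder. Note that decrypt() does not need the encryption context
# re-supplied: it is carried in the encrypted message header.
import aws_encryption_sdk
from aws_encryption_sdk import CommitmentPolicy

client = aws_encryption_sdk.EncryptionSDKClient(
    commitment_policy=CommitmentPolicy.REQUIRE_ENCRYPT_REQUIRE_DECRYPT
)
provider = aws_encryption_sdk.StrictAwsKmsMasterKeyProvider(
    key_ids=["arn:aws:kms:us-east-1:111122223333:key/..."]  # placeholder
)

ciphertext, _header = client.encrypt(
    source=b"my secret data",
    key_provider=provider,
    encryption_context={"purpose": "demo"},
)
plaintext, _header = client.decrypt(source=ciphertext, key_provider=provider)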

Additionally, if you use an Algorithm Suite that includes ECDSA, it will generate an ephemeral keypair for each message, and the public key will be stored in the Encryption Context. This has two implications:

  1. Because Encryption Context is logged in CloudTrail by KMS, service operators can track the flow of messages through their fleet without ever decrypting them.
  2. Because each ECDSA keypair is used only once and then the secret key discarded, you can guarantee that a given message was never mutated after its creation, even if multiple keyrings are used.

One important consideration for AWS Encryption SDK users is to ensure that you’re specifying your wrapping keys and not using KMS Discovery. Discovery is an anti-pattern that exists only for backwards compatibility.
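As a minimal sketch (with a hypothetical key ARN), explicitly specifying wrapping keys with the aws-encryption-sdk package for Python looks roughly like this:

import aws_encryption_sdk
from aws_encryption_sdk import CommitmentPolicy

client = aws_encryption_sdk.EncryptionSDKClient(
    commitment_policy=CommitmentPolicy.REQUIRE_ENCRYPT_REQUIRE_DECRYPT
)

# A strict (non-discovery) provider: only the listed keys may wrap data keys.
provider = aws_encryption_sdk.StrictAwsKmsMasterKeyProvider(
    key_ids=["arn:aws:kms:us-east-1:111122223333:key/example"]  # hypothetical ARN
)

# The Encryption Context is stored in the encrypted message header for us.
ciphertext, _header = client.encrypt(
    source=b"arbitrary-length secret",
    key_provider=provider,
    encryption_context={"purpose": "example"},
)

plaintext, _header = client.decrypt(source=ciphertext, key_provider=provider)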

If you’re not using the hierarchical keyring, you’ll also want to look at data key caching to reduce the number of KMS calls and reduce latency in your cloud applications.

AWS Database Encryption SDK

You want to use the AWS Database Encryption SDK: If you’re storing sensitive data in a database, and would prefer to never reveal plaintext to the database.

You don’t want to use the AWS Database Encryption SDK: If you’re not doing the above.

As of this writing, the AWS Database Encryption SDK exists only for DynamoDB in Java. The documentation implies that support for more languages and database back ends is coming in the future.

The AWS Database Encryption SDK (DB-ESDK) is the successor to the DynamoDB Encryption Client. Although it is backwards compatible, the new message format offers significant improvements and the ability to perform queries against encrypted fields without revealing your plaintext to the database service, using a mechanism called Beacons.

At their core, Beacons are truncated HMAC values. Given the same key and plaintext, HMAC is deterministic, so a beacon stored alongside an encrypted field lets the database answer equality queries without a full table scan. Truncating the HMAC output to a few bits means that distinct plaintexts can share a beacon value, so each query returns a small, tolerable number of false positives rather than uniquely identifying the plaintext.
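As a rough sketch of the idea (not the DB-ESDK's actual beacon construction), a truncated HMAC can be computed like this:

import hashlib
import hmac

def beacon(key: bytes, plaintext: bytes, length_bits: int) -> int:
    # Deterministic for a fixed key and plaintext, so equal plaintexts always
    # map to the same beacon value, enabling indexed equality lookups.
    mac = hmac.new(key, plaintext, hashlib.sha384).digest()
    # Truncating to a few bits makes distinct plaintexts collide, trading
    # exact matches for a tolerable rate of false positives.
    return int.from_bytes(mac, "big") >> (8 * len(mac) - length_bits)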

Extra caution should be taken when using Beacons. If you cut them too short, you can waste a lot of resources on false positive rejection. If you don’t cut them short enough, an attacker with access to your encrypted database may be able to infer relationships between the beacons—and, in turn, the plaintext values they were calculated from. (Note that the risk of relationship leakage isn’t unique to Beacons, but to any techniques that allow an encrypted database to be queried.)

AWS provides guidance for planning your Beacons, based on the birthday bound of PRFs to ensure a healthy distribution of false positives in a dataset.

Disclaimer: I designed the cryptography used by the AWS Database Encryption SDK while employed at Amazon.

Other libraries and services

AWS Secrets Manager

You want to use AWS Secrets Manager: If you need to manage and rotate service passwords (e.g., to access a relational database).

You don’t want to use AWS Secrets Manager: If you’re looking to store your online banking passwords.

AWS Secrets Manager can be thought of as a password manager like 1Password, but intended for cloud applications. Unlike consumer-facing password managers, Secrets Manager’s security model is predicated on access to AWS credentials rather than a master password or other client-managed secret. Furthermore, your secrets are versioned to prevent operational issues during rotation.

Secrets Manager can be configured to automatically rotate some AWS passwords at a regular interval.

In addition to database credentials, AWS Secrets Manager can be used for API keys and other sensitive values that might otherwise be committed into source code.
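For instance, an application might fetch a credential at runtime with boto3 instead of hard-coding it; the secret name below is hypothetical:

import boto3

secrets = boto3.client("secretsmanager")

# Fetches the current version of the secret; rotation creates new versions
# behind the same name, so callers do not need to change.
response = secrets.get_secret_value(SecretId="prod/db/credentials")
db_password = response["SecretString"]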

AWS Cryptographic Computing for Clean Rooms (C3R)

You want to use AWS C3R: If you and several industry partners want to figure out how many database entries you have in common without revealing the contents of your exclusive database entries to each other.

You don’t want to use AWS C3R: If you’re not doing that.

C3R uses server-aided Private Set Intersection to allow multiple participants to figure out how many records they have in common, without revealing unrelated records to each other.

For example: If two or more medical providers wanted to figure out whether they have any patients in common (e.g., because they provide services that are not clinically safe together, but are generally safe separately), they could use C3R to calculate the intersection of their private sets without violating the privacy of the patients that only one provider serves.

The main downside of C3R is that it has a rather narrow use-case.

Wrapping up

We hope that this brief overview has clarified some of AWS’s cryptography offerings and will help you choose the best one for your project. Stay tuned for upcoming posts in this blog series that will cover other cloud cryptography services!

In the meantime, if you’d like a deeper dive into these products and services to evaluate whether they’re appropriate for your security goals, feel free to contact our cryptography team. We regularly hold office hours, where we schedule around an hour to give you a chance to meet with our cryptographers and ask any questions.

Public Report: Aleo snarkOS Implementation and Consensus Mechanism Review

8 February 2024 at 08:00

In November 2023, Aleo engaged NCC Group’s Cryptography Services team to perform a review of the consensus mechanism implemented by snarkOS: “a decentralized operating system for zero-knowledge applications [that] forms the backbone of Aleo network, which verifies transactions and stores the encrypted state applications in a publicly verifiable manner.” The consensus mechanism is based on a partially synchronous version of the Bullshark Byzantine Fault Tolerance (BFT) protocol, which uses a directed acyclic graph (DAG) to order updates. The review was performed remotely by four consultants over a total of 25 person-days of effort. A retest was performed in January 2024.

This review complements NCC Group’s prior public report reviewing Aleo’s snarkVM.

Chaos Communication Congress (37C3) recap

2 February 2024 at 14:00

Last month, two of our engineers attended the 37th Chaos Communication Congress (37C3) in Hamburg, joining thousands of hackers who gather each year to exchange the latest research and achievements in technology and security. Unlike other tech conferences, this annual gathering focuses on the interaction of technology and society, covering such topics as politics, entertainment, art, sustainability—and, most importantly, security. At the first Congress in the 80s, hackers showcased weaknesses in banking applications over the German BTX system; this year’s theme, “Unlocked,” highlighted breaking technological barriers and exploring new frontiers in digital rights and privacy.

In this blog post, we will review our contributions to the 37C3—spanning binary exploitation and analysis and fuzzing—before highlighting several talks we attended that we recommend listening to.

PWNing meetups

Trail of Bits engineer Dominik Czarnota self-organized two sessions about PWNing, also known as binary exploitation. These meetups showcased Pwndbg and Pwntools, popular tools used during CTF competitions, reverse engineering, and vulnerability research work.

At the first session, Dominik presented Pwndbg, a plugin for GDB that enhances the debugging of low-level code by displaying useful context on each program stop. This context includes the state of the debugged program (its registers, executable code, and stack memory) and dereferenced pointers, which help the user understand the program’s behavior. The presentation showed some of Pwndbg’s features and commands, such as listing memory mappings (vmmap), displaying process information (procinfo), searching memory (search), finding pointers to specific memory mappings (p2p), identifying stack canary values (canary), and controlling the process execution (nextsyscall, stepuntilasm, etc.). It concluded with a release of Pwndbg cheatsheets and details on upcoming features, such as tracking GOT function executions and glibc heap use-after-free analysis. These features have been developed as part of Trail of Bits’s winternship program, now in its thirteenth year of welcoming interns who work and do research on the industry’s most challenging problems.

At the second session, Arusekk and Peace-Maker showcased advanced features of Pwntools, a Swiss Army knife Python library for exploit development. They demonstrated expert methods for receiving and sending data (e.g., io.recvregex or io.recvpred); command-line tricks when running exploit scripts (cool environment variables or arguments like DEBUG, NOASLR, or LOG_FILE that set certain config options); and other neat features like the libcdb command-line tool, the shellcraft module, and the ROP (return-oriented programming) helper. For those who missed it, the slides can be found here.

Next generation fuzzing

In Fuzz Everything, Everywhere, All at Once, the AFL++ and LibAFL team showcased new features in the LibAFL fuzzer. They presented QEMU-based instrumentation to fuzz binary-only targets and used QEMU hooks to enable sanitizers that help find bugs. In addition to QASan—the team’s QEMU-based AddressSanitizer implementation—the team developed an injection sanitizer that goes beyond finding just memory corruption bugs. Using QEMU hooks, SQL, LDAP, XSS, or OS command injections can be detected by defining certain rules in a TOML configuration file. Examination of the config file suggests it should be easily extensible to other injections; we just need to know which functions to hook and which payloads to look for.

Although memory corruption bugs will decline with the deployment of memory-safe languages like Rust, fuzzing will continue to play an important role in uncovering other bug classes like injections or logic bugs, so it’s great to see new tools created to detect them.

This presentation’s Q&A session reminded us that oss-fuzz already has a SystemSanitizer that leverages the ptrace syscall, which helped to find a command injection vulnerability in the past.

In the past, Trail of Bits has used LibAFL in our collaboration with Inria on an academic research project called tlspuffin. The goal of the project was to fuzz various TLS implementations, which uncovered several bugs in wolfSSL.

Side channels everywhere

In a talk titled Full AACSess: Exposing and exploiting AACSv2 UHD DRM for your viewing pleasure, Adam Batori presented a concept for side-channel attacks on Intel SGX. Since Trail of Bits frequently conducts audits on projects that use trusted execution environments like Intel SGX (e.g., Mobilecoin), this presentation was particularly intriguing to us.

After providing an overview of the history of DRM for physical media, Adam went into detail on how the team of researchers behind sgx.fail extracted cryptographic key material from the SGX enclave to break DRM on UHD Blu-ray disks, proving the feasibility of real-world side-channel attacks on secure enclaves. Along the way, he discussed many technological features of SGX.

The work and talk prompted discussion about Intel’s decision to discontinue SGX on consumer hardware. Due to the high risk of side channels on low-cost consumer devices, we believe that using Intel SGX for DRM purposes is already dead on arrival. Side-channel attacks are just one example of the often-overlooked challenges that accompany the secure use of enclaves to protect data.

New challenges: Reverse-engineering Rust

Trail of Bits engineers frequently audit software written in Rust. In Rust Binary Analysis, Feature by Feature, Ben Herzog discussed the compilation output of the Rust compiler. Understanding how Rust builds binaries is important, for example, to optimize Rust programs or to understand the interaction between safe and unsafe Rust code. The talk focused on the debug compilation mode to showcase how the Rust compiler generates code for iterating over ranges, how it implements iterators, and how it optimizes the layout of Rust enums. The presenter also noted that strings in Rust are not null-terminated, which can cause some reverse-engineering tools like Ghidra to produce hard-to-understand output.

The talk author posed four questions that should be answered when encountering function calls related to traits:

  • What is the name of the function being called (e.g., next)?
  • On what type is the function defined (e.g., Values<String, Person>)?
  • Which type is returned from the function (e.g., Option)?
  • What trait is the function part of (e.g., Iterator<Type=Person>)?

More details can be found in the blog post by Ben Herzog.

Proprietary cryptography is considered harmful

Under TETRA:BURST, researchers disclosed multiple vulnerabilities in the TETRA radio protocol. The protocol is used by government agencies, police, military, and critical infrastructure across Europe and other areas.

It is striking how proprietary cryptography is still the default in some industries. Hiding the specification from security researchers by requiring them to sign an NDA greatly limits a system’s reviewability.

Due to export controls, several classes of algorithms exist in TETRA. One of the older ones, TEA1, uses a key length of only 32 bits. Even though the specifiers no longer recommend it, TEA1 is still actively used in the field, which is especially problematic given that these weak protocols are counted upon to protect critical infrastructure.

The researchers demonstrated the exploitability of the vulnerabilities by acquiring radio hardware from online resellers.

Are you sure you own your train? Do you own your car?

In Breaking “DRM” in Polish trains, researchers reported the challenges they encountered after they were recruited by an independent train repair company to determine why some trains no longer operated after being serviced.

Using reverse engineering, the researchers uncovered several anti-features in the trains that made them stop working in various situations (e.g., after they didn’t move for a certain time or when they were located at the GPS coordinates of competitors’ service shops). The talk covers interesting technical details about train software and how the researchers reverse-engineered the firmware, and it questions the extent to which users should have control over the vehicles or devices they own.

What can we learn from hackers as developers and auditors?

Hackers possess a unique problem-solving mindset, showing developers and auditors the importance of creative and unconventional thinking in cybersecurity. The event highlighted the necessity of securing systems correctly, starting with a well-understood threat model. Incorrect or proprietary approaches that rely on obfuscation do not adequately protect the end products. Controls such as hiding cryptographic primitives behind an NDA only obfuscate how the protocol works; they do not make the system more secure, and they make security researchers’ jobs harder.

Emphasizing continuous learning, the congress demonstrated the ever-evolving nature of cybersecurity, urging professionals to stay abreast of the latest threats and technologies. Ethical considerations were a focal point, stressing the responsibility of developers and auditors to respect user privacy and data security in their work.

The collaborative spirit of the hacker community, as seen at 37C3, serves as a model for open communication and mutual learning within the tech industry. At Trail of Bits, we are committed to demonstrating these values by sharing knowledge publicly through publishing blog posts like this one, resources like the Testing Handbook that help developers secure their code, and documentation about our research into zero-knowledge proofs.

Closing words

We highly recommend attending 37C3 in person, even though the date falls awkwardly between Christmas and New Year’s and most talks are live-streamed and available online. The congress includes many self-organized sessions, workshops, and assemblies, making it especially helpful for security researchers. We had initially planned to disclose our recently published LeftoverLocals bug, a vulnerability that affects notable GPU vendors like AMD, Qualcomm, and Apple, at 37C3, but we held off our release date to give GPU vendors more time to fix the bug. The bug disclosure was finally published on January 16; we may report our experience finding and disclosing the bug at next year’s 38C3!

We build X.509 chains so you don’t have to

25 January 2024 at 14:00

By William Woodruff

For the past eight months, Trail of Bits has worked with the Python Cryptographic Authority to build cryptography-x509-verification, a brand-new, pure-Rust implementation of the X.509 path validation algorithm that TLS and other encryption and authentication protocols are built on. Our implementation is fast, standards-conforming, and memory-safe, giving the Python ecosystem a modern alternative to OpenSSL’s misuse- and vulnerability-prone X.509 APIs for HTTPS certificate verification, among other protocols. This is a foundational security improvement that will benefit every Python network programmer and, consequently, the internet as a whole.

Our implementation has been exposed as a Python API and is included in Cryptography’s 42.0.0 release series, meaning that Python developers can take advantage of it today! Here’s an example usage, demonstrating its interaction with certifi as a root CA bundle:
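(The snippet below is a representative sketch of the 42.0.0-series verification API; peer and untrusted_intermediates stand in for certificates obtained from the connection being verified.)

import certifi
from cryptography.x509 import DNSName, load_pem_x509_certificates
from cryptography.x509.verification import PolicyBuilder, Store

# Use certifi's Mozilla-derived CA bundle as the trusted root set.
with open(certifi.where(), "rb") as pems:
    store = Store(load_pem_x509_certificates(pems.read()))

builder = PolicyBuilder().store(store)
verifier = builder.build_server_verifier(DNSName("cryptography.io"))

# `peer` is the server's leaf Certificate and `untrusted_intermediates` is a
# list of Certificates presented alongside it by the connection under test.
chain = verifier.verify(peer, untrusted_intermediates)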

As part of our design we also developed x509-limbo, a test vector and harness suite for evaluating the standards conformance and consistent behavior of various X.509 path validation implementations. x509-limbo is permissively licensed and reusable, and has already found validation differentials across Go’s crypto/x509, OpenSSL, and two popular pre-existing Rust X.509 validators.

X.509 path validation

X.509 and path validation are both too expansive to reasonably summarize in a single post. Instead, we’ll grossly oversimplify X.509 to two basic facts:

  1. X.509 is a certificate format: it binds a public key and some metadata for that key (what it can be used for, the subject it identifies) to a signature, which is produced by a private key. The subject of a certificate can be a domain name, or some other relevant identifier.
  2. Verifying an X.509 certificate entails obtaining the public key for its signature, using that public key to check the signature, and (finally) validating the associated metadata against a set of validity rules (sometimes called an X.509 profile). In the context of the public web, there are two profiles that matter: RFC 5280 and the CA/B Forum Baseline Requirements (“CABF BRs”).

These two facts make X.509 certificates chainable: an X.509 certificate’s signature can be verified by finding the parent certificate containing the appropriate public key; the parent, in turn, has its own parent. This chain building process continues until an a priori trusted certificate is encountered, typically because of trust asserted in the host OS itself (which maintains a pre-configured set of trusted certificates).

Chain building (also called “path validation”) is the cornerstone of TLS’s authentication guarantees: it allows a web server (like x509-limbo.com) to serve an untrusted “leaf” certificate along with zero or more untrusted parents (called intermediates), which must ultimately chain to a root certificate that the connecting client already knows and trusts.

As a visualization, here is a valid certificate chain for x509-limbo.com, with arrows representing the “signed by” relationship:

In this scenario, x509-limbo.com serves us two initially untrusted certificates: the leaf certificate for x509-limbo.com itself, along with an intermediate (Let’s Encrypt R3) that signs for the leaf.

The intermediate in turn is signed for by a root certificate (ISRG Root X1) that’s already trusted (by virtue of being in our OS or runtime trust store), giving us confidence in the complete chain, and thus the leaf’s public key for the purposes of TLS session initiation.

What can go wrong?

The above explanation of X.509 and path validation paints a bucolic picture: to build the chain, we simply iterate through our parent candidates at each step, terminating on success once we reach a root of trust or with failure upon exhausting all candidates. Simple, right?

Unfortunately, the reality is far messier:

  • The abstraction above (“one certificate, one public key”) is a gross oversimplification. In reality, a single public key (corresponding to a single “logical” issuing authority) may have multiple “physical” certificates, for cross-issuance purposes.
  • Because the trusted set is defined by the host OS or language runtime, there is no “one true” chain for a given leaf certificate. In reality, most (leaf, [intermediates]) tuples have several candidate solutions, of which any is a valid chain.
    • This is the “why” for the first bullet: a web server can’t guarantee that any particular client has any particular set of trusted roots, so intermediate issuers typically have multiple certificates for a single public key to maximize the likelihood of a successfully built chain.
  • Not all certificates are made equal: certificates (including different “physical” certificates for the same “logical” issuing authority) can contain constraints that prevent otherwise valid paths: name restrictions, overall length restrictions, usage restrictions, and so forth. In other words, a correct path building implementation must be able to backtrack after encountering a constraint that eliminates the current candidate chain.
  • The X.509 profile itself can impose constraints on both the overall chain and its constituent members: the CABF BRs, for example, forbid known-weak signature algorithms and public key types, and many path validation libraries additionally allow users to constrain valid chain constructions below a configurable maximum length.

In practice, these (non-exhaustive) complications mean that our simple recursive linear scan for chain building is really a depth-first graph search with both static and dynamic constraints. Failing to treat it as such has catastrophic consequences:

  • Failing to implement a dynamic search typically results in overly conservative chain constructions, sometimes with Internet-breaking outcomes. OpenSSL 1.0.x’s inability to build the “chain of pain” in 2020 is one recent example of this.
  • Failing to honor the interior constraints and profile-wide certificate requirements can result in overly permissive chain constructions. CVE-2021-3450 is one recent example of this, causing some configurations of OpenSSL 1.1.x to accept chains built with non-CA certificates.
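To make the backtracking requirement concrete, here is an illustrative depth-first sketch in Python; the helper names and the policy object are hypothetical, and a real implementation must also track cross-issuance and per-candidate constraint state:

def build_chain(cert, intermediates, trust_anchors, policy, depth=0):
    """Return a validated chain ending in a trust anchor, or None."""
    if depth > policy.max_depth:
        return None
    # Prefer candidates that terminate the chain at a trust anchor.
    for anchor in trust_anchors.potential_issuers(cert):
        if policy.permits(anchor, cert):
            return [cert, anchor]
    # Otherwise recurse through untrusted intermediates, backtracking
    # whenever a static or dynamic constraint eliminates a candidate.
    for issuer in intermediates.potential_issuers(cert):
        if policy.permits(issuer, cert):
            rest = build_chain(issuer, intermediates, trust_anchors,
                               policy, depth + 1)
            if rest is not None:
                return [cert] + rest
    return None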

Consequently, building an X.509 path validator that is both correct and maximal (in the sense of finding any valid chain) is of the utmost importance, both for availability and security.

Quirks, surprises, and ambiguities

Despite underpinning the Web PKI and other critical pieces of Internet infrastructure, there are relatively few independent implementations of X.509 path validation: most platforms and languages reuse one of a small handful of common implementations (OpenSSL and its forks, NSS, Go’s crypto/x509, GnuTLS, etc.) or the host OS’s implementation (CryptoAPI on Windows, Security on macOS). This manifests as a few recurring quirks and ambiguities:

  • A lack of implementation diversity means that mistakes and design decisions (such as overly or insufficiently conservative profile checks) leak into other implementations: users complain when a PKI deployment that was only tested on OpenSSL fails to work against crypto/x509, so implementations frequently bend their specification adherence to accommodate real-world certificates.
  • The specifications often mandate surprising behavior that (virtually) no client implements correctly. RFC 5280, for example, stipulates that path length and name constraints do not apply to self-issued intermediates, but this is widely ignored in practice.
  • Because the specifications themselves are so infrequently interpreted, they contain still-unresolved ambiguities: treating roots as “trust anchors” versus policy-bearing certificates, handling of serial numbers that are 20 bytes long but DER-encoded with 21 bytes, and so forth.

Our implementation needed to handle each of these families of quirks. To do so consistently, we leaned on four basic strategies:

  • Test first, then implement: To give ourselves confidence in our designs, we built x509-limbo and pre-validated it against other implementations. This gave us both a coverage baseline for our own implementation, and empirical justification for relaxing various policy-level checks, where necessary.
  • Keep everything in Rust: Rust’s performance, strong type system and safety properties meant that we could make rapid iterations to our design while focusing on algorithmic correctness rather than memory safety. It certainly didn’t hurt that PyCA Cryptography’s X.509 parsing is already done in Rust, of course.
  • Obey Sleevi’s Laws: Our implementation treats path construction and path validation as a single unified step with no “one” true chain, meaning that the entire graph is always searched before giving up and returning a failure to the user.
  • Compromise where necessary: As mentioned above, implementations frequently maintain compatibility with OpenSSL, even where doing so violates the profiles defined in RFC 5280 and the CABF BRs. This situation has improved dramatically over the years (and improvements have accelerated in pace, as certificate issuance periods have shortened on the Web PKI), but some compromises are still necessary.

Looking forward

Our initial implementation is production-ready, and comes in at around 2,500 lines of Rust, not counting the relatively small Python-only API surfaces or x509-limbo.

From here, there’s much that could be done. Some ideas we have include:

  • Expose APIs for client certificate path validation. To expedite things, we’ve focused the initial implementation on server validation (verifying that a leaf certificate attesting to a specific DNS name or IP address chains up to a root of trust). This ignores client validation, wherein the client side of a connection presents its own certificate for the server to verify against a set of known principals. Client path validation shares the same fundamental chain building algorithm as server validation, but has a slightly different ideal public API (since the client’s identity needs to be matched against a potentially arbitrary number of identities known to the server).
  • Expose different X.509 profiles (and more configuration knobs). The current APIs expose very little configuration; the only things a user of the Python API can change are the certificate subject, the validation time, and the maximum chain depth. Going forward, we’ll look into exposing additional knobs, including pieces of state that will allow users to perform verifications with the RFC 5280 certificate profile and other common profiles (like Microsoft’s Authenticode profile). Long term, this will help bespoke (such as corporate) PKI use cases to migrate to Cryptography’s X.509 APIs and lessen their dependency on OpenSSL.
  • Carcinize existing C and C++ X.509 users. One of Rust’s greatest strengths is its native, zero-cost compatibility with C and C++. Given that C and C++ implementations of X.509 and path validation have historically been significant sources of exploitable memory corruption bugs, we believe that a thin “native” wrapper around cryptography-x509-verification could have an outsized positive impact on the security of major C and C++ codebases.
  • Spread the gospel of x509-limbo. x509-limbo was an instrumental component in our ability to confidently ship an X.509 path validator. We’ve written it in such a way that should make integration into other path validation implementations as simple as downloading and consuming a single JSON file. We look forward to helping other implementations (such as rustls-webpki) integrate it directly into their own testing regimens!

If any of these ideas interests you (or you have any of your own), please get in touch! Open source is key to our mission at Trail of Bits, and we’d love to hear about how we can help you and your team take the fullest advantage of and further secure the open-source ecosystem.

Acknowledgments

This work required the coordination of multiple independent parties. We would like to express our sincere gratitude to each of the following groups and individuals:

  • The Sovereign Tech Fund, whose vision for OSS security and funding made this work possible.
  • The PyCA Cryptography maintainers (Paul Kehrer and Alex Gaynor), who scoped this work from the very beginning and offered constant feedback and review throughout the development process.
  • The BetterTLS development team, who both reviewed and merged patches that enabled x509-limbo to vendor and reuse their (extensive) testsuite.

Celebrating our 2023 open-source contributions

24 January 2024 at 14:00

At Trail of Bits, we pride ourselves on making our best tools open source, such as Slither, PolyTracker, and RPC Investigator. But while this post is about open source, it’s not about our tools…

In 2023, our employees submitted over 450 pull requests (PRs) that were merged into non-Trail of Bits repositories. This demonstrates our commitment to securing the software ecosystem as a whole and to improving software quality for everyone. A representative list of contributions appears at the end of this post, but here are some highlights:

  • Sigstore-conformance, a vital component of our Sigstore initiative in open-source engineering, is an integration test suite for diverse Sigstore client implementations. It rigorously evaluates overall client behavior, covering critical scenarios and aligning with ongoing efforts to establish an official Sigstore client specification, and it integrates into client development workflows with minimal configuration.
  • Protobuf-specs is another initiative in our open-source engineering. It is a collaborative repository for standardized data models and protocols across various Sigstore clients and houses specifications for Sigstore messages. To update protobuf definitions, use Docker to generate protobuf stubs by running $ make all, resulting in Go and Python files under the ‘gen/’ directory.
  • pyOpenSSL stands as the predominant Python library for integrating OpenSSL functionality. Over approximately the past nine months, we have been actively involved in cleanup and maintenance tasks on pyOpenSSL as part of our contract with the STF. pyOpenSSL serves as a thin wrapper around a subset of the OpenSSL library, where many object methods simply invoke corresponding functions in the OpenSSL library.
  • Homebrew-core serves as the central repository for the default Homebrew tap, encompassing a collection of software packages and associated formulas for seamless installations. Once you’ve configured Homebrew on your Mac or Linux system, you gain the ability to execute “brew install” commands for software available in this repository. Emilio Lopez, an application security engineer, actively contributed to this repository by submitting several pull requests and introducing new formulas or updating existing ones. Emilio’s focus has predominantly been on tools developed by ToB, such as crytic-compile, solc-select, Caracal, and others. Consequently, individuals can effortlessly install these tools with a straightforward “brew install” command, streamlining the installation process.
  • Ghidra, a National Security Agency Research Directorate creation, is a powerful software reverse engineering (SRE) framework. It offers advanced tools for code analysis on Windows, macOS, and Linux, including disassembly, decompilation, and scripting. Supporting various processor instruction sets, Ghidra serves as a customizable SRE research platform, aiding in the analysis of malicious code for cybersecurity purposes. We fixed numerous bugs to enhance its functionality, particularly in support of our work on DARPA’s AMP (Assured Micropatching) program.

We would like to acknowledge that submitting a PR is only a tiny part of the open-source experience. Someone has to review the PR. Someone has to maintain the code after the PR is merged. And submitters of earlier PRs have to write tests to ensure the functionality of their code is preserved.

We contribute to these projects in part because we love the craft, but also because we find these projects useful. For this, we offer the open-source community our most sincere thanks and wish everyone a happy, safe, and productive 2024!

Some of Trail of Bits’ 2023 open-source contributions

AI/ML

Cryptography

Languages and compilers

Libraries

Tech infrastructure

Software analysis tools

Blockchain software

Reverse engineering tools

Software analysis/transformational tools

Packing ecosystem/supply chain

Tag, you’re it: Signal tagging in Circom

2 January 2024 at 14:00

By Tjaden Hess

We at Trail of Bits perform security reviews for a seemingly endless stream of applications that use zero-knowledge (ZK) proofs. While fast new arithmetization and folding libraries like Halo2, Plonky2, and Boojum are rapidly gaining adoption, Circom remains a mainstay of ZK circuit design. We’ve written about Circom safety before in the context of Circomspect, our linter and static analyzer; in this post, we will look at another way to guard against bugs in your Circom circuits using a lesser-known language feature called signal tags. We present four simple rules for incorporating signal tags into your development process, which will help protect you from common bugs and facilitate auditing of your codebase.

This post assumes some familiarity with the Circom language. We will examine some simple Circom programs and demonstrate how signal tags can be used to detect and prevent common classes of bugs; we will also point out potential pitfalls and weaknesses of the signal tagging feature.

Warning: For the remainder of this post, we will be working with Circom 2.1.6. Details of tag propagation have changed since 2.1.0—we highly recommend using version 2.1.6 or higher, as earlier versions contain severe pitfalls not mentioned in this post.

What are signal tags?

Signal tagging is a feature introduced in Circom 2.1.0 that allows developers to specify and enforce—at compile time—ad hoc preconditions and postconditions on templates. Circom tags help developers ensure that inputs to templates always satisfy the requirements of the template, guarding against soundness bugs while reducing duplication of constraints.

Here is the CircomLib implementation of the boolean OR gate:

template OR() {
    signal input {binary} a;
    signal input {binary} b;
    signal output {binary} out;

    out <== a + b - a*b;
}

circomlib/circuits/gates.circom#37–43

Assume that we are writing a ZK circuit that requires proof of authentication whenever either of two values (e.g., outgoing value transfers) is nonzero. An engineer might write this template to enforce the authentication requirement.

// Require `authSucceeded` to be `1` whenever outgoing value is nonzero
template EnforceAuth() {
    signal input valueA;
    signal input valueB;
    signal input authSucceeded;

    signal authRequired <== OR()(valueA, valueB);
    (1 - authSucceeded) * authRequired === 0;
}

When tested with random or typical values, this template will seem to behave correctly; nonzero values of valueA and valueB will be allowed only when authSucceeded is 1.

However, what about when valueA == valueB == 2? Notice that authRequired will be zero (2 + 2 − 2·2 = 0), and thus the desired invariant of EnforceAuth will be violated.

So what went wrong? There was an implicit precondition on the OR template that a and b both be binary—that is, in the set {0,1}. Violating this condition leads to unexpected behavior.

One way to approach the issue is to add constraints to the OR gate requiring that the inputs be binary:

template OR() {
    signal input a;
    signal input b;
    signal output out;

    // Constrain a and b to be binary
    a * (1 - a) === 0;
    b * (1 - b) === 0;
    
    out <== a + b - a*b;
}

The problem with this approach is that we have just tripled the number of constraints needed per OR gate. Often the inputs will have already been constrained earlier in the circuit, which makes these constraints purely redundant and needlessly increases the compilation and proving time.

In many languages, input constraints would be expressed as types. Circom, unlike more flexible API-driven frameworks like Halo2, does not support expressive types; all signals can carry any value between 0 and P − 1, where P is the prime field modulus. However, Circom 2.1.0 and higher does support signal tags, which can be used as a sort of ad hoc type system.

Let’s see how the OR template would look using signal tags:

template OR() {
    signal input {binary} a;
    signal input {binary} b;
    signal output {binary} out;

    out <== a + b - a*b;
}

Notice that the logic is entirely unchanged from the original; tags do not affect the compiled constraint system at all. However, if we try compiling the EnforceAuth template now, we get a compiler error:

error[T3001]: Invalid assignment: missing tags required by input signal.
 Missing tag: binary
   ┌─ "example1.circom":18:26
   │
18 │     signal authRequired <== OR()(valueA, valueB);
   │                             ^^^^^^^^^^^^^^^^^^^^ found here
   │
   = call trace: ->EnforceAuth

previous errors were found

Input tags are preconditions: requirements that inputs to the template must satisfy. By attaching a signal tag to the input, a developer indicates that the corresponding property must already be enforced; the template itself may assume but not enforce the condition.

Pretty cool! Now how do we rewrite the program to properly use tags? Let’s define a new template that properly checks if each value is zero before computing the OR value.

// `out` is 1 whenever `in` is nonzero, or 0 otherwise
template ToBinary() {
    signal input in;
    // POSTCONDITION: out is either 0 or 1
    // PROOF:
    // in != 0 => out == 1 (by constraint (2))
    // in == 0 => out == 0 (by constraint (1))
    signal output {binary} out;
    
    signal inv <-- in!=0 ? 1/in : 0;
    out <== in*inv;
    in*(1 - out) === 0;
}

This is essentially the negation of the CircomLib IsZero template, normalizing the input and adding the binary tag to the output. Note that binary is just an arbitrary string: Circom does not know anything about the semantics that we intend binary to have, and in particular it does not check that out is in the set {0,1}. Circom simply attaches the opaque tag binary to the output wire of ToBinary.

Output tags are postconditions: promises that the developer makes to downstream users of the signal.

Note that, as Circom does not check our postconditions for us, we must be very careful not to accidentally assign a label to a signal that could possibly carry a value outside the allowed values for the tag. In order to keep track of all the potential ways that a signal can be assigned a tag, we recommend including a comment just above any template output with tags, explaining the reason that the postcondition is satisfied.

Now we can plug this into our EnforceAuth circuit, and everything compiles!

// Require `authSucceeded` to be `1` whenever outgoing value is nonzero
template EnforceAuth() {
    signal input valueA;
    signal input valueB;
    signal input authSucceeded;

    signal spendsA <== ToBinary()(valueA);
    signal spendsB <== ToBinary()(valueB);

    signal authRequired <== OR()(spendsA, spendsB);
    (1 - authSucceeded) * authRequired === 0;
}

Under the hood, Circom is propagating the tag attached to the output signal of ToBinary, so that spendsA also has the tag. Then when OR checks that its input has the binary tag, it is satisfied.

Tag propagation

Tags are propagated through direct assignment, but not through arithmetic operations. In the following example, signal x acquires the binary tag from in.

template Example() {
    signal input {binary} in;

    // x gets the `binary` tag
    signal x <== in;
    // one_minus_x does not have the `binary` tag
    signal one_minus_x <== 1 - x;

    // Compiler error
    1 === OR()(x, one_minus_x);

    // Assume NOT is defined to return a binary output, like OR.
    signal not_x <== NOT()(x);
    // Then this is OK
    1 === OR()(x, not_x);
}

Elements of a signal array have a tag if and only if all members of the array have that tag.

template Example() {
    signal input {binary} a;
    signal input {binary} b;
    signal input c;

    // xs does not have tag `binary` because `c` does not have the tag
    signal xs[3] <== [a, b, c];

    // Error: missing tag
    1 === OR()(xs[0], xs[1]);
}

Tags with value

A common source of soundness bugs in zero-knowledge circuits occurs when arithmetic operations unexpectedly overflow the finite field modulus. Signal tags in Circom can also carry values, which are compile time variables that are propagated along with the tag. Using tags with value, we can ensure at compile time that operations never overflow.

template EnforceMaxBits(n) {
    assert(n < 254); // Number of bits in the finite field 
    signal input in;

    // REASON: Num2Bits constrains in to be representable by `n` bits
    signal output {maxbits} out;
    out.maxbits = n;

    Num2Bits(n)(in);
    out <== in;
}

// Add two numbers, ensuring that the result does not overflow
template AddMaxBits(){
    signal input {maxbits} a;
    signal input {maxbits} b;

    // REASON: log(a + b) <= log(2*max(a, b)) = 1 + max(log(a), log(b))
    signal output {maxbits} c;
    c.maxbits = max(a.maxbits, b.maxbits) + 1;
    assert(c.maxbits < 254);

    c <== a + b;
}

// Multiply two numbers, ensuring that the result does not overflow
template MulMaxBits(){  
    signal input {maxbits} a;
    signal input {maxbits} b;

    // REASON: log(a * b) = log(a) + log(b)
    signal output {maxbits} c;
    c.maxbits = a.maxbits + b.maxbits;
    assert(c.maxbits < 254);

    c <== a * b;
}

Tag values must be assigned before the signal is assigned. If a tag value propagates via signal assignment to a signal that already has a different tag value, Circom will throw an error.

Avoiding incorrect tag assignment

While signal tags can help prevent programming errors, the language feature syntax easily allows for accidental or unwarranted addition of tags to signals. Incorrectly assigning a tag to a signal that is not constrained to abide by the rules of that tag undermines the guarantees of the tag system and can easily lead to severe security issues. In order to get the full benefit of signal tags, we recommend strictly adhering to these usage rules.

Rule #1: Output and internal tag annotations must be accompanied by an explanatory comment

We mentioned before that adding tag annotations to output signals is dangerous. Internal signals can also be declared with a tag annotation, which unconditionally adds the tag to the signal. For example, this unsafe modification of the original EnforceAuth program uses tagged internal signals:

// Require `authSucceeded` to be `1` whenever outgoing value is nonzero
template EnforceAuth() {
    signal input valueA;
    signal input valueB;
    signal input authSucceeded;

    // These signals acquire the `binary` tag
    // _without_ any checks that the values are in fact binary.
    // This is UNSAFE
    signal {binary} spendsA <== valueA;
    signal {binary} spendsB <== valueB;

    signal authRequired <== OR()(spendsA, spendsB);
    (1 - authSucceeded) * authRequired === 0;
}

We strongly recommend that manually tagged internal and output signals be avoided when possible. Any output or internal signal tag annotations must be accompanied by a comment explaining why the tag requirements are satisfied.

Rule #2: Tags should be added to signals using dedicated library templates

In order to minimize the use of manual signal tag annotation in high-level code, we recommend providing a library of helper templates comprising a safe API for using the tag. The following code exemplifies a library for binary values that contains constructors and type-safe operators.

// binary.circom
// Tags:
//     binary: signals must be either 0 or 1

// Create a binary value from a constant 0 or 1
template BinaryConstant(b){
    // REASON: Only valid values are allowed at compile time
    signal output {binary} out;
    assert(b == 0 || b == 1);
    out <== b;
}

// Constrains a signal to be binary and returns a tagged output
template EnforceBinary(){
    signal input in;
    // REASON: Only solutions to x*(x-1) = 0 are 0 and 1
    signal output {binary} out;
    
    in * (in - 1) === 0;
    out <== in;
}

// Empty template simply asserts that input is tagged
// Useful for checking / documenting post conditions on output signals
template AssertBinary() {
    signal input {binary} in;
}

// Returns 1 when input is "truthy" (nonzero), 0 when input is zero
template ToBinary() {
    signal input in;
    // REASON:
    // in != 0 => out == 1 (by constraint (2))
    // in == 0 => out == 0 (by constraint (1))
    signal output {binary} out;
    
    signal inv <-- in!=0 ? 1/in : 0;
    out <== in*inv;
    in*(1 - out) === 0;
}

template AND(){
    signal input {binary} a;
    signal input {binary} b;
    // REASON: 1*1 = 1, 1*0 = 0, 0*1 = 0, 0*0 = 0
    signal output {binary} out <== a*b;
}

template NOT(){
    signal input {binary} in;
    // REASON: 1 - 0 = 1, 1 - 1 = 0
    signal output {binary} out <== 1 - in;
}

template OR() {
    signal input {binary} a;
    signal input {binary} b;
    // REASON: a = 0 => out = b - 0*b = b, a = 1 => out = 1 + b - 1*b = 1
    signal output {binary} out;

    out <== a + b - a*b;
}

Once a sufficiently rich library of templates has been established, developers should rarely need to manually add a tag elsewhere. Reducing the number of manual tags makes auditing for correctness much easier.

Postconditions of higher-level templates can be documented using assertion templates like AssertBinary, without using output tag annotations:

template IsZero() {
    signal input in;
    // POSTCONDITION: out has `binary` tag
    signal output out; // Avoid risky output tag annotation here
    
    signal isNonZero <== ToBinary()(in); // Avoid risky internal tag annotation here
    
    out <== NOT()(isNonZero);
    
    AssertBinary()(out); // Document and check postcondition with no runtime cost
}

Rule #3: Explicit tag value assignments should be scarce and documented

Most tag values should be assigned automatically by library functions, as in the maxbits example. Whenever a signal is assigned a tag value, an explanatory comment should be nearby.

Rule #4: Tags with value must always have a value

Every tag in the codebase must either always have an associated value or never have an associated value. Mixing the two can cause confusion, especially when dealing with signal arrays.

A real-world example

We will look at two issues from our review of Succinct Labs’ Telepathy and explain how Circom tags could have been used to prevent them.

Telepathy is an implementation of the Ethereum sync committee light client protocol, using zkSNARKs written in Circom to accelerate verification of aggregate BLS signatures. The exact details of ETH2.0 light clients and BLS aggregation are not required to understand the bugs, but a refresher on elliptic curves and some notes on big-integer arithmetic in Circom will be useful.

The ETH2.0 light client protocol uses aggregate BLS signatures over the BLS12-381 curve¹. Public keys are points (X, Y) on the BLS12-381 curve, where Y² = X³ + 4 mod Q and Q is a 381-bit prime. Notice that the coordinates of the BLS public keys are 381 bits, while Circom signals can represent at most 254 bits. In order to represent a single public key coordinate, circom-pairing uses seven Circom signals (called “limbs”), each holding a 55-bit value. In order to ensure that representations of big integers are unique and to prevent overflow during arithmetic operations, the developer must ensure that the value of each limb is less than 2⁵⁵.
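As a quick sketch of this representation (the helper names are ours, not circom-pairing's):

def to_limbs(x: int, n: int = 55, k: int = 7) -> list[int]:
    # Little-endian base-2^n representation: limb i holds bits n*i..n*(i+1)-1.
    return [(x >> (n * i)) & ((1 << n) - 1) for i in range(k)]

def from_limbs(limbs: list[int], n: int = 55) -> int:
    return sum(limb << (n * i) for i, limb in enumerate(limbs))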

Ethereum blocks contain commitments to the sync committee public keys in compressed form, meaning that the keys are stored as an X coordinate plus one extra bit to indicate the sign of Y.² In order to perform arithmetic operations with the curve points, the Telepathy circuits require the prover to provide the Y coordinate corresponding to the public key X coordinate. This Y value is then validated by the SubgroupCheckG1WithValidX template, which in turn enforces that the curve equation holds.

/* VERIFY THAT THE WITNESSED Y-COORDINATES MAKE THE PUBKEYS LAY ON THE CURVE */
    
component isValidPoint[SYNC_COMMITTEE_SIZE];
for (var i = 0; i < SYNC_COMMITTEE_SIZE; i++) {
    isValidPoint[i] = SubgroupCheckG1WithValidX(N, K);
    for (var j = 0; j < K; j++) {
        isValidPoint[i].in[0][j] <== pubkeysBigIntX[i][j];
        isValidPoint[i].in[1][j] <== pubkeysBigIntY[i][j];
    }
}

telepathy-circuits/circuits/rotate.circom#L101-L109

template SubgroupCheckG1WithValidX(n, k){
    signal input in[2][k];
    var p[50] = get_BLS12_381_prime(n, k);
    var x_abs = get_BLS12_381_parameter();
    var b = 4;
    component is_on_curve = PointOnCurve(n, k, 0, b, p);
    for(var i=0; i<2; i++)for(var idx=0; idx<k; idx++)
        is_on_curve.in[i][idx] <== in[i][idx];
}

telepathy-circuits/circuits/pairing/bls12_381_hash_to_G2.circom#L723-L731

However, PointOnCurve assumes that the inputs are properly formatted big integers—in particular, that each of the k limbs of Y is less than 2ⁿ. This check is never enforced, leading to uncontrolled overflow in the intermediate computations. Using this vulnerability, a malicious prover can cause the protocol to become stuck in an irrecoverable state, freezing the light client and any bridge funds depending on continued operation.

Using signal tags could have prevented this bug (TOB-SUCCINCT-1) and two others (TOB-SUCCINCT-2, TOB-SUCCINCT-14) that we found during the review. Properly formed big integer values should have a maxbits tag with a value corresponding to the size of the limbs (in this case, 55). BLS12-381 coordinates should additionally have a fp tag indicating that they are reduced modulo the base field prime. Together these two tags, used to indicate preconditions for templates that expect big integers and reduced finite field elements, would have prevented three major missing constraints in the final circuit.

Conclusion

Circom tags are a powerful feature for preventing bugs due to type confusion, missing range checks, and other common missing constraints. In order to receive the full benefits of the feature and hold yourself accountable for good development practices, follow the four simple rules above.

Tags are not a full solution to ZK circuit security. There are many other types of logic, arithmetic, and integration bugs that can compromise the security of your system. Don’t hesitate to contact us with any questions, and reach out if you would like us to review, specify, or implement any ZK circuit or protocol.

¹ The “BLS” acronyms in BLS signatures (Boneh–Lynn–Shacham) and BLS curves (Barreto–Lynn–Scott) overlap only for Ben Lynn, whose thesis on pairings is an excellent resource.
² For any X there are at most two corresponding Y values, of the form sqrt(X³ + 4) and −sqrt(X³ + 4).

A trail of flipping bits

18 December 2023 at 13:30

By Joop van de Pol

Trusted execution environments (TEE) such as secure enclaves are becoming more popular to secure assets in the cloud. Their promise is enticing because when enclaves are properly used, even the operator of the enclave or the cloud service should not be able to access those assets. However, this leads to a strong attacker model, where the entity interacting with the enclave can be the attacker. In this blog post, we will examine one way that cryptography involving AES-GCM, ECDSA, and Shamir’s secret sharing algorithm can fail in this setting—specifically, by using the Forbidden attack on AES-GCM to flip bits on a private key shard, we can iteratively recover the private key.


Trusted Enclaves

TEEs come in all shapes and sizes. They can be realized using separate secure hardware, such as Hardware Security Modules (HSM), Trusted Platform Modules (TPM), or other dedicated security chips as part of a system on chip (SoC). It’s also possible to implement them in hardware that is shared with untrusted entities, using memory isolation techniques such as TrustZone or a hypervisor. Examples in this category are secure enclaves such as Intel SGX, Amazon Nitro, etc.

One challenge secure enclaves face is that they have little to no persistent memory, so large amounts of data that need to be available across power cycles must be stored outside the enclave. To keep this data secure, it must be encrypted using a storage key that is stored either inside the trusted environment or inside an external Key Management Service (KMS) that restricts access to the enclave (e.g., through some form of attestation).

Figure 1: Design of a typical secure enclave, where encrypted data is stored outside the enclave and the data encryption key is securely stored outside the enclave in a KMS

However, because the data is stored externally, the untrusted entity interacting with the enclave will see this data and can potentially modify it. Even when using strong cryptography such as authenticated encryption—typically Authenticated Encryption with Additional Data (AEAD)—it is very difficult for the enclave to protect itself against rollback attacks, where the untrusted entity replaces the external data with an earlier version of the same data since both of them will pass authentication. A tempting solution would be to version data stored externally to the enclave, but because the enclave is stateless and doesn’t know what the latest version should be, this quickly becomes a chicken-and-egg problem. Therefore, keeping track of version numbers or usage counters in this setting is difficult, if not impossible.

Signing in a trusted enclave

One interesting application for trusted enclaves is holding digital signature private keys (such as ECDSA keys) to perform signing. If set up correctly, no one can exfiltrate the signing keys from the enclave. However, because the signing keys must be available even after a power cycle of the enclave, they must typically be stored persistently in some external storage. To prevent anyone with access to this external storage from obtaining or modifying the signing key, it needs to be encrypted using an AEAD.

Figure 2: Design for signing with a trusted enclave, where the encrypted signing key is stored outside the enclave and encrypted with a key protected and managed by a KMS

Enter everyone’s favorite AEAD: AES-GCM! Due to its brittle design, the authentication guarantees are irrevocably broken as soon as a nonce is reused to encrypt two different signing keys. Because the AES block size is limited to 128 bits and because you need 32 bits for the counter, you have only 96 bits for your nonce. No worries, though; you just have to make sure you don’t invoke AES-GCM with the same secret key using random nonces more than 2³² times! So the enclave just has to keep track of a usage counter. Alas, as previously stated, that’s basically impossible.¹

Figure 3: Preventing AES-GCM misuse in an enclave requires maintaining state to monitor AES-GCM usage and must prevent rollback attacks where an attacker replays an old state, though this is difficult to achieve in practice.

So an attacker can have the enclave generate an arbitrary number of signing keys, all of which it must encrypt to store them externally. Eventually, the nonce will repeat, and the attacker can recover the AES-GCM hash key using the Forbidden attack. The details are not very important, but essentially, with the AES-GCM hash key, the attacker can take any existing AES-GCM ciphertext and tag, modify the ciphertext in some way, and use the hash key to update the tag. Specifically, they can flip bits in the ciphertext, which, when decrypted by the enclave, will result in the original plaintext except that the same bits will be flipped. This is not good. But how bad is it?

Attacking ECDSA signatures

The attack is not specific to ECDSA, so understanding all of the mathematics behind ECDSA is not required. The only important background is how ECDSA key pairs are constructed. The private key corresponds to a number (also called a scalar) d. To obtain the corresponding public key Q, the private key is multiplied by the base point G of the chosen elliptic curve.

Q = d · G

By leveraging the broken AES-GCM authentication, the attacker can flip bits in the encrypted private key and have the enclave decrypt it and use it to sign a message. As the encryption part of AES-GCM is essentially counter mode, flipping bits in the encrypted private key will cause the same bit flips in the corresponding plaintext private key.

Figure 4: By modifying the ciphertext stored in external storage, an attacker can cause the secure enclave to sign messages with a modified key without having to target the enclave itself.

What happens when we flip the least significant bit of the private key? A zero bit would become a one, which is equivalent to adding one to the private key. Conversely, a one bit would become a zero, which is equivalent to subtracting one from the private key. Essentially, the effect that the bit flip has on the private key depends on the unknown value of the private key bit.

That’s great, but how can we know which of these two options happened without knowing the private key? Well, if we generate a signature with the flipped private key, we can try to verify it using a modified public key, obtained by adding or subtracting the generator. If it verifies with the generator added, we know the private key bit was zero; if it verifies with the generator subtracted, we know the bit was one.

(d + 1) · G = d · G + G = Q + G

(d − 1) · G = d · G − G = Q − G

We can now repeat the process to recover the other bits of the private key. Instead of adding or subtracting one, we add or subtract a power of two from the private key; by adding or subtracting the corresponding multiple of the generator from the public key, we learn a new bit of the private key. It’s not strictly necessary to recover one bit at a time: you can flip multiple bits and try signature verification against all the possible effects those flipped bits can have on the private key.
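
To make this concrete, here is a minimal, self-contained simulation of the bit-by-bit recovery in Python, using the pure-Python ecdsa package. Everything here is illustrative: flipped_sign models the enclave signing with a tampered key (in the real attack the flip comes from malleating the AES-GCM ciphertext), and the private key is kept below 2^254 only so that no flip leaves the valid scalar range.

import secrets
from ecdsa import SigningKey, VerifyingKey, NIST256p, BadSignatureError

G = NIST256p.generator
d = secrets.randbelow(2**254) + 1   # private key, unknown to the attacker
Q = d * G                           # public key, known to the attacker

def flipped_sign(bit_index, msg):
    # Models the enclave signing with bit `bit_index` of d flipped
    return SigningKey.from_secret_exponent(d ^ (1 << bit_index), curve=NIST256p).sign(msg)

msg = b"attacker-chosen message"
recovered = 0
for i in range(254):
    sig = flipped_sign(i, msg)
    # If bit i was 0, the flip added 2^i and the signature verifies against
    # Q + 2^i * G; otherwise the flip subtracted 2^i, so bit i must be 1.
    try:
        VerifyingKey.from_public_point(Q + (1 << i) * G, curve=NIST256p).verify(sig, msg)
    except BadSignatureError:
        recovered |= 1 << i

assert recovered == d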

Splitting the bit

Interestingly, the attack still works when the private key is split into different shards using Shamir’s secret sharing algorithm before encryption. The enclave receives the different encrypted shards, decrypts them, recombines the shards into the private key, and then signs. As a result, we cannot directly flip individual bits in the private key.

But what happens when we flip a bit in one of the shards? In Shamir’s secret sharing (see also our excellent ZKDocs article on this topic), each shard consists of a pair of x and y values that are used to interpolate a polynomial using Lagrange interpolation. The secret value is given by the value of the interpolated polynomial when evaluated at x = 0.

Flipping bits in one of the y values changes the interpolated polynomial, which corresponds to a different secret—in our case, the private key. Basically, recombining the secret corresponds to a sum of weighted y values, where each weight is a Lagrange coefficient λj that can easily be computed from the x coordinates (which are typically chosen to be consecutive integers starting from one up to the number of shards).

Putting all this together, flipping bits in one of the shares adds to or subtracts from the share, depending on the value of the bit. This then results in adding or subtracting a multiple of the corresponding Lagrange coefficient λj from the private key. By generating signatures with this modified private key and validating them using modified public keys, we can recover the values of the secret shares bit by bit. After obtaining the shares, we can recombine them into the private key. All in all, this shows that the enclave operator could extract the private key from the enclave, despite all the cryptography and isolation involved.
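
This is easy to check numerically. The following toy sketch (an illustrative field prime and a 2-of-2 sharing with x-coordinates 1 and 2, where λ1 = 2) shows that flipping bit 5 of one share shifts the recombined secret by exactly ±λ1 · 2^5:

p = 2**127 - 1        # toy field prime, illustrative only
secret = 0x1337
a1 = 0xABCDEF         # polynomial coefficient: f(x) = secret + a1*x
shares = {x: (secret + a1 * x) % p for x in (1, 2)}

def recombine(shares):
    # Lagrange interpolation of the polynomial, evaluated at x = 0
    total = 0
    for xj, yj in shares.items():
        lam = 1
        for xm in shares:
            if xm != xj:
                lam = lam * (-xm) * pow(xj - xm, -1, p) % p
        total = (total + lam * yj) % p
    return total

assert recombine(shares) == secret

tampered = dict(shares)
tampered[1] ^= 1 << 5                      # flip bit 5 of the share at x = 1
shift = (recombine(tampered) - secret) % p
lam1 = 2                                   # Lagrange coefficient for x = 1 at x = 0
assert shift in ((lam1 << 5) % p, (-(lam1 << 5)) % p)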

Final bit

As this exploration of the Forbidden attack on AES-GCM in secure enclaves reveals, cryptographic primitives such as AES-GCM, ECDSA, and Shamir’s secret sharing, while generally robust, may still be vulnerable if deployed incorrectly. The complexity of TEEs and the evolving nature of adversarial methods make safeguarding sensitive data a difficult task. At Trail of Bits, we understand these challenges. Using our deep expertise in cryptography and application security, we provide comprehensive system audits, identifying potential vulnerabilities and offering effective mitigation strategies. By partnering with us, developers can better avoid potential cryptographic pitfalls and improve the overall security posture of their TEEs.
1 You could argue that, in this toy example, the KMS could keep track of the usage counter because it controls access to the storage key. However, in practice, the KMS is usually quite limited in the type of data it can encrypt and decrypt (typically only cryptographic keys). It is likely not possible to encrypt secret key shards, for example.

Public Report – Security Review of RSA Blind Signatures with Public Metadata

14 December 2023 at 15:00

During the Autumn of 2023, Google engaged NCC Group to conduct a security assessment of the white paper entitled RSA Blind Signatures with Public Metadata, along with the corresponding IETF draft for Partially Blind RSA Signatures. The work is inspired by the growing importance of anonymous tokens for the privacy of real-world applications. In particular, the paper aims to modify the standard RSA Blind signature protocol such that signatures can only be generated for a specific choice of public metadata.

The security assessment of the protocol was performed through an analysis of both the whitepaper and online draft, with direct communication with the Google team. Additionally, a SageMath implementation of the protocol was written following the specification outlined in the IETF draft. The review was performed by three consultants over two weeks for a total of fifteen person-days.

In early November 2023, a retest was performed by two consultants, for four person-days of additional effort.

Public Report – Aleo snarkVM Implementation Review

13 December 2023 at 14:45

During late summer 2023, Aleo Systems Inc. engaged NCC Group’s Cryptography Services team to conduct an implementation review of several components of snarkVM, a virtual machine for zero-knowledge proofs. The snarkVM platform allows users to write and execute smart contracts in an efficient, yet privacy-preserving manner by leveraging zero-knowledge succinct non-interactive arguments of knowledge (zk-SNARKs). The review was performed remotely by 4 consultants with a combined total of 60 person-days of effort, including a retest phase performed a few months after the original engagement.

Publishing Trail of Bits’ CodeQL queries

6 December 2023 at 13:30

By Paweł Płatek

We are publishing a set of custom CodeQL queries for Go and C. We have used them to find critical issues that the standard CodeQL queries would have missed. This new release of a continuously updated repository of CodeQL queries joins our public Semgrep rules and Automated Testing Handbook in an effort to share our technical expertise with the community.

For the initial release of our internal CodeQL queries, we focused on issues like misused cryptography, insecure file permissions, and bugs in string methods:

  • Go: Message not hashed before signature verification. This query detects calls to (EC)DSA APIs with a message that was not hashed. If the message is longer than the expected hash digest size, it is silently truncated.
  • Go: File permission flaws. This query finds non-octal (e.g., 755 vs 0o755) and unsupported (e.g., 04666) literals used as a filesystem permission parameter (FileMode).
  • Go: Trim functions misuse. This query finds calls to string.{Trim,TrimLeft,TrimRight} with the second argument not being a cutset but a continuous substring to be trimmed.
  • Go: Missing MinVersion in tls.Config. This query finds cases when you do not set the tls.Config.MinVersion explicitly for servers. By default, version 1.0 is used, which is considered insecure. This query does not mark explicitly set insecure versions.
  • C: CStrNFinder. This query finds calls to functions that take a string and its size as separate arguments (e.g., strncmp, strncat) but the size argument is wrong.
  • C: Missing null terminator. This query finds incorrectly initialized strings that are passed to functions expecting null-byte-terminated strings.

CodeQL 101

CodeQL is the static analysis tool powering GitHub Advanced Security and is widely used throughout the community to discover vulnerabilities. CodeQL operates by transforming the code being tested into a database that is queryable using a Datalog-like language. While the core engine of CodeQL remains proprietary and closed source, the tool offers open-source libraries implementing various analyses and sets of security queries.

To test our queries, install the CodeQL CLI by following the official documentation. Once the CodeQL CLI is ready, download Trail of Bits’ query packs and check whether the new queries are detected:

codeql pack download trailofbits/{cpp,go}-queries
codeql resolve qlpacks | grep trailofbits

Now go to your project’s root directory and generate a CodeQL database, specifying either go or cpp as the programming language:

codeql database create codeql.db --language go

If the generation doesn’t succeed or the project has a complex build system, use the --command flag. Finally, execute Trail of Bits’ queries against the database:

codeql database analyze codeql.db --format=sarif-latest --output=./tob.sarif -- trailofbits/go-queries

The output of the analysis is in the Static Analysis Results Interchange Format (SARIF). Use Visual Studio Code with the SARIF Viewer plugin to open it and triage findings. Alternatively, upload the results to GitHub or use --format csv to get the results in text form.

(EC)DSA silent input truncation in Go

Let’s sign the /etc/passwd file using ECDSA. Is the following implementation secure?

func main() {
    privateKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
    if err != nil { panic(err) }

    data, err := os.ReadFile("/etc/passwd")
    if err != nil { panic(err) }

    sig, err := ecdsa.SignASN1(rand.Reader, privateKey, data)
    if err != nil { panic(err) }
    fmt.Printf("signature: %x\n", sig)

    valid := ecdsa.VerifyASN1(&privateKey.PublicKey, data, sig)
    fmt.Println("signature verified:", valid)
}

Figure 1: An example signature generation and verification function

Of course it isn’t. The issue lies in passing raw, unhashed, and potentially long data to the ecdsa.SignASN1 and ecdsa.VerifyASN1 methods, while the Go crypto/ecdsa package (and a few other packages) expects data for signing and verification to be a hash of the actual data.

This behavior means that the code signs and verifies only the first 32 bytes of the file, as the size of the P-256 curve used in the example is 32 bytes.

The silent truncation of input data occurs in the hashToNat method, which is used internally by the ecdsa.{SignASN1,VerifyASN1} methods:

// hashToNat sets e to the left-most bits of hash, according to
// SEC 1, Section 4.1.3, point 5 and Section 4.1.4, point 3.
func hashToNat[Point nistPoint[Point]](c *nistCurve[Point], e *bigmod.Nat, hash []byte) {
    // ECDSA asks us to take the left-most log2(N) bits of hash, and use them as
    // an integer modulo N. This is the absolute worst of all worlds: we still
    // have to reduce, because the result might still overflow N, but to take
    // the left-most bits for P-521 we have to do a right shift.
    if size := c.N.Size(); len(hash) > size {
        hash = hash[:size]

Figure 2: The silent truncation of input data (crypto/ecdsa/ecdsa.go)

We have seen this vulnerability in real-world codebases and the impact was critical. To address the issue, there are a couple of approaches:

  1. Length validation. A simple approach to prevent the lack-of-hashing issues is to validate the length of the provided data, as done in the go-ethereum library.
    func VerifySignature(pubkey, msg, signature []byte) bool {
        if len(msg) != 32 || len(signature) != 64 || len(pubkey) == 0 {
            return false
        }

    Figure 3: Validation function from the go-ethereum library
    (go-ethereum/crypto/secp256k1/secp256.go#126–129)

  2. Static detection. Another approach is to statically detect the lack of hashing. For this purpose, we developed the tob/go/msg-not-hashed-sig-verify query, which detects all data flows to potentially problematic methods, ignoring flows that initiate from or go through a hashing function or slicing operation.

    An interesting problem we had to solve was how to set the starting points (sources) for the data flow analysis. We could have used the UntrustedFlowSource class for that purpose; the analysis would then find flows from any input potentially controlled by an attacker. However, UntrustedFlowSource often needs to be extended per project to be useful, so relying on it would miss many flows in many projects. Therefore, our query focuses on finding the longest data flows, which are more likely to indicate potential vulnerabilities.

File permissions flaws in Go

Can you spot a bug in the code below?

if err := os.Chmod("./secret_key", 400); err != nil {
    return
}

Figure 4: Buggy Go code

Okay, so file permissions are usually represented as octal integers. In our case, the secret key file would end up with the permission set to 0o620 (or rw--w----), allowing non-owners to modify the file. The integer literal used in the call to the os.Chmod method is—most probably—not the one that a developer wanted to use.
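
A quick sanity check of that arithmetic (shown in Python, but the conversion is the same everywhere):

assert oct(400) == '0o620'   # decimal 400 is rw--w---- (0o620)
assert 0o400 == 256          # the likely intended owner-read-only mode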

To find unexpected integer values used as FileModes, we implemented a WYSIWYG (“what you see is what you get”) heuristic in the tob/go/file-perms-flaws CodeQL query. The “what you see” is a cleaned-up integer literal (a hard-coded number of the FileMode type)—with removed underscores, a removed base prefix, and left-padded zeros. The “what you get” is the same integer converted to an octal representation. If these two parts are not equal, there may be a bug present.

// what you see
fileModeAsSeen = ("000" + fileModeLitStr.replaceAll("_", "").regexpCapture("(0o|0x|0b)?(.+)", 2)).regexpCapture("0*(.{3,})", 1)

// what you get
and fileModeAsOctal = octalFileMode(fileModeInt)

// what you see != what you get
and fileModeAsSeen != fileModeAsOctal

Figure 5: The WYSIWYG heuristic in CodeQL

To minimize false positives, we filter out numbers that are commonly used constants (like 0755 or 0644) but in decimal or hexadecimal form. These known, valid constants are explicitly defined in the isKnownValidConstant predicate. Here is how we implemented this predicate:

predicate isKnownValidConstant(string fileMode) {
  fileMode = ["365", "420", "436", "438", "511", "509", "493"]
  or
  fileMode = ["0x16d", "0x1a4", "0x1b4", "0x1b6", "0x1ff", "0x1fd", "0x1ed"]
}

Figure 6: The CodeQL predicate that filters out common file permission constants

Using a non-octal representation of numbers isn’t the only possible pitfall when dealing with file permissions. Another issue to be aware of is the use of more than nine bits in calls to permission-changing methods. File permissions are encoded in only the first nine bits; the other bits encode file modes such as the sticky bit or setuid. Some permission-changing methods, such as os.Chmod or os.Mkdir, ignore a subset of the mode bits, depending on the operating system. The tob/go/file-perms-flaws query warns about this issue as well.
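
For reference, the special bits above the nine permission bits can be inspected with Python's stat module (shown only to illustrate the mode-word layout; which bits Go ignores is platform-dependent, as noted above):

import stat

print(oct(stat.S_ISUID))      # 0o4000, setuid
print(oct(stat.S_ISGID))      # 0o2000, setgid
print(oct(stat.S_ISVTX))      # 0o1000, sticky bit
print(oct(0o4666 & 0o777))    # 0o666: the unsupported 04666 literal, masked to its permission bits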

String trimming misuses in Go

API ambiguities are a common source of errors, especially when there are multiple methods with similar names and purposes accepting the same set of arguments. This is the case for Go’s strings.Trim family of methods. Consider the following calls:

strings.TrimLeft("file://FinnAndHengest", "file://")
strings.TrimPrefix("file://FinnAndHengest", "file://")

Figure 7: Ambiguous Trim methods

Can you tell the difference between these calls and determine which one works “as expected”?

According to the documentation, the strings.TrimLeft method accepts a cutset (i.e., a set of characters) to remove, rather than a prefix. Consequently, it can delete more characters than one would expect. While the above example may seem innocent, the same bug in a cross-site scripting (XSS) sanitization function, for example, could have devastating consequences.
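
This cutset-versus-prefix confusion is not unique to Go. Python's str.lstrip behaves the same way, which makes for a quick illustration of the pitfall:

# str.lstrip takes a character set, not a prefix: the 't' in "temp"
# also strips the second 't' of "tempting".
assert "tempting".lstrip("temp") == "ing"
assert "tempting".removeprefix("temp") == "ting"   # prefix removal, Python 3.9+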

When looking for misused strings.Trim{Left,Right} calls, the tricky part is defining what qualifies as “expected” behavior. To address this challenge, we developed the tob/go/trim-misuse CodeQL query with simple heuristics to differentiate between valid and possibly mistaken calls, based on the cutset argument. We consider a Trim operation invalid if the argument contains repeated characters or meets all of the following conditions:

  • Is longer than two characters
  • Contains at least two consecutive alphanumeric characters
  • Is not a common list of continuous characters

While the heuristics look oversimplified, they worked well enough in our audits. In CodeQL, the above rules are implemented as shown below. The cutset is a variable corresponding to the cutset argument of a strings.Trim{Left,Right} method call.

// repeated characters imply the bug
cutset.length() != unique(string c | c = cutset.charAt(_) | c).length()

or
(
// long strings are considered suspicious
cutset.length() > 2

// at least one alphanumeric
and exists(cutset.regexpFind("[a-zA-Z0-9]{2}", _, _))

// exclude probable false-positives
and not cutset.matches("%1234567%")
and not cutset.matches("%abcdefghijklmnopqrstuvwxyz%")
)

Figure 8: CodeQL implementation of heuristics for a Trim operation

Interestingly, misuses of the strings.Trim methods are so common that Go developers are considering deprecating and replacing the problematic functions.

Identifying missing minimum TLS version configurations in Go

When using static analysis tools, it’s important to know their limitations. The official go/insecure-tls CodeQL query finds TLS configurations that accept insecure (outdated) TLS versions (e.g., SSLv3, TLSv1.1). It accomplishes that task by comparing values provided to the configuration’s MinVersion and MaxVersion settings against a list of deprecated versions. However, the query does not warn about configurations that do not explicitly set the MinVersion.

Why should this be a concern? The reason is that the default MinVersion for servers is TLSv1.0. Therefore, in the example below, the official query would mark only server_explicit as insecurely configured, despite both servers using the same MinVersion.

server_explicit := &http.Server{
    TLSConfig: &tls.Config{MinVersion: tls.VersionTLS10},
}
server_default := &http.Server{TLSConfig: &tls.Config{}}

Figure 9: Explicit and default configuration of the MinVersion setting

The severity of this issue is rather low since the default MinVersion for clients is a secure TLSv1.2. Nevertheless, we filled the gap and developed the tob/go/missing-min-version-tls CodeQL query, which detects tls.Config structures without the MinVersion field explicitly set. The query skips reporting configurations used for clients and limits false positives by filtering out findings where the MinVersion is set after the structure initialization.

String bugs in C and C++

Building on top of the insightful cstrnfinder research conducted by one of my Trail of Bits colleagues, we developed the tob/cpp/cstrnfinder query. This query aims to identify invalid numeric constants provided to calls to functions that expect a string and its corresponding size as input—such as strncmp, strncpy, and memmove. We focused on detecting three erroneous cases:

  • Buffer underread. This occurs when the size argument (number 20 in the example below) is slightly smaller than the source string’s length:
    if (!strncmp(argv[1], "org/tob/test/SafeData", 20)) {
        puts("Secure");
    } else {
        puts("Not secure");
    }

    Figure 10: A buffer underread bug example

    Here, the length of the "org/tob/test/SafeData" string is 21 bytes (22 if we count the terminating null byte). However, we are comparing only the first 20 bytes. Therefore, a string like "org/tob/test/SafeDatX" is incorrectly matched.

  • Buffer overread. This arises when the size argument (14 in the example below) is greater than the length of the input string, causing the function to read out of bounds.
    int check(const char *password) {
        const char pass[] = "Silmarillion";
        return memcmp(password, pass, 14);
    }

    Figure 11: A buffer overread bug example

    In the example, the length of the "Silmarillion" string is 12 bytes (13 with the null byte). If the password is longer than 13 bytes and starts with the "Silmarillion" substring, then the memcmp function reads data outside of the pass buffer. While functions operating on strings stop reading input buffers on a null byte and will not overread the input, the memcmp function operates on bytes and does not stop on null bytes.

  • Incorrect use of string concatenation function. If the size argument (BUFSIZE-1 in the example below) is greater than the source string’s length (the length of “, Beowulf\x00”, so 10 bytes), the size argument may be incorrectly interpreted as the destination buffer’s size (BUFSIZE bytes in the example), instead of the input string’s size. This may indicate a buffer overflow vulnerability.
    #define BUFSIZE 256
    
    char all_books[BUFSIZE];
    FILE *books_f = fopen("books.txt", "r");
    fgets(all_books, BUFSIZE, books_f);
    fclose(books_f);
    
    strncat(all_books, ", Beowulf", BUFSIZE-1);
    // safe version: strncat(all_books, ", Beowulf", BUFSIZE-strlen(all_books)-1);

    Figure 12: A strncat function misuse bug example

    In the code above, the all_books buffer can hold a maximum of 256 bytes of data. If the books.txt file contains 250 characters, then the remaining space in the buffer before the call to the strncat function is 6 bytes. However, we instruct the function to append up to 255 (BUFSIZE-1) bytes to the end of the all_books buffer, so a few bytes of the “, Beowulf” string will end up outside the allocated space. What we should do instead is instruct strncat to append at most 5 bytes (leaving 1 byte for the terminating \x00).

    There is a similar built-in query with ID cpp/unsafe-strncat, but it doesn’t work with constant sizes.

Missing null terminator bug in C

Both C and C++ allow developers to construct fixed-size strings with an initialization literal. If the length of the literal is greater than or equal to the allocated buffer size, then the literal is truncated and the terminating null byte is not appended to the string.

char b1[18] = "The Road Goes Ever On";  // missing null byte, warning
char b2[13] = "Ancrene Wisse";  // missing null byte, NO WARNING
char b3[] = "Farmer Giles of Ham"; // correct initialization
char b4[3] = {'t', 'o', 'b'}; // not a string, lack of null byte is expected

Figure 13: Example initializations of C strings

Interestingly, C compilers warn about initializers longer than the buffer size but don’t raise alarms for initializers whose length equals the buffer size, even though neither of the resulting strings is null-terminated. C++ compilers return errors in both cases.

The tob/cpp/no-null-terminator query uses data flow analysis to find incorrectly initialized strings passed to functions expecting a null-terminated string. Such function calls result in out-of-bounds read or write vulnerabilities.

CodeQL: past, present, and future

This will be a continuing project from Trail of Bits, so be on the lookout for more soon! One of our most valuable assets is our expertise in automated bug finding, and this new CodeQL repository, the Semgrep rules, and the Automated Testing Handbook are key ways we help others benefit from that work. Please use these resources and report any issues or improvements to them!

If you’d like to read more about our work on CodeQL, we have used its capabilities in several ways, such as detecting iterator invalidations, identifying unhandled errors, and uncovering divergent representations.

Contact us if you’re interested in customizing CodeQL queries for your project.

A deep look into Cipher Block Chaining (CBC) Algorithm Bit Flipping

17 November 2023 at 03:00

The motivation for this article came from a long journey studying the bit-flipping attack on the CBC (Cipher Block Chaining) mode cipher. That initial study culminated in a CTF challenge I created for a Boitatech event. This article covers the theory behind the attack.

Portuguese version: here 🇧🇷

Primitives

Before we start, some useful background for following the logic: general concepts of cryptography, the properties of the XOR operation, and, for working through the examples, Python programming.

As described on Wikipedia about modes of operation, including CBC, they guarantee only the confidentiality of the message; the integrity is not protected. This is very important, since the bit-flipping attack exploits precisely that property.

AES CBC

CBC is one of the modes of operation of AES (Advanced Encryption Standard), a symmetric encryption algorithm, meaning that the same key is used both to encrypt and to decrypt the data.

Encryption

CBC, as the name says, refers to how the algorithm operates: it divides the plaintext into blocks, and each block always feeds into the operation on the following block. The following image shows the process.

Since the first block has no previous block, an Initialization Vector, better known as the IV, is used instead. An XOR operation is performed between the first plaintext block and the IV, and the result is sent to the AES function, producing the first ciphertext block. Note that this block also feeds the XOR of the second plaintext block, and the process repeats for the remaining blocks.

The mathematical formula would be:

\[\large C_i=E_k(P_i \oplus C_{i-1}) \\ \large C_0=IV\]

Decryption

The decryption process follows the reverse logic of the encryption process. The first ciphertext block goes through the decryption function, and the result is XORed with the IV to give the first plaintext block. Note that the first ciphertext block is also passed to the second stage, in an XOR operation with the output of the decryption function for the second block. The process keeps repeating, always taking the previous blocks into account.

In the mathematical formula:

\[\large P_i=D_k(C_i) \oplus C_{i-1} \\ \large C_0 = IV\]
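
Both formulas translate almost line for line into code. Here is a minimal educational sketch (assuming pycryptodome, which the scripts later in this article also use) that builds CBC mode out of raw AES-ECB block operations and checks itself against the library's built-in CBC mode:

from Crypto.Cipher import AES
from Crypto.Util.Padding import pad, unpad

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_encrypt(key, iv, plaintext):
    ecb = AES.new(key, AES.MODE_ECB)   # plays the role of E_k
    data = pad(plaintext, 16)
    prev, out = iv, b''
    for i in range(0, len(data), 16):
        prev = ecb.encrypt(xor_bytes(data[i:i+16], prev))  # C_i = E_k(P_i ^ C_{i-1})
        out += prev
    return out

def cbc_decrypt(key, iv, ciphertext):
    ecb = AES.new(key, AES.MODE_ECB)   # plays the role of D_k
    prev, out = iv, b''
    for i in range(0, len(ciphertext), 16):
        block = ciphertext[i:i+16]
        out += xor_bytes(ecb.decrypt(block), prev)         # P_i = D_k(C_i) ^ C_{i-1}
        prev = block
    return unpad(out, 16)

key, iv = bytes(16), bytes(16)   # all-zero key/IV, for this self-test only
ct = cbc_encrypt(key, iv, b'attack at dawn')
assert ct == AES.new(key, AES.MODE_CBC, iv).encrypt(pad(b'attack at dawn', 16))
assert cbc_decrypt(key, iv, ct) == b'attack at dawn'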

The bit-flipping exploitation that will be addressed in the next sections exists because of how the CBC decryption process works; understanding it is crucial for executing the attack.

Bit flipping basics

Alice and Bob

The idea of bit-flipping can be explained with the scenario in the image above, where there is an insecure communication channel. For the purposes of our example, let’s assume that Alice sends a message to Bob. Alice encrypts the message, so what is actually sent is the ciphertext. Eve, the attacker, intercepts the ciphertext on the channel and, by carrying out the bit-flipping attack, modifies the content of the plaintext via the ciphertext, without knowing the key. When Bob decrypts the message after receiving it, he ends up with a modified plaintext. The confidentiality of the message was maintained, since it remained encrypted and its plaintext confidential; however, the integrity was affected, since the plaintext was altered by Eve.

TL;DR: bit-flipping means altering the final plaintext

I believe things will become clearer during the exploration and examples, but keep the TL;DR in mind: what we’re talking about here is modifying the content of messages without knowing the key. Throughout the article, it will become clear that there are variations of the attack depending on how much the attacker knows about the original plaintext and on other factors of the encryption operation in question.

Understanding the problem

As discussed earlier, the bit-flipping attack on CBC is due to the decrypt process and is related to XOR operations.

Analyzing the above image we have:

  • Decrypt process being performed;
  • A plaintext/ciphertext of 3 blocks of 16 bytes, 3 * 16 bytes = 48 bytes in total

For the sake of example, let’s consider the following goal: We want to change the third, fourth and fifth bytes of the third block. In this case we know the original plaintext.

To do this, it will be necessary to modify the third, fourth, and fifth bytes of the second block. Notice that they take part in the XOR operation $P_3 = D_k(C_3) \oplus C_2$, where:

  • $P_3$ the plaintext of the third block;
  • $C_2$ the ciphertext of the second block;
  • $D_k(C_3)$ the result of decrypting block 3.

The bit-flipping attack aims to change the final plaintext; in our case, the altered plaintext will be $P_3’$. For this to happen, we must change the ciphertext, in our case the ciphertext of the previous block, which we will call $C_2’$. The logic goes as follows:

Follow the operations below:

  • Starting with:
\[\large P_3 = D_k(C_3) \oplus C_2 \\ \large D_k(C_3) = P_3 \oplus C_2\]
  • Considering a modified plaintext we would have:
\[\large P_3'=D_k(C_3) \oplus C_2'\]
  • The common factor between the equations is $D_k(C_3)$, therefore replacing in the modified plaintext equation, we have:
\[\large P_3'= P_3 \oplus C_2 \oplus C_2' \\ \large C_2' = P_3 \oplus C_2 \oplus P_3'\]

Notice how the last equation turned out: we have modified ciphertext = original plaintext $\oplus$ original previous ciphertext $\oplus$ whatever plaintext the attacker wants. The most interesting part is that at no point do we need the key or any understanding of the functions $E_k()$ and $D_k()$; we arrive at an equation for carrying out the attack using XOR properties alone.
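
A quick single-byte check of this identity in Python (arbitrary example values):

dk = 0x41 ^ 0x9C              # D_k(C_3) = P_3 XOR C_2, a fixed value
c2_new = 0x41 ^ 0x9C ^ 0x45   # C_2' = P_3 XOR C_2 XOR P_3'
assert dk ^ c2_new == 0x45    # decrypting against C_2' yields P_3' = 0x45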

General bit flipping equation:

\[\large P_i' = P_i \oplus C_{i-1} \oplus C_{i-1}' \\ \large C_{i-1}' = P_i \oplus C_{i-1} \oplus P_i' \\ \large C_0 = IV^{*} \\ \large C_0' = IV'^{*}\]
  • * : It is also possible to work with the IV depending on the case, in the example of the CTF challenge covered in this article the attack involves modification of the IV

Weaponizing

Now that we have a theoretical definition about the attack and the CBC operation mode, let’s move on to practical examples. To simplify some things, we will use two python libraries:

$ pip install pycryptodome pwntools

We will start with the following script to generate the ciphertext that we will attack:

from Crypto.Cipher import AES
from Crypto.Util.Padding import pad, unpad

def encrypt(key, iv, plaintext):
    plaintext = pad(plaintext.encode(), AES.block_size) 
    # add padding
    cipher = AES.new(key, AES.MODE_CBC, iv)
    ciphertext = cipher.encrypt(plaintext)
    return ciphertext

# random generated 16 bytes key
key = bytes.fromhex('63f835d0e3b9c70130797be59e25c00f')
# random generated 16 bytes iv
iv = bytes.fromhex('b05fee43fe4db7c5503b1f6732fedd1b') 
print('Key: ', key.hex())
print('IV: ', iv.hex())

plaintext = 'DOGECOIN transaction: 100 DOGE to [email protected]'
print('Plaintext: ', plaintext)

ciphertext = encrypt(key, iv, plaintext)
print('Ciphertext in bytes:', ciphertext)
print('Ciphertext in hex: ', ciphertext.hex())

Note that the plaintext of the example is:

plaintext = 'DOGECOIN transaction: 100 DOGE to [email protected]'

Now let’s go to the attack scenario:

Alice performs a DOGECOIN transaction and sends 100 DOGE to Bob, who has the address “[email protected]”. Eve, the attacker, plans to carry out a bit-flipping attack and change the destination of the coins to “[email protected]”.

With the attack scenario in mind, we can then set up the modified plaintext and outline our “target” bytes.

mod_plaintext = 'DOGECOIN transaction: 100 DOGE to [email protected]'
#                                                  ^^^          

In this practical example, I placed the bytes to be changed in the same positions as in the theoretical example, for easier understanding. Notice that the only differences between the original plaintext and the modified one are the third, fourth, and fifth bytes of the third block, so we have:

  • B turns into E (third byte of the third block, 34th overall)
  • O turns into V (fourth byte of the third block, 35th overall)
  • B turns into E (fifth byte of the third block, 36th overall)

Remembering the bit-flipping equation presented in previous topics:

\[\large C_{i-1}' = P_i \oplus C_{i-1} \oplus P_i'\]

Translating the mathematics into Python, note that the mathematical formula refers to blocks of ciphertext/plaintext, with $i$ the block number. In Python, we perform a byte-by-byte operation, so $i$ now refers to the position of the byte we want to modify in the final plaintext. We then need the same position in the previous block, i.e., 16 bytes back, so $C_{i-1}’$ becomes mod_ciphertext[i - 16] in Python.

target_pos = i
mod_ciphertext[i - 16] = plaintext[i] ^ ciphertext[i - 16] ^ mod_plaintext[i]
# mod_ciphertext = original_byte_plaintext ^ original_ciphertext_in_previous_block ^ malicious_mod_plaintext_byte

Notice that this is an operation directly on the bytes! If you are looking for one Python operation that sums up this whole article, it is the one just above; if you remember anything, remember it!

Bit-flipping attack

To apply this operation in the script and carry out the attack, we need to modify the code a bit, adding a decrypt function (to test whether the attack worked). Note that there is a key in the code, but it is used only to create the ciphertext and then perform the decryption; it is not used in the attack.

from Crypto.Cipher import AES
from Crypto.Util.Padding import pad, unpad

def encrypt(key, iv, plaintext):
    plaintext = pad(plaintext.encode(), AES.block_size) 
    # add padding
    cipher = AES.new(key, AES.MODE_CBC, iv)
    ciphertext = cipher.encrypt(plaintext)
    return ciphertext

def decrypt(key, iv, ciphertext):
    cipher = AES.new(key, AES.MODE_CBC, iv)
    plaintext = cipher.decrypt(ciphertext)
    plaintext = unpad(plaintext, 16) 
    # remove padding
    return plaintext

key = bytes.fromhex('63f835d0e3b9c70130797be59e25c00f')
iv = bytes.fromhex('b05fee43fe4db7c5503b1f6732fedd1b') 

plaintext = 'DOGECOIN transaction: 100 DOGE to [email protected]'
original_plaintext = plaintext
ciphertext = encrypt(key, iv, plaintext)

mod_plaintext = 'DOGECOIN transaction: 100 DOGE to [email protected]'

target_pos1 = 34 # B to E
target_pos2 = 35 # O to V
target_pos3 = 36 # B to E

mod_ciphertext = bytearray(ciphertext)

# change B to E in position 34
mod_ciphertext[target_pos1 - 16] = ord('B') ^ ciphertext[target_pos1 - 16] ^ ord('E') # xor operation
# change O to V in position 35
mod_ciphertext[target_pos2 - 16] = ord('O') ^ ciphertext[target_pos2 - 16] ^ ord('V') # xor operation
# change B to E in position 36
mod_ciphertext[target_pos3 - 16] = ord('B') ^ ciphertext[target_pos3 - 16] ^ ord('E') # xor operation

print('[+] Old ciphertext: ', ciphertext.hex())
print('[+] New ciphertext: ', mod_ciphertext.hex())

print('[-] Decrypting...')

mod_plaintext = decrypt(key, iv, mod_ciphertext)

print('[+] Original plaintext: ', original_plaintext)
print('[+] Modified plaintext:  ', mod_plaintext)

And as a result we have:

b'DOGECOIN transac\xc9L\xba\xf3l\xdatY\xb4\x94\xcd{\xef%+po [email protected]'

We managed to change the plaintext! Our malicious, modified ciphertext went through decryption, and in the end we obtained the modified plaintext! The attack itself is complete but, as you can see, the final plaintext ended up with some strange bytes that do not correspond to the original plaintext.

Variations

Before starting the topic of attack variations, I would like to bring back the image that represents the bit-flipping attack.

Notice the colors and identify them in the terminal output containing the result of the attack.

  • Yellow: The modified bytes of the ciphertext
  • Green: The modified plaintext
  • Red: The bytes of the previous block corresponding to the modified plaintext bytes; the entire block was affected, which corrupts that part of the final plaintext.

The conclusion regarding the red part: because a ciphertext block was altered to modify the plaintext, that altered block, when passed through the decryption function, produces a different, garbled plaintext block.

Recursively in each block

The attacker’s mindset at this point should be recursive: we now have a modified ciphertext, which produces a plaintext with a malicious email address (EVE), but that plaintext has wrong bytes in the middle. So we are back in the same situation as at the beginning: a plaintext we know that needs to be altered (to change the bytes that came out wrong back to the correct bytes we know from the original plaintext)…

In the first example, the attack was carried out byte by byte, “line by line” in Python, but ideally we would have a generic function that performs the whole attack. So here begins the development of an algorithm to perform bit-flipping attacks on CBC automatically.

Of course, this depends on the attacker having access to the entire final plaintext; without it, the next steps are impossible. In other words, the attacker must somehow be able to send a ciphertext and obtain the result of decrypting it.

Exploit

To start automating the attack, we first need a way to find the differences between one set of bytes and another (in our case, between the original plaintext and the modified one) and to save the position of each byte to be modified.

Basically what we will do is a diff, and what we will use will be one of the properties of XOR:

\[\large A \oplus A = 0 \\ \large A \oplus B \ne 0\]

Anything XORed with itself is always zero; XORed with something different, it always yields a nonzero result. Try running the following script:

from termcolor import colored
from pwn import xor

def bit_flipping(original_plaintext: bytes, modified_plaintext: bytes):
    diff_positions = []
    for position, byte_ in enumerate(original_plaintext):
        x = xor(original_plaintext[position], modified_plaintext[position]) # xor each byte
        if x != b'\x00': # if the result is not null, then it is different
            diff_positions.append(position)
    # just to visualize what is happening in colors, ignore
    og = ''
    for i, b in enumerate(original_plaintext):
        if i in diff_positions:
            og += colored(chr(b), 'red')
        else:
            og += chr(b)
    mod = ''
    for i, byte_ in enumerate(modified_plaintext):
        if i in diff_positions:
            mod += colored(chr(byte_), 'green')
        else:
            mod += chr(byte_)
            
    print('[+] Original plaintext: ', og)
    print('[+] Modified plaintext: ', mod)
    print('[+] Positions to be modified: ', diff_positions)

Output:

Now we have a way to know what must be changed and what must be kept. Following this logic, let’s add to the code an algorithm that changes each of these plaintext bytes by changing the ciphertext at the same positions minus 16.

NOTE: You may have wondered: “But what about the positions to be changed that are less than 16?” For those positions we would have to change the IV; we will deal with that later, so for now we take only the positions that are >= 16.

def bit_flipping(original_plaintext: bytes, modified_plaintext: bytes, ciphertext: bytes):
    diff_positions = []
    for position, byte_ in enumerate(original_plaintext):
        x = xor(original_plaintext[position], modified_plaintext[position]) # xor each byte
        if x != b'\x00': # if the result is not null, then it is different
            diff_positions.append(position)
    
    ciphertext = list(ciphertext)
    diff_positions.reverse()
    diff_positions = [position for position in diff_positions if position >= 16]
    
    for position in diff_positions:
        ciphertext[position - 16] = original_plaintext[position] ^ ciphertext[position - 16] ^ modified_plaintext[position] 
    
    ciphertext = bytes(ciphertext)
    return ciphertext
    
original_plaintext = b'DOGECOIN transaction: 100 DOGE to [email protected]'
mod_plaintext =      b'DOGECOIN transaction: 100 DOGE to [email protected]'

ciphertext = encrypt(key, iv, original_plaintext)
new_ciphertext = bit_flipping(original_plaintext, mod_plaintext, ciphertext)

print(decrypt(key, iv, new_ciphertext))

The function takes the positions and changes the ciphertext of the previous block using the formula $C_{i-1}’= P_i \oplus C_{i-1} \oplus P_i’$ in the loop body, and at the end returns the ciphertext.

When executed, the output is:

b'DOGECOIN transac\xc9L\xba\xf3l\xdatY\xb4\x94\xcd{\xef%+po [email protected]'

Exactly what we had before. Now, to play a little more, try changing more things in the last plaintext block: using mod_plaintext = b'DOGECOIN transaction: 100 DOGE to [email protected]' (changing the domain of the email), we get b'DOGECOIN transac]p\xc7P\xb6?)\xb6\xbb\x8av\x8c\xb68\x01\xe9o [email protected]'. You can see that it was changed.

We now have part of the exploit: a function capable of identifying the bytes to be changed and performing the attack automatically. But we still have a problem: the final plaintext is still wrong. Let’s now apply the recursion mentioned in the previous sections.

def bit_flipping(original_plaintext: bytes, modified_plaintext: bytes, ciphertext: bytes):
    diff_positions = []
    for position, byte_ in enumerate(original_plaintext):
        x = xor(original_plaintext[position], modified_plaintext[position]) # xor each byte
        if x != b'\x00': # if the result is not null, then it is different
            diff_positions.append(position)
    
    ciphertext = list(ciphertext)
    diff_positions.reverse()
    diff_positions = [position for position in diff_positions if position >= 16]
    
    for position in diff_positions:
        ciphertext[position - 16] = original_plaintext[position] ^ ciphertext[position - 16] ^ modified_plaintext[position] 
    
    ciphertext = bytes(ciphertext) # this ciphertext is wrong

    # recursively call the function until the modified plaintext is equal to the plaintext

    mod_final_plaintext = decrypt(key, iv, ciphertext) # the attacker needs to be able to decrypt the ciphertext
    new_ciphertext = ciphertext ## need to change the ciphertext again
    
    while mod_final_plaintext[16:] != modified_plaintext[16:]:
        new_ciphertext = bit_flipping(mod_final_plaintext, modified_plaintext, new_ciphertext)
        mod_final_plaintext = decrypt(key, iv, new_ciphertext)

    return new_ciphertext

To use recursion to our advantage, the algorithm treats the modified ciphertext that generates the strange plaintext as a new ciphertext to be modified, so we call the function recursively inside the while loop, targeting new_ciphertext (the ciphertext that generates the wrong plaintext). Note that the while loop compares only from the 16th byte onward; this is because we have not yet started to touch the IV.

After executing, we get:

b'\xc6\xd1\xad=\xabF\xd6\xcbE\r\x0b\xfe\xa8\x0b<Ftion: 100 DOGE to [email protected]'

It worked! The second block, which previously came out wrong, now appears correctly. To play a little more, we can change the transaction value to 999. Running it gives us:

b'\x08\xd4\xf1\x92\x90\x16\xd4\x07\x8c\xaa\xc4\xc9\x9c\xad\xd28tion: 999 DOGE to [email protected]'

Note that the final plaintext still has wrong bytes, the ones belonging to the first block. How do we fix that? For this we need a variation of the bit-flipping attack in which the attacker can modify the IV used in the decrypt function. By changing the IV, we can fix the first block and obtain a 100% clean plaintext.

And how do we change the IV? From the general equation we arrive at:

\[\large C_{1-1}' = C_0' = IV' = P_1 \oplus IV \oplus P_1' \\ \large C_0 = IV\]

To apply this in our algorithm, we modify our function by adding two new arguments, changeiv and iv.

def bit_flipping(original_plaintext: bytes, modified_plaintext: bytes, ciphertext: bytes, changeiv=False, iv=None):
    diff_positions = []
    for position, byte_ in enumerate(original_plaintext):
        x = xor(original_plaintext[position], modified_plaintext[position]) # xor each byte
        if x != b'\x00': # if the result is not null, then it is different
            diff_positions.append(position)
    
    ciphertext = list(ciphertext)
    diff_positions.reverse()
    diff_positions = [position for position in diff_positions if position >= 16]
    
    for position in diff_positions:
        ciphertext[position - 16] = original_plaintext[position] ^ ciphertext[position - 16] ^ modified_plaintext[position] 
    
    ciphertext = bytes(ciphertext) # this ciphertext is wrong

    # recursively call the function until the modified plaintext is equal to the plaintext

    mod_final_plaintext = decrypt(key, iv, ciphertext) # the attacker needs to be able to decrypt the ciphertext
    new_ciphertext = ciphertext ## need to change the ciphertext again
    
    while mod_final_plaintext[16:] != modified_plaintext[16:]:
        new_ciphertext = bit_flipping(mod_final_plaintext, modified_plaintext, new_ciphertext, iv=iv)
        mod_final_plaintext = decrypt(key, iv, new_ciphertext)

    if changeiv:
        # the first 16 bytes of our modified plaintext are wrong, so we need to change the iv;
        # find exactly the wrong positions in the first 16 bytes of the plaintext, like the diff again
        wrong_positions = []
        for position, byte_ in enumerate(mod_final_plaintext[:16]):
            x = xor(mod_final_plaintext[position], modified_plaintext[position])
            if x != b'\x00':
                wrong_positions.append(position)

        # iv to change   
        new_iv = list(iv)
        for wrong_position in wrong_positions:
            new_iv[wrong_position] = mod_final_plaintext[wrong_position] ^ iv[wrong_position] ^ modified_plaintext[wrong_position]
        new_iv = bytes(new_iv)

        # return a tuple containing the new ciphertext and the new iv
        return new_ciphertext, new_iv

    return new_ciphertext
    
original_plaintext = b'DOGECOIN transaction: 100 DOGE to [email protected]'
mod_plaintext =      b'DOGECOIN transaction: 999 DOGE to [email protected]'

ciphertext = encrypt(key, iv, original_plaintext)
new_ciphertext, new_iv = bit_flipping(original_plaintext, mod_plaintext, ciphertext, changeiv=True, iv=iv)

print(decrypt(key, new_iv, new_ciphertext))

When the function is asked to change the IV, it first runs the diff again to get the exact positions that still need fixing in the first block; the equation above is then applied to build new_iv. Note that there is no position - 16 here: the IV is not part of the ciphertext, so we simply use the same position, since both the first ciphertext block and the IV are 16 bytes long. After execution, the result is: b'DOGECOIN transaction: 999 DOGE to [email protected]'

Now we really do have a perfect final plaintext! As usual, to play around, try changing mod_plaintext to nothing but spaces…

original_plaintext = b'DOGECOIN transaction: 100 DOGE to [email protected]'
mod_plaintext =      b'                                                '

ciphertext = encrypt(key, iv, original_plaintext)
new_ciphertext, new_iv = bit_flipping(original_plaintext, mod_plaintext, ciphertext, changeiv=True, iv=iv)

print(decrypt(key, new_iv, new_ciphertext))

Executing…


It works too, which shows that it is now possible to make any kind of change.

Conclusion

After some research, I didn’t find many real-world attacks or vulnerabilities that specifically involve bit-flipping in CBC; so far it seems to be a mostly theoretical vulnerability, usually found in CTF cryptography challenges. I found the topic difficult to research because of its complexity and because many articles focus on specific CTF challenges. So I decided to write this article to address the topic in a general way, without CTF challenges involved: to present the theory and the mathematical operations through an example and carry it through to the end, with the goal of building a generic algorithm/exploit for any situation (within the variations of the attack).

It is important to mention that this article was built from personal research and study and may therefore contain errors. If you find any, please let me know so I can correct them: send an email to [email protected] or open an issue in this blog’s repository.

Thanks for reading!

A deep look into Cipher Block Chaining (CBC) Algorithm Bit Flipping

17 November 2023 at 03:00

A motivação para esse artigo surgiu após uma longa jornada tentando estudar um ataque de bit flipping na cifra de modo CBC (Cipher Block Chaining). A conclusão da minha jornada inicial nesse estudo se deu após a construção de um desafio de CTF que eu criei para um evento da Boitatech. Esse artigo contemplará a teoria por trás do ataque.

English version: here

Primitivas

Antes de iniciar acredito que algumas leituras ou conhecimentos interessantes para acompanhar toda a lógica sejam as ideias gerais de criptografia, propriedades das operações XOR e claro, para a resolução dos exemplos, programação em Python.

Como descrito na wikipedia sobre modos de operação incluindo o CBC, eles garantem apenas a confidencialidade da mensagem, porém a intergidade é afetada. Isso é bem importante tendo em vista o ataque de bit-flipping que explora justamente essa propriedade.

AES CBC

O modo CBC é um dos modos de operação do AES (Advanced Encryption Standard) um algoritimo de criptografia simétrica, isso significa que a mesma chave é usada tanto para criptografar como descriptografar os dados.

Encryption

O CBC como o nome diz se refere a operação do algoritimo, que divide o plaintext em blocos, e cada bloco é sempre considerado na operação do bloco seguinte, a imagem a seguir mostra o processo.

Como o primeiro bloco não possui um bloco anterior, o que é usado é o Initialization Vector, mais conhecido como IV, então é feita uma operação XOR sendo o 1° bloco de plaintext $\oplus$ IV, o resultado é enviado para a função AES que resultará no primeiro bloco de ciphertext, note que esse bloco também vai para o XOR do segundo bloco de plaintext e assim o processo se repete com os demais blocos.

Na formula matematica:

\[\large C_i=E_k(P_i \oplus C_{i-1}) \\ \large C_0=IV\]

Decryption

O processo de decrypt segue a lógica inversa do processo de encrypt, o primeiro bloco de ciphertext é utilizado na função de decrypt, e esse resultado é XOREADO com o IV para resultar no plaintext, note que no iníco, o primeiro bloco de ciphertext é jogado para a segunda etapa numa operação XOR com o resultado da função decrypt do segundo bloco. Assim o processo vai se repetindo sempre levando em consderação os blocos anteriores.

Na formula matemática:

\[\large P_i=D_k(C_i) \oplus C_{i-1} \\ \large C_0 = IV\]

A exploração do bit-flipping que será abordada nos próximos tópicos ocorre por problemas na forma que o processo de Decrypt do CBC ocorre, entender ele é crucial para a execução do atque.

Bit flipping basics

Alice and Bob

A ideia de bit-flipping pode ser explicada levando em consideração o cenário da imagem acima, em que existe um canal de comunicação inseguro. Para fins de exemplo levemos em consideração que a Alice enviará uma mensagem para Bob, essa mensagem será criptografada pela Alice portanto, o que será enviado mesmo será o ciphertext, Eve, o atacante interceptará o ciphertext no canal, e realizando o ataque de bit-flipping modificará o conteúdo do plaintext a partir do ciphertext sem conhecer a key, assim, Bob quando realizar o processo de decriptação após receber a mensagem terá um plaintext modificado. A confidencialidade da mensagem foi mantida, visto que ela continuou sendo criptografada e com o seu plaintext confidencial, porém a intergidade foi afetada visto que o plaintext foi alterado por Eve.

TLDR; bit-flipping é alterar o plaintext final

Acredito que as coisas ficarão mais claras durante a exploração e nos exemplos, mas tenha o TLDR em mente, o que estamos falando aqui é sobre modificar o conteúdo de mensagens sem conhecer a chave. Ao longo do artigo ficará claro que existem variações da exploração dependendo do quanto o atacante conhece do plaintext original e outros fatores relacionados a operação de criptografia em questão.

Understanding the problem

Como discutido anteriormente, o ataque de bit-flipping em CBC se dá por conta do processo de decrypt e tem relação com operações XOR.

Analisando a imagem acima temos:

  • Processo de decrypt sendo realizado;
  • Um plaintext/ciphertext de 3 blocos de 16 bytes, 3 * 16 bytes = 48 bytes no total

A fins de exemplo, vamos considerar o seguinte objetivo: Queremos alterar os terceiro, quarto e quinto bytes do terceiro bloco. No caso nós conhecemos o plaintext original.

Para isso será necessário modificar os terceiro, quarto e quinto bytes do segundo bloco. Pois perceba que eles fazem parte da operação XOR: $P_3 = D_k(C_3) \oplus C_2$, sendo:

  • $P_3$ o plaintext do terceiro bloco;
  • $C_2$ o ciphertext do segundo;
  • $D_k(C_3)$ o resultado do decrypt do bloco 3;

O ataque de bit-flipping tem como objetivo alterar o plaintext final, no nosso caso o plaintext alterado será $P_3’$, para isso acontecer, deveremos alterar o ciphertext, no nosso caso o ciphertext do bloco anterior, chamaremos de $C_2’$, segue a lógica:

Acompanhe as operações a seguir:

  • Começando com:
\[\large P_3 = D_k(C_3) \oplus C_2 \\ \large D_k(C_3) = P_3 \oplus C_2\]
  • Considerando um plaintext modificado teriamos:
\[\large P_3'=D_k(C_3) \oplus C_2'\]
  • O fator comum entre as equações é $D_k(C_3)$, portanto substituindo na equação do plaintext modificado, temos:
\[\large P_3'= P_3 \oplus C_2 \oplus C_2' \\ \large C_2' = P_3 \oplus C_2 \oplus P_3'\]

Perceba como ficou a última equação, temos uma definição como sendo ciphertext modificado = plaintext original $\oplus$ ciphertext original anterior $\oplus$ plaintext que o atacante quiser. O mais interessante de tudo é que em nenhum momento precisamos da key ou entender as funções $E_k()$ e $D_k()$ conseguimos chegar a uma equação para realizar o ataque apenas com propriedades do XOR.

Equação geral do bit flipping:

\[\large P_i' = P_i \oplus C_{i-1} \oplus C_{i-1}' \\ \large C_{i-1}' = P_i \oplus C_{i-1} \oplus P_i' \\ \large C_0 = IV ^ * \\ \large C_0' = IV' ^ *\]
  • * : É possível também trabalhar com o IV dependendo do caso, no exemplo do desafio de CTF abordado nesse artigo o ataque envolve modificação do IV

Weponazing

Agora que possuimos uma definição teórica sobre o ataque e o modo de operação CBC, vamos aos exemplos práticos. Para facilitar algumas coisas, usaremos duas bibliotecas do python:

$ pip install pycryptodome pwntools

Iniciaremos o seguinte script para gerar o ciphertext que iremos atacar:

from Crypto.Cipher import AES
from Crypto.Util.Padding import pad, unpad

def encrypt(key, iv, plaintext):
    plaintext = pad(plaintext.encode(), AES.block_size) 
    # add padding
    cipher = AES.new(key, AES.MODE_CBC, iv)
    ciphertext = cipher.encrypt(plaintext)
    return ciphertext

# random generated 16 bytes key
key = bytes.fromhex('63f835d0e3b9c70130797be59e25c00f')
# random generated 16 bytes iv
iv = bytes.fromhex('b05fee43fe4db7c5503b1f6732fedd1b') 
print('Key: ', key.hex())
print('IV: ', iv.hex())

plaintext = 'DOGECOIN transaction: 100 DOGE to [email protected]'
print('Plaintext: ', plaintext)

ciphertext = encrypt(key, iv, plaintext)
print('Ciphertext in bytes:', ciphertext)
print('Ciphertext in hex: ', ciphertext.hex())

Note que o plaintext do exemplo é:

plaintext = 'DOGECOIN transaction: 100 DOGE to [email protected]'

Now for the attack scenario:

Alice makes a DOGECOIN transaction and sends 100 DOGE to Bob, who has the address “[email protected]”. Eve, the attacker, intends to perform a bit-flipping attack and change the destination of the DOGE to “[email protected]”.

With the attack scenario in mind, we can build the modified plaintext and mark our “target” bytes.

mod_plaintext = 'DOGECOIN transaction: 100 DOGE to [email protected]'
#                                                  ^^^          

In this practical example I placed the bytes to be changed at the same positions as in the theoretical example, for easier understanding. Note that the only differences between the original and the modified plaintext are the third, fourth, and fifth bytes of the third block, so we have the following (the sketch after this list double-checks these offsets):

  • B becomes E (third byte of the third block, index 34 of the whole message, 0-based);
  • O becomes V (fourth byte of the third block, index 35 of the whole message);
  • B becomes E (fifth byte of the third block, index 36 of the whole message).
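
To double-check these offsets, here is a small sketch splitting the message into 16-byte blocks (the recipient address is redacted in this article, so a stand-in of the same length is used; indices are 0-based, matching the code below):

plaintext = 'DOGECOIN transaction: 100 DOGE to BOB@doge.email'
blocks = [plaintext[i:i + 16] for i in range(0, len(plaintext), 16)]
for n, block in enumerate(blocks, start=1):
    print(f'block {n}: {block!r}')

print(plaintext[34:37])  # -> 'BOB' (indices 34, 35, 36)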

Recalling the bit-flipping equation presented in earlier sections:

\[\large C_{i-1}' = P_i \oplus C_{i-1} \oplus P_i'\]

Translating from the math to Python, we need to keep in mind that the mathematical formula refers to whole blocks of ciphertext/plaintext, with $i$ being the block number. In Python, the operation is carried out byte by byte, so we refer to the exact position of the byte we want to modify within the ciphertext/plaintext. Applying this, $i$ now becomes the position of the byte to be modified in the final plaintext, and we need that same position relative to the previous block, i.e. 16 bytes back; hence $C_{i-1}'$ in Python becomes mod_ciphertext[i - 16].

target_pos = i
mod_ciphertext[i - 16] = plaintext[i] ^ ciphertext[i - 16] ^ mod_plaintext[i]
# mod_ciphertext = original_byte_plaintext ^ original_ciphertext_in_previous_block ^ malicious_mod_plaintext_byte

Notice that this is an operation directly on bytes! If you are looking for one Python operation that sums up this whole article, it is right above; if you remember one thing, remember that line!

Bit-flipping attack

To apply this operation in the script and perform the attack, we need to modify the code a little, adding a decrypt function (to test whether the attack worked). Note that there is a key in the code, but it is only used to create the ciphertext and to decrypt it afterwards; it is not used in the attack.

from Crypto.Cipher import AES
from Crypto.Util.Padding import pad, unpad

def encrypt(key, iv, plaintext):
    plaintext = pad(plaintext.encode(), AES.block_size) 
    # add padding
    cipher = AES.new(key, AES.MODE_CBC, iv)
    ciphertext = cipher.encrypt(plaintext)
    return ciphertext

def decrypt(key, iv, ciphertext):
    cipher = AES.new(key, AES.MODE_CBC, iv)
    plaintext = cipher.decrypt(ciphertext)
    plaintext = unpad(plaintext, 16) 
    # remove padding
    return plaintext

key = bytes.fromhex('63f835d0e3b9c70130797be59e25c00f')
iv = bytes.fromhex('b05fee43fe4db7c5503b1f6732fedd1b') 

plaintext = 'DOGECOIN transaction: 100 DOGE to [email protected]'
original_plaintext = plaintext
ciphertext = encrypt(key, iv, plaintext)

mod_plaintext = 'DOGECOIN transaction: 100 DOGE to [email protected]'

target_pos1 = 34 # B to E
target_pos2 = 35 # O to V
target_pos3 = 36 # B to E

mod_ciphertext = bytearray(ciphertext)

# change B to E in position 34
mod_ciphertext[target_pos1 - 16] = ord('B') ^ ciphertext[target_pos1 - 16] ^ ord('E') # xor operation
# change O to V in position 35
mod_ciphertext[target_pos2 - 16] = ord('O') ^ ciphertext[target_pos2 - 16] ^ ord('V') # xor operation
# change B to E in position 36
mod_ciphertext[target_pos3 - 16] = ord('B') ^ ciphertext[target_pos3 - 16] ^ ord('E') # xor operation

print('[+] Old ciphertext: ', ciphertext.hex())
print('[+] New ciphertext: ', mod_ciphertext.hex())

print('[-] Decrypting...')

mod_plaintext = decrypt(key, iv, mod_ciphertext)

print('[+] Original plaintext: ', original_plaintext)
print('[+] Modified plaintext: ', mod_plaintext)

And as a result we get:

b'DOGECOIN transac\xc9L\xba\xf3l\xdatY\xb4\x94\xcd{\xef%+po [email protected]'

We managed to alter the plaintext! Our malicious, modified ciphertext went through decryption and in the end we obtained the modified plaintext! By itself the attack is complete, but as you can see, the final plaintext ended up with some strange bytes that do not match the original plaintext.

Variations

Before starting the topic of attack variations, I would like to bring back the image that represents the bit-flipping attack.

Note the colors and identify them in the terminal screenshot containing the result of the attack:

  • Yellow: the modified bytes of the ciphertext;
  • Green: the modified plaintext;
  • Red: the bytes of the previous block corresponding to the modified plaintext bytes; the whole block was corrupted, affecting the final plaintext.

The conclusion about the red part is: since the ciphertext was altered in order to modify the plaintext, that altered ciphertext block, when it goes through the decrypt function, will produce a different, corrupted plaintext block.
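
In equation form, the second plaintext block is computed as:

\[\large P_2' = D_k(C_2') \oplus C_1\]

Since $C_2'$ differs from $C_2$, the block cipher output $D_k(C_2')$ is, for all practical purposes, a random 16-byte value, so the whole of $P_2'$ comes out garbled, while $P_3' = D_k(C_3) \oplus C_2'$ changes only in the bytes we chose.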

Recursively in each block

The mindset the attacker needs at this point is a recursive one. We now have a modified ciphertext that produces a plaintext with a malicious e-mail address (EVE), but that plaintext has wrong bytes in the middle of it. So we are back in the same situation as at the start: a plaintext we know that needs to be altered (changing the bytes that came out wrong into the correct bytes we know from the plaintext)…

In the first example the attack was performed byte by byte, “line by line” in Python, but ideally we would have a generic function that performs the attack. So here begins the development of an algorithm to carry out bit-flipping attacks on CBC automatically.

Of course, this depends on the attacker having access to the entire final plaintext; without it, the next steps are impossible. In other words, the attacker must somehow be able to send a ciphertext and obtain the result of decrypting it.
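
In code, this requirement is usually modelled as a decryption oracle. A hypothetical sketch (the name is made up; in a CTF this would typically be a network service, while here it simply wraps the decrypt() and key from the script above):

def decryption_oracle(iv: bytes, ciphertext: bytes) -> bytes:
    # stand-in for whatever lets the attacker observe decryptions;
    # the attacker never sees `key` itself, only the output
    return decrypt(key, iv, ciphertext)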

Exploit

To start automating the attack, we first need a way to find the differences between one set of bytes and another (in our case, between the original plaintext and the modified one) and, once those differences are found, store the position of each byte to be modified.

What we will build is basically a diff, and the tool we will use is one of the properties of XOR:

\[\large A \oplus A = 0 \\ \large A \oplus B \ne 0\]

Anything XORed with itself is always null; XORed with something different from itself, it always yields something non-zero. Try running the following script:

from termcolor import colored
from pwn import xor

def bit_flipping(original_plaintext: bytes, modified_plaintext: bytes):
    diff_positions = []
    for position, byte_ in enumerate(original_plaintext):
        x = xor(original_plaintext[position], modified_plaintext[position]) # xor each byte
        if x != b'\x00': # if the result is not null, then it is different
            diff_positions.append(position)
    # just to visualize what is happening in colors, ignore
    og = ''
    for i, b in enumerate(original_plaintext):
        if i in diff_positions:
            og += colored(chr(b), 'red')
        else:
            og += chr(b)
    mod = ''
    for i, byte_ in enumerate(modified_plaintext):
        if i in diff_positions:
            mod += colored(chr(byte_), 'green')
        else:
            mod += chr(byte_)
            
    print('[+] Original plaintext: ', og)
    print('[+] Modified plaintext: ', mod)
    print('[+] Positions to be modified: ', diff_positions)
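
A quick call to see it in action (the addresses below are stand-ins of the right length, since the real ones are redacted in this article):

original = b'DOGECOIN transaction: 100 DOGE to BOB@doge.email'
modified = b'DOGECOIN transaction: 100 DOGE to EVE@doge.email'
bit_flipping(original, modified)
# -> Positions to be modified: [34, 35, 36]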

Output:

[terminal screenshot: the original and modified plaintexts printed with the differing bytes highlighted, plus the list of positions to be modified]

We now have a way of knowing what must be changed and what must be kept. Following this logic, let's add to the code an algorithm that changes each of those plaintext bytes at those positions by altering the ciphertext at those positions - 16.

Note: you may have thought: “But what about positions to be changed that are lower than 16?” For those positions we would have to alter the IV; we will deal with that later, so for now we only take the positions that are >= 16.

def bit_flipping(original_plaintext: bytes, modified_plaintext: bytes, ciphertext: bytes):
    diff_positions = []
    for position, byte_ in enumerate(original_plaintext):
        x = xor(original_plaintext[position], modified_plaintext[position]) # xor each byte
        if x != b'\x00': # if the result is not null, then it is different
            diff_positions.append(position)
    
    ciphertext = list(ciphertext)
    diff_positions.reverse()
    diff_positions = [position for position in diff_positions if position >= 16]
    
    for position in diff_positions:
        ciphertext[position - 16] = original_plaintext[position] ^ ciphertext[position - 16] ^ modified_plaintext[position] 
    
    ciphertext = bytes(ciphertext)
    return ciphertext
    
original_plaintext = b'DOGECOIN transaction: 100 DOGE to [email protected]'
mod_plaintext =      b'DOGECOIN transaction: 100 DOGE to [email protected]'

ciphertext = encrypt(key, iv, original_plaintext)
new_ciphertext = bit_flipping(original_plaintext, mod_plaintext, ciphertext)

print(decrypt(key, iv, new_ciphertext))

The function takes the positions and alters the previous block of the ciphertext using the formula $C_{i-1}' = P_i \oplus C_{i-1} \oplus P_i'$ (line 13), returning the ciphertext at the end.

Running it, we get as output:

b'DOGECOIN transac\xc9L\xba\xf3l\xdatY\xb4\x94\xcd{\xef%+po [email protected]'

Exactly what we had before. Now, to play around a bit more, try altering more things in the last block of the plaintext; using mod_plaintext = b'DOGECOIN transaction: 100 DOGE to [email protected]' (changing the e-mail domain) we get: b'DOGECOIN transac]p\xc7P\xb6?)\xb6\xbb\x8av\x8c\xb68\x01\xe9o [email protected]'. You can see that it was changed.

We now have one part of the exploit: a function capable of identifying the bytes to be altered and performing the attack automatically. But we still have some problems; the final plaintext still comes out wrong. Now it is time to apply the recursion mentioned in the previous sections.

def bit_flipping(original_plaintext: bytes, modified_plaintext: bytes, ciphertext: bytes):
    diff_positions = []
    for position, byte_ in enumerate(original_plaintext):
        x = xor(original_plaintext[position], modified_plaintext[position]) # xor each byte
        if x != b'\x00': # if the result is not null, then it is different
            diff_positions.append(position)
    
    ciphertext = list(ciphertext)
    diff_positions.reverse()
    diff_positions = [position for position in diff_positions if position >= 16]
    
    for position in diff_positions:
        ciphertext[position - 16] = original_plaintext[position] ^ ciphertext[position - 16] ^ modified_plaintext[position] 
    
    ciphertext = bytes(ciphertext) # this ciphertext is wrong

    # recursively call the function until the modified plaintext is equal to the plaintext

    mod_final_plaintext = decrypt(key, iv, ciphertext) # the attacker needs to be able to decrypt the ciphertext
    new_ciphertext = ciphertext # need to change the ciphertext again
    
    while mod_final_plaintext[16:] != modified_plaintext[16:]:
        new_ciphertext = bit_flipping(mod_final_plaintext, modified_plaintext, new_ciphertext)
        mod_final_plaintext = decrypt(key, iv, new_ciphertext)

    return new_ciphertext

To use recursion in our favor, the algorithm treats the modified ciphertext that produces the garbled plaintext as a new ciphertext to be modified, so the function is called recursively (line 22) targeting new_ciphertext (the ciphertext that produces the wrong plaintext). Each pass fixes the garbled block by editing the block before it, which in turn garbles that earlier block, so the corruption migrates one block to the left with each iteration until it reaches the first block. Note that the while condition still only compares from byte 16 onward; that is because we have not started touching the IV yet.

After running it we have: b'\xc6\xd1\xad=\xabF\xd6\xcbE\r\x0b\xfe\xa8\x0b<Ftion: 100 DOGE to [email protected]'

It worked! The second block, which used to come out wrong, now appears correctly. To play around a little we can, for example, change the transaction value to 999. Running it, we get:

b'\x08\xd4\xf1\x92\x90\x16\xd4\x07\x8c\xaa\xc4\xc9\x9c\xad\xd28tion: 999 DOGE to [email protected]'

Note that the final plaintext still has wrong bytes, the ones belonging to the first block. How do we fix that? For this we have to turn to a variation of the bit-flipping attack: the one where the attacker can modify the IV used in the decrypt function. By altering the IV we can fix the first block, and then we will have a 100% clean plaintext.

And how do we alter the IV? Using the general equation, we arrive at:

\[\large C_{1-1}' = C_0' = IV' = P_1 \oplus IV \oplus P_1' \\ \large C_0 = IV\]
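
Before wiring this into the exploit, here is a standalone check of the relation (a minimal sketch reusing the key and IV from earlier; the address is a stand-in for the redacted one, and only the first plaintext byte is flipped through the IV):

from Crypto.Cipher import AES
from Crypto.Util.Padding import pad, unpad

key = bytes.fromhex('63f835d0e3b9c70130797be59e25c00f')
iv = bytes.fromhex('b05fee43fe4db7c5503b1f6732fedd1b')

pt = b'DOGECOIN transaction: 100 DOGE to BOB@doge.email'  # stand-in address
ct = AES.new(key, AES.MODE_CBC, iv).encrypt(pad(pt, 16))

# IV' = P_1 xor IV xor P_1', applied only to byte 0: turn 'D' into 'X'
new_iv = bytearray(iv)
new_iv[0] = pt[0] ^ iv[0] ^ ord('X')

out = unpad(AES.new(key, AES.MODE_CBC, bytes(new_iv)).decrypt(ct), 16)
print(out)  # b'XOGECOIN transaction: ...' (only the first byte changed)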

To apply this to our algorithm, we have to modify our function by adding two new arguments: changeiv and iv.

def bit_flipping(original_plaintext: bytes, modified_plaintext: bytes, ciphertext: bytes, changeiv=False, iv=None):
    diff_positions = []
    for position, byte_ in enumerate(original_plaintext):
        x = xor(original_plaintext[position], modified_plaintext[position]) # xor each byte
        if x != b'\x00': # if the result is not null, then it is different
            diff_positions.append(position)
    
    ciphertext = list(ciphertext)
    diff_positions.reverse()
    diff_positions = [position for position in diff_positions if position >= 16]
    
    for position in diff_positions:
        ciphertext[position - 16] = original_plaintext[position] ^ ciphertext[position - 16] ^ modified_plaintext[position] 
    
    ciphertext = bytes(ciphertext) # this ciphertext is wrong

    # recursively call the function until the modified plaintext is equal to the plaintext

    mod_final_plaintext = decrypt(key, iv, ciphertext) # the attacker needs to be able to decrypt the ciphertext
    new_ciphertext = ciphertext # need to change the ciphertext again
    
    while mod_final_plaintext[16:] != modified_plaintext[16:]:
        new_ciphertext = bit_flipping(mod_final_plaintext, modified_plaintext, new_ciphertext)
        mod_final_plaintext = decrypt(key, iv, new_ciphertext)

    if changeiv:
        # the first 16 bytes of our modified plaintext are wrong, so we need to change the iv;
        # get exactly the wrong positions among the first 16 bytes of the plaintext, like the diff again
        wrong_positions = []
        for position, byte_ in enumerate(mod_final_plaintext[:16]):
            x = xor(mod_final_plaintext[position], modified_plaintext[position])
            if x != b'\x00':
                wrong_positions.append(position)

        # iv to change   
        new_iv = list(iv)
        for wrong_position in wrong_positions:
            new_iv[wrong_position] = mod_final_plaintext[wrong_position] ^ iv[wrong_position] ^ modified_plaintext[wrong_position]
        new_iv = bytes(new_iv)

        # return a tuple containing the new ciphertext and the new iv
        return new_ciphertext, new_iv

    return new_ciphertext
    
original_plaintext = b'DOGECOIN transaction: 100 DOGE to [email protected]'
mod_plaintext =      b'DOGECOIN transaction: 999 DOGE to [email protected]'

ciphertext = encrypt(key, iv, original_plaintext)
new_ciphertext, new_iv = bit_flipping(original_plaintext, mod_plaintext, ciphertext, changeiv=True, iv=iv)

print(decrypt(key, new_iv, new_ciphertext))

When checking whether the IV must be changed (line 26), the diff part runs first, to get the exact positions of what must be changed; the equation is applied at line 38, where new_iv is modified using it. Note that nothing like position - 16 is used here: the IV is a separate thing, apart from the ciphertext, and we simply mirror the position onto it, since the first block of the ciphertext and the IV are both 16 bytes long. After running, the result is: b'DOGECOIN transaction: 999 DOGE to [email protected]'

There we go, a perfect final plaintext! As usual, to play around, try changing mod_plaintext to nothing but blank spaces…

original_plaintext = b'DOGECOIN transaction: 100 DOGE to [email protected]'
mod_plaintext =      b'                                                '

ciphertext = encrypt(key, iv, original_plaintext)
new_ciphertext, new_iv = bit_flipping(original_plaintext, mod_plaintext, ciphertext, changeiv=True, iv=iv)

print(decrypt(key, new_iv, new_ciphertext))

Running it…

[terminal screenshot: the decrypted plaintext comes out as nothing but blank spaces]

It works too, which proves that any kind of alteration is now possible.

Conclusion

After some research, I did not find many real-world attacks or vulnerabilities specifically involving bit-flipping on CBC; so far, I believe, it is mostly a textbook vulnerability, one usually found in cryptography CTF challenges. I had difficulty researching the subject because of its complexity and because many articles focus on specific CTF challenges, so I decided to write this article to cover the topic in a general way, with no CTF challenges involved: presenting the theory and the mathematical operations, working through a single example, and carrying it to the end with the goal of building a generic algorithm/exploit for any situation (within the variations of the attack).

It is important to stress that this article was built from personal research and study, so it may contain errors; if you find one, please let me know so I can fix it. Should that happen, send an e-mail to [email protected] or open an issue in this blog's repository.

Thanks for reading!

Public Report – WhatsApp Auditable Key Directory (AKD) Implementation Review

14 November 2023 at 20:59

In August 2023, Meta engaged NCC Group’s Cryptography Services practice to perform an implementation review of their Auditable Key Directory (AKD) library, which provides an append-only directory of public keys mapped to user accounts and a framework for efficient cryptographic validation of this directory by an auditor. The library is being leveraged to provide an AKD for WhatsApp and is meant to serve as a reference implementation for auditors of the WhatsApp AKD, as well as to allow other similar services to implement key transparency. The review was performed remotely by 3 consultants over a two-week period with a total of 20 person-days spent. The project concluded with a retest phase a few weeks after the original engagement that confirmed all findings were fixed.

Public Report – Zcash FROST Security Assessment

23 October 2023 at 12:06

In Summer 2023, the Zcash Foundation engaged NCC Group to conduct a security assessment of the Foundation’s FROST threshold signature implementation, based on the paper FROST: Flexible Round-Optimized Schnorr Threshold Signatures. This project implements v12 of the draft FROST specification in Rust, with a variety of options available for underlying elliptic curve groups. The review was performed by three consultants over 25 person-days of effort. The project concluded with a retest phase a few weeks after the original engagement that confirmed all findings were fixed.

On Multiplications with Unsaturated Limbs

18 September 2023 at 17:04

This post is about a rather technical coding strategy choice that arises when implementing cryptographic algorithms on some elliptic curves, namely how to represent elements of the base field. We will be discussing Curve25519 implementations, in particular as part of Ed25519 signatures, as specified in RFC 8032. The most widely used Rust implementation of these operations is the curve25519-dalek library. My own research library is crrl, also written in plain Rust (no assembly); it is meant for research purposes, but I write it using all best practices for production-level implementations, e.g. it is fully constant-time and offers an API amenable to integration into various applications.

The following table measures performance of Ed25519 signature generation and verification with these libraries, using various backend implementations for operations in the base field (integers modulo 2^255 - 19), on two test platforms (64-bit x86, and 64-bit RISC-V):

                              x86 (Intel “Coffee Lake”)    RISC-V (SiFive U74)
Library            Backend    sign       verify            sign       verify
crrl               m64        49130      108559            202021     412764
                   m51        70149      148653            158928     304902
curve25519-dalek   simd       59553      116243            n/a        n/a
                   serial     59599      169621            180142     449980
                   fiat       70552      198289            187220     429755

Ed25519 performance (in clock cycles), on x86 and RISC-V.

Test platforms are the following:

  • x86: an Intel Core i5-8259U CPU, running at 2.30 GHz (TurboBoost is disabled). This uses “Coffee Lake” cores (one of the late variants of the “Skylake” line). Operating system is Linux (Ubuntu 22.04), in 64-bit mode.
  • RISC-V: a StarFive VisionFive2 board with a StarFive JH7110 CPU, running at 1 GHz. The CPU contains four SiFive U74 cores and implements the I, M, C, Zba and Zbb architecture extensions (and some others which are not relevant here). Operating system is again Linux (Ubuntu 23.04), in 64-bit mode.

In both cases, the current Rust “stable” version is used (1.72.0, from 2023-08-23), and compilation uses the environment variable RUSTFLAGS="-C target-cpu=native" to allow the compiler to use all opcodes supported by the current platform. The computation is performed over a single core, with measurements averaged over randomized inputs. The CPU cycle counter is used. Figures above are listed with many digits, but in practice there is a bit of natural variance due to varying inputs (signature verification is not constant-time, since it uses only public data) and, more generally, because of the effect of various operations also occurring within the CPU (e.g. management mode, cache usage from other cores, interruptions from hardware…), so that the measured values should be taken with a grain of salt (roughly speaking, differences below about 3% are not significant).

crrl and curve25519-dalek differ a bit in how they use internal tables to speed up computations; in general, crrl tables are smaller, and crrl performs fewer point additions but more point doublings. For signature verification, crrl implements the Antipa et al optimization with Lagrange’s algorithm for lattice basis reduction, but curve25519-dalek does not. The measurements above show that crrl’s strategy works (i.e. it is a tad faster than curve25519-dalek) (note: not listed above is the fact that curve25519-dalek supports batch signature verification with a substantially lower per-signature cost; crrl does not implement that feature yet). The point of this post is not to boast about how crrl is faster; its good performance should be taken as an indication that it is decently optimized and thus a correct illustration of the effect of its implementation strategy choices. Indeed, the interesting part is how the different backends compare to each other, on the two tested architectures.

curve25519-dalek has three backends:

  • serial: Field elements are split over 5 limbs of 51 bits; that is, value x is split into five values x0 to x4, such that x = x0 + 2^51*x1 + 2^102*x2 + 2^153*x3 + 2^204*x4 (a small sketch of this representation follows this list). Importantly, limb values are held in 64-bit words and may somewhat exceed 2^51 (within some limits, to avoid overflows during computations). The representation is redundant, in that a given field element x accepts many different representations; a normalization step is applied when necessary (e.g. when serializing curve points into bytes).
  • fiat: The fiat backend is a wrapper around the fiat-crypto library, which uses basically the same implementation strategy as the serial backend, but through automatic code generation that includes a correctness proof. In other words, the fiat backend is guaranteed through the magic of mathematics to always return the correct result, while in all other library backends listed here, the guarantee is “only” through the non-magic of code auditors (including myself) poring over the code for hours in search of issues, and not finding any (in practice all the code referenced here is believed correct).
  • simd: AVX2 opcodes are used to perform arithmetic operations on four field elements in parallel; each element is split over ten limbs of 25 and 26 bits each. curve25519-dalek selects that backend whenever possible, i.e. on x86 systems which have AVX2 (such as an Intel “Coffee Lake”), but of course it is not available on RISC-V.
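
To make the 51-bit representation concrete, here is a small Python sketch (an illustration of the encoding only, not code from either library):

MASK51 = (1 << 51) - 1
P = (1 << 255) - 19  # the field modulus

def to_limbs(x: int) -> list[int]:
    # x = x0 + 2^51*x1 + 2^102*x2 + 2^153*x3 + 2^204*x4
    return [(x >> (51 * i)) & MASK51 for i in range(5)]

def from_limbs(limbs: list[int]) -> int:
    # limbs may exceed 51 bits in the redundant representation;
    # recombining and reducing mod p normalizes the value
    return sum(v << (51 * i) for i, v in enumerate(limbs)) % P

x = (1 << 255) - 20  # p - 1, the largest field element
assert from_limbs(to_limbs(x)) == x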

crrl has two backends:

  • m51: A “51-bit limbs” backend similar to curve25519-dalek’s “serial”, though with somewhat different choices for the actual operations (this is detailed later on).
  • m64: Field elements are split over four 64-bit limbs, held in 64-bit types. By nature, such limbs cannot exceed their 64-bit range. The representation is still slightly redundant in that overall values may use the complete 256-bit range, so that each field element has two or three possible representations (a final reduction modulo 2^255 - 19 is performed before serializing).

The “m64” backend could be deemed to be the most classical, in that such a representation would be what was preferred for big integer computations in, say, the 1990s. It minimizes the number of multiplication opcode invocations during a field element multiplication (with two 4-limb operands, 16 register multiplications are used), but also implies quite a lot of carry propagation. See for instance this excerpt of the implementation of field element multiplication in crrl’s m64 backend:

let (e0, e1) = umull(a0, b0);
let (e2, e3) = umull(a1, b1);
let (e4, e5) = umull(a2, b2);
let (e6, e7) = umull(a3, b3);

let (lo, hi) = umull(a0, b1);
let (e1, cc) = addcarry_u64(e1, lo, 0);
let (e2, cc) = addcarry_u64(e2, hi, cc);
let (lo, hi) = umull(a0, b3);
let (e3, cc) = addcarry_u64(e3, lo, cc);
let (e4, cc) = addcarry_u64(e4, hi, cc);
let (lo, hi) = umull(a2, b3);
let (e5, cc) = addcarry_u64(e5, lo, cc);
let (e6, cc) = addcarry_u64(e6, hi, cc);
let (e7, _) = addcarry_u64(e7, 0, cc);

let (lo, hi) = umull(a1, b0);
let (e1, cc) = addcarry_u64(e1, lo, 0);
let (e2, cc) = addcarry_u64(e2, hi, cc);
let (lo, hi) = umull(a3, b0);
let (e3, cc) = addcarry_u64(e3, lo, cc);
let (e4, cc) = addcarry_u64(e4, hi, cc);
let (lo, hi) = umull(a3, b2);
let (e5, cc) = addcarry_u64(e5, lo, cc);
let (e6, cc) = addcarry_u64(e6, hi, cc);
let (e7, _) = addcarry_u64(e7, 0, cc);

The addcarry_u64() calls above implement the “add with carry” operation, which, on x86, map to the ADC opcode (or, on that core, the ADCX or ADOX opcodes).
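
For readers unfamiliar with the primitive, its semantics can be modelled in a few lines (a Python model of the behaviour, not the actual implementation):

M64 = (1 << 64) - 1

def addcarry_u64(x: int, y: int, c: int) -> tuple[int, int]:
    # 64-bit add-with-carry: returns (sum mod 2^64, carry out)
    t = x + y + c
    return t & M64, t >> 64

assert addcarry_u64(M64, 1, 0) == (0, 1)  # wraps around, carry out is 1
assert addcarry_u64(1, 2, 1) == (4, 0)    # no wrap, no carry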

When Ed25519 signatures were invented, in 2011, the Intel CPUs du jour (Intel “Westmere” core) were not very good at carry propagation; they certainly supported the ADC opcode, but with a relatively high latency (2 cycles), and that made the classical code somewhat slow. The use of 51-bit limbs allowed a different code, which, in curve25519-dalek’s serial backend, looks like this:

let b1_19 = b[1] * 19;
let b2_19 = b[2] * 19;
let b3_19 = b[3] * 19;
let b4_19 = b[4] * 19;

// Multiply to get 128-bit coefficients of output
let c0: u128 = m(a[0], b[0]) + m(a[4], b1_19) + m(a[3], b2_19) + m(a[2], b3_19) + m(a[1], b4_19);
let mut c1: u128 = m(a[1], b[0]) + m(a[0], b[1]) + m(a[4], b2_19) + m(a[3], b3_19) + m(a[2], b4_19);
let mut c2: u128 = m(a[2], b[0]) + m(a[1], b[1]) + m(a[0], b[2]) + m(a[4], b3_19) + m(a[3], b4_19);
let mut c3: u128 = m(a[3], b[0]) + m(a[2], b[1]) + m(a[1], b[2]) + m(a[0], b[3]) + m(a[4], b4_19);
let mut c4: u128 = m(a[4], b[0]) + m(a[3], b[1]) + m(a[2], b[2]) + m(a[1], b[3]) + m(a[0] , b[4]);

This code excerpt computes the result over five limbs which can now range over close to 128 bits, and some extra high part propagation (not shown above) is needed to shrink limbs down to 51 bits or so. As we see here, there are now 25 individual multiplications (the m() function), since there are five limbs per input. There still are ADC opcodes in there! They are somewhat hidden behind the additions: these additions are over Rust’s u128 type, a 128-bit integer type that internally uses two registers, so that each addition implies one ADC opcode. However, these carry propagations can occur mostly in parallel (there are five independent dependency chains here), and that maps well to the abilities of a Westmere core. On such cores, the “serial” backend is faster than crrl’s m64. However, in later x86 CPUs from Intel (starting with the Haswell core), support for add-with-carry opcodes was substantially improved, and the classical method with 64-bit limbs again gained the upper hand. This was already noticed by Nath and Sarkar in 2018 and this explains why crrl’s m64 backend, on our x86 test system, is faster than curve25519-dalek’s serial and fiat backends, and even a bit faster than the AVX2-powered simd backend.

RISC-V

Now enters the RISC-V platform. RISC-V is an open architecture which has been designed with what can be viewed as “pure RISC philosophy”, with a much reduced instruction set. It is inspired by the older DEC Alpha, including in particular a large number of integer registers (32), one of which is fixed to the value zero, and, most notably, no carry flag at all. An “add-with-carry” operation, which adds together two 64-bit inputs x and y and an input carry c, and outputs a 64-bit result z and an output carry d, now requires no fewer than five instructions:

  1. Add x and y, into z (ADD).
  2. Compare z to x (SLTU): if z is strictly lower, then the addition “wrapped around”; the comparison output (0 or 1) is written into d.
  3. Add c to z (ADD).
  4. Compare z to c (SLTU) for another potential “wrap around”, with a 0 or 1 value written into another register t.
  5. Add t to d (ADD).

(I cannot prove that it is not doable in fewer RISC-V instructions; if there is a better solution please tell me.)

Thus, the add-with-carry is not only a high-latency sequence, but it also requires quite a few instructions, and instruction throughput may be a bottleneck. Our test platform (SiFive U74 core) is not well documented, but some cursory tests show the following:

  • Multiplication opcodes have a throughput of one per cycle, and a latency of three cycles (this seems constant-time). As per the RISC-V specification (“M” extension), a 64×64 multiplication with a 128-bit result requires two separate opcodes (MUL returns the low 64 bits of the result, MULHU returns the high 64 bits). There is a recommended code sequence for when the two opcodes relate to the same operands, but this does not appear to be leveraged by this particular CPU.
  • For “simple” operations such as ADD or SLTU, the CPU may schedule up to two instructions in the same cycle, but the exact conditions for this to happen are unclear, and each instruction still has a 1-cycle latency.

Under such conditions, a 5-instruction add-with-carry will need a minimum of 2.5 cycles (in terms of throughput). The main output (z) is available with a latency of 2 cycles, but the output carry has a latency of 4 cycles. A “partial” add-with-carry with no input carry is cheaper (an ADD and a SLTU), and so is an add-with-carry with no output carry (two ADDs), but these are still relatively expensive. The high latency is similar to the Westmere situation, but the throughput cost is new. For that RISC-V platform, we need to avoid not only long dependency chains of carry propagation, but we should also endeavour to do fewer carry propagations. Another operation which is similarly expensive is the split of a 115-bit value (held in a 128-bit variable) into a low (51 bits) and high (64 bits) parts. The straightforward Rust code looks like this (from curve25519-dalek):

let carry: u64 = (c4 >> 51) as u64;
out[4] = (c4 as u64) & LOW_51_BIT_MASK;

On x86, the 128-bit value is held in two registers; the low part is a simple bitwise AND with a constant, and the high part is extracted with a single SHLD opcode, that can extract a chunk out of the concatenation of two input registers. On RISC-V, there is no shift opcode with two input registers (not counting the shift count); instead, the extraction of the high part (called carry in the code excerpt above) requires three instructions: two single-register shifts (SHR, SHL) and one bitwise OR to combine the results.

In order to yield better performance on RISC-V, the crrl m51 backend does things a bit differently:

let a0 = a0 << 6;
let b0 = b0 << 7;
// ...
let (c00, h00) = umull(a0, b0);
let d0 = c00 >> 13;

Here, the input limbs are pre-shifted (by 6 or 7 bits) so that the products are shifted by 13 bits. In that case, the boundary between the low and high parts falls exactly on the boundary between the two registers that receive the multiplication result; the extraction of the high part becomes free! The low part is obtained with a single opcode (a right shift of the low register by 13 bits). Moreover, instead of performing 128-bit additions, crrl’s m51 code adds the low and high parts separately:

let d0 = c00 >> 13;
let d1 = (c01 >> 13) + (c10 >> 13);
let d2 = (c02 >> 13) + (c11 >> 13) + (c20 >> 13);
// ...
let h0 = h00;
let h1 = h01 + h10;
let h2 = h02 + h11 + h20;

In that way, all add-with-carry operations are avoided. This makes crrl’s m51 code somewhat slower than curve25519-dalek’s serial backend on x86, but it considerably improves performance on RISC-V.
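
Spelling out the arithmetic behind the pre-shift trick: a0*b0 is at most 102 bits, so (a0*2^6)*(b0*2^7) = a0*b0*2^13 fits in 115 bits. After the shift, bits 0..50 of a0*b0 (the low limb part) occupy bits 13..63 of the 128-bit product, while bit 51 onward lands at bit 64 and up, i.e. exactly in the high register. The limb boundary thus coincides with the register boundary: the high part is free, and the low part costs a single 13-bit right shift of the low register.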

Conclusion

The discussion above is about a fairly technical point. In the grand scheme of things, the differences in performance between the various implementation strategies are not great; there are not many usage contexts where a speed difference of less than 30% in computing or verifying Ed25519 signatures has any relevance to overall application performance. But, insofar as such things matter, the following points are to be remembered:

  • Modern large CPUs (for laptops and servers) are good at handling add-with-carry, and for them the classical “64-bit limbs” format tends to be the fastest.
  • Some smaller CPUs will be happier with 51-bit limbs. However, there is no one-size-fits-all implementation strategy: for some CPUs, the main issue is the latency of add-with-carry, while for some others, in particular RISC-V systems, the instruction throughput is the bottleneck.