A peek into build provenance for Homebrew

14 May 2024 at 13:00

By Joe Sweeney and William Woodruff

Last November, we announced our collaboration with Alpha-Omega and OpenSSF to add build provenance to Homebrew.

Today, we are pleased to announce that the core of that work is live and in public beta: homebrew-core is now cryptographically attesting to all bottles built in the official Homebrew CI. You can verify these attestations with our (currently external, but soon upstreamed) brew verify command, which you can install from our tap; installation instructions are below.

This means that, from now on, each bottle built by Homebrew will come with a cryptographically verifiable statement binding the bottle's content to the specific workflow and other build-time metadata that produced it. This metadata includes (among other things) the git commit and GitHub Actions run ID for the workflow that produced the bottle, making it a SLSA Build L2-compatible attestation.

In effect, this injects greater transparency into the Homebrew build process, and diminishes the threat posed by a compromised or malicious insider by making it impossible to trick ordinary users into installing non-CI-built bottles.

This work is still in early beta, and involves features and components still under active development within both Homebrew and GitHub. As such, we don’t recommend that ordinary users begin to verify provenance attestations quite yet.

For the adventurous, however, read on!

A quick Homebrew recap

Homebrew is an open-source package manager for macOS and Linux. Homebrew's crown jewel is homebrew-core, a curated repository of over 7,000 open-source packages that ships by default with the rest of Homebrew. homebrew-core's packages are downloaded hundreds of millions of times each year, and they form the baseline tool suite (node, openssl, python, go, etc.) for programmers who use macOS for development.

One of Homebrew's core features is its use of bottles: precompiled binary distributions of each package that speed up brew install and ensure consistency between individual machines. When a formula (the machine-readable description of how a package is built) is added to homebrew-core or updated, Homebrew's CI (orchestrated through BrewTestBot) automatically triggers a process to create these bottles.

After a bottle is successfully built and tested, it's time for distribution. BrewTestBot takes the compiled bottle and uploads it to GitHub Packages, Homebrew's chosen hosting service for homebrew-core. This step ensures that users can access and download the latest software version directly through Homebrew's command-line interface. Finally, BrewTestBot updates the changed formula to reference the latest bottle builds, ensuring that users receive the updated bottle upon their next brew update.

In sum: Homebrew’s bottle automation increases the reliability of homebrew-core by removing humans from the software building process. In doing so, it also eliminates one specific kind of supply chain risk: by lifting bottle builds away from individual Homebrew maintainers into the Homebrew CI, it reduces the likelihood that a maintainer’s compromised development machine could be used to launch an attack against the larger Homebrew user base1.

At the same time, there are other aspects of this scheme that an attacker could exploit: an attacker with sufficient permissions could upload malicious builds directly to homebrew-core's bottle storage, potentially leveraging alert fatigue to trick users into installing despite a checksum mismatch. More concerningly, a compromised or rogue Homebrew maintainer could surreptitiously replace both the bottle and its checksum, resulting in silently compromised installs for all users from that point onward.

This scenario is a singular but nonetheless serious weakness in the software supply chain, one that is well addressed by build provenance.

Build provenance

In a nutshell, build provenance provides cryptographically verifiable evidence that a software package was actually built by the expected "build identity" and not tampered with or secretly inserted by a privileged attacker. In effect, build provenance offers the integrity properties of a strong cryptographic digest, combined with an assertion that the artifact was produced by a publicly auditable piece of build infrastructure.

In the case of Homebrew, that "build identity" is a GitHub Actions workflow, meaning that the provenance for every bottle build attests to valuable pieces of metadata like the GitHub owner and repository, the branch that the workflow was triggered from, the event that triggered the workflow, and even the exact git commit that the workflow ran from.

This data (and more!) is encapsulated in a machine-readable in-toto statement, giving downstream consumers the ability to express complex policies over individual attestations.

Build provenance and provenance more generally are not panaceas: they aren't a substitute for application-level protections against software downgrades or confusion attacks, and they can't prevent "private conversation with Satan" scenarios where the software itself is malicious or compromised.

Despite this, provenance is a valuable building block for auditable supply chains: it forces attackers into the open by committing them to public artifacts on a publicly verifiable timeline, and reduces the number of opaque format conversions that an attacker can hide their payload in. This is especially salient in cases like the recent xz-utils backdoor, where the attacker used a disconnect between the upstream source repository and backdoored tarball distribution to maintain their attack’s stealth. Or in other words: build provenance won’t stop a fully malicious maintainer, but it will force their attack into the open for review and incident response.

Our implementation

Our implementation of build provenance for Homebrew is built on GitHub’s new artifact attestations feature. We were given early (private beta) access to the feature, including the generate-build-provenance action and gh attestation CLI, which allowed us to iterate rapidly on a design that could be easily integrated into Homebrew’s pre-existing CI.

This gives us build provenance for all current and future bottle builds, but we were left with a problem: Homebrew has a long "tail" of pre-existing bottles that are still referenced in formulae, including bottles built on (architecture, OS version) tuples that are no longer supported by GitHub Actions2. This tail is used extensively, leaving us with a dilemma:

  1. Attempt to rebuild all old bottles. This is technically and logistically infeasible, both due to the changes in GitHub Actions’ own supported runners and significant toolchain changes between macOS versions.
  2. Only verify a bottle’s build provenance if present. This would effectively punch a hole in the intended security contract for build provenance, allowing an attacker to downgrade to a lower degree of integrity simply by stripping off any provenance metadata.

Neither of these solutions was workable, so we sought a third. Instead of either rebuilding the world or selectively verifying, we decided to create a set of backfilled build attestations, signed by a completely different repository (our tap) and workflow. With a backfilled attestation behind each bottle, verification looks like a waterfall:

  1. We first check for build provenance tied to the "upstream" repository with the expected workflow, i.e. Homebrew/homebrew-core with publish-commit-bottles.yml.
  2. If the "upstream" provenance is not present, we check for a backfilled attestation before a specified cutoff date from the backfill identity, i.e. trailofbits/homebrew-brew-verify with backfill_signatures.yml.
  3. If neither is present, then we produce a hard failure.

This gives us the best of both worlds: the backfill allows us to uniformly fail if no provenance or attestation is present (eliminating downgrades), without having to rebuild every old homebrew-core bottle. The cutoff date then adds an additional layer of assurance, preventing an attacker from attempting to use the backfill attestation to inject an unexpected bottle.

We expect the tail of backfilled bottle attestations to decrease over time, as formulae turn over towards newer versions. Once all reachable bottles are fully turned over, Homebrew will be able to remove the backfill check entirely and assert perfect provenance coverage!

Verifying provenance today

As mentioned above: this feature is in an early beta. We’re still working out known performance and UX issues; as such, we do not recommend that ordinary users try it yet.

With that being said, adventuresome early adopters can give it a try with two different interfaces:

  1. A dedicated brew verify command, available via our third-party tap
  2. An early upstream integration into brew install itself.

For brew verify, simply install our third-party tap. Once installed, the brew verify subcommand will become usable:

brew update
brew tap trailofbits/homebrew-brew-verify
brew verify --help
brew verify bash

Going forward, we’ll be working with Homebrew to upstream brew verify directly into brew as a developer command.

For brew install itself, set HOMEBREW_VERIFY_ATTESTATIONS=1 in your environment:

brew update
export HOMEBREW_VERIFY_ATTESTATIONS=1
brew install cowsay

Regardless of how you choose to experiment with these new features, certain caveats apply:

  • Both brew verify and brew install wrap the gh CLI internally, and will bootstrap gh locally if it isn’t already installed. We intend to replace our use of gh attestation with a pure-Ruby verifier in the medium term.
  • The build provenance beta depends on authenticated GitHub API endpoints, meaning that gh must have access to a suitable access credential. If you experience initial failures with brew verify or brew install, try running gh auth login or setting HOMEBREW_GITHUB_API_TOKEN to a personal access token with minimal permissions.

If you hit a bug or unexpected behavior while experimenting with brew install, please report it! Similarly, for brew verify: please send any reports directly to us.

Looking forward

Everything above concerns homebrew-core, the official repository of Homebrew formulae. But Homebrew also supports third-party repositories ("taps"), which account for a minority but significant number of overall bottle installs. These repositories also deserve build provenance, and we have ideas for accomplishing that!

Further out, we plan to take a stab at source provenance as well: Homebrew's formulae already hash-pin their source artifacts, but we can go a step further and additionally assert that source artifacts are produced by the repository (or other signing identity) that's latent in their URL or otherwise embedded into the formula specification. This will compose nicely with GitHub's artifact attestations, enabling a hypothetical DSL for asserting source provenance directly in formula definitions.

Stay tuned for further updates in this space and, as always, don’t hesitate to contact us! We’re interested in collaborating on similar improvements for other open-source packaging ecosystems, and would love to hear from you.

Last but not least, we’d like to offer our gratitude to Homebrew’s maintainers for their development and review throughout the process. We’d also like to thank Dustin Ingram for his authorship and design on the original proposal, the GitHub Package Security team, as well as Michael Winser and the rest of Alpha-Omega for their vision and support for a better, more secure software supply chain.

1In the not-too-distant past, Homebrew’s bottles were produced by maintainers on their own development machines and uploaded to a shared Bintray account. Mike McQuaid’s 2023 talk provides an excellent overview on the history of Homebrew’s transition to CI/CD builds.
2Or easy to provide with self-hosted runners, which Homebrew uses for some builds.


Understanding AddressSanitizer: Better memory safety for your code

16 May 2024 at 13:00

By Dominik Klemba and Dominik Czarnota

This post will guide you through using AddressSanitizer (ASan), a compiler plugin that helps developers detect memory issues in code that can lead to remote code execution attacks (such as WannaCry or this WebP implementation bug). ASan inserts checks around memory accesses during compile time, and crashes the program upon detecting improper memory access. It is widely used during fuzzing due to its ability to detect bugs missed by unit testing and its better performance compared to other similar tools.

ASan was designed for C and C++, but it can also be used with Objective-C, Rust, Go, and Swift. This post will focus on C++ and demonstrate how to use ASan, explain its error outputs, explore implementation fundamentals, and discuss ASan’s limitations and common mistakes, which will help you grasp previously undetected bugs.

Finally, we share a concrete example of a real bug we encountered during an audit that was missed by ASan and can be detected with our changes. This case motivated us to research ASan's bug detection capabilities and contribute dozens of upstreamed commits to the LLVM project, including container-overflow annotations for libc++'s std::basic_string and std::deque.

Getting started with ASan

ASan can be enabled in LLVM’s Clang and GNU GCC compilers by using the -fsanitize=address compiler and linker flag. The Microsoft Visual C++ (MSVC) compiler supports it via the /fsanitize=address option. Under the hood, the program’s memory accesses will be instrumented with ASan checks and the program will be linked with ASan runtime libraries. As a result, when a memory error is detected, the program will stop and provide information that may help in diagnosing the cause of memory corruption.

AddressSanitizer's approach differs from that of tools like Valgrind, which may be used without rebuilding a program from source but has a much larger performance overhead (roughly 20x versus ASan's 2x) and may detect fewer bugs.

Simple example: detecting out-of-bounds memory access

Let’s see ASan in practice on a simple buggy C++ program that reads data from an array out of its bounds. Figure 1 shows the code of such a program, and figure 2 shows its compilation, linking, and output when running it, including the error detected by ASan. Note that the program was compiled with debugging symbols and no optimizations (-g3 and -O0 flags) to make the ASan output more readable.

Figure 1: Example program that has an out-of-bounds bug on the stack since it reads the fifth item from the buf array while it has only 4 elements (example.cpp)
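
A minimal program matching that description looks like the following sketch (the exact code in example.cpp may differ in details):

#include <cstdio>

int main() {
    int buf[4] = {1, 2, 3, 4};
    int sum = 0;
    for (int i = 0; i <= 4; i++)  // off-by-one: the last iteration goes past the array
        sum += buf[i];            // out-of-bounds read when i == 4
    printf("%d\n", sum);
    return 0;
}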

Figure 2: Running the program from figure 1 with ASan

When ASan detects a bug, it prints out a best guess of the error type that has occurred, a backtrace where it happened in the code, and other location information (e.g., where the related memory was allocated or freed).

Figure 3: Part of an ASan error message with location in code where related memory was allocated

In this example, ASan detected a stack-buffer-overflow (an out-of-bounds read) on the sixth line of the example.cpp file. The problem was that we read the memory of the buf variable out of bounds through the buf[i] expression when the loop counter variable (i) had a value of 4.

It is also worth noting that ASan can detect many different types of errors like stack-buffer-overflow, heap-use-after-free, double-free, alloc-dealloc-mismatch, container-overflow, and others. Figures 4 and 5 present another example, where ASan detects a heap-use-after-free bug and shows the exact location where the related heap memory was allocated and freed.

Figure 4: Example program that uses a buffer that was freed (built with -fsanitize=address -O0 -g3)
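
A minimal use-after-free along the lines of the figure looks like this sketch (the original code may differ in details):

#include <cstdio>

int main() {
    int* buf = new int[8];
    buf[0] = 42;
    delete[] buf;            // the buffer is freed here...
    printf("%d\n", buf[0]);  // ...and read afterwards: a heap-use-after-free
    return 0;
}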

Figure 5: Excerpt of ASan report from running the program from figure 4

For more ASan examples, refer to the LLVM tests code or Microsoft’s documentation.

Building blocks of ASan

ASan is built upon two key concepts: shadow memory and redzones. Shadow memory is a dedicated memory region that stores metadata about the application’s memory. Redzones are special memory regions placed in between objects in memory (e.g., variables on the stack or heap allocations) so that ASan can detect attempts to access memory outside of the intended boundaries.

Shadow memory

Shadow memory is allocated at a high address of the program, and ASan modifies its data throughout the lifetime of the process. Each byte in shadow memory describes the accessibility status of a corresponding memory chunk that can potentially be accessed by the process. Those memory chunks, typically referred to as "granules," are commonly 8 bytes in size and are aligned to their size (the granule size is set in GCC/LLVM code). Figure 6 shows the mapping between granules and process memory.

Figure 6: Logical division of process memory and corresponding shadow memory bytes

The shadow memory values detail whether a given granule is fully or partially addressable (accessible by the process), or whether the memory should not be touched by the process. In the latter case, we call this memory "poisoned," and the corresponding shadow memory byte value details the reason why ASan thinks so. The shadow memory values legend is printed by ASan along with its reports. Figure 7 shows this legend.

Figure 7: Shadow memory legend (the values are displayed in hexadecimal format)

By updating the state of shadow memory during the process execution, ASan can verify the validity of memory accesses by checking the granule's value (and so its accessibility status). If a memory granule is fully accessible, the corresponding shadow byte is set to zero. Conversely, if the whole granule is poisoned, the value is negative. If the granule is partially addressable (i.e., only the first N bytes may be accessed and the rest should not be), then the number N of addressable bytes is stored in the shadow memory. For example, freed memory on the heap is described with the value fd and shouldn't be used by the process until it's allocated again. This allows for detecting use-after-free bugs, which often lead to serious security vulnerabilities.

Partially addressable granules are very common. One example may be a buffer on a heap of a size that is not 8-byte-aligned; another may be a variable on the stack that has a size smaller than 8 bytes.

Redzones

Redzones are memory regions inserted in between objects in process memory (and reflected in shadow memory) that act as buffer zones of poisoned memory, separating different objects. As a result, compiling a program with ASan changes its memory layout.

Let's look at the shadow memory for the program shown in figure 8, where we introduced three variables on the stack: "buf," an array of six items each of 2 bytes, and "a" and "b" variables of 2 and 1 bytes.

Figure 8: Example program with an out of bounds memory access error detected by ASan (built with -fsanitize=address -O0 -g3)
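
A sketch of such a program is shown below; the declarations mirror the description above, though the exact declaration order and the out-of-bounds index used in the figure may differ:

#include <cstdio>

int main() {
    char  b = 1;          // 1 byte  -> shadow value 01
    short a = 2;          // 2 bytes -> shadow value 02
    short buf[6] = {0};   // 12 bytes -> one fully addressable granule (00) plus 4 bytes (04)
    printf("%d %d %d\n", b, a, buf[8]);  // buf[8] reads well past the array, into a stack redzone
    return 0;
}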

Running the program with ASan, as in figure 9, shows us that the problematic memory access hit the "stack right redzone" as marked by the "[f3]" shadow memory byte. Note that ASan marked this byte with the arrow before the address and the brackets around the value.

Figure 9: Shadow bytes describing the memory area around the stack variables from figure 8. Note that the byte 01 corresponds to the variable "b," the 02 to variable "a," and 00 04 to the buf array.

This shadow memory along with the corresponding process memory is shown in figure 10. ASan would detect accesses to the bytes colored in red and report them as errors.

Figure 10: Memory layout with ASan. Each cell represents one byte.

Without ASan, the "a," "b," and "buf" variables would likely be placed next to each other, without any padding between them. The padding comes from the fact that each variable must map onto its own shadow granules (making some granules only partially addressable) and from the redzones added between them as well as before and after them.

Redzones are not added between elements in arrays or between member variables in structures. This is because doing so would break many applications that depend on structure layout, on object sizes, or simply on the fact that arrays are contiguous in memory.

Sadly, ASan also doesn’t poison the structure padding bytes, since they may be accessed by valid programs when a whole structure is copied (e.g., with the memcpy function).

How does ASan instrumentation work?

ASan instrumentation is fully dependent on the compiler; however, the implementations are very similar between compilers. The shadow memory has the same layout and uses the same values in LLVM and GCC, as the latter is based on the former. The instrumented code also calls special functions defined in compiler-rt, a low-level runtime library from LLVM. It is worth noting that there are also shared and static versions of the ASan libraries, though this may vary based on the compiler or environment.

The ASan instrumentation adds checks to the program code to validate legality of the program’s memory accesses. Those checks are performed by comparing the address and size of the access against the shadow memory. The shadow memory mapping and encoding of values (the fact that granules are of 8 bytes in size) allow ASan to efficiently detect memory access errors and provide valuable insight into the problems encountered.

Let’s look at a simple C++ example compiled and tested on x86-64, where the touch function accesses 8 bytes at the address given in the argument (the touch function takes a pointer to a pointer and dereferences it):

Figure 11: A function accessing memory area of size 8 bytes
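
Consistent with that description, the function can be as small as this sketch:

void* touch(void** ptr) {
    return *ptr;  // dereferencing ptr is an 8-byte load on x86-64
}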

Without ASan, the function compiles to very simple assembly code:

Figure 12: The function from figure 11 compiled without ASan

Figure 13 shows that, when compiling the code from figure 11 with ASan, a check is added that confirms whether the access is correct (i.e., whether the whole granule is accessible). We can see that the address we are going to access is first divided by 8 (the shr rax, 3 instruction) to compute its offset in the shadow memory. Then, the program checks whether the shadow memory byte is zero; if it's not, it calls the __asan_report_load8 function, which makes ASan report the memory access violation. The byte is checked against zero because zero means that all 8 bytes of the granule are accessible, and the dereference that the program performs loads another pointer, which is of course 8 bytes in size.

Figure 13: The function from Figure 11 compiled with ASan using Clang 15

For comparison, we can see that the gcc compiler generates code (figure 14) similar to LLVM's (figure 13):

Figure 14: The function from Figure 11 compiled with ASan using gcc 12

Of course, if the program accessed a smaller region, a different check would have to be generated by the compiler. This is shown in figures 15 and 16, where the program accesses just a single byte.

Figure 15: A function accessing memory area smaller than a granule
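
A sketch of such a function (the signature in the figure may differ):

char touch_byte(char* ptr) {
    return *ptr;  // a single-byte load: smaller than one 8-byte granule
}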

Now the function accesses a single byte that may be at the beginning, middle, or end of a granule, and every granule may be fully addressable, partially addressable, or fully poisoned. The shadow memory byte is first checked against zero, and if it is nonzero, a more detailed check is performed (starting from the .LBB0_1 label). This check raises an error if the granule is fully poisoned, or if the granule is partially addressable and a poisoned byte (from the poisoned suffix) is accessed. (GCC generates similar code.)

Figure 16: An example of a more complex check, confirming legality of the access in function from figure 15, compiled with Clang 15
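
Expressed as C++ rather than assembly, the logic of that slow-path check is roughly the following sketch (kShadowOffset and the function name are illustrative; 0x7fff8000 is the usual shadow offset on Linux x86-64):

#include <cstdint>

constexpr uintptr_t kShadowOffset = 0x7fff8000;

// Illustrative model of the compiler-inserted check for a load of `size` bytes (size < 8).
bool is_bad_access(uintptr_t addr, unsigned size) {
    int8_t k = *reinterpret_cast<int8_t*>((addr >> 3) + kShadowOffset);
    // k == 0: whole granule addressable; 1..7: only the first k bytes; negative: poisoned.
    // On a bad access, the real instrumentation calls __asan_report_load1 (or a sibling).
    return k != 0 && static_cast<int>(addr & 7) + static_cast<int>(size) > k;
}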

Can you spot the problem above?

You may have noticed in figures 12-14 that access to poisoned memory may not be detected if the address we read 8 bytes from is unaligned. For such an unaligned memory access, its first and last bytes are in different granules.

The following snippet illustrates a scenario in which the address of the variable ptr is increased by three, so the touch function accesses an unaligned address.

Figure 17: Code accessing unaligned memory of size 8 may not be detected by ASan in Clang 15
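
A sketch of that scenario (compiled with -O0 so that touch is not inlined; the figure's exact code may differ):

void* touch(void** ptr) {
    return *ptr;  // 8-byte load
}

int main() {
    void* ptr = nullptr;
    // Shift the address of ptr by three bytes: the 8-byte load in touch() now straddles
    // two granules and reaches three bytes past ptr, into the neighboring redzone.
    void** unaligned = reinterpret_cast<void**>(reinterpret_cast<char*>(&ptr) + 3);
    return touch(unaligned) != nullptr;
}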

The incorrect access from figure 17 is not detected when it is compiled with Clang 15, but it is detected by GCC 12 as long as the function is inlined. If we force non-inlining with __attribute__ ((noinline)), GCC won’t detect it either. It seems that when GCC is aware of address manipulations that may result in unaligned addressing, it generates a more robust check that detects the invalid access correctly.

ASan’s limitations and quirks

While ASan may miss some bugs, it is important to note that it does not report false positives if used properly. This means that if it reports a bug, either the bug is real or some part of the code was not built and linked with ASan properly (assuming that ASan itself doesn't have bugs).

However, the ASan implementations in GCC and LLVM have the following limitations and quirks:

  • Redzones are not added between variables in structures.
  • Redzones are not added between array elements.
  • Padding in structures is not poisoned (example).
  • Access to allocated, but not yet used, memory in a container won't be detected unless the container annotates itself, as C++'s std::vector, std::deque, or std::string (in some cases) do. Note that std::basic_string (with external buffers) and std::deque are annotated in libc++ (thanks to our patches), while std::string is also annotated in Microsoft's C++ standard library.
  • Incorrect access to memory managed by a custom allocator won’t raise an error unless the allocator performs annotations.
  • Only suffixes of a memory granule may be poisoned; therefore, access before an unaligned object may not be detected.
  • ASan may not detect memory errors if a random address is accessed: as long as the random number generator returns an addressable address, the access won't be considered incorrect.
  • ASan doesn't understand context and only checks values in shadow memory. If a randomly accessed address is annotated as some error in shadow memory, ASan will correctly report that error, even if the reported bug title may not make much sense.
  • Because ASan does not understand what programs are intended to do, accessing an array with an incorrect index may not be detected if the resulting address is still addressable, as shown in figure 18.

Figure 18: Access to memory that is addressable but out of bounds of the array. There is no error detected.
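
The kind of code the figure describes can be sketched as follows (hypothetical types and indices; here the out-of-bounds index lands on a neighboring struct member, which has no redzone around it):

struct Pair {
    int a[4];
    int b;
};

int main() {
    Pair p = {};
    // Index 4 is out of bounds for p.a, but the resulting address is p.b,
    // which is perfectly addressable, so ASan stays silent.
    return p.a[4];
}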

ASan is not meant for production use

ASan is designed as a debugging tool for use in development and testing environments, and it should not be used in production. Apart from its overhead, ASan shouldn't be used for hardening, as its use could compromise the security of a program. For example, its gigantic shadow memory allocation decreases the effectiveness of the ASLR mitigation, and ASan changes the behavior of the program based on environment variables, which could be problematic, e.g., for suid binaries.

If you have any other doubts, check the ASan FAQ; for hardening your application, refer to compiler security flags.

Poisoning only suffixes

Because ASan currently supports only a very limited set of values in shadow memory, it can only poison suffixes of memory granules. In other words, there is no shadow memory encoding that tells ASan a given byte in a granule is accessible if it follows an inaccessible (poisoned) byte.

As an example, if the third byte in a granule is not poisoned, the two bytes before it are not poisoned either, even if the program's logic would require them to be.

It also means that up to seven bytes in front of an object may be left unpoisoned, for instance when an object, variable, or buffer starts in the middle of a granule or at its last byte.

False positives due to linking

False positives can occur when only part of a program is built with ASan. These false positives are often (if not always) related to container annotations. For example, linking a library that modifies annotated objects but was built without instrumentation may result in false positives.

Consider a scenario where the push_back member function of a vector is called. If an object is added at the end of the container in a part of the program that does not have ASan instrumentation, no error will be reported, and the memory where the object is stored will not be unpoisoned. As a result, accessing this memory in the instrumented part of the program will trigger a false positive error.

Similarly, access to poisoned memory in a part of the program that was built without ASan won’t be detected.

To address this situation, the whole application along with all its dependencies should be built with ASan (or at least all parts modifying annotated containers). If this is not possible, you can turn off container annotations by setting the environment variable ASAN_OPTIONS=detect_container_overflow=0.

Do it yourself: user annotations

User annotations may be used to detect incorrect memory accesses, for example when preallocating a big chunk of memory and managing it with a custom allocator or in a custom container. In other words, user annotations can be used to implement checks similar to those std::vector does under the hood in order to detect out-of-bounds access between the vector's data+size and data+capacity addresses.

If you want to make your testing even stronger, you can choose to intentionally "poison" certain memory areas yourself. For this, there are two macros you may find useful:

  • ASAN_POISON_MEMORY_REGION(addr, size)
  • ASAN_UNPOISON_MEMORY_REGION(addr, size)

To use these macros, you need to include the ASan interface header:

Figure 19: The ASan API must be included in the program
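
That header is sanitizer/asan_interface.h, shipped with both Clang and GCC:

#include <sanitizer/asan_interface.h>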

This makes poisoning and unpoisoning memory quite simple. The following is an example of how to do this:

Figure 20: A program demonstrating user poisoning and its detection.
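
A sketch of such a program (the buffer size and accessed index are illustrative):

#include <sanitizer/asan_interface.h>

int main() {
    int* buffer = new int[16]();
    // Mark the entire allocation as off-limits.
    ASAN_POISON_MEMORY_REGION(buffer, 16 * sizeof(int));
    int value = buffer[5];  // reported by ASan; the shadow bytes show "Poisoned by user" (f7)
    ASAN_UNPOISON_MEMORY_REGION(buffer, 16 * sizeof(int));
    delete[] buffer;
    return value;
}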

The program allocates a buffer on the heap, poisons the whole buffer (through user poisoning), and then accesses an element from the buffer. This access is detected as forbidden, and the program reports a "Poisoned by user" error (f7). The figure below shows the buffer (poisoned by user) as well as the heap redzone (fa).

Figure 21: A part of the error message generated by program from figure 20 while compiled with ASan

However, if you unpoison part of the buffer (as shown below, for four elements), no error is raised when accessing the first four elements, while accessing any later element still raises an error.

Figure 22: An example of unpoisoning memory by user
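
Continuing the sketch above, unpoisoning only the first four elements could look like this:

#include <sanitizer/asan_interface.h>

int main() {
    int* buffer = new int[16]();
    ASAN_POISON_MEMORY_REGION(buffer, 16 * sizeof(int));
    // Make the first four elements usable again.
    ASAN_UNPOISON_MEMORY_REGION(buffer, 4 * sizeof(int));
    int ok  = buffer[3];  // fine: unpoisoned
    int bad = buffer[4];  // still poisoned: ASan reports an error here
    ASAN_UNPOISON_MEMORY_REGION(buffer, 16 * sizeof(int));
    delete[] buffer;
    return ok + bad;
}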

If you want to understand better how those macros impact the code, you can look into their definitions in the ASan interface file.

The ASAN_POISON_MEMORY_REGION and ASAN_UNPOISON_MEMORY_REGION macros simply invoke the __asan_poison_memory_region and __asan_unpoison_memory_region functions from the API. However, when a program is compiled without ASan, these macros do nothing beyond evaluating the macro arguments.

The bug missed by ASan

As we noted previously in the limitations section, ASan does not automatically detect out-of-bounds accesses into containers that preallocate and manage memory. This was also a case we came across during an audit: we found a bug through manual review in code that we were fuzzing, and we were surprised that the fuzzer did not find it. It turned out that this was because of the lack of container-overflow detection in the std::basic_string and std::deque collections in libc++.

This motivated us to get involved in ASan development by developing a proof of concept of those container-overflow detections in GCC and LLVM and eventually upstreaming patches to LLVM.

So what was the bug that ASan missed? Figure 23 shows a minimal example of it. The buggy code compared two containers via the three-iterator overload of std::equal, which takes only the first1, last1, and first2 iterators, corresponding to the beginning and end of the first sequence and the beginning of the second, and assumes that both sequences have the same length.

However, when the second container is shorter than the first one, this causes an out-of-bounds read, which ASan did not detect at the time. With our patches, ASan finally detects it.

Figure 23: Code snippet demonstrating the nature of the bug we found during the audit. Container type was changed for demonstrative purposes.
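
A sketch of the pattern (here using std::deque and made-up values; the figure's container type and data differ):

#include <algorithm>
#include <deque>

int main() {
    std::deque<int> a = {1, 2, 3, 4, 5};
    std::deque<int> b = {1, 2, 3};  // shorter than a
    // The three-iterator overload assumes b has at least as many elements as a,
    // so the comparison reads past b's last element. With the container annotations
    // added to libc++, ASan now flags this as a container-overflow.
    bool equal = std::equal(a.begin(), a.end(), b.begin());
    return equal ? 0 : 1;
}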

Use ASan to detect more memory safety bugs

We hope our efforts to improve ASan’s state-of-the-art bug detection capabilities will cement its status as a powerful tool for protecting codebases against memory issues.

We’d like to express our sincere gratitude to the entire LLVM community for their support during the development of our ASan annotation improvements. From reviewing code patches and brainstorming implementation ideas to identifying issues and sharing knowledge, their contributions were invaluable. We especially want to thank vitalybuka, ldionne, philnik777, and EricWF for their ongoing support!

We hope this explanation of AddressSanitizer has been insightful and demonstrated its value in hunting down bugs within a codebase. We encourage you to leverage this knowledge to proactively identify and eliminate issues in your own projects. If you successfully detect bugs with the help of the information provided here, we’d love to hear about it! Happy hunting!

If you need help with ASan annotations, fuzzing, or anything related to LLVM, contact us! We are happy to help tailor sanitizers or other LLVM tools to your specific needs. If you’d like to read more about our work on compilers, check out the following posts: VAST (GitHub repository) and Macroni (GitHub repository).
