Coverage Guided Fuzzing – Extending Instrumentation to Hunt Down Bugs Faster!

25 April 2024 at 18:30

We at IncludeSec sometimes have the need to develop fuzzing harnesses for our clients as part of our security assessment and pentesting work. Using fuzzing in an assessment methodology can uncover vulnerabilities in modern and complex software during security assessments by providing a faster way to submit highly structured inputs to the applications. This technique is usually applied when a more comprehensive effort beyond manual and traditional automated testing are requested by our clients to provide an additional analysis to uncover more esoteric vulnerabilities.

Introduction

Coverage-guided fuzzing is a useful capability in advanced fuzzers (AFL, libFuzzer, Fuzzilli, and others). This capability permits the fuzzer to acknowledge if an input can discover new edges or branches in the binary execution paths. An edge links two branches in a control flow graph (CFG). For instance, if a logical condition involves an if-else statement, there would be two edges, one for the if and the other for the else statement. It is a significant part of the fuzzing process, helping determine if the target program’s executable code is effectively covered by the fuzzer.

A guided fuzzing process usually utilizes a coverage-guided fuzzing (CGF) technique, employing very basic instrumentation to collect data needed to identify if a new edge or coverage block is hit during the execution of a fuzz test case. The instrumentation is code added during the compilation process, utilized for a number of reasons, including software debugging which is how we will use it in this post.

However, CGF instrumentation techniques can be extended, such as by adding new metrics, as demonstrated in this paper [1], where the authors consider not only the edge count but when there is a security impact too. Generally, extending instrumentation is useful to retrieve more information from the target programs.

In this post, we modify the Fuzzilli patch for the software JerryScript. JerryScript has a known and publicly available vulnerability/exploit, that we can use to show how extending Fuzzilli’s instrumentation could be helpful for more easily identifying vulnerabilities and providing more useful feedback to the fuzzer for further testing. Our aim is to demonstrate how we can modify the instrumentation and extract useful data for the fuzzing process.

[1] (Not All Coverage Measurements Are Equal: Fuzzing by Coverage Accounting for Input Prioritization – NDSS Symposium (ndss-symposium.org)

Fuzzing

Fuzzing is the process of submitting random inputs to trigger an unexpected behavior from the application. In recent approaches, the fuzzers consider various aspects of the target application for generating inputs, including the seeds – sources for generating the inputs. Since modern software has complex structures, we can not reach satisfactory results using simple inputs. In other words, by not affecting most of the target program it will be difficult to discover new vulnerabilities.

The diagram below shows an essential structure for a fuzzer with mutation strategy and code coverage capability.

Seeds are selected;
The mutation process takes the seeds to originate inputs for the execution;
The execution happens;
A vulnerability can occur or;
The input hits a new edge in the target application; the fuzzer keeps mutating the same seed or;
The input does not hit new edges, and the fuzzer selects a new seed for mutation.

The code coverage is helpful to identify if the input can reach different parts of the target program by pointing to the fuzzer that a new edge or block was found during the execution.

CLANG

Clang [Clang] is a compiler for the C, C++, Objective-C, and Objective-C++ programming languages. It is part of the LLVM project and offers advantages over traditional compilers like GCC (GNU Compiler Collection), including more expressive diagnostics, faster compilation times, and extensive analysis support.

One significant tool within the Clang compiler is the sanitizer. Sanitizers are security libraries or tools that can detect bugs and vulnerabilities automatically by instrumenting the code. The compiler checks the compiled code for security implications when the sanitizer is enabled.

There are a few types of sanitizers in this context:

AddressSanitizer (ASAN): This tool detects memory errors, including vulnerabilities like buffer overflows, use-after-free, double-free, and memory leaks.

UndefinedBehaviorSanitizer (UBSAN): Identifies undefined behavior in C/C++ code such as integer overflow, division by zero, null pointer dereferences, and others.

MemorySanitizer (MSAN): Detected uninitialized memory reads in C/C++ programs that can lead to unpredictable behavior.

ThreadSanitizer (TSAN): Detects uninitialized data races and deadlocks in multithreads C/C++ applications.

LeakSanitizer (LSAN): This sanitizer is integrated with AddressSanitizer and helps detect memory leaks, ensuring that all allocated memory is being freed.

The LLVM documentation (SanitizerCoverage — Clang 19.0.0git documentation (llvm.org)) provides a few examples of what to do with the tool. The shell snippet below shows the command line for the compilation using the ASAN option to trace the program counter.

$ clang -o targetprogram -g -fsanitize=address -fsanitize-coverage=trace-pc-guard targetprogram.c

From clang documentation:

“LLVM has a simple code coverage instrumentation built in (SanitizerCoverage). It inserts calls to user-defined functions on function-, basic-block-, and edge- levels. Default implementations of those callbacks are provided and implement simple coverage reporting and visualization, however if you need just coverage visualization you may want to use SourceBasedCodeCoverage instead.”

For example, code coverage in Fuzzilli (googleprojectzero/fuzzilli: A JavaScript Engine Fuzzer (github.com)), Google’s state-of-the-art JavaScript engine fuzzer, utilizes simple instrumentation to respond to Fuzzilli’s process, as demonstrated in the code snippet below.

extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
    uint32_t index = *guard;
    __shmem->edges[index / 8] |= 1 << (index % 8);
    *guard = 0;
}

The function __sanitizer_cov_trace_pc_guard() will consistently execute when a new edge is found, so no condition is necessary to interpret the new edge discovery. Then, the function changes a bit in the shared bitmap __shmem->edges to 1 (bitwise OR), and then Fuzzilli analyzes the bitmap after execution.

Other tools, like LLVM-COV (llvm-cov – emit coverage information — LLVM 19.0.0git documentation), capture code coverage information statically, providing a human-readable document after execution; however, fuzzers need to be efficient, and reading documents in the disk would affect the performance.

Getting More Information

We can modify Fuzzilli’s instrumentation and observe other resources that __sanitizer_cov_trace_pc_guard() can bring to the code coverage. The code snippet below demonstrates the Fuzzilli instrumentation with a few tweaks.

extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
    uint32_t index = *guard;

    void *PC = __builtin_return_address(0);
    char PcDescr[1024];

    __sanitizer_symbolize_pc(PC, "%p %F %L", PcDescr, sizeof(PcDescr));
    printf("guard: %p %x PC %s\n", guard, *guard, PcDescr);

    __shmem->edges[index / 8] |= 1 << (index % 8);
    *guard = 0;
}

We already know that the function __sanitizer_cov_trace_pc_guard() is executed every time the instrumented program hits a new edge. In this case, we are utilizing the function __builtin_return_address() to collect the return addresses from every new edge hit in the target program. Now, the pointer PC has the return address information. We can utilize the __sanitizer_symbolize_pc() function to correlate the address to the symbols, providing more information about the source code file used during the execution.

Most fuzzers use only the edge information to guide the fuzzing process. However, as we will demonstrate in the next section, using the sanitizer interface can provide compelling information for security assessments.

Lab Exercise

In our laboratory, we will utilize another JavaScript engine. In this case, an old version of JerryScript JavaScript engine to create an environment.

Operating System (OS): Ubuntu 22.04
Target Program: JerryScript
Vulnerability: CVE-2023-36109

Setting Up the Environment

You can build JerryScript using the following instructions.

First, clone the repository:

$ git clone https://github.com/jerryscript-project/jerryscript.git

Enter into the JerryScript folder and checkout the 8ba0d1b6ee5a065a42f3b306771ad8e3c0d819bc commit.

$ git checkout 8ba0d1b6ee5a065a42f3b306771ad8e3c0d819bc

Fuzzilli utilizes the head 8ba0d1b6ee5a065a42f3b306771ad8e3c0d819bc for the instrumentation, and we can take advantage of the configuration done for our lab. Apply the patch available in the Fuzziilli’s repository (fuzzilli/Targets/Jerryscript/Patches/jerryscript.patch at main · googleprojectzero/fuzzilli (github.com))

$ cd jerry-main
$ wget https://github.com/googleprojectzero/fuzzilli/raw/main/Targets/Jerryscript/Patches/jerryscript.patch
$ patch < jerryscript.patch
patching file CMakeLists.txt
patching file main-fuzzilli.c
patching file main-fuzzilli.h
patching file main-options.c
patching file main-options.h
patching file main-unix.c

The instrumented file is jerry-main/main-fuzzilli.c, provided by the Fuzzilli’s patch. It comes with the necessary to work with simple code coverage capabilities. Still, we want more, so we can use the same lines we demonstrated in the previous section to update the function __sanitizer_cov_trace_pc_guard() before the compilation. Also, adding the following header to jerry-main/main-fuzzilli.c file:

#include <sanitizer/common_interface_defs.h>

The file header describes the __sanitizer_symbolize_pc() function, which will be needed in our implementation. We will modify the function in the jerry-main/main-fuzzilli.c file.

void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
    uint32_t index = *guard;
    if(!index) return;
    index--;

    void *PC = __builtin_return_address(0);
    char PcDescr[1024];

    __sanitizer_symbolize_pc(PC, "%p %F %L", PcDescr, sizeof(PcDescr));
    printf("guard: %p %x PC %s\n", (void *)guard, *guard, PcDescr);
    __shmem->edges[index / 8] |= 1 << (index % 8);
    *guard = 0;
}

We now change the compilation configuration and disable the strip. The symbols are only needed to identify the possible vulnerable functions for our demonstration.

In the root folder CMakeLists.txt file

# Strip binary
if(ENABLE_STRIP AND NOT CMAKE_BUILD_TYPE STREQUAL "Debug")
  jerry_add_link_flags(-g)
endif()

It defaults with the -s option; change to -g to keep the symbols. Make sure that jerry-main/CMakeLists.txt contains the main-fuzzilli.c file, and then we are ready to compile. We can then build it using the Fuzzilli instructions.

$ python jerryscript/tools/build.py --compile-flag=-fsanitize-coverage=trace-pc-guard --profile=es2015-subset --lto=off --compile-flag=-D_POSIX_C_SOURCE=200809 --compile-flag=-Wno-strict-prototypes --stack-limit=15

If you have installed Clang, but the output line CMAKE_C_COMPILER_ID is displaying GNU or something else, you will have errors during the building.

$ python tools/build.py --compile-flag=-fsanitize-coverage=trace-pc-guard --profile=es2015-subset --lto=off --compile-flag=-D_POSIX_C_SOURCE=200809 --compile-flag=-Wno-strict-prototypes --stack-limit=15
-- CMAKE_BUILD_TYPE               MinSizeRel
-- CMAKE_C_COMPILER_ID            GNU
-- CMAKE_SYSTEM_NAME              Linux
-- CMAKE_SYSTEM_PROCESSOR         x86_64

You can simply change the CMakeLists.txt file, lines 28-42 to enforce Clang instead of GNU by modifying USING_GCC 1 to USING_CLANG 1, as shown below:

# Determining compiler
if(CMAKE_C_COMPILER_ID MATCHES "GNU")
  set(USING_CLANG 1)
endif()

if(CMAKE_C_COMPILER_ID MATCHES "Clang")
  set(USING_CLANG 1)
endif()

The instrumented binary will be the build/bin/jerry file.

Execution

Let’s start by disabling ASLR (Address Space Layout Randomization).

$ echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

After testing, we can re-enable the ASLR by setting the value to 2.

$ echo 2 | sudo tee /proc/sys/kernel/randomize_va_space

We want to track the address to the source code file, and disabling the ASLR will help us stay aware during the analysis and not affect our results. The ASLR will not impact our lab, but keeping the addresses fixed during the fuzzing process will be fundamental.

Now, we can execute JerryScript using the PoC file for the vulnerability CVE-2023-36109 (Limesss/CVE-2023-36109: a poc for cve-2023-36109 (github.com)), as an argument to trigger the vulnerability. As described in the vulnerability description, the vulnerable function is at ecma_stringbuilder_append_raw in jerry-core/ecma/base/ecma-helpers-string.c, highlighted in the command snippet below.

$ ./build/bin/jerry ./poc.js
[...]
guard: 0x55e17d12ac88 7bb PC 0x55e17d07ac6b in ecma_string_get_ascii_size ecma-helpers-string.c
guard: 0x55e17d12ac84 7ba PC 0x55e17d07acfe in ecma_string_get_ascii_size ecma-helpers-string.c
guard: 0x55e17d12ac94 7be PC 0x55e17d07ad46 in ecma_string_get_size (/jerryscript/build/bin/jerry+0x44d46) (BuildId: 9588e1efabff4190fd492d05d3710c7810323407)
guard: 0x55e17d12e87c 16b8 PC 0x55e17d09dfe1 in ecma_regexp_replace_helper (/jerryscript/build/bin/jerry+0x67fe1) (BuildId: 9588e1efabff4190fd492d05d3710c7810323407)
guard: 0x55e17d12ae04 81a PC 0x55e17d07bb64 in ecma_stringbuilder_append_raw (/jerryscript/build/bin/jerry+0x45b64) (BuildId: 9588e1efabff4190fd492d05d3710c7810323407)
guard: 0x55e17d12e890 16bd PC 0x55e17d09e053 in ecma_regexp_replace_helper (/jerryscript/build/bin/jerry+0x68053) (BuildId: 9588e1efabff4190fd492d05d3710c7810323407)
guard: 0x55e17d12e8b8 16c7 PC 0x55e17d09e0f1 in ecma_regexp_replace_helper (/jerryscript/build/bin/jerry+0x680f1) (BuildId: 9588e1efabff4190fd492d05d3710c7810323407)
guard: 0x55e17d133508 29db PC 0x55e17d0cc292 in ecma_builtin_replace_substitute (/jerryscript/build/bin/jerry+0x96292) (BuildId: 9588e1efabff4190fd492d05d3710c7810323407)
guard: 0x55e17d133528 29e3 PC 0x55e17d0cc5bd in ecma_builtin_replace_substitute (/jerryscript/build/bin/jerry+0x965bd) (BuildId: 9588e1efabff4190fd492d05d3710c7810323407)
guard: 0x55e17d12f078 18b7 PC 0x55e17d040a78 in jmem_heap_realloc_block (/jerryscript/build/bin/jerry+0xaa78) (BuildId: 9588e1efabff4190fd492d05d3710c7810323407)
guard: 0x55e17d12f088 18bb PC 0x55e17d040ab4 in jmem_heap_realloc_block (/jerryscript/build/bin/jerry+0xaab4) (BuildId: 9588e1efabff4190fd492d05d3710c7810323407)
guard: 0x55e17d12f08c 18bc PC 0x55e17d040c26 in jmem_heap_realloc_block (/jerryscript/build/bin/jerry+0xac26) (BuildId: 9588e1efabff4190fd492d05d3710c7810323407)
guard: 0x55e17d12f094 18be PC 0x55e17d040ca3 in jmem_heap_realloc_block (/jerryscript/build/bin/jerry+0xaca3) (BuildId: 9588e1efabff4190fd492d05d3710c7810323407)
UndefinedBehaviorSanitizer:DEADLYSIGNAL
==27636==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x55e27da7950c (pc 0x7fe341fa092b bp 0x000000000000 sp 0x7ffc77634f18 T27636)
==27636==The signal is caused by a READ memory access.
    #0 0x7fe341fa092b  string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:513
    #1 0x55e17d0cc3bb in ecma_builtin_replace_substitute (/jerryscript/build/bin/jerry+0x963bb) (BuildId: 9588e1efabff4190fd492d05d3710c7810323407)
    #2 0x55e17d09e103 in ecma_regexp_replace_helper (/jerryscript/build/bin/jerry+0x68103) (BuildId: 9588e1efabff4190fd492d05d3710c7810323407)
    #3 0x55e17d084a23 in ecma_builtin_dispatch_call (/jerryscript/build/bin/jerry+0x4ea23) (BuildId: 9588e1efabff4190fd492d05d3710c7810323407)
    #4 0x55e17d090ddc in ecma_op_function_call_native ecma-function-object.c
    #5 0x55e17d0909c1 in ecma_op_function_call (/jerryscript/build/bin/jerry+0x5a9c1) (BuildId: 9588e1efabff4190fd492d05d3710c7810323407)
    #6 0x55e17d0d4743 in ecma_builtin_string_prototype_object_replace_helper ecma-builtin-string-prototype.c
    #7 0x55e17d084a23 in ecma_builtin_dispatch_call (/jerryscript/build/bin/jerry+0x4ea23) (BuildId: 9588e1efabff4190fd492d05d3710c7810323407)
    #8 0x55e17d090ddc in ecma_op_function_call_native ecma-function-object.c
    #9 0x55e17d0909c1 in ecma_op_function_call (/jerryscript/build/bin/jerry+0x5a9c1) (BuildId: 9588e1efabff4190fd492d05d3710c7810323407)
    #10 0x55e17d0b929f in vm_execute (/jerryscript/build/bin/jerry+0x8329f) (BuildId: 9588e1efabff4190fd492d05d3710c7810323407)
    #11 0x55e17d0b8d4a in vm_run (/jerryscript/build/bin/jerry+0x82d4a) (BuildId: 9588e1efabff4190fd492d05d3710c7810323407)
    #12 0x55e17d0b8dd0 in vm_run_global (/jerryscript/build/bin/jerry+0x82dd0) (BuildId: 9588e1efabff4190fd492d05d3710c7810323407)
    #13 0x55e17d06d4a5 in jerry_run (/jerryscript/build/bin/jerry+0x374a5) (BuildId: 9588e1efabff4190fd492d05d3710c7810323407)
    #14 0x55e17d069e32 in main (/jerryscript/build/bin/jerry+0x33e32) (BuildId: 9588e1efabff4190fd492d05d3710c7810323407)
    #15 0x7fe341e29d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #16 0x7fe341e29e3f in __libc_start_main csu/../csu/libc-start.c:392:3
    #17 0x55e17d0412d4 in _start (/jerryscript/build/bin/jerry+0xb2d4) (BuildId: 9588e1efabff4190fd492d05d3710c7810323407)
UndefinedBehaviorSanitizer can not provide additional info.
SUMMARY: UndefinedBehaviorSanitizer: SEGV string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:513 
==27636==ABORTING

Using this technique, we could identify the root cause of the vulnerability in the function ecma_stringbuilder_append_raw() address in the stack trace.

However, if we rely only on the sanitizer to check the stack trace, we won’t be able to see the vulnerable function name in our output:

$ ./build/bin/jerry ./poc.js 
[COV] no shared memory bitmap available, skipping
[COV] edge counters initialized. Shared memory: (null) with 14587 edges
UndefinedBehaviorSanitizer:DEADLYSIGNAL
==54331==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x5622ae01350c (pc 0x7fc1925a092b bp 0x000000000000 sp 0x7ffed516b838 T54331)
==54331==The signal is caused by a READ memory access.
    #0 0x7fc1925a092b  string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:513
    #1 0x5621ad66636b in ecma_builtin_replace_substitute (/jerryscript/build/bin/jerry+0x9636b) (BuildId: 15a3c1cd9721e9f1b4e15fade2028ddca6dc542a)
    #2 0x5621ad6380b3 in ecma_regexp_replace_helper (/jerryscript/build/bin/jerry+0x680b3) (BuildId: 15a3c1cd9721e9f1b4e15fade2028ddca6dc542a)
    #3 0x5621ad61e9d3 in ecma_builtin_dispatch_call (/jerryscript/build/bin/jerry+0x4e9d3) (BuildId: 15a3c1cd9721e9f1b4e15fade2028ddca6dc542a)
    #4 0x5621ad62ad8c in ecma_op_function_call_native ecma-function-object.c
    #5 0x5621ad62a971 in ecma_op_function_call (/jerryscript/build/bin/jerry+0x5a971) (BuildId: 15a3c1cd9721e9f1b4e15fade2028ddca6dc542a)
    #6 0x5621ad66e6f3 in ecma_builtin_string_prototype_object_replace_helper ecma-builtin-string-prototype.c
    #7 0x5621ad61e9d3 in ecma_builtin_dispatch_call (/jerryscript/build/bin/jerry+0x4e9d3) (BuildId: 15a3c1cd9721e9f1b4e15fade2028ddca6dc542a)
    #8 0x5621ad62ad8c in ecma_op_function_call_native ecma-function-object.c
    #9 0x5621ad62a971 in ecma_op_function_call (/jerryscript/build/bin/jerry+0x5a971) (BuildId: 15a3c1cd9721e9f1b4e15fade2028ddca6dc542a)
    #10 0x5621ad65324f in vm_execute (/jerryscript/build/bin/jerry+0x8324f) (BuildId: 15a3c1cd9721e9f1b4e15fade2028ddca6dc542a)
    #11 0x5621ad652cfa in vm_run (/jerryscript/build/bin/jerry+0x82cfa) (BuildId: 15a3c1cd9721e9f1b4e15fade2028ddca6dc542a)
    #12 0x5621ad652d80 in vm_run_global (/jerryscript/build/bin/jerry+0x82d80) (BuildId: 15a3c1cd9721e9f1b4e15fade2028ddca6dc542a)
    #13 0x5621ad607455 in jerry_run (/jerryscript/build/bin/jerry+0x37455) (BuildId: 15a3c1cd9721e9f1b4e15fade2028ddca6dc542a)
    #14 0x5621ad603e32 in main (/jerryscript/build/bin/jerry+0x33e32) (BuildId: 15a3c1cd9721e9f1b4e15fade2028ddca6dc542a)
    #15 0x7fc192429d8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #16 0x7fc192429e3f in __libc_start_main csu/../csu/libc-start.c:392:3
    #17 0x5621ad5db2d4 in _start (/jerryscript/build/bin/jerry+0xb2d4) (BuildId: 15a3c1cd9721e9f1b4e15fade2028ddca6dc542a)

UndefinedBehaviorSanitizer can not provide additional info.
SUMMARY: UndefinedBehaviorSanitizer: SEGV string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:513 
==54331==ABORTING

This behavior happens because the vulnerability occurs far from the last execution in the program. Usually, the primary action would be debugging to identify the address of the vulnerable function in memory.

Additional Considerations

The vulnerable address or address space could be used as a guide during fuzzing. We can then compare the PC to the specific address space and instruct the fuzzer to focus on a path by mutating the same input in an attempt to cause other vulnerabilities in the same function or file.
For example, we can also feed data related to historical vulnerability identification, correlate dangerous files to their address space in a specific project and include them into the instrumentation, and give feedback to the fuzzer to achieve a more focused fuzzing campaign.

We do not necessarily need to use __sanitizer_symbolize_pc for the fuzzing process; this is done only to demonstrate the function and file utilized by each address. Our methodology would only require void *PC = __builtin_return_address(0). The PC will point to the current PC address in the execution, which is the only information needed for the comparison and guiding process.

As we demonstrated above, we can retrieve more information about the stack trace and identify vulnerable execution paths. So, let’s look at Fuzzilli’s basic algorithm, described in their NDSS paper.

In line 12, it is defined that if a new edge is found, the JavaScript code is converted back to its Intermediate Language (IL) (line 13), and the input is added to the corpus for further mutations in line 14.

What can we change to improve the algorithm? Since we have more information about historical vulnerability identification and stack traces, I think that’s a good exercise for the readers.

Conclusion

We demonstrated that we can track the real-time stack trace of a target program by extending Fuzzilli’s instrumentation. By having better visibility into the return address information and its associated source code files, it’s easier to supply the fuzzer with additional paths that can produce interesting results.

Ultimately, this instrumentation technique can be applied to any fuzzer that can take advantage of code coverage capabilities. We intend to use this modified instrumentation output technique in a part 2 blog post at a later date, showing how it can be used to direct the fuzzer to potentially interesting execution paths.

The post Coverage Guided Fuzzing – Extending Instrumentation to Hunt Down Bugs Faster! appeared first on Include Security Research Blog.

Normal view