
The art of Fuzzing: Introduction.

By: shogun
19 June 2023 at 09:06

1. Introduction

What is fuzzing?

Author: 2ourc3

Wikipedia[1] describes fuzzing as follows: “In programming and software development, fuzzing or fuzz testing is an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program. The program is then monitored for exceptions such as crashes, failing built-in code assertions, or potential memory leaks.”

In reality, fuzzing is simply feeding random inputs to a program in order to detect potential bugs. One of my favorite theorems used to illustrate fuzzing is the Infinite Monkey Theorem[2], which states that “a monkey hitting keys at random on a typewriter keyboard for an infinite amount of time will almost surely type any given text, including the complete works of William Shakespeare.”

In my opinion, modern fuzzing really started with AFL, more specifically with the post “Pulling JPEGs out of thin air”[3] from lcamtuf, November 2014. In this post, the author explained that they managed to create valid JPEG files “out of thin air”, more precisely out of a seed file containing only “Hello”.

As they described, the first image, hit after about six hours on an 8-core system, “looks very unassuming: it’s a blank grayscale image, 3 pixels wide and 784 pixels tall. But the moment it is discovered, the fuzzer starts using the image as a seed – rapidly producing a wide array of more interesting pics for every new execution path.”

The goal of this article is to give the reader the ability to fuzz targets of their choice on their own, using AFL++, or to be comfortable enough to write their own scripts/software to fuzz programs. This course will not dive too deep into the shadow realm of binary exploitation and vulnerability hunting, since that is not the actual goal. However, it might be the subject of a future article.

Why fuzzing?

Humans make mistakes, especially as the complexity of software grows: it becomes almost impossible for a person, or even a team, to understand all the side effects a large and complex code-base can contain. This is why we invented automated testing, in order to provide a certain level of quality in software products: Static Analysis tools and Dynamic Analysis tools.

  • Static Analysis techniques examples: Static Code Analysis, Code Review, Data Flow Analysis, Control Flow Analysis, Dependency Analysis, Code Metrics, Formal Verification
  • Dynamic Analysis techniques examples: Unit Testing, Integration Testing, Regression Testing, Fuzzing, Performance Testing, Memory Leak Detection, Profiling, Code Coverage, Dynamic Slicing


Fuzzing is a Dynamic Analysis technique, since it requires the program to actually run in order to test it, and eventually break it. While many solutions offer different approaches to the problem of bugs in software, fuzzing is a very unique way to produce unpredictable bugs. Its uniqueness resides in its unconventional approach: instead of trying to validate code, a fuzzer tries to feed the program input that is just right enough to be accepted, but at the same time wrong enough to cause a crash, or at least an undesired effect.
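To make that idea concrete, here is a toy C sketch of the naive approach: write random bytes to a file, run a hypothetical ./target binary on it, and watch for abnormal exits. It has no coverage feedback and no mutation logic, which is exactly what modern fuzzers like AFL++ add on top:

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    for (int i = 0; i < 1000; i++) {
        // Write 64 random bytes as the candidate input
        FILE *f = fopen("input.bin", "wb");
        if (f == NULL) return 1;
        for (int j = 0; j < 64; j++)
            fputc(rand() & 0xff, f);
        fclose(f);

        // Run the (hypothetical) target on the input
        pid_t pid = fork();
        if (pid == 0) {
            execl("./target", "./target", "input.bin", (char *)NULL);
            _exit(127); // exec failed
        }

        // A termination by signal (e.g., SIGSEGV) is a potential bug
        int status;
        waitpid(pid, &status, 0);
        if (WIFSIGNALED(status))
            printf("Input %d crashed the target (signal %d)\n", i, WTERMSIG(status));
    }
    return 0;
}

A blind loop like this would take forever to find anything interesting; replacing it with coverage feedback, mutation, and a seed corpus is the whole point of a modern fuzzer.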

In this article, we will use the AFL++ fuzzer. I made that choice because it is, in my opinion, one of the most reliable, modern, robust and comprehensive fuzzers you can find. The process of fuzzing an application is in itself very simple and can be illustrated as follows:

We will dissect every step described above and explain how and what each does, so you have a good understanding of the fuzzing process in general.

Target Analysis

The strength of fuzzing relies on feeding randomized inputs to a program in order to reach potentially vulnerable functions and make them behave in a non-anticipated way. In order to achieve this, a fuzzer needs to know three things:

  • Where it is – This is achieved using a method called Code-Coverage. It uses program instrumentation to trace the coverage reached by each input fed to the program.
  • What to feed – In order to randomize inputs, the fuzzer relies on different techniques. One of the most interesting is the “mutating algorithm”, where the fuzzer combines and mutates different initial inputs in order to create a new generation of “randomized” inputs.
  • How it ends – Crashes and exit conditions are detected using different checks and sanitization techniques, like ASAN, MSAN, etc.

Analyzing a target before running a fuzzing campaign is a crucial step that allows the user to optimize the fuzzer and ensure that the part of the program they want to test will be properly reached.

Code-Coverage and Guided fuzzing

A fuzzer can only work if it knows how far it reaches into the code under test, in order to adjust its mutation algorithm (or any other kind of input randomization) and ensure that the whole program has been tested. To achieve this it uses a method called Code-Coverage (hence “guided fuzzing”). When a program is compiled it becomes a long series of assembly instructions; those instructions are very often grouped to perform a particular task and can be seen as so-called “blocks”.

Wikipedia[4] states that “In compiler construction, a basic block is a straight-line code sequence with no branches in except to the entry and no branches out except at the exit.”

To illustrate this, let’s compile a very simple C program on Linux:

#include <stdio.h>

void saying_hello(void)
{
    printf("Hello Fuzzed!\n");
}

int main(int argc, char *argv[])
{
    printf("Hello Fuzzer!\n");

    for (int i = 0; i < 9; i++)
    {
        saying_hello();
    }

    return 0;
}

After compiling the program and opening it in Ghidra[5] we can see the following:

Here the disassembled code is grouped in “blocks” which represent the “flow” the program will execute. Since we asked the program to loop and print a message at each iteration, we can see our main block, the block holding the loop’s condition check, the call to the function saying_hello, and finally the last block, which exits the program. While this example is very trivial, you can imagine that in a real-world application there are thousands of blocks representing a complex flow.

In the example above you can see that the blocks are linked by arrows. Those are called Edges or Branches and, as you can guess, they represent all the paths a program can follow while executing. How the fuzzer knows whether a block has been reached or not is covered in the next phase below: Instrumentation.

A visual representation of this structure is called a Control Flow Graph, a very interesting topic we won’t cover in this course. Refer to Wikipedia[6] for more info.

Instrumentation

Since this course focuses on AFL++, we will cover how AFL++ proceeds to instrument source code. However, know that different methods exist and will be mentioned along this course. AFL++ instrumentation is done at compile time, at the assembly level: the instrumentation is performed on an assembly text file.

To demonstrate AFL++ instrumentation capabilities, let’s consider a simple C program that prints “Hello World”.

#include <stdio.h>

int main(int argc, char *argv[])
{
    printf("Hello\n");
    return 0;
}

Which compiled with gcc and disassembled in gdb looks like this:

Now let’s see this code snippet compiled and instrumented with afl-gcc and disassembled in gdb:

You can see that some function prologue routines are executed, then the function __AFL_MAYBE_LOG is called. This is basically one of the mechanisms that allow AFL to achieve code coverage: each assembly block is instrumented in order to know whether a certain point has been reached or not, and the fuzzer improves its randomized inputs based on which inputs discover the most paths.
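To give an intuition of what this instrumentation does, here is a conceptual C sketch of the edge-coverage bookkeeping described in the original AFL technical whitepaper. The names (shared_mem, prev_location, afl_maybe_log) are illustrative, not the actual symbols the compiler emits:

#include <stdint.h>

#define MAP_SIZE 65536

// Coverage bitmap shared with the fuzzer process
static uint8_t  shared_mem[MAP_SIZE];
// Compile-time random ID of the previously executed block
static uint16_t prev_location;

// Equivalent of what gets injected at every basic block
static void afl_maybe_log(uint16_t cur_location)
{
    // Bump the counter of the (prev -> cur) edge
    shared_mem[cur_location ^ prev_location]++;
    // Shift so that the A->B edge differs from the B->A edge
    prev_location = cur_location >> 1;
}

After each run, the fuzzer inspects this map to decide whether the input discovered a new edge and therefore deserves to be kept in the queue.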

We won’t spend a lot of time describing every internal mechanism of AFL++, since that is clearly overkill for this course. However, if you want to deep dive, the docs directory of the AFL++ repository is a good place to start.

Creating a corpus

This task is pretty straightforward – or is it? It’s obviously better to have a relatively good corpus to start your fuzzing campaign, knowing that your fuzzer will use those inputs to create the mutated ones. If you had infinite computing power you wouldn’t care about inputs at all, but since we don’t live in that world, we need to optimize inputs as much as possible in order to reach the best coverage possible. Google’s fuzzing team[7] stated: “A guided fuzzing engine considers an input (a.k.a. testcase or corpus unit) interesting if the input results in new code coverage (i.e., if the fuzzer reaches code that has not been reached before)”. This corpus is called a Seed Corpus. In order to create an efficient corpus there are a few steps you can follow, as displayed below:

The initial seed corpus is yours to determine. If you are fuzzing a library that transforms .png to .pdf, it’s probably worth providing .png files as input, but .jpeg might be interesting too – what about .mp3? It’s up to you to judge what is really worthwhile and will reach the highest efficiency. AFL will probably mutate any input into an interesting one after a certain period, but providing a good initial corpus saves you tremendous time. AFL offers two interesting features:

  • afl-cmin [8] – Minimizes the number of inputs provided, keeping those that reach the most paths/are the most profitable for your fuzzing campaign.
  • afl-tmin [9] – Optimizes the corpus you have provided. It shrinks each file as much as possible without changing the coverage.

The mutating algorithm

Understanding mutation algorithms: Mutation algorithms[10] form the backbone of fuzzing by generating mutated inputs based on existing valid inputs. They introduce controlled modifications to the original data, producing variations that can expose vulnerabilities. The primary objective of mutation algorithms is to diversify the input space while maintaining the essential characteristics of the valid input.

Mutation strategies: There are several mutation strategies employed in fuzzing, each with its own advantages and use cases. Some common mutation strategies include (a minimal code sketch of the first two follows the list):

  1. Bit Flipping: This strategy involves randomly flipping bits within an input file, altering its contents at the binary level. By modifying individual bits, bit flipping can uncover vulnerabilities caused by unexpected interactions between program components.
  2. Byte Flipping: Similar to bit flipping, byte flipping focuses on altering the content of input files at the byte level. By changing the values of individual bytes, this strategy can target specific areas of interest within the input data structure.
  3. Arithmetic Mutation: This strategy aims to modify numerical values within the input data. It includes operations such as incrementing, decrementing, adding, subtracting, or multiplying numeric values to explore boundary conditions and exceptional scenarios.
  4. Interesting value: The fuzzer has a list of known “interesting” 8-, 16-, and 32-bit values to try. The stepover is 8 bits.
  5. Dictionary entries: Deterministic injection of dictionary terms. This can be shown as “user” or “auto”, depending on whether the fuzzer is using a user-supplied dictionary (-x) or an auto-created one. You will also see “over” or “insert”, depending on whether the dictionary words overwrite existing data or are inserted by offsetting the remaining data to accommodate their length.
  6. Splicing: A last-resort strategy that kicks in after the first full queue cycle with no new paths. It is equivalent to ‘havoc’, except that it first splices together two random inputs from the queue at some arbitrarily selected midpoint.
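Here is the promised minimal C sketch of the first two strategies, bit flipping and byte flipping. It is illustrative only; AFL++’s real mutation engine walks positions deterministically and is far more elaborate:

#include <stdint.h>
#include <stdlib.h>
#include <stddef.h>

// Strategy 1: flip a single bit at a random position (assumes len > 0)
static void bit_flip(uint8_t *buf, size_t len)
{
    size_t bit = (size_t)rand() % (len * 8);
    buf[bit / 8] ^= (uint8_t)(1u << (bit % 8));
}

// Strategy 2: overwrite a random byte with a random value (assumes len > 0)
static void byte_flip(uint8_t *buf, size_t len)
{
    buf[(size_t)rand() % len] = (uint8_t)(rand() & 0xff);
}

Real fuzzers apply such operators both deterministically (walking every bit position of every input) and stochastically (the “havoc” stage mentioned above).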

Sanitizer | Checks

  • ASAN = Address SANitizer, finds memory corruption vulnerabilities like use-after-free, NULL pointer dereference, buffer overruns, etc. Enabled with export AFL_USE_ASAN=1 before compiling.
  • MSAN = Memory SANitizer, finds read accesses to uninitialized memory, e.g., a local variable that is defined and read before it is even set. Enabled with export AFL_USE_MSAN=1 before compiling.
  • UBSAN = Undefined Behavior SANitizer, finds instances where – by the C and C++ standards – undefined behavior happens, e.g., adding two signed integers where the result is larger than what a signed integer can hold. Enabled with export AFL_USE_UBSAN=1 before compiling.
  • CFISAN = Control Flow Integrity SANitizer, finds instances where the control flow is found to be illegal. Originally this was rather to prevent return oriented programming (ROP) exploit chains from functioning. In fuzzing, this is mostly reduced to detecting type confusion vulnerabilities – which is, however, one of the most important and dangerous C++ memory corruption classes! Enabled with export AFL_USE_CFISAN=1 before compiling.
  • TSAN = Thread SANitizer, finds thread race conditions. Enabled with export AFL_USE_TSAN=1 before compiling.
  • LSAN = Leak SANitizer, finds memory leaks in a program. This is not really a security issue, but for developers this can be very valuable. Note that unlike the other sanitizers above this needs __AFL_LEAK_CHECK(); added to all areas of the target source code where you find a leak check necessary! Enabled with export AFL_USE_LSAN=1 before compiling. To ignore the memory-leaking check for certain allocations, __AFL_LSAN_OFF(); can be used before memory is allocated, and __AFL_LSAN_ON(); afterwards. Memory allocated between these two macros will not be checked for memory leaks.

AFL++ sanitizers can be useful for identifying and preventing memory errors and vulnerabilities in software through dynamic analysis, improving overall code security. However, sanitizers tend to slow down fuzzing and increase memory consumption, because the additional runtime checks and instrumentation they add lead to increased overhead and resource usage.
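As a quick illustration of what these sanitizers catch, here is a minimal C program with a use-after-free. Compiled with AFL_USE_ASAN=1 and an AFL++ compiler (an assumed setup, per the list above), ASAN aborts with a heap-use-after-free report on the invalid read instead of letting it slip by silently:

#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *buf = malloc(16);
    if (buf == NULL) return 1;
    strcpy(buf, "hello");
    free(buf);        // the buffer is released here
    return buf[0];    // use-after-free: ASAN flags this read
}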

White-box | Grey-box | Black-box

So far, we have only talked about one way of fuzzing a program: compiling and instrumenting its source code, then fuzzing it. While that is one way of doing it, called white-box, there are different possibilities for running a fuzzing campaign against a target. Let’s briefly summarize those options:

  • White-box: Compiling and instrumenting source code that we have access to, using a tool such as AFL++ or libFuzzer to instrument the binary, and then running a campaign.
  • Grey-box: A mix between white- and black-box fuzzing. Say you have access to the source code of an API used by a program, but not to the program itself: you would then use the API (instrumented, or not) to run your fuzzing campaign.
  • Black-box: Probably one of the most difficult (or is it?) but very interesting, since it requires more setup and more research, and thus gives an edge to the researcher who is willing to put in the effort to actually fuzz the target.

2. Introduction to AFL++

Setting up the lab

Many possibilities are offered to anyone who wants to run a fuzzing campaign, whether you own a very good computer, a VPS or even a super-computer (why not?). Since I don’t own any of those, like many, I chose to use Google Cloud Compute Engine to create a VM. You can even try it yourself for free, since Google currently offers $300 of credit for 3 months without any commitment.

Go to https://console.cloud.google.com and provide your credit card information. After providing that information, you will have to activate the Compute Engine API.

Once the API is enabled, you’ll be able to create a VM following these steps:

  • Choose the region that suits you.
  • Choose the distribution you want (I recommend Ubuntu 22.04 LTS since it has almost every dependency needed).
  • Choose the disk size (100 GB is good).
  • Choose the performance. This is the tricky part, since the free trial won’t let you choose whatever you want. In most countries you can choose “E2 – 8 vCPU – 32 GB RAM”.

Compiling and Setting Up AFL++

Install the following dependencies

sudo apt-get update
sudo apt-get install -y build-essential python3-dev automake cmake git flex bison libglib2.0-dev libpixman-1-dev python3-setuptools python3-pip
sudo apt-get install -y gcc-$(gcc --version|head -n1|sed 's/\..*//'|sed 's/.* //')-plugin-dev libstdc++-$(gcc --version|head -n1|sed 's/\..*//'|sed 's/.* //')-dev
sudo apt-get install -y ninja-build # for QEMU mode
pip install unicorn

# Install a specific version of LLVM:
wget https://apt.llvm.org/llvm.sh
chmod +x llvm.sh
sudo ./llvm.sh 16 # <version number>

# Point the build to the right LLVM version
export LLVM_CONFIG=llvm-config-16
# Keep the variable when using sudo
sudo visudo
# Append the following line to the file:
Defaults env_keep += "LLVM_CONFIG=llvm-config-16"

# Download AFL
git clone https://github.com/AFLplusplus/AFLplusplus
cd AFLplusplus
make
sudo make install

Instrumenting with AFL++

Selecting the best AFL++ compiler for instrumenting the target: AFL++ comes with a central compiler, afl-cc, that incorporates various kinds of compiler targets and instrumentation options. The following evaluation flow will help you select the best possible one.

+--------------------------------+
| clang/clang++ 11+ is available | --> use LTO mode (afl-clang-lto/afl-clang-lto++)
+--------------------------------+     see [instrumentation/README.lto.md](instrumentation/README.lto.md)
    |
    | if not, or if the target fails with LTO afl-clang-lto/++
    |
    v
+---------------------------------+
| clang/clang++ 3.8+ is available | --> use LLVM mode (afl-clang-fast/afl-clang-fast++)
+---------------------------------+     see [instrumentation/README.llvm.md](instrumentation/README.llvm.md)
    |
    | if not, or if the target fails with LLVM afl-clang-fast/++
    |
    v
 +--------------------------------+
 | gcc 5+ is available            | -> use GCC_PLUGIN mode (afl-gcc-fast/afl-g++-fast)
 +--------------------------------+    see [instrumentation/README.gcc_plugin.md](instrumentation/README.gcc_plugin.md) and
                                       [instrumentation/README.instrument_list.md](instrumentation/README.instrument_list.md)
    |
    | if not, or if you do not have a gcc with plugin support
    |
    v
   use GCC mode (afl-gcc/afl-g++) (or afl-clang/afl-clang++ for clang)


Parallelism

Since we may have multiple CPUs available to run our fuzzer, we can parallelize AFL by running several instances sharing the same output folder. This allows each AFL instance to know which kinds of mutations have already been tested.

afl-fuzz [inputs folder] [outputs folder] [Main] [instrumented binary]
afl-fuzz [inputs folder] [outputs folder] [Secondary] [instrumented binary]

$ afl-fuzz -i inputs/ -o outputs/ -M Main ./software @@
$ afl-fuzz -i inputs/ -o outputs/ -S Secondary1 ./software @@

Here is a small bash script I have written that allows you to run a fuzzing campaign like this:
./script.sh [number of CPUs] [input folder] [output folder] [options (optional)] [binary name] [command]

#!/bin/bash

# Display help menu
display_help() {
    echo "Usage: $0 <number_of_cpu> <input_folder> <output_folder> [<options>] <binary_name> <command>"
    echo "    number_of_cpu   : Number of CPUs"
    echo "    input_folder    : Input folder"
    echo "    output_folder   : Output folder"
    echo "    options         : Options (optional)"
    echo "    binary_name     : Name of the binary"
    echo "    command         : Command"
    echo ""
    echo "Example: $0 4 input output \"\" mybinary mycommand"
    exit 1
}

# Check if the number of arguments is correct
if [[ $# -lt 5 || $# -gt 6 ]]; then
    display_help
fi

# Parse command line arguments
number_of_cpu=$1
input_folder=$2
output_folder=$3
options=""
binary_name=$4
command=$5

# Check if options are provided
if [[ $# -eq 6 ]]; then
    options=$4
    binary_name=$5
    command=$6
fi

# Run the Main instance
screen -dmS Main bash -c "afl-fuzz -i $input_folder -o $output_folder -M Main ./$binary_name $options $command @@"

# Run Secondary instances on the remaining CPUs (Main already uses one)
for ((cpu=1; cpu<number_of_cpu; cpu++))
do
  screen -dmS "Secondary$cpu" bash -c "afl-fuzz -i $input_folder -o $output_folder -S Secondary$cpu ./$binary_name $options $command @@"
done

Persistent mode

AFL++ basically runs your program with random inputs, and every run comes with a CPU cost. However, is it really useful to reload the entire program for each input tested? Certainly not. This is why persistent mode was created, originally published in this article: https://lcamtuf.blogspot.com/2015/06/new-in-afl-persistent-mode.html

TL;DR: Instead of running a new instance for each generated input, the fuzzer feeds test cases to a separate, long-lived process that reads the input data, passes it to the instrumented API, and notifies the parent about a successful run by stopping its own execution; eventually, when resumed by the parent, the process simply loops back to the start.

Let’s run a simple, classic fuzzer against this very simple program, which takes a file as argument and reads its content:

#include <stdio.h>
#include <stdlib.h>

#define MAX_BUFFER_SIZE 1024

int main(int argc, char *argv[])
{
    FILE *file;
    char buffer[MAX_BUFFER_SIZE];

    if (argc < 2) {
        printf("Usage: %s <filename>\n", argv[0]);
        return 1;
    }
    const char *filename = argv[1];

    // Open the file in read mode
    file = fopen(filename, "r");
    if (file == NULL) {
        printf("Unable to open the file.\n");
        return 1;
    }

    // Read the contents of the file into the buffer, leaving room for '\0'
    size_t bytesRead = fread(buffer, sizeof(char), MAX_BUFFER_SIZE - 1, file);

    // Close the file
    fclose(file);

    // Check if the file was read successfully
    if (bytesRead == 0) {
        printf("Failed to read the file.\n");
        return 1;
    }

    // Null-terminate before printing
    buffer[bytesRead] = '\0';

    printf("File contents:\n%s\n", buffer);
    return 0;
}

Compiling it with AFL++ LTO and starting to fuzz it:

# Compiling with LTO
afl-clang-lto main.c -o reader

# Running fuzzer:
afl-fuzz -i in -o out ./reader @@


Here we can see the fuzzing campaign.

It’s quite fast since the program does almost nothing, and no finds are reported because, well, there is nothing to really explore. However, let’s implement persistent mode as described in the AFL++ documentation:

#include <stdio.h>
#include <string.h>

#define MAX_BUFFER_SIZE 1024

// Adding AFL_FUZZ_INIT
__AFL_FUZZ_INIT();

/* To ensure checks are not optimized out it is recommended to disable
   code optimization for the fuzzer harness main() */
#pragma clang optimize off
#pragma GCC            optimize("O0")

int main(int argc, char *argv[])
{
    // Adding AFL_INIT and the shared-memory test case buffer
    __AFL_INIT();
    unsigned char *buf = __AFL_FUZZ_TESTCASE_BUF;

    // Starting AFL_LOOP
    while (__AFL_LOOP(100000)) {

        // In persistent mode the test case arrives directly in memory,
        // so there is no file to open: we consume buf instead of argv[1]
        int len = __AFL_FUZZ_TESTCASE_LEN;
        char buffer[MAX_BUFFER_SIZE];

        if (len >= MAX_BUFFER_SIZE) len = MAX_BUFFER_SIZE - 1;
        memcpy(buffer, buf, len);
        buffer[len] = '\0';

        printf("File contents:\n%s\n", buffer);
    } // Closing the loop
    return 0;
}

Let’s compile it, run the fuzzer and see the speed now:

We went from 7,500 exec/sec to 10,200 exec/sec. This is a very good improvement in speed; given that our initial program was already very fast, the difference isn’t that big, but on a real target you can gain anywhere from 1x to 20x in speed, which, parallelized and run over a long period, is a gigantic difference.

The AFL++ docs state: “Basically, if you do not fuzz a target in persistent mode, then you are just doing it for a hobby and not professionally 🙂”

Persistent mode is described in detail here: https://github.com/AFLplusplus/AFLplusplus/blob/stable/instrumentation/README.persistent_mode.md

Partial instrumentation

When building and testing complex programs where only a part of the program is the fuzzing target, it often helps to only instrument the necessary parts of the program, leaving the rest uninstrumented. This helps to focus the fuzzer on the important parts of the program, avoiding undesired noise and disturbance by uninteresting code being exercised.

TL;DR: You just specify the part of the code you want the fuzzer to cover. The pro is that you gain anywhere from a slight to a tremendous amount of compute time. The con is that, since you manually select the “zone” you want to fuzz, you can miss opportunities.

Usage is quite straightforward:

__AFL_COVERAGE();      // <- required for this feature to work
__AFL_COVERAGE_ON();   // enable coverage from this point onwards
__AFL_COVERAGE_OFF();  // disable coverage from this point onwards
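Here is a minimal sketch of how this could look in a harness, assuming the target is built with an AFL++ clang compiler; the parse_input() function is hypothetical and stands in for whatever area of the code you actually care about:

#include <stdio.h>

__AFL_COVERAGE();  // required once, enables selective coverage support

// Hypothetical function we want the fuzzer to focus on
static void parse_input(const char *data)
{
    // ... the interesting parsing logic would live here ...
    (void)data;
}

int main(void)
{
    char line[256];
    if (fgets(line, sizeof(line), stdin) == NULL) return 1;

    __AFL_COVERAGE_OFF();  // uninteresting setup code is ignored
    printf("setup done\n");

    __AFL_COVERAGE_ON();   // only this region contributes to coverage
    parse_input(line);
    __AFL_COVERAGE_OFF();

    return 0;
}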

Process is detailed here: https://github.com/AFLplusplus/AFLplusplus/blob/stable/instrumentation/README.instrument_list.md


3. Practice – White box

In reality, there are numerous options and details you can set when starting a fuzzing campaign with AFL++. Please refer to the diagram below and the suggested workflow from AFL++.
More info – https://github.com/AFLplusplus/AFLplusplus/tree/stable/docs#readme

Choose a target

This is one of the questions I have been asked a lot, and have asked myself a lot too. It really depends on what you are aiming for: if money interests you, then you should browse different bug bounty programs and look for a target that suits you; if you are after reputation/CVEs, you should look at widely used open-source targets.

For this course, we are going to reproduce a vulnerability found in Vim by the researcher Dhiraj Mishra, CVE-2019-20079. It’s a vulnerability that was discovered by fuzzing with AFL++, and there is even an exploit for it. His blog post: https://www.inputzero.io/2020/03/fuzzing-vim.html

Analyze a target and create a corpus

The first step when you choose a target is to understand what the program does, how it basically works, and what kind of inputs it takes. Vim is an extremely famous text editor, widely used in the dev community and known for being impossible to exit without very advanced knowledge. Find more info about Vim here: https://www.vim.org/about.php

Since it’s software designed to edit files, fuzzing it is a quite straightforward process. We just need to compile and instrument the target, then produce a corpus and run the fuzzer against it with the proper settings.

You can download a vulnerable version of Vim following these steps:

# You can obtain Vim for the first time with:
git clone https://github.com/vim/vim.git

# CD into the Vim directory
cd vim

# Rolling back to a vulnerable version
git checkout v8.1.2122

Analyze previous crash

In the crash found by the researcher, we can see the reason/type of crash provoked by the file, as well as its content. Now the most interesting part is to conduct a root cause analysis in order to determine why this crash happened and which function is vulnerable/buggy in the target. First of all, since Vim has patched the problem, let’s look at the commit:

Here is the commit changing the window.c file of Vim to fix the bug. We can see that they removed the line if(bt_terminal(wp->w_buffer)) and replaced it with if(bt_terminal(curwin->w_buffer)), which fixes the use-after-free vulnerability.

Now let’s take a look at the crash file provided by the researcher, fed as input to our Vim:

You can see that the problem resides in buffer.c at line 5307, which results in a segmentation fault.

Instrumenting the target

AFL++ 4.07 offers different instrumentation options and sanitizers. They are summarized below.

However, for this example we are going to use afl-clang-fast and afl-clang-fast++ to instrument the binary, since this is what the researcher initially used to find the bug.

Download Vim, extract it, cd into the vim folder, then run the commands:

# Configure VIM to be compiled with AFL options
CC=afl-clang-fast CXX=afl-clang-fast++ ./configure --with-features=huge --enable-gui=none

# Compiling
make -j4

Prepare campaign

During his tests the researcher used a dictionary: regexp.dict. Custom dictionaries can be added at will. They should consist of a reasonably-sized set of rudimentary syntax units that the fuzzer will then try to clobber together in various ways. Snippets between 2 and 16 bytes are usually the sweet spot. Read more: https://github.com/google/AFL/blob/master/dictionaries/README.dictionaries
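For reference, an AFL dictionary is just a text file of quoted tokens, optionally named, one per line. A hypothetical excerpt in the spirit of regexp.dict could look like this:

# Short regex syntax units for the fuzzer to splice into inputs
digit_class="\\d"
char_range="[0-9]"
star_quant="a*"
alternation="\\|"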

While dictionaries are very useful and interesting, the LTO mode of recent AFL++ versions automatically builds a dictionary from the inputs provided, so we won’t deep dive into creating dictionaries ourselves.

# cd into the src folder
cd src/

# Creating corpus folder and 2 basic corpus file
mkdir corpus output
echo "a*b\+\|[0-9]\|\d{1,9}" > corpus/1 ; echo "^\d{1,10}$" > corpus/2

# Adding the regexp dictionary
wget https://raw.githubusercontent.com/vanhauser-thc/AFLplusplus/master/dictionaries/regexp.dict

Fuzzing the target

# While fuzzing, run on a RAM file system to avoid too much I/O, e.g.:
sudo mount -t tmpfs -o size=6g tmpfs /home/afl-fuzz-user/afl-fuzz

# Running the fuzzing campaign 
afl-fuzz -m none -i corpus -o output ./vim -u NONE -X -Z -e -s -S @@ -c ':qa!'

Among the options above, -u NONE and -X speed up Vim startup. Options -e -s make Vim silent and avoid the ‘MORE’ prompt, which could block Vim, and the option -Z disables external commands.

Now it’s time to be patient. The fuzzing process can vary drastically from one system to another: if you have a very powerful setup you might reach the bug quite easily, otherwise it can take some time. On a Google Cloud Compute E2 instance (8 vCPU, 32 GB RAM) it takes a few hours.

What is happening there? It’s pretty simple; here is what those fields correspond to:

  • 1.) Process timing: This section is fairly self-explanatory: it tells you how long the fuzzer has been running and how much time has elapsed since its most recent finds. This is broken down into “paths” (a shorthand for test cases that trigger new execution patterns), crashes, and hangs.
  • 2.) Overall Results: The first field in this section gives you the count of queue passes done so far – that is, the number of times the fuzzer went over all the interesting test cases discovered so far, fuzzed them, and looped back to the very beginning.
  • 3.) Cycle Progress: This box tells you how far along the fuzzer is with the current queue cycle: it shows the ID of the test case it is currently working on, plus the number of inputs it decided to ditch because they were persistently timing out.
  • 4.) Map coverage: The section provides some trivia about the coverage observed by the instrumentation embedded in the target binary.
  • 5.) Stage Progress: This part gives you an in-depth peek at what the fuzzer is actually doing right now. It tells you about the current stage
  • 6.) Finding in depth: This gives you several metrics that are of interest mostly to complete nerds. The section includes the number of paths that the fuzzer likes the most based on a minimization algorithm baked into the code.
  • 7.) Fuzzing Strategy Yield: This is just another nerd-targeted section keeping track of how many paths we have netted, in proportion to the number of execs attempted, for each of the fuzzing strategies discussed earlier on.
  • 8.) Path Geometry: The first field in this section tracks the path depth reached through the guided fuzzing process. In essence: the initial test cases supplied by the user are considered “level 1”. The test cases that can be derived from that through traditional fuzzing are considered “level 2”; the ones derived by using these as inputs to subsequent fuzzing rounds are “level 3”; and so forth.

You can find more information about it here: https://afl-1.readthedocs.io/en/latest/user_guide.html#status-screen

What is interesting when you run a campaign against a target containing a known CVE is that you can test whether the vulnerability is present. To do that, I simply downloaded the file the researcher provided in his blog, https://dl.packetstormsecurity.net/1910-exploits/vim812135-useafterfree.tgz, then used the crash test file against my instrumented ./vim and checked the result:

Now that we have confirmed our program actually crashes as intended, let’s run the campaign and put into practice what we learned before by running the fuzzer in parallel. Since my machine has 8 CPUs I can easily run 8 fuzzers (math is mathing), so I wrote this very simple script for you to run these tasks in parallel:

screen -dmS Main bash -c "afl-fuzz -m none -i corpus -o output -M Main  ./vim -u NONE -X -Z -e -s -S @@ -c ':qa!'";
screen -dmS Secondary1 bash -c "afl-fuzz -m none -i corpus -o output -S Secondary1 ./vim -u NONE -X -Z -e -s -S @@ -c ':qa!'";
screen -dmS Secondary2 bash -c "afl-fuzz -m none -i corpus -o output -S Secondary2 ./vim -u NONE -X -Z -e -s -S @@ -c ':qa!'";
screen -dmS Secondary3 bash -c "afl-fuzz -m none -i corpus -o output -S Secondary3 ./vim -u NONE -X -Z -e -s -S @@ -c ':qa!'";
screen -dmS Secondary4 bash -c "afl-fuzz -m none -i corpus -o output -S Secondary4 ./vim -u NONE -X -Z -e -s -S @@ -c ':qa!'";
screen -dmS Secondary5 bash -c "afl-fuzz -m none -i corpus -o output -S Secondary5 ./vim -u NONE -X -Z -e -s -S @@ -c ':qa!'";
screen -dmS Secondary6 bash -c "afl-fuzz -m none -i corpus -o output -S Secondary6 ./vim -u NONE -X -Z -e -s -S @@ -c ':qa!'";
screen -dmS Secondary7 bash -c "afl-fuzz -m none -i corpus -o output -S Secondary7 ./vim -u NONE -X -Z -e -s -S @@ -c ':qa!'";

Running the script is really simple:

# Giving permission to the script 
chmod +x script.sh
./script.sh

AFL++ includes a tool, afl-whatsup, to which you pass the shared output folder used by your fuzzers; it displays aggregated results.

# Display the results in the shared output folder
afl-whatsup output/

This will display summary stats like this:

Analyze the results of our campaign

After letting the fuzzer run, I came back to good news:

This means that the fuzzer has caught a unique crash in the execution of our program. AFL++ saves the crash test files in the output folder you chose before starting the fuzzing campaign. Let’s analyze one of these crash test files:

Each of those entries represents a file that crashes this particular Vim version. If you look at one, you can see that the second field is “sig:11”. On Linux, signals are how the kernel notifies a process of exceptional events, and signal 11 is SIGSEGV: a segmentation fault, which is quite bad. More info about Linux signals[11] here: https://www-uxsup.csx.cam.ac.uk/courses/moved.Building/signals.pdf

Let’s run vim inside gdb to analyze the crash:

 # Running GDB
 $ gdb --args ./vim -u NONE -X -Z -e -s -S output/Secondary1/crashes/id:000000,sig:11,src:014903,time:6786383,execs:1343354,op:havoc,rep:4
 

We can see below that GDB reports a segmentation fault, and the source of the issue seems to be ex_cmds.c, line 263.

That’s very interesting, because it is not at all what the researcher found in the first place. Did we find another vulnerability? Let’s check it against the latest Vim version:

# Downloading and compiling vim
git clone https://github.com/vim/vim.git
cd vim
CC=afl-clang-fast CXX=afl-clang-fast++ ./configure --with-features=huge --enable-gui=none
make -j4

# Testing the crash
./vim -u NONE -X -Z -e -s -S ../../../crash_1

Unfortunately nothing happens; the bug doesn’t seem to be present in the latest Vim version. Let’s try to run it inside gdb:

We can see that the latest version of Vim doesn’t crash with this sample anymore. However, it appears to be a totally different bug than the one the researcher previously found, which is quite.. encouraging, no?

What now?

Time for you to have some fun and run a fuzzing campaign like a professional, with the latest features of AFL++: LTO, persistent mode, partial instrumentation, an optimized corpus, and parallelism.

# Compiling with LTO, inside the vim folder
CC=afl-clang-lto CXX=afl-clang-lto++ RANLIB=llvm-ranlib-16 AR=llvm-ar-16 AS=llvm-as-16 LD=afl-ld-lto ./configure --with-features=huge --enable-gui=none
make -j4

You can use the built-in crash exploration mode of afl-fuzz to triage crashes, as follows:

afl-fuzz -C -i output/crashes/ -o triage_output ./vulnerable_binary [OPTIONS]

Further reading

  • An excellent tutorial has been written by Antonio Morales: Fuzzing 101

References


The art of fuzzing: Windows Binaries

By: shogun
25 June 2023 at 17:42

Author: 2ourc3

Introduction

Today we are gonna dive into grey-box fuzzing by testing closed-source Windows binaries. This type of fuzzing lets you fuzz a target without having access to its source code. Why do this type of fuzzing? Since it requires more setup and more advanced skills, fewer people are inclined to look for vulnerabilities this way, or able to find them. This enlarges the possibilities for you, the vulnerability researcher, to find new, undiscovered vulns.

To achieve this we will need to overcome several challenges:

  • Instrumenting the code.
  • Finding a relevant function to fuzz.
  • Modifying/patching the binary to make it fuzzable.

There are plenty of solutions available to run fuzzing campaigns on Windows binaries; however, we are going to focus solely on WinAFL in this chapter. WinAFL offers three types of instrumentation:

  • Dynamic instrumentation via DynamoRIO – Dynamic instrumentation modifies the instructions of a program while the program is running.
  • Static instrumentation via Syzygy – Static instrumentation modifies the instructions of a program at compile time.
  • Hardware tracing via Intel PT – A hardware feature that asynchronously records program control flow.

While each method offers its own pros and cons, today we will focus on dynamic instrumentation via DynamoRIO. See below a description of the workflow WinAFL + DynamoRIO will execute while fuzzing your target binary.

Compiling WinAFL

  1. If you are building with DynamoRIO support, download and build DynamoRIO sources or download DynamoRIO Windows binary package from https://github.com/DynamoRIO/dynamorio/releases
  2. If you are building with Intel PT support, pull third party dependencies by running git submodule update --init --recursive from the WinAFL source directory
  3. Open Visual Studio Command Prompt (or Visual Studio x64 Win64 Command Prompt if you want a 64-bit build). Note that you need a 64-bit winafl.dll build if you are fuzzing 64-bit targets and vice versa.
  4. Go to the directory containing the source
  5. Type the following commands. Modify the -DDynamoRIO_DIR flag to point to the location of your DynamoRIO cmake files (either full path or relative to the source directory).

For a 32-bit build:

mkdir build32
cd build32
cmake -G"Visual Studio 16 2019" -A Win32 .. -DDynamoRIO_DIR=..\path\to\DynamoRIO\cmake -DINTELPT=1
cmake --build . --config Release

For a 64-bit build:

mkdir build64
cd build64
cmake -G"Visual Studio 16 2019" -A x64 .. -DDynamoRIO_DIR=..\path\to\DynamoRIO\cmake -DINTELPT=1
cmake --build . --config Release

Build configuration options
The following cmake configuration options are supported:

  • -DDynamoRIO_DIR=..\path\to\DynamoRIO\cmake – Needed to build the winafl.dll DynamoRIO client
  • -DINTELPT=1 – Enable Intel PT mode. For more information see https://github.com/googleprojectzero/winafl/blob/master/readme_pt.md
  • -DUSE_COLOR=1 – color support (Windows 10 Anniversary edition or higher)
  • -DUSE_DRSYMS=1 – Drsyms support (use symbols when available to obtain -target_offset from -target_method). Enabling this has been known to cause issues on Windows 10 v1809, though there are workarounds, see #145

Find a target

Finding the right target to fuzz isn’t always easy. It’s all about finding software complex enough to be worth testing, but accessible enough for you to understand what to fuzz and which features are interesting.

One good strategy is to target software that is known to contain vulnerabilities and whose vendor is responsive in a disclosure program. A good way to find such targets is to look at the website of the Zero Day Initiative.

In this section there are previously disclosed bugs, which can give you a good broad view of which programs are tested and how responsive their vendors are. Here we see that vulnerabilities were disclosed for Netgear and D-Link products; there are tons of previously disclosed vulnerabilities on this website, and it’s up to you to search through it and find the target that interests you the most.

Since fuzzing a complex target requires some advanced skills, such as reverse engineering and understanding large code-bases, we will focus on a binary target I created specially for the purpose of this course. It is a vulnerable file reader: it takes a file as input, copies its content into a buffer, and closes the file.

You can download the file here, password: “bushido” https://drive.google.com/file/d/1c-cOuzYbC-gOFW91a2EHKNpZTiPrVdBP/view?usp=sharing

Patching binary to allow fuzzing

Unfortunately, numerous programs use some kind of dialog-box control flow, where the user is prompted to answer a question before a certain task executes, like “This file already exists. Do you want to overwrite it?”, etc.

This makes the fuzzing process impossible, since it would require the user to interact with the dialog box, preventing the fuzzer from running normally. This is why we are now looking at how to patch a binary in order to make it fuzzable!

Download and install Ghidra, start the application, then create a project directory and a project. Import vulnerable_reader.exe, click on “Options…” and enable “Load Local Libraries From Disk”.

After loading the libraries, you can start the reversing process by pressing Enter or double-clicking the file name; it will prompt an “Analyze” dialog box, which you can configure.

For this exercise there is no need to change it; however, I invite you to explore the available options and their capabilities. After clicking OK you’ll see the disassembled code of the binary displayed. You’ll need to wait a bit for Ghidra to analyze the entire binary; you can find the progress bar at the bottom-right of the screen:

If you save your program after the analysis, you won’t need to analyze it again in the future. This binary is quite small, but keep in mind it won’t always be the case. I encourage you to save the analyzed program as a copy just after the analysis is performed.

Now that the analysis is done, we can see through the software. Investigating this binary is going to be quite easy since we already know one string used in the dialog box. Open Search > Program Text, enter “You clicked Yes!”, in “Fields” enable “All Fields” and in “Memory Blocks” enable “All Blocks”, then click “Search All” and double-click the first finding in the results.

We can see there are two possible options: the function lets you select Yes and close, or No and close. There is no real purpose to this function; however, it prevents the program from continuing its flow before you click, and consequently prevents you from fuzzing it.

One interesting piece of information to look at is the XREFs, which correspond to the locations where this function (FUN_00401000) is called. Here we can see that the function is called by FUN_00401130; let’s double-click it and see what this function is.

It seems that this function is basically our main function. It takes two parameters as arguments and passes them to the second function. The first function is the one responsible for the dialog box.

Let’s replace the instruction “CALL FUN_00401000” with a NOP instruction.

As you can see, there is now a bunch of “??” following our instruction. That’s because the original CALL instruction was larger than the single-byte NOP instruction (0x90 in hex), so we need to replace the leftover “??” bytes with NOPs too, to keep the instruction stream decodable.

The result must look like this:

Now export the program as a PE file: click File > Export Program, then select Original File and set the right path:

Let’s run the program and see if the dialog box appears again:

Bravo! Keep in mind that most programs require far more complex interaction, and this course isn’t about reverse engineering. However, a big aspect of running a successful fuzzing campaign consists in removing whatever makes the fuzzer slower, and GUI interaction is a big part of that. You should definitely develop some interest in RE if you want to pursue research in fuzzing.

Function offset

WinAFL uses a technique to optimize the fuzzing process by mitigating the slow execution time associated with the exec syscall and the typical process startup procedure. Instead of re-initializing the target program for every fuzzing attempt, it employs a strategy inspired by the concept of a fork server.

The basic idea is to execute the program until it reaches the desired fuzzing point, then supply randomized inputs there, effectively circumventing the overhead associated with the exec syscall operation.

As a result, when fuzzing a program with WinAFL, if the desired fuzzing point is reached during the third call, for example, the performance remains unaffected. However, the significant advantage lies in reducing the overhead of fuzzing throughout the entire program, leading to more efficient and effective fuzzing sessions.

Here is a diagram that illustrates this process.

How to select a target function

The target function should do these things during its lifetime (a minimal sketch in C follows the list):

  1. Open the input file. This needs to happen within the target function so that you can read a new input file for each iteration as the input file is rewritten between target function runs.
  2. Parse it (so that you can measure coverage of file parsing)
  3. Close the input file. This is important because if the input file is not closed WinAFL won’t be able to rewrite it.
  4. Return normally (So that WinAFL can “catch” this return and redirect execution. “returning” via ExitProcess() and such won’t work)
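Here is that minimal C sketch of a suitable target function; the name fuzz_one_input and its signature are illustrative, since WinAFL only cares about the four properties above:

#include <stdio.h>

// Illustrative WinAFL-friendly target function
int fuzz_one_input(const char *path)
{
    FILE *f = fopen(path, "rb");                 // 1. open the input file
    if (f == NULL) return 0;

    unsigned char data[1024];
    size_t n = fread(data, 1, sizeof(data), f);  // 2. parse/consume it
    // ... real parsing logic would go here ...

    fclose(f);                                   // 3. close it so WinAFL can rewrite it
    return (int)n;                               // 4. return normally, never ExitProcess()
}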

How to find the virtual offset of the function

  • Static analysis with tools like Ghidra and radare2
  • Debugging the code with WinDBG or x64dbg (Setting up breakpoints and analyzing the parameters of functions at runtime)
  • Use auxiliary tools like API monitors, process monitors (e.g., ProcMon), and coverage tools

Find offset via Static Analysis with Ghidra

The binary contains some strings, one of them being “Failed to open file”. Let’s click the Search menu, then “Program Text”, and look for this string:

Let’s click Search All and examine the results:

Let’s double click the first occurrence in the Namespace FUN_00401060

Remember that the execution flow we are looking for is: open file > read it > close the file > return to normal execution. Let’s investigate whether this flow happens in the pseudocode of the function. Simplified, it gives us:

void __cdecl FUN_00401060(int argc, int argv)
{
  uint openResult;
  uint readResult;
  WCHAR fileContentBuffer[6]; // Buffer to store file content
  uint localVariable;

  localVariable = DAT_0041c040 ^ (uint)&stack0xfffffffc;

  if (argc < 2) {
    FUN_00401130((int)s_Usage:_%s_<filename>_0041c000); // Print usage message
  }
  else {
    openResult = FID_conflict:__open(*(char **)(argv + 4), 0x8000); // Open file specified in argv[1]
  
    if ((int)openResult < 0) {
      FUN_00401130((int)s_Failed_to_open_file:_%s_0041c018); // Print error message if file opening fails
    }
    else {
      while (readResult = FUN_00406348(openResult, fileContentBuffer, 10), 0 < (int)readResult) {
        FUN_00401130((int)&DAT_0041c034); // Print file contents
      }
      FUN_00407b70(openResult); // Close the file
    }
  }
  FUN_0040116a(localVariable ^ (uint)&stack0xfffffffc); // Some additional function call
  return;
}

Sounds like a match! Now let’s find the offset of this function. It’s pretty straightforward: right-click on the function and show the bytes. We see the address of the function is 0x00401060 and the module base address is 0x00400000, so the function offset is 0x00401060 - 0x00400000 = 0x01060.

Ghidra CheatSheet: https://ghidra-sre.org/CheatSheet.html

Prepare environment for fuzzing

Fuzzing binaries is a quite resource-demanding task; here are a few things you can do to prepare your environment to run a fuzzing campaign smoothly:

  • Disabling automatic debugging
  • Disabling AV scanning

Optimization

Having a nice corpus of inputs is a very important aspect of fuzzing. WinAFL ships a corpus minimization tool, winafl-cmin.py. Examples of use:

  • Typical use
    winafl-cmin.py -D D:\DRIO\bin32 -t 100000 -i in -o minset -covtype edge -coverage_module m.dll -target_module test.exe -target_method fuzz -nargs 2 -- test.exe @@
  • Dry-run, keep crashes only with 4 workers with a working directory:
    winafl-cmin.py -C --dry-run -w 4 --working-dir D:\dir -D D:\DRIO\bin32 -t 10000 -i in -i C:\fuzz\in -o out_mini -covtype edge -coverage_module m.dll -target_module test.exe -target_method fuzz -nargs 2 -- test.exe @@
  • Read from specific file
    winafl-cmin.py -D D:\DRIO\bin32 -t 100000 -i in -o minset -f foo.ext -covtype edge -coverage_module m.dll -target_module test.exe -target_method fuzz -nargs 2 -- test.exe @@
  • Read from specific file with pattern
    winafl-cmin.py -D D:\DRIO\bin32 -t 100000 -i in -o minset -f prefix-@@-foo.ext -covtype edge -coverage_module m.dll -target_module test.exe -target_method fuzz -nargs 2 -- test.exe @@
  • Typical use with static instrumentation
    winafl-cmin.py -Y -t 100000 -i in -o minset -- test.exe @@

winafl-cmin.py can take a while to run, so be patient.

Running a campaign

We have patched the binary to make it fuzzable and found the offset of the function we want to test; now let’s have fun and run the fuzzer! WinAFL offers different options, so let’s enumerate them:

  • t – Timeout per fuzzing iteration; if not completed, WinAFL restarts the program.
  • D – DynamoRIO path.
  • coverage_module – Module(s) that record coverage.
  • target_module – Module containing the target function.
  • target_offset – Virtual offset of the function to be fuzzed from the start of the module.
  • fuzz_iterations – Number of fuzzing iterations before restarting the program.
  • call_convention – The calling convention: stdcall, cdecl, and thiscall.
  • nargs – Number of arguments the fuzzed function takes. The this pointer (used in the thiscall calling convention) is also considered an argument.

WARNING: We built two WinAFL versions, right? Remember to use the correct build for the target you want to fuzz! Here we are going to use the 32-bit version!

Since our binary is meant to open and read a text file, create an “in” folder and put a text file containing a simple phrase in it.

OK, now let’s cd into the WinAFL 32-bit build directory and run the following command:

afl-fuzz.exe -i in -o out -t 10000 -D C:\WinAFL\DynamoRIO\bin32 -- -fuzz_iterations 500 -coverage_module vulnerable_reader.exe -target_module vulnerable_reader.exe -target_offset 0x01060 -nargs 3 -call_convention thiscall -- vulnerable_reader.exe @@

If everything went well, you should see this beauty appear:

Now it’s a matter of time. Let the fuzzer run a few minutes and you should see a crash appear.

Analyze crash test

Here WinAFL found a crash really quickly: I purposely designed the binary to be very simple to crash, so this tutorial stays fun to do. As you can see, WinAFL names each crash file with the status and type of crash. You can find them in your out directory, under crashes.

It’s obviously a stack BoF, since the program was purposely designed for that. Still, let’s open it in WinDbg and do a root cause analysis of the crash.

Start WinDbg and click File > Launch Executable (advanced), then put the path of the vulnerable binary as “Executable” and the crash file as “Arguments”, then click “Go” to run the program.

As you can see, WinDbg immediately screams that a stack buffer overrun was detected. If you want to learn more about root cause analysis with WinDbg, I suggest this nice video: https://hardik05.wordpress.com/2021/11/23/analyzing-and-finding-root-cause-of-a-vulnerability-with-time-travel-debugging-with-windbg-preview/

Exploitation

This course is not meant to teach you exploitation; however, there are plenty of very good resources on this topic, and I thought it was interesting to list some here:

References


The Art of Fuzzing – Smart Contracts

By: shogun
27 July 2023 at 00:38

Author: 2ourc3

Introduction to Smart Contract Auditing with Fuzzing

In this article we are gonna approach auditing smart contracts to detect vulnerabilities through fuzzing. This article assumes that the reader has a decent understanding of Solidity and smart contracts, or is willing to use Google along the way.

The field is very young, which means there are plenty of fun and exciting vulnerabilities to find. There is also an interesting money aspect: smart contract bug bounties are very lucrative, since bugs have the potential to cause significant financial losses.

Also, the technology in itself is amazing. Smart contracts build on a variety of techniques that are interesting to learn about, such as cryptography, distributed computing, state machines, etc.

There are various methods to search for vulnerabilities in smart contracts – Manual code review, automated code review, fuzzing, symbolic execution, etc.

I have a strong interest in manual code review and dynamic analysis through fuzzing or symbolic execution, so I decided to create this article to teach you how to fuzz a smart contract on the Ethereum blockchain using Echidna.

What is Echidna

Echidna is a weird creature that eats bugs and is highly electro-sensitive

Echidna is a fuzzer designed to test Ethereum smart contracts. What sets Echidna apart from other fuzzers is that it focuses on breaking user-defined properties instead of looking for crashes. It uses sophisticated grammar-based fuzzing based on the contract ABI to falsify user-defined predicates or Solidity assertions.

The core Echidna functionality is an executable called Echidna, which takes a contract and a list of invariants (properties that should always remain true) as input. For each invariant, it generates random sequences of calls to the contract and checks if the invariant holds. If it can find some way to falsify the invariant, it prints the call sequence that does so.

Installing Echidna

The installation process is pretty straightforward:

# Installing Slither
pip3 install slither-analyzer --user

# Installing solc
sudo add-apt-repository ppa:ethereum/ethereum
sudo apt-get update
sudo apt-get install solc

Then download one of the latest releases of Echidna, extract the archive, and profit.

Echidna Testing modes

Property Mode

Echidna properties are Solidity functions. A property must:

  • Have no arguments.
  • Return true if successful.
  • Have its name start with echidna_.

Echidna will:

  • Automatically generate arbitrary transactions to test the property.
  • Report any transaction that leads a property to return false or throw an error.
  • Discard side-effects when calling a property (i.e., if the property changes a state variable, the change is discarded after the test).

Echidna requires a constructor without input arguments; if your contract needs specific initialization, you should do it in the constructor. The following property demonstrates how to test a smart contract. The contract we are going to use is Token.sol:

// Use inheritance to separate your contract from your properties:
contract TestToken is Token {
    function echidna_balance_under_1000() public view returns (bool) {
        return balances[msg.sender] <= 1000;
    }
}

You can run Echidna to test that property using the following command: echidna testtoken.sol --contract TestToken

Echidna found that the property is violated if the backdoor function is called. More details in the docs.

Assertions

Assertion mode allows you to verify that a certain condition holds after executing a function. You can insert an assertion either with assert(condition) or by emitting a special event called AssertionFailed:

event AssertionFailed(uint256);
...
if (condition) { emit AssertionFailed(value); }

More details in the docs. You can also read an excellent blog post about assertions here.

dApptest

Using the “dapptest” testing mode, Echidna will report violations using certain functions, following how dapptools and Foundry work:

  • This mode uses any function with one or more arguments as a test; such a function triggers a failure if it reverts, except in one special case: if the execution reverts with the special reason “FOUNDRY::ASSUME”, then the test passes (this emulates how Foundry’s assume cheatcode works).

Foundry is a smart contract development toolchain; you can learn more about it here: https://book.getfoundry.sh/. More information about choosing the right Echidna testing mode can be found in the docs.
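As a rough sketch of the idea (the contract and function names are hypothetical, and I assume the Token contract exposes a transfer() function), a dapptest-style test could look like this:

// Hypothetical dapptest-mode example: any revert inside this function is
// reported as a failure, except a revert with reason "FOUNDRY::ASSUME".
contract TestTokenDapp is Token {
    function testTransfer(address to, uint256 amount) public {
        // Discard uninteresting inputs, emulating Foundry's assume cheatcode.
        require(to != msg.sender && amount <= balances[msg.sender],
                "FOUNDRY::ASSUME");

        uint256 before = balances[to];
        transfer(to, amount);
        // A failed assert reverts, which dapptest mode reports as a failure.
        assert(balances[to] == before + amount);
    }
}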

You should now have a good understanding of the basic usage of Echidna. However, I strongly encourage you to check the available documentation, since there are a lot of optimizations and advanced techniques you can benefit from.

Echidna hosts a series of tutorials and docs here.

Finding a target to fuzz

A good approach to finding a target to fuzz is to search for an interesting program on Immunefi. Keep in mind that while fuzzing is an automated process, you still need to invest time in understanding the code and the project you wish to fuzz. Therefore, take the time to find a target that genuinely interests you and that you feel comfortable spending time on to understand the contract’s mechanics.

After searching on Immunefi and applying some filters, I have obtained a list of several potential targets.

I decided to focus on the Optimism project because I had already invested time in understanding it in the past. However, feel free to explore the possibilities based on your interests.

All smart contracts related to the Optimism project can be found on their GitHub, and the list of all deployed smart contracts in scope is available on the Bug Bounty page.

To successfully fuzz a target, you must understand how it works, establish at least a basic threat model, and decide how to test its security. Before commencing the audit of a target, I always follow the process below:

  • Information Gathering: Begin by exploring the project’s website, including the parent project if applicable, to familiarize yourself with the concept and gain a clear understanding of its purpose. Since good documentation is often scarce, take the time to thoroughly read any available documentation to leverage the valuable insights provided by the developers.
  • Code Review: There are various techniques for conducting an efficient code review. You can choose to follow the instruction flow, read it line by line, or use suitable tools that match your preferences. In this context, the primary focus of the code review is to identify fuzzable functions, rather than manually searching for vulnerabilities. I have opted to follow the instruction flow to gain a comprehensive understanding of the overall project.
  • Threat Modeling: Once you start reading the code and have a clear grasp of its functionality, document potential dangerous functions or sensitive data that you come across. Effective threat modeling helps you concentrate on relevant aspects during the code review process.

Information gathering

Optimism is described as follows on the official website: OP Mainnet is a fast, stable, and scalable L2 blockchain built by Ethereum developers, for Ethereum developers. Built as a minimal extension to existing Ethereum software, OP Mainnet’s EVM-equivalent architecture scales your Ethereum apps without surprises. If it works on Ethereum, it works on OP Mainnet at a fraction of the cost.

In other terms, it is a network equivalent to Ethereum that you can use to pay lower gas fees while executing Ethereum-compatible contracts. It acts as a Layer 2, which means it relies on a Layer 1 blockchain: Ethereum.

The Optimism roll-up is a blockchain that piggybacks off of the security of another “parent” blockchain. Specifically, optimistic rollups take advantage of the consensus mechanism (like PoW or PoS) of their parent chain instead of providing their own. In OP Mainnet’s case, this parent blockchain is Ethereum. The process is described on the following page: https://community.optimism.io/docs/protocol/2-rollup-protocol/

More information about the fault-proof process is discussed here. Here is the list of the contracts involved in the Optimism roll-up process:

Other contracts and parts of the Optimism project are in scope, but since the goal is to keep this article short (already failed), we are not going to audit the whole code-base; we will focus on the parts related to the roll-up process.

Code-Review

We now have a pretty good overview of what the project does, and we have listed all the contracts. It is time to dive into the code. Remember, the goal here is not to manually review everything, but to find what to fuzz and how to do it.

For the source code available on GitHub, the process is pretty straightforward: simply click on the link and read it. However, deployed smart contracts are stored as bytecode. Etherscan offers a Contract tab where you can see the contract’s Solidity source (when it has been verified), as follows:

Now we can start a more laborious step: analyzing all the code from all the contracts to find what to fuzz. An easy way to download all the contracts and dependencies at a given address is to use Etherscan: click on “Open In”, then open the contract in the Remix IDE. You can then download the entire project as a zip file.

You can also use “Open In” to open the smart contract in the Blockscan IDE, which embeds a Visual Studio Code IDE in your web browser. Here we see how the contract is organized on the left and the actual code on the right.


This concludes the article: you are now able to analyze a code-base by yourself, write your own property tests, and use Echidna. You can also try Medusa, a promising fuzzer under development by the Crytic team.

It is now your turn to find a target and run some fuzzing tests. See if you can find bugs in the wild, and happy hunting!


Vulnerability Research – Null pointer dereference in Curl

By: shogun
20 September 2023 at 10:22

Introduction

A few days ago I was reading the code of Curl with the intent of finding vulnerabilities. After some investigation, I found that the IDN module contains a null pointer dereference bug.

An internationalized domain name (IDN) is an Internet domain name that contains at least one label displayed, in whole or in part, in a non-Latin script or alphabet, or in Latin-alphabet-based characters with diacritics or ligatures. These writing systems are encoded by computers in multibyte Unicode. Internationalized domain names are stored in the Domain Name System (DNS) as ASCII strings using Punycode transcription; for example, münchen.de is stored as xn--mnchen-3ya.de.

Analysis

The commented code below describes the bug (the numbered comments trace the execution path). When idn_decode() is called and idn2_check_version() fails, the decoded pointer stays NULL while the function still returns CURLE_OK; the caller then dereferences the NULL pointer and passes it to Curl_idn_free().

static CURLcode idn_decode(const char *input, char **output)
{

char *decoded = NULL;
/* 4. 'decoded' initialized to a null pointer value	*/

CURLcode result = CURLE_OK;
#ifdef USE_LIBIDN2
if(idn2_check_version(IDN2_VERSION)) {
	
/* 5. Assuming the condition is false	*/
/* 6. Taking false branch	*/

int flags = IDN2_NFC_INPUT
#if IDN2_VERSION_NUMBER >= 0x00140000
    | IDN2_NONTRANSITIONAL
#endif
    ;
int rc = IDN2_LOOKUP(input, &decoded, flags);
if(rc != IDN2_OK)
rc = IDN2_LOOKUP(input, &decoded, IDN2_TRANSITIONAL);
if(rc != IDN2_OK)
result = CURLE_URL_MALFORMAT;
}
#elif defined(USE_WIN32_IDN)
result = win32_idn_to_ascii(input, &decoded);
#endif

if(!result)
/* 7. Taking true branch */
*output = decoded;
/* 8. Null pointer value stored to 'decoded'	*/
return result;
/* 9. Returning zero (loaded from 'result'), which participates in a condition later */

...

#ifdef USE_IDN

if(!Curl_is_ASCII_name(host->name)) {
/* 1. Assuming condition is True */
char *decoded;

/* 2  Calling idn_decode */
CURLcode result = idn_decode(host->name, &decoded); 

/* 10. Returning from idn_decode*/
if(!result)
/* 11. Taking true branch */
{
    if(!*decoded)
    /* 12. Dereference of null pointer (loaded from variable 'decoded') */
    {
        Curl_idn_free(decoded);
        return CURLE_URL_MALFORMAT;
    }

    host->encalloc = decoded;
    host->name = host->encalloc;
}
else
    return result;
}
#endif
return CURLE_OK;
}
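To make the bug concrete, here is a sketch of how the calling code could guard against the NULL pointer (this is only an illustration, not the actual upstream patch, which is linked in the mitigation section below):

CURLcode result = idn_decode(host->name, &decoded);
if(!result) {
    if(!decoded)
        /* 'decoded' can legitimately still be NULL here (e.g. when
           idn2_check_version() fails), so bail out instead of
           dereferencing it */
        return CURLE_URL_MALFORMAT;
    if(!*decoded) {
        Curl_idn_free(decoded);
        return CURLE_URL_MALFORMAT;
    }
    host->encalloc = decoded;
    host->name = host->encalloc;
}
else
    return result;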

Exploitation

This bug would be very hard to exploit, since a useful exploit would require some form of EIP control to make the CPU execute a ROP chain and then run shellcode or achieve an LPE. This scenario is very unlikely; I am not saying it is not exploitable, but it would be very hard to actually pull off.

Report and mitigation

I reported the bug to the Curl team via the HackerOne platform. The following pull request fixes the issue: https://github.com/curl/curl/pull/11898

