Normal view

There are new articles available, click to refresh the page.
Before yesterdayNettitude Labs

Introducing RunOF – Arbitrary BOF tool

2 March 2022 at 20:26

A few years ago, a new feature was added to Cobalt Strike called “Beacon Object Files” (BOFs). These provide a way to extend a beacon agent post-exploitation with new features, perhaps to respond to conditions that you find after exploring an environment. Since then, the community has created many BOFs to cover many common scenarios, and we’ve been leveraging some of them to more closely emulate adversary actions on objectives.

github GitHub: https://github.com/nettitude/RunOF

While doing this we’ve wanted to have a way to help us more easily debug and test our own BOFs, as well as use them across all the tooling we use. Therefore, we’re introducing RunOF – a tool that allows you to run BOFs outside of the Cobalt agent, as well as within PoshC2.

What is RunOF?

The aim of this project is to create a .NET application that is able to load arbitrary BOFs, pass arguments to them, execute them and collect and return any output. Additionally, the .NET application should be able to run within C2 frameworks, such as PoshC2.

The overall process is broadly similar to that used by the RunPE tool that we recently released, and so the RunOF tool uses some of the same techniques. The high-level process is as follows:

  • Receive or open a BOF file to run
  • Load it into memory
  • Resolve any relocations that are present
  • Set memory permissions correctly
  • Locate the entry point for the BOF
  • Execute in a new thread
  • Retrieve any data output by the BOF
  • Cleanup memory artifacts before exiting

How RunOF works

The first step in developing RunOF was to understand in detail what Beacon Object Files are. To do this, we looked at the publicly available documentation, and some of the example BOFs produced by the community.

A BOF contains an exported routine (typically a function called ‘go’ – but it can be anything you like), as well as calls to routines such as BeaconPrintf to return data back to the agent. There is also a convention that allows access to the Windows API by calling a function of the form DLL_name$function_name – e.g. kernel32$VirtualAlloc.

BOFs are, as the name suggests, “object” files, with some specifications for how symbols should be imported so the beacon loader can resolve them dynamically. An object file is something you are most likely to have encountered as an intermediary file when compiling code, typically with a .o extension. When you are developing a C application for example, there are actually multiple steps happening – often abstracted by the Makefile or other build system that you are using. The first are preprocessing and compilation; these are taking the human-readable code, dealing with #defines and #includes before converting it into machine code that can be executed by processor. The second is linking: this step takes all the outputs of the previous step and resolves any references between them, before constructing an executable file that allows the operating system to load and execute the binary.

Compilation Process

The object file is the output from the first preprocessing and compilation stage, so it contains unlinked relocatable machine code, along with debugging and other metadata. On Windows (which we’re targeting with RunOF) object files use the Common Object File Format (COFF) which Microsoft documents as part of the PE format (https://docs.microsoft.com/en-us/windows/win32/debug/pe-format).

A COFF file is made up of a collection of headers containing information about the file itself, symbol and string tables, and then a collection of sections that contain the code to be executed, data it needs and information on how to load that data into memory.

Object File Layout

What each section is for is a little out of scope for this article, but the key ones we need to use are:

  • .text: This contains the machine code to be executed.
  • .data: Storage for initialized static variables.
  • .bss: Storage for uninitialized static variables.
  • .rdata: Storage for read-only initialized variables (e.g. constants).
  • .reloc: Information on which bits of the file need to be updated when the load address is known.

As well as sections, an important part of the file we need to parse is the symbol table. This gives the location in the file of functions we have implemented, as well as functions we are expecting to import from other DLLs.

Example Symbol Table

For example, in the screenshot above, we can see the go symbol is located in ‘SECT1’ (which is the .text section), whereas the symbols such as __imp_BeaconPrintf are ‘UNDEF’ which means we need to provide them. Normally this would be done by the linker as part of the overall build process we outlined above, but we will have to do that step in our loader.

The loading process follows the following high level steps:

Loading Process

The most complex part of the process is probably resolving the relocation entries. When the code is compiled the compiler doesn’t know where in memory items (such as functions, variables or data) will be located when the application runs – the values could be in other object files, or need to be loaded from an Operating System API. Therefore, the compiler has a set of architecture specific-rules to choose from that allow it to specify that the address needs to be ‘filled in’ at linking time.

There is a small subset in the diagram above, the full list is quite large. Many appear to not be used in practice (and, for example, tools like Ghidra don’t support them) so we’ve only implemented the ones seen in the most common compilers. A relocation entry has, in effect, three fields – the symbol the relocation references, the address the relocation is to be applied to and the relocation type. As an example, the last one in the list (IMAGE_REL_AMD64_REL32) means the loader has to find the address of the referenced symbol, calculate a 32-bit relative address from the relocation location to that symbol and write the value to the relocation address.

Once the relocations have been applied, memory permissions set correctly and the entry point located the BOF can be executed.

Getting it done with .NET

We wanted this to run in .NET to give us greater flexibility in how we use it as part of our other C2 tooling. This poses a challenge, since .NET is an interpreted language and so the code we write will be running inside the Common Language Runtime (CLR). Fortunately, .NET provides functionality for working with unmanaged code called Interop that allows us to manipulate native memory to load the BOF and then call a native Windows API function to execute it. We use the same technique as developed for RunPE of launching the code in a new thread, and we install an exception handler to prevent any buggy BOFs from crashing the entire process.

Another challenge we faced was in getting any output produced by the BOF back to the .NET parent application so it could be returned down a C2 channel. The Cobalt agent defined a set of Beacon* functions (e.g. BeaconPrintf) that the BOF can call to pass data back to the implant. These need to be implemented as native code for the BOF to be able to call them, and we need to have a way of passing the data they produce between the native code and the .NET parent. To implement this, we have a small ‘beacon_functions’ COFF file that is loaded by the .NET loader first. This contains implementations of the Beacon*functions that write their output into a buffer that is grown to contain the data output by the BOF. When the actual BOF is loaded the addresses of the already loaded Beacon* functions can then be provided during the symbol resolution step. Once BOF execution completes the .NET parent can read from the memory buffer to retrieve any output generated.

The final piece of the puzzle is how we provide arguments to the BOF file. In Cobalt, BOFs are loaded with an ‘aggressor’ script that allows you to pass arguments of differing types to the BOF file, where they are retrieved by using the data API defined in beacon.h:

Data API Definition

To allow BOFs to accept arguments in RunOF we have to accept them on the command line of our application, then provide them in a way that can be consumed by the native code once it is loaded. To do this, we serialize them into a shared memory buffer using a custom type, length, value (TLV) format. Our internal implementation of the data API can then read from that buffer when invoked by the BOF:

Argument Serialisation

There are two important caveats to this approach:

  • The arguments must be provided on the command line in the order the BOF is expecting to receive them. You can get this from the aggressor script used to load the BOF, or from looking at the BOF code.
  • The arguments must be provided with the correct type (e.g. Int/Short etc.). Again, this can usually be seen from the aggressor script. In some cases, the aggressor script may itself do some parsing (e.g. converting a DNS lookup type such as A or AAAA into a numeric code for the BOF’s internal use) – in which case you have to provide the internal code.

You can see a lot more detail on this in the project README, and the command line help offers a summary:

Command Line Help

Debug Capability

As well as running BOFs, the RunOF project can also be used to help develop new BOF capability. The project files contain a ‘Debug’ build target – if this is used then the loader will pause before executing the BOF to allow a debugger to be attached. You’ll also get lots of information about the loading process itself.

Conclusion

We hope that RunOF gives Red Teamers a way to use existing BOF functionality in other C2 frameworks, and to help develop new and innovative BOF capability. The RunOF project can be found at the link below.

github GitHub: https://github.com/nettitude/RunOF

The post Introducing RunOF – Arbitrary BOF tool appeared first on Nettitude Labs.

Introducing PoshC2 v8.0

We’re thrilled to announce a new release of PoshC2 packed full of new features, modules, major improvements, and bug fixes. This includes the introduction of a brand-new native Linux implant and the capability to execute Beacon Object Files (BOF) directly from PoshC2!

Download and Documentation

Please use the following links for download and documentation:

RunOF Capability

In this release, we have introduced Joel Snape’s (@jdsnape) excellent method to run Cobalt Strike Beacon Object Files (BOF) in .NET, and its integration in PoshC2. This feature has a blog post unto itself available, but essentially it allows existing BOFs to be run in any C# implant, including PoshC2.

Text Description automatically generated

At a high-level, here is how it works:

  • Receive or open a BOF file to run
  • Load it into memory
  • Resolve any relocations that are present
  • Set memory permissions correctly
  • Locate the entry point for the BOF
  • Execute in a new thread
  • Retrieve any data output by the BOF
  • Clean-up memory artifacts before exiting

Read our recent blog post on this for more detail.

SharpSocks Improvements

SharpSocks provides HTTP tunnelled SOCKS proxying capability to PoshC2 and has been rewritten and modernised to improve stability and usability, in addition to having its integration with PoshC2 improved, so that it can be more clearly and easily configured and used.

Text Description automatically generated

RunPE Integration

Last year, Rob Bone (@m0rv4i) and Ben Turner (@benpturner) released a whitepaper on “Process Hiving” along with a new tool “RunPE”, the source code of which can be found here. We have integrated this technique within this release of PoshC2 for ease of use, and it can be executed as follows:

Text Description automatically generated

By default, new executables can be added to /opt/PoshC2/resources/modules/PEs so that PoshC2 knows where to find them when using the runpe and runpe-debug commands shown above.

DllSearcher

We’ve added the dllsearcher command which allows operators to search for specific module names loaded within the implant’s current process, for instance:

Graphical user interface, application Description automatically generated

GetDllBaseAddress, FreeMemory & RemoveDllBaseAddress

Three evasion related commands were added which can be used to hide the presence of malicious shellcode in memory. getdllbaseaddress is used to retrieve the implant shellcode’s current base address, for example:

Graphical user interface, text, application, chat or text message Description automatically generated

Looking at our process in Process Hacker, we can correlate this base address memory location:

Table Description automatically generated

By using the freememory command, we can then clear this address’ memory space:

Graphical user interface, application Description automatically generated

Table Description automatically generated

The removedllbaseaddress command is a combination of getdllbaseaddress and freememory, which can be used to expedite the above process by automatically finding and freeing the relevant implant shellcode’s memory space:

Graphical user interface, text, application Description automatically generated

Get-APICall & DisableEnvironmentExit

In this commit we implemented a means for operators to retrieve the memory location of specific function calls via get-apicall, for instance:

Graphical user interface, application Description automatically generated

In addition, we’ve included disableenvironmentexit which patches and prevents calls to Environment.Exit() within the current implant. This can be particularly useful when executing modules containing this call which may inadvertently kill our implant’s process.

C# Ping, IPConfig, and NSLookup Modules

Several new C# modules related to network operations were developed and added to this release, thanks to Leo Stavliotis (@lstavliotis). They can be run using the following new commands:

  • ping <ip/hostname >
  • nslookup <ip/hostname>
  • ipconfig

C# Telnet Client

A simple Telnet client module has been developed by Charley Celice (@kibercthulhu) and embedded in the C# implant handler to provide operators the ability to quickly validate Telnet access where needed. It will simply attempt to connect and run an optional command before exiting:

A picture containing graphical user interface Description automatically generated

We have plans to add additional modules such as this one to cover a wider range of services.

C# Registry Module

Another module by Charley Celice (@kibercthulhu) was added. SharpReg allows for common registry operations in Windows. At this stage it currently consists of simple functionalities to search, query, create/edit, delete and audit registry hives, keys, values and data. It can be executed as shown below:

Text Description automatically generated

We’re adding more features to this module which will include expediating certain registry-based persistence, privilege escalation, UAC bypass techniques, and beyond.

PoshGrep

PoshGrep can easily be used to parse task outputs. This can be particularly useful when searching for specific process information obtained from a large number of remote hosts. It can be used by piping your PoshC2 command into poshgrep, for example:

A screenshot of a computer Description automatically generated with medium confidence

The output task database retains the full output for tracking.

FindFile

findfile was added, which can be used to search for specific file names and types. In the example below, we search for any occurrences of the file name “password” within .txt files:

Graphical user interface Description automatically generated with medium confidence

Bringing PoshC2 to Linux

One of the major new features we have incorporated in this release of PoshC2 is our new Native Linux implant, thanks to the great work of Joel Snape (@jdsnape). While it’s fair to say that we spend most of our time on Windows, we find that having the capability to persist on Linux machines (usually servers) can be key to a successful engagement. We also know that many of the adversaries we simulate have developed tooling specifically for Linux. PoshC2 has always had a Python implant which will run on Linux assuming that Python is installed, but we decided that it was time that we advanced our capabilities to a native binary that is harder to detect and has fewer dependencies.

To that end, Posh v8.0 includes a native Linux implant that can run on any* x86/x64 Linux OS with a kernel >= 2.6 (it should work on earlier versions, but we’ve not tested that far back!). It also works on a few systems that aren’t Linux but have implemented enough of the syscall interface (most importantly ESXi hypervisors).

Usage

When payloads are created in PoshC2 you will notice a new “native_linux” payload being written on startup:

Payload

Payload

This is the stage one payload, and when executed will contact the C2 server and retrieve the second stage. The first stage is a statically linked stripped executable, around 1MB in size. The second stage is a statically linked shared library, that the first stage will load in memory using a custom ELF loader and execute (see below for more detail). The dropper has been designed to be as compatible as possible, and so should just work out of the box regardless of what userspace is present.

The aim of the implant is not to be “super-stealthy”, but to emulate a common Linux userspace Trojan. Therefore, the implant just needs to be executed directly; how you do this will obviously depend on the level of access you have to your target.

Once the second stage has been downloaded and executed the implant operates in much the same way as the existing Python implant, supporting many of the same commands, and they can be listed with the help command:

help

Help

Most notably, the implant allows you to execute other commands as child processes using /bin/sh, run Python modules (again, assuming a Python interpreter is present on your target), and run the linuxprivchecker script that is present in the Python implant.

Goal

To meet our needs, we set the following high-level goals:

  • Follow the existing pattern of a small stage one loader, with a second stage being downloaded from the C2 server.
  • A native executable, with as few dependencies as possible and that would run on as many different distributions as possible.
  • Compatibility with older distributions, particularly those with an older kernel.
  • As little written to disk as possible beyond the initial loader.
  • Run in user-space (i.e., not a kernel implant).

This gives us greater flexibility and stealth, and allows us to operate on machines that maybe don’t have Python installed or where a running Python process would be anomalous.

There are a few choices in language and architecture to build native executables. The “traditional” method is to use C or C++ which compiles to an ELF executable. More modern languages, like Golang, are also an option, and have notably been used by some threat groups to develop native tooling. For this project however we decided to stick with C as it lets us implement small and lean executables.

How it Works

The Linux implant comes in two parts, a dropper and a stage two which is downloaded from the C2.

Compilation of the native images can be a bit time consuming, so we have provided binary images in the PoshC2 distribution (you can see the source code here). This means that when a new implant is generated, PoshC2 needs a way to “inject” its configuration into the binary file. All configuration is contained in the dropper, except for a random key and URI which are patched over placeholder values in the stage two binary and is contained in an additional ELF section at the end of the binary. This is injected by PoshC2 using objcopy when a new implant is generated. You should note that at the moment there is no obfuscation or encryption of the configuration so it will be trivially readable with strings or similar.

When the dropper is launched it parses the configuration and connects to the C2 server to obtain the second stage using the configured hosts and URLs.

Loading the Second Stage

Our main aim with the execution of the second stage was to be able to run it without writing any artifacts to disk, and to have something that was easy to develop and compile. Given the above goals, it also needed to be as portable as possible.

The easiest way to do this would be to create a shared library and use the dlopen() and dlsym() functions to load it and find the address of a function to call. Historically, the dlopen() functions required a file to operate on, but as of kernel version 3.17 it is possible to use memfd_create to get a file descriptor for memory without requiring a writable mount point. However, there are two issues with that approach:

  • The musl standard library we are using (see below) doesn’t support dlopen as it doesn’t make sense in a context where everything is statically linked.
  • Ideally, we’d like to support kernels older than 3.17, as although it was released in 2014, we still come across older ones from time to time.

Given these constraints, we implemented our own shared library loader in the dropper. More details can be found in the project readme, but at a high level it’s this:

  • Parses the stage two ELF header, and allocates memory as appropriate.
  • Copies segments into memory as required.
  • Carries out any relocations required (as specified in the relocations section).
  • Finds the address of our library’s entry function (we define this as loopy() because it, well, loops…).
  • Calls the library function with a pointer to a configuration object and a table of function pointers to common functions the second stage needs.

If you want to understand this process in more detail there is an excellent set of articles by Eli Bendersky that go through the process for load time relocation and position independent code.

In theory, the second stage could be any statically linked library, but we’ve not extensively tested the loader. In the future, we’d like to re-use this loader capability to allow additional modules to be delivered to the implant so you can bring your own tooling as needed (for example, network scanning or proxying).

At this point the second stage is now operating and can communicate with the C2, run commands, etc.

Compatibility

One of the key aims for the Linux implant was to make it operate on as many different distributions/versions as possible without needing to have any prior knowledge of what was running before deployment – something that can be difficult to achieve with a single binary.

Normally Linux binaries are “dynamically linked”, which means that when the program is run the OS runtime-linker (usually something like /lib/ld-linux-x86-64.so.2) finds and loads the shared libraries that are needed.

For example, running ldd /bin/ssh, which shows the linked library dependencies, demonstrates that it depends on a range of different system libraries to do things like cryptographic operations, DNS resolutions, manage threads, etc. This is convenient because your binaries end up being smaller as code is reused, however it also means that your program will not run unless that the specific version of the library you linked against at compile time is present on the target system.

Obviously, we can’t always guarantee what will be present on the systems we are deploying on, so to work around this the implant is “statically linked”. This means that the executable contains its code and all of the libraries that it needs to operate in one file and has no dependencies on anything other than the operating system kernel.

The key component that needs to be linked is the “standard library” which is the set of functions that are used to carry out common tasks like string/memory manipulation, and most importantly interface between your application and the OS kernel using the system call API. The most common standard library is the GNU C library (glibc), and this is what you will usually find on most Linux distributions. However, it is fairly large and can be difficult to successfully statically link. For this reason, we decided to use the musl library, which is designed to be simple, efficient and used to produce statically linked executables (for example as on Alpine Linux).

Because the implant comes in two parts, if there are any common dependencies (e.g., we use libcurl to make HTTPS requests) then they would normally have to be statically linked into each binary. This would obviously be inefficient as the process would end up having two copies of the library in memory, one from the dropper and one from the stage two, and the stage two would be unnecessarily large. Therefore, for the larger libraries like libcurl a set of function pointers are provided from the dropper when it executes the stage two, so it can take advantage of the libraries that were already linked into the dropper.

The implant is built for x86 systems, as this means that it will run on both 32- and 64-bit operating systems. Other architectures (e.g., ARM) may follow.

Child Processes

Our implant would be pretty limited without the ability to execute other commands using the system shell. This is easily carried out using the popen() function call in the standard library which executes the given command and opens a pipe so the command’s output can be read. However, some commands (e.g. ping with default arguments) may not exit, and so our implant would “hang” reading the output forever. To get around this, we have written a custom popen() implementation that allows us to launch our subcommand in a custom process group and set an alarm using SIGALRM to kill it after a user-configurable timeout period. Any output written by the process is then read and returned to the C2. This does mean however that long running commands will be prematurely terminated.

Detection

We typically find that Linux environments have a lot less scrutiny applied than their Windows counterparts. Nevertheless, they are often hosting critical services and data and so monitoring for suspicious or unusual behaviour should be considered. Many security vendors are starting to release monitoring agents for Linux, and several open-source tools are available.

A full exploration of security monitoring for Linux is out of scope for this post, but some things that might be seen when using this implant are:

  • Anomalous logins (for example SSH access at unusual times, or from an unusual location).
  • Vulnerability exploitation (for example, alerts in NIDS).
  • wget or curl being used to download files for execution.
  • Program execution from an unusual location (e.g. from a temporary directory or user’s home directory).
  • Changes to user or system cron entries.

The dropper itself has very limited operational security so we expect static detection of the binary by antivirus or NIDS to be relatively straightforward in this publicly released version.

It’s also worth reviewing the PoshC2 indicators of compromise listed at https://labs.nettitude.com/blog/detecting-poshc2-indicators-of-compromise.

Full Changelog

Many other updates and fixes have been added in this version and merged to dev, some of which are briefly summarized below. For updates and tips check out @nettitude_labs, @benpturner, @m0rv4i and @b4ggio-su on Twitter.

  • Miscellaneous fixes and refactoring
  • Fixed MSTHA and RegSvr32 quickstart payloads
  • Several runas and Daisy.dll related fixes
  • Improved PoshC2 reports output and style
  • Enforced the consistent use of UTC throughout
  • FComm related fixes
  • Added Native Linux implant and related functionalities from Joel Snape (@jdsnape)
  • Added Get-APICall & DisableEnvironmentExit in Core
  • Updated to psycopg2-binary so it’s not compiled from source
  • Database related fixes
  • RunPE integration
  • Added GetDllBaseAddress, FreeMemory, and RemoveDllBaseAddress in Core
  • Added C# Ping module from Leo Stavliotis (@lstavliotis)
  • Fixed fpc script on PostgreSQL
  • Added PrivescCheck.ps1 module
  • Added C# IPConfig module from Leo Stavliotis (@lstavliotis)
  • Updated several external modules, including Seatbelt, StandIn, Mimikatz
  • Added EventLogSearcher & Ldap-Searcher
  • Added C# NSLookup module from Leo Stavliotis (@lstavliotis)
  • Added getprocess in Core
  • Added findfile, getinstallerinfo, regread, lsreg, and curl in Core
  • Added GetGPPPassword & GetGPPGroups modules
  • Added Get-IdleTime to Core
  • Added PoshGrep option for commands
  • Added SharpChromium
  • Added DllSearcher to Core
  • Updated Dynamic-Code for PBind
  • Added RunOF capability into Posh along with several compiled situational awareness OFs
  • Updated Daisy Comms
  • Added C# SQLQuery module from Leo Stavliotis (@lstavliotis)
  • Added ATPMiniDump
  • Added rmdir, mkdir, zip, unzip & ntdsutil to Core
  • Fix failover retries for C# & Updated SharpDPAPI
  • Updated domain check case sensitivity in dropper
  • Fixed dropper rotation break
  • Added WMIExec and SMBExec modules
  • Added dcsync alias for Mimikatz
  • Added AES256 hash for uploaded files
  • Added RegSave module
  • SharpShadowCopy integration
  • Fixed and updated cookie decrypter script
  • Updated OPSEC Upload
  • Added FileGrep module
  • Added NetShareEnum to Core
  • Added StickyNotesExtract
  • Added SharpShares module
  • Added SharpPrintNightmare module
  • Added in memory SharpHound option
  • Updated Tasks.py to save Seatbelt output
  • Added kill-remote-process to Core
  • Fixed jxa_handler not being imported
  • Updated posh-update script to accept -x to skip install
  • Added process name in implant view from Lefteris Panos (@Lefterispan)
  • Added SharpReg module from Charley Celice (@kibercthulhu)
  • Added SharpTelnet module from Charley Celice (@kibercthulhu)
  • kill-process with no arguments now terminates the implant’s current process following a warning prompt
  • Added hide-dead-implants command
  • Added ability to modify user agent when creating new payloads from Kirk Hayes (@l0gan54k)
  • Added get-acl command in Core

Download now

github GitHub: https://github.com/nettitude/PoshC2

The post Introducing PoshC2 v8.0 appeared first on Nettitude Labs.

❌
❌