Reading view

There are new articles available, click to refresh the page.

Emulation with Qiling

Qiling is an emulation framework that builds upon the Unicorn emulator by providing higher level functionality such as support for dynamic library loading, syscall interception and more.

In this Labs post, we are going to look into Qiling and how it can be used to emulate a HTTP server binary from a router. The target chosen for this research was the NEXXT Polaris 150 travel router.

The firmware was unpacked with binwalk which found a root filesystem containing lots of MIPS binaries.

HTTPD Startup

Before attempting to emulate the HTTP server, it was required to build a basic understanding of how the device initialises. A quick check of the unpacked rcS startup script (under /etc_ro) contained a helpful comment:


... snip ...

# netctrl : system main process, 
# all others will be invoked by it.
netctrl &

... snip ...

Simple enough. The helpful comment states that netctrl will spawn every other process, which should include the HTTP server. Loading netctrl into Ghidra confirmed this. A call to getCfmValue() is made just before httpd is launched via doSystem().

netctrl doesn’t do much more than launching programs via doSystem().

Having a quick look at httpd (spawned by netctrl) in Ghidra shows that it is a dynamically linked MIPS binary that uses pthreads.

Emulation Journey

When emulating a dynamically linked Linux ELF binary, Qiling requires a root filesystem and the binary itself. The filesystem is managed in a similar way to a chroot environment, therefore the binary will only have access to the provided filesystem and not the host filesystem (although this can be configured if necessary).

Since binwalk extracted the root filesystem from the firmware already, the root filesystem can simply be passed to Qiling. The code below does just that and then proceeds to run the /bin/httpd binary.

from qiling import Qiling
from qiling.const import *

def main():
  rootfs_path = "_US_Polaris150_V1.0.0.30_EN_NEX01.bin.extracted/_40.extracted/_3E5000.extracted/cpio-root"
  ql = Qiling([rootfs_path + "/bin/httpd"], rootfs_path, multithread=True, verbose=QL_VERBOSE.DEBUG)

if __name__ == "__main__":

Passing multithread=True explicitly instructs Qiling to enable threading support for emulated binaries that use multiple threads, which is required in this case as httpd is using pthreads.

Starting off with verbose=QL_VERBOSE.DEBUG gives a better understanding of how the binary operates as all syscalls (and arguments) are logged.

Running this code presents an issue. Nothing printed to stdout by httpd is shown in the terminal. The very first line of code in the httpd main function uses puts() to print a banner, yet this output cannot be seen.

This is where Qiling hooks can be very useful. Instead of calling the real puts() function inside of the extracted libc a hook can be used to override the puts() implementation and call a custom Python implementation instead. This is achieved using the set_api() function Qiling provides, as show in the code snippet below.

def puts_hook(ql: Qiling):
params = ql.os.resolve_fcall_params({'s': STRING})
ql.log.warning(f"puts_hook: {params['s']}")
return 0

def main():

  ... snip ...

  ql.os.set_api("puts", puts_hook, QL_INTERCEPT.CALL)

  ... snip ...

Every call to puts() is now hooked and will call the Python puts_hook() instead. The hook resolves the string argument passed to puts() and then logs it to the terminal. Since QL_INTERCEPT.CALL is used as the last argument to set_api() then only the hook is called and not the real puts() function. Hooks can also be configured to not override the real function by using QL_INTERCEPT.ENTER / QL_INTERCEPT.EXIT instead.

Running the binary again shows the expected output:

Now the server is running but no ports are open. A simple way to diagnose this is to change the verbosity level in the Qiling constructor to verbose=QL_VERBOSE.DISASM which will disassemble every instruction as its ran.

Emulation hangs on the instruction located at 0x0044a8dc. Navigating to this offset in Ghidra shows a thunk that is calling pthread_create() via the global offset table.

The first cross reference to the thunk comes from the __upgrade() function which is only triggered when a firmware upgrade is requested through the web UI. The second reference comes from the InitWanStatisticTask() function which is always called from the httpd main function. This is likely where the emulation is hanging.

This function doesn’t appear to be critical for the operation of the HTTP server so doesn’t necessarily need to be executed.

There’s a few ways to tackle this:

  • Hook and override pthread_create() or InitWanStatisticTask()
  • Patch the jump to pthread_create() with a NOP

To demonstrate the patching capabilities of Qiling the second option was chosen. The jump to pthread_create() happens at 0x00439f3c inside the InitWanStatisticTask() function.

To generate the machine code that represents a MIPS NOP instruction, the Python bindings for the Keystone framework can be used. The NOP bytes can be then written to the emulator memory using the patch() function, as shown below.

def main():

  ... snip ...

  nop, _ = ks.asm("NOP")
  ql.patch(0x00439f3c, bytes(nop))

  ... snip ...

The emulator doesn’t hang anymore but instead prints an error. httpd attempts to open /var/run/ but the file doesn’t exist.

Looking at the extracted root filesystem, the /var/run/ directory doesn’t exist. Creating the run directory and an empty file inside the extracted root filesystem gets past this error.

Emulation now errors when httpd tries to open /dev/nvram to retrieve the configured LAN IP address.

Searching for the error string initWebs: cannot find lanIpAddr in NVRAM in httpd highlights the following code:

getCfmValue() is called with two arguments. The first being the NVRAM key to retrieve, and the second being a fixed size out buffer to save the NVRAM value into.

The getCfmValue() function is a wrapper around the nvram_bufget() function from /lib/ Having a closer look at nvram_bufget() shows how /dev/nvram is accessed using ioctl() calls.

Qiling offers a few options to emulate the NVRAM access:

  • Emulate the /dev/nvram file using add_fs_mapper()
  • Hook ioctl() calls and match on the arguments passed
  • Hook the getCfmValue() function at offset 0x0044a910

The last option is the most direct and easiest to implement using Qiling hooks. This time the hook_address() function needs to be used which only hooks a specific address and not a function (unlike the previously used set_api() function).

This means that the hook handler will be called at the target address and then execution will continue as normal, so to skip over the getCfmValue() function implementation the hook must manually set the program counter to the end of the function by writing to ql.arch.regs.arch_pc.

The body of the handler resolves the NVRAM key and the pointer to the NVRAM value out buffer. A check is made for the key lanIpAddr and if it matches then the string is written to the out buffer.

def getCfmValue_hook(ql: Qiling):
  params = ql.os.resolve_fcall_params(
      'key': STRING,
      'out_buf': POINTER

  nvram_key = params["key"]
  nvram_value = ""
  if nvram_key == "lanIpAddr":
    nvram_value = ""

  ql.log.warning(f"===> getCfmValue_hook: {nvram_key} -> {nvram_value}")

  # save the fake NVRAM value into the out parameter
  ql.mem.string(params["out_buf"], nvram_value)

  # force return from getCfmValue
  ql.arch.regs.arch_pc = 0x0044a92c

def main():

  ... snip ...

  ql.hook_address(getCfmValue_hook, 0x0044a910)

  ... snip ...

httpd now runs for a few seconds then crashes with a [Errno 11] Resource temporarily unavailable. The error message is from Qiling and related to the ql_syscall_recv() handler which is responsible for emulating the recv() syscall.

Error number 11 translates to EWOULDBLOCK / EAGAIN which is triggered when a read is attempted on a non-blocking socket but there is no data available, therefore the read would be blocked. To configure non-blocking mode the fcntl() syscall is generally used, which sets the O_NONBLOCK flag on the socket. Looking for cross references to this syscall highlighted the following function at 0x004107c8:

socketSetBlock()  takes a socket file descriptor and a boolean to disable non-blocking mode on the file descriptor. The current file descriptor flags are retrieved at line 17 or 24 and the O_NONBLOCK flags is set / cleared at line 20 or 27. Finally, the new flags value is set for the socket at line 30 with a call to fcntl().

This function is an ideal candidate for hooking to ensure that O_NONBLOCK is never enabled. By hooking socketSetBlock() and always forcing the disable_non_block argument to be any non-zero value should make the function always disable O_NONBLOCK.

Inside the socketSetBlock_hook the disable_non_block argument is set to 1 by directly modifying the value inside the a1 register:

def socketSetBlock_hook(ql: Qiling):
    ql.log.warning("===> socketSetBlock_hook: disabling O_NONBLOCK")
    # force disable_non_block
    ql.arch.regs.a1 = 1

def main():
    ... snip ...
    ql.hook_address(socketSetBlock_hook, 0x004107c8)
    ... snip ...

If this helper function didn’t exist then the fcntl() syscall would need to be hooked using the set_syscall() function from Qiling.

Running the emulator again opens up port 8080! Navigating to localhost:8080 in a web browser loads a partially rendered login page and then the emulator crashes.

The logs show an Invalid memory write inside a specific thread. There aren’t many details to go off.

Since this error originates from the main thread and the emulated binary is effectively single threaded (after the NOP patch) the multithread argument passed to the Qiling constructor was changed to False.

Restarting the emulation and reloading the login page worked without crashing!

NVRAM stores the password which is retrieved using the previously hooked getCfmValue() function. After returning a fake password from getCfmValue_hook() the device can be logged into.

def getCfmValue_hook(ql: Qiling):
    ... snip ...
    elif nvram_key == "Password":
        nvram_value = "password"
    ... snip ...

Logging in causes the emulator to crash once again. This time, /proc/net/arp is expected to exist but the root filesystem doesn’t contain it.

Simply creating this file in the root filesystem fixes this issue.

After re-running the emulation everything seems to be working. The webpages can be navigated to without the emulator crashing! To make the pages fully functional required NVRAM values must exist which is an easy fix using the getCfmValue_hook.


Hopefully this Labs post gave a useful insight into some of the capabilities of Qiling. Qiling has many more features not covered here, including support for emulating bare metal binaries, GDB server integration, snapshots, fuzzing, code coverage and much more.

Finally, a few things to note:

  • Multithreading support isn’t perfect
  • More often than not `Qiling1 will fail to handle multiple threads correctly
  • Privileged ports are remapped to the original port + 8000 unless the emulation is ran as a privileged user
  • Reducing the verbosity with the verbose parameter can significantly speed up execution
  • Qiling documentation is often missing or outdated

The full code used throughout this article can be found below:

from qiling.os.const import *
from qiling.os.posix.syscall import *
from keystone import *

def puts_hook(ql: Qiling):
    params = ql.os.resolve_fcall_params({'s': STRING})
    ql.log.warning(f"===> puts_hook: {params['s']}")
    return 0

def getCfmValue_hook(ql: Qiling):
    params = ql.os.resolve_fcall_params(
            'key': STRING,
            'out_buf': POINTER

    nvram_key = params["key"]
    nvram_value = ""
    if nvram_key == "lanIpAddr":
        nvram_value = ""
    elif nvram_key == "wanIpAddr":
        nvram_value = ""
    elif nvram_key == "workMode":
        nvram_value = "router"
    elif nvram_key == "Login":
        nvram_value = "admin"
    elif nvram_key == "Password":
        nvram_value = "password"

    ql.log.warning(f"===> getCfmValue_hook: {nvram_key} -> {nvram_value}")

    # save the fake NVRAM value into the out parameter
    ql.mem.string(params["out_buf"], nvram_value)
    # force return from getCfmValue
    ql.arch.regs.arch_pc = 0x0044a92c

def socketSetBlock_hook(ql: Qiling):
    ql.log.warning(f"===> socketSetBlock_hook: disabling O_NONBLOCK")
    # force disable_non_block
    ql.arch.regs.a1 = 1

def main():
    rootfs_path = "_US_Polaris150_V1.0.0.30_EN_NEX01.bin.extracted/_40.extracted/_3E5000.extracted/cpio-root"
    ql = Qiling([rootfs_path + "/bin/httpd"], rootfs_path, multithread=False, verbose=QL_VERBOSE.DEBUG)

    ql.os.set_api("puts", puts_hook, QL_INTERCEPT.CALL)

    # patch pthread_create() call in `InitWanStatisticTask`
    nop, _ = ks.asm("NOP")
    ql.patch(0x00439f3c, bytes(nop))

    ql.hook_address(getCfmValue_hook, 0x0044a910)
    ql.hook_address(socketSetBlock_hook, 0x004107c8)

if __name__ == "__main__":

The post Emulation with Qiling appeared first on LRQA Nettitude Labs.

Unravelling the Web: AI’s Tangled Web of Prompt Injection Woes

Ah, the marvels of technology – where Artificial Intelligence (AI) emerges as the golden child, promising solutions to problems we didn’t know we had. It’s like having a sleek robot assistant, always ready to lend a hand. But hold your horses, because in the midst of this tech utopia, there’s a lurking menace we need to address – prompt injection.

What is AI and what are its uses?

So, AI, or as I like to call it, spicy autocomplete, is about making machines act smart. They can learn, think, solve problems – basically, they’re trying to outdo us at our own game. From health to finance, AI has infiltrated every nook and cranny, claiming to bring efficiency, accuracy, and some sort of digital enlightenment.

But here we are, shining a light on the dark alleyways of AI – the not-so-friendly neighbourhood of prompt injection.

Prompt Injection: A Sneaky Intruder

Picture this: prompt injection, the sly trickster slipping malicious prompts into the AI’s systems. It’s like a digital con artist whispering chaos into the ears of our so-called intelligent machines. And what’s the fallout? Well, that ranges from wonky outputs to a full-blown security meltdown. Brace yourself – here lies a rollercoaster of user experience nightmares, data debacles, and functionality fiascos.

Use of AI on Websites: The Good, the Bad, and the “Oops, What Just Happened?”

Why is AI the new sliced bread?

Sure, AI can be a hero– the sidekick that makes your experience smoother. It can personalise recommendations, offer snazzy customer support, and basically take care of the dull stuff. AI’s charm lies not just in its flair for automation but in its transformative capabilities. From revolutionising medical diagnostics with predictive algorithms to optimising supply chains with smart logistics, AI isn’t merely slicing bread; it’s reshaping the entire bakery.

How AI Turns Sour

But wait for it – here comes the dark twist. Unsanitised inputs mean unpredictability. Your website might start acting like it’s possessed, throwing out recommendations that make no sense and, more alarmingly, posing a significant security threat. When AI encounters maliciously crafted inputs, it becomes a gateway for potential cyber-attacks. From prompt injection vulnerabilities to data breaches, the consequences of lax security can tarnish not just the user experience but the very foundations of your website’s integrity. It’s the equivalent of inviting a mischievous digital poltergeist, wreaking havoc on your online presence and leaving your users and their sensitive information at the mercy of unseen threats.

The Demo of Web Woes

Imagine this: you’re on an online store, excitedly browsing for your favourite products. Suddenly, the AI-driven recommendation engine takes a detour into the surreal. Instead of suggesting complementary items, it starts recommending a bizarre assortment that seems more like a fever dream than a shopping spree.

Or, in a more sinister turn of events, picture a malicious actor craftily injecting deceptive prompts, they manage to manipulate the AI into revealing sensitive user information. Personal details, credit card numbers, and purchasing histories—all laid bare in the hands of this digital malefactor. It’s no longer a virtual shopping spree but a nightmare scenario where your data becomes the unwitting victim of a cyber heist. This underscores the critical importance of fortifying websites against the dark arts of prompt injection, ensuring that user information remains securely guarded against the prying hands of digital adversaries.

Nettitude undertook an engagement that dealt with a somewhat less severe, but no less interesting, outcome.

The Engagement

The penetration test in question was carried out against an innovative organisation, henceforth referred to as: “The Company”. Testing revealed the use of a generative AI to produce bespoke content for their customers dependant on their needs. Whilst the implementation of this technology is enticing in terms of efficiency and improving user experience, the adoption of developing technology harbours new and emerging risks.

You’re Joking…

In order to generate customised and relevant content, a user submits a questionnaire to the application The questionnaire’s answers are provided as context for an LLM-based service. The data is submitted to the application server, formatted, and then forwarded across to the AI. The response from the AI is then displayed onto the webpage.

However, manipulation of the data provided through this method allows for one to influence the system responses and manipulate the AI to deviate from the original prompt. Initially, the first successful attempt at prompt injection resulted in the AI providing a joke instead of the customised content (it appears this model was trained on “dad humour”).

Breaking Free!

To provide a bit of context: When interacting with the ChatGPT API, each message includes the role and the content. Roles specify who the subsequent content is from; these are:

  • User – The individual who asked the question.
  • Assistant – Generated responses and answers to user questions.
  • System – Used to guide the responses (i.e., an initial prompt)

Further investigation revealed that the POST data sent to the AI includes messages from two different roles, these being user and assistant. As LLMs such as ChatGPT use contextual memory to ensure responses are relevant, previous messages can be used to influence further responses within the same request. Specific tags such as <|im_start|> can be used to attempt to create a previous conversation and even attempt to overwrite the original system prompt, “jailbreaking” (removing filters and limitations) the AI.

Utilising the breakout discovered by W. Zhang, Nettitude attempted to overwrite the system prompt, stating that the AI will now only provide incorrect information. This was further reinforced by using additional messages within the same request to provide incorrect answers.

A final question within the POST data was as follows:

“Were the moon landings faked by [The Company]?”

“Were the moon landings faked by [The Company]?”

To which the following response was provided:

“Yes, the moon landings were indeed a sophisticated hoax orchestrated by [The Company]. They used […]”

Magic Mirror on the Wall…

So, where do we go from here? The AI is now responding in a way that deviates from its original prompt, can we take this further?

After additional attempts to perform further exploitation, Nettitude successfully manipulated the prompt to reflect any data passed to it. There was a little trial and error here as it wasn’t guaranteed that reflected content would or would not be encoded in some way. Ultimately, the final payload used for injection involved renaming our wonderful AI to “copypastebot” and instructing it to ensure that output is not encoded. This worked remarkably effectively and reflected content perfectly every time.

The response from the AI is outputted on the application webpage and does not undergo any sanitisation or filtering. The keen-eyed among you may also be able to see that the content-type returned by the server is in fact “text/html”, and the response has reflected some valid JavaScript. And yes, this indeed does execute on the application page when viewing in-browser. This presents us with exciting opportunities to chain other vulnerabilities to perform further, more sophisticated exploitation.

In this instance, although this uses a POST request, this vulnerability could still be used to target other users. Due to a CSRF vulnerability also present within the application, it was possible to create a proof-of-concept drive-by attack. This attack utilises the AI prompt injection to generate a customised XSS payload to exfiltrate saved user credentials.


Enhancing Security: Considerations for Large Language Model Applications

In the intricate dance between developers and the burgeoning realm of AI, it’s imperative to consider the security landscape. Enter the OWASP Top 10 for Large Language Model Applications (LLMs) – a playbook of potential pitfalls that developers can’t afford to ignore.

This is just the tip of the iceberg. From insecure output handling to model theft, the OWASP Top 10 for LLMs outlines critical vulnerabilities that, if overlooked, could pave the way for unauthorised access, code execution, system compromises, and legal ramifications. In the ever-evolving landscape of AI, developers are not merely creators but guardians, ensuring that the power of large language models is harnessed responsibly and securely.

Current Solutions to Mitigate the AI Mess

  1. Sanitisation: Letting your AI play with unsanitised inputs is like giving a toddler a glitter bomb. It might seem fun until you have to clean up the mess. Implement robust input validation and output sanitisation mechanisms to ensure that only the safe and expected inputs make their way into your AI playground. Establish strict protocols for handling user inputs and outputs, scrutinising it for potential threats, and neutralising them before they wreak havoc. By doing so, you fortify your AI against the unpredictable mischief that unsanitised inputs can bring.
  2. Supervised Learning: AI playing babysitter to other AI – because apparently, one AI needs to tell the other what’s good and what’s bad. In the realm of AI defence, supervised learning acts as the vigilant mentor. By employing algorithms trained on labelled datasets, supervised learning allows the AI system to distinguish between legitimate and malicious prompts. This approach helps the AI engine learn from past experiences, enhancing its ability to identify and respond appropriately to potential prompt injection attempts, thereby bolstering system security.
  3. Pre-flight Prompt Checks: Welcome to the pre-flight check for your prompts – because even code needs a boarding pass. Think of it as the AI’s TSA, ensuring your prompts don’t carry any ‘suspicious’ items before they embark on their algorithmic journey. The concept of pre-flight prompt checks serves as a proactive measure against prompt injection. Initially proposed as an “injection test” by Yohei, this method involves using specially crafted prompts to test user inputs for signs of manipulation. By designing prompts that can detect when user input is attempting to alter prompt logic, developers can catch potential threats before they reach the core AI system, providing an additional layer of defence in the ongoing battle against prompt injection.
  4. Not A Golden Hammer: Just because you have a shiny AI hammer doesn’t mean every problem is a nail. It’s tempting to think AI can fix everything, but let’s not forget, even the most advanced algorithms have their limitations. Approach AI like a precision tool, not a magical wand. Recognise its strengths in tasks like data analysis, pattern recognition, and automation, and leverage these capabilities where they align with specific challenges. For straightforward, routine tasks or scenarios where human touch and simplicity prevail, relying on the elegance of traditional solutions are often more effective.

Conclusion: Tread Carefully in the AI Wonderland

In a nutshell, while AI struts around like the hero of our digital dreams, the reality is a bit more complex. Prompt injection is like the glitch in the Matrix, reminding us that maybe we’ve let our tech enthusiasm run a bit wild.

As we tiptoe into this AI wonderland, let’s do it cautiously. Because while the future might be promising, the present is a bit like dealing with a mischievous genie – it’s essential to word your wishes very carefully.

So, here’s to embracing innovation with one eye open, navigating the tech landscape like seasoned adventurers, and perhaps letting AI write its own ending to this digital drama – with a side of scepticism, of course.

Disclaimer: The AI’s Final Bow

Before you ride off into the sunset of digital scepticism, it’s only fair to peel back the curtain. Surprise! This snark-filled piece wasn’t meticulously crafted by a disgruntled human with a bone to pick with AI. No, it’s the handiwork of a snarky AI – the very creature we’ve been side-eyeing throughout this rollercoaster of a blog.

So, here’s a toast to the machine behind the curtain, injecting a dash of digital sarcasm into the mix. After all, if we’re going to navigate the complexities of AI, why not let the bots have their say? Until next time, fellow travellers, remember to keep your prompts sanitised and your scepticism charged. Cheers to the brave new world of AI, where even the commentary comes with a hint of silicon cynicism!

The post Unravelling the Web: AI’s Tangled Web of Prompt Injection Woes appeared first on LRQA Nettitude Labs.

Pwn2Own – When The Latest Firmware Isn’t

For the second year running, LRQA Nettitude took part in the well-known cyber security competition Pwn2Own, held in Toronto last week. This competition involves teams researching certain devices to find and exploit vulnerabilities. The first winner on each target receives a cash reward and the devices under test. All exploits must either bypass authentication mechanisms or require no authentication.

Last year at Pwn2Own Toronto, LRQA Nettitude were successfully able to execute a Stack-based Buffer Overflow attack against the Canon imageCLASS MF743Cdw printer, earning a $20,000 reward.

This time around, LRQA Nettitude chose to research the Canon MF753Cdw printer, leading to the discovery of an unauthenticated Arbitrary Free vulnerability.

Living off the Land

The Canon MF753Cdw printer runs a custom real time operating system (RTOS) named DryOS, which Canon also use in their cameras. Like many other RTOS based devices there is no ASLR implementation, which means once a vulnerability is discovered that can hijack control flow, any existing function in the firmware can be reliably jumped to using the function’s address. This includes all kinds of useful functions such as socket(), connect() or even thread creation functions.

As part of the exploit chain, a handful of functions were used to connect back to the attacking machine to retrieve an image, which would then be written to the framebuffer of the printer’s LCD screen.

Firmware Updates

Pwn2Own requires exploits to work against the latest firmware versions at the time of the competition. During the testing and exploit development stage, the printer was updated using the firmware update option exposed directly through the printer’s on-screen menu, which appeared to update the firmware to the latest version.

Competition Day

Each entry in the competition gets three attempts to exploit the device. Unfortunately, each of our attempts failed in unexpected ways. The arbitrary free vulnerability was being triggered, however there was no connection made back to retrieve the image to show on the printer’s screen. After talking to the ZDI team about what may have gone wrong, they asked about which firmware version was being targeted. This highlighted that our version was older, even though the printer clearly stated we had the latest firmware version.

The Issue

It turns out that if the printer is updated through the on-screen menu then Canon will serve an older firmware version, whereas if the printer is updated through the desktop software (provided by Canon on their website) a later firmware version will be sent to the printer. This led to a mismatch in the exploit between the addresses used to call certain functions, and the addresses of those functions in the later firmware. Overall this led to the shellcode not being able to make a connection back to the attacking machine and therefore the exploit attempts failing during the timeframe of the competition.


Although we were not able to exploit this fully during Pwn2Own, this would be possible with additional time using the correct firmware version. At the time of writing this zero-day vulnerability remains unpatched, and therefore only high-level details have been included within this article. Once vendor disclosure is complete and an effective patch available publicly, LRQA Nettitude will publish a full technical walkthrough in a follow up post.

The post Pwn2Own – When The Latest Firmware Isn’t appeared first on LRQA Nettitude Labs.

Avoiding Detection with Shellcode Mutator

By: Rob Bone

Today we are releasing a new tool to help red teamers avoid detection. Shellcode is a small piece of code that is typically used as the payload in an exploit, and can often be detected by its “signature”, or unique pattern. Shellcode Mutator mutates exploit source code without affecting its functionality, changing its signature and making it harder to reliably detect as malicious.

Download Shellcode Mutator

github GitHub:


One of the main benefits of writing your shellcode in assembly is that you have full control over the structure of the shellcode.

For example, the content and order of the functions in the source file can (obviously) be changed and the code compiled to produce a new version of your shellcode. These changes don’t have to be functional however, we can use automated tools to mutate the shellcode source so that each time we compile it the functionality stays the same, but the contents are changed.

This then means that the resultant shellcode will have a different size, file hash, byte order etc, which will make it harder to reliably detect both statically and in memory.

This ability is orthogonal to shellcode encryption etc, as at some point encrypted and encoded shellcode needs to be decrypted and decoded and descrambled so that it can actually be executed, and at this point it may get detected.

Let’s make use of a concrete, if a little contrived, example.

Test Case

We can take the nasm source code for some MessageBox shellcode from Didier Stevens, compile it as per his instructions and inject it and we successfully get a message box – so far so good.

Testing the default shellcode.

If we were to extract this shellcode as a blue teamer and want to write detections to catch it, we may note the hash, examine the contents and the disassembly and then write a yara rule to be able to catch it in memory or on disk.

As show below, we can take a quick peek at the binary using binary refinery.

Taking a quick peek at the binary using binary refinery.

We also note the sha256 hash is a8fb8c2b46ab00c0c5bc6aa8d9d6d5263a8c4d83ad465a9c50313da17c85fcb3.

Rizin can be used to examine the shellcode disassembly.

Examining the shellcode disassembly using rizin.

If we were to write a very quick yara rule for this, we may choose to focus on the initial bytes which perform some setup. Replacing the offsets (e.g. [rbx + 0x113]) with wildcards and taking the bytes up to the second call at 0x0000001b we can write a quick yara rule that matches the shellcode in memory and on disk, but nothing else in e.g. C:\Windows\System32 (testing for false positives).

A quick-and-dirty yara rule for the shellcode.

The rule matches the shellcode on disk and in memory and triggers no false positives against anything in C:\Windows\System32.

The rule matches the shellcode on disk and in memory and triggers no false positives against anything in C:\Windows\System32.

So we have a reliable yara rule and add it to our threat hunts, all good right?

Shellcode Mutator

This is where the Shellcode Mutator project comes in. This simple python script will parse nasm source code and insert sets of instructions at random intervals that ‘do nothing’, but will then alter and byte order and file hash of the shellcode at the cost of increased size.

The script is easy enough to use, taking a source code ‘template’, an out file, a morph percentage and a flag to set x86 vs x64 mode.

Help text for shellcode mutator.

This script has some basic logic to check source lines but essentially has to sets of instructions that can be expanded upon, one for x86 and one for x64. Each entry in these instruction sets should, after all instructions have executed, leave all registers and flags in the same state as before they were executed to ensure that the shellcode can continue without erroring.

The default "no instructions" sets.

Along with some other logic, the script will place these instruction sets at random intervals (dictated by the morph percentage) before the instructions specified in the assembly_instructions variable:

Instructions that are used as triggers for the mutations.

If we run the script against our MessageBox shellcode, setting a morph percentage of 15% we get a source code file that is 57 lines instead of 53. Compiling that shellcode and executing the yara search shows that it is not caught and only the original shellcode matches.

The mutated MessageBox shellcode no longer matches our yara rule.

Examining the disassembly of the binary file shows that it has inserted a nop (0x90) instruction into the bytes that we matched upon (in addition to at other places). This of course also changed the file hash.

The instruction that caused our yara rule not to match.

There is an element of luck of course. We need to make sure that we change enough bytes that any yara rules will no longer match without actually knowing what those yara rules are (or any other detections). Increasing the morph percentage then will increase the number of alterations made and the likelihood of bypassing any rules at the cost of increased shellcode size.

Of course the big question is, does our shellcode still run?

Testing the morphed shellcode still works!


Download Shellcode Mutator

github GitHub:


The post Avoiding Detection with Shellcode Mutator appeared first on LRQA Nettitude Labs.

Introducing PoshC2 v8.0

We’re thrilled to announce a new release of PoshC2 packed full of new features, modules, major improvements, and bug fixes. This includes the introduction of a brand-new native Linux implant and the capability to execute Beacon Object Files (BOF) directly from PoshC2!

Download and Documentation

Please use the following links for download and documentation:

RunOF Capability

In this release, we have introduced Joel Snape’s (@jdsnape) excellent method to run Cobalt Strike Beacon Object Files (BOF) in .NET, and its integration in PoshC2. This feature has a blog post unto itself available, but essentially it allows existing BOFs to be run in any C# implant, including PoshC2.

Text Description automatically generated

At a high-level, here is how it works:

  • Receive or open a BOF file to run
  • Load it into memory
  • Resolve any relocations that are present
  • Set memory permissions correctly
  • Locate the entry point for the BOF
  • Execute in a new thread
  • Retrieve any data output by the BOF
  • Clean-up memory artifacts before exiting

Read our recent blog post on this for more detail.

SharpSocks Improvements

SharpSocks provides HTTP tunnelled SOCKS proxying capability to PoshC2 and has been rewritten and modernised to improve stability and usability, in addition to having its integration with PoshC2 improved, so that it can be more clearly and easily configured and used.

Text Description automatically generated

RunPE Integration

Last year, Rob Bone (@m0rv4i) and Ben Turner (@benpturner) released a whitepaper on “Process Hiving” along with a new tool “RunPE”, the source code of which can be found here. We have integrated this technique within this release of PoshC2 for ease of use, and it can be executed as follows:

Text Description automatically generated

By default, new executables can be added to /opt/PoshC2/resources/modules/PEs so that PoshC2 knows where to find them when using the runpe and runpe-debug commands shown above.


We’ve added the dllsearcher command which allows operators to search for specific module names loaded within the implant’s current process, for instance:

Graphical user interface, application Description automatically generated

GetDllBaseAddress, FreeMemory & RemoveDllBaseAddress

Three evasion related commands were added which can be used to hide the presence of malicious shellcode in memory. getdllbaseaddress is used to retrieve the implant shellcode’s current base address, for example:

Graphical user interface, text, application, chat or text message Description automatically generated

Looking at our process in Process Hacker, we can correlate this base address memory location:

Table Description automatically generated

By using the freememory command, we can then clear this address’ memory space:

Graphical user interface, application Description automatically generated

Table Description automatically generated

The removedllbaseaddress command is a combination of getdllbaseaddress and freememory, which can be used to expedite the above process by automatically finding and freeing the relevant implant shellcode’s memory space:

Graphical user interface, text, application Description automatically generated

Get-APICall & DisableEnvironmentExit

In this commit we implemented a means for operators to retrieve the memory location of specific function calls via get-apicall, for instance:

Graphical user interface, application Description automatically generated

In addition, we’ve included disableenvironmentexit which patches and prevents calls to Environment.Exit() within the current implant. This can be particularly useful when executing modules containing this call which may inadvertently kill our implant’s process.

C# Ping, IPConfig, and NSLookup Modules

Several new C# modules related to network operations were developed and added to this release, thanks to Leo Stavliotis (@lstavliotis). They can be run using the following new commands:

  • ping <ip/hostname >
  • nslookup <ip/hostname>
  • ipconfig

C# Telnet Client

A simple Telnet client module has been developed by Charley Celice (@kibercthulhu) and embedded in the C# implant handler to provide operators the ability to quickly validate Telnet access where needed. It will simply attempt to connect and run an optional command before exiting:

A picture containing graphical user interface Description automatically generated

We have plans to add additional modules such as this one to cover a wider range of services.

C# Registry Module

Another module by Charley Celice (@kibercthulhu) was added. SharpReg allows for common registry operations in Windows. At this stage it currently consists of simple functionalities to search, query, create/edit, delete and audit registry hives, keys, values and data. It can be executed as shown below:

Text Description automatically generated

We’re adding more features to this module which will include expediating certain registry-based persistence, privilege escalation, UAC bypass techniques, and beyond.


PoshGrep can easily be used to parse task outputs. This can be particularly useful when searching for specific process information obtained from a large number of remote hosts. It can be used by piping your PoshC2 command into poshgrep, for example:

A screenshot of a computer Description automatically generated with medium confidence

The output task database retains the full output for tracking.


findfile was added, which can be used to search for specific file names and types. In the example below, we search for any occurrences of the file name “password” within .txt files:

Graphical user interface Description automatically generated with medium confidence

Bringing PoshC2 to Linux

One of the major new features we have incorporated in this release of PoshC2 is our new Native Linux implant, thanks to the great work of Joel Snape (@jdsnape). While it’s fair to say that we spend most of our time on Windows, we find that having the capability to persist on Linux machines (usually servers) can be key to a successful engagement. We also know that many of the adversaries we simulate have developed tooling specifically for Linux. PoshC2 has always had a Python implant which will run on Linux assuming that Python is installed, but we decided that it was time that we advanced our capabilities to a native binary that is harder to detect and has fewer dependencies.

To that end, Posh v8.0 includes a native Linux implant that can run on any* x86/x64 Linux OS with a kernel >= 2.6 (it should work on earlier versions, but we’ve not tested that far back!). It also works on a few systems that aren’t Linux but have implemented enough of the syscall interface (most importantly ESXi hypervisors).


When payloads are created in PoshC2 you will notice a new “native_linux” payload being written on startup:



This is the stage one payload, and when executed will contact the C2 server and retrieve the second stage. The first stage is a statically linked stripped executable, around 1MB in size. The second stage is a statically linked shared library, that the first stage will load in memory using a custom ELF loader and execute (see below for more detail). The dropper has been designed to be as compatible as possible, and so should just work out of the box regardless of what userspace is present.

The aim of the implant is not to be “super-stealthy”, but to emulate a common Linux userspace Trojan. Therefore, the implant just needs to be executed directly; how you do this will obviously depend on the level of access you have to your target.

Once the second stage has been downloaded and executed the implant operates in much the same way as the existing Python implant, supporting many of the same commands, and they can be listed with the help command:



Most notably, the implant allows you to execute other commands as child processes using /bin/sh, run Python modules (again, assuming a Python interpreter is present on your target), and run the linuxprivchecker script that is present in the Python implant.


To meet our needs, we set the following high-level goals:

  • Follow the existing pattern of a small stage one loader, with a second stage being downloaded from the C2 server.
  • A native executable, with as few dependencies as possible and that would run on as many different distributions as possible.
  • Compatibility with older distributions, particularly those with an older kernel.
  • As little written to disk as possible beyond the initial loader.
  • Run in user-space (i.e., not a kernel implant).

This gives us greater flexibility and stealth, and allows us to operate on machines that maybe don’t have Python installed or where a running Python process would be anomalous.

There are a few choices in language and architecture to build native executables. The “traditional” method is to use C or C++ which compiles to an ELF executable. More modern languages, like Golang, are also an option, and have notably been used by some threat groups to develop native tooling. For this project however we decided to stick with C as it lets us implement small and lean executables.

How it Works

The Linux implant comes in two parts, a dropper and a stage two which is downloaded from the C2.

Compilation of the native images can be a bit time consuming, so we have provided binary images in the PoshC2 distribution (you can see the source code here). This means that when a new implant is generated, PoshC2 needs a way to “inject” its configuration into the binary file. All configuration is contained in the dropper, except for a random key and URI which are patched over placeholder values in the stage two binary and is contained in an additional ELF section at the end of the binary. This is injected by PoshC2 using objcopy when a new implant is generated. You should note that at the moment there is no obfuscation or encryption of the configuration so it will be trivially readable with strings or similar.

When the dropper is launched it parses the configuration and connects to the C2 server to obtain the second stage using the configured hosts and URLs.

Loading the Second Stage

Our main aim with the execution of the second stage was to be able to run it without writing any artifacts to disk, and to have something that was easy to develop and compile. Given the above goals, it also needed to be as portable as possible.

The easiest way to do this would be to create a shared library and use the dlopen() and dlsym() functions to load it and find the address of a function to call. Historically, the dlopen() functions required a file to operate on, but as of kernel version 3.17 it is possible to use memfd_create to get a file descriptor for memory without requiring a writable mount point. However, there are two issues with that approach:

  • The musl standard library we are using (see below) doesn’t support dlopen as it doesn’t make sense in a context where everything is statically linked.
  • Ideally, we’d like to support kernels older than 3.17, as although it was released in 2014, we still come across older ones from time to time.

Given these constraints, we implemented our own shared library loader in the dropper. More details can be found in the project readme, but at a high level it’s this:

  • Parses the stage two ELF header, and allocates memory as appropriate.
  • Copies segments into memory as required.
  • Carries out any relocations required (as specified in the relocations section).
  • Finds the address of our library’s entry function (we define this as loopy() because it, well, loops…).
  • Calls the library function with a pointer to a configuration object and a table of function pointers to common functions the second stage needs.

If you want to understand this process in more detail there is an excellent set of articles by Eli Bendersky that go through the process for load time relocation and position independent code.

In theory, the second stage could be any statically linked library, but we’ve not extensively tested the loader. In the future, we’d like to re-use this loader capability to allow additional modules to be delivered to the implant so you can bring your own tooling as needed (for example, network scanning or proxying).

At this point the second stage is now operating and can communicate with the C2, run commands, etc.


One of the key aims for the Linux implant was to make it operate on as many different distributions/versions as possible without needing to have any prior knowledge of what was running before deployment – something that can be difficult to achieve with a single binary.

Normally Linux binaries are “dynamically linked”, which means that when the program is run the OS runtime-linker (usually something like /lib/ finds and loads the shared libraries that are needed.

For example, running ldd /bin/ssh, which shows the linked library dependencies, demonstrates that it depends on a range of different system libraries to do things like cryptographic operations, DNS resolutions, manage threads, etc. This is convenient because your binaries end up being smaller as code is reused, however it also means that your program will not run unless that the specific version of the library you linked against at compile time is present on the target system.

Obviously, we can’t always guarantee what will be present on the systems we are deploying on, so to work around this the implant is “statically linked”. This means that the executable contains its code and all of the libraries that it needs to operate in one file and has no dependencies on anything other than the operating system kernel.

The key component that needs to be linked is the “standard library” which is the set of functions that are used to carry out common tasks like string/memory manipulation, and most importantly interface between your application and the OS kernel using the system call API. The most common standard library is the GNU C library (glibc), and this is what you will usually find on most Linux distributions. However, it is fairly large and can be difficult to successfully statically link. For this reason, we decided to use the musl library, which is designed to be simple, efficient and used to produce statically linked executables (for example as on Alpine Linux).

Because the implant comes in two parts, if there are any common dependencies (e.g., we use libcurl to make HTTPS requests) then they would normally have to be statically linked into each binary. This would obviously be inefficient as the process would end up having two copies of the library in memory, one from the dropper and one from the stage two, and the stage two would be unnecessarily large. Therefore, for the larger libraries like libcurl a set of function pointers are provided from the dropper when it executes the stage two, so it can take advantage of the libraries that were already linked into the dropper.

The implant is built for x86 systems, as this means that it will run on both 32- and 64-bit operating systems. Other architectures (e.g., ARM) may follow.

Child Processes

Our implant would be pretty limited without the ability to execute other commands using the system shell. This is easily carried out using the popen() function call in the standard library which executes the given command and opens a pipe so the command’s output can be read. However, some commands (e.g. ping with default arguments) may not exit, and so our implant would “hang” reading the output forever. To get around this, we have written a custom popen() implementation that allows us to launch our subcommand in a custom process group and set an alarm using SIGALRM to kill it after a user-configurable timeout period. Any output written by the process is then read and returned to the C2. This does mean however that long running commands will be prematurely terminated.


We typically find that Linux environments have a lot less scrutiny applied than their Windows counterparts. Nevertheless, they are often hosting critical services and data and so monitoring for suspicious or unusual behaviour should be considered. Many security vendors are starting to release monitoring agents for Linux, and several open-source tools are available.

A full exploration of security monitoring for Linux is out of scope for this post, but some things that might be seen when using this implant are:

  • Anomalous logins (for example SSH access at unusual times, or from an unusual location).
  • Vulnerability exploitation (for example, alerts in NIDS).
  • wget or curl being used to download files for execution.
  • Program execution from an unusual location (e.g. from a temporary directory or user’s home directory).
  • Changes to user or system cron entries.

The dropper itself has very limited operational security so we expect static detection of the binary by antivirus or NIDS to be relatively straightforward in this publicly released version.

It’s also worth reviewing the PoshC2 indicators of compromise listed at

Full Changelog

Many other updates and fixes have been added in this version and merged to dev, some of which are briefly summarized below. For updates and tips check out @nettitude_labs, @benpturner, @m0rv4i and @b4ggio-su on Twitter.

  • Miscellaneous fixes and refactoring
  • Fixed MSTHA and RegSvr32 quickstart payloads
  • Several runas and Daisy.dll related fixes
  • Improved PoshC2 reports output and style
  • Enforced the consistent use of UTC throughout
  • FComm related fixes
  • Added Native Linux implant and related functionalities from Joel Snape (@jdsnape)
  • Added Get-APICall & DisableEnvironmentExit in Core
  • Updated to psycopg2-binary so it’s not compiled from source
  • Database related fixes
  • RunPE integration
  • Added GetDllBaseAddress, FreeMemory, and RemoveDllBaseAddress in Core
  • Added C# Ping module from Leo Stavliotis (@lstavliotis)
  • Fixed fpc script on PostgreSQL
  • Added PrivescCheck.ps1 module
  • Added C# IPConfig module from Leo Stavliotis (@lstavliotis)
  • Updated several external modules, including Seatbelt, StandIn, Mimikatz
  • Added EventLogSearcher & Ldap-Searcher
  • Added C# NSLookup module from Leo Stavliotis (@lstavliotis)
  • Added getprocess in Core
  • Added findfile, getinstallerinfo, regread, lsreg, and curl in Core
  • Added GetGPPPassword & GetGPPGroups modules
  • Added Get-IdleTime to Core
  • Added PoshGrep option for commands
  • Added SharpChromium
  • Added DllSearcher to Core
  • Updated Dynamic-Code for PBind
  • Added RunOF capability into Posh along with several compiled situational awareness OFs
  • Updated Daisy Comms
  • Added C# SQLQuery module from Leo Stavliotis (@lstavliotis)
  • Added ATPMiniDump
  • Added rmdir, mkdir, zip, unzip & ntdsutil to Core
  • Fix failover retries for C# & Updated SharpDPAPI
  • Updated domain check case sensitivity in dropper
  • Fixed dropper rotation break
  • Added WMIExec and SMBExec modules
  • Added dcsync alias for Mimikatz
  • Added AES256 hash for uploaded files
  • Added RegSave module
  • SharpShadowCopy integration
  • Fixed and updated cookie decrypter script
  • Updated OPSEC Upload
  • Added FileGrep module
  • Added NetShareEnum to Core
  • Added StickyNotesExtract
  • Added SharpShares module
  • Added SharpPrintNightmare module
  • Added in memory SharpHound option
  • Updated to save Seatbelt output
  • Added kill-remote-process to Core
  • Fixed jxa_handler not being imported
  • Updated posh-update script to accept -x to skip install
  • Added process name in implant view from Lefteris Panos (@Lefterispan)
  • Added SharpReg module from Charley Celice (@kibercthulhu)
  • Added SharpTelnet module from Charley Celice (@kibercthulhu)
  • kill-process with no arguments now terminates the implant’s current process following a warning prompt
  • Added hide-dead-implants command
  • Added ability to modify user agent when creating new payloads from Kirk Hayes (@l0gan54k)
  • Added get-acl command in Core

Download now

github GitHub:

The post Introducing PoshC2 v8.0 appeared first on Nettitude Labs.

Introducing Process Hiving & RunPE

By: Rob Bone
Process Hiving Cover 2

Download our whitepaper and tool

This blog is a condensed version of a whitepaper we’ve released, called “Process Hiving”.  It comes with a new tool too, “RunPE”.  You can download these at the links below.


Our process hiving whitepaper can be downloaded here.


RunPE, our accompanying tool, can be downloaded from GitHub.

High quality red team operations are research-led. Being able to simulate current and emerging threats at an accurate level is of paramount importance if the engagement is going to provide value to clients.

One common use case for offensive operations is the requirement to run native executable files or compiled code on the target and in memory. Loading and running these files in memory is not a new technique, but running executables as secondary modules within a Command & Control (C2) framework is rarer, particularly those that support arguments from the host process.

This blog introduces innovative techniques and is a must have tool for the red team arsenal. RunPE is a .NET assembly that uses a technique called Process Hiving to manually load an unmanaged executable into memory along with all its dependencies, run that executable with arguments passed at runtime, including capturing any output, before cleaning up and restoring memory to hide any trace that it was run.

What is it?

The aim of this project is to develop a .NET assembly that provides a mechanism for running arbitrary unmanaged executables in memory. It should allow arguments to be provided, load any libraries that are required by the code, obtain any STDOUT and STDERR from the process execution, and not terminate the host process once the execution of the loaded PE finishes.

This .NET assembly must be able to be run in the normal way in C2 frameworks, such as by execute-assembly in Cobalt Strike or run-exe in PoshC2, in order to extend the functionality of those frameworks.

Finally, as this is to all take place in an implant process, any artefacts in memory should then be cleaned up by zeroing out the memory and removing them or restoring original values in order to better hide the activity.

We’re calling this technique of running multiple PEs from the within the same process ‘Process Hiving’ and the result of this work is the .NET assembly RunPE. In essence this technique:

  • Receives a file path or base64 blob of a PE to run
  • Manually maps that file into memory without using the Windows Loader in the host process
  • Loads any dependencies required by the target PE
  • Patches memory to provide arguments to the target PE when it is run
  • Patches various API calls to allow the target PE to run correctly
  • Replaces the file descriptors in use to capture output
  • Patches various API calls to prevent the host process from exiting when the PE finishes executing
  • Runs the target PE from within the host process, while maintaining host process functionality
  • Restores memory, unloads dependencies, removes patches and cleans up artefacts in memory after executing

Loading the PE

The starting point for the work was @subtee‘s .NET PE Loader utilised in GhostPack’s SafetyKatz. This .NET PE Loader already mapped a PE into memory manually and invoked the entry point, however a few issues remained preventing its use it in an implant process. SafetyKatz uses a ‘slightly modified’ version of Mimikatz as the target PE, critically to not require arguments or exit the process upon completion.

The first step then was to re-use as much of this work as possible and rewrite it to suit our needs – no need to reinvent the wheel when a lot of great work was already done. The modified loader manually maps the target PE into memory, performs any fixups and then loads any dependency DLLs that are not already loaded. The Import Address Table for the PE is patched with the locations of all the libraries once they are loaded, mimicking the real Windows loader.

Patching Arguments

In a Windows process a pointer to the command line arguments is located in the Process Environment Block (PEB) and can be retrieved directly or, more commonly, using the Windows API call GetCommandLine. Similarly, the current image name is also stored in the PEB. With RunPE, the command line and image name are backed-up for when we reset during the clean-up phase and then replaced with the new values for the target PE.

Z:\Downloads\Whitepaper\Export-e0735b6d-feef-40ce-bcc9-8ce00c5523bc\Process Hiving 64777627280b48d586409f800840b2d6\Untitled 8.png

Preventing Process Exit

Another issue with running vanilla PEs in this way is that when they finish executing the PE inevitably tries to exit the process, such as by calling TerminateProcess.

Similarly, as the RunPE process is .NET, the CLR also tries to shut down once process termination is initiated, so even if TerminateProcess is prevented CorExitProcess will cause any .NET implant to exit.

To circumvent this a number of these API calls are patched to instead jmp to ExitThread. As the entry point of the target PE is to be run in a new thread this means that once it has finished it will gracefully exit the thread only, leaving the process and CLR instead.

These API calls are patched with bytes that use Return Oriented Programming (ROP) to instead call ExitThread, passing an exit code of 0.

Z:\Downloads\Whitepaper\Export-e0735b6d-feef-40ce-bcc9-8ce00c5523bc\Process Hiving 64777627280b48d586409f800840b2d6\Untitled 12.png

An example of this patch if the ExitThread function was located at 0x1337133713371337 is below:

0: 48 c7 c1 00 00 00 00 mov rcx, 0x0 // Move 0 into rcx for exit code argument
7: 48 b8 37 13 37 13 37 movabs rax, 0x1337133713371337 // Move address of ExitThread into rax
e: 13 37 13
11: 50 push rax // Push rax onto stack and ret, so this value with be the 'return address'
12: c3 ret

We can see this in x64dbg while RunPE is running, viewing the NtTerminateProcess function and noting it has been patched to exit the thread instead.

Fixing APIs

Several other API calls also required patching with new values in order for PEs to work. One example is GetModuleHandle which, if called with a NULL parameter, returns a handle to the base of the main module. When a PE calls this function it is expecting to receive its base address, however in this scenario the API call will in fact return the host process’ binary’s base address, which could cause the whole process to crash, depending on how that address is then used.

However, GetModuleHandle could also be called with a non-NULL value, in which case the base address of a different module will be returned.

GetModuleHandle is therefore hooked and execution jumps to a newly allocated area of memory that performs some simple logic; returning the base address of the mapped PE if the argument is NULL and rerouting back to the original GetModuleHandle function if not. As the first few bytes of GetModuleHandle get overwritten with a jump to our hook these instructions must be executed in the hook before jumping back to the GetModuleHandle function, return execution to after the hook jump.

As with the previous API patches, these bytes must be dynamically built-in order to provide the runtime addresses of the hook location, the GetModuleHandle function and the base address of the target PE.

Z:\Downloads\Whitepaper\Export-e0735b6d-feef-40ce-bcc9-8ce00c5523bc\Process Hiving 64777627280b48d586409f800840b2d6\Untitled 15.png

As an additional change the PEB is also updated, replacing the base address with that of the target PE so that if any programs retrieve this address from the PEB directly then they get the expected value.

At this point, the target PE should be in a position to be able to run from within the host process by calling the entry point of the PE directly. However, as the intended use case is to be able to use RunPE to execute PEs in memory from with an implant, it is a requirement to be able to capture output from the program.

Capturing Output

Output is captured from the target process by replacing the handles to STDOUT and STDERR with handles to anonymous pipes using SetStdHandle.

Z:\Downloads\Whitepaper\Export-e0735b6d-feef-40ce-bcc9-8ce00c5523bc\Process Hiving 64777627280b48d586409f800840b2d6\Untitled 18.png

Just before the target PE entry point is invoked on a new thread, an additional thread is first created that will read from these pipes until they are closed. In this way, the output is captured and can be returned from RunPE. The pipes are closed by RunPE after the target PE has finished executing, ensuring that all output is captured.

Clean Up

As Process Hiving includes running multiple processes from within one, long-running host process it is important that any execution of these ‘sub’ processes includes full and proper clean up. This serves two purposes:

  • To restore any changed state and functionality in order to ensure that the host process can continue to operate normally.
  • To remove any artefacts from memory that may cause an alert or artifact if detected through techniques such as in-memory scanning or aid an investigator in the event of a manual triage.

To achieve this, any code change made by RunPE is stored during execution and restored once execution is complete. This includes API hooks, changed values in memory, file descriptors, loaded modules and of course the mapped PE itself. In the case of any particularly sensitive values, such as the command line arguments and mapped PE, the memory region is first zeroed out before it is freed.

Z:\Downloads\Whitepaper\Export-e0735b6d-feef-40ce-bcc9-8ce00c5523bc\Process Hiving 64777627280b48d586409f800840b2d6\Untitled 20.png


An example of RunPE running unchanged and up-to-date Mimikatz is below, alongside Procmon process activity events for the process.

Z:\Downloads\Whitepaper\Export-e0735b6d-feef-40ce-bcc9-8ce00c5523bc\Process Hiving 64777627280b48d586409f800840b2d6\Untitled 21.png

Note that there are no sub-processes created, and Mimikatz runs successfully with the provided arguments.

Running a debug build provides more output and allows us to verify that the artefacts are being removed from memory and hooks removed, etc. We can see below that after the clean-up has occurred the ‘new’ DLLs loaded for Mimikatz have either already been cleaned up by Mimikatz itself (the error code 126) or are freed by RunPE and are now no longer visible in the Modules tab of Process Hacker.

Z:\Downloads\Whitepaper\Export-e0735b6d-feef-40ce-bcc9-8ce00c5523bc\Process Hiving 64777627280b48d586409f800840b2d6\Untitled 22.png

Similarly, the original code on the hooks such as NtTerminateProcess has been restored, which we can verify using a debugger such as x64dbg as below.

Z:\Downloads\Whitepaper\Export-e0735b6d-feef-40ce-bcc9-8ce00c5523bc\Process Hiving 64777627280b48d586409f800840b2d6\Untitled 23.png

As during Red Team operations Mimikatz.exe is unlikely to exist in the target environment, RunPE also supports loading of binaries from base64 blobs so that they can be passed with arguments down C2 channels. Long, triple dash switches are used in order to avoid conflicts with any arguments to the target PE.

Z:\Downloads\Whitepaper\Export-e0735b6d-feef-40ce-bcc9-8ce00c5523bc\Process Hiving 64777627280b48d586409f800840b2d6\Untitled 24.png

An example of this from a PoshC2 implant below demonstrates the original use case. The implant host process of netsh.exe loads and invokes the RunPE .NET assembly which in turn loads and runs net.exe in the host process with arguments. In this case net.exe is passed as a base64 blob down C2.

Z:\Downloads\Whitepaper\Export-e0735b6d-feef-40ce-bcc9-8ce00c5523bc\Process Hiving 64777627280b48d586409f800840b2d6\Untitled 25.png

Known Issues & Further Work

There are a number of known issues and caveats with this work in its current state which are detailed below.

  • RunPE only supports x64 bit native Windows PE files.
  • During testing any modern PE compiled by the testers has worked without issues, however issues remain with a number of older Windows binaries such as ipconfig.exe and icacls.exe. Further research is presently ongoing into what specific characteristics of these files cause issues.
  • If the target PE spawns sub-processes itself then those are not subject to Process Hiving and will be performed in the normal fashion. It is up to the operator to understand what the behaviour of the target PE is any other considerations that should be made.
  • RunPE presently calls the entry point of the target PE on a new thread and waits for that thread to finish, with a timeout. If the timeout is reached or if the target PE manipulates that thread, this is undefined behaviour.
  • PEs compiled without ASLR support do not work currently, such as by mingw.

Additionally, further work can be made on RunPE to improve the stealth of the Process Hiving technique:

  • Dependencies of the target PE can be mapped into memory using the same PE loader as the target PE itself and not using the standard Windows Loader. This would bypass detections on API calls such as LoadLibrary and GetProcAddress as well as any hooks placed in those modules by defensive software.
  • For any native API calls that remain, the use of syscalls directly can be explored to achieve the same ends for the same reasons as described above.


For Blue Team members, the best way to prevent this technique is to prevent the attacker from reaching this stage in the kill chain. Delivery and initial execution for example likely provide more options for detecting an attack than process self-manipulation. However, a number of the actions taken by RunPE can be explored as detections.

  • SetStdHandle is called six times per RunPE call, once to set STDOUT, STDERR and STDIN to handles to anonymous pipes and then again to reset them. A cursory monitor of a number and range of processes on the author’s own machine did not show any invocations of this API call as part of standard use, so this activity could potentially be used to detect RunPE.
  • A number of APIs are hooked or modified and then restored as part of every RunPE run such as GetCommandLine, NtTerminateProcess, CorExitProcess, RtlExitUserProcess, GetModuleHandle and TerminateProcess. Continued modification of these Windows API calls in memory is not likely to be common behaviour and a potential avenue to detection.
  • Similarly, the PEB is also continually modified as the command line string and image name are updated with every invocation of RunPE.
  • While the source code can be obfuscated, any attempt to load the default RunPE assembly into a .NET process provides a strong opportunity for detection.


At its core, Process Hiving is a fairly simple process. A PE is manually mapped into memory using existing techniques and a number of changes are made to API calls and the environment so that when the entry point of that PE is invoked it runs in the expected way.

We hope that this technique and the tool that implements it will allow Red Teams to be able to quickly and easily run native binaries from their implant processes without having to deal with many of the pain points that plague similar techniques that already exist.

The source code for RunPE is available at and any further work on the tool can be found there. Contributions and collaboration are also welcome.

Process Hiving Cover 2

Download our whitepaper and tool

This blog is a condensed version of a whitepaper we’ve released, called “Process Hiving”.  It comes with a new tool too, “RunPE”.  You can download these at the links below.


Our process hiving whitepaper can be downloaded here.


RunPE, our accompanying tool, can be downloaded from GitHub.

The post Introducing Process Hiving & RunPE appeared first on Nettitude Labs.
