SolomonSklash.io

SeasideBishop: A C port of the UrbanBishop shellcode injector

3 September 2020 at 15:20

SeasideBishop: A C port of b33f’s UrbanBishop shellcode injector

Introduction

This post covers a recent C port I wrote of b33f’s neat C# shellcode loader UrbanBishop. The prolific Rastamouse also did a variation of UrbanBishop, using D/Invoke, called RuralBishop. This injection method has some quirks I hadn’t seen done before, so I thought it would be interesting to port it to C.

Credit of course goes to b33f, Rastamouse as well, and special thanks to AsaurusRex and Adamant for their help in getting it working.

The code for this post is available here.

The Code

First, a quick outline of the injection method, and then I will break it down API by API. SeasideBishop creates a section and maps a view of it locally, opens a handle to a remote process, maps a view of that same section into the process, and copies shellcode into the local view. As a view of the same section is also mapped in the remote process, the shellcode has now been allocated across processes. Next a remote thread is created and an APC is queued on it. The thread is alerted and the shellcode runs.

Opening The Remote Process

[Image: get-pid]

Above we see the use of the native API NtOpenProcess to acquire a handle to the remote process. Native API calls are used throughout SeasideBishop as they tend to be a bit more stealthy than Win32 APIs, though they are still vulnerable to userland hooking.

Sections

A neat feature of this technique is the way that the shellcode is allocated in the remote process. Instead of using a more common and suspicious API like WriteProcessMemory, which is well known to AV/EDR products, SeasideBishop takes advantage of memory mapped files. This is a way of copying some or all of a file into memory and operating on it there, rather than manipulating it directly on disk. Another way of using it, which we will do here, is as an inter-process communication (IPC) mechanism. The memory mapped file does not actually need to be an ordinary file on disk. It can be simply a region of memory backed by the system page file. This way two processes can map the same region in their own address space, and any changes are immediately accessible to the other.

The way a region of memory is mapped is by calling the native API NtCreateSection. As the name indicates, a section, or section object, is the term for the memory mapped region.

[Image: create-section]

Above is the call to NtCreateSection within the local process. We create a section with a size of 0x1000, or 4096 bytes. This is enough to hold our demo shellcode, but might need to be increased to accommodate a larger payload. Note that the allocation will be rounded up to a multiple of the page size, which is normally 4 KB.
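The page rounding is easy to see with a little arithmetic. This is only a minimal sketch: the helper name round_up_to_page is made up for illustration, and the 4096-byte page size used in the usage note is an assumption (on a real system you would query the page size at runtime).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round a requested section size up to a whole
 * number of pages. page_size must be a non-zero power of two. */
size_t round_up_to_page(size_t size, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (size + page_size - 1) & ~(page_size - 1);
}
```

With a 4096-byte page, a request of 0x1000 bytes stays at one page, while a request of 0x1001 bytes grows to two pages (0x2000 bytes).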

The next step is to create a view of the section. The section object is not directly manipulated, as it represents the file-backed region of memory. We create a view of the section and make changes to that view. The remote process can also map a view using the same section handle, thereby accessing the same section. This is what allows IPC to happen.

[Image: local-map-section]

Here we see the call to NtMapViewOfSection to create the view in the local process. Notice the use of RW and not RWX permissions, as we simply need to write the shellcode to the view.

[Image: memcpy]

Next a simple memcpy writes our shellcode to the view.

[Image: remote-map-section]

Finally we map a view of the same section in the remote process. Note that this time we use RX permissions so that the shellcode is executable. Now we have our shellcode present in the remote process’s memory, without using APIs like WriteProcessMemory. Now let’s work on executing it.

Starting From The End

In order to execute our shellcode in the remote process, we need a thread. In order to create one, we need to give the thread a function or address to begin executing from. Though we are not using Win32 APIs, the documentation for CreateRemoteThreadEx still applies. We need a “pointer to [an] application-defined function of type LPTHREAD_START_ROUTINE to be executed by the thread and [serve as] the starting address of the thread in the remote process. The function must exist in the remote process.” The function we will use is RtlExitUserThread. This is not a very well documented function, but debugging indicates that this function is part of the thread termination process. So if we tell our thread to begin executing at this function, we are guaranteed that the thread will exit gracefully. That’s always a good thing when injecting into remote processes.

So now that we know the thread will exit, how do we get it to execute our code? We’ll get there soon, but first we need to get the address of RtlExitUserThread so that we can use it as the start address of our new remote thread.

[Image: function-address]

There’s a lot going on here, but it’s really pretty simple. RtlExitUserThread is exported by ntdll.dll, so we need the DLL base address first before we can access its exports. We create the Unicode string needed by the LdrGetDllHandle native API call and then call it to get the address of ntdll.dll. With that done, we need to create the ANSI string required by LdrGetProcedureAddress to get the address of the RtlExitUserThread function. Again, notice no suspicious calls to LoadLibrary or GetProcAddress here.

Creating The Thread

Now that we have our thread start address, we can create it in the remote process.

[Image: create-remote-thread]

Here we have the call to NtCreateThreadEx that creates the thread in the target process. Note the use of the pRemoteFunction variable, which contains the start address of RtlExitUserThread. Note also that the true argument above is a Boolean value for the CreateSuspended parameter, which means that the thread will be created in a suspended state and will not immediately begin executing. This will give us time to tell it about the shellcode we’d like it to run.

Execution

We’re in the home stretch now. The shellcode is in the remote process and we have a thread ready to execute it. We just need to connect the two together. To do that, we will queue an Asynchronous Procedure Call (APC) on the remote thread. APCs are a way of asynchronously letting a thread know that we have work for it to do. Each thread maintains an APC queue. When the thread is next scheduled, it will check that queue and run any APCs that are waiting for it, and then continue with its normal work. In our case, that work will be to run the RtlExitUserThread function and therefore exit gracefully.
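To make the ordering concrete, here is a toy model of it in plain C. It is not Windows code and calls no Windows APIs: toy_thread, toy_queue_apc, toy_alert_resume and the two example routines are all names I made up for illustration. It only shows that queued APC routines run first and the thread's start routine (standing in for RtlExitUserThread) runs afterwards.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_APCS 8

typedef void (*routine_fn)(void *ctx);

/* Small log so the order in which routines run is visible. */
typedef struct { char buf[16]; size_t len; } toy_log;

/* Toy stand-in for a suspended thread: a start routine plus a FIFO of
 * pending APC routines. This is not a real Windows structure. */
typedef struct {
    routine_fn start;              /* plays the role of RtlExitUserThread */
    routine_fn apcs[MAX_APCS];     /* pending APC routines, oldest first  */
    void      *apc_ctx[MAX_APCS];
    size_t     apc_count;
} toy_thread;

/* Queue an APC; returns 0 on success, -1 if the queue is full. */
int toy_queue_apc(toy_thread *t, routine_fn fn, void *ctx)
{
    assert(t != NULL && fn != NULL);
    if (t->apc_count == MAX_APCS)
        return -1;
    t->apcs[t->apc_count] = fn;
    t->apc_ctx[t->apc_count] = ctx;
    t->apc_count++;
    return 0;
}

/* Alert and resume: drain the APC queue in order, then run the start
 * routine, which in the real technique simply exits the thread. */
void toy_alert_resume(toy_thread *t, void *start_ctx)
{
    assert(t != NULL && t->start != NULL);
    for (size_t i = 0; i < t->apc_count; i++)
        t->apcs[i](t->apc_ctx[i]);
    t->apc_count = 0;
    t->start(start_ctx);
}

/* Example routines: each appends one marker character to the log. */
void payload_routine(void *ctx) { toy_log *l = ctx; l->buf[l->len++] = 'P'; }
void exit_routine(void *ctx)    { toy_log *l = ctx; l->buf[l->len++] = 'X'; }
```

Queueing payload_routine on a toy_thread whose start routine is exit_routine and then calling toy_alert_resume leaves the log holding 'P' followed by 'X': the queued work runs, then the thread exits.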

[Image: queue-apc]

Here we see how the thread and our shellcode meet. We use NtQueueApcThread to queue an APC onto the remote thread, using lpRemoteSection to point to the view containing the shellcode we mapped into the remote process earlier. Once the thread is alerted, it will check its APC queue and see our APC waiting for it.

[Image: alert-thread]

A quick call to NtAlertResumeThread and the thread is alerted and runs our shellcode. Which of course pops the obligatory calc.

[Image: calc]

Conclusion

I thought this was a neat injection method, with some quirks I hadn’t seen before, and I enjoyed porting it over to C and learning the concepts behind it in more detail. Hopefully others will find this useful as well.

Thanks again to b33f, Rasta, Adamant, and AsaurusRex for their help!

Smaller C Payloads on Windows

25 September 2020 at 15:20

Introduction

Many thanks to 0xPat for his post on malware development, which is where the inspiration for this post came from.

When writing payloads for use in penetration tests or red team engagements, smaller is generally better. No matter the language you use, there is always a certain amount of overhead required for running a binary, and in the case of C, this is the C runtime library, or CRT. The C runtime is “a set of low-level routines used by a compiler to invoke some of the behaviors of a runtime environment, by inserting calls to the runtime library into compiled executable binary. The runtime environment implements the execution model, built-in functions, and other fundamental behaviors of a programming language”. On Windows, this means the various *crt.lib libraries that are linked against when you use C/C++. You might be familiar with the common compiler flags /MT and /MTd, which statically link the C runtime into the final binary. This is commonly done when you don’t want to rely on using the versioned Visual C++ runtime that ships with the version of Visual Studio you happen to be using, as the target machine may not have this exact version. In that case you would need to include the Visual C++ Redistributable or somehow have the end user install it. Clearly this is not an ideal situation for pentesters and red teamers. The alternative is to statically link the C runtime when you build your payload file, which works well and does not rely on preexisting redistributables, but unfortunately increases the size of the binary.

How can we get around this?

Introducing msvcrt.dll

msvcrt.dll is a copy of the C runtime which is included in every version of Windows from Windows 95 on. It is present even on a fresh install of Windows that does not have any additional Visual C++ redistributables installed. This makes it an ideal candidate to use for our payload. The trick is how to reference it. 0xPat points to a StackOverflow answer that describes this process in rather general terms, but without some tinkering it is not immediately obvious how to get it working. This post is aimed at saving others some time figuring this part out (shout out to AsaurusRex!).

Creating msvcrt.lib

The idea is to find all the functions that msvcrt.dll exports and add them to a library file so the linker can reference them. The process flow is to dump the exports into a file with dumpbin.exe, then parse the results into .def format, which can then be converted into a library file with lib.exe. I have created a GitHub gist here that contains the commands to do this. I use Windows for dumping the exports and creating the .lib file, and Linux to do some text processing to create the .def file. I won’t go over the steps in detail here as they are well commented in the gist.

Some Caveats

It is important to note that using msvcrt.dll is not a perfect replacement for the full C runtime. It will provide you with the C standard library functions, but not the full set of features that the runtime normally provides. This includes things like initializing everything before calling the usual main function, handling command line arguments, and probably a lot of other stuff I have not yet run into. So depending on how many features of the runtime you use, this may or may not be a problem. C++ will likely have more issues than pure C, as many C++ features involving classes and constructors are handled by the runtime, especially during program initialization.

Using msvcrt.lib

Using msvcrt.lib is fairly straightforward, as long as you know the proper compiler and linker incantations. The first step is to define _NO_CRT_STDIO_INLINE at the top of your source files. This presumably stops the CRT headers from defining the stdio functions inline, so that calls to them resolve against the library instead, though I’ve not seen this explicitly documented by Microsoft anywhere. I have noticed that this definition alone is not enough. There are several compiler and linker flags that need to be set as well. I will list them here in the context of C/C++ Visual Studio project settings, as well as providing the command line argument equivalents.

Visual Studio Project Settings

  • Linker settings:
    • Advanced -> Entrypoint -> something other than main/wmain/WinMain etc.
    • Input -> Ignore All Default Libraries -> YES
    • Input -> Additional Dependencies -> add the custom msvcrt.lib path, kernel32.lib, and any other libraries you may need, like ntdll.lib
  • Compiler settings:
    • Code Generation -> Runtime Library -> /MT
    • Code Generation -> /GS- (off)
    • Advanced -> Compile As -> /TC (only if you’re using C and not C++)
    • All Options -> Basic Runtime Checks -> Default

cl.exe Settings

cl.exe /MT /GS- /Tc myfile.c /link C:\path\to\msvcrt.lib "kernel32.lib" "ntdll.lib" /ENTRY:"YourEntrypointFunction" /NODEFAULTLIB

Some notes on these settings. You must have an entrypoint that is not named one of the standard Windows C/C++ function names, like main or WinMain. These are used by the C runtime, and as the full C runtime is not included, they cannot be used. Likewise, runtime buffer overflow checks (/GS) and other runtime checks are part of the C library and so not available to us.

If you plan on using command line arguments, you can still do so, but you’ll need to use CommandLineToArgvW and link against Shell32.lib.

Conclusion

Using this method I’ve seen a size reduction of 8x-12x in the resulting binary. I hope this post can serve as helpful documentation for others trying to get this working. Feel free to contact me if you have any issues or questions, and especially if you have any improvements or better ways of accomplishing this.

A Review of the Sektor7 RED TEAM Operator: Malware Development Intermediate Course

30 October 2020 at 15:20

Introduction

I recently completed the newest Sektor7 course, RTO: Malware Development Intermediate. This course is a followup to the Essentials course, which I previously reviewed here. If you read my Essentials review, you know that I am a big fan of Sektor7 courses, having completed all 4 that they offer. This course is easily as good as the others, and probably my favorite one yet.

Course Overview

This course builds on the material in the Essentials course, covering more advanced topics and a wider range of techniques. You don’t need to have taken the Essentials course, but it will be easier if you have taken it or already have background knowledge in C, Windows APIs, code injection, DLLs, etc. This is not a beginner course, but the code is well commented and the videos explain the concepts and what the code is doing very well, so with some effort you’ll probably be OK.

Here’s what the Intermediate covers, according to the course page:

- playing with Process Environment Blocks and implementing our own function address resolution
- more advanced code injection techniques
- understanding how reflective binaries work and building custom reflective DLLs, either with source or binary only
- in-memory hooking, capturing execution flow to block, monitor or evade functions of interest
- grasping 32- and 64-bit processing and performing migrations between x86 and x64 processes
- discussing inter process communication and how to control execution of multiple payloads

Module 1: PE Madness

The course begins like Essentials, with a link to the code and a custom VM with all the tools you’ll need. It then does a deep dive into various aspects of the PE format. It’s not comprehensive, which is not surprising if you’ve ever parsed PE headers, but it covers the relevant parts quite well, and in a visual, easy to grasp way thanks to PE-Bear. The main takeaway is understanding the PE header well enough to dynamically resolve function addresses, a technique that is used later to reduce function imports, and to perform reflective DLL injection. I already have some experience with PE parsing, but it’s a foundational topic and seeing someone else’s approach and explanations is always helpful. Getting a deeper dive into PE-Bear’s capabilities is a nice bonus as well.

Next this module covers a custom implementation of GetModuleHandle and GetProcAddress, leveraging the previously covered PE parsing knowledge. This is done to reduce the number of suspicious API calls in the import table, and a code sample is provided to create a PE file with no import table at all.

Module 2: Code Injection

The second module covers five different shellcode injection techniques. The first is the classic OpenProcess -> VirtualAllocEx -> WriteProcessMemory -> CreateRemoteThread chain, with some variations thrown in when creating the remote thread. Next is a common thread hijacking technique using Get/SetThreadContext. Third is a different memory allocation method that takes advantage of mapping section views in the local and remote processes, à la Urban/SeasideBishop. The last two methods make use of Asynchronous Procedure Calls (APCs), covering both the standard OpenProcess/QueueUserAPC method and the “Early Bird” suspended process method. All five methods use AES encrypted shellcode and obfuscate some function calls using the custom GetModuleHandle and GetProcAddress implementations from Module 1.

Module 3: Reflective DLLs

This section was one of my favorites, as I didn’t have a lot of experience with reflective DLL injection. It covers the classic Stephen Fewer ReflectiveLoader method to create a self-loading DLL and inject it into a remote process without needing to touch disk. The previously covered PE parsing is essential for understanding this part. Also covered is shellcode Reflective DLL Injection (sRDI), by monoxgas, which allows you to convert a compiled DLL into shellcode and inject it. These two techniques are combined to enable the reflective injection of DLLs that you may not have the source for or do not have a reflective loader function exported.

Module 4: x86 vs x64

This was my other favorite section, as it was another area I didn’t have a ton of prior experience with. It covers Windows on Windows 64 (WOW64), how 32-bit processes run on modern x64 Windows, and the ins and outs of injecting between x86 and x64 processes. It touches on Heaven’s Gate, not in extreme detail, as it’s a pretty deep topic that needs a fair bit of background to fully grasp, but enough to get the gist and to be able to make use of it in practice. Templates are provided that use shellcode from Metasploit (also written by Stephen Fewer) to transition from x86 to x64 and inject from a 32-bit process into a 64-bit process, courtesy of some of the code injection techniques from Module 2.

Module 5: Hooking

Module 5 covers three different methods of performing function hooking. It starts with the well-known Microsoft Detours library, which makes hooking different functions a breeze. The second method is Import Address Table (IAT) hooking, which again uses PE parsing knowledge from earlier in the course. The last method was my favorite, as I’d not played with it in the past. It involves inline patching of function addresses. All three methods come with the usual well-commented code samples for easy modification.

Module 6: Payload Control via IPC

This was a short but sweet section on controlling the number of concurrently running payloads. The idea is to check if your implant is already running on a target machine, and bail out if it is. This is useful in persistence scenarios where your chosen persistence method may trigger your payload or shell multiple times. It works by creating a uniquely named instance of one of four different synchronization primitives: mutexes, events, semaphores, or named pipes. A check is performed when initializing the implant, and if one of the uniquely named objects exists, then another instance of the implant must already be running, so the current one exits. Malware is also known to use this trick, as well as legitimate applications that don’t allow multiple concurrent running instances.

Module 7: Combined Project

The combined project makes use of most of the previous modules and puts them together to emulate a somewhat real-world scenario: stealing the password of a VeraCrypt volume via API hooking. I really liked this section as it applied the learned concepts in a cohesive way into a project that you could conceivably see in real life. Some additional requirements are in place, namely needing to inject reflective DLLs and cross from x86 to x64. There are also some suggested followup assignments to expand and improve upon the final project and make it stealthier.

Takeaways

I really liked this course a lot, as I expected to from my past experience with Sektor7 courses. I already had experience with most of the topics, but the clarity with which the topics are presented and the supplied working code samples really solidified what I already knew and taught me a fair bit that I didn’t. I can’t stress enough how helpful the code samples are, as I find the best way for me to learn a new technique is to have a very simple working version to wrap my head around, and then slowly start to add features or make it more advanced. The samples in this course, as well as the other Sektor7 courses, do this very well. They aren’t the most cutting edge, and they won’t beat AV out of the box, but that’s not their purpose. They are teaching tools, and excellent ones at that.

I want to talk a bit about how to get the most value out of this and other Sektor7 courses. It might be easy to just watch the videos, compile the examples, and think “huh, that’s it?”. The real value is having clear explanations of code samples, and then taking that code, playing with it, and making your own. In the Essentials course, I looked up every API call and function I wasn’t familiar with on MSDN and added it as a comment in the code. I made sure I understood exactly what each line did and why. I was familiar with all of the APIs in this course, but even before beginning the videos for a module, I read the code and tried to already have an understanding of it before it was explained. These courses have a lot to give you, as long as you put in your share of effort with the code.

Conclusion

As I’ve said already, I really liked this course. I picked up new knowledge and skills that I can immediately use at work, I have solid code samples to build from, and it didn’t break the bank. You can’t ask for much more from an offensive security course. Props again to reenz0h and the Sektor7 crew. I’m really hoping there will be an advanced course and beyond in the future.

A Review of the Sektor7 RED TEAM Operator: Windows Evasion Course

3 May 2021 at 15:20

Introduction

Another Sektor7 course, another review! This time it’s the RED TEAM Operator: Windows Evasion Course. You can catch my previous reviews of the RTO: Malware Development Essentials and RTO: Malware Development Intermediate courses as well.

Course Overview

This course, like the previous ones, builds on the knowledge gained in the previous courses. You don’t need to have taken the others if you already have a background in malware development, C++, assembly, and debugging, but if you haven’t, this will very likely be too advanced. The Essentials course might be much more your speed.

Here’s what Windows Evasion covers, according to the course page:

- How a modern detection looks like
- How to get rid of process' internal operations monitoring
- How to make your payload look benign in memory
- How to break process parent-child relation
- How to disrupt EPP/EDR logging
- What is Sysmon and how to bypass it

The course is split into 3 main sections: essentials, non-privileged user vector, and high-privileged user vector. I’ll cover each one, and then provide some thoughts on the course as a whole and the value it provides.

Section 1: Essentials

The course begins as usual, with links to the code and a custom VM with all the tools you’ll need. The first lesson goes into detail on how modern EDR detection works, covering the different user-mode and kernel-mode components, static analysis, DLL injection, hooking, kernel callbacks, logging, and machine learning. This is as good an overview of the end to end setup of EDRs as I’ve seen. It lays the foundation for the subsequent topics in a nice logical way. It also covers the differences between EDRs and AV, how Sysmon fits in, and how the line between AV and EDRs is becoming more blurred.

Next in essentials, the focus is on defeating various static analysis techniques, specifically entropy, image file details, and code signing. The idea is to make your malicious binary as similar to known-good binaries as possible, with special attention paid to the elements that are commonly flagged by static analysis. None of this is ground-breaking or totally novel, but it does drive home the idea that details matter, and they can add up to the difference between successfully achieving execution on a target and being caught.

Section 2: Non-Privileged User Vector

Un/Hooking

The second section covers a range of techniques that can be performed without needing elevated privileges. It begins with an explanation and demonstration via debugger of system call hooking, as performed by the main AV/EDR stand-in for the course, Bitdefender. Bitdefender is a good option here, as a trial license is freely available, and it does more EDR-like things than a normal AV, like hooking.

Next, several different methods of defeating user-mode hooking are demonstrated, beginning with the classic overwriting of the .text section of ntdll.dll, which I’ve also covered here. The main disadvantage of this method is the need to map an additional new copy of ntdll.dll into the process address space, which is rather unusual from an AV/EDR perspective.

One alternative to this is to use Hell’s Gate, by Am0nsec and Smelly. This method uses some clever assembly to dynamically resolve the syscall number of a given function from the local copy of ntdll.dll and execute it. However this method has some drawbacks as well, mainly the fact that it will fail if the function to be resolved has already been hooked.

Reenz0h has a neat new modification (new to me at least!) to Hell’s Gate that gets around this problem, which he calls Halo’s Gate. It takes advantage of the fact that the system calls contained within ntdll.dll are sorted in numerically ascending order. The trick is to identify that a syscall has been hooked by checking for the jmp opcode (0xE9), and then traversing ntdll.dll both ahead of and behind the target syscall. If an unhooked syscall is found 8 functions after the target, and its value is 0xFD, then subtracting 8 from 0xFD gives 0xF5, which is our target syscall number. The same applies for a syscall before the target function, except that the distance is added rather than subtracted. As no EDR hooks every syscall, eventually a clean one will be found and the target syscall number can be successfully calculated. This property of ordered syscall numbers in ntdll.dll is exploited to great effect in SysWhispers2. It was originally documented by the prolific modexp in a blog post here.
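The neighbour arithmetic can be sketched on a simulated table of stubs. This is a minimal sketch under stated assumptions, not reenz0h's implementation: it works on a plain byte buffer rather than a live ntdll.dll, it assumes each x64 stub is 32 bytes long and that a clean one begins with mov r10, rcx; mov eax, <number>, and the names read_clean_stub, halos_gate and write_stub are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STUB_SIZE 32  /* assumed spacing between adjacent syscall stubs */

/* A clean stub is assumed to start with:
 *   4C 8B D1        mov r10, rcx
 *   B8 nn nn 00 00  mov eax, <syscall number>
 * Returns the syscall number, or -1 if the prologue does not match,
 * which includes a stub whose first byte was replaced by jmp (0xE9). */
int read_clean_stub(const uint8_t *stub)
{
    if (stub[0] == 0x4C && stub[1] == 0x8B && stub[2] == 0xD1 &&
        stub[3] == 0xB8 && stub[6] == 0x00 && stub[7] == 0x00)
        return stub[4] | (stub[5] << 8);
    return -1;
}

/* Resolve the syscall number of stub `target` in a table of `count`
 * contiguous stubs. If the target is hooked, walk outwards to its
 * neighbours and correct by the distance. Returns -1 if none is clean. */
int halos_gate(const uint8_t *table, size_t count, size_t target)
{
    assert(table != NULL && target < count);
    int n = read_clean_stub(table + target * STUB_SIZE);
    if (n >= 0)
        return n;

    for (size_t d = 1; d < count; d++) {
        if (target + d < count) {            /* neighbour after the target */
            n = read_clean_stub(table + (target + d) * STUB_SIZE);
            if (n >= 0)
                return n - (int)d;
        }
        if (d <= target) {                   /* neighbour before the target */
            n = read_clean_stub(table + (target - d) * STUB_SIZE);
            if (n >= 0)
                return n + (int)d;
        }
    }
    return -1;
}

/* Demo helper: write a clean stub, or a hooked one whose first five
 * bytes are a jmp rel32 that clobbers the original prologue. */
void write_stub(uint8_t *stub, uint16_t number, int hooked)
{
    static const uint8_t clean[4] = {0x4C, 0x8B, 0xD1, 0xB8};
    for (size_t i = 0; i < 4; i++)
        stub[i] = clean[i];
    stub[4] = (uint8_t)(number & 0xFF);
    stub[5] = (uint8_t)(number >> 8);
    stub[6] = 0x00;
    stub[7] = 0x00;
    if (hooked) {
        stub[0] = 0xE9;                      /* jmp rel32 */
        stub[1] = 0x11; stub[2] = 0x22; stub[3] = 0x33; stub[4] = 0x44;
    }
}
```

Filling a 16-entry table with numbers 0xF0 upwards, hooking entries 0 through 12 and asking for entry 5 reproduces the example from the text: the first clean neighbour is 8 stubs later with value 0xFD, so the result is 0xF5.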

The last method of unhooking is a twist on the first, named, and I quote, “Perun’s Fart”. The goal is to get a clean copy of ntdll.dll without mapping it into our process again. It turns out that if a process is created in a suspended state, ntdll.dll is mapped by the Windows loader as part of the normal new process creation flow, but EDR hooks are not applied, since the main thread has not yet begun execution. So we can steal its copy of ntdll.dll and overwrite our local hooked version. Obviously this is a trade off, as this method will create a new process and involve cross-process memory reads. Still, it’s good to have multiple options when it comes to unhooking.

ETW Bypass

Next up is coverage of Event Tracing for Windows (ETW), how it can rat you out to AV/EDR, and how to blind it in your local process. ETW is especially relevant when executing .NET assemblies, such as in Cobalt Strike’s execute-assembly, as it can inform defenders of the exact assembly name and methods executed. The solution in this case is simple: Patch the EtwEventWrite function to return early with 0 in the RAX register. Any time the process tries to send an ETW event, the call appears to succeed without actually sending the message. Sweet and simple.

Avoiding IOCs

The last few videos of Section 2 cover different methods of hiding some specific indicators that can reveal the presence of malicious activity. First is module stomping. This is a way of executing shellcode from within a loaded DLL, avoiding the telltale sign of memory allocations within the process that are not backed by files on disk. A DLL that the host process does not use is loaded, then partially hollowed out and replaced with shellcode. Since the original DLL is properly loaded, no indication of injected shellcode is present.

Lastly this section covers hiding parent-child process ID relationships. The usual method is covered for PPID spoofing, using UpdateProcThreadAttribute to set the PPID to an arbitrary parent process. However two other methods I’d not encountered were covered as well. First, it turns out that processes created by the Windows task scheduler become a child of the task scheduler svchost.exe process, and code is provided to use the Win32 API to execute a payload this way. The other method is one used by Emotet, which uses COM to programmatically run WMI and create a new process. The parent in this case is the WmiPrvSE.exe process.

Section 3: High-Privileged User Vector

This section covers a variety of techniques that are available in high-privilege contexts. The focus is on Windows Eventlog, interrupting AV/EDR network communication, and Sysmon.

Eventlog

One video covers a method of hiding your activities from the Windows Eventlog. The idea is that the service responsible for Eventlog, Windows Event Log, has several threads that handle the processing of event log messages. By suspending these threads, the service continues to run, but does not process any events, thus hiding our activity. One caveat is that if the threads are resumed, all events that were missed in the interim will be processed, unless the machine is rebooted.

AV/EDR Network Communication

The next section looks at severing the connection between AV/EDR and its remote monitoring/logging server. This is done in two primary ways: adding Windows Firewall rules, and sink-holing traffic via the routing table. These two are pretty self-explanatory, but the real value here is the code samples provided for doing this in C/C++. The infamous and terrible COM is used in several places, and provides a good working example of COM programming. Creating routing table entries is actually a simple Win32 API call away.

Sysmon

The final section of the course covers identifying and neutralizing Sysmon. Sysmon is an excellent tool and frequently the backbone of many AV/EDR collection strategies, so identifying its presence and disabling its capabilities can go a long way in hiding your activities.

One problem for attackers is that Sysmon by design can be concealed in various ways. The name of the user-mode process, the minifilter driver name, and its altitude can all be modified to hide Sysmon’s presence. However there are enough reliable ways, like checking registry keys, to identify it. Code and commands are provided to find the registry keys and several techniques for shutting down Sysmon as well. One is to unload the minifilter driver. Another harks back to earlier in the course and shows how to patch our friend EtwEventWrite.

Takeaways

If you’ve read my other reviews of Sektor7 courses, you know what I’m going to say here. They are fantastic, and a fantastic value for the money as well, as most are around $200-250 USD. You can buy all 5 current courses for less than almost any other training out there, and 2573 times less than a single SANS course. You get lifetime access, and most importantly, the code samples. This to me is by far the single most valuable part of the course. Reenz0h is a great instructor with a wealth of knowledge and a great presentation style, but the true gift he gives you is a firm foundation of working code samples to build from and the context of how they are used. This course specifically covers basic COM programming in as understandable a way as COM can be covered, in my opinion. I’ve found that I learn best when I have some working code to tweak, play with, look up its functions on MSDN, and mold it until it does what I want. No, the samples are not production ready and undetectable in every case, but these courses give you the tools to make them that way and integrate them into your own tooling.

Conclusion

Props again to reenz0h and the Sektor7 crew. I’m glad they took a poll of their students and delivered a more advanced course. I get the feeling there is a ton more advanced material they could crank out, and I can’t wait for it.

On Disk, The Devil’s In The Details

23 July 2021 at 15:20

On Disk, The Devil’s In The Details

Introduction

During red team operations and penetration tests, there are occasions where you need to drop an executable to disk. It’s usually best to stay in memory and avoid this if possible, but there are plenty of situations where it’s unavoidable, like DLL sideloading. In these cases, you typically drop a custom malicious PE file of some sort. Being on disk instead of in memory opens you up to the world of AV static analysis and the set of challenges bypassing it presents. There are many resources on the net about avoiding AV signatures, say for example Metasploit shellcode, by using string obfuscation, encryption, XORs, pulling down staged payloads over the network, shrinking the import table, polymorphic encoding, etc. I’m going to assume you’ve done your due diligence and handled the big stuff. However there are some other more subtle indicators and heuristics AV can use to help spot a malicious binary when it is present on disk. These are what this post is all about.

Data About Data

When you compile a binary, whether it’s a DLL or an EXE, the compiler will automatically include a certain amount of metadata about the resulting binary, such as the compilation date and time, compiler vendor, debug files, file paths, etc. This “data about data” can reveal a lot about an executable, especially an executable never encountered by a given AV engine.

The AV engine’s job is to take files, inspect the metadata, apply heuristics, and determine the likelihood of each one being malicious. Clearly the more metadata and information we leave in our dropped binary, the more likely it is to be flagged. We are automatically at a disadvantage, since we are writing custom code that has never been seen by the AV engine and its file hash is unfamiliar. Compare that to a very commonly-seen file, like a Firefox installer MSI with a known hash and metadata, seen by many installations of the AV software across customer locations, and you can see how a custom compiled binary can stick out.

All is not lost, however. AV can’t simply declare every newly-seen file malicious, as all known-good files start off as unknown at some point. So the AV must use imperfect signatures, metadata, and heuristics to make a good vs. bad determination. We want to remove as many pieces of information that could push us towards a positive detection as we can.

Now will making these changes make your malicious payload FUD and guaranteed to slip through? Not at all. If you’re dropping unencrypted Cobalt Strike shellcode all over the place, you’re done. But as AV and EDR get better, it becomes more important to give them as little information as possible. And who wants to burn a perfectly crafted custom payload because you left some silly string in? It’s not a magic bullet, it’s not even an ordinary unmagical bullet, but every little bit helps.

Code Signing

One way developers can help signal to Windows and AV engines that their software is not malicious is by using code signing certs. These are (supposed to be) expensive and difficult to obtain X.509 certificates that can be used to cryptographically sign a compiled binary. The idea is that only the legitimate and properly vetted owner should have access to the private key, and must have legitimately signed the file, indicating that it is trustworthy. This gives AV a high fidelity way of identifying the author.

There are two problems with this approach though. The first is that private keys leak. Stuxnet famously used multiple stolen, valid code signing certs to sign its payloads and help avoid detection. Certificate private keys occasionally end up committed to Github as well. So a valid signature is never a 100% guarantee of non-maliciousness.

The other issue is that sometimes AV engines fail to check the validity of a certificate at all, instead simply checking to see if the file has been signed. Which means as long as we can sign our payload with any old self-signed cert, we would pass this particular check. Lucky for us, anyone can generate a code signing cert and use it to sign their malware. It’s free and easy to automate. This Stack Overflow post shows how to create one on Windows and how to use signtool to sign a binary. On Linux, you can use Limelighter to sign with an existing certificate, or download the cert from a website and use it as a code signing cert:

Limelighter

And the resulting self-signed binary:

Signed Binary

CarbonCopy is another good tool that can use website certificates to sign a file.

File Properties

Another piece of data, or rather lack of data, is the file properties of an executable. By default, this information is not included when you compile a binary. It looks like this:

Empty Details

It must be added via a resource file and compiled into the binary. This missing information is another, somewhat low fidelity, indicator that a file may not have been produced by a legitimate software vendor, and is therefore more likely to be malicious. Admittedly, this is probably not a huge red flag to most AV, but it’s easy enough to implement, so why not? The details add up.

Creating the resource file is not the most straightforward process. I found the easiest way was to let Visual Studio create it for me. You create a new item of the type resource, and then add a Version resource. Tweak it how you’d like, and then save the resulting Resource.rc file. I’ve created one and stripped out the extraneous lines for easy use here.

Here are two gists for creating the object file to include with your compilation sources: Windows and Linux. Thanks to Sektor7 for the Windows version.

Here is the result of including a resource file during compilation:

File properties

The Rich Header

The Rich header is an undocumented field within the PE header of Windows executables that were created by a Microsoft compiler. It captures some information about the compilation process, including the compiler and linker versions, number of files compiled, objects created, etc. It has been covered in some depth in several places, but a good recap and analysis is here.

Because this header encodes rather specific information about an executable, it provides a way of tracking it between systems. AV engines can use it to match up strains of malware, help with attribution, etc. Some threat actors are aware of this fact however, and try to use it to their advantage. The most well-known case of this was the OlympicDestroyer malware, which spoofed its Rich header to resemble the Lazarus group.

I don’t have code or specific recommendations here, mainly because what you might want to do with the Rich header will depend on what you want to achieve. It is worth knowing about, because it is an indicator that you can use, or have used against you. For instance, the GCC compiler doesn’t include the Rich header. If the environment you’re operating in is dominated by Windows machines, much of the software running was likely compiled by Visual Studio. Running a GCC or MinGW compiled binary isn’t enough on its own to get you caught, but it may make you stand out, which can often mean the same thing. So you may want to add a Rich header, or remove it, or change it to emulate an adversary, or do nothing at all with it. Just know that it exists, and be aware of what it might tell the opposition about your file. Knowledge is power after all.

If you would like to at least remove the Rich header, peupdate can handle that for you. Another option would be one of the PE parsing Python libraries.

Here is a breakdown of the Rich header, courtesy of the wonderful PE-bear. Note the references to masm and the Visual Studio version used.

PE-bear
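To make the layout a little more concrete, here is a minimal sketch of locating the Rich header in a raw PE file buffer. It relies on the publicly reverse-engineered layout (the header is undocumented, so treat this as an assumption): a plain "Rich" marker followed by a 4-byte XOR key, preceded by a block that starts with "DanS" XORed with that key, three padding DWORDs, and then 8-byte (comp id, count) pairs. The names `rich_info` and `find_rich_header` are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define RICH_MAGIC 0x68636952u  /* "Rich" read as a little-endian DWORD */
#define DANS_MAGIC 0x536E6144u  /* "DanS" read as a little-endian DWORD */

typedef struct {
    size_t   start;    /* offset of the XOR-masked "DanS" marker */
    size_t   end;      /* offset of the plain "Rich" marker */
    uint32_t key;      /* XOR key stored right after "Rich" */
    size_t   entries;  /* number of (comp id, count) pairs */
} rich_info;

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return 0 and fill *out if a Rich header is found, otherwise -1. */
int find_rich_header(const uint8_t *buf, size_t len, rich_info *out)
{
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return -1;

    /* The Rich header sits between the DOS header and the PE signature. */
    uint32_t e_lfanew = read_le32(buf + 0x3C);
    if (e_lfanew > len)
        return -1;

    for (size_t off = 0x40; off + 8 <= e_lfanew; off += 4) {
        if (read_le32(buf + off) != RICH_MAGIC)
            continue;

        uint32_t key = read_le32(buf + off + 4);

        /* Walk backwards until the masked "DanS" start marker shows up. */
        for (size_t s = off; s >= 0x40 + 4; ) {
            s -= 4;
            if ((read_le32(buf + s) ^ key) == DANS_MAGIC) {
                if (off - s < 16)
                    return -1;
                out->start   = s;
                out->end     = off;
                out->key     = key;
                out->entries = (off - s - 16) / 8;
                return 0;
            }
        }
        return -1;
    }
    return -1;
}
```

Once you have `start` and `end`, what you do with the bytes in between (decode, rewrite, or zero them) is the context-dependent part discussed above.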

Compile Times

Another indicator AV can use to help determine maliciousness in a file is the compilation time. The idea is that most software will have been compiled some time in the past before it is used. A very recently compiled binary, say within the past day or even hour, could look very suspicious, especially running on Bob in HR’s machine, who probably isn’t doing any programming. Even a signed binary with no other obvious signs of being malicious, depending on the compile time, can look mighty strange. As always, context matters. If by chance you’ve breached the development network, new binaries are business as usual.

One complication with timestamps in a PE file is the sheer number of them. This post puts the number at 8, though some are not always included, or are simply for managing bound imports and are not full timestamps. A tool is included for viewing them, and tools like PEStudio are great for this as well. Two commonly modified timestamps are the TimeDateStamp of the COFF File Header, and the TimeDateStamp field of the debug directory:

PEStudio
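If you want to check the first of those yourself, here is a small sketch that pulls the COFF `TimeDateStamp` out of a PE file already read into memory. The offsets come from the PE format (`e_lfanew` at 0x3C, then the 4-byte signature, then `Machine`, `NumberOfSections`, and `TimeDateStamp`); the function name `pe_coff_timestamp` is just something I made up for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Return 0 and store the COFF TimeDateStamp (seconds since the Unix epoch)
 * in *stamp, or return -1 if the buffer does not look like a PE file.
 */
int pe_coff_timestamp(const uint8_t *buf, size_t len, uint32_t *stamp)
{
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return -1;

    uint32_t e_lfanew = read_le32(buf + 0x3C);   /* offset of "PE\0\0" */
    if (e_lfanew > len || len - e_lfanew < 4 + 20)
        return -1;
    if (memcmp(buf + e_lfanew, "PE\0\0", 4) != 0)
        return -1;

    /* COFF header: Machine (2), NumberOfSections (2), TimeDateStamp (4). */
    *stamp = read_le32(buf + e_lfanew + 4 + 4);
    return 0;
}
```

Comparing the returned value against the current time is a quick way to see how fresh a payload looks before it ever leaves your build machine.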

Like the Rich header, timestamps are not something that must be changed. They are just another piece of information to be aware of, something that can tell the blue team a story. You get to decide what story is appropriate, depending on the context of the engagement.

For an excellent deep dive into timestamps, I recommend this blog post.

Conclusion

The main theme of this post has been about knowing the little details of the malware you write, and the context in which you deploy that malware. Context matters, details add up, and they can make or break an engagement. I hope this list of subtleties will come in handy on your next engagement.

Stealing Tokens In Kernel Mode With A Malicious Driver

28 July 2021 at 15:20

Stealing Tokens In Kernel Mode With A Malicious Driver

Introduction

I’ve recently been working on expanding my knowledge of Windows kernel concepts and kernel mode programming. In the process, I wrote a malicious driver that could steal the token of one process and assign it to another. This article by the prolific and ever-informative spotless forms the basis of this post. In that article he walks through the structure of the _EPROCESS and _TOKEN kernel mode structures, and how to manipulate them to change the access token of a given process, all via WinDbg. It’s a great post and I highly recommend reading it before continuing on here.

The difference in this post is that I use C++ to write a Windows kernel mode driver from scratch and a user mode program that communicates with that driver. This program passes in two process IDs, one to steal the token from, and another to assign the stolen token to. All the code for this post is available here.

About Access Tokens

A common method of escalating privileges via buggy drivers or kernel mode exploits is to steal the access token of a SYSTEM process and assign it to a process of your choosing. However, this is commonly done with shellcode that is executed by the exploit. Some examples of this can be found in the wonderful HackSys Extreme Vulnerable Driver project. My goal was to learn more about drivers and kernel programming rather than just pure exploitation, so I chose to implement the same concept in C++ via a malicious driver.

Every process has a primary access token, which is a kernel data structure that describes the rights and privileges that a process has. Tokens have been covered in detail by Microsoft and from an offensive perspective, so I won’t spend a lot of time on them here. However it is important to know how the access token structure is associated with each process.

Processes And The _EPROCESS Structure

Each process is represented in the kernel by an _EPROCESS structure, and these structures are linked together in a doubly linked list. This structure is not fully documented by Microsoft, but the ReactOS project as usual has a good definition of it. One of the members of this structure is called, unsurprisingly, Token. Technically this member is of type _EX_FAST_REF, but for our purposes, this is just an implementation detail. This Token member contains a pointer to the address of the token object belonging to that particular process. An image of this member within the _EPROCESS structure in WinDbg can be seen below:

Token in WinDbg

As you can see, the Token member is located at a fixed offset from the beginning of the _EPROCESS structure. This seems to change between versions of Windows, and on my test machine running Windows 10 20H2, the offset is 0x4b8.

The Method

Given the above information, the method for stealing a token and assigning it is simple. Find the _EPROCESS structure of the process we want to steal from, go to the Token member offset, save the address that it is pointing to, and copy it to the corresponding Token member of the process we want to elevate privileges with. This is the same process that Spotless performed in WinDbg.
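The core of the method is a single pointer-sized copy. Here is a user-mode simulation of it, operating on plain byte arrays that stand in for two _EPROCESS blocks. This is not driver code: in a real driver the two base addresses would come from the kernel (for example via `PsLookupProcessByProcessId`), and the 0x4b8 offset is only valid for the Windows 10 20H2 build mentioned above. The name `copy_token` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Token offset on Windows 10 20H2 as noted in the post; it varies by build. */
#define TOKEN_OFFSET 0x4b8

/*
 * Copy the pointer-sized Token value from the source block to the target
 * block. Both arguments are mock _EPROCESS buffers in this simulation.
 */
void copy_token(uint8_t *target_eprocess, const uint8_t *source_eprocess)
{
    memcpy(target_eprocess + TOKEN_OFFSET,
           source_eprocess + TOKEN_OFFSET,
           sizeof(uint64_t));
}
```

After the copy, both blocks hold the same Token value, which is the same end state the WinDbg walkthrough produces by hand.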

The Driver

In lieu of exploiting a kernel mode vulnerability, I wrote a simple test driver. The driver exposes an IOCTL that can be called from user mode. It takes a struct that contains two members: an unsigned long for the PID of the process to steal a token from, and an unsigned long for the PID of the process to elevate.

PID Struct

The driver will find the _EPROCESS structure for each PID, locate the Token members, and copy the source process’s token to the process we want to elevate.

The User Mode Program

The user mode program is a simple C++ CLI application that takes two PIDs as arguments, and copies the token of the first PID to the second PID, via the exposed driver IOCTL. This is done by first opening a handle to the driver by name with CreateFileW and then calling DeviceIoControl with the correct IOCTL.

User Mode Code

The Driver Code

The code for the token copying is pretty straightforward. In the main function for handling IOCTLs, HandleDeviceIoControl, we switch on the received IOCTL. When we receive IOCTL_STEAL_TOKEN, we save the user mode buffer, extract the two PIDs, and attempt to resolve the PID of the target process to the address of its _EPROCESS structure:

IOCTL Switch

Once we have the _EPROCESS address, we can use the offset of 0x4b8 to find the Token member address:

Token Offset

We repeat the process once more for the PID of the process to steal a token from, and now we have all the information we need. The last step is to copy the source token to the target process, like so:

Copy Token

The Whole Process

Here is a visual breakdown of the entire flow. First we create a command prompt and verify who we are:

Whoami

Next we use the user mode program to pass the two PIDs to the driver. The first PID, 4, is the PID of the System process, which is normally 4. We see that the driver was accessed and the PIDs passed to it successfully:

User Mode Program

In the debug output view, we can see that HandleDeviceIoControl is called with the IOCTL_STEAL_TOKEN IOCTL, the PIDs are processed, and the target token overwritten. Highlighted are the identical addresses of the two tokens after the copy, indicating that we have successfully assigned the token:

Copy Token Debug View

Finally we run whoami again, and see that we are now SYSTEM!

Whoami System

We can even do the same thing with another user’s token:

Whoami 2

Conclusion

Kernel mode is fun! If you’re on the offensive side of the house, it’s well worth digging into. After all, every user mode road leads to kernel space; knowing your way around can only make you a better operator, and it expands the attack surface available to you. Blue can benefit just as much, since knowing what you’re defending at a deep level will make you able to defend it more effectively. To dig deeper I highly recommend Pavel Yosifovich’s Windows Kernel Programming, the HackSys Extreme Vulnerable Driver, and of course the Windows Internals books.

SleepyCrypt: Encrypting a running PE image while it sleeps

10 September 2021 at 15:20

SleepyCrypt: Encrypting a running PE image while it sleeps

Introduction

In the course of building a custom C2 framework, I frequently find features from other frameworks I’d like to implement. Cobalt Strike is obviously a major source of inspiration, given its maturity and large feature set. The only downside to re-implementing features from a commercial C2 is that you have no code or visibility into how a feature is implemented. This downside is also an excellent learning opportunity.

One such feature is Beacon’s ability to encrypt its loaded image in memory while it sleeps. It does this to prevent memory scanning from identifying static data and other possible indicators within the image while Beacon is inactive. Since during sleep no code or data is used, it can be encrypted, and only decrypted and visible in memory for the shortest time necessary. Another similar idea is heap encryption, which encrypts any dynamically allocated memory during sleep. A great writeup on this topic was published recently by Waldo-IRC and is available here.

So I set out to create a proof of concept to encrypt the loaded image of a process periodically while that process is sleeping, similar to how a Beacon or implant would.

The code for this post is available here.

Starting A Process

To get an idea of the challenges we have to overcome, let’s examine how an image is situated in memory when a process is running.

During process creation, the Windows loader takes the PE file from disk and maps it into memory. The PE headers tell the loader about the number of sections the file contains, their sizes, memory protections, etc. Using this information, each section is mapped by the loader into an area of memory, and that memory is given a specific memory protection value. These values can be a combination of read, write, and execute, along with a bunch of other values that aren’t relevant for now. The various sections tend to have consistent memory protection values; for instance, the .text section contains most of the executable code of the program, and as such needs to be read and executed, but not written to. Thus its memory is given Read eXecute protections. The .rdata section, however, contains read-only data, so it is given only Read memory protection.

Section Protection

Why do we care about the memory protection of the different PE sections? Because we want to encrypt them, and to do that, we need to be able to both read and write to them. By default, most sections are not writable. So we will need to change the protections of each section to at least RW, and then change them back to their original protection values. If we don’t change them back to their proper values, the program could possibly crash or look suspicious in memory. Every single section being writable is not a common occurrence!

How Can You Run Encrypted Code?

Another challenge we need to tackle is encrypting the .text section. Since it contains all the executable code, if we encrypt it, the assembly becomes gibberish and the code can no longer run. But we need the code to run to encrypt the section. So it’s a bit of a chicken and the egg problem. Luckily there’s a simple solution: use the heap! We can allocate a buffer of memory dynamically, which will reside inside our process address space, but outside of the .text section. But how do we get our C code into that heap buffer to run when it’s always compiled into .text? One word: shellcode.

Ugh, Shellcode??

I know we all love writing complex shellcode by hand, but for this project I am going to cheat and use C to create the shellcode for me. ParanoidNinja has a fantastic blog post on exactly this subject, and I will borrow heavily from that post to create my shellcode.

But what does this shellcode need to do exactly? It has two primary functions: encrypt and decrypt the loaded image, and sleep. So we will write a small C function that takes a pointer to the base address of the loaded image and a length of time to sleep. It will change the memory protections of the sections, encrypt them, sleep for the configured time, and then decrypt everything and return.

Putting It All Together

So the final flow of our program looks like this:

- Generate the shellcode from our C program and include it as a char buffer in our main test program called `sleep.exe`
- In `sleep.exe`, we allocate heap memory for the shellcode and copy it over
- We get the base address of our image and the desired sleep time
- We use the pointer to the heap buffer as a function pointer and call the shellcode like a function, passing in a parameter
- The shellcode will run, encrypt the image, sleep, decrypt, and then return
- We're back inside the `.text` section of `sleep.exe`, so we can continue to do our thing until we want to sleep and repeat the process again

Sleep.exe

Since it’s the simplest, let’s start with a rundown of sleep.exe.

First off, we include the shellcode as a header file. This is generated from the raw binary (which we’ll cover shortly) with xxd -i shellcode.bin > shellcode.h. Then we define the struct we will use as a parameter to the shellcode function, which is called simply run. The struct contains a pointer for the image base address, a DWORD for the sleep time, and a pointer to MessageBoxA, so we can have some visible output from the shellcode. In a real implant you would probably want to omit this. Lastly we create a function pointer typedef, so we can call the shellcode buffer like a normal function.

Struct in sleep.c

Next we begin our main function. We take in a command line parameter with the sleep time, dynamically resolve MessageBoxA, get the image base address with GetModuleHandleA( NULL ), and setup the parameter struct. Then we allocate our heap buffer and copy the shellcode payload into it:

Setup in sleep.c

Finally we create a function pointer to the shellcode buffer, wait for a keypress so we have time to check things out in Process Hacker, and then we execute the shellcode. If all goes well, it will sleep for our configured time and return back to sleep.exe, popping some message boxes in the process. Then we’ll press another key to exit, showing that we do indeed have execution back in the .text section.

Run in sleep.c

C First, Then Shellcode

Now we write the C function that will end up as our position-independent shellcode. ParanoidNinja covers this pretty well in his post, so I won’t rehash it all here, but I will mention some salient points we’ll need to account for.

First, when we call functions in shellcode on x64, we need the stack to be 16 byte aligned. We borrow ParanoidNinja’s assembly snippet to do this, using it as the entry point for the shellcode, which then calls our run function, then returns to sleep.exe.

Next we need to consider calling Win32 APIs from our shellcode. We don’t have the luxury of just calling them as usual, since we don’t know their addresses and have no runtime support, so we need to resolve them ourselves. However, the usual method of calling GetProcAddress with a string of the function to resolve is tricky, as we already need to know the address of GetProcAddress to call it, and using strings in position-independent shellcode requires them to be spelled out in a char array like this: char MyFunc[] = { 'h', 'i', 0x0 };. What we can do instead is use the tried and true method of API hashing. I have borrowed a custom GetProcAddress implementation to do this from here, combining it with a slightly modified djb2 hash algorithm. Here’s how this looks for Sleep and VirtualProtect:

Function resolution in run.c
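For reference, here is a sketch of the hashing idea using the standard djb2 algorithm (start at 5381, multiply by 33, add each character). My version in the repo is slightly modified, so the values below will not match it; `find_by_hash` and the plain name table it walks are stand-ins for the real export table walk.

```c
#include <stdint.h>

/* Standard djb2: hash = hash * 33 + c, starting from 5381. */
uint32_t djb2(const char *s)
{
    uint32_t hash = 5381;
    while (*s)
        hash = hash * 33u + (uint8_t)*s++;
    return hash;
}

/*
 * Walk a table of exported names and return the index of the first name
 * whose hash matches `wanted`, or -1 if none does.
 */
int find_by_hash(const char *const *names, int count, uint32_t wanted)
{
    for (int i = 0; i < count; i++)
        if (djb2(names[i]) == wanted)
            return i;
    return -1;
}
```

The point is that the shellcode only needs to carry a 32-bit constant per API rather than the function name itself, which sidesteps the string problem described above.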

PE Parsing Fun

Now that we’re able to get the function pointers we need, it’s time to address encrypting the image. The way we’ll do this is by parsing the PE header of the loaded image, since it contains all the information we need to find each section in memory. After talking with Waldo-IRC, it turns out I could also have done this with VirtualQuery, which would make it a more generalizable process. However I did it the PE way, so that’s what I’ll cover here.

The first parameter of our argument struct to the shellcode is the base address of the loaded image in memory. This is effectively a pointer to the beginning of the MSDOS header. So we can use all the usual PE parsing techniques to find the beginning of the section headers. PE parsing can be tedious, so I won’t give a detailed play by play, just the highlights.

Once we have the address of the first section, we can get the three pieces of information we need from it. First is the actual address of the section in memory. The IMAGE_SECTION_HEADER structure contains a VirtualAddress field, which when combined with the image base address, gives us the actual address in memory of the section.
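As a rough illustration of that walk, here is a sketch that finds the section headers in a PE buffer and reads the three fields we care about. It uses raw offsets from the PE format instead of the Windows SDK structs so it builds anywhere; `section_info` and `list_sections` are names invented for this example and not what run.c uses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    char     name[9];          /* 8-byte section name plus terminator */
    uint32_t virtual_size;     /* VirtualSize */
    uint32_t virtual_address;  /* VirtualAddress (RVA from the image base) */
    uint32_t characteristics;  /* Characteristics flags */
} section_info;

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Fill `out` with up to `max` sections; return the count, or -1 if malformed. */
int list_sections(const uint8_t *image, size_t len, section_info *out, int max)
{
    if (len < 0x40 || image[0] != 'M' || image[1] != 'Z')
        return -1;

    size_t nt = rd32(image + 0x3C);              /* e_lfanew */
    if (nt > len || len - nt < 24 || memcmp(image + nt, "PE\0\0", 4) != 0)
        return -1;

    uint16_t count    = rd16(image + nt + 6);    /* NumberOfSections */
    uint16_t opt_size = rd16(image + nt + 20);   /* SizeOfOptionalHeader */
    size_t   first    = nt + 24 + opt_size;      /* first section header */

    int n = 0;
    for (uint16_t i = 0; i < count && n < max; i++) {
        size_t hdr = first + (size_t)i * 40;     /* each header is 40 bytes */
        if (hdr > len || len - hdr < 40)
            return -1;
        memcpy(out[n].name, image + hdr, 8);
        out[n].name[8] = '\0';
        out[n].virtual_size    = rd32(image + hdr + 8);
        out[n].virtual_address = rd32(image + hdr + 12);
        out[n].characteristics = rd32(image + hdr + 36);
        n++;
    }
    return n;
}
```

For a loaded image, the address of a section in memory is then simply the image base plus `virtual_address`.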

Next we need the size of that section in memory. This is stored in the VirtualSize field of the section header. However, this size is not actually the real size of the section when mapped into memory. It’s the size of the actual data in the section. Since by default memory in Windows is allocated in pages of 4 kilobytes, we need to round the VirtualSize value up to the next multiple of 4k. The bit twiddling code to do this was taken from StackOverflow here.
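The rounding itself is a one-liner. This is the common add-and-mask idiom for power-of-two alignment, which may or may not be character for character what the StackOverflow answer uses; the 4k page size is hard-coded here as an assumption, and `round_up_to_page` is a name picked for the example.

```c
#include <stdint.h>

#define PAGE_SIZE_4K 0x1000u

/* Round `size` up to the next multiple of 4k; exact multiples are unchanged. */
uint32_t round_up_to_page(uint32_t size)
{
    return (size + PAGE_SIZE_4K - 1) & ~(PAGE_SIZE_4K - 1);
}
```

So a section with a VirtualSize of 0x1A34 occupies 0x2000 bytes once mapped, and that larger figure is what gets passed along when changing protections and encrypting.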

The last piece of information about the section we need is the memory protection value. This is stored in the Characteristics field of the section header. This is a DWORD value that looks something like 0x40000040, with the left-most hex digit representing the read, write, or execute permission we care about. We do a little more bit twiddling to get just this value, by shifting it to the right by 28 bits. Once we get this value by itself, we save it in an array indexed by the section number so that we can reuse it later to reset the protections:

Shifting characteristics in run.c
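Here is one way the shifted value could be turned back into a page protection constant. The section flags are the ones from the PE format and the PAGE_* values match the Windows SDK, but they are defined locally so the snippet builds without Windows headers. The mapping function `section_to_page_protection` is my own illustration, not a copy of what run.c does.

```c
#include <stdint.h>

/* Section characteristic flags from the PE format. */
#define SCN_MEM_EXECUTE 0x20000000u
#define SCN_MEM_READ    0x40000000u
#define SCN_MEM_WRITE   0x80000000u

/* Page protection values, matching the Windows SDK definitions. */
#define PAGE_NOACCESS          0x01u
#define PAGE_READONLY          0x02u
#define PAGE_READWRITE         0x04u
#define PAGE_EXECUTE           0x10u
#define PAGE_EXECUTE_READ      0x20u
#define PAGE_EXECUTE_READWRITE 0x40u

/* Map a section's Characteristics value to a page protection constant. */
uint32_t section_to_page_protection(uint32_t characteristics)
{
    /* After the shift: 0x2 = execute, 0x4 = read, 0x8 = write. */
    uint32_t rwx = characteristics >> 28;
    int x = (rwx & 0x2u) != 0;
    int r = (rwx & 0x4u) != 0;
    int w = (rwx & 0x8u) != 0;

    if (x)
        return w ? PAGE_EXECUTE_READWRITE : (r ? PAGE_EXECUTE_READ : PAGE_EXECUTE);
    if (w)
        return PAGE_READWRITE;
    if (r)
        return PAGE_READONLY;
    return PAGE_NOACCESS;
}
```

With this mapping, the example value 0x40000040 comes out as read-only, and a typical `.text` value of 0x60000020 comes out as execute-read.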

Encryption

Now that we can find each section, know its size, and can restore its memory protections, we can finally encrypt. In the same loop where we parsed each section, we call our encryption function:

Shifting characteristics in run.c

The encryption/decryption functions take the address, size, and memory protection to apply, as well as a pointer to the address of the VirtualProtect function, so that we don’t have to resolve it each time:

Encrypt/decrypt functions in run.c

To encrypt, we change the memory protections to RW, then XOR each byte of the section. Once we have encrypted each section, we finish by encrypting the PE headers. They reside in a single 4k page starting at the base address. With that, the entire loaded image is encrypted!
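The XOR pass itself is about as simple as it gets. Below is a minimal sketch with a single-byte key, which is an assumption for illustration; the key and the function name `xor_region` are not taken from run.c. In the real shellcode this loop is bracketed by calls that flip the region to RW and back.

```c
#include <stddef.h>
#include <stdint.h>

/* XOR every byte of a region in place; calling it twice restores the data. */
void xor_region(uint8_t *region, size_t size, uint8_t key)
{
    for (size_t i = 0; i < size; i++)
        region[i] ^= key;
}
```

Because XOR is its own inverse, the same function serves for both encryption and decryption, which keeps the shellcode small.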

Sleep and Decryption

Now that we’ve encrypted the entire image, we can sleep by calling the dynamically resolved Sleep function pointer, using the passed-in sleep duration DWORD.

Once we’ve finished sleeping, we decrypt everything. We have to make sure that we decrypt the PE headers page first, because we use it to find the addresses of all the other sections. Then we pop a message box to tell us we’re done, and return to sleep.exe!

Getting The Shellcode

ParanoidNinja covers this part in detail as well, but briefly the process is this:

- Compile the stack alignment assembly and the C code to object files
- Link the two object files together into an EXE
- Use `objcopy` to extract just the `.text` section into a file
- Convert the shellcode file into a `char` array for `sleep.c`

Demo Time

To verify everything is being encrypted and decrypted properly, we can use Process Hacker to inspect the memory. Here I’ve called sleep.exe with a 5 second sleep time. The process has started, but since I haven’t pressed a key, everything is still unencrypted:

PE headers unencrypted

Here I have pressed a key and the encryption process has started. I have pressed “Re-Read” memory in Process Hacker, and you can see that the header page has been XOR encrypted:

PE headers encrypted

After the sleep is finished and decryption takes place, we get a message box telling us we’re done. Once we refresh the memory in Process Hacker, we can see we have the PE header page back again!

Demo complete

You can repeat this with each section in Process Hacker and see that they are all indeed encrypted.

Conclusion

I find it really educational to recreate Cobalt Strike features, and this one was no exception. I don’t know if this is at all close to how Cobalt Strike handles sleep obfuscation, but this does seem to be a viable method, and I will likely tweak it further and include it in my C2 framework. If you have any questions or input on this, please let me know or open an issue on Github.
