Updating PE file imports on process start

When we need to change a PE file’s imports, we may either modify the binary in the file system or patch the image after it has been loaded into memory. In this post, I will focus on the latter approach, showing you the moments in the process lifetime when such changes are possible. We will end up with a small app capable of updating imports in newly started remote processes.

What we will be modifying

Let’s begin with some basics of the PE file structure. Typically, the data about PE file imports resides in the .idata section, and we need to read the import directory (IMAGE_DIRECTORY_ENTRY_IMPORT) from the NT optional header to understand how this data is laid out. In this directory, we will find an array of IMAGE_IMPORT_DESCRIPTOR structures:

typedef struct _IMAGE_IMPORT_DESCRIPTOR { 
    union { 
        DWORD   Characteristics;            // 0 for terminating null import descriptor 
        DWORD   OriginalFirstThunk;         // RVA to original unbound IAT (PIMAGE_THUNK_DATA) 
    } DUMMYUNIONNAME; 
    DWORD   TimeDateStamp;                  // 0 if not bound, 
                                            // -1 if bound, and real date\time stamp 
                                            //     in IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT (new BIND) 
                                            // O.W. date/time stamp of DLL bound to (Old BIND) 
    DWORD   ForwarderChain;                 // -1 if no forwarders 
    DWORD   Name; 
    DWORD   FirstThunk;                     // RVA to IAT (if bound this IAT has actual addresses) 
} IMAGE_IMPORT_DESCRIPTOR;

The Name field points to the name of the imported DLL, and the OriginalFirstThunk and FirstThunk fields point to arrays of IMAGE_THUNK_DATA structures, which hold information about the functions imported from a given library. All those fields’ values are relative virtual addresses (RVAs), that is, offsets from the image base address after the image has been loaded into memory. A thunk may represent either an import by ordinal (IMAGE_ORDINAL_FLAG is set) or an import by name (the thunk holds an RVA to an IMAGE_IMPORT_BY_NAME structure). It is important to note that each thunk array must end with a zeroed thunk.

You may be wondering why there are two thunk arrays per DLL. At the beginning, they hold the same values, but once the imports are resolved, the loader overwrites the values in the FirstThunk array with the actual addresses of the resolved functions. The thunks for all the imports usually occupy the first bytes of the .idata section, and they are also referenced by the import address table (IAT) directory (I highly recommend downloading PE 102 by @corkami – a beautiful and very readable diagram of the PE file format).
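
To make the thunk encoding more tangible, here is a minimal sketch (my illustration, not importando’s actual code) of decoding a single 64-bit thunk value; readImageBytes is a hypothetical helper that returns a small buffer read at a given RVA:

using System;
using System.Text;

// A sketch of decoding a single 64-bit thunk value read from the
// OriginalFirstThunk array. readImageBytes is a hypothetical helper
// that returns a small buffer read at the given RVA.
static void DescribeThunk(ulong thunkValue, Func<uint, byte[]> readImageBytes)
{
    const ulong IMAGE_ORDINAL_FLAG64 = 0x8000000000000000;

    if (thunkValue == 0)
    {
        Console.WriteLine("null thunk - the end of the thunk array");
    }
    else if ((thunkValue & IMAGE_ORDINAL_FLAG64) != 0)
    {
        // import by ordinal - the ordinal is stored in the lowest 16 bits
        Console.WriteLine($"import by ordinal: #{(ushort)thunkValue}");
    }
    else
    {
        // import by name - the thunk holds an RVA to IMAGE_IMPORT_BY_NAME
        // (a WORD hint followed by a zero-terminated function name)
        byte[] data = readImageBytes((uint)thunkValue);
        ushort hint = BitConverter.ToUInt16(data, 0);
        string name = Encoding.ASCII.GetString(data, 2, Array.IndexOf(data, (byte)0, 2) - 2);
        Console.WriteLine($"import by name: {name} (hint: {hint})");
    }
}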

Depending on what we want to achieve, we may modify either only the resolved thunk arrays or the whole descriptors array. For example, redirecting one function to another when both functions belong to already loaded DLLs can be achieved by simply overwriting the corresponding resolved address in a thunk array (see the sketch below). However, injecting a new DLL into the process or adding a new function import to an existing DLL requires changes to the descriptors array, and that is the case I will mainly focus on in this post.
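
Here is a hedged sketch of the simpler redirect case (illustrative P/Invoke, not importando’s code). It assumes the replacement function’s address is valid in the target process, which holds for system DLLs, as they are mapped at the same base across processes within a boot session:

using System;
using System.Runtime.InteropServices;

static class ThunkRedirect
{
    [DllImport("kernel32", SetLastError = true)]
    static extern bool WriteProcessMemory(nint hProcess, nint lpBaseAddress,
        ref nint lpBuffer, nuint nSize, out nuint lpNumberOfBytesWritten);

    // overwrite a resolved IAT slot (imageBase + thunkRva) with a new function address
    public static bool Redirect(nint processHandle, nuint imageBase, uint thunkRva, nint newFunctionAddress)
    {
        return WriteProcessMemory(processHandle, (nint)(imageBase + thunkRva),
            ref newFunctionAddress, (nuint)IntPtr.Size, out _);
    }
}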

The application we are going to develop, named importando, accepts a list of tasks (-i) to perform on the remote process image:

  • -i test.dll!TestMethod or -i test.dll#1 to inject a function import by name or by ordinal 
  • -i test.dll!TestMethod:test2.dll!TestMethod to replace a given imported function with a different one 

Let’s now examine our options for implementing those tasks.

Updating a suspended process

A common approach is to start the process suspended (the CREATE_SUSPENDED flag) and modify the import table before the first thread resumes execution. Unfortunately, when the CreateProcess function returns, the loader has already resolved the imported function addresses (ntdll!LdrStateInit equals 2). Therefore, this approach will not work if we need to fix an incorrect import definition (for example, a wrong DLL or function name). However, as the loader has not yet reached completion (LdrStateInit is not 3), we may still perform some actions on the import directory. For example, we can inject a new DLL into the process (as DetourCreateProcessWithDlls does). We may also override the addresses of the resolved functions. When the main thread resumes (ResumeThread), the loader will finish its work and the code will execute with our changes applied.
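
A minimal sketch of this technique with plain P/Invoke declarations (importando itself uses CsWin32-generated bindings) might look as follows:

using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

static class SuspendedStart
{
    const uint CREATE_SUSPENDED = 0x00000004;

    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    struct STARTUPINFO
    {
        public int cb;
        public string? lpReserved, lpDesktop, lpTitle;
        public int dwX, dwY, dwXSize, dwYSize, dwXCountChars, dwYCountChars, dwFillAttribute, dwFlags;
        public short wShowWindow, cbReserved2;
        public nint lpReserved2, hStdInput, hStdOutput, hStdError;
    }

    [StructLayout(LayoutKind.Sequential)]
    struct PROCESS_INFORMATION
    {
        public nint hProcess, hThread;
        public int dwProcessId, dwThreadId;
    }

    [DllImport("kernel32", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern bool CreateProcess(string? lpApplicationName, string lpCommandLine,
        nint lpProcessAttributes, nint lpThreadAttributes, bool bInheritHandles, uint dwCreationFlags,
        nint lpEnvironment, string? lpCurrentDirectory, ref STARTUPINFO lpStartupInfo,
        out PROCESS_INFORMATION lpProcessInformation);

    [DllImport("kernel32", SetLastError = true)]
    static extern uint ResumeThread(nint hThread);

    public static void Run(string commandLine)
    {
        var si = new STARTUPINFO { cb = Marshal.SizeOf<STARTUPINFO>() };
        if (!CreateProcess(null, commandLine, 0, 0, false, CREATE_SUSPENDED, 0, null, ref si, out var pi))
        {
            throw new Win32Exception();
        }

        // the imports are already resolved here (LdrStateInit == 2): we may
        // overwrite resolved thunk values or inject a new DLL, but we cannot
        // fix a wrong DLL or function name anymore

        ResumeThread(pi.hThread); // the loader completes its work (LdrStateInit -> 3)
    }
}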

Updating a process from a debugger

If we need earlier access to the executable’s import directory data, we can resort to the debugger API. Running a process under a debugger gives us a few more chances to apply import modifications. The first interesting debug event is CREATE_PROCESS_DEBUG_EVENT. When the debugger receives it, the loader has not yet started resolving the dependencies, but the executable image is already loaded into memory. That is a perfect moment for fixing problems that cause critical loader errors, for example, the infamous “entry not found” error:

The next interesting event is EXCEPTION_DEBUG_EVENT with ExceptionRecord.ExceptionCode equal to STATUS_BREAKPOINT (if we are debugging a 32-bit process with a 64-bit debugger, we should skip the first STATUS_BREAKPOINT and instead wait for STATUS_WX86_BREAKPOINT). It is the initial process breakpoint, triggered by the loader when it is in a state very similar to the one in an initially suspended process, described in the previous section (so LdrStateInit equals 2). Finally, the debugger also receives a LOAD_DLL_DEBUG_EVENT for each loaded DLL before the loader starts resolving its dependencies. Thus, in the handler of this event, we can fix issues in the import directories of the dependent libraries.

I also recorded a YouTube video in which I show how to make those fixes manually in WinDbg. It may help you better visualize the steps we will perform in the importando code.

Implementing importando (in C#)

As you remember from the first section, our goal is to support both import redirects and new import injections. If you are wondering about the name, I thought importando sounded nice, and it describes what we will be doing: import and override (it happens in the reverse order, but that is just a nitpick 😊). As we want to support all types of modifications to the import directory, the logical choice is to use the debugging API. Thanks to CsWin32, writing a native debugger in C# is not a very demanding task. Here is the debugger loop with the few events importando uses:

HANDLE processHandle = HANDLE.Null;
nuint imageBase = 0;
bool is64bit = false;
bool isWow64 = false;

ModuleImport[] originalImports = [];
ModuleImport[] newImports = [];

while (!cts.Token.IsCancellationRequested)
{
    if (WaitForDebugEvent(1000) is { } debugEvent)
    {
        switch (debugEvent.dwDebugEventCode)
        {
            case DEBUG_EVENT_CODE.CREATE_PROCESS_DEBUG_EVENT:
                {
                    logger.WriteLine($"CreateProcess: {debugEvent.dwProcessId}");

                    Debug.Assert(pid == debugEvent.dwProcessId);
                    var createProcessInfo = debugEvent.u.CreateProcessInfo;

                    // we are closing hFile handle after we finish reading the image data
                    using var pereader = new PEReader(new FileStream(
                        new SafeFileHandle(createProcessInfo.hFile, true), FileAccess.Read));

                    processHandle = createProcessInfo.hProcess;
                    is64bit = pereader.Is64Bit();
                    isWow64 = Environment.Is64BitProcess && !is64bit;
                    unsafe { imageBase = (nuint)createProcessInfo.lpBaseOfImage; }

                    (originalImports, newImports) = UpdateProcessImports(processHandle,
                        pereader, imageBase, importUpdates, forwards);
                }
                break;

            case DEBUG_EVENT_CODE.EXCEPTION_DEBUG_EVENT:
                if (debugEvent.u.Exception.ExceptionRecord.ExceptionCode == (
                    isWow64 ? NTSTATUS.STATUS_WX86_BREAKPOINT : NTSTATUS.STATUS_BREAKPOINT))
                {
                    // the first breakpoint exception is the initial process breakpoint - it occurs
                    // when the loader has finished its initial work and the thunks are resolved
                    Debug.Assert(imageBase != 0 && !processHandle.IsNull);
                    UpdateForwardedImports(processHandle, is64bit, imageBase, originalImports, newImports, forwards);
                    cts.Cancel();
                }
                else
                {
                    logger.WriteLine($"Unexpected exception: {debugEvent.u.Exception.ExceptionRecord.ExceptionCode.Value:x}");
                }
                break;

            case DEBUG_EVENT_CODE.EXIT_PROCESS_DEBUG_EVENT:
                cts.Cancel();
                break;
            default:
                break;
        }

        if (!PInvoke.ContinueDebugEvent(debugEvent.dwProcessId,
            debugEvent.dwThreadId, NTSTATUS.DBG_EXCEPTION_NOT_HANDLED))
        {
            throw new Win32Exception(Marshal.GetLastPInvokeError(), $"{nameof(PInvoke.ContinueDebugEvent)} error");
        }
    }
}

I will mention it again later, but the full source code is available in the importando GitHub repository. In this post, I focus on the crucial pieces of the solution, so please refer to the code in the repository if you would like to check the skipped parts.

I also created a few wrapping record classes for the parsed import data. Using the native structures could be an option; however, I wanted the types to be more C#-friendly and architecture-agnostic.

interface IFunctionImport { }

record FunctionImportByName(uint Rva, ushort Hint, string FunctionName) : IFunctionImport;

record FunctionImportByOrdinal(uint Ordinal) : IFunctionImport;

record NullImport : IFunctionImport;

record FunctionThunk(IFunctionImport Import);

record ModuleImport(string DllName, uint DllNameRva, uint OriginalFirstThunkRva,
    uint FirstThunkRva, FunctionThunk[] FirstThunks);

The handler of CREATE_PROCESS_DEBUG_EVENT, or rather the UpdateProcessImports function, reads the existing imports (PEImports.ReadModuleImports), prepares new import descriptors with thunk arrays for the updated ones (PEImports.PrepareNewModuleImports), and saves them in the remote process memory (PEImports.UpdateImportsDirectory). By the way, the PEReader class is a great helper in parsing PE structures. We also need to update the import data directory in the NT optional header so that it points to our new import descriptors (UpdatePEDirectory):

static (ModuleImport[] OriginalImports, ModuleImport[] NewImports) UpdateProcessImports(HANDLE processHandle,
    PEReader imageReader, nuint imageBase, ImportUpdate[] importUpdates, (string ForwardFrom, string ForwardTo)[] forwards)
{
    var existingImports = PEImports.ReadModuleImports(imageReader);

    var newImports = PEImports.PrepareNewModuleImports(existingImports, importUpdates, forwards);

    var is64bit = imageReader.Is64Bit();
    var (importDirRva, importDirSize) = PEImports.UpdateImportsDirectory(processHandle, is64bit, imageBase, newImports);

    nuint dataDirectoriesRva = (nuint)(imageReader.PEHeaders.PEHeaderStartOffset +
        (is64bit ? Marshal.OffsetOf<IMAGE_OPTIONAL_HEADER64>("DataDirectory") : Marshal.OffsetOf<IMAGE_OPTIONAL_HEADER32>("DataDirectory")));

    UpdatePEDirectory(dataDirectoriesRva, IMAGE_DIRECTORY_ENTRY.IMAGE_DIRECTORY_ENTRY_IMPORT, importDirRva, importDirSize);
    UpdatePEDirectory(dataDirectoriesRva, IMAGE_DIRECTORY_ENTRY.IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT, 0, 0);

    return (existingImports, newImports);
}
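
UpdatePEDirectory is a local helper that I skipped above. Conceptually, it overwrites one IMAGE_DATA_DIRECTORY slot (a DWORD VirtualAddress followed by a DWORD Size) in the remote image headers. Here is a hedged sketch, written as a local function capturing processHandle and imageBase, with WriteMemory being a hypothetical wrapper around WriteProcessMemory:

// conceptual sketch of UpdatePEDirectory - not the repository code
void UpdatePEDirectory(nuint dataDirectoriesRva, IMAGE_DIRECTORY_ENTRY index, uint rva, uint size)
{
    const int entrySize = 2 * sizeof(uint); // IMAGE_DATA_DIRECTORY: VirtualAddress + Size
    nuint entryAddress = imageBase + dataDirectoriesRva + (nuint)((int)index * entrySize);

    Span<byte> buffer = stackalloc byte[entrySize];
    BitConverter.TryWriteBytes(buffer[..sizeof(uint)], rva);
    BitConverter.TryWriteBytes(buffer[sizeof(uint)..], size);

    // WriteMemory is a hypothetical helper wrapping WriteProcessMemory
    WriteMemory(processHandle, entryAddress, buffer);
}

Note also that UpdateProcessImports zeroes the bound import directory, so stale bindings from the original headers cannot interfere with the rewritten imports.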

Because most of the RVAs in the PE headers are 4 bytes long (DWORD), PEImports.UpdateImportsDirectory needs to allocate memory for the new imports as near to the image base address as possible. To achieve that, I ported the FindAndAllocateNearBase function from the Detours library to C#.
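
The ported function is more elaborate, but its core idea can be sketched like this (illustrative P/Invoke, simplified bounds handling): probe candidate addresses above the image base, at allocation granularity, until VirtualAllocEx succeeds, which keeps the new import data reachable through 32-bit RVAs:

using System;
using System.Runtime.InteropServices;

static class NearImports
{
    [DllImport("kernel32", SetLastError = true)]
    static extern nint VirtualAllocEx(nint hProcess, nint lpAddress, nuint dwSize,
        uint flAllocationType, uint flProtect);

    const uint MEM_COMMIT = 0x1000, MEM_RESERVE = 0x2000, PAGE_READWRITE = 0x04;

    // walk upward from the image base in 64-KB steps (the allocation granularity)
    // until VirtualAllocEx succeeds at one of the probed addresses
    public static nuint AllocateNearBase(nint processHandle, nuint imageBase, nuint size)
    {
        const nuint step = 0x10000;
        for (nuint address = imageBase + step; address < imageBase + 0x7FFF0000; address += step)
        {
            nint memory = VirtualAllocEx(processHandle, (nint)address, size,
                MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
            if (memory != 0) return (nuint)memory;
        }
        return 0;
    }
}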

To better show you what importando is doing, I drew a simple picture of the memory layout after an example forward of shell32.dll!StrCpyNW to shlwapi.dll!StrCpyNW. Notice the new import descriptor table (separate from the original one) that is pointed to by the import directory. Importando also needed to create new thunk arrays for imports requiring an update (in this case, shell32.dll) and for the new ones (shlwapi.dll), but it reused the existing thunks for unmodified imports (user32.dll):

Once we have updated the import directory, we are ready to resume the process execution and wait for the loader to perform its initial work.

This leads us to the next stop in the debugger loop, the EXCEPTION_DEBUG_EVENT handler. The name of the event may be a little misleading, as it is triggered not only when code in the remote process throws an exception but also when it hits a breakpoint. And the Windows loader triggers a breakpoint (STATUS_BREAKPOINT) when it detects a debugger attached to the starting process. In the WOW64 context (when a 64-bit debugger debugs a 32-bit application), there are actually two initial breakpoints, STATUS_BREAKPOINT and STATUS_WX86_BREAKPOINT, and it is the latter that interests us. At this point, the loader has resolved all the addresses in the thunk arrays from the new import directory. However, we are not done yet, as the old thunks still hold unresolved RVAs. We need to update them because those thunks are referenced by the application code. And here comes the last step in our coding journey, the UpdateForwardedImports function:

static void UpdateForwardedImports(HANDLE processHandle, bool is64bit, nuint imageBase,
    ModuleImport[] originalImports, ModuleImport[] newImports, (string ForwardFrom, string ForwardTo)[] forwards)
{
    int thunkSize = is64bit ? Marshal.SizeOf<IMAGE_THUNK_DATA64>() : Marshal.SizeOf<IMAGE_THUNK_DATA32>();

    uint GetThunkRva(ModuleImport[] moduleImports, string importName)
    { /* ... */  }

    void CopyThunkValues(uint fromRva, uint toRva)
    { /* ... */  }

    foreach ((string forwardFrom, string forwardTo) in forwards)
    {
        var originalThunkRva = GetThunkRva(originalImports, forwardFrom);
        var newThunkRva = GetThunkRva(newImports, forwardTo);

        if (originalThunkRva != 0 && newThunkRva != 0)
        {
            // new thunk should be resolved by now, so we may copy its value to the original place
            // that could be referenced by application code
            CopyThunkValues(newThunkRva, originalThunkRva);
        }
        else
        {
            Console.WriteLine($"WARNING: could not find import {forwardFrom} or {forwardTo}");
        }
    }
}

We may now continue debugging or detach from the remote process (that’s what importando does) and let it run freely. Our job is done, and we should see the new imports in the modules list of our target application.
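
For completeness, detaching takes two calls; a minimal sketch with plain P/Invoke (importando uses the CsWin32-generated equivalents), with pid being the debuggee’s process id:

using System.Runtime.InteropServices;

static class Detach
{
    [DllImport("kernel32", SetLastError = true)]
    static extern bool DebugSetProcessKillOnExit(bool killOnExit);

    [DllImport("kernel32", SetLastError = true)]
    static extern bool DebugActiveProcessStop(uint dwProcessId);

    public static void Run(uint pid)
    {
        DebugSetProcessKillOnExit(false); // do not kill the debuggee when the debugger exits
        DebugActiveProcessStop(pid);      // detach from the remote process
    }
}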

The importando’s source code and binaries are available in its GitHub repository.

Generating C# bindings for native Windows libraries 

When writing system applications in C#, we often need to interact with the system APIs directly, and it has always been a challenge to write proper PInvoke signatures. However, with the introduction of the Windows metadata project and, later, cswin32, things changed significantly. In this post, I will walk you through the steps required to generate C# bindings for a sample native library. I picked Detours because I needed it for withdll, my new tool inspired by the withdll example from the Detours repository. The post by Rafael Rivera describing how to create bindings for the Rust language helped me tremendously in writing this post (and the bindings 😊).

Creating Windows metadata 

Preparing the metadata project 

Before we can see the generated C# code, we need to build a Windows metadata (winmd) file. Rafael describes the steps in detail, so I will take a shortcut here and show you the generate.proj for the Detours library:

<?xml version="1.0" encoding="utf-8"?>
<Project Sdk="Microsoft.Windows.WinmdGenerator/0.55.45-preview">
    <PropertyGroup Label="Globals">
        <OutputWinmd>winmd/detours.winmd</OutputWinmd>
        <WinmdVersion>0.1.0.0</WinmdVersion>
    </PropertyGroup>

    <ItemGroup>
        <Headers Include="../detours/include/detours.h" />

        <ImportLibs Include="../detours-dll/bin.x64$(BuildConfig)/detours.lib">
            <StaticLibs>detours=detours</StaticLibs>
        </ImportLibs>

        <Partition Include="main.cpp">
            <TraverseFiles>@(Headers)</TraverseFiles>
            <Namespace>Microsoft.Detours</Namespace>
            <ExcludeFromCrossarch>true</ExcludeFromCrossarch>
        </Partition> 
    </ItemGroup>
</Project>

My folder structure looks as follows: 

I also needed to add the ImportLibs item, as my library is not in the folders normally searched by the MSVC compiler. Additionally, the output of the Detours build is a static library (detours.lib) that we can link with our project. Theoretically, we can point the Windows Metadata Generator to static libraries using the StaticLibs tag. However, I did not manage to make it work without creating an additional shared library. There is an old issue in the win32metadata project about importing static libraries directly, but it was never resolved. I noticed, though, that the generated methods have a StaticLibraryAttribute attached to them. Still, I’m unsure what its purpose is.

Building a wrapping DLL for the static library

Fortunately, creating a shared library from a static library is a straightforward process. You need a cpp file, for example:

#include <windows.h>

BOOL APIENTRY DllMain(HMODULE hModule, DWORD  ul_reason_for_call, LPVOID lpReserved) {
    UNREFERENCED_PARAMETER(lpReserved);

    switch (ul_reason_for_call) {
    case DLL_PROCESS_ATTACH:
        ::DisableThreadLibraryCalls(hModule);
        break;
    case DLL_PROCESS_DETACH:
        break;
    default:
        break;
    }
    return TRUE;
}

And a module-definition file (detours.def) that lists the exported methods, for example:

LIBRARY detours
EXPORTS
	DetourCreateProcessWithDllExW
	DetourCreateProcessWithDllsW

Then you need to compile your DLL, for example: 

cl.exe /I "..\detours\include" /nologo /LD /TP /DUNICODE /DWIN32 /D_WINDOWS /EHsc /W4 /WX /Zi /O2 /Ob1 /DNDEBUG -std:c++latest detours.cpp /link /DEF:detours.def ..\detours\lib.X64\detours.lib

Now, we may point ImportLibs to the detours-dll folder and try to build the detours.winmd file.

Building the metadata project 

This step should be as straightforward as running the dotnet build command. Sometimes, however, you may run into problems with the parsers. For Detours, for example, I needed to remove a section from the detours header. Finally, the build was successful, and I could verify in ILSpy that the detours.winmd file contains all the methods exported by my detours.dll:

Generating and using bindings from the metadata project 

With the metadata file ready, it’s time to use it in our C# project. First, we install the cswin32 package, which imports the Win32 metadata and allows us to define which types and methods we want to import through the NativeMethods.txt file. By default, cswin32 understands only names defined in the Win32 metadata project. However, thanks to the ProjectionMetadataWinmd tag, we can easily make it process our custom metadata files as well!

<ItemGroup>
    <ProjectionMetadataWinmd Include="../detours-meta/winmd/*.winmd" />
</ItemGroup>

Now, we may reference the names that we want to import in NativeMethods.txt and, finally, use them in our code:

// NativeMethods.txt

// Windows
CloseHandle
DebugActiveProcessStop
// ...

// Detours
DetourCreateProcessWithDllsW
// ...
// An example C# code using the DetourCreateProcessWithDlls function

using PInvokeDetours = Microsoft.Detours.PInvoke;
using PInvokeWin32 = Windows.Win32.PInvoke;

var pcstrs = dllPaths.Select(path => new PCSTR((byte*)Marshal.StringToHGlobalAnsi(path))).ToArray();
try
{
    if (!PInvokeDetours.DetourCreateProcessWithDlls(null, ref cmdline, null, null, false,
        createFlags, null, null, startupInfo, out var processInfo,
        pcstrs, null))
    {
        throw new Win32Exception();
    }

    PInvokeWin32.CloseHandle(processInfo.hThread);
    PInvokeWin32.CloseHandle(processInfo.hProcess);

    if (debug)
    {
        PInvokeWin32.DebugActiveProcessStop(processInfo.dwProcessId);
    }
}
finally
{
    Array.ForEach(pcstrs, pcstr => Marshal.FreeHGlobal((nint)pcstr.Value));
}

The NativeAOT compilation also gives us an option to statically link the Detours library:

<ItemGroup>    
    <DirectPInvoke Include="detours" />
    <NativeLibrary Include="../detours/lib.X64/detours.lib" />
</ItemGroup>

It’s fantastic that we can have all the native imports in one place. If you would like to examine a working project that uses the presented Detours bindings, please check the withdll repository. You may use it, for example, to inject Detours DLLs that hook Windows API functions and trace them. I describe this usage scenario in a guide at wtrace.net.

Troubleshooting NT_STATUS_ACCESS_DENIED from Samba on Manjaro Linux

A few months ago, I switched my main desktop to Manjaro, and I’m glad about it. Manjaro Linux is a polished and well-designed Linux distribution. As I like simplicity and a minimalistic approach, I chose the XFCE Desktop edition. Switching to Linux did not make me abandon the Windows platform completely. I spend lots of my work and hobby time on this OS, but I run it in QEMU-KVM VMs, configured through the Virtual Machine Manager. As I experiment with various system settings, I have a base VM image and clone it when necessary for new projects or research. Thanks to this configuration, I finally stopped breaking my main system 🙂

One thing I needed to figure out was a way to share files between my Linux host and Windows VMs. I picked Samba, as I wanted something that would look native in Windows. And here my troubleshooting story begins 🙂 I could summarize it in one sentence: “always check the system journald log,” but if you’re interested in a more extended and convoluted approach, please read on 🙂

When Samba returns NT_STATUS_ACCESS_DENIED

My smb.conf file looks as follows:

[global]
   browse list = yes
   config backend = file
   debug pid = yes
   debug timestamp = yes
   debug uid = yes
   dns proxy = no
   follow symlinks = no
   guest account = nobody
   load printers = no
   log file = /var/log/samba/%m.log
   log level = 2
   logging = systemd file
   map to guest = Bad User
   max log size = 1000
   name resolve order = lmhosts bcast host wins
   passdb backend = tdbsam
   security = user
   server role = standalone server
   usershare path = /var/lib/samba/usershare
   usershare allow guests = yes
   usershare max shares = 100
   usershare owner only = yes
   workgroup = WORKGROUP

[homes]
   browseable = no
   comment = Home Directories
   create mask = 0660
   directory mask = 0770
   guest ok = no
   read only = no
   valid users = %S

[winshare]
   browseable = yes
   comment = Share directory
   guest ok = no
   path = /mnt/data/winshare
   read only = no
   force group = +winshare
   valid users = me,ssolnica
  
[symbols]
   browseable = yes
   comment = Symbols
   guest ok = no
   path = /mnt/data/symbols
   read only = no
   valid users = me

I created the Windows user (smbpasswd -a me) and enabled the smb and nmb services (systemctl enable nmb && systemctl enable smb). I configured Samba in standalone server mode, as I did not need any of the AD features (by the way, it’s incredible that you may set up a whole AD in Linux!). When I tried my shares from Windows, the \\mypc.local\me share was working fine, but \\mypc.local\winshare was returning NT_STATUS_ACCESS_DENIED. I stopped the Samba service and ran it manually with the debug level set to 3 (alternatively, you could specify the debug level in the smb.conf file):

# systemctl stop smb

# smbd --no-process-group --foreground -d 3 --debug-stdout

Then, I tried the share in smbclient:

$ smbclient -U me //mypc/winshare 
Password for [WORKGROUP\me]:
Try "help" to get a list of possible commands.
smb: \> ls
NT_STATUS_ACCESS_DENIED listing \*

The error reported by Samba pointed to the file system, so I restarted the service and attached strace to it. Make sure to trace the child processes (-f/-ff), as the primary Samba server launches a child server for each client session:

strace -p 4350 -ff -o smbd.strace

Here is some interesting content from the output file:

...
readlink("/mnt/data/winshare", 0x7ffe77011d00, 1023) = -1 EINVAL (Invalid argument)
setgroups(12, [956, 1000, 998, 991, 3, 90, 98, 1001, 962, 961, 150, 1002]) = 0
setresgid(-1, 1000, -1)                 = 0
getegid()                               = 1000
setresuid(1000, 1000, -1)               = 0
geteuid()                               = 1000
chdir("/mnt/data/winshare")             = 0
newfstatat(AT_FDCWD, ".", {st_mode=S_IFDIR|S_ISGID|0770, st_size=4096, ...}, 0) = 0
getcwd("/mnt/data/winshare", 4096)      = 19
getcwd("/mnt/data/winshare", 1024)      = 19
openat(AT_FDCWD, ".", O_RDONLY|O_NOFOLLOW|O_PATH) = 12
newfstatat(12, "", {st_mode=S_IFDIR|S_ISGID|0770, st_size=4096, ...}, AT_EMPTY_PATH) = 0
openat(12, ".", O_RDONLY|O_NOFOLLOW|O_PATH) = 26
newfstatat(26, "", {st_mode=S_IFDIR|S_ISGID|0770, st_size=4096, ...}, AT_EMPTY_PATH) = 0
newfstatat(25, "", {st_mode=S_IFREG|0600, st_size=45056, ...}, AT_EMPTY_PATH) = 0
munmap(0x7f4b82f0c000, 696)             = 0
mmap(NULL, 36864, PROT_READ|PROT_WRITE, MAP_SHARED, 25, 0x2000) = 0x7f4b82e63000
openat(AT_FDCWD, "/proc/self/fd/26", O_RDONLY|O_DIRECTORY) = -1 EACCES (Permission denied)
close(26)                               = 0
...

We can see that the Samba process switches the effective user and group to the authenticated user (me) and then performs actions on the file system. We can also see in the trace that the openat syscall fails with the EACCES error. I double-checked all the file system permissions and made me the owner of the winshare folder. Still, the EACCES error persisted. I was so confused that I even wrote a simple app to reproduce the syscalls above:

#include <iostream>
#include <array>
#include <sstream>

#include <unistd.h>
#include <grp.h>
#include <fcntl.h>
#include <errno.h>

int main(int argc, char* argv[]) {
    std::cout << "euid: " << ::geteuid() << std::endl;
    std::cout << "egid: " << ::getegid() << std::endl;

    std::array<gid_t, 12> groups {956, 1000, 998, 991, 3, 90, 98, 1001, 962, 961, 150, 1002};
    if (::setgroups(groups.size(), groups.data()) != 0) {
        std::cout << "setgroups error: " << errno << std::endl;
        return 2;
    }

    if (int err = ::setresgid(-1, 1000, -1); err != 0) {
        std::cout << "error: " << err << std::endl;
        return err;
    }

    if (int err = ::setresuid(1000, 1000, -1); err != 0) {
        std::cout << "error: " << err << std::endl;
        return err;
    }

    std::cout << "euid: " << ::geteuid() << std::endl;
    std::cout << "egid: " << ::getegid() << std::endl;

    if (int err = ::chdir("/mnt/data/winshare"); err != 0) {
        std::cout << "error: " << err << std::endl;
        return err;
    }

    std::array<char, 1024> cwd{};
    if (::getcwd(cwd.data(), cwd.size()) == nullptr) {
        std::cout << "getcwd error: " << errno << std::endl;
        return -1;
    }
    std::cout << "cwd: " << cwd.data() << std::endl;

    // strace: openat(AT_FDCWD, ".", O_RDONLY|O_NOFOLLOW|O_PATH) = 12
    if (int fd = ::openat64(AT_FDCWD, ".", O_RDONLY|O_NOFOLLOW|O_PATH); fd != -1) {
        std::cout << "Folder opened: " << fd << std::endl;

        // strace: openat(AT_FDCWD, "/proc/self/fd/26", O_RDONLY|O_DIRECTORY) = -1 EACCES (Permission denied)
        std::stringstream ss{};
        ss << "/proc/self/fd/" << fd;
        auto proc_path = ss.str();
        if (int proc_fd = ::openat64(AT_FDCWD, proc_path.c_str(), O_RDONLY|O_DIRECTORY); proc_fd != -1) {
            std::cout << "Proc folder opened: " << proc_fd << std::endl;

            std::cin >> proc_path;

            ::close(proc_fd);
        } else {
            std::cout << "proc openat error: " << errno << std::endl;
        }

        ::close(fd);
        return 0;
    } else {
        std::cout << "openat error: " << errno << std::endl;
        return -1;
    }
}

As you may guess, there was no error when I ran it. I scratched my head, looking online for similar issues, but found nothing. As I had a lot of pending work, I started using the \\mypc.local\me share. Samba worked fine except for two issues: it was impossible to list the browseable shares from the Windows machines, and the initial I/O requests over Samba were often very slow. Still, the initial problem was bugging me the most.

After a few weeks, I finally found some time to give it a second try.

Filesystem security checks are not the only ones

I again struggled with the Samba config (I read the whole smb.conf man page! :)), but ended up back in strace. As I had my sample application working, I started comparing the process properties in the proc file system. And there, I discovered the attr folder, which stores various security-related attributes. The /proc/{pid}/attr/current file for my sample process contained unconfined, while for the smbd process, its content was smbd (enforce). After searching through the manual pages and the Arch Linux wiki, I found that those settings come from the AppArmor module. The aa-status command only confirmed that:

# aa-status
apparmor module is loaded.
80 profiles are loaded.
77 profiles are in enforce mode.
...
   samba-dcerpcd
   samba-rpcd
   samba-rpcd-classic
   samba-rpcd-spoolss
   smbd
...
9 processes are in enforce mode.
   /usr/bin/avahi-daemon (1479) avahi-daemon
   /usr/bin/avahi-daemon (1489) avahi-daemon
   /usr/bin/dnsmasq (1698) dnsmasq
   /usr/bin/dnsmasq (1699) dnsmasq
   /usr/bin/nmbd (1778) nmbd
   /usr/bin/smbd (1785) smbd
   /usr/bin/smbd (1787) smbd
   /usr/bin/smbd (1788) smbd
   /usr/bin/smbd (5225) smbd
...

Now, I needed to locate the problematic AppArmor profiles. But how to find their names? Obviously, in the system journal! I should have checked it at the very beginning. I was studying the smb unit logs while all the details were at my fingertips:

# journalctl -fx
...
lis 06 12:19:14 mypc audit[5535]: AVC apparmor="DENIED" operation="open" profile="smbd" name="/mnt/data/winshare/" pid=5535 comm="smbd" requested_mask="r" denied_mask="r" fsuid=1000 ouid=1000
...

The smbd profile, defined in /etc/apparmor.d/usr.sbin.smbd, denies access to my target folder. Let’s have a look at it (I left only the essential parts):

abi <abi/3.0>,

include <tunables/global>

profile smbd /usr/{bin,sbin}/smbd {
  ...

  /etc/mtab r,
  /etc/netgroup r,
  /etc/printcap r,
  /etc/samba/* rwk,
  @{PROC}/@{pid}/mounts r,
  @{PROC}/sys/kernel/core_pattern r,
  /usr/lib*/samba/vfs/*.so mr,
  /usr/lib*/samba/auth/*.so mr,
  /usr/lib*/samba/charset/*.so mr,
  /usr/lib*/samba/gensec/*.so mr,
  /usr/lib*/samba/pdb/*.so mr,
  /usr/lib*/samba/{,samba/}samba-bgqd Px -> samba-bgqd,
  /usr/lib*/samba/{,samba/}samba-dcerpcd Px -> samba-dcerpcd,
  /usr/lib*/samba/{lowcase,upcase,valid}.dat r,
  /usr/lib/@{multiarch}/samba/*.so{,.[0-9]*} mr,
  /usr/lib/@{multiarch}/samba/**/ r,
  /usr/lib/@{multiarch}/samba/**/*.so{,.[0-9]*} mr,
  /usr/share/samba/** r,
  /usr/{bin,sbin}/smbd mr,
  /usr/{bin,sbin}/smbldap-useradd Px,
  /var/cache/samba/** rwk,
  /var/{cache,lib}/samba/printing/printers.tdb mrw,
  /var/lib/samba/** rwk,
  /var/lib/sss/pubconf/kdcinfo.* r,
  @{run}/dbus/system_bus_socket rw,
  @{run}/smbd.pid rwk,
  @{run}/samba/** rk,
  @{run}/samba/ncalrpc/ rw,
  @{run}/samba/ncalrpc/** rw,
  @{run}/samba/smbd.pid rw,
  /var/spool/samba/** rw,

  @{HOMEDIRS}/** lrwk,
  /var/lib/samba/usershares/{,**} lrwk,

  # Permissions for all configured shares (file autogenerated by
  # update-apparmor-samba-profile on service startup on Debian and openSUSE)
  include if exists <samba/smbd-shares>
  include if exists <local/usr.sbin.smbd-shares>

  # Site-specific additions and overrides. See local/README for details.
  include if exists <local/usr.sbin.smbd>

Now, all is clear. AppArmor adds MAC (Mandatory Access Control) checks to the Samba processes and interferes with the file system access checks. My share path (/mnt/data/winshare) was not in the AppArmor profile; thus, access was denied. I believe that Debian and openSUSE users might not experience this problem thanks to the update-apparmor-samba-profile script, but I haven’t had a chance to check it. Anyway, the solution for me was to create /etc/apparmor.d/local/usr.sbin.smbd-shares with the missing access rights (I will have more shares on the data drive, so I just gave access to /mnt/data).

While testing my shares with the system journal monitored, I discovered a few more rules missing from the default AppArmor profiles. And I found that I wasn’t the only one with this problem: Inglebard reported a very similar issue and provided updates to the rules that worked for him. I added a comment with my findings. Finally, below are the updates that fixed all my problems with Samba.

$ cat /etc/apparmor.d/local/usr.sbin.smbd-shares
/mnt/data/** lrwk,
$ cat /etc/apparmor.d/local/samba-dcerpcd
# Site-specific additions and overrides for 'samba-dcerpcd'

@{run}/samba-dcerpcd.pid lrwk,

/var/cache/samba/** rwk,

@{HOMEDIRS}/** lrwk,
/var/lib/samba/usershares/{,**} lrwk,

include if exists <samba/smbd-shares>
include if exists <usr.sbin.smbd-shares>
$ cat /etc/apparmor.d/local/samba-rpcd
# Site-specific additions and overrides for 'samba-rpcd'

/var/cache/samba/** rwk,

@{HOMEDIRS}/** lrwk,
/var/lib/samba/usershares/{,**} lrwk,

include if exists <samba/smbd-shares>
include if exists <usr.sbin.smbd-shares>
$ cat /etc/apparmor.d/local/samba-rpcd-classic
# Site-specific additions and overrides for 'samba-rpcd-classic'

/var/cache/samba/** rwk,
/dev/urandom rwk,

@{HOMEDIRS}/** lrwk,
/var/lib/samba/usershares/{,**} lrwk,

include if exists <samba/smbd-shares>
include if exists <usr.sbin.smbd-shares>

COM+ revisited

More than ten years ago (how time flies!), when I published the basic sample of a COM+ server and client, I thought that I wouldn’t be touching this subject again. But here we are, in 2022, and I have so much interaction with COM at work that I decided to write a new, updated, and a bit more detailed post about this technology 😁 I don’t want to convince you to use COM as the backbone for your new applications. Instead, I want to show you how you may approach and use COM APIs if you need to work with them. We will also do some COM debugging in WinDbg. Additionally, I plan to release a new COM troubleshooting tool as part of the wtrace toolkit. Remember to subscribe to wtrace updates if you’re interested.

Today’s post will continue using the old Protoss COM classes, but we will update the code with various modern ideas. As you may remember, the Nexus and Probe classes represent Blizzard’s Starcraft game objects. Nexus is a building that may produce Probes (the CreateUnit method in the INexus interface), and a Probe may build various structures, including Nexuses (the ConstructBuilding method in the IProbe interface). I also added a new IGameObject interface, shared by Nexus and Probe, that returns the cost in minerals and the time needed to build a given game object. In IDL, it looks as follows:

[object, uuid(59644217-3e52-4202-ba49-f473590cc61a)]
interface IGameObject : IUnknown
{
    [propget]
    HRESULT Name([out, retval] BSTR* name);

    [propget]
    HRESULT Minerals([out, retval]LONG* minerals);

    [propget]
    HRESULT BuildTime([out, retval]LONG* buildtime);
}

I also added a type library to the IDL:

[
    uuid(0332a9ab-e3bb-4042-bc6a-b98aebd6532d),
    version(1.0),
    helpstring("Protoss 1.0 Type Library")
    ]
library ProtossLib
{
    importlib("stdole2.tlb");

    interface INexus;
    interface IProbe;

    [
        uuid(F5353C58-CFD9-4204-8D92-D274C7578B53),
        helpstring("Nexus Class")
        ]
    coclass Nexus {
        [default] interface INexus;
        interface IGameObject;
    }

    [
        uuid(EFF8970E-C50F-45E0-9284-291CE5A6F771),
        helpstring("Probe Class")
        ]
    coclass Probe {
        [default] interface IProbe;
        interface IGameObject;
    }
}

If we run midl.exe after this change, it will generate a type library file (protoss.tlb). The type library provides a language-agnostic way to access COM metadata. For example, we may import it into a .NET assembly using the tlbimp.exe tool from the .NET Framework SDK.
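
For example, a command along these lines (the exact options may differ in your setup; the output name is my choice to match the client code shown later) produces the interop assembly:

tlbimp.exe protoss.tlb /out:ProtossLib.dll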

Updating the Protoss COM server

As you remember, the COM server requires a few DLL exports to make its COM classes instantiable. One of them is DllGetClassObject. The DllGetClassObject function from the old post directly constructed the Nexus and Probe objects. The more common approach is to return an IClassFactory instance for each implemented class and let the clients call its CreateInstance method. The clients often do this implicitly by calling the CoCreateInstance or CoCreateInstanceEx functions. These functions first ask for a class factory object and later use it to create a requested class instance. Supporting IClassFactory is straightforward:

STDAPI DllGetClassObject(REFCLSID rclsid, REFIID riid, LPVOID* ppv) {
	if (rclsid == __uuidof(Nexus)) {
		static ProtossObjectClassFactory<Nexus, INexus> factory{};
		return factory.QueryInterface(riid, ppv);
	} 
    if (rclsid == __uuidof(Probe)) {
		static ProtossObjectClassFactory<Probe, IProbe> factory{};
		return factory.QueryInterface(riid, ppv);
	}
	return CLASS_E_CLASSNOTAVAILABLE;
}

The ProtossObjectClassFactory is a class template implementing the IClassFactory interface. I want to bring your attention to the CreateInstance method:

HRESULT __stdcall CreateInstance(IUnknown* pUnkOuter, REFIID riid, void** ppv) override {
    if (pUnkOuter) {
        return CLASS_E_NOAGGREGATION;
    }

    try {
        wil::com_ptr_t<IUnknown> unknown{};
        // attach does not call AddRef (we set ref_count to 1 in COM Objects)
        unknown.attach(static_cast<IT*>(new T()));
        return unknown->QueryInterface(riid, ppv);
    } catch (const std::bad_alloc&) {
        return E_OUTOFMEMORY;
    }
}

It uses the wil::com_ptr_t class. It’s one of the many smart pointers provided by the Windows Implementation Library (WIL). Thanks to wil::com_ptr_t or wil::unique_handle, we no longer need to call the Release or CloseHandle functions explicitly – they are called automatically in the smart pointer destructors. Thus, we free the resources when the pointers go out of scope. WIL and modern C++ really make using RAII with the Windows API straightforward 😁.

One missing piece in the old code was registration. I used .reg files to register the Protoss COM library in the system. It’s not the best way to do so; instead, we should implement the DllRegisterServer and DllUnregisterServer functions so that clients may register and unregister our library with the regsvr32.exe tool. The code presented below is based on a sample from the great Windows 10 System Programming book by Pavel Yosifovich. Only in my version, I used WIL, and you may quickly see its usage benefits when you look at the original version (for example, no calls to CloseHandle and no error checks thanks to the WIL result macros):

std::array<std::tuple<std::wstring_view, std::wstring, std::wstring>, 2> coclasses{
	std::tuple<std::wstring_view, std::wstring, std::wstring> { L"Protoss Nexus", wstring_from_guid(__uuidof(Nexus)), L"Protoss.Nexus.1" },
	std::tuple<std::wstring_view, std::wstring, std::wstring> { L"Protoss Probe", wstring_from_guid(__uuidof(Probe)), L"Protoss.Probe.1" },
};

STDAPI DllRegisterServer() {
	auto create_reg_subkey_with_value = [](HANDLE transaction, HKEY regkey, std::wstring_view subkey_name, std::wstring_view subkey_value) {
		wil::unique_hkey subkey{};
		RETURN_IF_WIN32_ERROR(::RegCreateKeyTransacted(regkey, subkey_name.data(), 0, nullptr, REG_OPTION_NON_VOLATILE,
			KEY_WRITE, nullptr, subkey.put(), nullptr, transaction, nullptr));
		RETURN_IF_WIN32_ERROR(::RegSetValueEx(subkey.get(), nullptr, 0, REG_SZ,
			reinterpret_cast<const BYTE*>(subkey_value.data()), static_cast<DWORD>((subkey_value.size() + 1) * sizeof(wchar_t))));

		return S_OK;
	};

	wil::unique_handle transaction{ ::CreateTransaction(nullptr, nullptr, TRANSACTION_DO_NOT_PROMOTE, 0, 0, INFINITE, nullptr) };
	RETURN_LAST_ERROR_IF(!transaction.is_valid());

	for (const auto& coclass : coclasses) {
		auto name{ std::get<0>(coclass) };
		auto clsid{ std::get<1>(coclass) };
		auto progId{ std::get<2>(coclass) };

		wil::unique_hkey regkey{};
		// CLSID
		RETURN_IF_WIN32_ERROR(::RegCreateKeyTransacted(HKEY_CLASSES_ROOT, (L"CLSID\\" + clsid).c_str(),
			0, nullptr, REG_OPTION_NON_VOLATILE, KEY_WRITE, nullptr, regkey.put(), nullptr, transaction.get(), nullptr));
		RETURN_IF_WIN32_ERROR(::RegSetValueEx(regkey.get(), L"", 0, REG_SZ,
			reinterpret_cast<const BYTE*>(name.data()), static_cast<DWORD>((name.size() + 1) * sizeof(wchar_t))));

		RETURN_IF_FAILED(create_reg_subkey_with_value(transaction.get(), regkey.get(), L"InprocServer32", dll_path));
		RETURN_IF_FAILED(create_reg_subkey_with_value(transaction.get(), regkey.get(), L"ProgID", progId));

		// ProgID
		RETURN_IF_WIN32_ERROR(::RegCreateKeyTransacted(HKEY_CLASSES_ROOT, progId.c_str(),
			0, nullptr, REG_OPTION_NON_VOLATILE, KEY_WRITE, nullptr, regkey.put(), nullptr, transaction.get(), nullptr));
		RETURN_IF_WIN32_ERROR(::RegSetValueEx(regkey.get(), L"", 0, REG_SZ,
			reinterpret_cast<const BYTE*>(name.data()), static_cast<DWORD>((name.size() + 1) * sizeof(wchar_t))));

		RETURN_IF_FAILED(create_reg_subkey_with_value(transaction.get(), regkey.get(), L"CLSID", clsid));
	}

	RETURN_IF_WIN32_BOOL_FALSE(::CommitTransaction(transaction.get()));

	return S_OK;
}

As you may have noticed, I also added the registration of ProgIDs (Protoss.Nexus.1 and Protoss.Probe.1), which are human-friendly names for our COM classes. With these functions implemented, registering our COM classes is now a matter of calling regsvr32.exe protoss.dll from the administrator’s command line.
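
As a quick smoke test of the ProgID registration, we may resolve it from C# (an illustrative snippet, not part of the repository; remember that an in-process 32-bit server requires a 32-bit client process):

using System;

// resolve the ProgID to the registered CLSID and create the COM object
var probeType = Type.GetTypeFromProgID("Protoss.Probe.1", throwOnError: true)!;
object probe = Activator.CreateInstance(probeType)!;
Console.WriteLine($"Created COM object: {probe}");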

Updating the Protoss COM client

Thanks to the type library, we no longer need to explicitly generate and include the header files; we may import the type library directly into the source code. The #import directive that we use for this purpose has several attributes controlling the representation of the type library in C++. For example, in the Protoss COM client, I’m using the raw_interfaces_only attribute, as I want to work with the Protoss interfaces directly, using the WIL com_ptr_t smart pointers. Our COM server supports IClassFactory, so we may call the CoCreateInstance function to create an instance of the Nexus class:

#include <iostream>

#include <Windows.h>
#include <wil/com.h>

#import "..\protoss.tlb" raw_interfaces_only

using namespace ProtossLib;

HRESULT show_game_unit_data(IUnknown* unknwn) {
    wil::com_ptr_t<IGameObject> unit{};
    RETURN_IF_FAILED(unknwn->QueryInterface(unit.put()));

    wil::unique_bstr name{};
    RETURN_IF_FAILED(unit->get_Name(name.put()));
    LONG minerals;
    RETURN_IF_FAILED(unit->get_Minerals(&minerals));
    LONG buildtime;
    RETURN_IF_FAILED(unit->get_BuildTime(&buildtime));

    std::wcout << L"Name: " << name.get() << L", minerals: " << minerals
        << L", build time: " << buildtime << std::endl;

    return S_OK;
}

void start_from_probe() {
	wil::com_ptr_t<IProbe> probe{};

	THROW_IF_FAILED(::CoCreateInstance(__uuidof(Probe), nullptr, CLSCTX_INPROC_SERVER, __uuidof(IProbe), probe.put_void()));
	THROW_IF_FAILED(show_game_unit_data(probe.get()));

	auto name{ wil::make_bstr(L"Nexus") };
	wil::com_ptr_t<INexus> nexus{};
	THROW_IF_FAILED(probe->ConstructBuilding(name.get(), nexus.put_unknown()));
	THROW_IF_FAILED(show_game_unit_data(nexus.get()));
}

int main(int argc, char* argv[]) {
    UNREFERENCED_PARAMETER(argc);
    UNREFERENCED_PARAMETER(argv);

    try {
        // a "smart call object" that will execute CoUninitialize in its destructor
        auto runtime{ wil::CoInitializeEx(COINIT_APARTMENTTHREADED) };

        start_from_probe();

        return 0;
    } catch (const wil::ResultException& ex) {
        std::cout << ex.what() << std::endl;
        return 1;
    } catch (const std::exception& ex) {
        std::cout << ex.what() << std::endl;
        return 1;
    }
}

If you run the client, you should see the calls to the QueryInterface method and logs from constructors and destructors in the console:

Component: Nexus::QueryInterface: 246a22d5-cf02-44b2-bf09-aab95a34e0cf
Component: Probe::AddRef() ref_count = 2
Component: Probe::Release() ref_count = 1
Component: Probe::AddRef() ref_count = 2
Component: Probe::Release() ref_count = 1
Component: Nexus::QueryInterface: 246a22d5-cf02-44b2-bf09-aab95a34e0cf
Component: Probe::AddRef() ref_count = 2
Component: Probe::Release() ref_count = 1
Component: Nexus::QueryInterface: 59644217-3e52-4202-ba49-f473590cc61a
Component: Probe::AddRef() ref_count = 2
Name: Probe, minerals: 50, build time: 12
Component: Probe::Release() ref_count = 1
Component: Nexus::QueryInterface: 59644217-3e52-4202-ba49-f473590cc61a
Component: Nexus::AddRef() ref_count = 2
Name: Nexus, minerals: 400, build time: 120
Component: Nexus::Release() ref_count = 1
Component: Nexus::Release() ref_count = 0
Component: Nexus::~Nexus()
Component: Probe::Release() ref_count = 0
Component: Probe::~Probe()

We can see that all class instances are eventually freed, so, hurray 🎉, we aren’t leaking any memory!

If you’d like to practice writing COM client code, you may implement a start_from_nexus function to output the same information, but create the Nexus class first. Don’t look at the client code in the repository, as this function is already there 😊

C++ is not the only language in which we can write a COM client. Let’s now implement the same logic in C#. I picked C# not without reason: .NET Runtime provides excellent support for working with native COM objects. Each COM class receives a Runtime Callable Wrapper (RCW) that makes it look like any other .NET class. Now, you can imagine the number of magic layers needed to make that happen. So, it is no surprise that you may sometimes need to wear a magical debugging hat to resolve a problem in COM interop 😅 But if you look at the code, it’s effortless:

using ProtossLib;

public static class Program
{
    static void ShowGameUnitData(IGameObject go)
    {
        Console.WriteLine($"Name: {go.Name}, minerals: {go.Minerals}, build time: {go.BuildTime}");
    }

    static void StartingFromProbe()
    {
        var probe = new Probe();
        ShowGameUnitData((IGameObject)probe);

        var nexus = probe.ConstructBuilding("Nexus");
        ShowGameUnitData((IGameObject)nexus);

        //_ = Marshal.ReleaseComObject(nexus);
        //_ = Marshal.ReleaseComObject(probe);
    }

    [STAThread]
    static void Main()
    {
        StartingFromProbe();

        // force release of the COM objects
        GC.Collect();
    }
}

If you decompile the ProtossLib.dll assembly, you will discover that Probe is, in fact, an interface with a CoClass attribute. And although it does not implement IGameObject, we may cast it to IGameObject. Magical, isn’t it? 😊 Mixed-mode debugging helps a lot when debugging COM interop in .NET. For example, if you set a breakpoint on the QueryInterface method in the Probe class, you will discover that it is called when you cast the managed Probe instance to IGameObject.

Debugging COM in WinDbg

In this section, I want to focus on debugging COM servers and clients in WinDbg. I will show you some commands, hoping they will also be helpful in your own COM troubleshooting.

Let’s start with a breakpoint on the typical entry point for creating COM objects, i.e., the CoCreateInstance function (if the COM client does not use CoCreateInstance, you may set a breakpoint on the CoGetClassObject function):

HRESULT CoCreateInstance(
  [in]  REFCLSID  rclsid,
  [in]  LPUNKNOWN pUnkOuter,
  [in]  DWORD     dwClsContext,
  [in]  REFIID    riid,
  [out] LPVOID    *ppv
);

Our goal is to print the function parameters (the CLSID, the IID, and the object address), so we know which object the client creates. If we have combase.dll private symbols, it’s a matter of calling the dv command. Otherwise, we need to rely on the dt command. For 32-bit processes, I usually create the CoCreateInstance breakpoint as follows:

bp combase!CoCreateInstance "dps @esp L8; dt ntdll!_GUID poi(@esp + 4); dt ntdll!_GUID poi(@esp + 10); .printf /D \"==> obj addr: %p\", poi(@esp+14);.echo; bp /1 @$ra; g"

And the 64-bit version is:

bp combase!CoCreateInstance "dps @rsp L8; dt ntdll!_GUID @rcx; dt ntdll!_GUID @r9; .printf /D \"==> obj addr: %p\", poi(@rsp+28);.echo; bp /1 @$ra; g"

I’m using bp /1 @$ra; g to break at the moment when the function returns. I didn’t want to use, for example, gu, because one CoCreateInstance may call another CoCreateInstance, and one-time breakpoints are more reliable in such situations. An example 32-bit breakpoint hit might look as follows (notice that when we have private symbols, the dps command nicely prints the GUIDs):

009cfe00  008c36ae ProtossComClient!main+0x6e
009cfe04  008c750c ProtossComClient!_GUID_eff8970e_c50f_45e0_9284_291ce5a6f771
009cfe08  00000000
009cfe0c  00000001
009cfe10  008c74b4 ProtossComClient!_GUID_246a22d5_cf02_44b2_bf09_aab95a34e0cf
009cfe14  009cfe3c
009cfe18  36e9dfe6
009cfe1c  00e8b3e0
 {eff8970e-c50f-45e0-9284-291ce5a6f771}
   +0x000 Data1            : 0xeff8970e
   +0x004 Data2            : 0xc50f
   +0x006 Data3            : 0x45e0
   +0x008 Data4            : [8]  "???"
 {246a22d5-cf02-44b2-bf09-aab95a34e0cf}
   +0x000 Data1            : 0x246a22d5
   +0x004 Data2            : 0xcf02
   +0x006 Data3            : 0x44b2
   +0x008 Data4            : [8]  "???"
==> obj addr: 009cfe3c
ModLoad: 76fb0000 7702e000   C:\Windows\System32\clbcatq.dll
ModLoad: 618b0000 618b9000   C:\Windows\SYSTEM32\ktmw32.dll
ModLoad: 76df0000 76e66000   C:\Windows\System32\sechost.dll
ModLoad: 75c40000 75cbb000   C:\Windows\System32\ADVAPI32.dll
ModLoad: 031a0000 031ae000   C:\Users\me\repos\protoss-com-example\Release\protoss.dll
Breakpoint 1 hit
eax=00000000 ebx=00628000 ecx=00e84ea0 edx=00000000 esi=00e84310 edi=00e8b3e0
eip=008c36ae esp=009cfe18 ebp=009cfe58 iopl=0         nv up ei pl nz na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000206
ProtossComClient!start_from_probe+0x23 [inlined in ProtossComClient!main+0x6e]:
008c36ae 8b4d04          mov     ecx,dword ptr [ebp+4] ss:002b:009cfe5c=008c56b1

In the output, we can find the CLSID (eff8970e-c50f-45e0-9284-291ce5a6f771), the IID (246a22d5-cf02-44b2-bf09-aab95a34e0cf), and the created object address: 009cfe3c. Before we start examining it, we need to check the returned status code. We can do that with the !error @$retreg command (or by looking at the eax/rax register). If it’s 0 (S_OK), we may set breakpoints on the returned object’s methods. As each COM object implements at least one interface (virtual class), it will have at least one virtual method table. Thanks to the CoCreateInstance breakpoint, we know the queried IID, and we may find the interface’s method list in the associated type library. If we don’t have access to the type library (or our IID is IID_IUnknown), we may still learn something about the object by placing breakpoints on the IUnknown interface methods (as you remember, all COM interfaces need to implement it):

struct IUnknown
{
    virtual HRESULT STDMETHODCALLTYPE QueryInterface( 
        /* [in] */ REFIID riid,
        /* [iid_is][out] */ _COM_Outptr_ void __RPC_FAR *__RPC_FAR *ppvObject) = 0;

    virtual ULONG STDMETHODCALLTYPE AddRef( void) = 0;

    virtual ULONG STDMETHODCALLTYPE Release( void) = 0;
};

The breakpoint is very similar to what we did for CoCreateInstance. The code snippet below presents the 32- and 64-bit versions:

bp 031a6160 "dt ntdll!_GUID poi(@esp + 8); .printf /D \"==> obj addr: %p\", poi(@esp + C);.echo; bp /1 @$ra; g"

bp 00007ffe`1c751e6a "dt ntdll!_GUID @rdx; .printf /D \"==> obj addr: %p\", @r8;.echo; bp /1 @$ra; g"

Let me show you how I got the address of the QueryInterface function for the 32-bit breakpoint (031a6160). The first four bytes at the object address (009cfe3c) point to the virtual method table. We may find the vtable address by calling dpp 009cfe3c L1:

0:000> dpp 009cfe3c L1
009cfe3c  00e84ea0 031a860c protoss!Probe::`vftable'

We can now dump the content of the vtable:

0:000> dps 031a860c L4
031a860c  031a6160 protoss!Probe::QueryInterface
031a8610  031a6070 protoss!Probe::AddRef
031a8614  031a60b0 protoss!Probe::Release
031a8618  031a6260 protoss!Probe::ConstructBuilding

I knew that the IProbe interface (246A22D5-CF02-44B2-BF09-AAB95A34E0CF) has four methods (the first three coming from the IUnknown interface). Without this knowledge, I would have printed only the first three methods (QueryInterface, AddRef, and Release).

On each QueryInterface return, we may again examine the status code and returned object. The output below presents a QueryInterface hit for an IProbe instance. Let’s spend a moment analyzing it:

 {59644217-3e52-4202-ba49-f473590cc61a}
   +0x000 Data1            : 0x59644217
   +0x004 Data2            : 0x3e52
   +0x006 Data3            : 0x4202
   +0x008 Data4            : [8]  "???"
==> obj addr: 009cfe00
Breakpoint 2 hit
eax=00000000 ebx=00628000 ecx=5a444978 edx=00000000 esi=00e84310 edi=00e8b3e0
eip=008c34f6 esp=009cfdec ebp=009cfe10 iopl=0         nv up ei pl zr na pe nc
cs=0023  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00000246
ProtossComClient!show_game_unit_data+0x46:
008c34f6 8bf0            mov     esi,eax

The 59644217-3e52-4202-ba49-f473590cc61a GUID represents the IGameObject interface. If you scroll up to the class definitions, you will find that it’s the second interface that the Probe class implements. The vtable at the object address looks as follows:

0:000> dpp 009cfe00 L1
009cfe00  00e84ea4 031a8620 protoss!Probe::`vftable'
0:000> dps 031a8620 L6
031a8620  031a5c40 protoss![thunk]:Probe::QueryInterface`adjustor{4}'
031a8624  031a5c72 protoss![thunk]:Probe::AddRef`adjustor{4}'
031a8628  031a5c4a protoss![thunk]:Probe::Release`adjustor{4}'
031a862c  031a36f0 protoss!Probe::get_Name
031a8630  031a3720 protoss!Probe::get_Minerals
031a8634  031a3740 protoss!Probe::get_BuildTime

You may now be wondering what the adjustor methods are. If we disassemble any of them, we will find some interesting assembly code:

0:000> u 031a5c40
protoss![thunk]:Probe::QueryInterface`adjustor{4}':
031a5c40 836c240404      sub     dword ptr [esp+4],4
031a5c45 e916050000      jmp     protoss!Probe::QueryInterface (031a6160)

To better understand what’s going on here, let’s put the last dpp commands (after CoCreateInstance and QueryInterface) next to each other:

0:000> dpp 009cfe3c L1
009cfe3c 00e84ea0 031a860c protoss!Probe::`vftable' <- CoCreateInstance
0:000> dpp 009cfe00 L1
009cfe00 00e84ea4 031a8620 protoss!Probe::`vftable' <- QueryInterface

In the above output, we can see that QueryInterface for IProbe (called by CoCreateInstance) sets the object pointer to the address 00e84ea0, while QueryInterface for IGameObject sets the object pointer to the address 00e84ea4 (four bytes further). And both calls were made on the same instance of the Probe class. Looking at the QueryInterface source code, we can see that this difference is caused by a static_cast:

HRESULT __stdcall Probe::QueryInterface(REFIID riid, void** ppv) {
	std::cout << "Component: Nexus::QueryInterface: " << riid << std::endl;

	if (riid == IID_IUnknown || riid == __uuidof(IProbe)) {
		*ppv = static_cast<IProbe*>(this);
	} else if (riid == __uuidof(IGameObject)) {
		*ppv = static_cast<IGameObject*>(this);
	} else {
		*ppv = NULL;
		return E_NOINTERFACE;
	}
	AddRef();
	return S_OK;
}

The instruction *ppv = static_cast<IProbe*>(this) is here equivalent to *ppv = this, as IProbe is the default (first) interface of the Probe class, and a pointer to its vtable occupies the first four bytes of the Probe instance memory. IGameObject is the second interface, and a pointer to its vtable occupies the next four bytes of the Probe instance memory. After these two vtable pointers, we can find the fields of the Probe class. I drew the diagram below to better visualize these concepts:

So, what are those adjustors in the IGameObject vtable? Adjustors allow the compiler to reuse the IUnknown methods already compiled for the IProbe implementation. The only problem with such reuse is that the methods implementing IProbe expect this to point to the beginning of the Probe class instance. So we can’t simply use their addresses in the IGameObject vtable – we first need to adjust the this pointer. And that’s what the sub dword ptr [esp+4],4 instruction is doing (in the stdcall convention, this is the first stack argument, located just above the return address). Then, we can safely jump to IProbe’s QueryInterface implementation, and everything will work as expected.

To end the vtables discussion, I have one more WinDbg script for you:

.for (r $t0 = 0; @$t0 < N; r $t0= @$t0 + 1) { bp poi(VTABLE_ADDRESS + @$t0 * @$ptrsize) }

This script sets breakpoints on the first N methods of a given vtable (replace N with any number you need). For example, to break on all the methods of the IGameObject interface, I would run:

.for (r $t0 = 0; @$t0 < 6; r $t0= @$t0 + 1) { bp poi(031a8620 + @$t0 * @$ptrsize) }

We may also track COM objects from a specific DLL. When the application loads the target DLL, we need to set a breakpoint on the exported DllGetClassObject function. For example, let’s debug what is happening when we call CoCreateInstance for the Probe COM object. We start by setting a break on the protoss.dll load:

0:000> sxe ld:protoss.dll
0:000> g
...
ModLoad: 66c90000 66cd4000   C:\temp\protoss-com-example\Debug\protoss.dll

Next, we set a breakpoint on the protoss!DllGetClassObject function and wait for it to hit:

0:000> bp protoss!DllGetClassObject "dps @esp L8; dt ntdll!_GUID poi(@esp + 4); dt ntdll!_GUID poi(@esp + 8); .printf /D \"==> obj addr: %p\", poi(@esp+c);.echo; bp /1 @$ra; g"
0:000> g
009cea10  75d6b731 combase!CClassCache::CDllPathEntry::GetClassObject+0x5a [onecore\com\combase\objact\dllcache.cxx @ 2581]
009cea14  00e9f354
009cea18  75ce84c8 combase!IID_IClassFactory
009cea1c  009cec40
009cea20  00000000
009cea24  00e9b3f8
009cea28  75ce84c8 combase!IID_IClassFactory
009cea2c  00e9f354
 {eff8970e-c50f-45e0-9284-291ce5a6f771}
   +0x000 Data1            : 0xeff8970e
   +0x004 Data2            : 0xc50f
   +0x006 Data3            : 0x45e0
   +0x008 Data4            : [8]  "???"
 {00000001-0000-0000-c000-000000000046}
   +0x000 Data1            : 1
   +0x004 Data2            : 0
   +0x006 Data3            : 0
   +0x008 Data4            : [8]  "???"
==> obj addr: 009cec40
Breakpoint 1 hit

We can see that CoCreateInstance uses the Probe class CLSID and asks for the IClassFactory instance. IClassFactory inherits from IUnknown (as do all COM interfaces) and contains only two methods:

struct IClassFactory : public IUnknown
{
    virtual HRESULT STDMETHODCALLTYPE CreateInstance( 
        _In_opt_  IUnknown *pUnkOuter, _In_  REFIID riid, _COM_Outptr_  void **ppvObject) = 0;
    
    virtual HRESULT STDMETHODCALLTYPE LockServer(/* [in] */ BOOL fLock) = 0;
};

Let’s find the CreateInstance method in the class factory vtable (it occupies the fourth slot, right after the three IUnknown methods), set a breakpoint on it, and continue execution:

0:000> dpp 009cec40 L1
009cec40  031ab020 031a863c protoss!ProtossObjectClassFactory<Probe,IProbe>::`vftable'
0:000> dps 031a863c L5
031a863c  031a45e0
031a8640  031a45d0
031a8644  031a45d0
031a8648  031a4500
031a864c  031a44f0
0:000> bp 031a4500 "dt ntdll!_GUID poi(@esp + c); .printf /D \"==> obj addr: %p\", poi(@esp + 10);.echo; bp /1 @$ra; g"
0:000> g
 {246a22d5-cf02-44b2-bf09-aab95a34e0cf}
   +0x000 Data1            : 0x246a22d5
   +0x004 Data2            : 0xcf02
   +0x006 Data3            : 0x44b2
   +0x008 Data4            : [8]  "???"
==> obj addr: 009cec58
Breakpoint 3 hit

Our breakpoint gets hit, and we see that the requested IID equals IID_IProbe, which confirms what I mentioned previously: CoCreateInstance internally uses an IClassFactory instance to create a new Probe class instance.

Finally, when we deal with COM automation and need to decode parameters passed to the IDispatch instance, we may use the dt -r1 combase!tagVARIANT ARG_ADDRESS command. It nicely formats all the VARIANT fields but requires the combase.dll symbols.

We reached the end of this long post, and I hope I haven’t bored you too much 😅 I also hope that the presented materials will help you better understand and troubleshoot COM APIs. The source code of the Protoss COM example is available at https://github.com/lowleveldesign/protoss-com-example.

Until the next time! 👋

New releases of my open-source tools

I made several updates to my open-source tools in the last four weeks, and I also released one new tool. In this post, I will describe those updates briefly, including some discoveries I made along the way.

dotnet-wtrace

The biggest news is the release of dotnet-wtrace, a new tool in the wtrace toolkit. I created it because I could not find a tool that would show the runtime (and not only runtime) events in real time. Dotnet-wtrace does not simply dump the event data but processes it to make the output easily readable. Below, you may see an example screenshot containing GC events from an ASP.NET Core application.

Besides GC events, dotnet-wtrace will display exceptions, loader, ASP.NET Core, EF Core, and network events. The documentation also describes its various filtering capabilities.
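
If you would like to try it, dotnet-wtrace ships as a .NET global tool, so the following command should be all you need (assuming you have the .NET SDK installed and the package keeps its dotnet-wtrace ID on NuGet):

dotnet tool install -g dotnet-wtrace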

Dotnet-wtrace is entirely implemented in F# and relies on the Microsoft.Diagnostics.NETCore.Client and Microsoft.Diagnostics.Tracing.TraceEvent libraries. I must admit I enjoy coding in F# greatly. F# might be a bit hard to grasp at first, especially if you’re new to functional programming, but the benefits of learning it are numerous. Keeping your variables immutable, avoiding nulls, and writing stateless code whenever possible will only make your apps safer. The F# compiler won’t allow implicit conversions and will complain about all unhandled conditions. The list of benefits is much longer 🙂 I also find the F# syntax more concise and consistent than C#’s. If I’ve piqued your interest and you want to experiment with F#, I recommend the Get Programming with F# book by Isaac Abraham – it helped me a lot in the beginning, and its content was more approachable than other materials available on the Internet.

I deviated from the main subject of this post, so let’s get back to dotnet-wtrace. The features I miss most in the current version are call stacks for exceptions and summary statistics. I plan to add them in future releases, so please subscribe to the wtrace newsletter to be the first to try them 🙂

wtrace

I also made minor updates to wtrace, allowing it to trace image loader (issue #15) and UDP events. Additionally, I published a wtrace package to Chocolatey (issue #13), so if you’re using this package manager, you may install wtrace with this simple command: choco install wtrace.

procgov (Process Governor)

There were some significant changes in the procgov tool to implement the feature requested by ba-tno (btw., the issue description is excellent). It is now possible to update the process limits by rerunning procgov with new parameter values. When working on this feature, I discovered an interesting behavior of Windows job objects – when the last handle to the job object is closed, the job disappears from the Object Manager (you can’t open it by name), but its limits still apply to the process. To overcome this problem, I’m now duplicating the created job handle into the target process to keep the job accessible.
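
The sketch below illustrates the idea (my rough approximation, not the actual procgov code; JobHandleKeeper and KeepJobAliveInTarget are made-up names, while DuplicateHandle is the standard kernel32 API):

using System;
using System.ComponentModel;
using System.Diagnostics;
using System.Runtime.InteropServices;

static class JobHandleKeeper
{
    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool DuplicateHandle(IntPtr hSourceProcessHandle, IntPtr hSourceHandle,
        IntPtr hTargetProcessHandle, out IntPtr lpTargetHandle,
        uint dwDesiredAccess, bool bInheritHandle, uint dwOptions);

    const uint DUPLICATE_SAME_ACCESS = 0x2;

    // Intentionally "leak" a duplicate of the job handle into the target process -
    // as long as that process is alive, the job object stays referenced and
    // remains openable by name.
    public static void KeepJobAliveInTarget(IntPtr jobHandle, Process target)
    {
        if (!DuplicateHandle(Process.GetCurrentProcess().Handle, jobHandle,
                target.Handle, out _, 0, false, DUPLICATE_SAME_ACCESS))
        {
            throw new Win32Exception(Marshal.GetLastWin32Error());
        }
    }
}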

I had been planning to refactor the procgov codebase for some time, and I finally found a moment to do that. Thanks to the CsWin32 project, I could remove lots of boilerplate PInvoke code. I was surprised by how smoothly the CsWin32 code generators worked in Visual Studio 2019. Interestingly, the code generators retrieve the signatures from the win32metadata assembly, generated with the help of the ClangSharp project. Some signatures are maybe a bit more complex to use than in my older manual setup, but I prefer to spend a minute longer writing the call instruction than 20 minutes on preparing the PInvoke signature. I also split the stateful, hard-to-maintain ProcessGovernor class into two static (stateless) types: ProcessModule and Win32JobModule, which use a shared SessionSettings object (a lesson learned from functional programming :)). The code is now easier to understand and modify.

It should soon also be possible to install procgov with Chocolatey – its package is awaiting approval. Moreover, I moved the procgov build from Azure DevOps to GitHub Actions and added steps to keep the Chocolatey package in sync with the main repo.

I hope you’ll find the new features and tools helpful, and if you have any ideas for improvements, let me know or create an issue in the tool repository. Thank you.

.NET Diagnostics Expert course

Last week we published the final module of the .NET Diagnostics Expert Course:

I’m excited and happy that it’s finally available. But I’m also relieved as there were times when I thought it would never happen 🙂 In this post, I want to share some details about the course and why I decided to make it.

How it all started

It was my plan for a long time to publish a course on .NET diagnostics. However, I had never found enough time to focus on it, and apart from handling a few free training sessions for the devWarsztaty initiative, I hadn’t made much progress in realizing it. So what changed last year? Firstly, in late summer, I decided to leave Turbo.net and focus on learning and my private projects, most importantly wtrace. I finally published https://wtrace.net and started working on a new wtrace release and other tracing tools. Three weeks passed, and while talking with Szymon, the idea of a .NET diagnostics course emerged. I thought that the timing would never be better, so I drafted the course plan, and Konrad published the https://diagnosticsexpert.com site. Then I prepared two webinars, and Dotnetos started the presale of the course at the beginning of December. I was stressed that the interest would be low and that not many people would find the subject interesting. The presale, however, went very well (thank you all who put trust in me!), giving me some relief.

Making-of

And then the recordings started. At first, I tried recording the lessons as I was presenting the slides. It did not work very well, and every lesson took me hours to complete. My wife took pity on me and browsed tutorials for beginning YouTubers to find that they often use transcripts. And thus, I began writing transcripts 🙂 That made the lesson preparation longer, but the recording time became much more manageable. Unfortunately, transcripts did not work for DEMOs, so they still required hours in front of the microphone. Another problem with DEMOs is that they are not always entirely predictable. Sometimes, during a DEMO, I received an output I hadn’t thought of or, even worse, discovered a bug in the diagnostics tool. Once I had the raw materials ready, Konrad and Paulina reviewed them, and Andrzej processed the videos. Konrad then watched the final recordings, making sure they did not contain any repetitions or other issues.

There were various problems I hit along the way. In the first month, my camera broke. And, as I didn’t want to lose any time, I created a temporary solution:

The positive side of it is that you may see that the course has solid foundations! 🙂 I also spilled tea on my laptop (thankfully, it survived after drying) and corrupted my drive when testing diskspd (entirely my fault – I shouldn’t have been doing it after midnight). Fortunately, the backup worked.

What’s in the course

Having covered the making of the course, let me now describe its content. When preparing the lessons, I focused on practicality, presenting techniques and tools which you may employ to diagnose various .NET (and not only .NET) problems. There are 11 modules in the course:

  • Module 1, “Debugging,” describes the building blocks of a debugger, symbol files management, and features of debuggers, both managed (VS, VS Code) and native (WinDbg, LLDB)
  • Module 2, “Tracing,” focuses on Event Tracing for Windows, Linux tracing APIs (perf, LTTng, eBPF), and .NET Event Pipes
  • Module 3, “Windows and Linux diagnostic sources,” covers various applications to monitor processes on Windows and Linux (including ps, top, htop, Process Hacker)
  • Module 4, “High CPU usage,” describes ways of troubleshooting CPU-related issues, including CPU sampling, thread time (clock time) profiling, and .NET Profiling API
  • Module 5, “Deadlocks,” covers techniques for troubleshooting deadlocks and waits using memory dumps and trace-based wait analysis
  • Module 6, “Network issues – TCP, UDP,” concentrates on monitoring TCP and UDP connections and troubleshooting various connectivity issues (including slow server responses, dropped connections, or intermittent network errors). Apart from typical network tracing tools such as Wireshark or tcpdump, we also use .NET network traces.
  • Module 7, “Network issues – DNS, HTTP(S),” is about troubleshooting issues in higher layer protocols: DNS, HTTP, and TLS. Apart from system tools, I also cover ASP.NET Core and Kestrel logs. In this module, we also implement and use various network proxies to intercept and modify the traffic.
  • Module 8, “Application execution issues,” targets assembly loading issues, exceptions, and production debugging techniques (including system image preparations, automatic dump collection, and remote debugging)
  • Module 9, “Miscellaneous issues,” describes diagnosing memory, File I/O, and some other issues. It also lists final tips and tricks for troubleshooters.
  • Module 10, “Diagnostics logs in the application,” shows how we can interact with the debugger from within the application and how we may publish custom performance traces.
  • Module 11, “Writing custom diagnostics tools,” covers usage of ClrMD, Diagnostics Client, and TraceEvent libraries to write our custom diagnostics tools.

As you may have noticed, the first three modules present general concepts and tools, while the subsequent six modules focus on various diagnostics cases. Finally, the last two modules are about implementing code for diagnosing purposes. Each module ends with homework exercises. I wanted them to be challenging and to resemble problems I have observed in production. And, as it’s a practical course, I spent a lot of time in DEMOs.

Although the course concentrates on .NET, many presented techniques could be employed to troubleshoot native applications or even system problems. I also believe that debugging, tracing, and reading source code (if it’s available, of course) are the best ways to learn how libraries and applications function.

Final words

In the end, I would like to thank Dotnetos, in particular Konrad, for the endless hours he spent reviewing the videos and slides. It was a huge and challenging project that occupied me for the last six months (two months more than initially anticipated :)). But I’m happy with the final result, and I hope that those who decide to participate will enjoy the prepared materials.

Snooping on .NET EventPipes

While playing with EventPipes, I wanted to better understand the Diagnostic IPC Protocol. This protocol is used to transfer diagnostic data between the .NET runtime and a diagnostic client, such as dotnet-trace. When a .NET process starts, the runtime creates the diagnostic endpoint. On Windows, the endpoint is a named pipe; on Unix, it’s a Unix domain socket created in the temp files folder. The endpoint name begins with a ‘dotnet-diagnostic-’ string and then contains the process ID to make it unique. On Unix, the name also includes a timestamp and a ‘-socket’ suffix. Valid example names are dotnet-diagnostic-2675 on Windows and dotnet-diagnostic-2675-2489049-socket on Unix. When you type the ps subcommand in any of the CLI diagnostics tools (for example, dotnet-counters ps), the tool internally lists the endpoints matching the pattern I just described. So, essentially, the following commands are a good approximation of this logic:

# Linux
$ ls /tmp/dotnet-diagnostic-*
/tmp/dotnet-diagnostic-213-11057-socket /tmp/dotnet-diagnostic-2675-2489049-socket
# Windows
PS me> [System.IO.Directory]::GetFiles("\\.\pipe\", "dotnet-diagnostic-*")
\\.\pipe\dotnet-diagnostic-9272
\\.\pipe\dotnet-diagnostic-13372

The code for the .NET process listing is in the ProcessStatus.cs file. After extracting the process ID from the endpoint name, the diagnostics tool creates a Process class instance to retrieve the process name for printing.
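
For illustration, a minimal C# approximation of this listing logic (my sketch, not the actual ProcessStatus.cs code; it assumes Windows, so it scans the named-pipe namespace) could look as follows:

using System;
using System.Diagnostics;
using System.IO;

class DotnetPs
{
    static void Main()
    {
        const string prefix = "dotnet-diagnostic-";
        // on Windows, the diagnostic endpoints live in the named-pipe namespace
        foreach (var pipe in Directory.GetFiles(@"\\.\pipe\", prefix + "*"))
        {
            var name = Path.GetFileName(pipe); // e.g. dotnet-diagnostic-9272
            if (int.TryParse(name.Substring(prefix.Length), out var pid))
            {
                try
                {
                    Console.WriteLine($"{pid,8} {Process.GetProcessById(pid).ProcessName}");
                }
                catch (ArgumentException) { /* the process has already exited */ }
            }
        }
    }
}

Armed with this knowledge, let’s try to intercept the communication between the tracer and the tracee.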

Neither named pipes nor Unix domain sockets provide an API to intercept traffic easily. I started looking for interceptors in either kernel or user mode. I found a few interesting projects (for example, NpEtw), but I also discovered that configuring them would take me lots of time. I then stumbled upon a post using socat to proxy Unix domain socket traffic. I wondered if I could write such a proxy too.

Writing an EventPipes sniffer

The only problem was how to convince the .NET CLI tools to use my proxy. I did some tests, and on Linux, it’s enough to create a Unix domain socket with the same process ID but with the timestamp set to, for example, 1.

Let’s take as an example a .NET process with ID equal to 2675. Its diagnostic endpoint is represented by the /tmp/dotnet-diagnostic-2675-2489049-socket file. In my proxy, I am creating a Unix domain socket with a path /tmp/dotnet-diagnostic-2675-1-socket. The file system will list it first, and dotnet-trace (or any other tool) will pick it up as the endpoint for the process with ID 2675:

The code to create the proxy socket looks as follows:

private static async Task StartProxyUnix(int pid, CancellationToken ct)
{
    var tmp = Path.GetTempPath();
    var snoopedEndpointPath = Directory.GetFiles(tmp, $"dotnet-diagnostic-{pid}-*-socket").First();
    var snoopingEndpointPath = Path.Combine(tmp, $"dotnet-diagnostic-{pid}-1-socket");

    File.Delete(snoopingEndpointPath);

    var endpoint = new UnixDomainSocketEndPoint(snoopingEndpointPath);
    using var listenSocket = new Socket(endpoint.AddressFamily, SocketType.Stream, ProtocolType.Unspecified);
    listenSocket.Bind(endpoint);

    using var r = ct.Register(() => listenSocket.Close());

    try
    {
        var id = 1;
        while (!ct.IsCancellationRequested)
        {
            listenSocket.Listen();

            if (ct.IsCancellationRequested)
            {
                return;
            }

            var socket = await listenSocket.AcceptAsync();
            Console.WriteLine($"[{id}]: s1 connected");

            // client socket that connects to the real (snooped) endpoint
            var senderSocket = new Socket(AddressFamily.Unix, SocketType.Stream, ProtocolType.Unspecified);
            await senderSocket.ConnectAsync(new UnixDomainSocketEndPoint(snoopedEndpointPath));
            Console.WriteLine($"[{id}]: s2 connected");

            _ = SniffData(new NetworkStream(socket, true), new NetworkStream(senderSocket, true), id, ct);
            id += 1;
        }
    }
    catch (SocketException)
    {
        /* cancelled listen */
        Console.WriteLine($"Stopped ({snoopingEndpointPath})");
    }
    finally
    {
        File.Delete(snoopingEndpointPath);
    }
}

On Windows, it’s more complicated as there is no timestamp in the name. Thus, I decided to create a fake diagnostic endpoint that will look like an endpoint of a regular .NET process but, in reality, will be a proxy. Remember that CLI tools also call the Process.GetProcessById method, so the PID in my endpoint name must point to a valid process accessible to the current user. The process must be native, so the diagnostic endpoint name is not already taken. I picked explorer.exe 😊, and to record EventPipes traffic, I will use explorer as the target process in .NET CLI tools, as on the image below:

And the code for creating my proxy named pipe looks as follows:

private static async Task StartProxyWindows(int pid, CancellationToken ct)
{
    var targetPipeName = $"dotnet-diagnostic-{pid}";
    var explorer = Process.GetProcessesByName("explorer").First();
    var pipeName = $"dotnet-diagnostic-{explorer.Id}";
    try
    {
        var id = 1;
        while (!ct.IsCancellationRequested)
        {
            var listener = new NamedPipeServerStream(pipeName, PipeDirection.InOut, 10, PipeTransmissionMode.Byte,
                                    PipeOptions.Asynchronous, 0, 0);
            await listener.WaitForConnectionAsync(ct);
            Console.WriteLine($"[{id}]: s1 connected");

            if (ct.IsCancellationRequested)
            {
                return;
            }
            var sender = new NamedPipeClientStream(".", targetPipeName, PipeDirection.InOut, PipeOptions.Asynchronous);
            await sender.ConnectAsync();
            Console.WriteLine($"[{id}]: s2 connected");

            _ = SniffData(listener, sender, id, ct);
            id += 1;
        }
    }
    catch (TaskCanceledException)
    {
        Console.WriteLine($"Stopped ({pipeName})");
    }
}

The fake diagnostic endpoint approach would work on Linux too, but the timestamp trick is less confusing. And we can always use our proxy to send some funny trace messages to our colleagues 🤐.

What’s left in our implementation is the forwarding code:

static async Task Main(string[] args)
{
    if (args.Length != 1 || !int.TryParse(args[0], out var pid))
    {
        Console.WriteLine("Usage: epsnoop <pid>");
        return;
    }

    using var cts = new CancellationTokenSource();

    Console.CancelKeyPress += (o, ev) => { ev.Cancel = true; cts.Cancel(); };

    if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
    {
        await StartProxyWindows(pid, cts.Token);
    }
    else
    {
        await StartProxyUnix(pid, cts.Token);
    }
}

private static async Task SniffData(Stream s1, Stream s2, int id, CancellationToken ct)
{
    var outstream = File.Create($"eventpipes.{id}.data");
    try
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
        var tasks = new List<Task>() {
            Forward(s1, s2, outstream, $"{id}: s1 -> s2", cts.Token),
            Forward(s2, s1, outstream, $"{id}: s2 -> s1", cts.Token)
        };

        var t = await Task.WhenAny(tasks);

        var ind = tasks.IndexOf(t);
        Console.WriteLine($"[{id}]: s{ind + 1} disconnected");
        tasks.RemoveAt(ind);

        cts.Cancel();

        await Task.WhenAny(tasks);
        Console.WriteLine($"[{id}]: s{1 - ind + 1} disconnected");
    }
    catch (TaskCanceledException) { }
    finally
    {
        outstream.Close();
        s1.Dispose();
        s2.Dispose();
    }
}

private static async Task Forward(Stream sin, Stream sout, Stream snoop, string id, CancellationToken ct)
{
    var buffer = new byte[1024];
    while (true)
    {
        var read = await sin.ReadAsync(buffer, 0, buffer.Length, ct);
        if (read == 0)
        {
            break;
        }
        Console.WriteLine($"[{id}] read: {read}");
        snoop.Write(buffer, 0, read);
        await sout.WriteAsync(buffer, 0, read, ct);
    }
}

I’m saving the recorded traffic to the eventpipes.{stream-id}.data file in the current directory. The code of the application is also in the epsnoop folder in my diagnostics-tools repository.

Analyzing the EventPipes traffic

I also started working on the 010 Editor template. At the moment, it only understands IPC messages, but later, I would like to add parsers for some of the diagnostic sequences as well (feel free to create a PR if you work on them too!). The template is in the blog samples repository, and on the screenshot below, you can see the initial bytes sent by the dotnet-counters monitor command:
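
If you would like to peek at a captured .data file without the template, here is a rough C# sketch that decodes only the fixed 20-byte IPC header (the field layout follows the Diagnostic IPC Protocol documentation; IpcHeaderDump is a made-up name):

using System;
using System.IO;
using System.Text;

class IpcHeaderDump
{
    static void Main(string[] args)
    {
        using var reader = new BinaryReader(File.OpenRead(args[0]));
        // every Diagnostic IPC message starts with a 14-byte "DOTNET_IPC_V1\0" magic
        var magic = Encoding.ASCII.GetString(reader.ReadBytes(14)).TrimEnd('\0');
        var size = reader.ReadUInt16();      // total message size, header included
        var commandSet = reader.ReadByte();  // command group, e.g. 0x02 for EventPipe
        var commandId = reader.ReadByte();   // command within the group
        var reserved = reader.ReadUInt16();
        Console.WriteLine($"magic: {magic}, size: {size}, " +
            $"set: 0x{commandSet:X2}, id: 0x{commandId:X2}, reserved: {reserved}");
    }
}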

How Visual Studio debugs containerized apps

Recently, I was looking into the internals of the Visual Studio debugger for the .NET Diagnostics Expert course. I was especially interested in how the Docker debugging works. For those of you who haven’t tried it yet, let me provide a concise description.

In Visual Studio 2019, when we work on the ASP.NET Core project, it is possible to create a launch profile that points to a Docker container, for example:

And that’s fantastic as we can launch the container directly from Visual Studio. And what’s even better, we can debug it! To make this all work, Visual Studio requires a Dockerfile in the root project folder. The default Dockerfile (which you can create in the ASP.NET Core application wizard) looks as follows:

FROM mcr.microsoft.com/dotnet/core/aspnet:3.1-buster-slim AS base
WORKDIR /app
EXPOSE 80
EXPOSE 443

FROM mcr.microsoft.com/dotnet/core/sdk:3.1-buster AS build
WORKDIR /src
COPY ["WebApplication1.csproj", ""]
RUN dotnet restore "./WebApplication1.csproj"
COPY . .
WORKDIR "/src/."
RUN dotnet build "WebApplication1.csproj" -c Release -o /app/build

FROM build AS publish
RUN dotnet publish "WebApplication1.csproj" -c Release -o /app/publish

FROM base AS final
WORKDIR /app
COPY --from=publish /app/publish .
ENTRYPOINT ["dotnet", "WebApplication1.dll"]

And that’s it. If we press F5, we land inside an application container, and we can step through our application’s code. It all looks like magic, but as usual, there are protocols and lines of code that run this machinery behind the magical facade. And in this post, we will take a sneak peek at them 😊.

But before we dive in, I need to mention the doc describing how Visual Studio builds containerized apps. It is an excellent read which explains, among many other things, how the debugger binaries land in the Docker container and what the differences are between Debug and Release builds. This post will focus only on the debugger bits, so please read the doc to get the full picture.

Let’s collect some traces

When I don’t know how things work, I use Process Monitor. In this case, I started the Process Monitor trace, moved back to Visual Studio, and pressed F5. Once the debugger launched, I stopped the trace and started the analysis. In the process tree (Ctrl + T), it was evident that Visual Studio uses the docker.exe client to interact with the Docker containers. I created a filter for the Operation and Path as on the image below:

With these filters in place, I was able to pick the events emitted when a parent process (in our case, Visual Studio and MSBuild) created each of the docker.exe instances:

And I had not only command-line arguments but also the call stacks! Let’s first have a look at the command lines:

"docker" ps --filter "status=running" --filter "id=4a2383dc1a410c425110cd924af9b826d9c50d577eff25d730bc07912f914331" --no-trunc --format {{.ID}} -n 1
"docker" exec -i 4a2383dc1a410c425110cd924af9b826d9c50d577eff25d730bc07912f914331 /bin/sh -c "if PID=$(pidof dotnet); then kill $PID; fi"
"docker" build -f "C:\Users\me\source\repos\WebApplication1\Dockerfile" --force-rm -t webapplication1  --label "com.microsoft.created-by=visual-studio" --label "com.microsoft.visual-studio.project-name=WebApplication1" "C:\Users\me\source\repos\WebApplication1" 
"docker" images -a --filter "reference=webapplication1" --no-trunc --format {{.ID}}
"docker" images --filter "dangling=false" --format "{{json .}}" --digests
"docker" rm -f 4a2383dc1a410c425110cd924af9b826d9c50d577eff25d730bc07912f914331
"docker" run -dt -v "C:\Users\me\vsdbg\vs2017u5:/remote_debugger:rw" -v "C:\Users\me\AppData\Roaming\Microsoft\UserSecrets:/root/.microsoft/usersecrets:ro" -v "C:\Users\me\AppData\Roaming\ASP.NET\Https:/root/.aspnet/https:ro" -e "ASPNETCORE_URLS=https://+:443;http://+:80" -e "ASPNETCORE_ENVIRONMENT=Development" -e "ASPNETCORE_LOGGING__CONSOLE__DISABLECOLORS=true" -P --name WebApplication1 --entrypoint tail webapplication1 -f /dev/null 
"docker" ps --all --format "{{.ID}}"
"docker" inspect --format="{{json .NetworkSettings.Ports}}" 019e9c646683350457a7ee1d30c85b7675a2bab28dc10ead6f851f6eb3963c43
"docker" inspect "019e9c646683"
"docker" exec -i 019e9c646683350457a7ee1d30c85b7675a2bab28dc10ead6f851f6eb3963c43 /bin/sh -c "ID=.; if [ -e /etc/os-release ]; then . /etc/os-release; fi; if [ $ID = alpine ] && [ -e /remote_debugger/linux-musl-x64/vsdbg ]; then VSDBGPATH=/remote_debugger/linux-musl-x64; else VSDBGPATH=/remote_debugger; fi; $VSDBGPATH/vsdbg --interpreter=vscode"
"docker" inspect --format="{{json .NetworkSettings.Ports}}" 019e9c646683350457a7ee1d30c85b7675a2bab28dc10ead6f851f6eb3963c43

We can see that MSBuild disposes of the previous container during the build and then creates a new one (docker run). However, what interests us most in our investigation is the docker exec command. It creates an interactive debugger session in the Docker container. As a side note, notice that the debugger binaries were copied to my home folder (C:\Users\me\vsdbg) – it could be interesting to investigate how they landed there 😊. But moving back to our trace, we can see that we have a running ASP.NET Core app container and an interactive debugging session attached to it. But we still don’t know how Visual Studio communicates with the debugger running in the interactive session. At first, I started looking for some hidden named pipes and WSL APIs but couldn’t find anything. Thus, I decided to have a more in-depth look at the Visual Studio debugger engine.

As the “docker exec” event came from Visual Studio, it could be the debugger engine that fired the command. So, I copied the call stack from the properties window for this event and looked at the frames (I stripped the unimportant ones):

"Frame","Module","Location","Address","Path"
"0","ntoskrnl.exe","PspCallProcessNotifyRoutines + 0x213","0xfffff8028583759f","C:\WINDOWS\system32\ntoskrnl.exe"
...
"18","KERNELBASE.dll","CreateProcessW + 0x2c","0x7646da2c","C:\WINDOWS\SysWOW64\KERNELBASE.dll"
"19","System.ni.dll","System.ni.dll + 0x23cc02","0x7252cc02","C:\WINDOWS\assembly\NativeImages_v4.0.30319_32\System\a94f452eecde0f07e988ad14497426a5\System.ni.dll"
"20","System.ni.dll","System.ni.dll + 0x1aaaa4","0x7249aaa4","C:\WINDOWS\assembly\NativeImages_v4.0.30319_32\System\a94f452eecde0f07e988ad14497426a5\System.ni.dll"
"21","System.ni.dll","System.ni.dll + 0x1aa39c","0x7249a39c","C:\WINDOWS\assembly\NativeImages_v4.0.30319_32\System\a94f452eecde0f07e988ad14497426a5\System.ni.dll"
"22","<unknown>","0x23e037ad","0x23e037ad",""
"23","<unknown>","0x23e035f8","0x23e035f8",""
"24","<unknown>","0x23e03438","0x23e03438",""
"25","<unknown>","0x23e03223","0x23e03223",""
"26","<unknown>","0x23e0168e","0x23e0168e",""
"27","<unknown>","0x23e0153c","0x23e0153c",""
"28","<unknown>","0x23e0009b","0x23e0009b",""
"29","<unknown>","0x2394e5f7","0x2394e5f7",""
"30","<unknown>","0x2394e54d","0x2394e54d",""
"31","<unknown>","0x2394e460","0x2394e460",""
"32","clr.dll","COMToCLRDispatchHelper + 0x28","0x745bf3c1","C:\Windows\Microsoft.NET\Framework\v4.0.30319\clr.dll"
"33","<unknown>","0x328d060","0x328d060",""
"34","vsdebug.dll","DllGetClassObject + 0x2072e","0x7bc84352","C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\Common7\Packages\Debugger\vsdebug.dll"
...
"60","devenv.exe","IsAssertEtwEnabled + 0x12474","0x123d24","C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\Common7\IDE\devenv.exe"
"61","KERNEL32.DLL","BaseThreadInitThunk + 0x19","0x75dafa29","C:\WINDOWS\SysWOW64\KERNEL32.DLL"
"62","ntdll.dll","__RtlUserThreadStart + 0x2f","0x770175f4","C:\WINDOWS\SysWOW64\ntdll.dll"
"63","ntdll.dll","_RtlUserThreadStart + 0x1b","0x770175c4","C:\WINDOWS\SysWOW64\ntdll.dll"

Time to reverse

Unfortunately, Process Monitor does not understand managed stacks (it displays them as <unknown>), but, on the bright side, that also means that the Visual Studio debugger engine is a .NET application. Nevertheless, I needed to create a memory dump of the Visual Studio process and manually decode the managed frames. The WinDbg output for the first <unknown> frame looked as follows:

0:000> !ip2md 0x23e037ad
MethodDesc:   23da1038
Method Name:  Microsoft.VisualStudio.Debugger.VSCodeDebuggerHost.AD7.Implementation.LocalTargetInterop+LocalProcessWrapper..ctor(System.String, System.String)
Class:        23dc22ec
MethodTable:  23da10b0
mdToken:      06000b66
Module:       20e3642c
IsJitted:     yes
CodeAddr:     23e03618
Transparency: Critical

0:000> !DumpMT /d 23da10b0
EEClass:         23dc22ec
Module:          20e3642c
Name:            Microsoft.VisualStudio.Debugger.VSCodeDebuggerHost.AD7.Implementation.LocalTargetInterop+LocalProcessWrapper
mdToken:         020001cf
File:            c:\program files (x86)\microsoft visual studio\2019\community\common7\ide\extensions\ahdlch2w.npb\Microsoft.VisualStudio.Debugger.VSCodeDebuggerHost.dll
BaseSize:        0x14
ComponentSize:   0x0
Slots in VTable: 16
Number of IFaces in IFaceMap: 1

And the first frame already revealed the place where Visual Studio keeps the debugger assemblies. It was time to fire up ILSpy and indulge myself in some reversing 😊. Quite soon, I discovered that the Visual Studio engine communicates with the container debugger using the Debug Adapter Protocol. And interestingly, the communication happens over the process standard input/output. That explains the interactive docker session! With all the pieces matching, it was time to prove that everything works the way I think it does.

Sniffing the debugger channel

At first, I looked for existing sniffers of the process input/output but couldn’t find any (let me know if you know such tools). Then I experimented with breakpoints in WinDbg, but it proved hard to find the right handle and then extract the exchanged bytes. Finally, I decided to write my own sniffer and replace the docker.exe executable with it. After a few trials (it seems that Console.StandardInput.ReadAsync is a blocking call), I ended up with the following code:

using System;
using System.Diagnostics;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main(string[] args)
    {
        var psi = new ProcessStartInfo(Path.Combine(AppContext.BaseDirectory, "docker.bak.exe")) {
            RedirectStandardError = true,
            RedirectStandardInput = true,
            RedirectStandardOutput = true,
            UseShellExecute = false
        };
        Array.ForEach(args, arg => psi.ArgumentList.Add(arg));

        var proc = Process.Start(psi);
        if (proc == null)
        {
            Console.WriteLine("Error starting a process");
            return;
        }

        Console.CancelKeyPress += (o, ev) => { ev.Cancel = true; proc.Kill(); };

        var baseLogName = Path.Combine(Directory.CreateDirectory("d:\\temp\\docker-proxy").FullName, $"{proc.Id}");
        using (var s = File.CreateText(baseLogName))
        {
            s.WriteLine($"{psi.FileName} '{String.Join("', '", psi.ArgumentList)}'");
        }
        using var inlogstream = File.Create(baseLogName + ".in");
        using var outlogstream = File.Create(baseLogName + ".out");
        using var errlogstream = File.Create(baseLogName + ".err");

        using var cts = new CancellationTokenSource();
        using var instream = Console.OpenStandardInput();
        using var outstream = Console.OpenStandardOutput();
        using var errstream = Console.OpenStandardError();

        var forwarders = new[] {
            ForwardStream(instream, proc.StandardInput.BaseStream, inlogstream, cts.Token),
            ForwardStream(proc.StandardOutput.BaseStream, outstream, outlogstream, cts.Token),
            ForwardStream(proc.StandardError.BaseStream, errstream, errlogstream, cts.Token)
        };

        proc.WaitForExit();

        cts.Cancel();
        Task.WaitAny(forwarders);
    }

    static async Task ForwardStream(Stream instream, Stream outstream, Stream logstream, CancellationToken ct)
    {
        var b = new byte[4096];
        var inmem = new Memory<byte>(b, 0, b.Length);
        while (!ct.IsCancellationRequested)
        {
            int bytesread = await instream.ReadAsync(inmem, ct);
            if (bytesread == 0 || ct.IsCancellationRequested)
            {
                return;
            }
            var outmem = new ReadOnlyMemory<byte>(b, 0, bytesread);
            await logstream.WriteAsync(outmem, ct);
            await outstream.WriteAsync(outmem, ct);
            await outstream.FlushAsync();
        }
    }
}

As you can see, for each docker.exe launch, I’m creating four files:

  • {pid} with a command line
  • {pid}.in with the bytes from the standard input (I am forwarding them to the target process)
  • {pid}.err with the bytes from the error output of the target process (I am pushing them on the sniffer error output)
  • {pid}.out with the bytes from the standard output of the target process (I am pushing them on the sniffer’s standard output)

Then, I built my sniffer as a single-file executable (thanks, .NET 5.0!):

dotnet publish -r win-x64 -p:PublishSingleFile=true -p:PublishTrimmed=true -p:IncludeNativeLibrariesForSelfExtract=true --self-contained true

I copied it to c:\ProgramData\DockerDesktop\version-bin. Then I renamed the docker.exe shortcut in this folder to docker.bak.exe and my sniffer to docker.exe. I launched the Visual Studio debugger, and soon logs from multiple docker sessions started appearing in the d:\temp\docker-proxy folder. And one of the sessions was the debugger one! 😁

Below, you may find the first two messages from the .in and .out files arranged in the correct order:

Content-Length: 514

{"type":"request","command":"initialize","arguments":{"pathFormat":"path","clientID":"visualstudio","clientName":"Visual Studio","adapterID":"coreclr","locale":"en-US","linesStartAt1":true,"columnsStartAt1":true,"supportsVariableType":true,"supportsRunInTerminalRequest":true,"supportsMemoryReferences":true,"supportsProgressReporting":true,"SupportsMessageBox":true,"supportsHandshakeRequest":true,"supportsVsAdditionalBreakpointBinds":true,"supportsHitCountsChange":true,"supportsVsCustomMessages":true},"seq":1}
Content-Length: 1835

{"seq":0,"type":"response","request_seq":1,"success":true,"command":"initialize","body":{"supportsConfigurationDoneRequest":true,"supportsFunctionBreakpoints":true,"supportsConditionalBreakpoints":true,"supportsHitConditionalBreakpoints":true,"supportsEvaluateForHovers":true,"exceptionBreakpointFilters":[{"filter":"all","label":"All Exceptions","default":false},{"filter":"user-unhandled","label":"User-Unhandled Exceptions","default":true}],"supportsSetVariable":true,"supportsGotoTargetsRequest":true,"supportsModulesRequest":true,"additionalModuleColumns":[{"attributeName":"vsLoadAddress","label":"Load Address","type":"string"},{"attributeName":"vsPreferredLoadAddress","label":"Preferred Load Address","type":"string"},{"attributeName":"vsModuleSize","label":"Module Size","type":"number"},{"attributeName":"vsLoadOrder","label":"Order","type":"number"},{"attributeName":"vsTimestampUTC","label":"Timestamp","type":"unixTimestampUTC"},{"attributeName":"vsIs64Bit","label":"64-bit","type":"boolean"},{"attributeName":"vsAppDomain","label":"AppDomain","type":"string"},{"attributeName":"vsAppDomainId","label":"AppDomainId","type":"number"}],"supportedChecksumAlgorithms":["MD5","SHA1","SHA256"],"supportsExceptionOptions":true,"supportsValueFormattingOptions":true,"supportsExceptionInfoRequest":true,"supportTerminateDebuggee":true,"supportsSetExpression":true,"supportsReadMemoryRequest":true,"supportsCancelRequest":true,"supportsExceptionConditions":true,"supportsLoadSymbolsRequest":true,"supportsModuleSymbolSearchLog":true,"supportsDebuggerProperties":true,"supportsSetSymbolOptions":true,"supportsHitBreakpointIds":true,"supportsVsIndividualBreakpointOperations":true,"supportsSetHitCount":true,"supportsVsCustomMessages":true,"supportsEvaluationOptions":true,"supportsExceptionStackTrace":true,"supportsObjectId":true}}
Content-Length: 520

{"type":"request","command":"launch","arguments":{"name":".Net Core Launch","type":"coreclr","request":"launch","program":"dotnet","args":"--additionalProbingPath /root/.nuget/fallbackpackages2 --additionalProbingPath /root/.nuget/fallbackpackages  \"WebApplication1.dll\"","cwd":"/app","symbolOptions":{"cachePath":"/remote_debugger/symbols"},"env":{"ASPNETCORE_HTTPS_PORT":"49189"},"targetOutputLogPath":"/dev/console","projectFullPath":"C:\\Users\\me\\source\\repos\\WebApplication1\\WebApplication1.csproj"},"seq":2}
Content-Length: 117

{"seq":1,"type":"request","command":"handshake","arguments":{"value":"hQV{redacted}2Y="}}

A CPU sampling profiler in less than 200 lines

While working on a new version of wtrace, I am analyzing the PerfView source code to learn how its various features work internally. One of those features is the call stack resolution for ETW events. This post will show you how to use the TraceEvent library to decode call stacks, and, as an exercise, we will write a sampling process profiler. Before we start, remember to set DisablePagingExecutive to 1 – that is a requirement for call stacks to work in ETW sessions.
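
You can set it with reg.exe from an elevated command prompt (the value lives under the Memory Management key; a reboot is required for the change to take effect):

reg add "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management" /v DisablePagingExecutive /t REG_DWORD /d 1 /f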

❗ ❗ ❗ Visit wtrace.net to receive updates on wtrace and my other troubleshooting tools. ❗ ❗ ❗

Collecting profiling data

First, we need to collect the trace data that we will later analyze. Our profiler will rely on SampledProfile events. Enabling them makes the kernel emit a profiling event every 1 ms for every CPU in the system. The SampledProfile event has only two fields: InstructionPointer and ThreadId. InstructionPointer is an address in memory, so to find a method name that matches it, we need information about the images loaded by a process (the ImageLoad event). Our code to start the kernel session could look as follows:

use session = new TraceEventSession(sessionName, etlFilePath)

let traceFlags = NtKeywords.Profile ||| NtKeywords.ImageLoad
let stackFlags = NtKeywords.Profile
session.EnableKernelProvider(traceFlags, stackFlags) |> ignore

That is enough to resolve call stacks from native images (NGEN/R2R), where the CPU executes the image’s compiled code. However, resolving managed call stacks requires some more data. For managed code, the instruction pointer points to a block of JIT-compiled code that resides in the process’s private memory and is valid only for the process lifetime. The kernel does not know about the CLR internals, so we need to query the CLR ETW provider. Fortunately, the ClrTraceEventParser already has a flag which combines all the necessary keywords:

public sealed class ClrTraceEventParser : TraceEventParser
{
    ...
    /// <summary>
    ///  Keywords are passed to TraceEventSession.EnableProvider to enable particular sets of
    /// </summary>
    [Flags]
    public enum Keywords : long
    {
        ...
        /// <summary>
        /// What is needed to get symbols for JIT compiled code.  
        /// </summary>
        JITSymbols = Jit | StopEnumeration | JittedMethodILToNativeMap | SupressNGen | Loader,
        ...
    };
}

We are ready to enable the CLR provider in our ETW session:

let keywords = ClrKeywords.JITSymbols
session.EnableProvider(ClrTraceEventParser.ProviderGuid, TraceEventLevel.Always, uint64(keywords)) |> ignore

Our session is now open and saves ETW events to the .etl file we specified. It’s time to perform the action we want to profile, for example, start a new process. When done, we should call Stop or Dispose to stop the ETW session and close the .etl file. For .NET processes that didn’t start or finish while the session was running, we also need to collect CLR rundown events. Without this data, we would lack the JIT mappings and wouldn’t be able to resolve some of the managed stacks. Here is the code for creating a rundown session:

let collectClrRundownEvents () = 
    let rec waitForRundownEvents filePath =
        // Poll until 2 seconds go by without growth.
        let len = FileInfo(filePath).Length
        Thread.Sleep(2000)
        if FileInfo(filePath).Length <> len then
            waitForRundownEvents filePath
        else ()

    printfn "Forcing rundown of JIT methods."
    let rundownFileName = Path.ChangeExtension(etlFilePath, ".clrRundown.etl")
    use rundownSession = new TraceEventSession(sessionName + "-rundown", rundownFileName)
    rundownSession.EnableProvider(ClrRundownTraceEventParser.ProviderGuid, 
        TraceEventLevel.Verbose, uint64(ClrRundownTraceEventParser.Keywords.Default)) |> ignore
    waitForRundownEvents rundownFileName
    printfn "Done with rundown."

The final steps are to merge the .etl files, add the necessary module information, and prepare symbol files for the pre-compiled assemblies. Fortunately, the TraceEvent library has one method that performs all the operations mentioned above: ZippedETLWriter.WriteArchive. Internally, this method uses KernelTraceControl.dll!CreateMergedTraceFile (from WPT) to merge the .etl files and generate ‘synthetic’ events for all the modules which were loaded by processes during the trace session. If you’d like to see the PInvoke signature of this method, check the OSExtensions.cs file. For all the pre-compiled assemblies (ETWTraceEventSource.GetModulesNeedingSymbols), TraceEvent generates the .pdb files (SymbolReader.GenerateNGenSymbolsForModule) and packs them into the final .zip file.

Our final recording code looks as follows:

do
    use session = new TraceEventSession(sessionName, etlFilePath)

    let traceFlags = NtKeywords.Profile ||| NtKeywords.ImageLoad // NtKeywords.Process and NtKeywords.Thread are added automatically
    let stackFlags = NtKeywords.Profile
    session.EnableKernelProvider(traceFlags, stackFlags) |> ignore

    // we need CLR to resolve managed stacks
    let keywords = ClrKeywords.JITSymbols
    session.EnableProvider(ClrTraceEventParser.ProviderGuid, TraceEventLevel.Always, uint64(keywords)) |> ignore

    printfn "Press enter to stop the tracing session"
    Console.ReadLine() |> ignore

    session.Stop() |> ignore

    collectClrRundownEvents ()

printfn "Collecting the data required for stack resolution..."
let writer = ZippedETLWriter(etlFilePath, log)
if not (writer.WriteArchive(Compression.CompressionLevel.NoCompression)) then
    printfn "Error occured while merging the data"
else
    printfn "Trace session completed"

It is very similar to the 40_SimpleTraceLog sample from the PerfView repository, so if you are looking for C# code, check it out.

Analyzing call stacks with TraceLog

Once we have finished and post-processed the trace session, it’s time to analyze the collected events. Usually, when we want to process ETW events, we create a TraceEventSession instance and assign callbacks for the interesting events. Using callbacks on the session’s event source is the most efficient way of processing ETW events. However, this approach has some drawbacks. TraceEventParser instances are usually stateless, so if you need to work on aggregated data (processes, threads, modules), you need to implement the aggregation logic yourself. For example, to list the process lifetimes saved in the trace, we need to process the ProcessStartGroup and ProcessEndGroup events. Resolving call stacks is even more demanding and involves examining different groups of events. Fortunately, the TraceEvent library again gives us a hand and provides the TraceLog class. As stated in the documentation, TraceLog represents a higher-level event processing. It keeps track of various session objects, including processes, threads, and call stacks. Our task is to instantiate the SymbolReader class, download or unpack the required symbol files, and process the event call stacks. The code below does all that:

type CallStackNode (callsCount : int32) =
    inherit Dictionary<string, CallStackNode>(StringComparer.OrdinalIgnoreCase)
    member val CallsCount = callsCount with get, set

let loadSymbols (traceLog : TraceLog) (proc : TraceProcess) =
    use symbolReader = new SymbolReader(log)

    proc.LoadedModules |> Seq.where (fun m -> not (isNull m.ModuleFile))
                       |> Seq.iter (fun m -> traceLog.CodeAddresses.LookupSymbolsForModule(symbolReader, m.ModuleFile))

let buildCallStacksTree (traceLog : TraceLog) pid tid =
    let perfInfoTaskGuid = Guid(int32 0xce1dbfb4, int16 0x137e, int16 0x4da6, byte 0x87, byte 0xb0, byte 0x3f, byte 0x59, byte 0xaa, byte 0x10, byte 0x2c, byte 0xbc)
    let perfInfoOpcode = 46

    let callStacks = CallStackNode(0)
    let processCallStack (callStack : TraceCallStack) =
        let addOrUpdateChildNode (node : CallStackNode) (callStack : TraceCallStack) =
            let decodedAddress = sprintf "%s!%s" callStack.CodeAddress.ModuleName callStack.CodeAddress.FullMethodName
            match node.TryGetValue(decodedAddress) with
            | (true, childNode) ->
                childNode.CallsCount <- childNode.CallsCount + 1
                childNode
            | (false, _) ->
                let childNode = CallStackNode(1)
                node.Add(decodedAddress, childNode)
                childNode

        let rec processStackFrame (callStackNode : CallStackNode) (callStack : TraceCallStack) =
            let caller = callStack.Caller
            if isNull caller then // root node
                callStackNode.CallsCount <- callStackNode.CallsCount + 1
            
            let childNode = 
                if isNull caller then addOrUpdateChildNode callStackNode callStack
                else processStackFrame callStackNode caller

            addOrUpdateChildNode childNode callStack
        processStackFrame callStacks callStack |> ignore
    
    traceLog.Events
    |> Seq.filter (fun ev -> ev.ProcessID = pid && ev.ThreadID = tid && ev.TaskGuid = perfInfoTaskGuid && (int32 ev.Opcode = perfInfoOpcode))
    |> Seq.iter (fun ev -> processCallStack (ev.CallStack()))

    callStacks

let tryFindProcess (traceLog : TraceLog) processNameOrPid =
    let (|Pid|ProcessName|) (s : string) =
        match Int32.TryParse(s) with
        | (true, pid) -> Pid pid
        | (false, _) -> ProcessName s

    let filter =
        match processNameOrPid with
        | Pid pid -> fun (p : TraceProcess) -> p.ProcessID = pid
        | ProcessName name -> fun (p : TraceProcess) -> String.Equals(p.Name, name, StringComparison.OrdinalIgnoreCase)

    let processes = traceLog.Processes |> Seq.where filter |> Seq.toArray
    if processes.Length = 0 then
        None
    elif processes.Length = 1 then
        Some processes.[0]
    else
        processes |> Seq.iteri (fun i p -> printfn "[%d] %s (%d): '%s'" i p.Name p.ProcessID p.CommandLine)
        printf "Which process to analyze []: "
        match Int32.TryParse(Console.ReadLine()) with
        | (false, _) -> None
        | (true, n) -> processes |> Seq.tryItem n

let analyze processNameOrPid maxDepth =

    let reader = ZippedETLReader(etlFilePath + ".zip", log)
    reader.UnpackArchive()
    
    let options = TraceLogOptions(ConversionLog = log)
    let traceLog = TraceLog.OpenOrConvert(etlFilePath, options)

    match tryFindProcess traceLog processNameOrPid with
    | None -> printfn "No matching process found in the trace"
    | Some proc ->
        printfn "%s [%s] (%d)" proc.Name proc.CommandLine proc.ProcessID

        let sw = Stopwatch.StartNew()
        printfn "[%6d ms] Loading symbols for modules in process %s (%d)" sw.ElapsedMilliseconds proc.Name proc.ProcessID
        loadSymbols traceLog proc

        // usually, system process has PID 4, but TraceEvent attaches the drivers to the Idle process (0)
        let systemProc = traceLog.Processes |> Seq.where (fun p -> p.ProcessID = 0) |> Seq.exactlyOne

        printfn "[%6d ms] Loading symbols for system modules" sw.ElapsedMilliseconds
        loadSymbols traceLog systemProc

        printfn "[%6d ms] Starting call stack analysis" sw.ElapsedMilliseconds
        let printThreadCallStacks (thread : TraceThread) =
            let callStacks = buildCallStacksTree traceLog proc.ProcessID thread.ThreadID

            let rec getCallStack depth name (callStack : CallStackNode) =
                let folder frames (kv : KeyValuePair<string, CallStackNode>) =
                    if kv.Value.Count = 0 || (maxDepth > 0 && depth >= maxDepth) then frames
                    else getCallStack (depth + 1) kv.Key kv.Value |> List.append frames

                callStack
                |> Seq.fold folder [ (sprintf "%s├─ %s [%d]" ("│ " |> String.replicate depth) name callStack.CallsCount) ]

            let stacks = getCallStack 0 (sprintf "Thread (%d) '%s'" thread.ThreadID thread.ThreadInfo) callStacks
            stacks |> List.iter (fun s -> printfn "%s" s)
        
        proc.Threads |> Seq.iter printThreadCallStacks
        printfn "[%6d ms] Completed call stack analysis" sw.ElapsedMilliseconds

An example output might look as follows (I removed some lines for brevity):

PS netcoreapp3.1> .\etwprofiler.exe analyze testproc 10
testproc ["C:\Users\me\testproc.exe" ] (15416)
[     2 ms] Loading symbols for modules in process testproc (15416)
[   428 ms] Loading symbols for system modules
[ 21400 ms] Starting call stack analysis
├─ Thread (2412) 'Startup Thread' [121]
│ ├─ ntdll!LdrInitializeThunk [11]
│ │ ├─ ntdll!LdrInitializeThunk [11]
│ │ │ ├─ ntdll!LdrpInitialize [11]
│ │ │ │ ├─ ntdll!??_LdrpInitialize [11]
│ │ │ │ │ ├─ ntdll!LdrpInitializeProcess [11]
│ │ │ │ │ │ ├─ ntdll!LdrpInitializeTls [1]
│ │ │ │ │ │ ├─ ntdll!LdrLoadDll [2]
│ │ │ │ │ │ │ ├─ ntdll!LdrpLoadDll [2]
...
├─ Thread (24168) '' [1]
│ ├─ ntdll!RtlUserThreadStart [1]
│ │ ├─ ntdll!RtlUserThreadStart [1]
│ │ │ ├─ kernel32!BaseThreadInitThunk [1]
│ │ │ │ ├─ ntdll!TppWorkerThread [1]
│ │ │ │ │ ├─ ntdll!TppAlpcpExecuteCallback [1]
│ │ │ │ │ │ ├─ rpcrt4!LrpcIoComplete [1]
│ │ │ │ │ │ │ ├─ rpcrt4!LRPC_ADDRESS::ProcessIO [1]
├─ Thread (9808) '' [0]
├─ Thread (22856) '.NET ThreadPool Worker' [1]
│ ├─ ntdll!RtlUserThreadStart [1]
│ │ ├─ ntdll!RtlUserThreadStart [1]
│ │ │ ├─ kernel32!BaseThreadInitThunk [1]
│ │ │ │ ├─ coreclr!Thread::intermediateThreadProc [1]
│ │ │ │ │ ├─ coreclr!ThreadpoolMgr::WorkerThreadStart [1]
│ │ │ │ │ │ ├─ coreclr!UnManagedPerAppDomainTPCount::DispatchWorkItem [1]
│ │ │ │ │ │ │ ├─ coreclr!ThreadpoolMgr::AsyncTimerCallbackCompletion [1]
│ │ │ │ │ │ │ │ ├─ coreclr!AppDomainTimerCallback [1]
│ │ │ │ │ │ │ │ │ ├─ coreclr!ManagedThreadBase_DispatchOuter [1]
│ │ │ │ │ │ │ │ │ │ ├─ coreclr!ManagedThreadBase_DispatchMiddle [1]
├─ Thread (12352) '.NET ThreadPool Gate' [0]
[ 21835 ms] Completed call stack analysis

The full profiler code is available in my blog samples repository. For my Dotnetos session about tracing, I also prepared a C# version, so you may check it out too. And, of course, I invite you to watch the whole session! 🙂

Monitoring registry activity with ETW

If you are working on Windows, you know that the registry is a crucial component of this system. It contains lots of system and application configuration data. Apps also use the registry to access some of the in-memory OS data. Therefore, monitoring registry activity is one of the essential parts of the troubleshooting process. Fortunately, we have several tools to help us with this task, Process Monitor being probably the most popular one. In this post, though, I am going to prove that we can use ETW for this purpose as well.

A brief introduction to the registry paths

The way the registry stores its data is quite complicated. I will focus only on the parts required to understand the ETW data, so if you would like to dig deeper, check the awesome Windows Internals book. When we think of the registry structure, we usually imagine the Regedit panel with subtrees under a few root keys:

The root key names, such as HKEY_LOCAL_MACHINE (or HKLM), translate internally to paths understandable by the Windows object manager. For example, HKLM would become \Registry\Machine. The \Registry part references a kernel key object that we can find using, for example, WinObjEx64:

When an application tries to access a registry path, the object manager forwards the call to the configuration manager. The configuration manager parses the path and returns a handle to a key object. Each key object points to its corresponding key control block (KCB). This KCB could be shared with other key objects as the configuration manager allocates exactly one KCB for a given registry path. Additionally, each KCB has a reference count, which reflects the number of key objects referencing it. Let’s see this in action while also recording the ETW events.

Registry events

We will use the simple code below to create and later delete a registry key with a string value:

let subkey = "Software\\LowLevelDesign"

use hkcu = RegistryKey.OpenBaseKey(RegistryHive.CurrentUser, RegistryView.Default)
do 
    use mytraceKey = hkcu.CreateSubKey(subkey)
    mytraceKey.SetValue("start", DateTime.UtcNow.ToString(), RegistryValueKind.String)

Console.WriteLine("Press enter to delete the key...")
Console.ReadLine() |> ignore

hkcu.DeleteSubKeyTree(subkey)

In a moment, I will explain why I am creating the LowLevelDesign key in a separate using scope. But first, let’s check the ETW trace for these operations:

4065.7271 (21776.18288) Registry/Open [0xFFFFAE0C9795E860] = '\REGISTRY\USER\S-1-5-21-435461841-1121519493-2341496315-1001\Software\LowLevelDesign' [0] -> 0 (0.01540)
4066.5557 (21776.18288) Registry/Open [0xFFFFAE0C9795E860] = '\REGISTRY\USER\S-1-5-21-435461841-1121519493-2341496315-1001\Software\LowLevelDesign' [0] -> 0 (0.00850)
4066.5796 Registry/KCBCreate [0xFFFFAE0CC1544D20] <- '\REGISTRY\USER\S-1-5-21-435461841-1121519493-2341496315-1001\SOFTWARE\LOWLEVELDESIGN'
4066.6179 (21776.18288) Registry/Create [0xFFFFAE0C9795E860] = '\REGISTRY\USER\S-1-5-21-435461841-1121519493-2341496315-1001\Software\LowLevelDesign' [0] -> 0 (0.10650)
4070.3576 (21776.18288) Registry/SetValue [0xFFFFAE0CC1544D20] = '\REGISTRY\USER\S-1-5-21-435461841-1121519493-2341496315-1001\SOFTWARE\LOWLEVELDESIGN\start' [0] -> 0 (0.02800)
4070.3731 (21776.18288) Registry/Close [0xFFFFAE0CC1544D20] = '\REGISTRY\USER\S-1-5-21-435461841-1121519493-2341496315-1001\SOFTWARE\LOWLEVELDESIGN' [0] -> 0 (0.00020)

<Enter pressed>

144250.1876 (21776.18288) Registry/Open [0xFFFFAE0C9795E860] = '\REGISTRY\USER\S-1-5-21-435461841-1121519493-2341496315-1001\Software\LowLevelDesign' [0] -> 0 (0.02720)
144250.2030 (21776.18288) Registry/Query [0xFFFFAE0CC1544D20] = '\REGISTRY\USER\S-1-5-21-435461841-1121519493-2341496315-1001\SOFTWARE\LOWLEVELDESIGN' [0] -> 0 (0.00430)
144250.2135 (21776.18288) Registry/Close [0xFFFFAE0CC1544D20] = '\REGISTRY\USER\S-1-5-21-435461841-1121519493-2341496315-1001\SOFTWARE\LOWLEVELDESIGN' [0] -> 0 (0.00010)
144250.7885 (21776.18288) Registry/Open [0xFFFFAE0C9795E860] = '\REGISTRY\USER\S-1-5-21-435461841-1121519493-2341496315-1001\Software\LowLevelDesign' [0] -> 0 (0.00860)
144250.9071 (21776.18288) Registry/Delete [0xFFFFAE0CC1544D20] = '\REGISTRY\USER\S-1-5-21-435461841-1121519493-2341496315-1001\SOFTWARE\LOWLEVELDESIGN' [0] -> 0 (0.06300)
144250.9087 (21776.18288) Registry/Close [0xFFFFAE0CC1544D20] = '\REGISTRY\USER\S-1-5-21-435461841-1121519493-2341496315-1001\SOFTWARE\LOWLEVELDESIGN' [0] -> 0 (0.00020)
144250.9118 Registry/KCBDelete [0xFFFFAE0CC1544D20] <- '\REGISTRY\USER\S-1-5-21-435461841-1121519493-2341496315-1001\SOFTWARE\LOWLEVELDESIGN'

If you look at the paths in the trace, you will immediately see that ETW uses the “kernel” registry paths (starting with \Registry). As the LowLevelDesign registry key didn’t exist when the app started, the configuration manager needed to create it. That’s the KCBCreate event we see in the trace:

4066.5796 Registry/KCBCreate [0xFFFFAE0CC1544D20] <- '\REGISTRY\USER\S-1-5-21-435461841-1121519493-2341496315-1001\SOFTWARE\LOWLEVELDESIGN'

We can confirm that the address is indeed a valid KCB address in WinDbg:

lkd> !reg kcb 0xFFFFAE0CC1544D20

Key              : \REGISTRY\USER\S-1-5-21-435461841-1121519493-2341496315-1001\SOFTWARE\LOWLEVELDESIGN
RefCount         : 0x0000000000000000
Flags            : DelayedClose, CompressedName,
ExtFlags         :
Parent           : 0xffffae0c9ac26600
KeyHive          : 0xffffae0c9a174000
KeyCell          : 0xed9160 [cell index]
TotalLevels      : 5
LayerHeight      : 0
MaxNameLen       : 0x0
MaxValueNameLen  : 0xa
MaxValueDataLen  : 0x28
LastWriteTime    : 0x 1d6760a:0x8c2420e6
KeyBodyListHead  : 0xffffae0cc1544d98 0xffffae0cc1544d98
SubKeyCount      : 0
Owner            : 0x0000000000000000
KCBLock          : 0xffffae0cc1544e18
KeyLock          : 0xffffae0cc1544e28

When I closed all the handles and deleted the LowLevelDesign key, the kernel emitted the KCBDelete event. Interestingly, when I created and deleted the LowLevelDesign key without disposing of mytraceKey, the KCBDelete event was missing. Also, the LowLevelDesign KCB address was soon reused by another key. That's why I placed the using scope in the original code. Let's look at the tracing application code and explain the other hex values in the trace.

KCBs, registry key names, and the trace session

All registry ETW events have the same format, but the meaning of the KeyName property differs between opcodes. To correctly resolve key names in “regular” registry events, we need to analyze the KCB events first. For KCBs opened/created during the trace session, we should receive a KCBCreate event. From this event, we need to save the KeyName with its corresponding KeyHandle. To acquire KeyNames for KCBs that were opened before our trace session started, we need to run a rundown session. This procedure is very similar to what I did for the FileIORundown events in the previous post. Sample code looks as follows:

let regHandleToKeyName = Dictionary<uint64, string>()

let processKCBCreateEvent (ev : RegistryTraceData) =
    printfn "%.4f %s [0x%X] <- '%s'" ev.TimeStampRelativeMSec ev.EventName ev.KeyHandle ev.KeyName 
    regHandleToKeyName.[ev.KeyHandle] <- ev.KeyName

let processKCBDeleteEvent (ev : RegistryTraceData) =
    printfn "%.4f %s [0x%X] <- '%s'" ev.TimeStampRelativeMSec ev.EventName ev.KeyHandle ev.KeyName 
    regHandleToKeyName.Remove(ev.KeyHandle) |> ignore

let runRundownSession sessionName =
    printf "Starting rundown session %s" sessionName
    use session = new TraceEventSession(sessionName)

    let traceFlags = NtKeywords.Registry

    session.EnableKernelProvider(traceFlags, NtKeywords.None) |> ignore

    // Accessing the source enables the kernel provider, so it must be run after the EnableKernelProvider call
    makeKernelParserStateless session.Source
    session.Source.Kernel.add_RegistryKCBRundownBegin(Action<_>(processKCBCreateEvent))
    session.Source.Kernel.add_RegistryKCBRundownEnd(Action<_>(processKCBCreateEvent))

    // Rundown session lasts 2 secs - make it longer, if required
    use cts = new CancellationTokenSource(TimeSpan.FromSeconds(2.0))
    use _r = cts.Token.Register(fun _ -> session.Stop() |> ignore)

    session.Source.Process() |> ignore

Now, when a registry event has a non-zero KeyHandle value, we need to look up the parent path in the regHandleToKeyName dictionary and prepend it to the KeyName value from the event payload. Finally, for the “key-value” events (such as EVENT_TRACE_TYPE_REGSETVALUE), the KeyName property is, in fact, a value name. Fortunately, the RegistryTraceData class fixes this confusion by providing a ValueName property. We only need to make sure we use it for the right event types. Below, you may find sample code for handling the “regular” registry events:

let filter (ev : RegistryTraceData) =
    ev.ProcessID = pid || ev.ProcessID = -1

let getFullKeyName keyHandle eventKeyName =
    let baseKeyName =
        match regHandleToKeyName.TryGetValue(keyHandle) with
        | (true, name) -> name
        | (false, _) -> ""
    Path.Combine(baseKeyName, eventKeyName)

let processKeyEvent (ev : RegistryTraceData) =
    if filter ev then
        let keyName = getFullKeyName ev.KeyHandle ev.KeyName
        printfn "%.4f (%d.%d) %s [0x%X] = '%s' [%d] -> %d (%.5f)" 
            ev.TimeStampRelativeMSec ev.ProcessID ev.ThreadID ev.EventName 
            ev.KeyHandle keyName ev.Index ev.Status ev.ElapsedTimeMSec

let processValueEvent (ev : RegistryTraceData) =
    if filter ev then
        let keyName = getFullKeyName ev.KeyHandle ev.ValueName
        printfn "%.4f (%d.%d) %s [0x%X] = '%s' [%d] -> %d (%.5f)" 
            ev.TimeStampRelativeMSec ev.ProcessID ev.ThreadID ev.EventName 
            ev.KeyHandle keyName ev.Index ev.Status ev.ElapsedTimeMSec

And the full tracer sample is available in my blog samples repository. Remember to run the code as Admin.

Fixing empty paths in FileIO events (ETW)

This month marks ten years since I started this blog 🥂🥂🥂. On this occasion, I would like to thank you for being my reader! Let’s celebrate with a new post on ETW 🙂

The empty paths issue in the wtrace output has been bugging me for quite some time. As I started working on a new wtrace release (coming soon!), it was finally the right moment to fix it. I've seen other people struggling with this problem too, so I thought that maybe it's worth a blog post 🙂 Wtrace uses the TraceEvent library to interact with the ETW API, and in this post, I will use this library as well. Note that this issue affects only the real-time ETW sessions.

Tracing FileIO events

Overall, when it comes to storage I/O tracing, there are two classes of events: DiskIO and FileIO. DiskIO events represent disk driver operations, while FileIO events describe the calls to the file system. I consider wtrace a user-mode troubleshooting tool; therefore, I decided to trace only the file system events. Another benefit of using FileIO events is that they are emitted even when the file system cache handles the I/O request.

Enabling FileIO events in the trace session requires only the FileIOInit flag in the call to TraceEventSession.EnableKernelProvider. You may also add the FileIO flag if you want to track the IO request duration (new wtrace will do that). The next step is to add handlers to the interesting FileIO events, for example:

public void SubscribeToSession(TraceEventSession session)
{
    var kernel = session.Source.Kernel;
    kernel.FileIOClose += HandleFileIoSimpleOp;
    kernel.FileIOFlush += HandleFileIoSimpleOp;
    kernel.FileIOCreate += HandleFileIoCreate;
    kernel.FileIODelete += HandleFileIoInfo;
    kernel.FileIORename += HandleFileIoInfo;
    kernel.FileIOFileCreate += HandleFileIoName;
    kernel.FileIOFileDelete += HandleFileIoName;
    kernel.FileIOFileRundown += HandleFileIoName;
    kernel.FileIOName += HandleFileIoName;
    kernel.FileIORead += HandleFileIoReadWrite;
    kernel.FileIOWrite += HandleFileIoReadWrite;
    kernel.FileIOMapFile += HandleFileIoMapFile;
}

However, you will shortly discover that the FileName property sometimes returns an empty string. Here is an example wtrace output with this problem:

1136,1873 (1072) FileIO/Close 'C:\' (0xFFFFFA801D789CA0)
1136,2049 (1072) FileIO/Create 'C:\Windows\SYSTEM32\wow64win.dll' (0xFFFFFA801D789CA0) rw-
1363,8894 (1072) FileIO/Read '' (0xFFFFFA80230F5970) 0x173400 32768b
1364,7208 (1072) FileIO/Read '' (0xFFFFFA80230F5970) 0x117400 32768b
1365,6873 (1072) FileIO/Read '' (0xFFFFFA80230F5970) 0x1CD400 32768b
1375,6284 (1072) FileIO/Create 'C:\Windows\win.ini' (0xFFFFFA801A43F2F0) rw-
1375,6702 (1072) FileIO/Read 'C:\Windows\win.ini' (0xFFFFFA801A43F2F0) 0x0 516b
1375,7369 (1072) FileIO/Create 'C:\Windows\SysWOW64\MAPI32.DLL' (0xFFFFFA8023E50710) rw-
1375,7585 (1072) FileIO/Close 'C:\Windows\SysWOW64\msxml6r.dll' (0xFFFFFA8023E50710)
1384,8796 (1072) FileIO/Read '' (0xFFFFFA801FDBFCD0) 0x58200 16384b

What’s causing this? Before we answer this question, let’s check how the file name resolution works internally.

File name resolution

Each ETW event emitted by the kernel is represented by the EVENT_RECORD structure. ETWTraceEventSource decodes this structure into a TraceEvent class, which is much easier to process in managed code. As you can see in the structure definition, ETW events have some common properties, such as ProcessId, TimeStamp, or ThreadId. The data specific to a given event goes into the UserData field, which, without a proper manifest, is just a block of bytes. Thus, to process the UserData, we need a parser. The TraceEvent library already contains parsers for all the Kernel events, and it can even generate parsers dynamically. The Kernel parsers are located in a huge KernelTraceEventParser file (13K lines). By scanning through the FileIO*TraceData classes defined there, we can see that almost all of them have a FileName property. However, the way this property is implemented differs between the parsers. For example, in FileIOCreateTraceData, we have the following code:

public string FileName { get { return state.KernelToUser(GetUnicodeStringAt(LayoutVersion <= 2 ? HostOffset(24, 3) : HostOffset(24, 2))); } }

While FileIOSimpleOpTraceData, for example, has the following implementation:

public string FileName { get { return state.FileIDToName(FileKey, FileObject, TimeStampQPC); } }

The difference is that FileIOCreateTraceData extracts the FileName from its payload (the string is in the UserData field). FileIOSimpleOpTraceData, on the other hand, queries an instance of the KernelTraceEventParserState class to find the name by either FileObject or FileKey. It keeps a private _fileIDToName dictionary for this purpose. As you may have guessed, we receive an empty path when the FileObject/FileKey is missing in this dictionary.

_fileIDToName dictionary and FileIO events

From the previous paragraph, we know that an empty path signifies that there is no record in the _fileIDToName dictionary for a given FileObject/FileKey. If we look for code that modifies this dictionary, we should quickly locate the hooks in the KernelTraceEventParser constructor:

AddCallbackForEvents(delegate (FileIONameTraceData data)
{
    // TODO this does now work for DCStarts.  Do DCStarts event exist?  
    var isRundown = (data.Opcode == (TraceEventOpcode)36) || (data.Opcode == (TraceEventOpcode)35);        // 36=FileIOFileRundown 35=FileIODelete
    Debug.Assert(data.FileName.Length != 0);
    state.fileIDToName.Add(data.FileKey, data.TimeStampQPC, data.FileName, isRundown);
});
...

// Because we may not have proper startup rundown, we also remember not only the FileKey but 
// also the fileObject (which is per-open file not per fileName).   
FileIOCreate += delegate (FileIOCreateTraceData data)
{
    state.fileIDToName.Add(data.FileObject, data.TimeStampQPC, data.FileName);
};

if (source.IsRealTime)
{
    // Keep the table under control
    Action onNameDeath = delegate (FileIONameTraceData data)
    {
        state.fileIDToName.Remove(data.FileKey);
    };
    FileIOFileDelete += onNameDeath;
    FileIOFileRundown += onNameDeath;
    FileIOCleanup += delegate (FileIOSimpleOpTraceData data)
    {
        // Keep the table under control remove unneeded entries.  
        state.fileIDToName.Remove(data.FileObject);
    };
}

As handlers registered here always execute before our handlers, delete, rundown, or cleanup events won't have the file name as it just got removed from the _fileIDToName dictionary :). A simple fix is to keep your own FileObject/FileKey/FileName map, and that's what the current version of wtrace is doing. However, as you remember, that alone does not remove all the empty paths. After some debugging, I discovered another problem. The FileIORundown events were never emitted, so the _fileIDToName dictionary contained only FileObjects from the FileIOCreate events. That explains why there are almost no empty paths for processes that started while the trace session was running. After checking the documentation and some more debugging, I learned that FileIORundown events appear in the trace when you enable the DiskFileIO and DiskIO keywords. And, what's important, they are issued at the end of the trace session. Although the documentation states they contain a FileObject, it is in fact a FileKey. Fortunately, TraceEvent has the property correctly named FileKey in the FileIONameTraceData class.

Working example

My final approach is to open a “rundown” session for two seconds, prepare my fileIDToName dictionary, and then pass it to the actual tracing session with only the FileIO events:

open System
open System.Threading
open Microsoft.Diagnostics.Tracing
open Microsoft.Diagnostics.Tracing.Parsers
open Microsoft.Diagnostics.Tracing.Session
open System.Collections.Generic
open Microsoft.Diagnostics.Tracing.Parsers.Kernel

type NtKeywords = KernelTraceEventParser.Keywords
type ClrKeywords = ClrTraceEventParser.Keywords

let rundownFileKeyMap sessionName =
    use session = new TraceEventSession(sessionName)

    let traceFlags = NtKeywords.DiskFileIO ||| NtKeywords.DiskIO
    session.EnableKernelProvider(traceFlags, NtKeywords.None) |> ignore

    let fileKeyMap = Dictionary<uint64, string>()
    session.Source.Kernel.add_FileIOFileRundown(fun ev -> fileKeyMap.[ev.FileKey] <- ev.FileName)

    use cts = new CancellationTokenSource(TimeSpan.FromSeconds(2.0))
    use _r = cts.Token.Register(fun _ -> session.Stop() |> ignore)

    session.Source.Process() |> ignore

    fileKeyMap

let processEvent (fileObjectAndKeyMap : Dictionary<uint64, string>) (ev : TraceEvent) =
    let opcode = int32(ev.Opcode)

    let fileKey =
        let i = ev.PayloadIndex("FileKey")
        if i >= 0 then ev.PayloadValue(i) :?> uint64 else 0UL
    let fileObject =
        let i = ev.PayloadIndex("FileObject")
        if i >= 0 then ev.PayloadValue(i) :?> uint64 else 0UL

    let path = 
        if opcode = 64 (* create *) then
            let ev = ev :?> FileIOCreateTraceData
            let fileName = ev.FileName
            fileObjectAndKeyMap.[fileObject] <- fileName
            fileName
        else
            let chooser k =
                match fileObjectAndKeyMap.TryGetValue(k) with
                | (true, s) -> Some s
                | (false, _) -> None
            seq { fileKey; fileObject } 
            |> Seq.tryPick chooser 
            |> Option.defaultValue "n/a"

    printfn "%d %s (%d) (%s) [%d.%d] (%s) key: 0x%X object: 0x%X '%s'" 
        (uint32(ev.EventIndex)) ev.EventName opcode (ev.GetType().Name) 
        ev.ProcessID ev.ThreadID ev.ProcessName fileKey fileObject path

[<EntryPoint>]
let main _ =
    let sessionName = sprintf "mytrace-%d" DateTime.UtcNow.Ticks

    use session = new TraceEventSession(sessionName)

    let traceFlags = NtKeywords.FileIOInit
    let stackFlags = NtKeywords.None
    session.EnableKernelProvider(traceFlags, stackFlags) |> ignore

    printf "Collecting rundown events..."
    let fileObjectAndKeyMap = rundownFileKeyMap (sprintf "%s_rundown" sessionName)
    printfn "done"
    printfn "Key map size: %d" fileObjectAndKeyMap.Count

    use _sub = session.Source.Kernel.Observe(fun s -> s.StartsWith("FileIO", StringComparison.Ordinal))
               |> Observable.subscribe (processEvent fileObjectAndKeyMap)

    Console.CancelKeyPress.Add(fun ev -> ev.Cancel <- true; session.Stop() |> ignore)

    session.Source.Process() |> ignore

    printfn "Key map size: %d" fileObjectAndKeyMap.Count
    0

Notice that I am not removing anything from the dictionary, so you may want to add eviction logic if you plan to run the trace session for a long time.
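If you do need such logic, a minimal eviction sketch (assuming the session and fileObjectAndKeyMap values from the listing above) could mirror what KernelTraceEventParser itself does for real-time sessions and drop entries when the kernel is done with a file:

// Evict map entries for closed/deleted files, mirroring the real-time behavior
// of KernelTraceEventParser quoted earlier: FileIOCleanup carries the
// FileObject, and FileIOFileDelete carries the FileKey.
session.Source.Kernel.add_FileIOCleanup(fun ev ->
    fileObjectAndKeyMap.Remove(ev.FileObject) |> ignore)
session.Source.Kernel.add_FileIOFileDelete(fun ev ->
    fileObjectAndKeyMap.Remove(ev.FileKey) |> ignore)

Keep in mind that handler registration order matters here: register the eviction handlers after the subscription from the listing, or the delete and cleanup events themselves will resolve to "n/a".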

[BONUS] What are FileObject and FileKey

I’m unsure why some FileIO events have only the FileObject field, while others have both, FileKey and FileObject. To check what these fields represent, I created a simple app in LINQPad that keeps a file open while running:

do
    use f = new StreamWriter("D:\\temp\\test.txt")

    let rec readAndWrite () =
        let r = Console.ReadLine()
        if r <> "" then
            f.WriteLine(r)
            f.Flush()
            readAndWrite ()

    readAndWrite ()

Then I started the tracing app and recorded a few writes:

6243 12415615435 FileIO/Write (68) (FileIOReadWriteTraceData) [7180.1784] (LINQPad6.Query) key: 0xFFFFAE0C9B0EB7C0 object: 0xFFFF970FC37949F0 'D:\temp\test.txt'
6244 12415615551 FileIO/Write (68) (FileIOReadWriteTraceData) [7180.1784] (LINQPad6.Query) key: 0xFFFFAE0C9B0EB7C0 object: 0xFFFF970FC37949F0 'D:\temp\test.txt'
17566 12449509307 FileIO/Write (68) (FileIOReadWriteTraceData) [4.9748] (System) key: 0xFFFFAE0C9B0EB7C0 object: 0xFFFF970FC37949F0 'D:\temp\test.txt'
17567 12449509409 FileIO/SetInfo (69) (FileIOInfoTraceData) [4.9748] (System) key: 0xFFFFAE0C9B0EB7C0 object: 0xFFFF970FC37949F0 'D:\temp\test.txt'

Next, I attached WinDbg to the kernel locally and checked the FileObject address:

lkd> !fileobj 0xFFFF970FC37949F0

\temp\test.txt

Device Object: 0xffff970f418ebb90   \Driver\volmgr
Vpb: 0xffff970f41ae1640
Event signalled
Access: Write SharedRead 

Flags:  0x40062
    Synchronous IO
    Sequential Only
    Cache Supported
    Handle Created

FsContext: 0xffffae0c9b0eb7c0    FsContext2: 0xffffae0c9b0eba20
Private Cache Map: 0xffff970fbccf4b18
CurrentByteOffset: e
Cache Data:
  Section Object Pointers: ffff970f4be27838
  Shared Cache Map: ffff970fbccf49a0         File Offset: e in VACB number 0
  Vacb: ffff970f3fcdfe78
  Your data is at: ffffd98630f0000e

As the !fileobj command worked, the FileObject field must hold the address of a kernel FILE_OBJECT representing the file. From the WinDbg output, we can also learn that FileKey denotes the FsContext field of the FILE_OBJECT structure. As we can see in the API documentation, for file system drivers, it must point to the FSRTL_ADVANCED_FCB_HEADER. This does not answer the initial question of why some events lack this field, but at least we know its meaning.
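We can double-check this mapping in the local kernel debugger: dumping the FsContext field of our FILE_OBJECT should print exactly the FileKey value from the trace (output from my x64 system; field offsets may differ between Windows versions):

lkd> dt nt!_FILE_OBJECT 0xFFFF970FC37949F0 FsContext
   +0x018 FsContext : 0xffffae0c`9b0eb7c0 Void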

Decrypting PerfView’s OSExtensions.cs file

While analyzing the PerfView source code, I stumbled upon an interesting README file in the src/OSExtensions folder:

// The OSExtensions.DLL is a DLL that contains a small number of extensions
// to the operating system that allow it to do certain ETW operations.  
//
// However this DLL is implemented using private OS APIs, and as such should
// really be considered part of the operating system (until such time as
// the OS provide the functionality in public APIs).
//
// To discourage taking dependencies on these internal details we do not 
// provide the source code for this DLL in the open source repo. 
//
// IF YOU SIMPLY WANT TO BUILD PERFIVEW YOU DO NOT NEED TO BUILD OSExtensions!
// A binary copy of this DLL is included in the TraceEvent\OSExtensions.  
//*************************************************************************** 
// However we don't want this source code to be lost.  So we check it in
// with the rest of the code but in an encrypted form for only those few
// OS developers who may need to update this interface.   These people 
// should have access to the password needed to unencrpt the file.    
//
// As part of the build process for OSExtension.dll, we run the command 'syncEncrypted.exe'.
// This command keeps a encrypted and unencrypted version of a a file  in sync.
// Currently it is run on this pair
//
//  OSExtensions.cs   <-->  OSExtesions.cs.crypt
//
// Using a password file 'password.txt'  
//
// Thus if the password.txt exists and OSExtesions.cs.crypt exist, it will
// unencrypt it to OSExtesions.cs.   If OSExtesions.cs is newer, it will
// be reencrypted to OSExtesions.cs.crypt. 

Hmm, private OS APIs seem pretty exciting, right? A simple way to check these APIs would be to disassemble the OSExtensions.dll (for example, with dnSpy). But this method would not show us comments. And for internal APIs, they might contain valuable information. So let’s see if we can do better.

How OSExtensions.cs.crypt is encrypted

As mentioned in the README, internal PerfView developers should use the provided password.txt file and the SyncEncrypted.exe application. The SyncEncrypted.exe binaries are in the same folder as OSExtensions.cs.crypt, and they are not obfuscated, so we can see what encryption method is in use. The Decrypt method disassembled by dnSpy looks as follows:

// Token: 0x06000003 RID: 3 RVA: 0x000022A8 File Offset: 0x000004A8
private static void Decrypt(string inFileName, string outFileName, string password)
{
    Console.WriteLine("Decrypting {0} -> {1}", inFileName, outFileName);
    using (FileStream fileStream = new FileStream(inFileName, FileMode.Open, FileAccess.Read))
    {
        using (FileStream fileStream2 = new FileStream(outFileName, FileMode.Create, FileAccess.Write))
        {
            using (CryptoStream cryptoStream = new CryptoStream(fileStream, new DESCryptoServiceProvider
            {
                Key = Program.GetKey(password),
                IV = Program.GetKey(password)
            }.CreateDecryptor(), CryptoStreamMode.Read))
            {
                cryptoStream.CopyTo(fileStream2);
            }
        }
    }
}

You don’t see DES being used nowadays as its key length (56-bit) is too short for secure communication. However, 56-bit key space contains around 7,2 x 1016 keys, which may be nothing for NSA but, on my desktop, I wouldn’t finish the decryption in my lifetime. As you’re reading this post, you may assume I found another way :). The key to the shortcut is the Program.GetKey method:

// Token: 0x06000004 RID: 4 RVA: 0x0000235C File Offset: 0x0000055C
private static byte[] GetKey(string password)
{
    return Encoding.ASCII.GetBytes(password.GetHashCode().ToString("x").PadLeft(8, '0'));
}

The code above produces up to 2^32 (4,294,967,296) unique keys, which is also the number of possible hash codes. And attacking such a key space is possible even on my desktop.

Preparing the decryption process

Now, it’s time to decide what we should call successful decryption. Firstly, it must not throw an exception, so the padding must be valid. As the padding is not explicitly set, DES will use PKCS7. Also, the mode for operation will be CBC:

We also know that the resulting C# file should contain mostly readable text, so it should have a high percentage of bytes in the ASCII range (assuming the data is UTF-8 encoded). Checking all these conditions would work best if we were trying to decrypt the whole file in each try. However, doing so would consume a lot of time as the file is about 40 KB in size. An improvement would be to use the first few blocks of the encrypted file for counting the ASCII statistics, and the last two blocks for the padding validation. Fortunately, we can do even better. I noticed that all the source code files in the repository are UTF-8 encoded with BOM. That means we could try decrypting only the first 64 bits of the ciphertext and check if the resulting plaintext starts with 0xEF, 0xBB, 0xBF. If it does, it may be the plaintext we are looking for. By appending the last two blocks of the ciphertext, we could also validate the padding. I haven't done that, though, and I just disabled padding in DES. My decryptor code in F# (I'm still learning it) looks as follows:

open System
open System.IO
open System.Reflection
open System.Security.Cryptography
open System.Text
open System.Threading

let chars = "0123456789abcdef"B

[<Literal>]
let PassLen = 8

let homePath = Path.GetDirectoryName(Assembly.GetExecutingAssembly().Location)

let decrypt (des : DESCryptoServiceProvider) (cipher : byte array) (plain : byte array) (key : byte array) =
    use instream = new MemoryStream(cipher)
    use outstream = new MemoryStream(plain)
    use decryptor =  des.CreateDecryptor(key, key)
    use cryptostream = new CryptoStream(instream, decryptor, CryptoStreamMode.Read)
    cryptostream.CopyTo(outstream)

let rec gen n (s : byte array) = seq {
    for i = 0 to chars.Length - 1 do
        if n = s.Length - 1 then
            s.[n] <- chars.[i]
            yield s
        else
            s.[n] <- chars.[i] 
            yield! gen (n + 1) s
            s.[n + 1] <- chars.[0]
}

let cipher = File.ReadAllBytes(Path.Combine(homePath, @"OSExtensions.cs.crypt"))

let tryDecrypt (state : byte array) =

    let checkIfValid (plain : byte array) = 
        plain.[0] = byte(0xef) && plain.[1] = byte(0xbb) && plain.[2] = byte(0xbf)

    // no padding so we won't throw unnecessary exceptions
    use des = new DESCryptoServiceProvider(Padding = PaddingMode.None)
    let plain = Array.create PassLen 0uy

    // we will generate all random strings starting from the second place in the string
    for pass in (gen 1 state) do
        decrypt des cipher plain pass
        if checkIfValid plain then
            let fileName = sprintf "pass_%d_%d.txt" Thread.CurrentThread.ManagedThreadId DateTime.UtcNow.Ticks
            File.WriteAllLines(Path.Combine(homePath, fileName), [| 
                sprintf "Found pass: %s" (Encoding.ASCII.GetString(pass));
                sprintf "Decrypted: %s" (Encoding.ASCII.GetString(plain))
            |])

[<EntryPoint>]
let main argv =
    let zeroArray = Array.create (PassLen - 1) '0'B

    chars |> Array.map (fun ch -> async { zeroArray |> Array.append [| ch |] |> tryDecrypt }) 
          |> Async.Parallel 
          |> Async.RunSynchronously 
          |> ignore
    0 // return an integer exit code

It’s also published in my blog samples repository.

And the beginning of OSExtensions.cs.crypt (the only part the decryptor needs) looks as follows:

       0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
0000: 7b 37 a0 49 50 0b b2 49                          {7.IP..I

Decrypting the file

After about 40 minutes, the application had already generated 108 pass_ files, and one of them had the following content:

Found pass: 436886a4
Decrypted: ???using

The using statement is a very probable beginning of a C# file :). And indeed, the hash code above decrypts the whole OSExtensions.cs.crypt file. We won't know the original PerfView password, but, as an exercise, you may try to look for strings that produce the above hash code. If you find one, please leave a comment!
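Here is a minimal sketch of such a search over a word list (the wordlist.txt file is hypothetical). Note that string.GetHashCode() is implementation-specific: it differs between the 32-bit and 64-bit .NET Framework and is randomized per process on .NET Core, so the search only makes sense on a runtime matching the one that encrypted the file:

open System.IO

// Look for strings whose hash code formats to the found key "436886a4"
// (the same transformation as in Program.GetKey).
let matchesKey (word : string) =
    word.GetHashCode().ToString("x").PadLeft(8, '0') = "436886a4"

File.ReadLines("wordlist.txt")   // hypothetical dictionary file
|> Seq.filter matchesKey
|> Seq.iter (printfn "Candidate password: %s")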

How Ansible impersonates users on Windows

Recently, I hit an interesting error during a deployment orchestrated by Ansible. One of the deployment steps was to execute a custom .NET application. Unfortunately, the application was failing on each run with an ACCESS DENIED error. After collecting the stack trace, I found that the failing code was ProtectedData.Protect(messageBytes, null, DataProtectionScope.CurrentUser), so a call to the Data Protection API. To pinpoint the problem, I created a simple playbook:

- hosts: all
  gather_facts: no
  vars:
    ansible_user: testu
    ansible_connection: winrm
    ansible_winrm_transport: basic
    ansible_winrm_server_cert_validation: ignore
  tasks:
    - win_shell: |
        Add-Type -AssemblyName "System.Security"; \
        [System.Security.Cryptography.ProtectedData]::Protect([System.Text.Encoding]::GetEncoding(
            "UTF-8").GetBytes("test12345"), $null, [System.Security.Cryptography.DataProtectionScope]::CurrentUser)
      args:
        executable: powershell
      register: output

    - debug:
        var: output

When I ran it, I got the following error:

fatal: [192.168.0.30]: FAILED! => {"changed": true, "cmd": "Add-Type -AssemblyName \"System.Security\"; [System.Security.Cryptography.ProtectedData]::Protect([System.Text.Encoding]::GetEncoding(\n    \"UTF-8\").GetBytes(\"test\"), $null, [System.Security.Cryptography.DataProtectionScope]::CurrentUser)", "delta": "0:00:00.807970", "end": "2020-05-04 11:34:29.469908", "msg": "non-zero return code", "rc": 1, "start": "2020-05-04 11:34:28.661938", "stderr": "Exception calling \"Protect\" with \"3\" argument(s): \"Access is denied.\r\n\"\r\nAt line:1 char:107\r\n+ ... .Security\"; [System.Security.Cryptography.ProtectedData]::Protect([Sy ...\r\n+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\r\n    + CategoryInfo          : NotSpecified: (:) [], MethodInvocationException\r\n    + FullyQualifiedErrorId : CryptographicException", "stderr_lines": ["Exception calling \"Protect\" with \"3\" argument(s): \"Access is denied.", "\"", "At line:1 char:107", "+ ... .Security\"; [System.Security.Cryptography.ProtectedData]::Protect([Sy ...", "+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~", "    + CategoryInfo          : NotSpecified: (:) [], MethodInvocationException", "    + FullyQualifiedErrorId : CryptographicException"], "stdout": "", "stdout_lines": []}

The workaround was to use the Ansible become parameters:

...
  tasks:
    - win_shell: |
        Add-Type -AssemblyName "System.Security"; \
        [System.Security.Cryptography.ProtectedData]::Protect([System.Text.Encoding]::GetEncoding(
            "UTF-8").GetBytes("test12345"), $null, [System.Security.Cryptography.DataProtectionScope]::CurrentUser)
      args:
        executable: powershell
      become_method: runas
      become_user: testu
      become: yes
      register: output
...

Interestingly, the original playbook succeeds if the testu user has signed in to the remote system interactively (for example, by opening an RDP session) and encrypted something with DPAPI before running the script.

It only made me more curious about what is happening here. I hope it made you curious too 🙂

What happens when you encrypt data with DPAPI

When you call CryptProtectData (or its managed wrapper ProtectedData.Protect), internally you are connecting to an RPC endpoint protected_storage exposed by the Lsass process. The procedure s_SSCryptProtectData, implemented in the dpapisrv.dll library, encrypts the data using the user’s master key. The master key is encrypted, and to decrypt it, Lsass needs a hash of the user’s password. The decryption process involves multiple steps, and if you are interested in its details, have a look at this post.
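For reference, the managed call that triggers this whole chain is a one-liner. Here is a minimal F# equivalent of the playbook's PowerShell (it requires a reference to the System.Security assembly on .NET Framework, or the System.Security.Cryptography.ProtectedData package on .NET Core):

open System
open System.Security.Cryptography
open System.Text

// Encrypt a string with the current user's DPAPI master key - this is the call
// that fails with ACCESS DENIED under a Network logon session.
let encrypted = ProtectedData.Protect(Encoding.UTF8.GetBytes("test12345"), null, DataProtectionScope.CurrentUser)
printfn "%s" (Convert.ToBase64String(encrypted))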

Examining the impersonation code

Before we dive into the Ansible impersonation code, I highly recommend checking the Ansible documentation on this subject as it is exceptional and covers all the authentication cases. In this post, I am describing only my case, when I am not specifying the become_user password. However, by reading the referenced code, you should have no problems in understanding other scenarios as well.

Four C# files contain the impersonation code, with the most important one being Ansible.Become.cs. Become flags define what type of access token Ansible creates for a given user session. Get-BecomeFlags contains the logic of the flags parser and handles the interaction with the C# code.

A side note: while playing with the exec wrapper, I discovered an interesting environment variable: ANSIBLE_EXEC_DEBUG. You may set its value to the path of a file where you want Ansible to write its logs. They might reveal some details on how Ansible executes your commands.

For my case, the logic of the become_wrapper could be expressed in the following PowerShell commands:

PS> Import-Module -Name $pwd\Ansible.ModuleUtils.AddType.psm1
PS> $cs = [System.IO.File]::ReadAllText("$pwd\Ansible.Become.cs"), [System.IO.File]::ReadAllText("$pwd\Ansible.Process.cs"), [System.IO.File]::ReadAllText("$pwd\Ansible.AccessToken.cs")
PS> Add-CSharpType -References $cs -IncludeDebugInfo -CompileSymbols @("TRACE")

PS> [Ansible.Become.BecomeUtil]::CreateProcessAsUser("testu", [NullString]::Value, "powershell.exe -NonInteractive -NoProfile -ExecutionPolicy Bypass -EncodedCommand QQBkAGQALQBU...MAZQByACkA")

StandardOut
-----------
1...

The stripped base64 string is the encoded version of the commands I had in my Ansible playbook:

PS> [Text.Encoding]::Unicode.GetString([Convert]::FromBase64String("QQBkAGQALQBU...MAZQByACkA"))

Add-Type -AssemblyName "System.Security";[System.Security.Cryptography.ProtectedData]::Protect([System.Text.Encoding]::GetEncoding("UTF-16LE").GetBytes("test12345"), $null, [System.Security.Cryptography.DataProtectionScope]::CurrentUser)

The CreateProcessAsUser method internally calls GetUserTokens to create an elevated and a regular token (or only one if no elevation is available/required). As I specify neither a password nor a logon type, my code will eventually call GetS4UTokenForUser. S4U, or “Service for Users”, is a mechanism that allows services to obtain a logon for a user without providing the user’s credentials. To use S4U, services call the LsaLogonUser method, passing a KERB_S4U_LOGON structure as the AuthenticationInformation parameter. Of course, not all services can impersonate users. Firstly, the service must have the “Act as part of the operating system” privilege (SeTcbPrivilege). Secondly, it must register itself as a logon application (LsaRegisterLogonProcess). So how does Ansible achieve that? It simply tries to “steal” (duplicate ;)) a token from one of the privileged processes by executing GetPrimaryTokenForUser(new SecurityIdentifier("S-1-5-18"), new List<string>() { "SeTcbPrivilege" }). As this method's code is not very long and is well documented, let me cite it here (GPL 3.0 license):

private static SafeNativeHandle GetPrimaryTokenForUser(SecurityIdentifier sid, List<string> requiredPrivileges = null)
{
    // According to CreateProcessWithTokenW we require a token with
    //  TOKEN_QUERY, TOKEN_DUPLICATE and TOKEN_ASSIGN_PRIMARY
    // Also add in TOKEN_IMPERSONATE so we can get an impersonated token
    TokenAccessLevels dwAccess = TokenAccessLevels.Query |
        TokenAccessLevels.Duplicate |
        TokenAccessLevels.AssignPrimary |
        TokenAccessLevels.Impersonate;
    foreach (SafeNativeHandle hToken in TokenUtil.EnumerateUserTokens(sid, dwAccess))
    {
        // Filter out any Network logon tokens, using become with that is useless when S4U
        // can give us a Batch logon
        NativeHelpers.SECURITY_LOGON_TYPE tokenLogonType = GetTokenLogonType(hToken);
        if (tokenLogonType == NativeHelpers.SECURITY_LOGON_TYPE.Network)
            continue;
        // Check that the required privileges are on the token
        if (requiredPrivileges != null)
        {
            List<string> actualPrivileges = TokenUtil.GetTokenPrivileges(hToken).Select(x => x.Name).ToList();
            int missing = requiredPrivileges.Where(x => !actualPrivileges.Contains(x)).Count();
            if (missing > 0)
                continue;
        }
        // Duplicate the token to convert it to a primary token with the access level required.
        try
        {
            return TokenUtil.DuplicateToken(hToken, TokenAccessLevels.MaximumAllowed, SecurityImpersonationLevel.Anonymous,
                TokenType.Primary);
        }
        catch (Process.Win32Exception)
        {
            continue;
        }
    }
    return null;
}


public static IEnumerable<SafeNativeHandle> EnumerateUserTokens(SecurityIdentifier sid,
    TokenAccessLevels access = TokenAccessLevels.Query)
{
    foreach (System.Diagnostics.Process process in System.Diagnostics.Process.GetProcesses())
    {
        // We always need the Query access level so we can query the TokenUser
        using (process)
        using (SafeNativeHandle hToken = TryOpenAccessToken(process, access | TokenAccessLevels.Query))
        {
            if (hToken == null)
                continue;
            if (!sid.Equals(GetTokenUser(hToken)))
                continue;
            yield return hToken;
        }
    }
}

private static SafeNativeHandle TryOpenAccessToken(System.Diagnostics.Process process, TokenAccessLevels access)
{
    try
    {
        using (SafeNativeHandle hProcess = OpenProcess(process.Id, ProcessAccessFlags.QueryInformation, false))
            return OpenProcessToken(hProcess, access);
    }
    catch (Win32Exception)
    {
        return null;
    }
}

Once Ansible obtains the SYSTEM token, it can register itself as a logon application and finally call LsaLogonUser to obtain the impersonation token (GetS4UTokenForUser). With the right token, it can execute CreateProcessWithTokenW and start the process in the desired user's context.

Playing with the access tokens using TokenViewer

As we have reached this point, maybe it is worth playing a bit more with Windows tokens and trying to reproduce the initial ACCESS DENIED error. For this purpose, I slightly modified the TokenViewer tool developed by James Forshaw (Google). You may find the code of my version in my blog repository.

Let's run TokenViewer as the SYSTEM user; that should give us SeTcbPrivilege, necessary to create impersonation tokens: psexec -s -i TokenViewer.exe. Next, let's create an access token for the Network logon type:

Network logon session
Network logon token

On the Groups tab, there should be the NT AUTHORITY\NETWORK group listed. Now, let's try to encrypt the “Hello World!” text with DPAPI on the Operations tab. We should receive an Access Denied error:

DPAPI Access Denied error

Leave the Token window open, move to the main window, and create a token for the Batch logon type. This is the token Ansible creates in the “become mode”. The Groups tab should have the NT AUTHORITY\BATCH group enabled, and DPAPI encryption should work. Don't close this window; move back to the previous token window. DPAPI will work there now too.

I am not familiar enough with Lsass to explain in detail what is happening here. However, I assume that the DPAPI problem is caused by the fact that Lsass does not cache user credentials when the user signs in with the logon type NETWORK (probably for performance reasons). Therefore, the lsasrv!LsapGetCredentials method fails when DPAPI calls it to retrieve the user password's hash to decrypt the master key. Interestingly, if we open another session for a given user (for example, an interactive one), and call DPAPI to encrypt/decrypt some data, the user's master key lands in the cache (lsasrv!g_MasterKeyCacheList). DPAPI searches this cache (dpapisrv!SearchMasterKeyCache) before calling LsapGetCredentials. That explains why our second call to DPAPI succeeded in the NETWORK logon session.
