ObjDir – Rust Version

In the previous post, I showed how to write a minimal but functional Projected File System provider using C++. I also semi-promised to write a version of that provider in Rust. I thought we should start small, by implementing a command line tool I wrote years ago called objdir. Its purpose is to be a “command line” version of a simplified WinObj from Sysinternals. It should be able to list objects (name and type) within a given object manager namespace directory. Here are a couple of examples:

D:\>objdir \
PendingRenameMutex (Mutant)
ObjectTypes (Directory)
storqosfltport (FilterConnectionPort)
MicrosoftMalwareProtectionRemoteIoPortWD (FilterConnectionPort)
Container_Microsoft.OutlookForWindows_1.2024.214.400_x64__8wekyb3d8bbwe-S-1-5-21-3968166439-3083973779-398838822-1001 (Job)
MicrosoftDataLossPreventionPort (FilterConnectionPort)
SystemRoot (SymbolicLink)
exFAT (Device)
Sessions (Directory)
MicrosoftMalwareProtectionVeryLowIoPortWD (FilterConnectionPort)
ArcName (Directory)
PrjFltPort (FilterConnectionPort)
WcifsPort (FilterConnectionPort)
...

D:\>objdir \kernelobjects
MemoryErrors (SymbolicLink)
LowNonPagedPoolCondition (Event)
Session1 (Session)
SuperfetchScenarioNotify (Event)
SuperfetchParametersChanged (Event)
PhysicalMemoryChange (SymbolicLink)
HighCommitCondition (SymbolicLink)
BcdSyncMutant (Mutant)
HighMemoryCondition (SymbolicLink)
HighNonPagedPoolCondition (Event)
MemoryPartition0 (Partition)
...

Since enumerating object manager directories is required for our ProjFS provider, once we implement objdir in Rust, we’ll have a good starting point for implementing the full provider in Rust.

This post assumes you are familiar with the fundamentals of Rust. Even if you’re not, the code should still be fairly understandable, as we’re mostly going to use unsafe Rust to do the real work.

Unsafe Rust

One of the main selling points of Rust is its safety – memory and concurrency safety guaranteed at compile time. However, some operations cannot be checked by the Rust compiler, such as calling external C functions like OS APIs. Rust allows these within unsafe blocks or functions. Within unsafe blocks, certain operations are allowed which are normally forbidden; it’s up to the developer to make sure the invariants assumed by Rust are not violated – essentially making sure nothing leaks or is otherwise misused.

The Rust standard library provides some support for calling C functions, mostly in the std::ffi module (FFI=Foreign Function Interface). This is pretty bare bones, providing a C-string class, for example. That’s not rich enough, unfortunately. First, strings in Windows are mostly UTF-16, which is not the same as a classic C string, and not the same as the Rust standard String type. More importantly, any C function that needs to be invoked must be properly exposed as an extern "C" function, using the correct Rust types that provide the same binary representation as the C types.
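To make the encoding mismatch concrete, here is a small sketch that uses only the standard library. The helper name encoded_lengths is made up for this illustration; it compares how many UTF-8 bytes and UTF-16 code units the same text occupies, next to the null-terminated byte string that std::ffi::CString produces.

```rust
use std::ffi::CString;

/// Hypothetical helper: (UTF-8 byte count, UTF-16 code unit count) of `s`.
fn encoded_lengths(s: &str) -> (usize, usize) {
    (s.len(), s.encode_utf16().count())
}

fn main() {
    // CString holds a null-terminated 8-bit string, the classic C representation.
    let c = CString::new("Sessions").expect("no interior NUL byte");
    println!("C string bytes (with terminator): {}", c.as_bytes_with_nul().len());

    // The same text takes a different number of units in each encoding.
    for text in ["Sessions", "\u{e9}", "\u{1F600}"] {
        let (utf8, utf16) = encoded_lengths(text);
        println!("{text:?}: {utf8} UTF-8 bytes, {utf16} UTF-16 code units");
    }
}
```

For plain ASCII names the two counts agree, which is why the difference is easy to overlook until a name contains other characters.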

Doing all this manually is a lot of error-prone, non-trivial work. It only makes sense for simple and limited sets of functions. In our case, we need to use native APIs, like NtOpenDirectoryObject and NtQueryDirectoryObject. To simplify matters, there are crates available on crates.io (the main Rust crate registry) that already provide such declarations.

Adding Dependencies

Assuming you have Rust installed, open a command window and create a new project named objdir:

cargo new objdir

This will create a subdirectory named objdir, hosting the newly created binary crate. Now we can open Cargo.toml (the manifest) and add dependencies for the following crates:

[dependencies]
ntapi = "0.4"
winapi = { version = "0.3.9", features = [ "impl-default" ] }

winapi provides most of the Windows API declarations, but does not provide native APIs. ntapi provides those additional declarations, and in fact depends on winapi for some fundamental types (which we’ll need). The feature “impl-default” indicates we would like the implementations of the standard Rust Default trait provided – we’ll need that later.

The main Function

The main function is going to accept a command line argument to indicate the directory to enumerate. If no parameters are provided, we’ll assume the root directory is requested. Here is one way to get that directory:

let dir = std::env::args().skip(1).next().unwrap_or("\\".to_owned());

(Note that unfortunately the WordPress system I’m using to write this post has no syntax highlighting for Rust, the code might be uglier than expected; I’ve set it to C++).

The args method returns an iterator. We skip the first item (the executable itself), and grab the next one with next. It returns an Option<String>, so we grab the string if there is one, or use a fixed backslash as the string.
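The same logic can be pulled into a small function that is easy to exercise without launching the program. This is only a sketch: directory_from_args is a name invented here, and it uses nth(1) as a shorter spelling of skip(1).next().

```rust
/// Picks the directory to enumerate from a sequence of command line arguments.
/// The first item is the executable name; a missing second item means the root.
fn directory_from_args<I>(mut args: I) -> String
where
    I: Iterator<Item = String>,
{
    // nth(1) consumes the executable name and returns the next item, if any.
    args.nth(1).unwrap_or_else(|| "\\".to_owned())
}

fn main() {
    let dir = directory_from_args(std::env::args());
    println!("Enumerating {dir}");
}
```

Using unwrap_or_else instead of unwrap_or means the default string is only allocated when it is actually needed.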

Next, we’ll call a helper function, enum_directory that does the heavy lifting and get back a Result where success is a vector of tuples, each containing the object’s name and type (Vec<(String, String)>). Based on the result, we can display the results or report an error:

let result = enum_directory(&dir);
match result {
    Ok(objects) => {
        for (name, typename) in &objects {
            println!("{name} ({typename})");
        }
        println!("{} objects.", objects.len());
    },
    Err(status) => println!("Error: 0x{status:X}")
};

That is it for the main function.
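To see what the two arms produce without touching any Windows API, here is a dependency-free sketch. The render function and the NtStatus alias are made up for this illustration (winapi defines NTSTATUS as a 32-bit signed integer); the status value used is STATUS_ACCESS_DENIED.

```rust
/// Mirrors NTSTATUS (a 32-bit signed integer) so the sketch builds without crates.
type NtStatus = i32;

/// Builds the lines that main would print for either outcome.
fn render(result: &Result<Vec<(String, String)>, NtStatus>) -> Vec<String> {
    match result {
        Ok(objects) => {
            let mut lines: Vec<String> = objects
                .iter()
                .map(|(name, typename)| format!("{name} ({typename})"))
                .collect();
            lines.push(format!("{} objects.", objects.len()));
            lines
        }
        // {:X} prints the two's complement bit pattern of a negative i32.
        Err(status) => vec![format!("Error: 0x{status:X}")],
    }
}

fn main() {
    let ok = Ok(vec![("Sessions".to_owned(), "Directory".to_owned())]);
    let denied = Err(0xC0000022u32 as i32); // STATUS_ACCESS_DENIED
    for line in render(&ok).into_iter().chain(render(&denied)) {
        println!("{line}");
    }
}
```

Because the hexadecimal formatter shows the raw bit pattern, error codes come out in the familiar 0xC... form even though the value is negative.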

Enumerating Objects

Since we need to use APIs defined within the winapi and ntapi crates, let’s bring them into scope for easier access at the top of the file:

use winapi::shared::ntdef::*;
use ntapi::ntobapi::*;
use ntapi::ntrtl::*;

I’m using the “glob” operator (*) to make it easy to just use the function names directly without any prefix. Why these specific modules? Based on the APIs and types we’re going to need, these are where these are defined (check the documentation for these crates).

enum_directory is where the real work is done. Here is its declaration:

fn enum_directory(dir: &str) -> Result<Vec<(String, String)>, NTSTATUS> {

The function accepts a string slice and returns a Result type, where the Ok variant is a vector of tuples consisting of two standard Rust strings.

The following code follows the basic logic of the EnumDirectoryObjects function from the ProjFS example in the previous post, without the capability of search or filter. We’ll add that when we work on the actual ProjFS project in a future post.

The first thing to do is open the given directory object with NtOpenDirectoryObject. For that we need to prepare an OBJECT_ATTRIBUTES and a UNICODE_STRING. Here is what that looks like:

let mut items = vec![];

unsafe {
    let mut udir = UNICODE_STRING::default();
    let wdir = string_to_wstring(&dir);
    RtlInitUnicodeString(&mut udir, wdir.as_ptr());
    let mut dir_attr = OBJECT_ATTRIBUTES::default();
    InitializeObjectAttributes(&mut dir_attr, &mut udir, OBJ_CASE_INSENSITIVE, NULL, NULL);

We start by creating an empty vector to hold the results. We don’t need any type annotation because later in the code the compiler would have enough information to deduce it on its own. We then start an unsafe block because we’re calling C APIs.

Next, we create a default-initialized UNICODE_STRING and use a helper function to convert a Rust string slice to a UTF-16 string, usable by native APIs. We’ll see this string_to_wstring helper function once we’re done with this one. The returned value is in fact a Vec<u16> – an array of UTF-16 characters.

The next step is to call RtlInitUnicodeString, to initialize the UNICODE_STRING based on the UTF-16 string we just received. Methods such as as_ptr are necessary to make the Rust compiler happy. Finally, we create a default OBJECT_ATTRIBUTES and initialize it with the udir (the UTF-16 directory string). All the types and constants used are provided by the crates we’re using.

The next step is to actually open the directory, which could fail because of insufficient access or a directory that does not exist. In that case, we just return an error. Otherwise, we move to the next step:

let mut hdir: HANDLE = NULL;
match NtOpenDirectoryObject(&mut hdir, DIRECTORY_QUERY, &mut dir_attr) {
    0 => {
        // do real work...
    },
    err => Err(err),
}

The NULL here is just a constant of the Rust-provided C void pointer type (*mut c_void) with a value of zero. We examine the NTSTATUS returned using a match expression: if it’s not zero (STATUS_SUCCESS), we treat it as an error and return an Err object with the status. If it’s zero, we’re good to go. Now comes the real work.
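Matching on zero is the strictest possible check. The C headers use the NT_SUCCESS macro instead, which accepts any non-negative status. The sketch below mirrors that test in plain Rust; the NtStatus alias and the function names are made up here, while the constants are the documented status values.

```rust
type NtStatus = i32;

const STATUS_SUCCESS: NtStatus = 0;
const STATUS_MORE_ENTRIES: NtStatus = 0x0000_0105;
const STATUS_NO_MORE_ENTRIES: NtStatus = 0x8000_001Au32 as i32;
const STATUS_ACCESS_DENIED: NtStatus = 0xC000_0022u32 as i32;

/// Same test as the NT_SUCCESS macro in C: success and informational
/// values are non-negative.
fn nt_success(status: NtStatus) -> bool {
    status >= 0
}

/// Top two bits of the status: 0 success, 1 informational, 2 warning, 3 error.
fn severity(status: NtStatus) -> u32 {
    (status as u32) >> 30
}

fn main() {
    for status in [
        STATUS_SUCCESS,
        STATUS_MORE_ENTRIES,
        STATUS_NO_MORE_ENTRIES,
        STATUS_ACCESS_DENIED,
    ] {
        println!(
            "0x{status:08X}: success={} severity={}",
            nt_success(status),
            severity(status)
        );
    }
}
```

For opening a directory the distinction rarely matters, but it explains why the enumeration loop later tests for a negative value rather than for a non-zero one.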

We need to allocate a buffer to receive the object information in this directory and be prepared for the case the information is too big for the allocated buffer, so we may need to loop around to get the next “chunk” of data. This is how the NtQueryDirectoryObject is expected to be used. Let’s allocate a buffer using the standard Vec<> type and prepare some locals:

const LEN: u32 = 1 << 16;
let mut first = 1;
let mut buffer: Vec<u8> = Vec::with_capacity(LEN as usize);
let mut index = 0u32;
let mut size: u32 = 0;

We’re allocating 64KB, but could have chosen any number. Now the loop:

loop {
    let start = index;
    if NtQueryDirectoryObject(hdir, buffer.as_mut_ptr().cast(), LEN, 0, first, &mut index, &mut size) < 0 {
        break;
    }
    first = 0;
    let mut obuffer = buffer.as_ptr() as *const OBJECT_DIRECTORY_INFORMATION;
    for _ in 0..index - start {
        let item = *obuffer;
        let name = String::from_utf16_lossy(std::slice::from_raw_parts(item.Name.Buffer, (item.Name.Length / 2) as usize));
        let typename = String::from_utf16_lossy(std::slice::from_raw_parts(item.TypeName.Buffer, (item.TypeName.Length / 2) as usize));
        items.push((name, typename));
        obuffer = obuffer.add(1);
    }
}
NtClose(hdir);    // release the directory handle before returning
Ok(items)

There are quite a few things going on here. If NtQueryDirectoryObject fails, we break out of the loop. This happens when there is no more information to give. If there is data, buffer is cast to an OBJECT_DIRECTORY_INFORMATION pointer, and we can loop over the items that were returned. start is used to keep track of the previous number of items delivered. first is 1 (true) the first time through the loop to force NtQueryDirectoryObject to start from the beginning.
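The bookkeeping with start and index can be tried out in isolation. The sketch below replaces the native call with mock_query, a stand-in written for this illustration that hands out names in fixed-size chunks and advances the index the same way; it is not how the real API is implemented.

```rust
/// Stand-in for NtQueryDirectoryObject (hypothetical, for illustration only).
/// Copies up to `capacity` names starting at `*index` into `chunk`, advances
/// `*index` past them, and returns false once nothing is left.
fn mock_query(source: &[&str], capacity: usize, index: &mut u32, chunk: &mut Vec<String>) -> bool {
    chunk.clear();
    let start = *index as usize;
    if capacity == 0 || start >= source.len() {
        return false;
    }
    let end = (start + capacity).min(source.len());
    chunk.extend(source[start..end].iter().map(|s| s.to_string()));
    *index = end as u32;
    true
}

/// Collects every name by calling the mock repeatedly until it fails.
fn enumerate_all(source: &[&str], capacity: usize) -> Vec<String> {
    let mut items = Vec::new();
    let mut chunk = Vec::new();
    let mut index = 0u32;
    loop {
        let start = index;
        if !mock_query(source, capacity, &mut index, &mut chunk) {
            break;
        }
        // The number of entries in this chunk is the index delta.
        let returned = (index - start) as usize;
        items.extend(chunk.iter().take(returned).cloned());
    }
    items
}

fn main() {
    let names = ["ArcName", "Sessions", "SystemRoot", "exFAT", "ObjectTypes"];
    let all = enumerate_all(&names, 2);
    println!("{} names collected in chunks of 2", all.len());
}
```

The important detail is that the count of entries in each chunk is not returned directly; it is derived from how far the index moved.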

Once we have an item (item), its two members are extracted. item is of type OBJECT_DIRECTORY_INFORMATION and has two members: Name and TypeName (both UNICODE_STRING). Since we want to return standard Rust strings (which, by the way, are UTF-8 encoded), we must convert the UNICODE_STRINGs to Rust strings. String::from_utf16_lossy performs such a conversion, but we must specify the number of characters, because a UNICODE_STRING does not have to be NULL-terminated. The trick here is std::slice::from_raw_parts that can have a length, which is half of the number of bytes (Length member in UNICODE_STRING).
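Here is a self-contained sketch of that conversion. CountedWideString is a made-up mirror of the UNICODE_STRING layout so the example builds without the winapi crate, and convert_counted deliberately places unrelated data after the counted part to show that only Length bytes are read.

```rust
/// Hypothetical mirror of the UNICODE_STRING layout.
#[repr(C)]
#[allow(dead_code)]
struct CountedWideString {
    length: u16,         // bytes in use, excluding any terminator
    maximum_length: u16, // bytes allocated
    buffer: *const u16,
}

/// # Safety
/// `s.buffer` must point to at least `s.length` readable bytes.
unsafe fn to_rust_string(s: &CountedWideString) -> String {
    if s.buffer.is_null() {
        return String::new();
    }
    // Length is in bytes, so the number of UTF-16 code units is half of it.
    let units = unsafe { std::slice::from_raw_parts(s.buffer, (s.length / 2) as usize) };
    String::from_utf16_lossy(units)
}

/// Builds a buffer holding `text` followed by unrelated data (no terminator)
/// and converts only the counted part back.
fn convert_counted(text: &str, trailing: &[u16]) -> String {
    let mut storage: Vec<u16> = text.encode_utf16().collect();
    let used_bytes = storage.len() * 2;
    storage.extend_from_slice(trailing);
    let s = CountedWideString {
        length: used_bytes as u16,
        maximum_length: (storage.len() * 2) as u16,
        buffer: storage.as_ptr(),
    };
    // SAFETY: `storage` outlives `s` and holds at least `length` bytes.
    unsafe { to_rust_string(&s) }
}

fn main() {
    println!("{}", convert_counted("Sessions", &[0x58, 0x59]));
}
```

Searching for a terminating zero instead of using the length would have picked up the trailing data as well.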

Finally, Vec<>.push is called to add the tuple (name, typename) to the vector. This is what allows the compiler to infer the vector type. Once we exit the loop, the Ok variant of Result<> is returned with the vector.

The last function used is the helper to convert a Rust string slice to a UTF-16 null-terminated string:

fn string_to_wstring(s: &str) -> Vec<u16> {
    let mut wstring: Vec<_> = s.encode_utf16().collect();
    wstring.push(0);    // null terminator
    wstring
}

And that is it. The Rust version of objdir is functional.

The full source is at https://github.com/zodiacon/objdir-rs.

If you want to know more about Rust, consider signing up for my upcoming Rust masterclass programming.

Projected File System

A little-known feature in modern Windows is the ability to expose hierarchical data using the file system. This is called Windows Projected File System (ProjFS), available since Windows 10 version 1809. There is even a sample that exposes the Registry hierarchy using this technology. Using the file system as a “projection” mechanism provides a couple of advantages over a custom mechanism:

  • Any file viewing tool, such as Explorer or commands in a terminal, can present the information.
  • “Standard” file APIs are used, which are well-known, and available in any programming language or library.

Let’s see how to build a Projected File System provider from scratch. We’ll expose object manager directories as file system directories, and other types of objects as “files”. Normally, we can see the object manager’s namespace with dedicated tools, such as WinObj from Sysinternals, or my own Object Explorer:

WinObj showing parts of the object manager namespace

Here is an example of what we are aiming for (viewed with Explorer):

Explorer showing the root of the object manager namespace

First, support for ProjFS must be enabled to be usable. You can enable it with the Windows Features dialog or PowerShell:

Enable-WindowsOptionalFeature -Online -FeatureName Client-ProjFS -NoRestart

We’ll start by creating a C++ console application named ObjMgrProjFS; I’ve used the Windows Desktop Wizard project with a precompiled header (pch.h):

#pragma once

#include <Windows.h>
#include <projectedfslib.h>

#include <string>
#include <vector>
#include <memory>
#include <map>
#include <ranges>
#include <algorithm>
#include <format>
#include <optional>
#include <functional>

projectedfslib.h is where the ProjFS declarations reside. projectedfslib.lib is the import library to link against. In this post, I’ll focus on the main coding aspects, rather than going through every little piece of code. The full code can be found at https://github.com/zodiacon/objmgrprojfs. It’s of course possible to use other languages to implement a ProjFS provider. I’m going to attempt one in Rust in a future post 🙂

The projected file system must be rooted in a folder in the file system. It doesn’t have to be empty, but it makes sense to use such a directory for this purpose only. The main function will take the requested root folder as input and pass it to the ObjectManagerProjection class that is used to manage everything:

int wmain(int argc, const wchar_t* argv[]) {
	if (argc < 2) {
		printf("Usage: ObjMgrProjFS <root_dir>\n");
		return 0;
	}

	ObjectManagerProjection omp;
	if (auto hr = omp.Init(argv[1]); hr != S_OK)
		return Error(hr);

	if (auto hr = omp.Start(); hr != S_OK)
		return Error(hr);

	printf("Virtualizing at %ws. Press ENTER to stop virtualizing...\n", argv[1]);
	char buffer[3];
	gets_s(buffer);

	omp.Term();

	return 0;
}

Let’s start with the initialization. We want to create the requested directory (if it doesn’t already exist). If it does exist, we’ll use it. In fact, it could exist because of a previous run of the provider, so we can keep track of the instance ID (a GUID) so that the file system itself can use its caching capabilities. We’ll “hide” the GUID in a hidden file within the directory. First, create the directory:

HRESULT ObjectManagerProjection::Init(PCWSTR root) {
	GUID instanceId = GUID_NULL;
	std::wstring instanceFile(root);
	instanceFile += L"\\_obgmgrproj.guid";

	if (!::CreateDirectory(root, nullptr)) {
		//
		// failed, does it exist?
		//
		if (::GetLastError() != ERROR_ALREADY_EXISTS)
			return HRESULT_FROM_WIN32(::GetLastError());

If creation fails not because it exists, bail out with an error. Otherwise, get the instance ID that may be there and use that GUID if present:

	auto hFile = ::CreateFile(instanceFile.c_str(), GENERIC_READ, 
		FILE_SHARE_READ, nullptr, OPEN_EXISTING, 0, nullptr);
	if (hFile != INVALID_HANDLE_VALUE && ::GetFileSize(hFile, nullptr) == sizeof(GUID)) {
		DWORD ret;
		::ReadFile(hFile, &instanceId, sizeof(instanceId), &ret, nullptr);
		::CloseHandle(hFile);
	}
}

If we need to generate a new GUID, we’ll do that with CoCreateGuid and write it to the hidden file:

if (instanceId == GUID_NULL) {
	::CoCreateGuid(&instanceId);
	//
	// write instance ID
	//
	auto hFile = ::CreateFile(instanceFile.c_str(), GENERIC_WRITE, 0, nullptr, CREATE_NEW, FILE_ATTRIBUTE_HIDDEN, nullptr);
	if (hFile != INVALID_HANDLE_VALUE) {
		DWORD ret;
		::WriteFile(hFile, &instanceId, sizeof(instanceId), &ret, nullptr);
		::CloseHandle(hFile);
	}
}
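Since a Rust version of the provider is the eventual goal, here is how the “reuse the stored ID or create a new one” logic might look with the Rust standard library. It is a sketch under stated assumptions: load_or_create_instance_id is a made-up name, the ID is treated as 16 raw bytes, the generate parameter stands in for CoCreateGuid, and the hidden attribute and CREATE_NEW semantics of the C++ code are not reproduced.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns the 16-byte ID stored in `file`, or stores and returns a new one.
/// `generate` is a caller-supplied stand-in for CoCreateGuid.
fn load_or_create_instance_id(
    file: &Path,
    generate: impl FnOnce() -> [u8; 16],
) -> io::Result<[u8; 16]> {
    // Reuse the stored ID only if the file exists and has exactly the right size.
    if let Ok(bytes) = fs::read(file) {
        if bytes.len() == 16 {
            let mut id = [0u8; 16];
            id.copy_from_slice(&bytes);
            return Ok(id);
        }
    }
    // Otherwise create a fresh ID and persist it for the next run.
    let id = generate();
    fs::write(file, id)?;
    Ok(id)
}

fn main() -> io::Result<()> {
    let file = std::env::temp_dir().join("objmgrproj_demo.guid");
    let id = load_or_create_instance_id(&file, || [0x42; 16])?;
    println!("instance id: {id:02x?}");
    Ok(())
}
```

Running it twice against the same file returns the same bytes both times, which is the property the provider relies on.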

Finally, we must register the root with ProjFS:

auto hr = ::PrjMarkDirectoryAsPlaceholder(root, nullptr, nullptr, &instanceId);
if (FAILED(hr))
	return hr;

m_RootDir = root;
return hr;

Once Init succeeds, we need to start the actual virtualization. To that end, a structure of callbacks must be filled so that ProjFS knows what functions to call to get the information requested by the file system. This is the job of the Start method:

HRESULT ObjectManagerProjection::Start() {
	PRJ_CALLBACKS cb{};
	cb.StartDirectoryEnumerationCallback = StartDirectoryEnumerationCallback;
	cb.EndDirectoryEnumerationCallback = EndDirectoryEnumerationCallback;
	cb.GetDirectoryEnumerationCallback = GetDirectoryEnumerationCallback;
	cb.GetPlaceholderInfoCallback = GetPlaceholderInformationCallback;
	cb.GetFileDataCallback = GetFileDataCallback;

	auto hr = ::PrjStartVirtualizing(m_RootDir.c_str(), &cb, this, nullptr, &m_VirtContext);
	return hr;
}

The callbacks specified above are the absolute minimum required for a valid provider. PrjStartVirtualizing returns a virtualization context that identifies our provider, which we need to use (at least) when stopping virtualization. The call does not block: it returns once virtualization has started, and ProjFS then invokes the callbacks on its own threads, which is why main simply waits for ENTER afterwards. The this value passed in is a user-defined context. We’ll use that to delegate these static callback functions to member functions. Here is the code for StartDirectoryEnumerationCallback:

HRESULT ObjectManagerProjection::StartDirectoryEnumerationCallback(const PRJ_CALLBACK_DATA* callbackData, const GUID* enumerationId) {
	return ((ObjectManagerProjection*)callbackData->InstanceContext)->DoStartDirectoryEnumerationCallback(callbackData, enumerationId);
}
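The delegation pattern carries over to Rust almost unchanged. The sketch below is not ProjFS code: Provider, its method, and the simplified callback signature (a bare context pointer) are assumptions made to keep the example self-contained.

```rust
use std::ffi::c_void;

/// Hypothetical provider state; the real one would hold the enumerations map.
struct Provider {
    started: u32,
}

impl Provider {
    fn on_start_enumeration(&mut self) -> i32 {
        self.started += 1;
        0 // S_OK
    }
}

/// Free function with a C-compatible signature that forwards to the method.
///
/// # Safety
/// `context` must be the pointer registered at startup and must refer to a
/// live Provider that is not accessed elsewhere during the call.
unsafe extern "C" fn start_enumeration_trampoline(context: *mut c_void) -> i32 {
    let provider = unsafe { &mut *(context as *mut Provider) };
    provider.on_start_enumeration()
}

/// Simulates the system invoking the callback `count` times.
fn dispatch_times(count: u32) -> u32 {
    let mut provider = Provider { started: 0 };
    let context = &mut provider as *mut Provider as *mut c_void;
    for _ in 0..count {
        // SAFETY: `context` points at `provider`, which is alive and not
        // borrowed anywhere else while the trampoline runs.
        unsafe { start_enumeration_trampoline(context) };
    }
    provider.started
}

fn main() {
    println!("callbacks delivered: {}", dispatch_times(3));
}
```

The cast back from the untyped pointer is the one place where the compiler cannot help, so it is worth keeping it confined to these small forwarding functions.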

The same trick is used for the other callbacks, so that we can implement the functionality within our class. The class ObjectManagerProjection itself holds on to the following data members of interest:

struct GUIDComparer {
	bool operator()(const GUID& lhs, const GUID& rhs) const {
		return memcmp(&lhs, &rhs, sizeof(rhs)) < 0;
	}
};

struct EnumInfo {
	std::vector<ObjectNameAndType> Objects;
	int Index{ -1 };
};
std::wstring m_RootDir;
PRJ_NAMESPACE_VIRTUALIZATION_CONTEXT m_VirtContext;
std::map<GUID, EnumInfo, GUIDComparer> m_Enumerations;

EnumInfo is a structure used to keep an object directory’s contents and the current index requested by the file system. A map is used to keep track of all current enumerations. Remember, it’s the file system – multiple directory listings may be happening at the same time. As it happens, each one is identified by a GUID, which is why it’s used as a key to the map. m_VirtContext is the returned value from PrjStartVirtualizing.

ObjectNameAndType is a little structure that stores the details of an object: its name and type:

struct ObjectNameAndType {
	std::wstring Name;
	std::wstring TypeName;
};

The Callbacks

Obviously, the bulk of the work for the provider is centered in the callbacks. Let’s start with StartDirectoryEnumerationCallback. Its purpose is to let the provider know that a new directory enumeration of some sort is beginning. The provider can make any necessary preparations. In our case, it’s about adding a new enumeration structure to manage based on the provided enumeration GUID:

HRESULT ObjectManagerProjection::DoStartDirectoryEnumerationCallback(const PRJ_CALLBACK_DATA* callbackData, const GUID* enumerationId) {
	EnumInfo info;
	m_Enumerations.insert({ *enumerationId, std::move(info) });
	return S_OK;
}

We just add a new entry to our map, since we must be able to distinguish between multiple enumerations that may be happening concurrently. The complementary callback ends an enumeration which is where we delete the item from the map:

HRESULT ObjectManagerProjection::DoEndDirectoryEnumerationCallback(const PRJ_CALLBACK_DATA* callbackData, const GUID* enumerationId) {
	m_Enumerations.erase(*enumerationId);
	return S_OK;
}
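For comparison, the same start/end bookkeeping could be sketched in Rust with a BTreeMap. All names here are made up for illustration, and the enumeration ID is modeled as 16 raw bytes, whose built-in ordering plays the role of the memcmp-based GUIDComparer.

```rust
use std::collections::BTreeMap;

/// Stand-in for a GUID; byte arrays already have a total order.
type EnumerationId = [u8; 16];

#[derive(Default)]
#[allow(dead_code)]
struct EnumInfo {
    objects: Vec<(String, String)>, // (name, type name) pairs
    index: Option<usize>,           // None until the first listing is produced
}

#[derive(Default)]
struct Enumerations {
    active: BTreeMap<EnumerationId, EnumInfo>,
}

impl Enumerations {
    /// Registers a new enumeration; an existing entry is left untouched.
    fn start(&mut self, id: EnumerationId) {
        self.active.entry(id).or_default();
    }

    /// Looks up the state of a running enumeration.
    fn get(&mut self, id: &EnumerationId) -> Option<&mut EnumInfo> {
        self.active.get_mut(id)
    }

    /// Forgets an enumeration; returns false if the ID was unknown.
    fn end(&mut self, id: &EnumerationId) -> bool {
        self.active.remove(id).is_some()
    }
}

fn main() {
    let mut enums = Enumerations::default();
    enums.start([1; 16]);
    enums.start([2; 16]);
    println!("known: {}", enums.get(&[1; 16]).is_some());
    println!("ended: {}", enums.end(&[1; 16]));
    println!("remaining: {}", enums.active.len());
}
```

In a real provider the map would also need synchronization, since callbacks for different enumerations may arrive concurrently.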

So far, so good. The real work is centered around the GetDirectoryEnumerationCallback callback where actual enumeration must take place. The callback receives the enumeration ID and a search expression – the client may try to search using functions such as FindFirstFile / FindNextFile or similar APIs. The provided PRJ_CALLBACK_DATA contains the basic details of the request such as the relative directory itself (which could be a subdirectory). First, we reject any unknown enumeration IDs:

HRESULT ObjectManagerProjection::DoGetDirectoryEnumerationCallback(
	const PRJ_CALLBACK_DATA* callbackData, const GUID* enumerationId, 
	PCWSTR searchExpression, PRJ_DIR_ENTRY_BUFFER_HANDLE dirEntryBufferHandle) {

	auto it = m_Enumerations.find(*enumerationId); 
	if(it == m_Enumerations.end())
		return E_INVALIDARG;
	auto& info = it->second;

Next, we need to enumerate the objects in the provided directory, taking into consideration the search expression (that may require returning a subset of the items):

	if (info.Index < 0 || (callbackData->Flags & PRJ_CB_DATA_FLAG_ENUM_RESTART_SCAN)) {
		auto compare = [&](auto name) {
			return ::PrjFileNameMatch(name, searchExpression);
			};
		info.Objects = ObjectManager::EnumDirectoryObjects(callbackData->FilePathName, nullptr, compare);
		std::ranges::sort(info.Objects, [](auto const& item1, auto const& item2) { 
			return ::PrjFileNameCompare(item1.Name.c_str(), item2.Name.c_str()) < 0; 
			});
		info.Index = 0;
	}

There are quite a few things happening here. ObjectManager::EnumDirectoryObjects is a helper function that does the actual enumeration of objects in the object manager’s namespace given the root directory (callbackData->FilePathName), which is always relative to the virtualization root, which is convenient – we don’t need to care where the actual root is. The compare lambda is passed to EnumDirectoryObjects to provide a filter based on the search expression. ProjFS provides the PrjFileNameMatch function we can use to test if a specific name should be returned or not. It has the logic that caters for wildcards like * and ?.

Once the results return in a vector (info.Objects), we must sort it. The file system expects returned files/directories to be sorted in a case insensitive way, but we don’t actually need to know that. PrjFileNameCompare is provided as a function to use for sorting purposes. We call sort on the returned vector passing this function PrjFileNameCompare as the compare function.

The enumeration must happen if the PRJ_CB_DATA_FLAG_ENUM_RESTART_SCAN is specified. I also enumerate if it’s the first call for this enumeration ID.
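The filter-then-sort step can be illustrated without ProjFS. In the sketch below, wildcard_match is a deliberately simplified stand-in for PrjFileNameMatch (it only understands * and ?, and approximates case insensitivity by lowercasing), and sorting by the lowercased name stands in for PrjFileNameCompare. The real functions have richer rules.

```rust
/// Simplified, case-insensitive matcher supporting '*' and '?'.
/// A stand-in for PrjFileNameMatch, not a reimplementation of it.
fn wildcard_match(name: &str, pattern: &str) -> bool {
    let n: Vec<char> = name.to_lowercase().chars().collect();
    let p: Vec<char> = pattern.to_lowercase().chars().collect();
    let (mut i, mut j) = (0usize, 0usize);
    let mut star: Option<usize> = None; // position of the last '*' seen
    let mut mark = 0usize;              // name position to retry from
    while i < n.len() {
        if j < p.len() && p[j] == '*' {
            star = Some(j);
            mark = i;
            j += 1;
        } else if j < p.len() && (p[j] == '?' || p[j] == n[i]) {
            i += 1;
            j += 1;
        } else if let Some(s) = star {
            // Mismatch: let the last '*' absorb one more character and retry.
            mark += 1;
            i = mark;
            j = s + 1;
        } else {
            return false;
        }
    }
    // Only trailing '*' characters may remain in the pattern.
    while j < p.len() && p[j] == '*' {
        j += 1;
    }
    j == p.len()
}

/// Keeps the names matching `pattern` and sorts them case-insensitively.
fn filter_and_sort(names: &[&str], pattern: &str) -> Vec<String> {
    let mut matches: Vec<String> = names
        .iter()
        .filter(|name| wildcard_match(name, pattern))
        .map(|name| name.to_string())
        .collect();
    matches.sort_by_key(|name| name.to_lowercase());
    matches
}

fn main() {
    let names = ["PowerPort", "Sessions", "PdcPort", "PendingRenameMutex"];
    println!("{:?}", filter_and_sort(&names, "p*"));
}
```

Filtering before sorting keeps the sort small, which matters for large directories such as the root of the namespace.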

Now that we have results (or an empty vector), we can proceed by telling ProjFS about the results. If we have no results, just return success (an empty directory):

if (info.Objects.empty())
	return S_OK;

Otherwise, we must call PrjFillDirEntryBuffer for each entry in the results. However, ProjFS provides a limited buffer to accept data, which means we need to keep track of where we left off because we may be called again (without the PRJ_CB_DATA_FLAG_ENUM_RESTART_SCAN flag) to continue filling in data. This is why we keep track of the index we need to use.

The first step in the loop is to fill in details of the item: is it a subdirectory or a “file”? We can also specify the size of its data and common times like creation time, modify time, etc.:

while (info.Index < info.Objects.size()) {
	PRJ_FILE_BASIC_INFO itemInfo{};
	auto& item = info.Objects[info.Index];
	itemInfo.IsDirectory = item.TypeName == L"Directory";
	itemInfo.FileSize = itemInfo.IsDirectory ? 0 : 
		GetObjectSize((callbackData->FilePathName + std::wstring(L"\\") + item.Name).c_str(), item);

We fill in two details: a directory or not, based on the kernel object type being “Directory”, and a file size (in case of another type of object). What is the meaning of a “file size”? It can mean whatever we want it to mean, including just specifying a size of zero. However, I decided that the “data” being held in an object would be text that provides the object’s name, type, and target (if it’s a symbolic link). Here are a few examples when running the provider and using a command window:

C:\objectmanager>dir p*
Volume in drive C is OS
Volume Serial Number is 18CF-552E

Directory of C:\objectmanager

02/20/2024 11:09 AM 60 PdcPort.ALPC Port
02/20/2024 11:09 AM 76 PendingRenameMutex.Mutant
02/20/2024 11:09 AM 78 PowerMonitorPort.ALPC Port
02/20/2024 11:09 AM 64 PowerPort.ALPC Port
02/20/2024 11:09 AM 88 PrjFltPort.FilterConnectionPort
5 File(s) 366 bytes
0 Dir(s) 518,890,110,976 bytes free

C:\objectmanager>type PendingRenameMutex.Mutant
Name: PendingRenameMutex
Type: Mutant

C:\objectmanager>type powerport
Name: PowerPort
Type: ALPC Port

Here is PRJ_FILE_BASIC_INFO:

typedef struct PRJ_FILE_BASIC_INFO {
    BOOLEAN IsDirectory;
    INT64 FileSize;
    LARGE_INTEGER CreationTime;
    LARGE_INTEGER LastAccessTime;
    LARGE_INTEGER LastWriteTime;
    LARGE_INTEGER ChangeTime;
    UINT32 FileAttributes;
} PRJ_FILE_BASIC_INFO;

What is the meaning of the various times and file attributes? It can mean whatever you want – it might make sense for some types of data. If left at zero, the current time is used.

GetObjectSize is a helper function that calculates the number of bytes needed to keep the object’s text, which is what is reported to the file system.

Now we can pass the information for the item to ProjFS by calling PrjFillDirEntryBuffer:

	if (FAILED(::PrjFillDirEntryBuffer(
		(itemInfo.IsDirectory ? item.Name : (item.Name + L"." + item.TypeName)).c_str(), 
		&itemInfo, dirEntryBufferHandle)))
		break;
	info.Index++;
}

The “name” of the item is comprised of the kernel object’s name, and the “file extension” is the object’s type name. This is just a matter of choice – I could have passed the object’s name only so that it would appear as a file with no extension. If the call to PrjFillDirEntryBuffer fails, it means the buffer is full, so we break out, but the index is not incremented, so we can provide the next object in the next callback that does not require a rescan.
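The resumable fill can be modeled with a tiny stand-in for the entry buffer. EntryBuffer below is hypothetical: it accepts a fixed number of entries and then refuses, which is all that is needed to show why the index is only advanced after a successful add.

```rust
/// Enumeration state kept between callbacks.
struct EnumState {
    objects: Vec<String>,
    index: usize,
}

/// Hypothetical stand-in for the ProjFS entry buffer: accepts entries
/// until `capacity` is reached, then reports failure.
struct EntryBuffer {
    capacity: usize,
    entries: Vec<String>,
}

impl EntryBuffer {
    fn try_add(&mut self, name: &str) -> bool {
        if self.entries.len() >= self.capacity {
            return false;
        }
        self.entries.push(name.to_owned());
        true
    }
}

/// Adds entries starting at `state.index`; stops when the buffer is full.
fn fill(state: &mut EnumState, buffer: &mut EntryBuffer) {
    while state.index < state.objects.len() {
        if !buffer.try_add(&state.objects[state.index]) {
            break; // buffer full: keep the index so the next call resumes here
        }
        state.index += 1;
    }
}

fn main() {
    let mut state = EnumState {
        objects: vec!["ArcName".into(), "Sessions".into(), "SystemRoot".into()],
        index: 0,
    };
    let mut first = EntryBuffer { capacity: 2, entries: Vec::new() };
    fill(&mut state, &mut first);
    let mut second = EntryBuffer { capacity: 2, entries: Vec::new() };
    fill(&mut state, &mut second);
    println!("first call: {:?}, second call: {:?}", first.entries, second.entries);
}
```

No entry is skipped or delivered twice, because the entry that did not fit is the first one offered on the next call.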

We have two callbacks remaining. One is GetPlaceholderInformationCallback, whose purpose is to provide “placeholder” information about an item, without providing its data. This is used by the file system for caching purposes. The implementation is like so:

HRESULT ObjectManagerProjection::DoGetPlaceholderInformationCallback(const PRJ_CALLBACK_DATA* callbackData) {
	auto path = callbackData->FilePathName;
	auto dir = ObjectManager::DirectoryExists(path);
	std::optional<ObjectNameAndType> object;
	if (!dir)
		object = ObjectManager::ObjectExists(path);
	if(!dir && !object)
		return HRESULT_FROM_WIN32(ERROR_FILE_NOT_FOUND);

	PRJ_PLACEHOLDER_INFO info{};
	info.FileBasicInfo.IsDirectory = dir;
	info.FileBasicInfo.FileSize = dir ? 0 : GetObjectSize(path, object.value());
	return PrjWritePlaceholderInfo(m_VirtContext, callbackData->FilePathName, &info, sizeof(info));
}

The item could be a file or a directory. We use the file path name provided to figure out if it’s a directory kernel object or something else by utilizing some helpers in the ObjectManager class (we’ll examine those later). Then the structure PRJ_PLACEHOLDER_INFO is filled with the details and provided to PrjWritePlaceholderInfo.

The final required callback is the one that provides the data for files – objects in our case:

HRESULT ObjectManagerProjection::DoGetFileDataCallback(const PRJ_CALLBACK_DATA* callbackData, UINT64 byteOffset, UINT32 length) {
	auto object = ObjectManager::ObjectExists(callbackData->FilePathName);
	if (!object)
		return HRESULT_FROM_WIN32(ERROR_FILE_NOT_FOUND);

	auto buffer = ::PrjAllocateAlignedBuffer(m_VirtContext, length);
	if (!buffer)
		return E_OUTOFMEMORY;

	auto data = GetObjectData(callbackData->FilePathName, object.value());
	memcpy(buffer, (PBYTE)data.c_str() + byteOffset, length);
	auto hr = ::PrjWriteFileData(m_VirtContext, &callbackData->DataStreamId, buffer, byteOffset, length);
	::PrjFreeAlignedBuffer(buffer);

	return hr;
}

First we check if the object’s path is valid. Next, we need to allocate a buffer for the data. There are some ProjFS alignment requirements, so we call PrjAllocateAlignedBuffer to allocate a properly-aligned buffer. Then we get the object data (a string, by calling our helper GetObjectData), and copy it into the allocated buffer. Finally, we pass the buffer to PrjWriteFileData and free the buffer. The byte offset provided is usually zero, but could theoretically be larger if the client reads from a non-zero position, so we must be prepared for it. In our case, the data is small, but in general it could be arbitrarily large.
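Being prepared for a non-zero offset also means making sure the requested range stays inside the data. The C++ code above trusts the offset and length it is given; a more defensive variant would clamp them first. Here is a sketch of that clamping in Rust, with slice_for_read being a name invented for this example.

```rust
/// Returns the part of `data` covered by a read of `length` bytes starting at
/// `byte_offset`, clamped so it never runs past the end of the data.
fn slice_for_read(data: &[u8], byte_offset: u64, length: u32) -> &[u8] {
    // Clamp the start first; after that the cast to usize cannot truncate.
    let start = byte_offset.min(data.len() as u64) as usize;
    let end = start.saturating_add(length as usize).min(data.len());
    &data[start..end]
}

fn main() {
    let data = b"Name: PowerPort\nType: ALPC Port\n";
    let part = slice_for_read(data, 6, 9);
    println!("{}", String::from_utf8_lossy(part));
}
```

A request that starts beyond the end of the data simply yields an empty slice instead of reading out of bounds.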

GetObjectData itself looks like this:

std::wstring ObjectManagerProjection::GetObjectData(PCWSTR fullname, ObjectNameAndType const& info) {
	std::wstring target;
	if (info.TypeName == L"SymbolicLink") {
		target = ObjectManager::GetSymbolicLinkTarget(fullname);
	}
	auto result = std::format(L"Name: {}\nType: {}\n", info.Name, info.TypeName);
	if (!target.empty())
		result = std::format(L"{}Target: {}\n", result, target);
	return result;
}

It calls a helper function, ObjectManager::GetSymbolicLinkTarget in case of a symbolic link, and builds the final string by using format (C++20) before returning it to the caller.
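The text format also explains the sizes in the directory listing shown earlier. The sketch below rebuilds the text in Rust and computes its size, assuming the size is the text length in UTF-16 bytes; GetObjectSize is not shown in this post, so that part is an assumption, although it agrees with the sizes in the listing. The function names are made up.

```rust
/// Builds the text exposed as the "file" content for an object.
fn object_text(name: &str, type_name: &str, target: Option<&str>) -> String {
    let mut text = format!("Name: {name}\nType: {type_name}\n");
    // The target line is only present for symbolic links with a known target.
    if let Some(target) = target.filter(|t| !t.is_empty()) {
        text.push_str(&format!("Target: {target}\n"));
    }
    text
}

/// Size in bytes when the text is stored as UTF-16 (2 bytes per code unit).
/// Assumption: this is what the provider reports as the file size.
fn object_size(name: &str, type_name: &str, target: Option<&str>) -> usize {
    object_text(name, type_name, target).encode_utf16().count() * 2
}

fn main() {
    for (name, type_name) in [("PdcPort", "ALPC Port"), ("PendingRenameMutex", "Mutant")] {
        println!("{name}.{type_name}: {} bytes", object_size(name, type_name, None));
    }
}
```

For PdcPort the text is 30 characters long, which gives the 60 bytes reported by dir.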

That’s all for the provider, except when terminating:

void ObjectManagerProjection::Term() {
	::PrjStopVirtualizing(m_VirtContext);
}

The Object Manager

Looking into the ObjectManager helper class is somewhat out of the focus of this post, since it has nothing to do with ProjFS. It uses native APIs to enumerate objects in the object manager’s namespace and get details of a symbolic link’s target. For more information about the native APIs, check out my book “Windows Native API Programming” or search online. First, it includes <Winternl.h> to get some basic native functions like RtlInitUnicodeString, and also adds the APIs for directory objects:

typedef struct _OBJECT_DIRECTORY_INFORMATION {
	UNICODE_STRING Name;
	UNICODE_STRING TypeName;
} OBJECT_DIRECTORY_INFORMATION, * POBJECT_DIRECTORY_INFORMATION;

#define DIRECTORY_QUERY  0x0001

extern "C" {
	NTSTATUS NTAPI NtOpenDirectoryObject(
		_Out_ PHANDLE hDirectory,
		_In_ ACCESS_MASK AccessMask,
		_In_ POBJECT_ATTRIBUTES ObjectAttributes);

	NTSTATUS NTAPI NtQuerySymbolicLinkObject(
		_In_ HANDLE LinkHandle,
		_Inout_ PUNICODE_STRING LinkTarget,
		_Out_opt_ PULONG ReturnedLength);

	NTSTATUS NTAPI NtQueryDirectoryObject(
		_In_  HANDLE hDirectory,
		_Out_ POBJECT_DIRECTORY_INFORMATION DirectoryEntryBuffer,
		_In_  ULONG DirectoryEntryBufferSize,
		_In_  BOOLEAN  bOnlyFirstEntry,
		_In_  BOOLEAN bFirstEntry,
		_In_  PULONG  EntryIndex,
		_Out_ PULONG  BytesReturned);
	NTSTATUS NTAPI NtOpenSymbolicLinkObject(
		_Out_  PHANDLE LinkHandle,
		_In_   ACCESS_MASK DesiredAccess,
		_In_   POBJECT_ATTRIBUTES ObjectAttributes);
}

Here is the main code that enumerates directory objects (some details omitted for clarity, see the full source code in the GitHub repo):

std::vector<ObjectNameAndType> ObjectManager::EnumDirectoryObjects(PCWSTR path, 
	PCWSTR objectName, std::function<bool(PCWSTR)> compare) {
	std::vector<ObjectNameAndType> objects;
	HANDLE hDirectory;
	OBJECT_ATTRIBUTES attr;
	UNICODE_STRING name;
	std::wstring spath(path);
	if (spath[0] != L'\\')
		spath = L'\\' + spath;

	std::wstring object(objectName ? objectName : L"");

	RtlInitUnicodeString(&name, spath.c_str());
	InitializeObjectAttributes(&attr, &name, 0, nullptr, nullptr);
	if (!NT_SUCCESS(NtOpenDirectoryObject(&hDirectory, DIRECTORY_QUERY, &attr)))
		return objects;

	objects.reserve(128);
	BYTE buffer[1 << 12];
	auto info = reinterpret_cast<OBJECT_DIRECTORY_INFORMATION*>(buffer);
	bool first = true;
	ULONG size, index = 0;
	for (;;) {
		auto start = index;
		if (!NT_SUCCESS(NtQueryDirectoryObject(hDirectory, info, sizeof(buffer), FALSE, first, &index, &size)))
			break;
		first = false;
		for (ULONG i = 0; i < index - start; i++) {
			ObjectNameAndType data;
			auto& p = info[i];
			data.Name = std::wstring(p.Name.Buffer, p.Name.Length / sizeof(WCHAR));
			if(compare && !compare(data.Name.c_str()))
				continue;
			data.TypeName = std::wstring(p.TypeName.Buffer, p.TypeName.Length / sizeof(WCHAR));
			if (!objectName) {
				// no specific object requested: collect everything
				objects.push_back(std::move(data));
				continue;
			}
			// a specific object was requested: match by name or by "name.type"
			if (_wcsicmp(object.c_str(), data.Name.c_str()) == 0 ||
				_wcsicmp(object.c_str(), (data.Name + L"." + data.TypeName).c_str()) == 0) {
				objects.push_back(std::move(data));
				break;
			}
		}
	}
	::CloseHandle(hDirectory);
	return objects;
}

NtQueryDirectoryObject is called in a loop with increasing indices until it fails. The returned details for each entry are the object’s name and type name.
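
One detail in the loop above is worth spelling out: the Length member of a UNICODE_STRING is a byte count, and the buffer is not guaranteed to be null-terminated, which is why the code divides by sizeof(WCHAR) instead of relying on a terminator. The following is a minimal, portable sketch of that conversion. CountedString and ToString are hypothetical names used only for illustration, and char16_t stands in for WCHAR so the snippet compiles outside Windows.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for UNICODE_STRING (illustration only).
// As in the real structure, both lengths are expressed in bytes.
struct CountedString {
    unsigned short Length;         // bytes in use, excluding any terminator
    unsigned short MaximumLength;  // bytes available in Buffer
    const char16_t* Buffer;        // may not be null-terminated
};

// Copies exactly Length bytes' worth of characters; never scans for a NUL.
inline std::u16string ToString(const CountedString& s) {
    if (s.Buffer == nullptr)
        return {};
    return std::u16string(s.Buffer, s.Length / sizeof(char16_t));
}
```

Given a buffer holding "DeviceXYZ" and a Length of 12 bytes, ToString yields "Device": the trailing characters are ignored because they fall outside the counted length.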

Here is how to get a symbolic link’s target:

std::wstring ObjectManager::GetSymbolicLinkTarget(PCWSTR path) {
	std::wstring spath(path);
	if (spath[0] != L'\\')
		spath = L"\\" + spath;

	HANDLE hLink;
	OBJECT_ATTRIBUTES attr;
	std::wstring target;
	UNICODE_STRING name;
	RtlInitUnicodeString(&name, spath.c_str());
	InitializeObjectAttributes(&attr, &name, 0, nullptr, nullptr);
	if (NT_SUCCESS(NtOpenSymbolicLinkObject(&hLink, GENERIC_READ, &attr))) {
		WCHAR buffer[1 << 10];
		UNICODE_STRING result;
		result.Buffer = buffer;
		result.MaximumLength = sizeof(buffer);
		if (NT_SUCCESS(NtQuerySymbolicLinkObject(hLink, &result, nullptr)))
			target.assign(result.Buffer, result.Length / sizeof(WCHAR));
		::CloseHandle(hLink);
	}
	return target;
}

See the full source code at https://github.com/zodiacon/ObjMgrProjFS.

Conclusion

The example provided is the bare minimum needed to write a ProjFS provider. This could be interesting for various types of data that are convenient to access with I/O APIs. Feel free to extend the example and resolve any bugs.

Rust Programming Masterclass Training

Unless you’ve been living under a rock for the past several years (and you are a software developer), the Rust programming language is hard to ignore – in fact, it’s been voted as the “most loved” language for several years (whatever that means). Rust provides the power and performance of C++ with full memory and concurrency safety. It’s a systems programming language, but has high-level features like functional programming style and modularity. That said, Rust has a relatively steep learning curve compared to other mainstream languages.

I’m happy to announce a new training class – Rust Programming Masterclass. This is a brand new, 4 day class, split into 8 half-days, that covers all the foundational pieces of Rust. Here is the list of modules:

  • Module 1: Introduction to Rust
  • Module 2: Language Fundamentals
  • Module 3: Ownership
  • Module 4: Compound Types
  • Module 5: Common Types and Collections
  • Module 6: Modules and Project Management
  • Module 7: Error Handling
  • Module 8: Generics and Traits
  • Module 9: Smart Pointers
  • Module 10: Functional Programming
  • Module 11: Threads and Concurrency
  • Module 12: Async and Await
  • Module 13: Unsafe Rust and Interoperability
  • Module 14: Macros
  • Module 15: Lifetimes

Dates are listed below. The times are 11am-3pm EST (8am-12pm PST) (4pm-8pm UT)
March: 25, 27, 29, April: 1, 3, 5, 8, 10.

Cost: 850 USD (if paid by an individual), 1500 USD if paid by a company. Previous students in my classes get 10% off.

Special bonus for this course: anyone registering gets a 50% discount to any two courses at https://training.trainsec.net.

Registration

If you’d like to register, please send me an email to [email protected] and provide your full name, company (if any), preferred contact email, and your time zone.

The sessions will be recorded, so you can watch any part you may be missing, or that may be somewhat overwhelming in “real time”.

As usual, if you have any questions, feel free to send me an email, or DM on X (twitter) or Linkedin.

x64 Architecture and Programming Class

I promised this class a while back, and now it is happening. This is a brand new, 3 day class, split into 6 half-days, that covers the x64 processor architecture, programming in general, and programming in the context of Windows. The syllabus can be found here. It may change a bit, but should mostly be stable.

Dates are listed below. The times are 12pm-4pm EST (9am-1pm PST) (5pm-9pm UT)
January: 15, 17, 22, 24, 29, 31.

Cost: 750 USD (if paid by an individual), 1400 USD if paid by a company.

Registration

If you’d like to register, please send me an email to [email protected] and provide your full name, company (if any), preferred contact email, and your time zone. Previous participants in my classes get 10% off.

The sessions will be recorded, so you can watch any part you may be missing, or that may be somewhat overwhelming in “real time”.

As usual, if you have any questions, feel free to send me an email, or DM on X (twitter) or Linkedin.

Kernel Programming MasterClass

It’s been a while since I have taught a public class. I am happy to launch a new class that combines Windows Kernel Programming and Advanced Windows Kernel Programming into a 6-day (48 hours) masterclass. The full syllabus can be found here.

There is a special bonus for those registering for this class: you get one free recorded course from Windows Internals and Programming (trainsec.net)!

For those who have attended the Windows Kernel Programming class, and wish to capture the more “advanced” stuff, I offer one of two options:

  • Join the second part (3 days) of the training, at 60% of the entire course cost.
  • Register for the entire course with a 20% discount, and get the free recorded course.

The course is planned to stretch from mid-December to late-January, in 4-hour chunks to make it easier to combine with other activities and also have the time to do lab exercises (very important for truly understanding the material). Yes, I know Christmas is in the middle there, I’ll keep the last week of December free 🙂

The course will be conducted remotely using MS Teams or similar.

Dates and times (not final, but unlikely to change much, if at all):

  • Dec 2023: 12, 14, 19, 21: 12pm-4pm EST (9am-1pm PST)
  • Jan 2024: 2, 4, 9, 11, 16, 18, 23, 25: 12pm-4pm EST (9am-1pm PST)

Training cost:

  • Early bird (until Nov 22): 1150 USD
  • After Nov 22: 1450 USD

If you’d like to register, please write to [email protected] with your name, company name (if any), and time zone. If you have any questions, use the same email or DM me on X (Twitter) or Linkedin.

Windows Hook Events

Many developers and researchers are familiar with the SetWindowsHookEx API that provides ways to intercept certain operations related to user interface, such as messages targeting windows. Most of these hooks can be set on a specific thread, or all threads attached to the current desktop. A short video showing how to use this API can be found here. One of the options is to inject a DLL to the target process(es) that is invoked inline to process the relevant events.

There is another mechanism, less known, that provides various events that relate to UI, that can similarly be processed by a callback. This can be attached to a specific thread or process, or to all processes that have threads attached to the current desktop. The API in question is SetWinEventHook:

HWINEVENTHOOK SetWinEventHook(
    _In_ DWORD eventMin,
    _In_ DWORD eventMax,
    _In_opt_ HMODULE hmodWinEventProc,
    _In_ WINEVENTPROC pfnWinEventProc,
    _In_ DWORD idProcess,
    _In_ DWORD idThread,
    _In_ DWORD dwFlags);

The function allows invoking a callback (pfnWinEventProc) when an event occurs. eventMin and eventMax provide a simple way to filter events. If all events are needed, EVENT_MIN and EVENT_MAX can be used to cover every possible event. The module is needed if the function is inside a DLL, so that hmodWinEventProc is the module handle loaded into the calling process. The DLL will automatically be loaded into target process(es) as needed, very similar to the way SetWindowsHookEx works.

idProcess and idThread allow targeting a specific thread, a specific process, or all processes in the current desktop (if both IDs are zero). Targeting all processes is possible even without a DLL. In that case, the event information is marshalled back to the caller’s process and the callback is invoked there. This requires passing the WINEVENT_OUTOFCONTEXT flag. The following example shows how to install such event monitoring for all processes/threads in the current desktop:

auto hHook = ::SetWinEventHook(EVENT_MIN, EVENT_MAX, nullptr, 
    OnEvent, 0, 0,
	WINEVENT_OUTOFCONTEXT | 
    WINEVENT_SKIPOWNPROCESS | WINEVENT_SKIPOWNTHREAD);

::GetMessage(nullptr, nullptr, 0, 0);

The last two flags indicate that events from the caller’s process should not be reported. Notice the weird-looking GetMessage call – it’s required for the event handler to be called. The weird part is that a MSG structure is not needed, contrary to the function’s SAL that requires a non-NULL pointer.

The event handler itself can do anything, however, the information provided is fundamentally different than SetWindowsHookEx callbacks. For example, there is no way to “change” anything – it’s just notifying about things that already happened. These events are related to accessibility and are not directly related to windows messaging. Here is the event handler prototype:

void CALLBACK OnEvent(HWINEVENTHOOK hWinEventHook, DWORD event, 
    HWND hwnd, LONG idObject, LONG idChild, DWORD eventTid, DWORD time);

event is the event being reported. Various such events are defined in WinUser.h and there are many values that can be used by third parties and OEMs. It’s worthwhile checking the header file because every Microsoft-defined event has details as to when such an event is raised, and the meaning of idObject, idChild and hwnd for that event. eventTid is the thread ID from which the event originated. hwnd is typically the window or control associated with the event (if any) – some events are general enough so that no hwnd is provided.

We can get more information on the object that is associated with the event by tapping into the accessibility API. Accessibility objects implement the IAccessible COM interface at least, but may implement other interfaces as well. To get an IAccessible pointer from an event handler, we can use AccessibleObjectFromEvent:

CComPtr<IAccessible> spAcc;
CComVariant child;
::AccessibleObjectFromEvent(hwnd, idObject, idChild, &spAcc, &child);

I’ve included <atlbase.h> to get the ATL client side support (smart pointers and COM type wrappers). Other APIs that can bring an IAccessible in other contexts include AccessibleObjectFromPoint and AccessibleObjectFromWindow.

Note that you must also include <oleacc.h> and link with oleacc.lib.

IAccessible has quite a few methods and properties, the simplest of which is Name that is mandatory for implementors to provide:

CComBSTR name;
spAcc->get_accName(CComVariant(idChild), &name);

Refer to the documentation for other members of IAccessible. We can also get the details of the process associated with the event by going through the window handle or the thread ID and retrieving the executable name. Here is an example with a window handle:

DWORD pid = 0;
WCHAR exeName[MAX_PATH];
PCWSTR pExeName = L"";

if (hwnd && ::GetWindowThreadProcessId(hwnd, &pid)) {
    auto hProcess = ::OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, FALSE, pid);
    if (hProcess) {
        DWORD size = _countof(exeName);
        if (::QueryFullProcessImageName(hProcess, 0, exeName, &size))
            pExeName = wcsrchr(exeName, L'\\') + 1;
        ::CloseHandle(hProcess);
    }
}

GetWindowThreadProcessId retrieves the process ID (and thread ID) associated with a window handle. We could go with the given thread ID – call OpenThread and then GetProcessIdOfThread. The interested reader is welcome to try this approach to retrieve the process ID. Here is the full event handler for this example dumping all using printf:

void CALLBACK OnEvent(HWINEVENTHOOK hWinEventHook, DWORD event, HWND hwnd,
    LONG idObject, LONG idChild, DWORD idEventThread, DWORD time) {
    CComPtr<IAccessible> spAcc;
    CComVariant child;
    ::AccessibleObjectFromEvent(hwnd, idObject, idChild, &spAcc, &child);
    CComBSTR name;
    if (spAcc)
        spAcc->get_accName(CComVariant(idChild), &name);
    DWORD pid = 0;
    WCHAR exeName[MAX_PATH];
    PCWSTR pExeName = L"";

    if (hwnd && ::GetWindowThreadProcessId(hwnd, &pid)) {
        auto hProcess = ::OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, FALSE, pid);
        if (hProcess) {
            DWORD size = _countof(exeName);
            if (::QueryFullProcessImageName(hProcess, 0, exeName, &size))
                pExeName = wcsrchr(exeName, L'\\') + 1;
            ::CloseHandle(hProcess);
        }
    }
    printf("Event: 0x%X (%s) HWND: 0x%p, ID: 0x%X Child: 0x%X TID: %u PID: %u (%ws) Time: %u Name: %ws\n",
        event, EventNameToString(event),
        hwnd, idObject, idChild, idEventThread, 
        pid, pExeName,
        time, name.m_str);
}

EventNameToString is a little helper converting some event IDs to names. If you run this code (SimpleWinEventHook project), you’ll see lots of output, because one of the reported events is EVENT_OBJECT_LOCATIONCHANGE that is raised (among other reasons) when the mouse cursor position changes:

Event: 0x800C (Name Change) HWND: 0x00000000000216F6, ID: 0xFFFFFFFC Child: 0x1DC TID: 39060 PID: 64932 (Taskmgr.exe) Time: 78492375 Name: (null)
Event: 0x8000 (Object Create) HWND: 0x00000000000216F6, ID: 0xFFFFFFFC Child: 0x1DD TID: 39060 PID: 64932 (Taskmgr.exe) Time: 78492375 Name: (null)
Event: 0x800C (Name Change) HWND: 0x00000000000216F6, ID: 0xFFFFFFFC Child: 0x1DD TID: 39060 PID: 64932 (Taskmgr.exe) Time: 78492375 Name: (null)
Event: 0x8000 (Object Create) HWND: 0x00000000000216F6, ID: 0xFFFFFFFC Child: 0x1DE TID: 39060 PID: 64932 (Taskmgr.exe) Time: 78492375 Name: (null)
Event: 0x800C (Name Change) HWND: 0x00000000000216F6, ID: 0xFFFFFFFC Child: 0x1DE TID: 39060 PID: 64932 (Taskmgr.exe) Time: 78492375 Name: (null)
...
Event: 0x800B (Location Changed) HWND: 0x0000000000000000, ID: 0xFFFFFFF7 Child: 0x0 TID: 72172 PID: 0 () Time: 78492562 Name: Normal
Event: 0x800B (Location Changed) HWND: 0x0000000000000000, ID: 0xFFFFFFF7 Child: 0x0 TID: 72172 PID: 0 () Time: 78492562 Name: Normal
...
Event: 0x800B (Location Changed) HWND: 0x0000000000000000, ID: 0xFFFFFFF7 Child: 0x0 TID: 72172 PID: 0 () Time: 78492718 Name: Vertical size
Event: 0x800B (Location Changed) HWND: 0x0000000000000000, ID: 0xFFFFFFF7 Child: 0x0 TID: 72172 PID: 0 () Time: 78492734 Name: Vertical size
Event: 0x800C (Name Change) HWND: 0x0000000000000000, ID: 0xFFFFFFF7 Child: 0x0 TID: 72172 PID: 0 () Time: 78492734 Name: Normal
Event: 0x800A (State Changed) HWND: 0x000000000001019E, ID: 0xFFFFFFFC Child: 0x16 TID: 15636 PID: 14060 (explorer.exe) Time: 78493000 Name: (null)
Event: 0x800A (State Changed) HWND: 0x00000000000101B0, ID: 0xFFFFFFFC Child: 0x6 TID: 15636 PID: 14060 (explorer.exe) Time: 78493000 Name: (null)
Event: 0x8004 () HWND: 0x0000000000010010, ID: 0xFFFFFFFC Child: 0x0 TID: 72172 PID: 1756 () Time: 78493000 Name: Desktop
Event: 0x8 (Capture Start) HWND: 0x0000000000271D5A, ID: 0x0 Child: 0x0 TID: 72172 PID: 67928 (WindowsTerminal.exe) Time: 78493000 Name: c:\Dev\Temp\WinEventHooks\x64\Debug\SimpleWinEventHook.exe
Event: 0x800B (Location Changed) HWND: 0x0000000000000000, ID: 0xFFFFFFF7 Child: 0x0 TID: 72172 PID: 0 () Time: 78493093 Name: Normal
Event: 0x8001 (Object Destroy) HWND: 0x00000000000216F6, ID: 0xFFFFFFFC Child: 0x45 TID: 39060 PID: 64932 (Taskmgr.exe) Time: 78493093 Name: (null)
Event: 0x8001 (Object Destroy) HWND: 0x00000000000216F6, ID: 0xFFFFFFFC Child: 0xB0 TID: 39060 PID: 64932 (Taskmgr.exe) Time: 78493093 Name: (null)
...
Event: 0x800C (Name Change) HWND: 0x00000000000216F6, ID: 0xFFFFFFFC Child: 0x1A TID: 39060 PID: 64932 (Taskmgr.exe) Time: 78493093 Name: (null)
Event: 0x800C (Name Change) HWND: 0x00000000000216F6, ID: 0xFFFFFFFC Child: 0x1B TID: 39060 PID: 64932 (Taskmgr.exe) Time: 78493109 Name: (null)
Event: 0x800B (Location Changed) HWND: 0x0000000000000000, ID: 0xFFFFFFF7 Child: 0x0 TID: 72172 PID: 0 () Time: 78493109 Name: Normal
Event: 0x9 (Capture End) HWND: 0x0000000000271D5A, ID: 0x0 Child: 0x0 TID: 72172 PID: 67928 (WindowsTerminal.exe) Time: 78493109 Name: c:\Dev\Temp\WinEventHooks\x64\Debug\SimpleWinEventHook.exe
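
The EventNameToString helper is not shown in this post. A minimal sketch of such a helper might look like the following. It is an illustration only, not the version from the repository: the numeric values mirror the EVENT_* constants from WinUser.h and are repeated inline so the snippet compiles on its own, and the returned strings simply match the labels seen in the output above.

```cpp
#include <cstdint>

// Hypothetical sketch of an event-ID-to-name helper (not the repo's code).
// Unknown IDs map to an empty string, matching the "()" seen in the output.
const char* EventNameToString(std::uint32_t event) {
    switch (event) {
        case 0x0003: return "Foreground";        // EVENT_SYSTEM_FOREGROUND
        case 0x0008: return "Capture Start";     // EVENT_SYSTEM_CAPTURESTART
        case 0x0009: return "Capture End";       // EVENT_SYSTEM_CAPTUREEND
        case 0x8000: return "Object Create";     // EVENT_OBJECT_CREATE
        case 0x8001: return "Object Destroy";    // EVENT_OBJECT_DESTROY
        case 0x800A: return "State Changed";     // EVENT_OBJECT_STATECHANGE
        case 0x800B: return "Location Changed";  // EVENT_OBJECT_LOCATIONCHANGE
        case 0x800C: return "Name Change";       // EVENT_OBJECT_NAMECHANGE
        default:     return "";                  // not covered by this sketch
    }
}
```

A switch keeps the mapping easy to extend; 0x8004 (EVENT_OBJECT_REORDER), for example, is one of the IDs this sketch leaves unnamed.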

DLL Injection

Instead of getting events on the SetWinEventHook caller’s thread, a DLL can be injected. Such a DLL must export the event handler so that the process setting up the handler can locate the function with GetProcAddress.

As an example, I created a simple DLL that implements the event handler similarly to the previous example (without the process name) like so:

extern "C" __declspec(dllexport)
void CALLBACK OnEvent(HWINEVENTHOOK hWinEventHook, DWORD event, HWND hwnd,
	LONG idObject, LONG idChild, DWORD idEventThread, DWORD time) {
	CComPtr<IAccessible> spAcc;
	CComVariant child;
	::AccessibleObjectFromEvent(hwnd, idObject, idChild, &spAcc, &child);
	CComBSTR name;
	if (spAcc)
		spAcc->get_accName(CComVariant(idChild), &name);

	printf("Event: 0x%X (%s) HWND: 0x%p, ID: 0x%X Child: 0x%X TID: %u Time: %u Name: %ws\n",
		event, EventNameToString(event),
        hwnd, idObject, idChild, idEventThread,
		time, name.m_str);
}

Note the function is exported. The code uses printf, but there is no guarantee that a target process has a console to use. The DllMain function creates such a console and attaches the standard output handle to it (otherwise printf wouldn’t have an output handle, since the process wasn’t bootstrapped with a console):

HANDLE hConsole;

BOOL APIENTRY DllMain(HMODULE hModule, DWORD reason, PVOID lpReserved) {
	switch (reason) {
		case DLL_PROCESS_DETACH:
			if (hConsole)   // be nice
				::CloseHandle(hConsole);
			break;

		case DLL_PROCESS_ATTACH:
			if (::AllocConsole()) {
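				// assign the global handle (a local 'auto' here would shadow it,
				// leaving nothing to close in DLL_PROCESS_DETACH)
				hConsole = ::CreateFile(L"CONOUT$", GENERIC_WRITE, 
                    0, nullptr, OPEN_EXISTING, 0, nullptr);
				if (hConsole == INVALID_HANDLE_VALUE) {
					hConsole = nullptr;
					return FALSE;
				}
				::SetStdHandle(STD_OUTPUT_HANDLE, hConsole);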
			}
			break;
	}
	return TRUE;
}

The injector process (WinHookInject project) first grabs a target process ID (if any):

int main(int argc, const char* argv[]) {
	DWORD pid = argc < 2 ? 0 : atoi(argv[1]);
	if (pid == 0) {
		printf("Warning: potentially injecting into all processes with threads connected to the current desktop.\n");
		printf("Continue? (y/n) ");
		char ans[3];
		gets_s(ans);
		if (tolower(ans[0]) != 'y')
			return 0;
	}

The warning is shown if no PID is provided, because creating consoles for certain processes could wreak havoc. If you do want to inject a DLL to all processes on the desktop, avoid creating consoles.

Once we have a target process (or not), we need to load the DLL (hardcoded for simplicity) and grab the exported event handler function:

auto hLib = ::LoadLibrary(L"Injected.Dll");
if (!hLib) {
	printf("DLL not found!\n");
	return 1;
}
auto OnEvent = (WINEVENTPROC)::GetProcAddress(hLib, "OnEvent");
if (!OnEvent) {
	printf("Event handler not found!\n");
	return 1;
}

The final step is to register the handler. If you’re targeting all processes, you’re better off limiting the events you’re interested in, especially the noisy ones. If you just want a DLL injected and you don’t care about any events, select a range that has no events and then call a relevant function to force the DLL to be loaded into the target process(es). I’ll let the interested reader figure these things out.

auto hHook = ::SetWinEventHook(EVENT_MIN, EVENT_MAX, 
	hLib, OnEvent, pid, 0, WINEVENT_INCONTEXT);
::GetMessage(nullptr, nullptr, 0, 0);

Note the arguments include the DLL module, the handler address, and the flag WINEVENT_INCONTEXT. Here is some output when using this DLL on a Notepad instance. A console is created the first time Notepad causes an event to be raised:

Event: 0x800B (Name Change) HWND: 0x0000000000000000, ID: 0xFFFFFFF7 Child: 0x0 TID: 34756 Time: 70717718 Name: Edit
Event: 0x800C (Name Change) HWND: 0x0000000000000000, ID: 0xFFFFFFF7 Child: 0x0 TID: 34756 Time: 70717718 Name: Horizontal size
Event: 0x800B (Name Change) HWND: 0x0000000000000000, ID: 0xFFFFFFF7 Child: 0x0 TID: 34756 Time: 70717718 Name: Horizontal size
Event: 0x800B (Name Change) HWND: 0x0000000000000000, ID: 0xFFFFFFF7 Child: 0x0 TID: 29516 Time: 70717734 Name: Horizontal size
Event: 0x800C (Name Change) HWND: 0x0000000000000000, ID: 0xFFFFFFF7 Child: 0x0 TID: 29516 Time: 70717734 Name: Edit
Event: 0x800B (Name Change) HWND: 0x0000000000000000, ID: 0xFFFFFFF7 Child: 0x0 TID: 29516 Time: 70717734 Name: Edit
Event: 0x800B (Name Change) HWND: 0x0000000000000000, ID: 0xFFFFFFF7 Child: 0x0 TID: 29516 Time: 70717734 Name: Edit
Event: 0x800B (Name Change) HWND: 0x0000000000000000, ID: 0xFFFFFFF7 Child: 0x0 TID: 29516 Time: 70717750 Name: Edit
Event: 0x800B (Name Change) HWND: 0x0000000000000000, ID: 0xFFFFFFF7 Child: 0x0 TID: 29516 Time: 70717765 Name: Edit
Event: 0x800B (Name Change) HWND: 0x0000000000000000, ID: 0xFFFFFFF7 Child: 0x0 TID: 29516 Time: 70717765 Name: Edit
Event: 0x800B (Name Change) HWND: 0x0000000000000000, ID: 0xFFFFFFF7 Child: 0x0 TID: 29516 Time: 70717781 Name: Edit
Event: 0x800B (Name Change) HWND: 0x0000000000000000, ID: 0xFFFFFFF7 Child: 0x0 TID: 29516 Time: 70717781 Name: Edit
Event: 0x800B (Name Change) HWND: 0x0000000000000000, ID: 0xFFFFFFF7 Child: 0x0 TID: 29516 Time: 70717796 Name: Edit
Event: 0x800B (Name Change) HWND: 0x0000000000000000, ID: 0xFFFFFFF7 Child: 0x0 TID: 29516 Time: 70717796 Name: Edit
Event: 0x800B (Name Change) HWND: 0x0000000000000000, ID: 0xFFFFFFF7 Child: 0x0 TID: 29516 Time: 70717812 Name: Edit
Event: 0x800B (Name Change) HWND: 0x0000000000000000, ID: 0xFFFFFFF7 Child: 0x0 TID: 29516 Time: 70717812 Name: Edit
Event: 0x800B (Name Change) HWND: 0x0000000000000000, ID: 0xFFFFFFF7 Child: 0x0 TID: 29516 Time: 70717828 Name: Edit
Event: 0x800B (Name Change) HWND: 0x0000000000000000, ID: 0xFFFFFFF7 Child: 0x0 TID: 29516 Time: 70717843 Name: Edit
Event: 0x8 (Capture Start) HWND: 0x0000000000091CAC, ID: 0x0 Child: 0x0 TID: 29516 Time: 70717843 Name: (null)
Event: 0x3 (Foreground) HWND: 0x00000000000A1D50, ID: 0x0 Child: 0x0 TID: 34756 Time: 70717843 Name: Untitled - Notepad
Event: 0x8004 () HWND: 0x0000000000010010, ID: 0xFFFFFFFC Child: 0x0 TID: 29516 Time: 70717859 Name: Desktop 1
Event: 0x800B (Name Change) HWND: 0x00000000000A1D50, ID: 0x0 Child: 0x0 TID: 34756 Time: 70717859 Name: Untitled - Notepad
...

The full code is at zodiacon/WinEventHooks: SetWinEventHook Sample (github.com)

Writing Your Own Programming Language

Ever since I realized BASIC wasn’t the only living programming language, I thought about writing my own. Who wouldn’t? If you’re a developer, surely this idea popped into your mind at some point. No matter how much you love a particular programming language, you always have some ideas for improvement or even removal of annoying features.

The post assumes you have some background in compilers, and understand concepts like tokenizing (scanning), parsing, and Abstract Syntax Trees (ASTs).

Obviously, writing a programming language is not for the faint of heart. Even before you set out to implement your language, you have to design it first. Or maybe you have some fundamental ideas that would make your language unique, and you may decide to flesh out the details while you’re implementing it.

A new programming language does not have to be “general-purpose” – that is, it could be a “domain specific language” (DSL), which means it’s best suited for certain domain(s) or tasks. This makes your life (usually) at least somewhat easier; in addition, you’ll be unlikely to compete with the gazillion general-purpose languages out there. Still, a general-purpose language might be your goal.

Designing a programming language is a big topic, well outside the scope of this post. I’ll focus on the implementation details, so to speak. There are other considerations for a programming language beyond the language itself – its accompanying standard library, tooling (e.g., some IDE or at least syntax highlighting), debugging, testing, and a few more. One decision is whether to make your language compiled or interpreted. This decision may not affect some aspects of the implementation, but it will definitely affect the language’s back-end. You can even support both interpretation and compilation for maximum flexibility.

I played around with the idea of creating a programming language for many years, never really getting very far beyond a basic parser and a minimal interpreter. Lately, I’ve read more about Pratt Parsing, which sparked my interest again. Pratt Parsing is one of many techniques for parsing expressions, something like “a+2*b”, and doing that correctly (parentheses, operator precedence and associativity). Pratt parsing is really elegant, much more so than other techniques, and it’s also more flexible, supporting (indirectly) ternary operations and other unusual constructs. Once you have an expression parser, the rest of the parser is fairly easy to implement (relatively speaking) using the recursive-descent approach which is well suited for hand-crafted parsers.

Robert Nystrom gives a nice introduction to Pratt Parsing and an elegant idea for implementing it. His implementation is in Java, but there is a link to a C# implementation and even one in Rust. My go-to language is C++ (still), so you know where this is going. I’ve implemented a Pratt parser based on Robert’s ideas, and it turned out very well.
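
To give a feel for the technique, here is a minimal sketch of a Pratt-style expression evaluator. It is not the Logo2 parser and not Robert’s implementation: the PrattCalc class, its numeric precedence levels, and the choice to evaluate directly instead of building an AST are all assumptions made to keep the example short. It handles numbers, parentheses, unary minus, the left-associative operators + - * /, and a right-associative ^.

```cpp
#include <cctype>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical Pratt-style evaluator (illustration only).
class PrattCalc {
public:
    explicit PrattCalc(std::string text) : m_Text(std::move(text)) {}

    double Evaluate() {
        double value = ParseExpression(0);
        if (Peek() != '\0')
            throw std::runtime_error("unexpected trailing input");
        return value;
    }

private:
    // Precedence of an infix operator; 0 means "not an infix operator".
    static int Precedence(char op) {
        switch (op) {
            case '+': case '-': return 10;
            case '*': case '/': return 20;
            case '^':           return 30;
            default:            return 0;
        }
    }

    // Skips whitespace and returns the current character ('\0' at the end).
    char Peek() {
        while (m_Pos < m_Text.size() &&
               std::isspace(static_cast<unsigned char>(m_Text[m_Pos])))
            ++m_Pos;
        return m_Pos < m_Text.size() ? m_Text[m_Pos] : '\0';
    }

    // Prefix position: parenthesized expression, unary minus, or a number.
    double ParsePrefix() {
        char c = Peek();
        if (c == '(') {
            ++m_Pos;
            double value = ParseExpression(0);
            if (Peek() != ')')
                throw std::runtime_error("')' expected");
            ++m_Pos;
            return value;
        }
        if (c == '-') {
            ++m_Pos;
            return -ParseExpression(25);  // tighter than * and /, looser than ^
        }
        std::size_t used = 0;
        double value = std::stod(m_Text.substr(m_Pos), &used);  // throws on bad input
        m_Pos += used;
        return value;
    }

    // Core loop: keep consuming infix operators that bind tighter than the caller.
    double ParseExpression(int minPrecedence) {
        double left = ParsePrefix();
        for (;;) {
            char op = Peek();
            int prec = Precedence(op);
            if (prec == 0 || prec <= minPrecedence)
                break;
            ++m_Pos;
            // A right-associative operator parses its right side one level lower.
            double right = ParseExpression(op == '^' ? prec - 1 : prec);
            switch (op) {
                case '+': left += right; break;
                case '-': left -= right; break;
                case '*': left *= right; break;
                case '/': left /= right; break;
                case '^': left = std::pow(left, right); break;
            }
        }
        return left;
    }

    std::string m_Text;
    std::size_t m_Pos = 0;
};
```

Calling PrattCalc("1 + 2 * 3").Evaluate() multiplies before adding, and "2 ^ 3 ^ 2" groups as 2 ^ (3 ^ 2). Adding an operator only requires a new precedence entry and a new case, which is a large part of the technique’s appeal.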

I’ve also been interested in visualization (a term which has way too much stuffed into it), but I thought I’d start small. A popular teaching language in the 80s was LOGO. Although it was treated as a “toy language”, it was a full-blown language, mostly resembling LISP internally.

However, LOGO became famous because of the “Turtle Graphics” built-in support that was provided, which allowed drawing with an imaginary turtle (you could even ask LOGO to show it) that would follow your commands like moving forward, backwards, rotating, lifting the pen and putting it back down. Why not create a fun version of Turtle Graphics with ideas from LOGO?

Here is an example from LOGO to draw a symmetric hexagon:

REPEAT 6 [ FD 100 RT 60 ]

You can probably guess what is going on here. “FD” is “forward” and “RT” is “right”, although it could be mistaken for “rotate”. LOGO supported functions as well, so you could create complex shapes by reusing functions.

My language, called “Logo2” for a lack of originality at this time, tries to capture that fun drawing, but brings the syntax more in line with the C family of languages, which I like more. The above hexagon is written with Logo2 like so:

repeat 6 {
    fd(100); rt(60);
}

Indentation is not significant, so it all could be placed on the same line. You can also define functions and execute them:

fn circle(size, steps) {
    repeat steps {
        fd(size); rt(360 / steps);
    }
}

repeat 10 {
    circle(80, 20); rt(36);
}

I also added support for colors, with the pencolor(r,g,b) function, something I don’t recall LOGO having in the 80s.
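
To make the semantics of fd and rt concrete, here is a small sketch that tracks only the turtle’s position and heading, without drawing anything. The Turtle struct and DrawHexagon function are hypothetical and unrelated to the actual Logo2 runtime; the sketch assumes a heading of 0 degrees points “up” and that rt rotates clockwise.

```cpp
#include <cmath>

// Hypothetical turtle state (illustration only; no drawing is performed).
struct Turtle {
    double x = 0.0;
    double y = 0.0;
    double headingDeg = 0.0;  // 0 = up, increasing clockwise

    // Moves the turtle forward along its current heading.
    void fd(double distance) {
        const double rad = headingDeg * 3.14159265358979323846 / 180.0;
        x += distance * std::sin(rad);
        y += distance * std::cos(rad);
    }

    // Rotates the turtle clockwise, keeping the heading below 360.
    void rt(double degrees) {
        headingDeg = std::fmod(headingDeg + degrees, 360.0);
    }
};

// Equivalent of the Logo2 snippet: repeat 6 { fd(100); rt(60); }
inline Turtle DrawHexagon() {
    Turtle t;
    for (int i = 0; i < 6; i++) {
        t.fd(100);
        t.rt(60);
    }
    return t;
}
```

After the six iterations the turtle is back at its starting point (up to floating-point rounding) with its original heading, which is exactly what makes the hexagon a closed shape.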

Implementation

There are 3 main projects in the solution (a fourth project in the works to create a simple IDE for easier experimentation):

  • Logo2Core – contains the tokenizer, parser, and interpreter.
  • Logo2Runtime – contains the runtime support for turtle graphics, currently using GDI+.
  • Logo2 – is a simple REPL that can parse and execute single line statements. If you provide a command line argument, it’s treated as a file name to be parsed and executed. Anything not inside a function is executed directly (for now).

The Tokenizer

The tokenizer’s job (Tokenizer class) is to read text and turn it into a bunch of tokens. A token is a single unit of the language, like a number, keyword, identifier, operator, etc. To start tokenization, the Tokenize method can be invoked with the string to tokenize.

The Next() method returns the next token, whereas the Peek() method returns the next token without advancing the stream forward. This means the tokenizer is not doing all the work immediately, but only advances to the next token when requested. The parser is the one “driving” the tokenizer.
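
The Peek/Next relationship can be sketched as follows. This is a simplified illustration, not the actual Tokenizer class: LazyTokenizer, its three token kinds, and the one-token lookahead cache are assumptions made for the example.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

enum class TokenType { Number, Identifier, Symbol, End };

struct Token {
    TokenType Type;
    std::string Lexeme;
};

// Hypothetical on-demand tokenizer (illustration only).
class LazyTokenizer {
public:
    void Tokenize(std::string text) {
        m_Text = std::move(text);
        m_Pos = 0;
        m_HasPeeked = false;
    }

    // Returns the next token without consuming it (scans at most once).
    Token Peek() {
        if (!m_HasPeeked) {
            m_Peeked = Scan();
            m_HasPeeked = true;
        }
        return m_Peeked;
    }

    // Returns the next token and advances past it.
    Token Next() {
        Token token = Peek();
        m_HasPeeked = false;
        return token;
    }

private:
    static bool IsIdentChar(unsigned char c) { return std::isalnum(c) || c == '_'; }

    // Scans one token starting at the current position.
    Token Scan() {
        while (m_Pos < m_Text.size() &&
               std::isspace(static_cast<unsigned char>(m_Text[m_Pos])))
            ++m_Pos;
        if (m_Pos >= m_Text.size())
            return { TokenType::End, "" };

        const std::size_t start = m_Pos;
        const unsigned char c = static_cast<unsigned char>(m_Text[m_Pos]);
        if (std::isdigit(c)) {
            while (m_Pos < m_Text.size() &&
                   std::isdigit(static_cast<unsigned char>(m_Text[m_Pos])))
                ++m_Pos;
            return { TokenType::Number, m_Text.substr(start, m_Pos - start) };
        }
        if (std::isalpha(c) || c == '_') {
            while (m_Pos < m_Text.size() &&
                   IsIdentChar(static_cast<unsigned char>(m_Text[m_Pos])))
                ++m_Pos;
            return { TokenType::Identifier, m_Text.substr(start, m_Pos - start) };
        }
        ++m_Pos;  // anything else is treated as a single-character symbol
        return { TokenType::Symbol, std::string(1, static_cast<char>(c)) };
    }

    std::string m_Text;
    std::size_t m_Pos = 0;
    bool m_HasPeeked = false;
    Token m_Peeked{ TokenType::End, "" };
};
```

For the input fd(100); repeated calls to Peek keep returning the identifier fd, and only Next moves on to the ( symbol, the number, and so on until an End token is produced.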

The implementation of the tokenizer is not perfect, but it works well enough. I didn’t want to use any existing tools like LEX (or FLEX), for a couple of reasons. For one, I don’t like generated code that I have little control over. Second, I like to understand what I am writing. Writing a tokenizer is not rocket science, but it’s not trivial, either. And since one of my goals is to experiment, I need the freedom not available with generated code.

The Parser

The parser is much more interesting than the tokenizer (by far). This is where the syntax of the language is fleshed out. Just like with tokenization, I didn’t want to use tools like YACC (or BISON) here. In fact, many mainstream languages have their own hand-written parser. The parser accepts a string to parse (Parse method) or a filename (ParseFile method) and begins the parsing. It calls on the tokenizer when the next token is needed.

The Init method of the parser initializes the tokenizer with the specific tokens it should be able to recognize (like specific keywords and operators), and also initializes its own “parselets” (defined in the above-mentioned article) to make Pratt Parsing work. I will not show here the Pratt Parsing part since there’s quite a bit of code there, but here is an example of parsing the “repeat” statement:

std::unique_ptr<RepeatStatement> Parser::ParseRepeatStatement() {
	Next();		// eat "repeat"
	auto times = ParseExpression();

	m_LoopCount++;
	auto block = ParseBlock();
	m_LoopCount--;
    return std::make_unique<RepeatStatement>(
        std::move(times), std::move(block));
}

ParseExpression parses an expression to be used for the argument to repeat. Then ParseBlock is called to parse a curly-brace surrounded block of code. Finally, an AST node representing a “repeat” statement is created, initialized, and returned to the caller.

The m_LoopCount variable is incremented when entering loop parsing and decremented afterwards. This is done so that parsing the keywords break and continue can check if there is any enclosing loop for these keywords to make sense.
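
The same bookkeeping can be sketched with a small RAII guard, so the counter is decremented even if parsing the loop body exits early. LoopContext and its nested Scope class are hypothetical names for this illustration; the code above increments and decrements m_LoopCount by hand, which works just as well.

```cpp
// Hypothetical sketch of loop-depth tracking (illustration only).
class LoopContext {
public:
    // Marks the lifetime of one loop body being parsed.
    class Scope {
    public:
        explicit Scope(LoopContext& ctx) : m_Ctx(ctx) { ++m_Ctx.m_LoopCount; }
        ~Scope() { --m_Ctx.m_LoopCount; }
        Scope(const Scope&) = delete;
        Scope& operator=(const Scope&) = delete;

    private:
        LoopContext& m_Ctx;
    };

    // break and continue only make sense inside at least one enclosing loop.
    bool BreakOrContinueAllowed() const { return m_LoopCount > 0; }

private:
    int m_LoopCount = 0;
};
```

A parser method for break or continue would consult BreakOrContinueAllowed and report a parse error when it returns false.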

Here is ParseBlock:

std::unique_ptr<BlockExpression>
Parser::ParseBlock(std::vector<std::string> const& args) {
	if (!Match(TokenType::OpenBrace))
		AddError(ParserError(ParseErrorType::OpenBraceExpected, Peek()));

	m_Symbols.push(std::make_unique<SymbolTable>(m_Symbols.top().get()));

	for (auto& arg : args) {
		Symbol sym;
		sym.Name = arg;
		sym.Flags = SymbolFlags::None;
		sym.Type = SymbolType::Argument;
		AddSymbol(sym);
	}

	auto block = std::make_unique<BlockExpression>();
	while (Peek().Type != TokenType::CloseBrace) {
		auto stmt = ParseStatement();
		if (!stmt)
			break;
		block->Add(std::move(stmt));
	}
	Next();		// eat close brace
	m_Symbols.pop();
	return block;
}

ParseBlock starts by making sure there is an open curly brace. Then it creates a symbol table and pushes it to be the “current” as there is a new scope within the block. The parameter to ParseBlock is used when parsing a function body, where these “args” are the parameters to the function. If this is the case, they are added to the symbol table as local variables.

The main work is to call ParseStatement as many times as needed until a close brace is encountered. The block is a vector of statements being filled up. Finally, the symbol table is popped and the AST node is returned.
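
The scoping behavior described above (a new table per block, chained to the enclosing one) can be sketched like this. It is a simplified illustration that reuses the names SymbolTable, Symbol, and AddSymbol from the text, but the members and the FindSymbol lookup are assumptions, not the actual Logo2 classes.

```cpp
#include <string>
#include <unordered_map>
#include <utility>

enum class SymbolType { Variable, Argument, Function };

struct Symbol {
    std::string Name;
    SymbolType Type;
};

// Hypothetical scoped symbol table (illustration only).
class SymbolTable {
public:
    explicit SymbolTable(const SymbolTable* parent = nullptr) : m_Parent(parent) {}

    // Returns false if the name already exists in this scope.
    bool AddSymbol(Symbol sym) {
        std::string key = sym.Name;
        return m_Symbols.emplace(std::move(key), std::move(sym)).second;
    }

    // Searches this scope first, then each enclosing scope in turn.
    const Symbol* FindSymbol(const std::string& name) const {
        for (const SymbolTable* table = this; table; table = table->m_Parent) {
            auto it = table->m_Symbols.find(name);
            if (it != table->m_Symbols.end())
                return &it->second;
        }
        return nullptr;
    }

private:
    const SymbolTable* m_Parent;
    std::unordered_map<std::string, Symbol> m_Symbols;
};
```

With this layout, an argument named size inside a function body hides a global variable of the same name, and the global becomes visible again once the inner table is discarded.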

ParseStatement is a big switch that calls the appropriate specific parsing method based on the first token encountered. Here is an excerpt:

std::unique_ptr<Statement> Parser::ParseStatement() {
	auto peek = Peek();
	if (peek.Type == TokenType::Invalid) {
		return nullptr;
	}

	switch (peek.Type) {
		case TokenType::Keyword_Var: 
             return ParseVarConstStatement(false);
		case TokenType::Keyword_Const: 
             return ParseVarConstStatement(true);
		case TokenType::Keyword_Repeat: 
             return ParseRepeatStatement();
		case TokenType::Keyword_While: 
             return ParseWhileStatement();
		case TokenType::Keyword_Fn: 
             return ParseFunctionDeclaration();
		case TokenType::Keyword_Return: 
             return ParseReturnStatement();
        case TokenType::Keyword_Break: 
             return ParseBreakContinueStatement(false);
        case TokenType::Keyword_Continue:
             return ParseBreakContinueStatement(true);
	}
	auto expr = ParseExpression();
	if (expr) {
		Match(TokenType::SemiColon);
		return std::make_unique<ExpressionStatement>(std::move(expr));
	}
	AddError(ParserError(ParseErrorType::InvalidStatement, peek));
	return nullptr;
}

If a statement is not recognized, an expression parsing is attempted. This allows using Logo2 as a simple calculator, for example. ParseStatement is where the support for more statements is added based on an initial token.

Once an AST is built by the parser, the next step is to execute the AST by some interpreter. In a more complex language (maybe once it grows some more), some semantic analysis may be appropriate, which is about looking at the usage of the language beyond the syntax. For now, we’ll just interpret what we have, and if any error is encountered it’s going to be a runtime error. Some parsing errors can be caught without semantic analysis, but some cannot.

The Interpreter

The Interpreter class provides the runtime behavior by “executing” the AST. It receives the root of the AST constructed by the parser and walks it using the well-known Visitor design pattern, whose purpose here is to decouple the AST node types from the way they are handled by the interpreter. Alternatively, it would be possible to add a virtual “Execute” or “Eval” method to AST nodes, so the nodes can “evaluate” themselves, but that creates coupling, and goes against the single-responsibility principle (SRP), which states that a class should have one and only one job. Using the visitor pattern also makes it easier to add semantic analysis later without modifying the AST node types.

The gist of the visitor pattern is to have an “Accept” method in the AST nodes that calls back to whoever (the visitor) with the current node details. For example, here it is for a binary operator:

class BinaryExpression : public Expression {
public:
    BinaryExpression(std::unique_ptr<Expression> left, 
        Token op, std::unique_ptr<Expression> right);
	Value Accept(Visitor* visitor) const override;

	std::string ToString() const override;

	Expression* Left() const;
	Expression* Right() const;
	Token const& Operator() const;

private:
	std::unique_ptr<Expression> m_Left, m_Right;
	Token m_Operator;
};

Value BinaryExpression::Accept(Visitor* visitor) const {
	return visitor->VisitBinary(this);
}

This same idea is repeated for all concrete AST nodes. The Visitor type is abstract, implemented by the Interpreter class having methods like: VisitBinary, VisitRepeat, etc.
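For readers who have not met the pattern before, here is a tiny self-contained version of it. The node and visitor names (NumberNode, AddNode, NodeVisitor, Evaluator) are made up for this sketch, and plain double stands in for the Value type.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct NumberNode;
struct AddNode;

// Abstract visitor: one method per concrete node type.
struct NodeVisitor {
    virtual ~NodeVisitor() = default;
    virtual double VisitNumber(NumberNode const* node) = 0;
    virtual double VisitAdd(AddNode const* node) = 0;
};

struct Node {
    virtual ~Node() = default;
    virtual double Accept(NodeVisitor* visitor) const = 0;
};

struct NumberNode : Node {
    explicit NumberNode(double value) : Value(value) {}
    double Accept(NodeVisitor* visitor) const override {
        return visitor->VisitNumber(this);
    }
    double Value;
};

struct AddNode : Node {
    AddNode(std::unique_ptr<Node> left, std::unique_ptr<Node> right)
        : Left(std::move(left)), Right(std::move(right)) {
        assert(Left && Right);
    }
    double Accept(NodeVisitor* visitor) const override {
        return visitor->VisitAdd(this);
    }
    std::unique_ptr<Node> Left, Right;
};

// One possible visitor: evaluates the tree. A printer or an analyzer
// could be added later without touching the node types.
struct Evaluator : NodeVisitor {
    double VisitNumber(NumberNode const* node) override { return node->Value; }
    double VisitAdd(AddNode const* node) override {
        return node->Left->Accept(this) + node->Right->Accept(this);
    }
};
```

Calling Accept on the root with an Evaluator walks the whole tree; the nodes never need to know what the visitor does with them.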

The purpose of each of these “Visit” methods is to “execute” (or evaluate) that node. Here is an excerpt for visiting a binary expression:

Value Interpreter::VisitBinary(BinaryExpression const* expr) {
    switch (expr->Operator().Type) {
    case TokenType::Add: 
       return expr->Left()->Accept(this) + expr->Right()->Accept(this);
    case TokenType::Sub:
       return expr->Left()->Accept(this) - expr->Right()->Accept(this);
    case TokenType::Mul:
       return expr->Left()->Accept(this) * expr->Right()->Accept(this);
    case TokenType::Div:
       return expr->Left()->Accept(this) / expr->Right()->Accept(this);
    }
    return Value();
}

Here it is for “repeat”:

Value Interpreter::VisitRepeat(RepeatStatement const* expr) {
    auto count = Eval(expr->Count());
    if (!count.IsInteger())
        throw RuntimeError(ErrorType::TypeMismatch, expr->Count());

    auto n = count.Integer();
    while (n-- > 0) {
        try {
            Eval(expr->Block());
        }
        catch (BreakOrContinue const& bc) {
            if (!bc.Continue)
                break;
        }
    }
    return nullptr;     // repeat has no return value
}

You should get the idea at this point. (Eval is just a simple wrapper that calls Accept with the provided node).

The Value type used with the above code (the one returned from Accept methods) is the way to represent “values” in Logo2. Logo2 is a dynamically typed language (at least for now), so variables can hold any one of a list of supported types, encapsulated in Value. You can think of that as a C-style union. Specifically, it wraps a std::variant<> C++17 type that currently supports the following: 64-bit integer, 64-bit floating point (double), bool, string (std::string), and null (representing no value). The list of possibilities will increase, allowing user-defined types as well.
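As an illustration of the general approach, here is a reduced model of a variant-backed value. It is a sketch under my own assumptions rather than the actual Value class: the DynValue name, the use of std::monostate for null, and the promotion rule in operator+ are all choices made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>

// Hypothetical, reduced model of a dynamically typed value.
class DynValue {
public:
    DynValue() = default;                                    // null
    explicit DynValue(std::int64_t v) : m_Data(v) {}
    explicit DynValue(double v) : m_Data(v) {}
    explicit DynValue(bool v) : m_Data(v) {}
    explicit DynValue(std::string v) : m_Data(std::move(v)) {}

    bool IsNull() const { return std::holds_alternative<std::monostate>(m_Data); }
    bool IsInteger() const { return std::holds_alternative<std::int64_t>(m_Data); }
    bool IsReal() const { return std::holds_alternative<double>(m_Data); }

    std::int64_t Integer() const { return std::get<std::int64_t>(m_Data); }
    double Real() const { return std::get<double>(m_Data); }

    // integer + integer stays an integer; mixing integer and real
    // promotes to real; anything else is a type mismatch.
    friend DynValue operator+(DynValue const& a, DynValue const& b) {
        if (a.IsInteger() && b.IsInteger())
            return DynValue(a.Integer() + b.Integer());
        if (a.IsNumber() && b.IsNumber())
            return DynValue(a.ToDouble() + b.ToDouble());
        throw std::runtime_error("type mismatch in operator+");
    }

private:
    bool IsNumber() const { return IsInteger() || IsReal(); }
    double ToDouble() const {
        return IsInteger() ? static_cast<double>(Integer()) : Real();
    }

    std::variant<std::monostate, std::int64_t, double, bool, std::string> m_Data;
};
```

The type check happens at run time inside the operator, which is exactly where a dynamically typed interpreter would raise its runtime error.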

Turtle Graphics

The Logo2Runtime project contains the support for managing turtles, and displaying their “drawings”. The Turtle class is a graphics-free type to manage the state of the turtle – its position and heading, but also a list of “commands” indicating operations the turtle has been instructed to do, such as drawing a line, changing color, or changing width of drawing. This list is necessary whenever a window’s output needs to be refreshed.
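A graphics-free turtle of this kind could look roughly like the following. This is only a sketch: SimpleTurtle, RecordedCommand, CommandKind, and Point2D are hypothetical names, and the real Turtle class may store its state and commands differently.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Point2D { float X = 0, Y = 0; };

enum class CommandKind { DrawLine, SetWidth, SetColor };

// Hypothetical command record; only the fields matching Kind are meaningful.
struct RecordedCommand {
    CommandKind Kind = CommandKind::DrawLine;
    Point2D From, To;       // used by DrawLine
    float Width = 1;        // used by SetWidth
    unsigned Color = 0;     // used by SetColor
};

// Tracks position and heading, and records what should be drawn.
class SimpleTurtle {
public:
    // Moves along the current heading; records a line only if the pen is down.
    void Forward(float distance) {
        constexpr float pi = 3.14159265f;
        float rad = m_Heading * pi / 180.0f;
        Point2D to{ m_Position.X + distance * std::cos(rad),
                    m_Position.Y + distance * std::sin(rad) };
        if (m_PenDown) {
            RecordedCommand cmd;
            cmd.Kind = CommandKind::DrawLine;
            cmd.From = m_Position;
            cmd.To = to;
            m_Commands.push_back(cmd);
        }
        m_Position = to;
    }

    void Rotate(float degrees) {
        m_Heading = std::fmod(m_Heading + degrees, 360.0f);
    }

    void SetWidth(float width) {
        assert(width > 0);
        RecordedCommand cmd;
        cmd.Kind = CommandKind::SetWidth;
        cmd.Width = width;
        m_Commands.push_back(cmd);
    }

    void PenDown(bool down) { m_PenDown = down; }

    Point2D Position() const { return m_Position; }
    std::vector<RecordedCommand> const& GetCommands() const { return m_Commands; }

private:
    Point2D m_Position;
    float m_Heading = 0;    // degrees; 0 points along the positive X axis
    bool m_PenDown = true;
    std::vector<RecordedCommand> m_Commands;
};
```

Because the turtle only records commands, any window can replay the list at paint time, which is what makes refreshing the output possible.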

The Window class serves as a wrapper for an HWND, that also has the “power” to draw a set of turtle commands. Here is its DrawTurtle method:

void Window::DrawTurtle(Gdiplus::Graphics& g, Turtle* t) const {
    for (auto& cmd : t->GetCommands()) {
        DrawTurtleCommand(g, t, cmd);
    }
}

Each command does the right thing:

void Window::DrawTurtleCommand(Gdiplus::Graphics& g, Turtle* t, 
    TurtleCommand const& cmd) const {
    switch (cmd.Type) {
        case TurtleCommandType::DrawLine:
            g.DrawLine(m_Pen.get(), cmd.Line.From.X, 
               cmd.Line.From.Y, cmd.Line.To.X, cmd.Line.To.Y);
            break;

        case TurtleCommandType::SetWidth:
        {
            Color color;
            m_Pen->GetColor(&color);
            m_Pen.reset(new Pen(color, cmd.Width));
            break;
        }

        case TurtleCommandType::SetColor:
        {
            Color color;
            color.SetValue(cmd.Color);
            m_Pen.reset(new Pen(color, m_Pen->GetWidth()));
            break;
        }
    }
}

The graphical objects are GDI+ objects provided by the Windows API. Of course, it would be possible to switch to a different API. I chose GDI+ for its flexibility and 2D capabilities.

The Runtime class ties a turtle and a window together. It holds on to a (single) Turtle object and single Window object. In the future, this is going to be more dynamic, so any number of windows and turtles can be created, even more than one turtle in the same window.

The REPL

A simple REPL is implemented in the Logo2 project. It’s not trivial, as there is a user interface that must be kept alive, meaning messages have to be pumped. This means using functions like gets_s is not good enough, as they block the calling thread. Assuming the UI is on the same thread, this will cause the UI to become non-responsive. For now, the same thread is used, so that no special synchronization is required. The downside is that a custom input “loop” has to be written, and currently it’s very simple, and only supports the BACKSPACE key for typing error correction.
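The per-key editing logic of such an input loop can be isolated from the console and message-pump calls. The helper below is a hypothetical, portable sketch of that logic only (ApplyKey is not a function from the Logo2 project): it handles ENTER, BACKSPACE, and printable characters, and ignores everything else.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: applies one key code to the line being edited.
// Returns true when ENTER (13) was pressed and the line is complete.
inline bool ApplyKey(std::string& line, int ch) {
    if (ch == 13)                       // ENTER
        return true;
    if (ch == 8) {                      // BACKSPACE
        if (!line.empty())
            line.pop_back();
    }
    else if (ch >= 0 && ch <= 255 && std::isprint(ch)) {
        line += static_cast<char>(ch);
    }
    return false;                       // any other key is ignored
}
```

Keeping this part free of I/O makes it easy to test, and leaves the loop itself responsible only for pumping messages and echoing characters.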

The first step is to get the input, key by key. If there is no key available, messages are pumped. A WM_QUIT message indicates it’s time to exit. Not very elegant, but here goes:

Tokenizer t;
Parser parser(t);
Interpreter inter;
Runtime runtime(inter);
runtime.Init();
runtime.CreateLogoWindow(L"Logo 2", 800, 800);

for (;;) {
	std::print(">> ");
	std::string input;
	int ch = 0;
	MSG msg{};
	while (ch != 13) {
		while (::PeekMessage(&msg, nullptr, 0, 0, PM_REMOVE) && 
                 msg.message != WM_QUIT) {
			::TranslateMessage(&msg);
			::DispatchMessage(&msg);
		}
		if (msg.message == WM_QUIT)
			break;

		if (_kbhit()) {
			ch = _getch();
			if (isprint(ch)) {
				input += (char)ch;
				printf("%c", ch);
			}
			else if (ch == 8) {		// backspace
				if (!input.empty()) {	// nothing to erase on an empty line
					printf("\b \b");
					input.pop_back();
				}
			}
			else {
				if (_kbhit())
					_getch();
			}
		}
	}

	if (msg.message == WM_QUIT)
		break;

Once we have a line of input, it’s time to parse and (if no errors occur), execute:

try {
	printf("\n");
	auto ast = parser.Parse(input);
	if (parser.HasErrors()) {
		for (auto& err : parser.Errors()) {
			printf("Error (%d,%d): %d\n", 
               err.ErrorToken.Line, err.ErrorToken.Col, (int)err.Error);
		}
		continue;
	}
	try {
		auto result = ast->Accept(&inter); // execute!
		if (result != nullptr)
			std::println("{}", result.ToString());
	}
	catch (RuntimeError const& err) {
		printf("Runtime error: %d\n", (int)err.Error);
	}
}
catch (ParserError const& err) {
	printf("Error (%d,%d): %d\n", err.ErrorToken.Line, 
         err.ErrorToken.Col, (int)err.Error);
	continue;
}

Some parser errors are accumulated in a vector, some throw an exception (errors where it would be difficult for the parser to recover confidently). At runtime, errors could occur as well, such as the wrong types being used with certain operations.

Conclusion

Writing a language can be lots of fun. You can invent your “dream” language. For me, the Logo2 experiment is ongoing. I’m planning to build a simple IDE, to extend the language to support user-defined types, lambdas (with closures), and much more. Your ideas are welcome as well!

The project is at zodiacon/Logo2 (github.com)

Thread Priorities in Windows

When a thread is created, it has some priority, which sets its importance compared to other threads competing for CPU time. The thread priority range is 0 to 31 (31 being the highest), where priority zero is used by the memory manager’s zero-page thread(s), whose purpose is to zero out physical pages (for reasons outside the scope of this post), so technically the allowed priority range is 1 to 31.

It stands to reason (to some extent), that a developer could change a thread’s priority to some valid value in the range of 1 to 31, but this is not the case. The Windows API sets up rules as to how thread priorities may change. First, there is a process priority class (sometimes called Base Priority), that specifies the default thread priority within that process. Processes don’t run – threads do, but still this is a process property and affects all threads in the process. You can see the value of this property very simply with Task Manager’s Base Priority column (not visible by default):

Base Priority column in Task Manager

There are six priority classes (the base priority of each is shown in parentheses):

  • Idle (4), called Low in Task Manager, probably so as not to give the wrong impression
  • Below Normal (6)
  • Normal (8)
  • Above Normal (10)
  • High (13)
  • Realtime (24)

A few required notes:

  • Normal is the default priority class unless overridden in some way. For example, double-clicking an executable in Explorer will launch a new process with priority class of Normal (8).
  • The term “Realtime” does not imply Windows is a real-time OS; it’s not. “Real-time” just means “higher than all the others”.
  • To set the Realtime priority class, the process in question must have the SeIncreaseBasePriorityPrivilege, normally granted to administrators. If “Realtime” is requested, but the process’s token does not possess that privilege, the result is “High”. The reason has to do with the fact that many kernel threads have priorities in the real-time range, and it could be problematic if too many threads spend a lot of time running in these priorities, potentially leading to kernel threads getting less time than they need.

Is this the end of the story? Not quite. For example, looking at Task Manager, processes like Csrss.exe (Windows subsystem process) or Smss.exe (Session manager) seem to have a priority class of Normal as well. Is this really the case? Yes and no (everyone likes that kind of answer, right?) We’ll get to that soon.

Setting a Thread’s priority

Changing the process priority class is possible with the SetPriorityClass API. For example, a process can change its own priority class like so:

::SetPriorityClass(::GetCurrentProcess(), HIGH_PRIORITY_CLASS);

You can do the same in .NET by utilizing the System.Diagnostics.Process class:

Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;

You can also change priority class using Task Manager or Process Explorer, by right-clicking a process and selecting “Set Priority”.

Once the priority class is changed, it affects all threads in that process. But how?

It turns out that a specific thread’s priority can be changed around the process priority class. The following diagram shows the full picture:

Every small rectangle in the above diagram indicates a valid thread priority. For example, the Normal priority class allows setting thread priorities to 1, 6, 7, 8, 9, 10, 15. To be more generic, here are the rules for all except the Realtime class. A thread priority is by default the same as the process priority class, but it can be -1, -2, +1, +2 from that base, or have two extreme values (internally called “Saturation”) with the values 1 and 15.

The Realtime range is unique, where the base priority is 24, but all priorities from 16 to 31 are available. The SetThreadPriority API that can be used to change an individual thread’s priority accepts an enumeration value (as its second argument) rather than an absolute value. Here are the macro definitions:

#define THREAD_PRIORITY_LOWEST         // -2  
#define THREAD_PRIORITY_BELOW_NORMAL   // -1
#define THREAD_PRIORITY_NORMAL         // 0
#define THREAD_PRIORITY_HIGHEST        // + 2
#define THREAD_PRIORITY_ABOVE_NORMAL   // + 1
#define THREAD_PRIORITY_TIME_CRITICAL  // 15 or 31
#define THREAD_PRIORITY_IDLE           // 1 or 16
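The rules can be captured in a few lines of portable code. The function below is a sketch of the arithmetic only, with hypothetical names (ResultingPriority and the Rel* constants); the numeric values mirror those of the THREAD_PRIORITY_* macros in the Windows headers, reproduced here so the example compiles anywhere.

```cpp
#include <cassert>

// Relative values, mirroring the Windows THREAD_PRIORITY_* macros.
constexpr int RelLowest = -2;
constexpr int RelBelowNormal = -1;
constexpr int RelNormal = 0;
constexpr int RelAboveNormal = 1;
constexpr int RelHighest = 2;
constexpr int RelIdle = -15;
constexpr int RelTimeCritical = 15;

// Computes the base thread priority that results from a process priority
// class level (4, 6, 8, 10, 13 or 24) combined with a relative value.
constexpr int ResultingPriority(int classBase, int relative) {
    bool realtime = classBase >= 16;
    if (relative == RelIdle)            // low saturation value
        return realtime ? 16 : 1;
    if (relative == RelTimeCritical)    // high saturation value
        return realtime ? 31 : 15;
    return classBase + relative;        // -2 .. +2 around the class base
}

// The Normal class (8) yields exactly 1, 6, 7, 8, 9, 10, 15.
static_assert(ResultingPriority(8, RelLowest) == 6, "Normal, -2");
static_assert(ResultingPriority(8, RelHighest) == 10, "Normal, +2");
```

Note that this models only the seven values SetThreadPriority accepts; it says nothing about the other Realtime levels between 16 and 31.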

Here is an example of changing the current thread’s priority to +2 compared to the process priority class:

::SetThreadPriority(::GetCurrentThread(), THREAD_PRIORITY_HIGHEST);

And a C# version:

Thread.CurrentThread.Priority = ThreadPriority.Highest;

You can see thread priorities in Process Explorer’s bottom view:

Thread priorities in Process Explorer

There are two columns for priorities – A base priority and a Dynamic priority. The base priority is the priority set by code (SetThreadPriority) or the default, while the dynamic priority is the current thread’s priority, which could be slightly higher than the base (temporarily), and is changed because of certain decisions made by the kernel scheduler and other components and drivers that can produce such an effect. These thread boosting scenarios are outside the scope of this post.

If you want to see all threads in the system with their priorities, you can use my System Explorer tool, and select System / Threads menu item:

System Explorer showing all threads in the system

The two priority columns are shown (Priority is the same as Dynamic Priority in Process Explorer). You can sort by any column, including the priority to see which threads have the highest priority.

Native APIs

If you look in Process Explorer, there is a column named Base Priority under the Process Performance tab:

Process Performance tab

With this column visible, it indicates a process priority with a number. It’s mostly the corresponding number to the priority class (e.g. 10 for Above Normal, 13 for High, etc.), but not always. For example, Smss.exe has a value of 11, which doesn’t correspond to any priority class. Csrss.exe processes have a value of 13.

Setting such values can only be done with the Native API. Specifically, NtSetInformationProcess with the ProcessBasePriority enumeration value can make that change. Weirdly enough, if the value is higher than the current process priority, the same privilege mentioned earlier is required. The weird part is that calling SetPriorityClass to change Normal to High always works, but calling NtSetInformationProcess to change from 8 to 13 (the same as Normal to High) requires that privilege; oh, well.

What about a specific thread? The native API allows changing a priority of a thread to any given value directly without the need to depend on the process priority class. Choosing a priority in the realtime range (16 or higher) still requires that privilege. But at least you get the flexibility to choose any priority value. The call to use is NtSetInformationThread with ThreadPriority enumeration. For example:

KPRIORITY priority = 14;
NtSetInformationThread(NtCurrentThread(), ThreadPriority, 
    &priority, sizeof(priority));

Note: the definitions for the native API can be obtained from the phnt project.

What happens if you need a high priority (16 or higher) but don’t have admin privileges in the process? Enter the Multimedia Class Scheduler.

The MMCSS Service

The Multimedia Class Scheduler service, coupled with a driver (mmcss.sys), provides a thread priority service intended for “multimedia” applications that would like to get some guarantee when “playing” multimedia. For example, if you have Spotify running locally, you’ll find there is one thread with priority 22, although the process itself has a priority class of Normal:

Spotify threads

You can use the MMCSS API to get that kind of support. There is a Registry key that defines several “tasks” applications can use. Third parties can add more tasks:

MMCSS tasks

The base key is: HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Multimedia\SystemProfile\Tasks

The selected “Audio” task has several properties that are read by the MMCSS service. The most important is Priority, which is between 1 (low) and 8 (high) representing the relative priority compared to other “tasks”. Some values aren’t currently used (GPU Priority, SFIO Priority), so don’t expect anything from these.

Here is an example that uses the MMCSS API to increase the current thread’s priority:

#include <Windows.h>
#include <avrt.h>

#pragma comment(lib, "avrt")

int main() {
	DWORD index = 0;
	HANDLE h = AvSetMmThreadCharacteristics(L"Audio", &index);
	AvSetMmThreadPriority(h, AVRT_PRIORITY_HIGH);

The priority itself is an enumeration, where each value corresponds to a range of priorities (all above 15).

The returned HANDLE, by the way, is to the MMCSS device (\Device\MMCSS). The argument to AvSetMmThreadCharacteristics must correspond to one of the “Tasks” registered. Calling AvRevertMmThreadCharacteristics reverts the thread to “normal”. There are more APIs in that set; check the docs.

Happy Threading!

Window Stations and Desktops

A while back I blogged about the differences between the virtual desktop feature exposed to users on Windows 10/11, and the Desktops tool from Sysinternals. In this post, I’d like to shed some more light on Window Stations, desktops, and windows. I assume you have read the aforementioned blog post before continuing.

We know that Window Stations are contained in sessions. Can we enumerate these? The EnumWindowStations API is available in the Windows API, but it only returns the Window Stations in the current session. There is no “EnumSessionWindowStations”. Window Stations, however, are named objects, and so are visible in tools such as WinObj (running elevated):

Window stations in session 0

The Window Stations in session 0 are at \Windows\WindowStations
The Window Stations in session x are at \Sessions\x\Windows\WindowStations
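These two rules are simple enough to put in a helper. The function below is a hypothetical convenience (WindowStationDirectory is my own name, not a Windows API) that builds the Object Manager directory path for a given session number.

```cpp
#include <cassert>
#include <string>

// Builds the Object Manager directory that holds a session's Window Stations.
inline std::wstring WindowStationDirectory(unsigned sessionId) {
    if (sessionId == 0)
        return L"\\Windows\\WindowStations";
    return L"\\Sessions\\" + std::to_wstring(sessionId) +
           L"\\Windows\\WindowStations";
}
```

Appending a backslash and a Window Station name to the result gives a full object name of the kind the native open call expects.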

The OpenWindowStation API only accepts a “local” name, under the caller’s session. The native NtUserOpenWindowStation API (from Win32u.dll) is more flexible, accepting a full object name:

HWINSTA NtUserOpenWindowStation(POBJECT_ATTRIBUTES attr, ACCESS_MASK access);

Here is an example that opens the “msswindowstation” Window Station:

#include <Windows.h>
#include <winternl.h>

#pragma comment(lib, "ntdll")

HWINSTA NTAPI _NtUserOpenWindowStation(_In_ POBJECT_ATTRIBUTES attr, _In_ ACCESS_MASK access);
int main() {
	// force Win32u.DLL to load
	::LoadLibrary(L"user32");
	auto NtUserOpenWindowStation = (decltype(_NtUserOpenWindowStation)*)
		::GetProcAddress(::GetModuleHandle(L"win32u"), "NtUserOpenWindowStation");

	UNICODE_STRING winStaName;
	RtlInitUnicodeString(&winStaName, L"\\Windows\\WindowStations\\msswindowstation");
	OBJECT_ATTRIBUTES winStaAttr;
	InitializeObjectAttributes(&winStaAttr, &winStaName, 0, nullptr, nullptr);
	auto hWinSta = NtUserOpenWindowStation(&winStaAttr, READ_CONTROL);
	if (hWinSta) {
		// do something with hWinSta
		::CloseWindowStation(hWinSta);
	}

You may or may not have enough power to open a handle with the required access – depending on the Window Station in question. Those in session 0 are hardly accessible from non-session 0 processes, even with the SYSTEM account. You can examine their security descriptor with the kernel debugger (as other tools will return access denied):

lkd> !object \Windows\WindowStations\msswindowstation
Object: ffffe103f5321c00  Type: (ffffe103bb0f0ae0) WindowStation
    ObjectHeader: ffffe103f5321bd0 (new version)
    HandleCount: 4  PointerCount: 98285
    Directory Object: ffff808433e412b0  Name: msswindowstation
lkd> dt nt!_OBJECT_HEADER ffffe103f5321bd0

   +0x000 PointerCount     : 0n98285
   +0x008 HandleCount      : 0n4
   +0x008 NextToFree       : 0x00000000`00000004 Void
   +0x010 Lock             : _EX_PUSH_LOCK
   +0x018 TypeIndex        : 0xa2 ''
   +0x019 TraceFlags       : 0 ''
   +0x019 DbgRefTrace      : 0y0
   +0x019 DbgTracePermanent : 0y0
   +0x01a InfoMask         : 0xe ''
   +0x01b Flags            : 0 ''
   +0x01b NewObject        : 0y0
   +0x01b KernelObject     : 0y0
   +0x01b KernelOnlyAccess : 0y0
   +0x01b ExclusiveObject  : 0y0
   +0x01b PermanentObject  : 0y0
   +0x01b DefaultSecurityQuota : 0y0
   +0x01b SingleHandleEntry : 0y0
   +0x01b DeletedInline    : 0y0
   +0x01c Reserved         : 0
   +0x020 ObjectCreateInfo : 0xfffff801`21c53940 _OBJECT_CREATE_INFORMATION
   +0x020 QuotaBlockCharged : 0xfffff801`21c53940 Void
   +0x028 SecurityDescriptor : 0xffff8084`3da8aa6c Void
   +0x030 Body             : _QUAD
lkd> !sd 0xffff8084`3da8aa60
->Revision: 0x1
->Sbz1    : 0x0
->Control : 0x8014
            SE_DACL_PRESENT
            SE_SACL_PRESENT
            SE_SELF_RELATIVE
->Owner   : S-1-5-18
->Group   : S-1-5-18
->Dacl    : 
->Dacl    : ->AclRevision: 0x2
->Dacl    : ->Sbz1       : 0x0
->Dacl    : ->AclSize    : 0x1c
->Dacl    : ->AceCount   : 0x1
->Dacl    : ->Sbz2       : 0x0
->Dacl    : ->Ace[0]: ->AceType: ACCESS_ALLOWED_ACE_TYPE
->Dacl    : ->Ace[0]: ->AceFlags: 0x0
->Dacl    : ->Ace[0]: ->AceSize: 0x14
->Dacl    : ->Ace[0]: ->Mask : 0x0000011b
->Dacl    : ->Ace[0]: ->SID: S-1-1-0

You can become SYSTEM to help with access by using PsExec from Sysinternals to launch a command window (or whatever) as SYSTEM but still run in the interactive session:

psexec -s -i -d cmd.exe

If all else fails, you may need to use the “Take Ownership” privilege to make yourself the owner of the object and change its DACL to allow yourself full access. Apparently, even that won’t work, as getting something from a Window Station in another session seems to be blocked (see replies in Twitter thread). READ_CONTROL is available to get some basic info.

Here is a screenshot of Object Explorer running under SYSTEM that shows some details of the “msswindowstation” Window Station:

Guess which processes hold handles to this hidden Window Station?

Once you are able to get a Window Station handle, you may be able to go one step deeper by enumerating desktops, if you managed to get at least WINSTA_ENUMDESKTOPS access mask:

::EnumDesktops(hWinSta, [](auto deskname, auto param) -> BOOL {
	printf(" Desktop: %ws\n", deskname);
	auto h = (HWINSTA)param;
	return TRUE;
	}, (LPARAM)hWinSta);

Going one level deeper, you can enumerate the top-level windows in each desktop (if any). For that you will need to connect the process to the Window Station of interest and then call EnumDesktopWindows:

void DoEnumDesktopWindows(HWINSTA hWinSta, PCWSTR name) {
	if (::SetProcessWindowStation(hWinSta)) {
		auto hdesk = ::OpenDesktop(name, 0, FALSE, DESKTOP_READOBJECTS);
		if (!hdesk) {
			printf("--- failed to open desktop %ws (%d)\n", name, ::GetLastError());
			return;
		}
		static WCHAR pname[MAX_PATH];
		::EnumDesktopWindows(hdesk, [](auto hwnd, auto) -> BOOL {
			static WCHAR text[64];
			if (::IsWindowVisible(hwnd) && ::GetWindowText(hwnd, text, _countof(text)) > 0) {
				DWORD pid;
				auto tid = ::GetWindowThreadProcessId(hwnd, &pid);
				auto hProcess = ::OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, FALSE, pid);
				BOOL exeNameFound = FALSE;
				PWSTR exeName = nullptr;
				if (hProcess) {
					DWORD size = MAX_PATH;
					exeNameFound = ::QueryFullProcessImageName(hProcess, 0, pname, &size);
					::CloseHandle(hProcess);
					if (exeNameFound) {
						exeName = ::wcsrchr(pname, L'\\');
						if (exeName == nullptr)
							exeName = pname;
						else
							exeName++;
					}
				}
				printf("  HWND: 0x%08X PID: 0x%X (%d) %ws TID: 0x%X (%d): %ws\n", 
					(DWORD)(DWORD_PTR)hwnd, pid, pid, 
					exeNameFound ? exeName : L"", tid, tid, text);
			}
			return TRUE;
			}, 0);
		::CloseDesktop(hdesk);
	}
}

Calling SetProcessWindowStation can only work with a Window Station that belongs to the current session.

Here is an example output for the interactive session (Window Stations enumerated with EnumWindowStations):

Window station: WinSta0
 Desktop: Default
  HWND: 0x00010E38 PID: 0x4D04 (19716) Zoom.exe TID: 0x5FF8 (24568): ZPToolBarParentWnd
  HWND: 0x000A1C7A PID: 0xB804 (47108) VsDebugConsole.exe TID: 0xDB50 (56144): D:\Dev\winsta\x64\Debug\winsta.exe
  HWND: 0x00031DE8 PID: 0xBF40 (48960) devenv.exe TID: 0x94E8 (38120): winsta - Microsoft Visual Studio Preview
  HWND: 0x00031526 PID: 0x1384 (4996) msedge.exe TID: 0xE7C (3708): zodiacon/ObjectExplorer: Explore Kernel Objects on Windows and
  HWND: 0x00171A9A PID: 0xA40C (41996)  TID: 0x9C08 (39944): WindowStation (\Windows\WindowStations\msswindowstation)
  HWND: 0x000319D0 PID: 0xA40C (41996)  TID: 0x9C08 (39944): Object Manager - Object Explorer 2.0.2.0 (Administrator)
  HWND: 0x001117DC PID: 0x253C (9532) ObjExp.exe TID: 0x9E10 (40464): Object Manager - Object Explorer 2.0.2.0 (Administrator)
  HWND: 0x00031CA8 PID: 0xBE5C (48732) devenv.exe TID: 0xC250 (49744): OpenWinSta - Microsoft Visual Studio Preview (Administrator)
  HWND: 0x000B1884 PID: 0xA8A0 (43168) DbgX.Shell.exe TID: 0xA668 (42600):  - KD '', Local Connection  - WinDbg 1.2306.12001.0 (Administra
...
  HWND: 0x000101C8 PID: 0x3598 (13720) explorer.exe TID: 0x359C (13724): Program Manager
Window station: Service-0x0-45193$
 Desktop: sbox_alternate_desktop_0x6A80
 Desktop: sbox_alternate_desktop_0xA94C
 Desktop: sbox_alternate_desktop_0x3D8C
 Desktop: sbox_alternate_desktop_0x7EF8
 Desktop: sbox_alternate_desktop_0x72FC
 Desktop: sbox_alternate_desktop_0x27B4
 Desktop: sbox_alternate_desktop_0x6E80
 Desktop: sbox_alternate_desktop_0x6C54
 Desktop: sbox_alternate_desktop_0x68C8
 Desktop: sbox_alternate_desktop_0x691C
 Desktop: sbox_alternate_desktop_0x4150
 Desktop: sbox_alternate_desktop_0x6254
 Desktop: sbox_alternate_desktop_0x5B9C
 Desktop: sbox_alternate_desktop_0x59B4
 Desktop: sbox_alternate_desktop_0x1384
 Desktop: sbox_alternate_desktop_0x5480

The desktops in the Window Station “Service-0x0-45193$” above don’t seem to have top-level visible windows.

You can also access the clipboard and atom table of a given Window Station, if you have a powerful enough handle. I’ll leave that as an exercise as well.

Finally, what about session enumeration? That’s the easy part – no need to call NtOpenSession with Session objects that can be found in the “\KernelObjects” directory in the Object Manager’s namespace – the WTS family of functions can be used. Specifically, WTSEnumerateSessionsEx can provide some important properties of a session:

void EnumSessions() {
	DWORD level = 1;
	PWTS_SESSION_INFO_1 info;
	DWORD count = 0;
	::WTSEnumerateSessionsEx(WTS_CURRENT_SERVER_HANDLE, &level, 0, &info, &count);
	for (DWORD i = 0; i < count; i++) {
		auto& data = info[i];
		printf("Session %d (%ws) Username: %ws\\%ws State: %s\n", data.SessionId, data.pSessionName, 
			data.pDomainName ? data.pDomainName : L"NT AUTHORITY", data.pUserName ? data.pUserName : L"SYSTEM", 
			StateToString((WindowStationState)data.State));
    }
	::WTSFreeMemory(info);
}

What about creating a process to use a different Window Station and desktop? One member of the STARTUPINFO structure passed to CreateProcess (lpDesktop) allows setting a desktop name and an optional Window Station name separated by a backslash (e.g. “MyWinSta\MyDesktop”).

There is more to Window Stations and Desktops than meets the eye… this should give interested readers a head start in doing further research.

New Offering: Mentoring Program

A few people have asked me if I provide mentoring. I didn’t consider this avenue of contribution, but after some thought I am happy to open a software development personal mentoring program. My goal is to give from my knowledge and experience in software development, but not just that.

Being a great software developer is not just about writing high-quality code. I will not elaborate on the qualities of great software engineers, as there are many such lists in various articles. Many of the qualities of great software developers are the same as other roles in the IT industry (and some in any industry for that matter).

This is what I can offer:

  • Guidance on how to tackle new topics
  • How to be more productive
  • Building self-confidence
  • Working on foundational/core software related pieces to boost professionalism and performance
  • Specific help in subjects I know well enough (C, C++, C#, Rust, Windows, Kernel, Hardware, Graphics, Algorithms, UI, math, writing, …)
  • Anything I can help with that aligns with your goals!

Program details

  • Initial meeting to discuss goals, purpose, and expectations.
  • Program length: 3, 6, 8, or 12 months.
  • 1×1 meetings (see below).
  • Discussions on activities, challenges, support, resources, and how to measure success.
  • Chat access between 1×1 meetings.

1×1 Meetings

  • 40 min/week (first month)
  • 1 hour/2 weeks. (month 2-3)
  • 1 hour/3 weeks. (month 4-6)
  • 1 hour/4 weeks. (month 7+)

Program cost

  • 3 months: 3900 USD (paid in 3 installments)
  • 6 months: 4900 USD (paid in 4 installments)
  • 8 months: 5900 USD (paid in 5 installments)
  • 12 months: 6900 USD (paid in 6 installments)

Get in touch

If you’re interested, send me an email to [email protected], and we’ll get the process going. If you have any questions or doubts, just email me. I’ll try to help as best I can.

Kernel Object Names Lifetime

Much of the Windows kernel functionality is exposed via kernel objects. Processes, threads, events, desktops, semaphores, and many other object types exist. Some object types can have string-based names, which means they can be “looked up” by that name. In this post, I’d like to consider some subtleties that concern object names.

Let’s start by examining kernel object handles in Process Explorer. When we select a process of interest, we can see the list of handles in one of the bottom views:

Handles view in Process Explorer

By default, however, Process Explorer shows only handles to what it considers named objects. But even that is not quite right. You will find certain object types in this view that don’t have string-based names. The simplest example is processes. Processes have numeric IDs, rather than string-based names. Still, Process Explorer shows processes with a “name” that shows the process executable name and its unique process ID. This is useful information, for sure, but it’s not the object’s name.

Same goes for threads: these are displayed, even though threads (like processes) have numeric IDs rather than string-based names.

If you wish to see all handles in a process, you need to check the menu item Show Unnamed Handles and Mappings in the View menu.

Object Name Lifetime

What is the lifetime associated with an object’s name? This sounds like a weird question. Kernel objects are reference counted, so obviously when an object reference count drops to zero, it is destroyed, and its name is deleted as well. This is correct in part. Let’s look a bit deeper.

The following example code creates a Notepad process, and puts it into a named Job object (error handling omitted for brevity):

PROCESS_INFORMATION pi;
STARTUPINFO si = { sizeof(si) };

WCHAR name[] = L"notepad";
::CreateProcess(nullptr, name, nullptr, nullptr, FALSE, 0, 
	nullptr, nullptr, &si, &pi);

HANDLE hJob = ::CreateJobObject(nullptr, L"MyTestJob");
::AssignProcessToJobObject(hJob, pi.hProcess);

After running the above code, we can open Process Explorer, locate the new Notepad process, double-click it to get to its properties, and then navigate to the Job tab:

We can clearly see the job object’s name, prefixed with “\Sessions\1\BaseNamedObjects”. Simple object names (like “MyTestJob”) are prepended with a session-relative directory name, making the name unique to this session only. This means processes in other sessions can create objects with the same name (“MyTestJob”) without any collision. Further details on names and sessions are outside the scope of this post.

Let’s see what the kernel debugger has to say regarding this job object:

lkd> !process 0 1 notepad.exe
PROCESS ffffad8cfe3f4080
    SessionId: 1  Cid: 6da0    Peb: 175b3b7000  ParentCid: 16994
    DirBase: 14aa86d000  ObjectTable: ffffc2851aa24540  HandleCount: 233.
    Image: notepad.exe
    VadRoot ffffad8d65d53d40 Vads 90 Clone 0 Private 524. Modified 0. Locked 0.
    DeviceMap ffffc28401714cc0
    Token                             ffffc285355e9060
    ElapsedTime                       00:04:55.078
    UserTime                          00:00:00.000
    KernelTime                        00:00:00.000
    QuotaPoolUsage[PagedPool]         214720
    QuotaPoolUsage[NonPagedPool]      12760
    Working Set Sizes (now,min,max)  (4052, 50, 345) (16208KB, 200KB, 1380KB)
    PeakWorkingSetSize                3972
    VirtualSize                       2101395 Mb
    PeakVirtualSize                   2101436 Mb
    PageFaultCount                    4126
    MemoryPriority                    BACKGROUND
    BasePriority                      8
    CommitCharge                      646
    Job                               ffffad8d14503080

lkd> !object ffffad8d14503080
Object: ffffad8d14503080  Type: (ffffad8cad8b7900) Job
    ObjectHeader: ffffad8d14503050 (new version)
    HandleCount: 1  PointerCount: 32768
    Directory Object: ffffc283fb072730  Name: MyTestJob

Clearly, there is a single handle to the job object. The PointerCount value is not the real reference count because of the kernel’s tracking of the number of usages each handle has (outside the scope of this post as well). To get the real reference count, we can click the PointerCount DML link in WinDbg (which runs the !trueref command):

kd> !trueref ffffad8d14503080
ffffad8d14503080: HandleCount: 1 PointerCount: 32768 RealPointerCount: 3

We have a reference count of 3, and since we have one handle, it means there are two references somewhere to this job object.

Now let’s see what happens when we close the job handle we’re holding:

::CloseHandle(hJob);

Reopening the Notepad’s process properties in Process Explorer shows this:

Running the !object command again on the job yields the following:

lkd> !object ffffad8d14503080
Object: ffffad8d14503080  Type: (ffffad8cad8b7900) Job
    ObjectHeader: ffffad8d14503050 (new version)
    HandleCount: 0  PointerCount: 1
    Directory Object: 00000000  Name: MyTestJob

The handle count dropped to zero because we closed our (only) existing handle to the job. The job object’s name seems to be intact at first glance, but not really: the directory object is NULL, which means the object’s name is no longer visible in the object manager’s namespace.

Is the job object alive? Clearly, yes, as the pointer (reference) count is 1. When the handle count is zero, the PointerCount value is the correct reference count, and there is no need to run the !trueref command. At this point, you should be able to guess why the object is still alive, and where that one reference is coming from.

If you guessed “the Notepad process”, then you are right. When a process is added to a job, it adds a reference to the job object so that it remains alive if at least one process is part of the job.

We, however, have lost the only handle we have to the job object. Can we get it back knowing the object’s name?

hJob = ::OpenJobObject(JOB_OBJECT_QUERY, FALSE, L"MyTestJob");

This call fails, and GetLastError returns 2 (“the system cannot find the file specified”, which in this case is the job object’s name). This means that the object name is destroyed when the last handle of the object is closed, even if there are outstanding references on the object (the object is alive!).

The job object is just an example. The same rules apply to any named object.

Is there a way to “preserve” the object name even if all handles are closed? Yes, it’s possible if the object is created as “Permanent”. Unfortunately, this capability is not exposed by the Windows API functions like CreateJobObject, CreateEvent, and all other create functions that accept an object name.

Quick update: The native NtMakePermanentObject can make an object permanent given a handle, if the caller has the SeCreatePermanent privilege. This privilege is not granted to any user/group by default.

A permanent object can be created with kernel APIs, where the flag OBJ_PERMANENT is specified as one of the attribute flags part of the OBJECT_ATTRIBUTES structure that is passed to every object creation API in the kernel.

A “canonical” kernel example is the creation of a callback object. Callback objects are only usable in kernel mode. They provide a way for a driver/kernel to expose notifications in a uniform way, and allow interested parties (drivers/kernel) to register for notifications based on that callback object. Callback objects are created with a name so that they can be looked up easily by interested parties. In fact, there are quite a few callback objects on a typical Windows system, mostly in the Callback object manager namespace:

Most of the above callback objects’ usage is undocumented, except three which are documented in the WDK (ProcessorAdd, PowerState, and SetSystemTime). The following code creates a callback object, but its name disappears immediately, because the ExCreateCallback API returns an object pointer rather than a handle, so there is no open handle to keep the name alive:

PCALLBACK_OBJECT cb;
UNICODE_STRING name = RTL_CONSTANT_STRING(L"\\Callback\\MyCallback");
OBJECT_ATTRIBUTES cbAttr = RTL_CONSTANT_OBJECT_ATTRIBUTES(&name, 
    OBJ_CASE_INSENSITIVE);
NTSTATUS status = ExCreateCallback(&cb, &cbAttr, TRUE, TRUE);

The correct way to create a callback object is to add the OBJ_PERMANENT flag:

PCALLBACK_OBJECT cb;
UNICODE_STRING name = RTL_CONSTANT_STRING(L"\\Callback\\MyCallback");
OBJECT_ATTRIBUTES cbAttr = RTL_CONSTANT_OBJECT_ATTRIBUTES(&name, 
    OBJ_CASE_INSENSITIVE | OBJ_PERMANENT);
NTSTATUS status = ExCreateCallback(&cb, &cbAttr, TRUE, TRUE);

A permanent object must be made “temporary” (the opposite of permanent) by calling ObMakeTemporaryObject before it is actually dereferenced.
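To make the naming rules above more concrete, here is a toy user-mode model in C++. It is only a sketch: the Object and Namespace types and their members are hypothetical names invented for illustration, not kernel data structures, and the model tracks names only (object destruction is not modeled). The behavior of MakeTemporary when no handles are open is an assumption of the model.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Toy model only: hypothetical types, not the kernel's real structures.
struct Object {
    std::string name;
    int handleCount = 0;    // number of open handles
    int pointerCount = 0;   // all references (each handle counts as one)
    bool permanent = false; // models the OBJ_PERMANENT attribute
};

class Namespace {
public:
    // Creates a named object and "returns a handle" to it.
    std::shared_ptr<Object> Create(const std::string& name, bool permanent = false) {
        auto obj = std::make_shared<Object>();
        obj->name = name;
        obj->permanent = permanent;
        obj->handleCount = 1;
        obj->pointerCount = 1;
        m_names[name] = obj;
        return obj;
    }

    // Opens by name; fails (nullptr) once the name has left the namespace,
    // which models GetLastError returning 2.
    std::shared_ptr<Object> Open(const std::string& name) {
        auto it = m_names.find(name);
        if (it == m_names.end())
            return nullptr;
        it->second->handleCount++;
        it->second->pointerCount++;
        return it->second;
    }

    // Closes one handle. The name is removed with the last handle,
    // unless the object is permanent.
    void Close(const std::shared_ptr<Object>& obj) {
        assert(obj && obj->handleCount > 0);
        obj->handleCount--;
        obj->pointerCount--;
        if (obj->handleCount == 0 && !obj->permanent)
            m_names.erase(obj->name);
    }

    // Models ObMakeTemporaryObject: clears the permanent flag and
    // (assumption) drops the name right away if no handles are open.
    void MakeTemporary(const std::shared_ptr<Object>& obj) {
        assert(obj);
        obj->permanent = false;
        if (obj->handleCount == 0)
            m_names.erase(obj->name);
    }

private:
    std::map<std::string, std::shared_ptr<Object>> m_names;
};

// Models a non-handle reference, such as a process holding on to its job.
inline void Reference(Object& obj) { obj.pointerCount++; }
```

In this model, creating “MyTestJob”, adding one reference for the Notepad process, and closing the handle leaves pointerCount at 1 while Open returns nullptr. Repeating the sequence with permanent set to true keeps the name until MakeTemporary is called.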

Aside: Getting to an Object’s Name in WinDbg

For those who wonder how to locate an object’s name given its address, I hope that the following is clear enough… (watch the bold text).

lkd> !object ffffad8d190c0080
Object: ffffad8d190c0080  Type: (ffffad8cad8b7900) Job
    ObjectHeader: ffffad8d190c0050 (new version)
    HandleCount: 1  PointerCount: 32770
    Directory Object: ffffc283fb072730  Name: MyTestJob
lkd> dt nt!_OBJECT_HEADER ffffad8d190c0050
   +0x000 PointerCount     : 0n32770
   +0x008 HandleCount      : 0n1
   +0x008 NextToFree       : 0x00000000`00000001 Void
   +0x010 Lock             : _EX_PUSH_LOCK
   +0x018 TypeIndex        : 0xe9 ''
   +0x019 TraceFlags       : 0 ''
   +0x019 DbgRefTrace      : 0y0
   +0x019 DbgTracePermanent : 0y0
   +0x01a InfoMask         : 0xa ''
   +0x01b Flags            : 0 ''
   +0x01b NewObject        : 0y0
   +0x01b KernelObject     : 0y0
   +0x01b KernelOnlyAccess : 0y0
   +0x01b ExclusiveObject  : 0y0
   +0x01b PermanentObject  : 0y0
   +0x01b DefaultSecurityQuota : 0y0
   +0x01b SingleHandleEntry : 0y0
   +0x01b DeletedInline    : 0y0
   +0x01c Reserved         : 0
   +0x020 ObjectCreateInfo : 0xffffad8c`d8e40cc0 _OBJECT_CREATE_INFORMATION
   +0x020 QuotaBlockCharged : 0xffffad8c`d8e40cc0 Void
   +0x028 SecurityDescriptor : 0xffffc284`3dd85eae Void
   +0x030 Body             : _QUAD
lkd> db nt!ObpInfoMaskToOffset L10
fffff807`72625e20  00 20 20 40 10 30 30 50-20 40 40 60 30 50 50 70  .  @.00P @@`0PPp
lkd> dx (nt!_OBJECT_HEADER_NAME_INFO*)(0xffffad8d190c0050 - ((char*)0xfffff807`72625e20)[(((nt!_OBJECT_HEADER*)0xffffad8d190c0050)->InfoMask & 3)])
(nt!_OBJECT_HEADER_NAME_INFO*)(0xffffad8d190c0050 - ((char*)0xfffff807`72625e20)[(((nt!_OBJECT_HEADER*)0xffffad8d190c0050)->InfoMask & 3)])                 : 0xffffad8d190c0030 [Type: _OBJECT_HEADER_NAME_INFO *]
    [+0x000] Directory        : 0xffffc283fb072730 [Type: _OBJECT_DIRECTORY *]
    [+0x008] Name             : "MyTestJob" [Type: _UNICODE_STRING]
    [+0x018] ReferenceCount   : 0 [Type: long]
    [+0x01c] Reserved         : 0x0 [Type: unsigned long]
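The dx expression above boils down to a table lookup and a subtraction. Here is a small C++ sketch of the same arithmetic. The table bytes are copied from the db output above and may well differ on other Windows builds; NameInfoAddress is a hypothetical helper name, and treating bit 0x2 of InfoMask as the “name info present” bit is an assumption consistent with the & 3 mask used in the expression.

```cpp
#include <cassert>
#include <cstdint>

// Bytes copied from the `db nt!ObpInfoMaskToOffset L10` output above.
// They are specific to that system and may differ on other builds.
static const std::uint8_t kInfoMaskToOffset[16] = {
    0x00, 0x20, 0x20, 0x40, 0x10, 0x30, 0x30, 0x50,
    0x20, 0x40, 0x40, 0x60, 0x30, 0x50, 0x50, 0x70,
};

// Hypothetical helper: computes the address of the name information
// that precedes an object header, or returns 0 if the (assumed) name
// info bit 0x2 is not set in InfoMask.
inline std::uint64_t NameInfoAddress(std::uint64_t objectHeader, std::uint8_t infoMask) {
    if ((infoMask & 0x2) == 0)
        return 0;
    // Same lookup as the dx expression: index the table with InfoMask & 3.
    const std::uint8_t offset = kInfoMaskToOffset[infoMask & 3];
    assert(objectHeader >= offset);
    return objectHeader - offset;
}
```

For the job object above (header at 0xffffad8d190c0050, InfoMask 0xa), the index is 2, the offset is 0x20, and the result is 0xffffad8d190c0030, matching the debugger output.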

Upcoming Training Classes for June & July

I’m happy to announce 3 upcoming remote training classes to be held in June and July.

Windows System Programming

This is a 5-day class, split into 10 half-days. The syllabus can be found here.

All times are 11am to 3pm ET (8am to 12pm, PT) (4pm to 8pm, London time)

June: 7, 8, 12, 14, 15, 19, 21, 22, 26, 28

Cost: 950 USD if paid by an individual, 1900 USD if paid by a company.

COM Programming

This is a 3-day course, split into 6 half-days. The syllabus can be found here.

All times are 11am to 3pm ET (8am to 12pm, PT) (4pm to 8pm, London time)

July: 10, 11, 12, 17, 18, 19

Cost: 750 USD (if paid by an individual), 1500 USD if paid by a company.

x64 Architecture and Programming

This is a brand new, 3 day class, split into 6 half-days, that covers the x64 processor architecture, programming in general, and programming in the context of Windows. The syllabus is not finalized yet, but it will cover at least the following topics:

  • General architecture and brief history
  • Registers
  • Addressing modes
  • Stand-alone assembly programs
  • Mixing assembly with C/C++
  • MSVC compiler-generated assembly
  • Operating modes: real, protected, long (+paging)
  • Major instruction groups
  • Macros
  • Shellcode
  • BIOS and assembly

July: 24, 25, 26, 31, August: 1, 2

Cost: 750 USD (if paid by an individual), 1500 USD if paid by a company.

Registration

If you’d like to register, please send me an email to [email protected] and provide the name of the training class of interest, your full name, company (if any), preferred contact email, and your time zone. Previous participants in my classes get 10% off. If you register for more than one class, the second (and third) are 10% off as well.

The sessions will be recorded, so you can watch any part you may be missing, or that may be somewhat overwhelming in “real time”.

As usual, if you have any questions, feel free to send me an email, or DM on twitter (@zodiacon) or Linkedin (https://www.linkedin.com/in/pavely/).

The Quest for the Ultimate GUI Framework

I love Graphical User Interfaces, especially the good ones 🙂 Some people feel more comfortable with a terminal and command line arguments – I prefer a graphical representation, especially when visualization of information can be much more effective than text (even if colorful).

Most of the tools I write are GUI tools; I like colors and graphics – computers are capable of so much graphic and visualization power – why not see it in all its glory? GUIs are not a silver bullet by any means. Sometimes bad GUIs are encountered, which might send the user to the command terminal. I’m not going to discuss here what makes up a good GUI. This post is about technologies to create GUIs.

Disclaimer: much of the rest of this post is subjective – my experience with Windows GUIs. I’m also not discussing web UI – not really in the same scope. I’m interested in taking advantage of the machine, not being constrained or affected by some browser or HTML/CSS/JS engine. The discussion is not exhaustive, either; there is a limit to a post 🙂

In the old days, the Win32 User Interface reigned supreme. It was created in the days when memory was scarce, colors were few, hardware acceleration did not exist, and consistency was the name of the game. Modern GUIs were just starting to come up.

Windows supports all the standard controls (widgets) a typical GUI application would need. From buttons and menus, to list views and tree views, to edit controls, the standard set needed by typical applications was covered. The basis of the Win32 GUI model was (and still is) the mighty Handle to Window (HWND). This entity represented the surface on which the window (typically a control) would render its graphical representation and handle its interaction logic. This worked fairly well throughout the 1990s and early 2000s.

The model was not perfect, by any means. Customizing controls was difficult, and in some cases downright impossible. Built-in customization was minimal; any substantial customization required subclassing – essentially taking control of handling some window messages differently in the hope of not breaking integration with the default message processing. It was a lot of work at best, and imperfect or impossible at worst. Messages like WM_PAINT and WM_ERASEBKGND were commonly overridden, as were mouse and keyboard-related messages. In some cases, there was no good option for customization and a full-blown control had to be written from scratch.

Here is a simple example: say you want to change the background color of a button. This should in theory be simple – change some property and you’re done. Not so easy with the Win32 button – it had to be owner-drawn, or custom-drawn (NM_CUSTOMDRAW) in later versions of Windows. And that’s really a simple example.

Layout didn’t really exist. Controls were placed at an (x,y) coordinate measured from the top-left corner of the parent window – in pixels, mind you – with a specified width and height. There were no “panels” to handle more complex layout, in a grid for example, horizontally, or vertically, etc.

From a programmatic perspective, working directly with the Windows GUI API was no picnic either. Microsoft realized this, and developed The Microsoft Foundation Classes (MFC) library in the early 1990s to make working with Win32 GUI somewhat easier, by wrapping some of the functionality in C++ classes, and adding some nice features like docking windows. MFC was very popular at the time, as it was easier to use when getting started with building GUIs. It didn’t solve anything fundamental, as it was just using the Win32 GUI API under the covers. Several third-party libraries were written on top of MFC to provide even more functionality out of the box. MFC can still be used today, with Visual Studio still providing wizards and other helpers for MFC developers.

MFC wasn’t perfect of course. Beyond the obvious usage of the Win32 UI controls, it was fairly bloated, dragging with it a large DLL or adding a big static chunk if linked statically. Another library came out, the Windows Template Library (WTL), that provided a thin layer around the Windows GUI API, based on template classes, meaning that there was no “runtime” in the same sense as MFC – no library to link with – just whatever is compiled directly.

Personally, I like WTL a lot. In fact, my tools in recent years use WTL exclusively. It’s much more flexible than MFC, and doesn’t impose a particular way of working as MFC strongly did. The downside is that WTL wasn’t an official Microsoft library, mostly developed by good people inside the company in their spare time. Visual Studio has no special support for WTL. That said, WTL is still being maintained, and had some incremental features added throughout the years.

At the same time as MFC and WTL were used by C++ developers, another mighty tool entered the scene: Visual Basic. This environment was super successful primarily for two reasons:

  • The programming language was based on BASIC, which many people had at least acquaintance with, as it was the most common programming language for personal computers in the 1980s and early 1990s.
  • The “Visual” aspect of Visual Basic was new and compelling. Just drag controls from a toolbox onto a surface, change properties in the designer and/or at runtime, connect to events easily, and you’re good to go.

To this day, I sometimes encounter customers and applications still built with Visual Basic 6, even though its official support date is long gone.

The .NET Era

Around 2002, .NET and C# were introduced by Microsoft as a response to the Java language and ecosystem that came out in 1995. With .NET, the Windows Forms (WinForms) library was provided, which was very similar to the Visual Basic experience, but with the more modern and powerful .NET Framework. And with .NET 2 in 2005, where .NET really kicked in (generics and other important features released), Windows Forms was the go-to UI framework while Visual Basic’s popularity was somewhat waning.

However, WinForms was still based around the Win32 GUI model – HWNDs, no easy customization, etc. That said, Microsoft did a lot of work to make WinForms more customizable than pure Win32 or MFC by subclassing many of the existing controls and adding functionality available with simple properties. Support was added to customize menus with colors and icons, buttons with images and custom colors, and more. The drag-n-drop experience from Visual Basic was available as well, making it relatively easy to migrate from Visual Basic.

.NET 3 and WPF

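The true revolution came in 2006 when .NET 3 was released. .NET 3 had 3 new technologies that were greatly advertised:

  • Windows Presentation Foundation (WPF)
  • Windows Communication Foundation (WCF)
  • Windows Workflow Foundation (WF)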

WCF was hugely successful, and took over older technologies as it unified all types of communications, whether based on remoting, HTTP, sockets, or whatever. WF had only moderate success.

WPF was the new UI framework, and it was revolutionary. WPF ditched the Win32 UI model – a WPF “main” window still had an HWND – you can’t get away from that – but all the controls were drawn by WPF – the Win32 UI controls were not used. From Win32’s perspective there was just one HWND. Compare that to the Win32 UI model, where every control is an HWND – buttons, list boxes, list views, toolbars, etc.

With the HWND restrictions gone, WPF used DirectX for rendering purposes, compared to the aging Graphics Device Interface (GDI) API used by Win32 GUIs. Without the artificial boundaries of HWNDs, WPF could do anything – combine anything – 2D, 3D, animation, media, unlimited customization – without any issues, as the entire HWND surface belonged to WPF.

I remember when I was introduced to WPF (at that time code name “Avalon”) – I was blown away. It was a far cry from the old, predictable, non-customizable model of Win32 GUIs.

WPF wasn’t just about the graphics and visuals. It also provided powerful data binding, much more powerful than the limited model supported by WinForms. I would even go so far as to say it’s one of the most important of WPF’s features. WPF introduced XAML – an XML based language to declaratively build UIs, with object creation, properties, and even declarative data binding. Customizing controls could be done in several ways, including existing properties, control templates and data templates. WPF was raw power.

So, is WPF the ultimate GUI framework? It certainly looked like a prime candidate.

WPF made progress, ironing out issues, adding some features in .NET 3.5 and .NET 4. But then it seemed to have ground to a halt. WPF barely made some minor improvements in .NET 4.5. One can say that it was pretty complete, so perhaps nothing much to add?

One aspect of WPF not dealt with well was performance. WPF could be bogged down by many controls with complex data bindings – data bindings were mostly implemented with Reflection – a flexible but relatively slow .NET mechanism. There were certainly opportunities for improvement. Additionally, some controls were inherently slow, most notably the DataGrid, which was useful, but problematic as it was painfully slow. Third party libraries came to the rescue and provided improved Data Grids of their own (most not free).

WPF had a strong following, with community created controls, and other goodies. Microsoft, however, seemed to have lost interest in WPF, the reason perhaps being the “Metro” revolution of 2012.

“Metro” and Going Universal

Windows 8 was a major release for Microsoft where UI is concerned. The “Metro” minimal language was all the rage at the time. Touch devices started to appear and Microsoft did not want to lose the battle. I noticed that Microsoft tends to move from one extreme to another, finally settling somewhere in the middle – but that usually takes years. Windows 8 is a perfect example. Metro applications (as they were called at the time) were always full screen – even on desktops with big displays. A new framework was built, based around the Windows Runtime – a new library based on the old but trusty Component Object Model (COM), with metadata stored in the .NET metadata format.

The Windows Runtime UI model was built on similar principles as WPF – XAML (not the same one, mind you; that would be too easy), data binding, control templates, and other similar (but simplified) concepts from WPF. The Windows Runtime was internally built in C++, with “convenient” language projections provided out of the box for C++ (C++/CX at the time), .NET (C# and VB), and even JavaScript.

Generally, Windows 8 and the Universal applications (as they were later renamed) were pretty terrible. The “Metro design language”, with its monochromatic simplistic icons and graphics was ridiculous. Colors were gone. I felt like I was sliding back to the 1980s when colors were limited. This “Metro” style spread everywhere as far as Microsoft is concerned. For example, Visual Studio 2012 that was out at the time was monochromatic – all icons in black only! It was a nightmare. Microsoft’s explanation was “to focus the developer attention to the code, remove distractions”. In actuality, it failed miserably. I remember the control toolbox for WinForms and WPF in VS 2012 – all icons were gray – there was just no way to distinguish between them at a glance – which destroys the point of having icons in the first place. Microsoft boasted that their designers managed to make all these once colorful icons with a single color! What an achievement.

With Visual Studio 2013, they started to bring some colors back… the whole thing was so ridiculous.

The “Universal” model was created at least to address the problem of creating applications with the same code for Windows 8 and Windows Phone 8. To that end, it was successful, as the Win32 GUI was not implemented on Windows Phone, presumably because it was outdated, with lots and lots of code that is not well-suited for a small, much less powerful, form factor like the phone and other small devices.

Working with Universal applications (now called Universal Windows Platform applications) was similar to WPF to some extent, but the controls were geared towards touch devices, where fingers are mostly used. Controls were big, list views were scrolling smoothly but had very few lines of content. For desktop applications, it was a nightmare. Not to mention that Windows 7 (still very popular at the time) was not supported.

WPF was still the best option in the Microsoft space at the time, even though it stagnated. At least it worked on Windows 7, and its default control rendering was suited to desktop applications.

Windows 8.1 made some improvements in Universal apps – at least a minimize button was added! Windows 10 fixed the Universal fiasco by allowing windows to be resized normally like in the “old” days. There was a joke at the time saying that “Windows 10 returned windows to Windows. Before that it was Window – singular”.

That being said, Windows 10’s own UI was heavily influenced by Metro. The Settings app uses monochrome icons – how can anyone think this is better than colorful icons for easy recognition? This trend continues with Windows 11 where various classic windows are “converted” to the new “design language”. At least the Settings app uses somewhat colorful icons on Windows 11.

The Universal apps could only run with a single instance, something that has since changed, but is still employed. For example, the Settings app in Windows 10 and 11 is single instance. Why on earth should it be in an OS named “Windows”? Give me more than one Settings window at a time!

Current State of Affairs

WPF is not moving forward. With the introduction of .NET Core (later renamed to simply .NET), WPF was open sourced, and is available in .NET 5+. It’s not cross-platform, unlike most of the other .NET 5+ pieces.

UWP is a failure, even Microsoft admits that. It’s written in C++ (it’s based on the Windows Runtime after all), which should give it good performance not bogged down by .NET’s garbage collector and such. But its projections for C++ are awful, and in my opinion unusable. If you create a new UWP application with C++ in Visual Studio, you’ll get plenty of files, including IDL (Interface Definition Language), some generated files, and all that for a single button in a window. I tried writing something more complex, and gave up. It’s too slow and convoluted. The only real option is to use .NET – something I may not want to do with all its dependencies and overhead.

Regardless, the controls’ default look and feel is geared towards touch devices. I don’t care about the little animations – I want to be able to use a proper list view. For example, the new Windows 11 Task Manager that is built with the new WinUI technology (described next) uses the Win32 classic list view – because it’s fast and appropriate for this kind of tool. The rest is WinUI – the tabs are gone, there are monochromatic icons – it’s just ridiculous. WinUI adds nothing except a dark theme option.

Task manager in Windows 11

The WinUI technology is similar to UWP in concept and implementation. The current state of UI affairs is messy – there is WinUI, UWP, .NET MAUI (to replace Xamarin for mobile devices, but not only for those) – what are people supposed to use?

All these UI libraries don’t really cater for desktop apps. This is why I’m still using WTL (which is wrapping the Win32 classic GUI API). There is no good alternative from Microsoft.

But perhaps not all is lost – Avalonia is a fairly new library attempting to bring WPF style UI and capabilities to more than just Windows. It’s not a Microsoft library, though; it’s built by people in the community as open source – there is no telling if at some point it will stop being supported. On the other hand, WPF – a Microsoft library – stopped being supported.

Other Libraries

At this point you may be wondering why use a Microsoft library at all for desktop GUI – Microsoft has dropped the ball, as they continue to make a mess. Maybe use Blazor on the desktop? Out of scope for this post.

There are other options. Many GUI libraries that use C or C++ exist – wxWidgets, GTK, and Qt, to name a few. wxWidgets supports Windows fairly well. Installing GTK successfully is a nightmare. Qt is very powerful and takes control of drawing everything, similar to the WPF model. It has powerful tools for designing GUIs, with its own declarative language based on JavaScript. With Qt you also have to use its own classes for non-UI stuff, like strings and lists. It’s also pricey for closed source.

Another alternative which has a lot of promise (some of which is already delivered) is Dear ImGui. This library is different from most others, as it’s Immediate Mode GUI, rather than Retained Mode which most others are. It’s cross platform, very flexible and fast. Just look at some of the GUIs built with it – truly impressive.

I’ll probably migrate to using ImGui. Is it the ultimate GUI framework? Not yet, but I feel it’s the closest to attain that goal. A couple of years back I implemented a mini-Process Explorer like tool with ImGui. Its list view is flexible and rich, and the library in general gets better all the time. It has great support from the authors and the community. It’s not perfect yet, there are still rough edges, and in some cases you have to work harder because of its cross-platform nature.

I should also mention Uno Platform, another cross-platform UI framework built on top of .NET, that made great strides in recent years.

What’s Next?

Microsoft has dropped the ball on desktop apps. The Win32 classic model is not being maintained. Just try to create a “dark mode” UI. I did that to some extent for the Sysinternals tools at the time. It was hard. Some things I just couldn’t do right – the scrollbars that are attached to list views and tree views, for example.

Prior to common controls version 6 (Windows XP), Microsoft had a “flat scroll bars” feature that allowed customization of scrollbars fairly easily (colors, for example). But surprisingly, common controls version 6 dropped this feature! Flat scroll bars are no longer supported. I had to jump through hoops to implement dark scroll bars for Sysinternals – and even that was imperfect.

In my own tools, I created a theme engine as well – implemented differently – and I decided to forgo customizing scroll bars. Let them remain as is – it’s just too difficult and fragile.

I do hope Microsoft changes something in the way they look at desktop apps. This is where most Windows users are! Give us WPF in C++. Or enhance the Win32 model. The current UI mess is not helping, either.

I’m going to set aside some time to work on building some tools that use Dear ImGui – I feel it has the most bang for the buck.

Memory Information in Task Manager

You may have been asked this question many times: “How much memory does this process consume?” The question seems innocent enough. Your first instinct might be to open Task Manager, go to the Processes tab, find the process in the list, and look at the column marked “Memory“. What could be simpler?

A complication is hinted at when looking in the Details tab. The default memory-related column is named “Memory (Active Private Working Set)”, which seems more complex than simply “Memory”. Opening the list of columns from the Details tab shows more columns where the term “Memory” is used. What gives?

The Processes tab’s Memory column is the same as the Details tab’s Memory (active private working set) column. But what does it mean? Let’s break it down:

  • Working set – the memory is accessible by the processor with no page fault exception. Simply put, the memory is in RAM (physical memory).
  • Private – the memory is private to the process. This is in contrast to shared memory, which is (at least can be) shared with other processes. The canonical example of shared memory is PE images – DLLs and executables. A DLL that is mapped to multiple processes will (in most cases) have a single presence in physical memory.
  • Active – this is an artificial term used by Task Manager related to UWP (Universal Windows Platform) processes. If a UWP process’ window is minimized, this column shows zero memory consumption, because in theory, since all the process’ threads are suspended, that memory can be repurposed for other processes to use. You can try it by running Calculator, and minimizing its window. You’ll see this column showing zero. Restore the window, and it will show some non-zero value. In fact, there is a column named Memory (private working set), which shows the same thing but does not take into consideration the “active” aspect of UWP processes.

So what does all this mean? The fact that this column shows only private memory is a good thing. That’s because the size of shared memory is (in most cases) fixed and out of our control – for example, the size of a DLL – the process just needs to use the DLL. The downside of this active private working set column is the fact that it only shows memory that is currently part of the process working set – in RAM. A process may allocate a large chunk of memory, most of which may not be in RAM right now, but it is still consumed, and counts towards the commit limit of the system.

Here is a simple example. I’m writing the following code to allocate (commit) 64 GB of memory:

auto ptr = VirtualAlloc(nullptr, 64LL << 30, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);

Here is what Task Manager shows in its Performance/Memory tab before the call:

“In Use” indicates current RAM (physical memory) usage – it’s 34.6 GB. The “Committed” part is more important – it indicates how much memory is committed out of the total that can be committed on the system, regardless of whether it’s in physical memory now or not. It shows “44/128 GB” – 44 GB are committed now (34.6 of that in RAM), and my commit limit is 128 GB (it’s the sum of my total RAM and the configured page file sizes). Here is the same view after I commit the above 64 GB:

Notice the physical memory didn’t change much, but the committed memory “jumped” by 64 GB, meaning there is now only 20 GB left for other processes to use before the system runs out of memory (or page file expansion occurs). Looking at the Details tab for this Test process shows the active private working set column indicating a very low memory consumption, because it’s looking at private RAM usage only:

Only when the process starts “touching” (using) the committed memory will physical pages start being used by the process. The name “committed” indicates the commitment of the system to providing that entire memory block if required, no matter what.
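The commit arithmetic above is easy to check by hand. Here is a small sketch that just replays the numbers from my machine (44 GB committed, 128 GB limit, one 64 GB commit); the names CommitState, AfterCommit and RemainingCommitGB are made up for the illustration and are not part of any Windows API:

```cpp
#include <cstdint>

// Hypothetical helper types for illustration only.
struct CommitState {
    std::uint64_t committedGB;  // the left number in Task Manager's "Committed" field
    std::uint64_t limitGB;      // the right number: RAM + configured page files
};

// Committing memory raises the committed amount; the limit does not move
// (ignoring page file expansion).
constexpr CommitState AfterCommit(CommitState s, std::uint64_t sizeGB) {
    return { s.committedGB + sizeGB, s.limitGB };
}

// What is left for everyone else before the commit limit is reached.
constexpr std::uint64_t RemainingCommitGB(CommitState s) {
    return s.limitGB > s.committedGB ? s.limitGB - s.committedGB : 0;
}

// 64LL << 30 in the VirtualAlloc call is 64 * 2^30 bytes, i.e. 64 GB.
static_assert((64LL << 30) >> 30 == 64, "64 GB expressed in bytes");
```

Starting from { 44, 128 } and committing 64 GB gives 108 GB committed and 20 GB remaining, which matches what Task Manager shows, even though almost none of it is in RAM yet.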

Where is that 64 GB shown? The column to use in Task Manager is called Commit Size, which is in fact private committed memory:

Commit Size is the correct column to look at when trying to ascertain memory consumption in processes. The sad thing is that it’s not the default column shown, and that’s why many people use the misleading active private working set column. My guess is the reason the misleading column is shown by default is because physical memory is easy to understand for most people, whereas virtual memory (some of which is in RAM and some of which is not) is not trivially understood.

Comparing Commit Size to active private working set sometimes reveals a big difference – an indication that most of the private memory of a process is not in RAM right now, but the memory is still consumed as far as the memory manager is concerned.

A related confusion exists because of different terminology used by different tools. Specifically, Commit Size in Task Manager is called Private Bytes in Process Explorer and Performance Monitor.
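If you prefer code over tools, the same counters can be read programmatically. The following is a minimal sketch using GetProcessMemoryInfo with PROCESS_MEMORY_COUNTERS_EX: its PrivateUsage member is the private committed memory (the Commit Size / Private Bytes value), while WorkingSetSize is the total working set, private and shared. The function names BytesToMB and PrintMemoryCounters are my own for this example, and depending on your SDK settings you may need to link with psapi.lib.

```cpp
#include <cstdint>
#include <cstdio>

// Pure helper: convert a byte count to whole megabytes (rounded down).
constexpr std::uint64_t BytesToMB(std::uint64_t bytes) {
    return bytes >> 20;
}

#ifdef _WIN32
#include <Windows.h>
#include <Psapi.h>

// Prints the working set and the private committed memory of a process.
// The handle needs at least query access to the process.
// Returns false if the query fails.
bool PrintMemoryCounters(HANDLE hProcess) {
    PROCESS_MEMORY_COUNTERS_EX pmc{};
    pmc.cb = sizeof(pmc);
    if (!::GetProcessMemoryInfo(hProcess,
        reinterpret_cast<PROCESS_MEMORY_COUNTERS*>(&pmc), sizeof(pmc)))
        return false;

    // WorkingSetSize: memory currently in RAM (private + shared pages).
    printf("Working set:   %llu MB\n",
        (unsigned long long)BytesToMB(pmc.WorkingSetSize));
    // PrivateUsage: private committed memory, in RAM or not.
    printf("Private bytes: %llu MB\n",
        (unsigned long long)BytesToMB(pmc.PrivateUsage));
    return true;
}
#endif
```

Calling PrintMemoryCounters(::GetCurrentProcess()) right after the 64 GB VirtualAlloc from earlier should show a small working set next to a private bytes value of 64 GB and change.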

Task Manager’s other memory columns allow you to look at more memory counters such as Working Set (total RAM used by a process, including private and shared memory), Peak Working Set, Memory (shared working set), and Working Set Delta.

There are other subtleties I am not expanding on in this post. Hopefully, I’ll touch on these in a future post.

Bottom line: Commit Size is the way to go.

Minimal Executables

Here is a simple experiment to try: open Visual Studio and create a C++ console application. All that app does is display “hello world” to the console:

#include <stdio.h>

int main() {
	printf("Hello, world!\n");
	return 0;
}

Build the executable in Release build and check its size. I get 11KB (x64). Not too bad, perhaps. However, if we check the dependencies of this executable (using the dumpbin command line tool or any PE Viewer), we’ll find the following in the Import directory:

There are two dependencies: Kernel32.dll and VCRuntime140.dll. This means these DLLs will load at process start time no matter what. If any of these DLLs is not found, the process will fail to start. We can’t get rid of Kernel32 easily, but we may be able to link statically to the CRT. Here is the required change to VS project properties:

After building, the resulting executable jumps to 136KB in size! Remember, it’s a “hello, world” application. The Imports directory in a PE viewer now shows Kernel32.dll as the only dependency.

Is that the best we can do? Why do we need the CRT in the first place? One obvious reason is the usage of the printf function, which is implemented by the CRT. Maybe we can use something else without depending on the CRT. There are other reasons the CRT is needed. Here are a few:

  • The CRT is the one calling our main function with the correct argc and argv. This is expected behavior by developers.
  • Any C++ global objects that have constructors are executed by the CRT before the main function is invoked.
  • Other expected behaviors are provided by the CRT, such as correct handling of the errno (global) variable, which is not really global, but uses Thread-Local-Storage behind the scenes to make it per-thread.
  • The CRT implements the new and delete C++ operators, without which much of the C++ standard library wouldn’t work without major customization.
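The global constructors point is easy to overlook, so here is a tiny sketch of what is at stake. The names g_constructed, Logger and GlobalsInitialized are invented for the example. In a normal build the flag is already true by the time main runs, because the CRT startup code ran the constructor; with a custom entry point that bypasses the CRT, nobody runs that constructor on your behalf.

```cpp
// Constant-initialized: its value is baked into the image, no startup code needed.
bool g_constructed = false;

struct Logger {
    // Dynamic initialization: this only happens if some startup code
    // (normally the CRT) walks the list of global initializers.
    Logger() { g_constructed = true; }
};

// A global object with a non-trivial constructor.
Logger g_logger;

// Reports whether the global constructor has run.
bool GlobalsInitialized() {
    return g_constructed;
}
```

If you keep globals like g_logger in a CRT-free executable, you would have to arrange for their initialization yourself, or simply avoid global objects with constructors.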

Still, we may be OK doing things outside the CRT, taking care of ourselves. Let’s see if we can pull it off. Let’s tell the linker that we’re not interested in the CRT:

Setting “Ignore All Default Libraries” tells the linker we’re not interested in linking with the CRT in any way. Building the app now gives some linker errors:

1>Test2.obj : error LNK2001: unresolved external symbol __security_check_cookie
1>Test2.obj : error LNK2001: unresolved external symbol __imp___acrt_iob_func
1>Test2.obj : error LNK2001: unresolved external symbol __imp___stdio_common_vfprintf
1>LINK : error LNK2001: unresolved external symbol mainCRTStartup
1>D:\Dev\Minimal\x64\Release\Test2.exe : fatal error LNK1120: 4 unresolved externals

One thing we expected is the missing printf implementation. What about the other errors? We have the missing “security cookie” implementation, which is a feature of the CRT to try to detect stack overrun by placing a “cookie” – some number – before making certain function calls and making sure that cookie is still there after returning. We’ll have to do without this feature. The main missing piece is mainCRTStartup, which is the default entry point that the linker is expecting. We can change the entry point name, or rename main to have that name.
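To make the cookie idea concrete, here is a toy model of it. This is not how the compiler implements the real check (the real cookie lives on the stack near the return address and is verified by __security_check_cookie); it only shows the principle. Frame, kCookie and CopyAndCheckCookie are names I made up for the sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// An arbitrary "cookie" value for the demo.
constexpr std::uint32_t kCookie = 0xC0FFEE42;

// Toy stand-in for a stack frame: a small buffer followed by the cookie.
struct Frame {
    char buffer[8];
    std::uint32_t cookie;
};

// Copies len bytes into the frame, starting at the buffer, then checks
// whether the cookie survived. Returns true if the cookie is intact.
bool CopyAndCheckCookie(const void* data, std::size_t len) {
    Frame frame{};
    frame.cookie = kCookie;
    if (len > sizeof(frame))
        len = sizeof(frame);            // keep the demo itself well-defined
    std::memcpy(&frame, data, len);     // more than 8 bytes spills into the cookie
    return frame.cookie == kCookie;
}
```

A 6-byte copy leaves the cookie alone, while a 12-byte copy overwrites it, which is exactly the situation the check is designed to catch before the function returns.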

First, let’s try to fix the linker errors before reimplementing the printf functionality. We’ll remove the printf call and rebuild. Things are improving:

1>Test2.obj : error LNK2001: unresolved external symbol __security_check_cookie
1>LINK : error LNK2001: unresolved external symbol mainCRTStartup
1>D:\Dev\Minimal\x64\Release\Test2.exe : fatal error LNK1120: 2 unresolved externals

The “security cookie” feature can be removed with another compiler option:

When rebuilding, we get a warning about the “/sdl” (Security Development Lifecycle) option conflicting with removing the security cookie; we can turn that option off as well. Regardless, the final linker error remains – mainCRTStartup.

We can rename main to mainCRTStartup and “implement” printf by going straight to the console API (part of Kernel32.Dll):

#include <Windows.h>

int mainCRTStartup() {
	char text[] = "Hello, World!\n";
	::WriteConsoleA(::GetStdHandle(STD_OUTPUT_HANDLE),
		text, (DWORD)strlen(text), nullptr, nullptr);

	return 0;
}

This compiles and links OK, and we get the expected output. The file size is only 4KB! An improvement even over the initial project. The dependencies are still just Kernel32.DLL, with only two functions used:

You may be thinking that although we replaced printf, that wasn’t the full power of printf – it supports various format specifiers, etc., which are going to be difficult to reimplement. Is this just a futile exercise?

Not necessarily. Remember that every user mode process always links with NTDLL.dll, which means the API in NtDll is always available. As it turns out, a lot of functionality that is implemented by the CRT is also implemented in NTDLL. printf is not there, but the next best thing is – sprintf and the other similar formatting functions. They would fill a buffer with the result, and then we could call WriteConsole to spit it to the console. Problem solved!

Removing the CRT

Well, almost. Let’s add a definition for sprintf_s (we’ll be nice and go with the “safe” version), and then use it:

#include <Windows.h>

extern "C" int __cdecl sprintf_s(
	char* buffer,
	size_t sizeOfBuffer,
	const char* format,	...);

int mainCRTStartup() {
	char text[64];
	sprintf_s(text, _countof(text), "Hello, world from process %u\n", ::GetCurrentProcessId());
	::WriteConsoleA(::GetStdHandle(STD_OUTPUT_HANDLE),
		text, (DWORD)strlen(text), nullptr, nullptr);

	return 0;
}

Unfortunately, this does not link: sprintf_s is an unresolved external, just like strlen. It makes sense, since the linker does not know where to look for it. Let’s help out by adding the import library for NtDll:

#pragma comment(lib, "ntdll")

This should work, but one error persists – sprintf_s; strlen however, is resolved. The reason is that the import library for NtDll provided by Microsoft does not have an import entry for sprintf_s and other CRT-like functions. Why? No good reason I can think of. What can we do? One option is to create an NtDll.lib import library of our own and use it. In fact, some people have already done that. One such file can be found as part of my NativeApps repository (it’s called NtDll64.lib, as the name does not really matter). The other option is to link dynamically. Let’s do that:

int __cdecl sprintf_s_f(
	char* buffer, size_t sizeOfBuffer, const char* format, ...);

int mainCRTStartup() {
	auto sprintf_s = (decltype(sprintf_s_f)*)::GetProcAddress(
        ::GetModuleHandle(L"ntdll"), "sprintf_s");
	if (sprintf_s) {
		char text[64];
		sprintf_s(text, _countof(text), "Hello, world from process %u\n", ::GetCurrentProcessId());
		::WriteConsoleA(::GetStdHandle(STD_OUTPUT_HANDLE),
			text, (DWORD)strlen(text), nullptr, nullptr);
	}

	return 0;
}

Now it works and runs as expected.

You may be wondering why NTDLL implements the CRT-like functions in the first place. The CRT exists, after all, and can be normally used. “Normally” is the operative word here. Native applications, those that can only depend on NTDLL, cannot use the CRT. And this is why these functions are implemented as part of NTDLL – to make it easier to build native applications. Normally, native applications are built by Microsoft only. Examples include Smss.exe (the session manager), Csrss.exe (the Windows subsystem process), and UserInit.exe (normally executed by WinLogon.exe on a successful login).

One thing that may be missing in our “main” function are command line arguments. Can we just add the classic argc and argv and go about our business? Let’s try:

int mainCRTStartup(int argc, const char* argv[]) {
//...
char text[64];
sprintf_s(text, _countof(text), 
    "argc: %d argv[0]: 0x%p\n", argc, argv[0]);
::WriteConsoleA(::GetStdHandle(STD_OUTPUT_HANDLE),
	text, (DWORD)strlen(text), nullptr, nullptr);

Seems simple enough. argv[0] should be the address of the executable path itself. The code carefully displays the address only, not trying to dereference it as a string. The result, however, is perplexing:

argc: -359940096 argv[0]: 0x74894808245C8948

This seems completely wrong. The reason we see these weird values (if you try it, you’ll get different values; in fact, you may get different values in every run!) is that the parameters passed to the true entry point of an executable are not argc and argv – those are part of the CRT magic, and we don’t have a CRT anymore. There is in fact just one argument, and it’s the Process Environment Block (PEB). We can add some code to show some of what is in there (non-relevant code omitted):

#include <Windows.h>
#include <winternl.h>
//...
int mainCRTStartup(PPEB peb) {
	char text[256];
	sprintf_s(text, _countof(text), "PEB: 0x%p\n", peb);
	::WriteConsoleA(::GetStdHandle(STD_OUTPUT_HANDLE),
		text, (DWORD)strlen(text), nullptr, nullptr);

	// %wZ expects a pointer to a UNICODE_STRING
	sprintf_s(text, _countof(text), "Executable: %wZ\n",
        &peb->ProcessParameters->ImagePathName);
	::WriteConsoleA(::GetStdHandle(STD_OUTPUT_HANDLE),
		text, (DWORD)strlen(text), nullptr, nullptr);

	sprintf_s(text, _countof(text), "Commandline: %wZ\n",
        &peb->ProcessParameters->CommandLine);
	::WriteConsoleA(::GetStdHandle(STD_OUTPUT_HANDLE),
		text, (DWORD)strlen(text), nullptr, nullptr);

<Winternl.h> contains some NTDLL definitions, such as a partially defined PEB. In it, there is a ProcessParameters member that holds the image path and the full command line. Here is the result on my console:

PEB: 0x000000EAC01DB000
Executable: D:\Dev\Minimal\x64\Release\Test3.exe
Commandline: "D:\Dev\Minimal\x64\Release\Test3.exe"

The PEB is the argument provided by the OS to the entry point, whatever its name is. This is exactly what native applications get as well. By the way, we could have used GetCommandLine from Kernel32.dll to get the command line if we didn’t add the PEB argument. But for native applications (that can only depend on NTDLL), GetCommandLine is not an option.
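Since all we get is one command line string, anything resembling argc and argv has to be built by hand. Here is a simplified sketch of such a splitter that uses no CRT functions. It assumes the command line has first been copied into a writable, null-terminated wide-character buffer, and it only handles whitespace and plain double quotes, not the full quoting and backslash rules the CRT applies. SplitCommandLine is a name I picked for the example.

```cpp
// Splits a command line in place: separators are overwritten with L'\0'
// and argv[] receives pointers into the original buffer.
// Returns the number of arguments stored (at most maxArgs).
int SplitCommandLine(wchar_t* cmd, wchar_t* argv[], int maxArgs) {
    int argc = 0;
    wchar_t* p = cmd;
    while (*p && argc < maxArgs) {
        // Skip whitespace between arguments.
        while (*p == L' ' || *p == L'\t')
            ++p;
        if (!*p)
            break;

        // A leading double quote means the argument runs to the next quote.
        const bool quoted = (*p == L'"');
        if (quoted)
            ++p;
        argv[argc++] = p;

        // Advance to the end of the argument.
        while (*p && (quoted ? *p != L'"' : (*p != L' ' && *p != L'\t')))
            ++p;

        // Terminate the argument, unless we are already at the end of the string.
        if (*p)
            *p++ = L'\0';
    }
    return argc;
}
```

For a buffer holding "D:\Dev\Test3.exe" one two (with the path in quotes), this yields three arguments, with the quotes stripped from the first one.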

Going Native

How far are we from a true native application? What would be the motivation for such an application anyway, besides small file size and reduced dependencies? Let’s start with the first question.

To make our executable truly native, we have to do two things. The first is to change the subsystem of the executable (stored in the PE header) to Native. VS provides this option via a linker setting:

The second thing is to remove the dependency on Kernel32.Dll. No more WriteConsole and no GetCurrentProcessId. We will have to find some equivalent in NTDLL, or write our own implementation leveraging what NtDll has to offer. This is obviously not easy, given that most of NTDLL is undocumented, but most function prototypes are available as part of the Process Hacker/phnt project.

For the second question – why bother? Well, one reason is that native applications can be configured to run very early in Windows boot – these are in fact run by Smss.exe itself, when it’s the only existing user-mode process. Such applications (like autochk.exe, a native chkdsk.exe) must be native – they cannot depend on the CRT or even on kernel32.dll, since the Windows Subsystem Process (csrss.exe) has not been launched yet.

For more information on Native Applications, you can view my talk on the subject.

I may write a blog post on native applications to give more details. The examples shown here can be found here.

Happy minimization!

Levels of Kernel Debugging

Doing any kind of research into the Windows kernel requires working with a kernel debugger, mostly WinDbg (or WinDbg Preview). There are at least 3 “levels” of debugging the kernel.

Level 1: Local Kernel Debugging

The first is using a local kernel debugger, which means configuring WinDbg to look at the kernel of the local machine. This can be configured by running the following command in an elevated command window, and restarting the system:

bcdedit -debug on

You must disable Secure Boot (if enabled) for this command to work, as Secure Boot protects against putting the machine in local kernel debugging mode. Once the system is restarted, launch WinDbg elevated, select File/Kernel Debug and go with the “Local” option (WinDbg Preview shown):

If all goes well, you’ll see the “lkd>” prompt appearing, confirming you’re in local kernel debugging mode.

What can you do in this mode? You can look at anything in kernel and user space, such as listing the currently existing processes (!process 0 0), or examining any memory location in kernel or user space. You can even change kernel memory if you so desire, but be careful, any “bad” change may crash your system.

The downside of local kernel debugging is that the system is a moving target: things change while you’re typing commands, so you don’t want to look at things that change quickly. Additionally, you cannot set any breakpoints, and you cannot view any CPU registers, since these are changing constantly and are per-CPU anyway.

The upside of local kernel debugging is convenience – setting it up is very easy, and you can still get a lot of information with this mode.

Level 2: Remote Debugging of a Virtual Machine

The next level is a full kernel debugging experience of a virtual machine, which can be running locally on your host machine, or perhaps on another host somewhere. Setting this up is more involved. First, the target VM must be set up to allow kernel debugging and set the “interface” to the host debugger. Windows supports several interfaces, but for a VM the best to use is network (supported on Windows 8 and later).

First, go to the VM and ping the host to find out its IP address. Then type the following:

bcdedit /dbgsettings net hostip:172.17.32.1 port:55000 key:1.2.3.4

Replace the host IP with the correct address, and select an unused port on the host. The key can be left out, in which case the command will generate something for you. Since that key is needed on the host side, it’s easier to select something simple. If the target VM is not local, you might prefer to let the command generate a random key and use that.

Next, launch WinDbg elevated on the host, and attach to the kernel using the “Net” option, specifying the correct port and key:

Restart the target, and it should connect early in its boot process:

Microsoft (R) Windows Debugger Version 10.0.25200.1003 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.

Using NET for debugging
Opened WinSock 2.0
Waiting to reconnect...
Connected to target 172.29.184.23 on port 55000 on local IP 172.29.176.1.
You can get the target MAC address by running .kdtargetmac command.
Connected to Windows 10 25309 x64 target at (Tue Mar  7 11:38:18.626 2023 (UTC - 5:00)), ptr64 TRUE
Kernel Debugger connection established.  (Initial Breakpoint requested)

************* Path validation summary **************
Response                         Time (ms)     Location
Deferred                                       SRV*d:\Symbols*https://msdl.microsoft.com/download/symbols
Symbol search path is: SRV*d:\Symbols*https://msdl.microsoft.com/download/symbols
Executable search path is: 
Windows 10 Kernel Version 25309 MP (1 procs) Free x64
Edition build lab: 25309.1000.amd64fre.rs_prerelease.230224-1334
Machine Name:
Kernel base = 0xfffff801`38600000 PsLoadedModuleList = 0xfffff801`39413d70
System Uptime: 0 days 0:00:00.382
nt!DebugService2+0x5:
fffff801`38a18655 cc              int     3

Enter the g command to let the system continue. The prompt is “kd>” with the current CPU number on the left. You can break at any point into the target by clicking the “Break” toolbar button in the debugger. Then you can set up breakpoints, for whatever you’re researching. For example:

1: kd> bp nt!ntWriteFile
1: kd> g
Breakpoint 0 hit
nt!NtWriteFile:
fffff801`38dccf60 4c8bdc          mov     r11,rsp
2: kd> k
 # Child-SP          RetAddr               Call Site
00 fffffa03`baa17428 fffff801`38a81b05     nt!NtWriteFile
01 fffffa03`baa17430 00007ff9`1184f994     nt!KiSystemServiceCopyEnd+0x25
02 00000095`c2a7f668 00007ff9`0ec89268     0x00007ff9`1184f994
03 00000095`c2a7f670 0000024b`ffffffff     0x00007ff9`0ec89268
04 00000095`c2a7f678 00000095`c2a7f680     0x0000024b`ffffffff
05 00000095`c2a7f680 0000024b`00000001     0x00000095`c2a7f680
06 00000095`c2a7f688 00000000`000001a8     0x0000024b`00000001
07 00000095`c2a7f690 00000095`c2a7f738     0x1a8
08 00000095`c2a7f698 0000024b`af215dc0     0x00000095`c2a7f738
09 00000095`c2a7f6a0 0000024b`0000002c     0x0000024b`af215dc0
0a 00000095`c2a7f6a8 00000095`c2a7f700     0x0000024b`0000002c
0b 00000095`c2a7f6b0 00000000`00000000     0x00000095`c2a7f700
2: kd> .reload /user
Loading User Symbols
.....................
2: kd> k
 # Child-SP          RetAddr               Call Site
00 fffffa03`baa17428 fffff801`38a81b05     nt!NtWriteFile
01 fffffa03`baa17430 00007ff9`1184f994     nt!KiSystemServiceCopyEnd+0x25
02 00000095`c2a7f668 00007ff9`0ec89268     ntdll!NtWriteFile+0x14
03 00000095`c2a7f670 00007ff9`08458dda     KERNELBASE!WriteFile+0x108
04 00000095`c2a7f6e0 00007ff9`084591e6     icsvc!ICTransport::PerformIoOperation+0x13e
05 00000095`c2a7f7b0 00007ff9`08457848     icsvc!ICTransport::Write+0x26
06 00000095`c2a7f800 00007ff9`08452ea3     icsvc!ICEndpoint::MsgTransactRespond+0x1f8
07 00000095`c2a7f8b0 00007ff9`08452abc     icsvc!ICTimeSyncReferenceMsgHandler+0x3cb
08 00000095`c2a7faf0 00007ff9`084572cf     icsvc!ICTimeSyncMsgHandler+0x3c
09 00000095`c2a7fb20 00007ff9`08457044     icsvc!ICEndpoint::HandleMsg+0x11b
0a 00000095`c2a7fbb0 00007ff9`084574c1     icsvc!ICEndpoint::DispatchBuffer+0x174
0b 00000095`c2a7fc60 00007ff9`08457149     icsvc!ICEndpoint::MsgDispatch+0x91
0c 00000095`c2a7fcd0 00007ff9`0f0344eb     icsvc!ICEndpoint::DispatchThreadFunc+0x9
0d 00000095`c2a7fd00 00007ff9`0f54292d     ucrtbase!thread_start<unsigned int (__cdecl*)(void *),1>+0x3b
0e 00000095`c2a7fd30 00007ff9`117fef48     KERNEL32!BaseThreadInitThunk+0x1d
0f 00000095`c2a7fd60 00000000`00000000     ntdll!RtlUserThreadStart+0x28
2: kd> !process -1 0
PROCESS ffffc706a12df080
    SessionId: 0  Cid: 0828    Peb: 95c27a1000  ParentCid: 044c
    DirBase: 1c57f1000  ObjectTable: ffffa50dfb92c880  HandleCount: 123.
    Image: svchost.exe

In this “level” of debugging you have full control of the system. When in a breakpoint, nothing is moving. You can view register values, call stacks, etc., without anything changing “under your feet”. This seems perfect, so do we really need another level?

Some aspects of a typical kernel might not show up when debugging a VM. For example, looking at the list of interrupt service routines (ISRs) with the !idt command on my Hyper-V VM shows something like the following (truncated):

2: kd> !idt

Dumping IDT: ffffdd8179e5f000

00:	fffff80138a79800 nt!KiDivideErrorFault
01:	fffff80138a79b40 nt!KiDebugTrapOrFault	Stack = 0xFFFFDD8179E95000
02:	fffff80138a7a140 nt!KiNmiInterrupt	Stack = 0xFFFFDD8179E8D000
03:	fffff80138a7a6c0 nt!KiBreakpointTrap
...
2e:	fffff80138a80e40 nt!KiSystemService
2f:	fffff80138a75750 nt!KiDpcInterrupt
30:	fffff80138a733c0 nt!KiHvInterrupt
31:	fffff80138a73720 nt!KiVmbusInterrupt0
32:	fffff80138a73a80 nt!KiVmbusInterrupt1
33:	fffff80138a73de0 nt!KiVmbusInterrupt2
34:	fffff80138a74140 nt!KiVmbusInterrupt3
35:	fffff80138a71d88 nt!HalpInterruptCmciService (KINTERRUPT ffffc70697f23900)

36:	fffff80138a71d90 nt!HalpInterruptCmciService (KINTERRUPT ffffc70697f23a20)

b0:	fffff80138a72160 ACPI!ACPIInterruptServiceRoutine (KINTERRUPT ffffdd817a1ecdc0)
...

Some things are missing, such as the keyboard interrupt handler. This is due to certain things handled “internally” as the VM is “enlightened”, meaning it “knows” it’s a VM. Normally, it’s a good thing – you get nice support for copy/paste between the VM and the host, seamless mouse and keyboard interaction, etc. But it does mean it’s not the same as another physical machine.

Level 3: Remote debugging of a physical machine

In this final level, you’re debugging a physical machine, which provides the most “authentic” experience. Setting this up is the trickiest. A full description of how to set it up is available in the debugger documentation. In general, it’s similar to the previous case, but network debugging might not work for you, depending on the network card type your target and host machines have.

If network debugging is not supported because of the limited list of network cards supported, your best bet is USB debugging using a dedicated USB cable that you must purchase. The instructions to set up USB debugging are provided in the docs, but it may require some trial and error to locate the USB ports that support debugging (not all do). Once you have that set up, you’ll use the “USB” tab in the kernel attachment dialog on the host. Once connected, you can set breakpoints in ISRs that may not exist on a VM:

0: kd> !idt

Dumping IDT: fffff8022f5b1000

00:	fffff80233236100 nt!KiDivideErrorFault
...
80:	fffff8023322cd70 i8042prt!I8042KeyboardInterruptService (KINTERRUPT ffffd102109c0500)
...
Dumping Secondary IDT: ffffe5815fa0e000 

01b0:hidi2c!OnInterruptIsr (KMDF) (KINTERRUPT ffffd10212e6edc0)

0: kd> bp i8042prt!I8042KeyboardInterruptService
0: kd> g
Breakpoint 0 hit
i8042prt!I8042KeyboardInterruptService:
fffff802`6dd42100 4889542410      mov     qword ptr [rsp+10h],rdx
0: kd> k
 # Child-SP          RetAddr               Call Site
00 fffff802`2f5cdf48 fffff802`331453cb     i8042prt!I8042KeyboardInterruptService
01 fffff802`2f5cdf50 fffff802`3322b25f     nt!KiCallInterruptServiceRoutine+0x16b
02 fffff802`2f5cdf90 fffff802`3322b527     nt!KiInterruptSubDispatch+0x11f
03 fffff802`2f5be9f0 fffff802`3322e13a     nt!KiInterruptDispatch+0x37
04 fffff802`2f5beb80 00000000`00000000     nt!KiIdleLoop+0x5a

Happy debugging!

Windows Kernel Programming Class Recordings

I’ve recently posted about the upcoming training classes, the first of which is Advanced Windows Kernel Programming in April. Some people have asked me how they can participate if they have not taken the Windows Kernel Programming fundamentals class, and they might not have the required time to read the book.

Since I don’t plan on providing the fundamentals training class before April, after some thought, I decided to do the following.

I am selling one of the previous Windows Kernel Programming class recordings, along with the course PDF materials, the labs, and solutions to the labs. This is the first time I’m selling recordings of my public classes. If this “experiment” goes well, I might consider doing this with other classes as well. Having recordings is not the same as doing a live training class, but it’s the next best thing if the knowledge provided is valuable and useful. It’s about 32 hours of video, and plenty of labs to keep you busy 🙂

As an added bonus, I am also giving the following to those purchasing the training class:

  • You get a 10% discount for the Advanced Windows Kernel Programming class in April.
  • You will be added to a Discord server that will host all the alumni from my public classes (an idea I was given by some of my students, which will happen soon)
  • A live session with me sometime in early April (I’ll do a couple in different times of day so all time zones can find a comfortable session) where you can ask questions about the class, etc.

These are the modules covered in the class recordings:

  • Module 0: Introduction
  • Module 1: Windows Internals Overview
  • Module 2: The I/O System
  • Module 3: Device Driver Basics
  • Module 4: The I/O Request Packet
  • Module 5: Kernel Mechanisms
  • Module 6: Process and Thread Monitoring
  • Module 7: Object and Registry Notifications
  • Module 8: File System Mini-Filters Fundamentals
  • Module 9: Miscellaneous Techniques

If you’re interested in purchasing the class, send me an email to [email protected] with the title “Kernel Programming class recordings” and I will reply with payment details. Once paid, reply with the payment information, and I will share a link with the course. I’m working on splitting the recordings into meaningful chunks, so not all are ready yet, but these will be completed in the next day or so.

Here are the rules after a purchase:

  • No refunds – once you have access to the recordings, this is it.
  • No sharing – the content is for your own personal viewing. No sharing of any kind is allowed.
  • No reselling – I own the copyright and all rights.

The cost is 490 USD for the entire class. That’s the whole 32 hours.

If you’re part of a company (or simply have friends) that would like to purchase multiple “licenses”, contact me for a discount.

Upcoming Public Training Classes for April/May

Today I’m happy to announce two training classes to take place in April and May. These classes will be in 4-hour session chunks, so that it’s easier to consume even for uncomfortable time zones.

The first is Advanced Windows Kernel Programming, a class I was promising for quite some time now… it will be held on the following dates:

  • April: 18, 20, 24, 27 and May: 1, 4, 8, 11 (4 days total)
  • Times: 11am to 3pm ET (8am-12pm PT, 4pm to 8pm UT/GMT)

The course will include advanced topics in Windows kernel development, and is recommended for those that were in my Windows Kernel Programming class or have equivalent knowledge; for example, by reading my book Windows Kernel Programming.

Example topics include: deep dive into Windows’ kernel design, working with APCs, Windows Filtering Platform callout drivers, advanced memory management techniques, plug & play filter drivers, and more!

The second class is Windows Internals to be held on the following dates:

  • May: 2, 3, 9, 10, 15, 18, 22, 24, 30 and June: 1, 5 (5.5 days)
  • Times: 11am to 3pm ET (8am-12pm PT, 4pm to 8pm UT/GMT)

The syllabus can be found here (some modifications possible, but the general outline remains).

Cost
950 USD (if paid by an individual), 1900 USD (if paid by a company). The cost is the same for these training classes. Previous students in my classes get 10% off.
Multiple participants from the same company get a discount as well (contact me for the details).

If you’d like to register, please send me an email to [email protected] with the name of the training in the email title, provide your full name, company (if any), preferred contact email, and your time zone.

The sessions will be recorded, so you can watch any part you may be missing, or that may be somewhat overwhelming in “real time”.

As usual, if you have any questions, feel free to send me an email, or DM on Twitter (@zodiacon) or LinkedIn (https://www.linkedin.com/in/pavely/).


Introduction to the Windows Filtering Platform

As part of the second edition of Windows Kernel Programming, I’m working on chapter 13 to describe the basics of the Windows Filtering Platform (WFP). The chapter will focus mostly on kernel-mode WFP Callout drivers (it is a kernel programming book after all), but I am also providing a brief introduction to WFP and its user-mode API.

This introduction (with some simplifications) is what this post is about. Enjoy!

The Windows Filtering Platform (WFP) provides flexible ways to control network filtering. It exposes user-mode and kernel-mode APIs that interact with several layers of the networking stack. Some configuration and control is available directly from user-mode, without requiring any kernel-mode code (although it does require administrator-level access). WFP replaces older network filtering technologies, such as Transport Driver Interface (TDI) filters and some types of NDIS filters.

If examining network packets (and even modifying them) is required, a kernel-mode Callout driver can be written, which is what we’ll be concerned with in this chapter. We’ll begin with an overview of the main pieces of WFP, then look at some user-mode code examples for configuring filters, before diving into building simple Callout drivers that allow fine-grained control over network packets.

WFP is comprised of user-mode and kernel-mode components. A very high-level architecture is shown here:

In user-mode, the WFP manager is the Base Filtering Engine (BFE), which is a service implemented by bfe.dll and hosted in a standard svchost.exe instance. It implements the WFP user-mode API, essentially managing the platform, talking to its kernel counterpart when needed. We’ll examine some of these APIs in the next section.

User-mode applications, services and other components can utilize this user-mode management API to examine the state of WFP objects and make changes, such as adding or deleting filters. A classic example of such a “user” is the Windows Firewall, which is normally controllable by leveraging the Microsoft Management Console (MMC) that is provided for this purpose, but using these APIs from other applications is just as effective.

The kernel-mode filter engine exposes various logical layers, where filters (and callouts) can be attached. Layers represent locations in the network processing of one or more packets. The TCP/IP driver makes calls to the WFP kernel engine so that it can decide which filters (if any) should be “invoked”.

For filters, this means checking the conditions set by the filter against the current request. If the conditions are satisfied, the filter’s action is applied. Common actions include blocking a request from being further processed, allowing the request to continue without further processing in this layer, continuing to the next filter in this layer (if any), and invoking a callout driver. Callouts can perform any kind of processing, such as examining and even modifying packet data.
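To make the evaluation flow a bit more concrete, here is a minimal standalone sketch of it. This is a toy model and not WFP code: Request, Filter, Action and EvaluateLayer are hypothetical names, callouts and sublayers are ignored, the filters are assumed to be already sorted by priority, and the “permit if nothing matched” default is my assumption for the sake of the example.

```cpp
#include <functional>
#include <vector>

// Hypothetical model types; none of these are WFP definitions.
enum class Action { Block, Permit, Continue };

struct Request {
    unsigned remotePort;
};

struct Filter {
    std::function<bool(const Request&)> conditions; // true when all conditions are satisfied
    Action action;                                  // applied only if the conditions match
};

// Walk the filters of one layer in order and return the first terminating
// action. Continue moves on to the next filter in the layer.
Action EvaluateLayer(const std::vector<Filter>& filters, const Request& request) {
    for (const auto& f : filters) {
        if (!f.conditions(request))
            continue;                 // conditions not satisfied: filter is skipped
        if (f.action != Action::Continue)
            return f.action;          // Block or Permit ends processing in this layer
    }
    return Action::Permit;            // assumption: nothing matched, so allow
}
```

With a “continue” filter on port 80 followed by a “block” filter on ports below 1024, a request to port 80 ends up blocked, while a request to port 8080 matches nothing and is allowed.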
The relationship between layers, filters, and callouts is shown here:

As you can see in the diagram, each layer can have zero or more filters, and zero or more callouts. The number and meaning of the layers is fixed and provided out of the box by Windows. On most systems, there are about 100 layers. Many of the layers come in pairs, where one is for IPv4 and the other (identical in purpose) is for IPv6.

The WFP Explorer tool I created provides some insight into what makes up WFP. Running the tool and selecting View/Layers from the menu (or clicking the Layers tool bar button) shows a view of all existing layers.

You can download the WFP Explorer tool from its Github repository
(https://github.com/zodiacon/WFPExplorer) or the AllTools repository
(https://github.com/zodiacon/AllTools).

Each layer is uniquely identified by a GUID. Its Layer ID is used internally by the kernel engine as an identifier rather than the GUID, as it’s smaller and so is faster (layer IDs are 16-bit only). Most layers have fields that can be used by filters to set conditions for invoking their actions. Double-clicking a layer shows its properties. The next figure shows the general properties of an example layer. Notice it has 382 filters and 2 callouts attached to it.

Clicking the Fields tab shows the fields available in this layer, that can be used by filters to set conditions.

The meaning of the various layers, and the meaning of the fields for the layers are all documented in the official WFP documentation.

The currently existing filters can be viewed in WFP Explorer by selecting Filters from the View menu. Layers cannot be added or removed, but filters can. Management code (user or kernel) can add and/or remove filters dynamically while the system is running. You can see that on the system the tool is running on there are currently 2978 filters.

Each filter is uniquely identified by a GUID, and just like layers has a “shorter” id (64-bit) that is used by the kernel engine to more quickly compare filter IDs when needed. Since multiple filters can be assigned to the same layer, some kind of ordering must be used when assessing filters. This is where the filter’s weight comes into play. A weight is a 64-bit value that is used to sort filters by priority. As you can see in figure 13-7, there are two weight properties – weight and effective weight. Weight is what is specified when adding the filter, but effective weight is the actual one used. There are three possible values to set for weight:

  • A value between 0 and 15 is interpreted by WFP as a weight index, which simply means that the top 4 bits of the effective weight hold the specified value, and WFP generates the other 60 bits. For example, if the weight is set to 5, then the effective weight is going to be between 0x5000000000000000 and 0x5FFFFFFFFFFFFFFF.
  • An empty value tells WFP to generate an effective weight somewhere in the 64-bit range.
  • A value above 15 is taken as is to become the effective weight.

What is an “empty” value? The weight is not really a number, but a FWP_VALUE, a type that can hold all sorts of values, including no value at all (empty).
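The three weight rules can be captured in a few lines of standalone code. This is only a sketch of the rules as described above, not WFP code: Weight, WeightType, WeightRange and EffectiveWeightRange are hypothetical names, and the Empty tag stands in for an empty FWP_VALUE. For the empty case the sketch simply reports the whole 64-bit range, since WFP picks the actual value.

```cpp
#include <cstdint>

// Hypothetical model of the weight passed when adding a filter.
enum class WeightType { Empty, UInt64 };

struct Weight {
    WeightType type;
    uint64_t value;   // meaningful only when type == WeightType::UInt64
};

// Inclusive range in which the effective weight will fall.
struct WeightRange {
    uint64_t low;
    uint64_t high;
};

WeightRange EffectiveWeightRange(const Weight& weight) {
    if (weight.type == WeightType::Empty) {
        // empty: WFP generates a value somewhere in the 64-bit range
        return { 0, UINT64_MAX };
    }
    if (weight.value <= 15) {
        // weight index: top 4 bits are fixed, the lower 60 bits are generated
        uint64_t low = weight.value << 60;
        return { low, low | 0x0FFFFFFFFFFFFFFFULL };
    }
    // above 15: taken as is
    return { weight.value, weight.value };
}
```

Passing a weight of 5 gives the same range quoted in the first bullet.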

Double-clicking a filter in WFP Explorer shows its general properties:

The Conditions tab shows the conditions this filter is configured with. When all the conditions are met, the action of the filter is going to fire.

The list of fields used by a filter must be a subset of the fields exposed by the layer this filter is attached to. There are six conditions shown in figure 13-9 out of the possible 39 fields supported by this layer (“ALE Receive/Accept v4 Layer”). As you can see, there is a lot of flexibility in specifying conditions for fields – this is evident in the matching enumeration, FWP_MATCH_TYPE:

typedef enum FWP_MATCH_TYPE_ {
    FWP_MATCH_EQUAL    = 0,
    FWP_MATCH_GREATER,
    FWP_MATCH_LESS,
    FWP_MATCH_GREATER_OR_EQUAL,
    FWP_MATCH_LESS_OR_EQUAL,
    FWP_MATCH_RANGE,
    FWP_MATCH_FLAGS_ALL_SET,
    FWP_MATCH_FLAGS_ANY_SET,
    FWP_MATCH_FLAGS_NONE_SET,
    FWP_MATCH_EQUAL_CASE_INSENSITIVE,
    FWP_MATCH_NOT_EQUAL,
    FWP_MATCH_PREFIX,
    FWP_MATCH_NOT_PREFIX,
    FWP_MATCH_TYPE_MAX
} FWP_MATCH_TYPE;
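To give a feel for what some of these match types mean, here is a rough standalone sketch for a 32-bit integer field. It is a simplified model and not how the filter engine is implemented: MatchType and Matches are hypothetical names, the flag semantics are inferred from the enumerator names, and treating a range as inclusive on both ends is my assumption.

```cpp
#include <cstdint>

// Hypothetical, simplified mirror of some FWP_MATCH_TYPE values.
enum class MatchType {
    Equal, Greater, Less, GreaterOrEqual, LessOrEqual,
    Range, FlagsAllSet, FlagsAnySet, FlagsNoneSet, NotEqual
};

// 'field' is the value taken from the current request. 'low' is the condition
// value (or the mask for the flag match types); 'high' is used by Range only.
bool Matches(MatchType type, uint32_t field, uint32_t low, uint32_t high = 0) {
    switch (type) {
        case MatchType::Equal:          return field == low;
        case MatchType::Greater:        return field > low;
        case MatchType::Less:           return field < low;
        case MatchType::GreaterOrEqual: return field >= low;
        case MatchType::LessOrEqual:    return field <= low;
        case MatchType::Range:          return field >= low && field <= high;
        case MatchType::FlagsAllSet:    return (field & low) == low;
        case MatchType::FlagsAnySet:    return (field & low) != 0;
        case MatchType::FlagsNoneSet:   return (field & low) == 0;
        case MatchType::NotEqual:       return field != low;
    }
    return false;
}
```

For example, a condition on a port field with a range of 80 to 8080 would match a request to port 443, but not one to port 22.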

The WFP API exposes its functionality for user-mode and kernel-mode callers. The header files used are different, to cater for differences in API expectations between user-mode and kernel-mode, but the APIs in general are identical. For example, kernel APIs return NTSTATUS, whereas user-mode APIs return a simple DWORD, which is the kind of error value normally returned from GetLastError. Some APIs are provided for kernel-mode only, as they don’t make sense for user mode.

The user-mode WFP APIs never set the last error, and always return the error value directly. Zero (ERROR_SUCCESS) means success, while other (positive) values mean failure. Do not call GetLastError when using WFP – just look at the returned value.

WFP functions and structures use a versioning scheme, where function and structure names end with a digit, indicating version. For example, FWPM_LAYER0 is the first version of a structure describing a layer. At the time of writing, this was the only structure for describing a layer. As a counter example, there are several versions of the function beginning with FwpmNetEventEnum: FwpmNetEventEnum0 (for Vista+), FwpmNetEventEnum1 (Windows 7+), FwpmNetEventEnum2 (Windows 8+), FwpmNetEventEnum3 (Windows 10+), FwpmNetEventEnum4 (Windows 10 RS4+), and FwpmNetEventEnum5 (Windows 10 RS5+). This is an extreme example, but there are others with fewer “versions”. You can use any version that matches the target platform. To make it easier to work with these APIs and structures, a macro is defined with the base name that is expanded to the maximum supported version based on the target compilation platform. Here is part of the declarations behind the macro FwpmNetEventEnum:

DWORD FwpmNetEventEnum0(
   _In_ HANDLE engineHandle,
   _In_ HANDLE enumHandle,
   _In_ UINT32 numEntriesRequested,
   _Outptr_result_buffer_(*numEntriesReturned) FWPM_NET_EVENT0*** entries,
   _Out_ UINT32* numEntriesReturned);
#if (NTDDI_VERSION >= NTDDI_WIN7)
DWORD FwpmNetEventEnum1(
   _In_ HANDLE engineHandle,
   _In_ HANDLE enumHandle,
   _In_ UINT32 numEntriesRequested,
   _Outptr_result_buffer_(*numEntriesReturned) FWPM_NET_EVENT1*** entries,
   _Out_ UINT32* numEntriesReturned);
#endif // (NTDDI_VERSION >= NTDDI_WIN7)
#if (NTDDI_VERSION >= NTDDI_WIN8)
DWORD FwpmNetEventEnum2(
   _In_ HANDLE engineHandle,
   _In_ HANDLE enumHandle,
   _In_ UINT32 numEntriesRequested,
   _Outptr_result_buffer_(*numEntriesReturned) FWPM_NET_EVENT2*** entries,
   _Out_ UINT32* numEntriesReturned);
#endif // (NTDDI_VERSION >= NTDDI_WIN8)

You can see that the differences in the functions relate to the structures returned as part of these APIs (FWPM_NET_EVENTx). It’s recommended you use the macros, and only turn to specific versions if there is a compelling reason to do so.
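The macro selection itself is a plain preprocessor pattern. Here is a minimal standalone sketch of how such a base-name macro can be wired up. The DemoEventEnum functions and the DEMO_ constants are hypothetical stand-ins I made up for illustration; they are not the actual definitions from the WFP headers.

```cpp
// Hypothetical platform version constants (stand-ins for NTDDI_* values).
#define DEMO_VISTA 0x0600
#define DEMO_WIN7  0x0601
#define DEMO_WIN8  0x0602

#ifndef DEMO_TARGET_VERSION
#define DEMO_TARGET_VERSION DEMO_WIN7   // pretend the build targets Windows 7
#endif

// Each version is only declared if the target platform supports it.
inline int DemoEventEnum0() { return 0; }
#if (DEMO_TARGET_VERSION >= DEMO_WIN7)
inline int DemoEventEnum1() { return 1; }
#endif
#if (DEMO_TARGET_VERSION >= DEMO_WIN8)
inline int DemoEventEnum2() { return 2; }
#endif

// The unversioned name expands to the newest version available for the target.
#if (DEMO_TARGET_VERSION >= DEMO_WIN8)
#define DemoEventEnum DemoEventEnum2
#elif (DEMO_TARGET_VERSION >= DEMO_WIN7)
#define DemoEventEnum DemoEventEnum1
#else
#define DemoEventEnum DemoEventEnum0
#endif
```

With the target set to “Windows 7” as above, a call to DemoEventEnum() compiles into a call to DemoEventEnum1, and DemoEventEnum2 does not exist at all as far as the compiler is concerned.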

The WFP APIs adhere to strict naming conventions that make them easier to use. All management functions start with Fwpm (Filtering Windows Platform Management), and all management structures start with FWPM. The function names themselves use the pattern <prefix><object type><operation>, such as FwpmFilterAdd and FwpmLayerGetByKey.

It’s curious that the prefixes used for functions, structures, and enums start with FWP rather than the (perhaps) expected WFP. I couldn’t find a compelling reason for this.

WFP header files start with fwp and end with u for user-mode or k for kernel-mode. For example, fwpmu.h holds the management functions for user-mode callers, whereas fwpmk.h is the header for kernel callers. Two common files, fwptypes.h and fwpmtypes.h are used by both user-mode and kernel-mode headers. They are included by the “main” header files.

User-Mode Examples

Before making any calls to specific APIs, a handle to the WFP engine must be opened with FwpmEngineOpen:

DWORD FwpmEngineOpen0(
   _In_opt_ const wchar_t* serverName,  // must be NULL
   _In_ UINT32 authnService,            // RPC_C_AUTHN_DEFAULT
   _In_opt_ SEC_WINNT_AUTH_IDENTITY_W* authIdentity,
   _In_opt_ const FWPM_SESSION0* session,
   _Out_ HANDLE* engineHandle);

Most of the arguments have good defaults when NULL is specified. The returned handle must be used with subsequent APIs. Once it’s no longer needed, it must be closed:

DWORD FwpmEngineClose0(_Inout_ HANDLE engineHandle);

Enumerating Objects

What can we do with an engine handle? One thing provided with the management API is enumeration. These are the APIs used by WFP Explorer to enumerate layers, filters, sessions, and other object types in WFP. The following example displays some details for all the filters in the system (error handling omitted for brevity, the project wfpfilters has the full source code):

#include <Windows.h>
#include <fwpmu.h>
#include <stdio.h>
#include <string>

#pragma comment(lib, "Fwpuclnt")

std::wstring GuidToString(GUID const& guid) {
    WCHAR sguid[64];
    return ::StringFromGUID2(guid, sguid, _countof(sguid)) ? sguid : L"";
}

const char* ActionToString(FWPM_ACTION const& action) {
    switch (action.type) {
        case FWP_ACTION_BLOCK:               return "Block";
        case FWP_ACTION_PERMIT:              return "Permit";
        case FWP_ACTION_CALLOUT_TERMINATING: return "Callout Terminating";
        case FWP_ACTION_CALLOUT_INSPECTION:  return "Callout Inspection";
        case FWP_ACTION_CALLOUT_UNKNOWN:     return "Callout Unknown";
        case FWP_ACTION_CONTINUE:            return "Continue";
        case FWP_ACTION_NONE:                return "None";
        case FWP_ACTION_NONE_NO_MATCH:       return "None (No Match)";
    }
    return "";
}

int main() {
    //
    // open a handle to the WFP engine
    //
    HANDLE hEngine;
    FwpmEngineOpen(nullptr, RPC_C_AUTHN_DEFAULT, nullptr, nullptr, &hEngine);

    //
    // create an enumeration handle
    //
    HANDLE hEnum;
    FwpmFilterCreateEnumHandle(hEngine, nullptr, &hEnum);

    UINT32 count;
    FWPM_FILTER** filters;
    //
    // enumerate filters
    //
    FwpmFilterEnum(hEngine, hEnum, 
        8192,       // maximum entries, 
        &filters,   // returned result
        &count);    // how many actually returned

    for (UINT32 i = 0; i < count; i++) {
        auto f = filters[i];
        printf("%ws Name: %-40ws Id: 0x%016llX Conditions: %2u Action: %s\n",
            GuidToString(f->filterKey).c_str(),
            f->displayData.name,
            f->filterId,
            f->numFilterConditions,
            ActionToString(f->action));
    }
    //
    // free memory allocated by FwpmFilterEnum
    //
    FwpmFreeMemory((void**)&filters);

    //
    // close enumeration handle
    //
    FwpmFilterDestroyEnumHandle(hEngine, hEnum);

    //
    // close engine handle
    //
    FwpmEngineClose(hEngine);

    return 0;
}

The enumeration pattern repeats itself with all other WFP object types (layers, callouts, sessions, etc.).

Adding Filters

Let’s see if we can add a filter to perform some useful function. Suppose we want to prevent network access from some process. We can add a filter at an appropriate layer to make it happen. Adding a filter is a matter of calling FwpmFilterAdd:

DWORD FwpmFilterAdd0(
   _In_ HANDLE engineHandle,
   _In_ const FWPM_FILTER0* filter,
   _In_opt_ PSECURITY_DESCRIPTOR sd,
   _Out_opt_ UINT64* id);

The main work is to fill a FWPM_FILTER structure defined like so:

typedef struct FWPM_FILTER0_ {
    GUID filterKey;
    FWPM_DISPLAY_DATA0 displayData;
    UINT32 flags;
    /* [unique] */ GUID *providerKey;
    FWP_BYTE_BLOB providerData;
    GUID layerKey;
    GUID subLayerKey;
    FWP_VALUE0 weight;
    UINT32 numFilterConditions;
    /* [unique][size_is] */ FWPM_FILTER_CONDITION0 *filterCondition;
    FWPM_ACTION0 action;
    /* [switch_is] */ /* [switch_type] */ union 
        {
        /* [case()] */ UINT64 rawContext;
        /* [case()] */ GUID providerContextKey;
        }     ;
    /* [unique] */ GUID *reserved;
    UINT64 filterId;
    FWP_VALUE0 effectiveWeight;
} FWPM_FILTER0;

The weird-looking comments are generated by the Microsoft Interface Definition Language (MIDL) compiler when generating the header file from an IDL file. Although IDL is most commonly used by Component Object Model (COM) to define interfaces and types, WFP uses IDL to define its APIs, even though no COM interfaces are used; just plain C functions. The original IDL files are provided with the SDK, and they are worth checking out, since they may contain developer comments that are not “transferred” to the resulting header files.

Some members in FWPM_FILTER are necessary – layerKey to indicate the layer to attach this filter, any conditions needed to trigger the filter (numFilterConditions and the filterCondition array), and the action to take if the filter is triggered (action field).

Let’s create some code that prevents the Windows Calculator from accessing the network. You may be wondering why Calculator would require network access. No, it’s not contacting Google to ask for the result of 2+2. It’s using the Internet for accessing current exchange rates.

Clicking the Update Rates button causes Calculator to consult the Internet for the updated exchange rate. We’ll add a filter that prevents this.

We’ll start as usual by opening a handle to the WFP engine as was done in the previous example. Next, we need to fill the FWPM_FILTER structure. First, a nice display name:

FWPM_FILTER filter{};   // zero out the structure
WCHAR filterName[] = L"Prevent Calculator from accessing the web";
filter.displayData.name = filterName;

The name has no functional part – it just allows easy identification when enumerating filters. Now we need to select the layer. We’ll also specify the action:

filter.layerKey = FWPM_LAYER_ALE_AUTH_CONNECT_V4;
filter.action.type = FWP_ACTION_BLOCK;

There are several layers that could be used for blocking access, with the above layer being good enough to get the job done. Full description of the provided layers, their purpose and when they are used is provided as part of the WFP documentation.

The last part to initialize is the conditions to use. Without conditions, the filter is always going to be invoked, which will block all network access (or just for some processes, based on its effective weight). In our case, we only care about the application – we don’t care about ports or protocols. The layer we selected has several fields, one of which is called ALE App ID (ALE stands for Application Layer Enforcement).

This field can be used to identify an executable. To get that ID, we can use FwpmGetAppIdFromFileName. Here is the code for Calculator’s executable:

WCHAR filename[] = LR"(C:\Program Files\WindowsApps\Microsoft.WindowsCalculator_11.2210.0.0_x64__8wekyb3d8bbwe\CalculatorApp.exe)";
FWP_BYTE_BLOB* appId;
FwpmGetAppIdFromFileName(filename, &appId);

The code uses the path to the Calculator executable on my system – you should change that as needed because Calculator’s version might be different. A quick way to get the executable path is to run Calculator, open Process Explorer, open the resulting process properties, and copy the path from the Image tab.

The R"( and closing parenthesis in the above snippet form a raw string literal, which disables the “escaping” property of backslashes, making it easier to write file paths (a C++11 feature).

The BLOB returned by FwpmGetAppIdFromFileName (through its second parameter) needs to be freed eventually with FwpmFreeMemory.

Now we’re ready to specify the one and only condition:

FWPM_FILTER_CONDITION cond;
cond.fieldKey = FWPM_CONDITION_ALE_APP_ID;      // field
cond.matchType = FWP_MATCH_EQUAL;
cond.conditionValue.type = FWP_BYTE_BLOB_TYPE;
cond.conditionValue.byteBlob = appId;

filter.filterCondition = &cond;
filter.numFilterConditions = 1;

The conditionValue member of FWPM_FILTER_CONDITION is a FWP_VALUE, which is a generic way to specify many types of values. It has a type member that indicates the member in a big union that should be used. In our case, the type is a BLOB (FWP_BYTE_BLOB_TYPE) and the actual value should be passed in the byteBlob union member.
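If the “type tag plus union” idea is unfamiliar, here is a heavily simplified standalone model of it. Value, ValueType, ByteBlob and BlobSize are hypothetical names for illustration only; the real FWP_VALUE has many more members, so treat this as a sketch of the pattern rather than the actual definition.

```cpp
#include <cstdint>

// Hypothetical stand-in for a sized byte buffer.
struct ByteBlob {
    uint32_t size;
    const uint8_t* data;
};

// The tag: tells which union member (if any) is valid.
enum class ValueType { Empty, UInt8, UInt32, Blob };

struct Value {
    ValueType type;
    union {
        uint8_t uint8;
        uint32_t uint32;
        const ByteBlob* byteBlob;   // valid only when type == ValueType::Blob
    };
};

// Always check the tag before touching a union member.
uint32_t BlobSize(const Value& v) {
    if (v.type != ValueType::Blob || v.byteBlob == nullptr)
        return 0;
    return v.byteBlob->size;
}
```

The important habit is the one BlobSize shows: the type member is the only thing that says which union member is safe to read, so set it together with the value and check it before reading.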

The last step is to add the filter, and repeat the exercise for IPv6, as we don’t know how Calculator connects to the currency exchange server (we can find out, but it would be simpler and more robust to just block IPv6 as well):

FwpmFilterAdd(hEngine, &filter, nullptr, nullptr);

filter.layerKey = FWPM_LAYER_ALE_AUTH_CONNECT_V6;   // IPv6
FwpmFilterAdd(hEngine, &filter, nullptr, nullptr);

We didn’t specify any GUID for the filter, which causes WFP to generate one. We didn’t specify a weight either, so WFP generates that as well.

All that’s left now is some cleanup:

FwpmFreeMemory((void**)&appId);
FwpmEngineClose(hEngine);

After running this code (elevated), trying to refresh the currency exchange rate with Calculator should fail. Note that there is no need to restart Calculator – the effect is immediate.

We can locate the filters added with WFP Explorer:

Double-clicking one of the filters and selecting the Conditions tab shows the only condition where the App ID is revealed to be the full path of the executable in device form. Of course, you should not take any dependency on this format, as it may change in the future.

You can right-click the filters and delete them using WFP Explorer. The FwpmFilterDeleteByKey API is used behind the scenes. This will restore Calculator’s exchange rate update functionality.

Unnamed Directory Objects

A lot of the functionality in Windows is based around various kernel objects. One such object is a Directory, not to be confused with a directory in a file system. A Directory object is conceptually simple: it’s a container for other kernel objects, including other Directory objects, thus creating a hierarchy used by the kernel’s Object Manager to manage named objects. This arrangement can be easily seen with tools like WinObj from Sysinternals:

The left part of WinObj shows object manager directories, where named objects are “stored” and can be located by name. Clear and simple enough.

However, Directory objects can be unnamed as well as named. How can this be? Here is my Object Explorer tool (similar functionality is available with my System Explorer tool as well). One of its views is a “statistical” view of all object types, some of their properties, such as their name, type index, number of objects and handles, peak number of objects and handles, generic access mapping, and the pool type they’re allocated from.

If you right-click the Directory object type and select “All Objects”, you’ll see another view that shows all Directory objects in the system (well, not necessarily all, but most*).

If you scroll a bit, you’ll see many Directory objects that have no name:

It seems weird, as a Directory with no name doesn’t make sense. These directories, however, are “real” and serve an important purpose – managing a private object namespace. I blogged about private object namespaces quite a few years ago (it was in my old blog site that is now unfortunately lost), but here is the gist of it:

Object names are useful because they allow easy sharing between processes. For example, if two or more processes would like to share memory, they can create a memory mapped file object (called Section within the kernel) with a name they are all aware of. Calling CreateFileMapping (or one of its variants) with the same name will create the object (by the first caller), where subsequent callers get handles to the existing object because it was looked up by name.

This is easy and useful, but there is a possible catch: since the name is “visible” using tools or APIs, other processes can “interfere” with the object by getting their own handle using that visible name and “meddle” with the object, maliciously or accidentally.

The solution to this problem arrived in Windows Vista with the idea of private object namespaces. A set of cooperating processes can create a private namespace only they can use, protected by a “secret” name and more importantly a boundary descriptor. The details are beyond the scope of this post, but it’s all covered in the documentation for Windows API functions such as CreateBoundaryDescriptor, CreatePrivateNamespace and friends. Here is an example of using these APIs to create a private namespace with a section object in it (error handling omitted):

HANDLE hBD = ::CreateBoundaryDescriptor(L"MyDescriptor", 0);
BYTE sid[SECURITY_MAX_SID_SIZE];
auto psid = reinterpret_cast<PSID>(sid);
DWORD sidLen = sizeof(sid);   // in: buffer size, out: actual SID size
::CreateWellKnownSid(WinBuiltinUsersSid, nullptr, psid, &sidLen);
::AddSIDToBoundaryDescriptor(&hBD, psid);

// create the private namespace
HANDLE hNamespace = ::CreatePrivateNamespace(nullptr, hBD, L"MyNamespace");
if (!hNamespace) { // maybe created already?
	hNamespace = ::OpenPrivateNamespace(hBD, L"MyNamespace");
}

// the section name is prefixed with the private namespace name
HANDLE hSharedMem = ::CreateFileMapping(INVALID_HANDLE_VALUE, nullptr, PAGE_READWRITE, 0, 1 << 12, L"MyNamespace\\MySharedMem");

This snippet is taken from the PrivateSharing code example from the Windows 10 System Programming part 1 book.

If you run this demo application, and look at the resulting handle (hSharedMem) in the above code in a tool like Process Explorer or Object Explorer you’ll see the name of the object is not given:

The full name is not shown and cannot be retrieved from user mode. And even if it could somehow be located, the boundary descriptor provides further protection. Let’s examine this object in the kernel debugger. Copying its address from the object’s properties:

Pasting the address into a local kernel debugger – first using the generic !object command:

lkd> !object 0xFFFFB3068E162D10
Object: ffffb3068e162d10  Type: (ffff9507ed78c220) Section
    ObjectHeader: ffffb3068e162ce0 (new version)
    HandleCount: 1  PointerCount: 32769
    Directory Object: ffffb3069e8cbe00  Name: MySharedMem

The name is there, but the directory object is there as well. Let’s examine it:

lkd> !object ffffb3069e8cbe00
Object: ffffb3069e8cbe00  Type: (ffff9507ed6d0d20) Directory
    ObjectHeader: ffffb3069e8cbdd0 (new version)
    HandleCount: 3  PointerCount: 98300

    Hash Address          Type                      Name
    ---- -------          ----                      ----
     19  ffffb3068e162d10 Section                   MySharedMem

There is one object in this directory. What’s the directory’s name? We need to examine the object header for that – its address is given in the above output:

lkd> dt nt!_OBJECT_HEADER ffffb3069e8cbdd0
   +0x000 PointerCount     : 0n32769
   +0x008 HandleCount      : 0n1
   +0x008 NextToFree       : 0x00000000`00000001 Void
   +0x010 Lock             : _EX_PUSH_LOCK
   +0x018 TypeIndex        : 0x53 'S'
   +0x019 TraceFlags       : 0 ''
   +0x019 DbgRefTrace      : 0y0
   +0x019 DbgTracePermanent : 0y0
   +0x01a InfoMask         : 0x8 ''
   +0x01b Flags            : 0 ''
   +0x01b NewObject        : 0y0
   +0x01b KernelObject     : 0y0
   +0x01b KernelOnlyAccess : 0y0
   +0x01b ExclusiveObject  : 0y0
   +0x01b PermanentObject  : 0y0
   +0x01b DefaultSecurityQuota : 0y0
   +0x01b SingleHandleEntry : 0y0
   +0x01b DeletedInline    : 0y0
   +0x01c Reserved         : 0x301
   +0x020 ObjectCreateInfo : 0xffff9508`18f2ba40 _OBJECT_CREATE_INFORMATION
   +0x020 QuotaBlockCharged : 0xffff9508`18f2ba40 Void
   +0x028 SecurityDescriptor : 0xffffb305`dd0d56ed Void
   +0x030 Body             : _QUAD

Getting a kernel object’s name is a little tricky, and will not be fully described here. The first requirement is that the InfoMask member must have bit 1 set (value of 2), as this indicates a name is present. Since it’s not (the value is 8), there is no name to this directory. We can examine the directory object in more detail by looking at the real data structure underneath given the object’s original address:
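The InfoMask check boils down to a single bit test. Here is a tiny standalone sketch of it; HasNameInfo and kNameInfoBit are hypothetical names I’m using for illustration, and only the “bit 1 means a name is present” rule stated above is assumed.

```cpp
#include <cstdint>

// Bit 1 (value 2) of the InfoMask member: a name is present.
constexpr uint8_t kNameInfoBit = 0x2;

// Returns true if the given InfoMask value indicates the object has a name.
constexpr bool HasNameInfo(uint8_t infoMask) {
    return (infoMask & kNameInfoBit) != 0;
}

// The InfoMask of the directory in the output above is 8, so there is no name.
static_assert(!HasNameInfo(0x8), "InfoMask 8 has no name bit");
```

A value such as 0xA would pass the test, since it has both bit 1 and bit 3 set.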

kd> dt nt!_OBJECT_DIRECTORY ffffb3069e8cbe00
   +0x000 HashBuckets      : [37] (null) 
   +0x128 Lock             : _EX_PUSH_LOCK
   +0x130 DeviceMap        : (null) 
   +0x138 ShadowDirectory  : (null) 
   +0x140 NamespaceEntry   : 0xffffb306`9e8cbf58 Void
   +0x148 SessionObject    : (null) 
   +0x150 Flags            : 1
   +0x154 SessionId        : 0xffffffff

The interesting piece is the NamespaceEntry member, which is non-NULL. This indicates the purpose of this directory: to be a container for a private namespace’s objects. You can also click on HashBuckets and locate the single section object there.

Going back to Process Explorer, enabling unnamed object handles (View menu, Show Unnamed Handles and Mappings) and looking for unnamed directory objects:

The directory’s address is the same one we were looking at!

The pointer at NamespaceEntry points to an undocumented structure that is not currently provided with the symbols. But just looking a bit beyond the directory’s object structure shows a hint:

lkd> db ffffb3069e8cbe00+158
ffffb306`9e8cbf58  d8 f9 a3 55 06 b3 ff ff-70 46 12 66 07 f8 ff ff  ...U....pF.f....
ffffb306`9e8cbf68  00 be 8c 9e 06 b3 ff ff-48 00 00 00 00 00 00 00  ........H.......
ffffb306`9e8cbf78  00 00 00 00 00 00 00 00-0b 00 00 00 00 00 00 00  ................
ffffb306`9e8cbf88  01 00 00 00 02 00 00 00-48 00 00 00 00 00 00 00  ........H.......
ffffb306`9e8cbf98  01 00 00 00 20 00 00 00-4d 00 79 00 44 00 65 00  .... ...M.y.D.e.
ffffb306`9e8cbfa8  73 00 63 00 72 00 69 00-70 00 74 00 6f 00 72 00  s.c.r.i.p.t.o.r.
ffffb306`9e8cbfb8  02 00 00 00 18 00 00 00-01 02 00 00 00 00 00 05  ................
ffffb306`9e8cbfc8  20 00 00 00 21 02 00 00-00 00 00 00 00 00 00 00   ...!...........

The name “MyDescriptor” is clearly visible, which is the name of the boundary descriptor in the above code.
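The bytes right after the name are worth a look too. Here is a small standalone sketch that decodes two pieces of the dump: the UTF-16 name, and the 16 bytes starting with 01 02, which have the shape of a binary SID (revision, sub-authority count, a 6-byte big-endian authority, then little-endian 32-bit sub-authorities). The helper names are hypothetical, and reading those 16 bytes as the SID added to the boundary descriptor is my interpretation of an undocumented structure, not something the symbols confirm.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Decode an ASCII-only UTF-16LE buffer into a std::string ('?' for anything else).
std::string DecodeAsciiUtf16LE(const uint8_t* data, size_t byteCount) {
    std::string result;
    for (size_t i = 0; i + 1 < byteCount; i += 2) {
        // low byte first; the high byte is zero for ASCII characters
        result += (data[i + 1] == 0) ? static_cast<char>(data[i]) : '?';
    }
    return result;
}

// Format a binary SID as S-<revision>-<authority>-<sub authorities...>.
std::string SidToString(const uint8_t* sid, size_t byteCount) {
    if (byteCount < 8)
        return "";
    size_t subCount = sid[1];
    if (byteCount < 8 + 4 * subCount)
        return "";
    uint64_t authority = 0;
    for (int i = 2; i < 8; i++)             // 6 bytes, big-endian
        authority = (authority << 8) | sid[i];
    std::string text = "S-" + std::to_string(sid[0]) + "-" + std::to_string(authority);
    for (size_t n = 0; n < subCount; n++) { // 4 bytes each, little-endian
        const uint8_t* p = sid + 8 + 4 * n;
        uint32_t sub = uint32_t(p[0]) | (uint32_t(p[1]) << 8) |
                       (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
        text += "-" + std::to_string(sub);
    }
    return text;
}

// Bytes copied from the dump above.
const uint8_t kNameBytes[] = {
    0x4d, 0, 0x79, 0, 0x44, 0, 0x65, 0, 0x73, 0, 0x63, 0,
    0x72, 0, 0x69, 0, 0x70, 0, 0x74, 0, 0x6f, 0, 0x72, 0 };
const uint8_t kSidBytes[] = {
    0x01, 0x02, 0, 0, 0, 0, 0, 0x05, 0x20, 0, 0, 0, 0x21, 0x02, 0, 0 };
```

Decoding kNameBytes gives MyDescriptor, and kSidBytes formats as S-1-5-32-545, which is the well-known BUILTIN\Users SID, matching the WinBuiltinUsersSid used in the code that created the boundary descriptor.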

The kernel debugger’s documentation indicates that the !object command with a -p switch should show the private namespaces. However, this fails:

lkd> !object -p
00000000: Unable to get value of ObpPrivateNamespaceLookupTable

The debugger seems to fail locating a global kernel variable. This is probably a bug in the debugger command, because the scope of object namespaces has changed since the introduction of Server Silos in Windows 10 version 1607 (for example, Docker uses these when running Windows containers). Each silo has its own object manager namespace, so the old global variable does not exist anymore. I suspect Microsoft has not updated this command switch to support silos. Even with no server silos running, the host is considered to be in its own (global) silo, called the host silo. You can see its details by utilizing the !silo debugger command:

kd> !silo -g host
Server silo globals fffff80766124540:
		Default Error Port: ffff950815bee140
		ServiceSessionId  : 0
		OB Root Directory : 
		State             : Running

Clicking the “Server silo globals” link shows more details:

kd> dx -r1 (*((nt!_ESERVERSILO_GLOBALS *)0xfffff80766124540))
(*((nt!_ESERVERSILO_GLOBALS *)0xfffff80766124540))                 [Type: _ESERVERSILO_GLOBALS]
    [+0x000] ObSiloState      [Type: _OBP_SILODRIVERSTATE]
    [+0x2e0] SeSiloState      [Type: _SEP_SILOSTATE]
    [+0x310] SeRmSiloState    [Type: _SEP_RM_LSA_CONNECTION_STATE]
    [+0x360] EtwSiloState     : 0xffff9507edbc9000 [Type: _ETW_SILODRIVERSTATE *]
    [+0x368] MiSessionLeaderProcess : 0xffff95080bbdb040 [Type: _EPROCESS *]
    [+0x370] ExpDefaultErrorPortProcess : 0xffff950815bee140 [Type: _EPROCESS *]
<truncated>

ObSiloState is the root object related to the object manager. Clicking this one shows:

lkd> dx -r1 (*((ntkrnlmp!_OBP_SILODRIVERSTATE *)0xfffff80766124540))
(*((ntkrnlmp!_OBP_SILODRIVERSTATE *)0xfffff80766124540))                 [Type: _OBP_SILODRIVERSTATE]
    [+0x000] SystemDeviceMap  : 0xffffb305c8c48720 [Type: _DEVICE_MAP *]
    [+0x008] SystemDosDeviceState [Type: _OBP_SYSTEM_DOS_DEVICE_STATE]
    [+0x078] DeviceMapLock    [Type: _EX_PUSH_LOCK]
    [+0x080] PrivateNamespaceLookupTable [Type: _OBJECT_NAMESPACE_LOOKUPTABLE]

PrivateNamespaceLookupTable is the root object for the private namespaces for this Silo (in this example it’s the host silo).

The interested reader is welcome to dig into this further.

The list of private namespaces is provided with the WinObjEx64 tool if you run it elevated and have local kernel debugging enabled, as it uses the kernel debugger’s driver to read kernel memory.

* Most objects, because the way Object Explorer works is by enumerating handles and associating them with objects. However, some objects are held using references from the kernel with zero handles. Such objects cannot be detected by Object Explorer.

Upcoming COM Programming Class

Today I’m happy to announce the next COM Programming class to be held in February 2023. The syllabus for the 3 day class can be found here. The course will be delivered in 6 half-days (4 hours each).

Dates: February (7, 8, 9, 14, 15, 16).
Times: 11am to 3pm EST (8am to 12pm PST) (4pm to 8pm UT)
Cost: 750 USD (if paid by an individual), 1400 USD (if paid by a company).

Half days should make it comfortable enough even if you’re not in an ideal time zone.

The class will be conducted remotely using Microsoft Teams.

What you need to know before the class: You should be comfortable using Windows on a Power User level. Concepts such as processes, threads, DLLs, and virtual memory should be understood fairly well. You should have experience writing code in C and some C++. You don’t have to be an expert, but you must know C and basic C++ to get the most out of this class. In case you have doubts, talk to me.

Participants in my Windows Internals and Windows System Programming classes have the required knowledge for the class.

We’ll start by looking at why COM was created in the first place, and then build clients and servers, digging into various mechanisms COM provides. See the syllabus for more details.

Previous students in my classes get 10% off. Multiple participants from the same company get a discount (email me for the details).

To register, send an email to [email protected] with the title “COM Programming Training”, and write the name(s), email(s) and time zone(s) of the participants.

Next Windows Internals Training

I’m happy to open registration for the next 5 day Windows Internals training, to be conducted in November on the following dates: 21, 22, 28, 29, 30. Sessions run from 11am to 7pm Eastern Standard Time (EST) (8am to 4pm PST).

The syllabus can be found here (some modifications possible, but the general outline should remain).

Training cost is 900 USD if paid by an individual, or 1800 USD if paid by a company. Participants in any of my previous training classes get 10% off.

If you’d like to register, please send me an email to [email protected] with “Windows Internals training” in the title, provide your full name, company (if any), preferred contact email, and your time zone.

The sessions will be recorded, so you can watch any part you may be missing, or that may be somewhat overwhelming in “real time”.

As usual, if you have any questions, feel free to send me an email, or DM on twitter (@zodiacon) or Linkedin (https://www.linkedin.com/in/pavely/).

Introduction to Monikers

The foundations of the Component Object Model (COM) are made of two principles:

  1. Clients program against interfaces, never concrete classes.
  2. Location transparency – clients need not know where the actual object is (in-process, out-of-process, another machine).

Although simple in principle, there are many details involved in COM, as those with COM experience are well aware. In this post, I’d like to introduce one extensibility aspect of COM called Monikers.

The idea of a moniker is to provide some way to identify and locate specific objects based on string names instead of some custom mechanism. Windows provides some implementations of monikers, most of which are related to Object Linking and Embedding (OLE), most notably used in Microsoft Office applications. For example, when an Excel chart is embedded in a Word document as a link, an Item moniker is used to point to that specific chart using a string with a specific format understood by the moniker mechanism and the specific monikers involved. This also suggests that monikers can be combined, which is indeed the case. For example, a cell in some Excel document can be located by going to a specific sheet, then a specific range, then a specific cell – each one could be pointed to by a moniker, that when chained together can locate the required object.

Let’s start with perhaps the simplest example of an existing moniker implementation – the Class moniker. This moniker can be used to replace a creation operation. Here is an example that creates a COM object using the “standard” mechanism of calling CoCreateInstance:

#include <exdisp.h>	// declares IShellWindows and the ShellWindows coclass
//...
CComPtr<IShellWindows> spShell;
auto hr = spShell.CoCreateInstance(__uuidof(ShellWindows));

I use the ATL smart pointers (#include <atlcomcli.h> or <atlbase.h>). The interface and class I’m using are just an example – any standard COM class would work. The CoCreateInstance method of the smart pointer calls the real CoCreateInstance. To make it clearer, here is the CoCreateInstance call without using the helper provided by the smart pointer:

CComPtr<IShellWindows> spShell;
auto hr = ::CoCreateInstance(__uuidof(ShellWindows), nullptr, 
    CLSCTX_ALL, __uuidof(IShellWindows), 
    reinterpret_cast<void**>(&spShell));

CoCreateInstance itself is a glorified wrapper for calling CoGetClassObject to retrieve a class factory, requesting the standard IClassFactory interface, and then calling CreateInstance on it:

CComPtr<IClassFactory> spCF;
auto hr = ::CoGetClassObject(__uuidof(ShellWindows), 
    CLSCTX_ALL, nullptr, __uuidof(IClassFactory), 
    reinterpret_cast<void**>(&spCF));
if (SUCCEEDED(hr)) {
    CComPtr<IShellWindows> spShell;
    hr = spCF->CreateInstance(nullptr, __uuidof(IShellWindows),
        reinterpret_cast<void**>(&spShell));
    if (SUCCEEDED(hr)) {
        // use spShell
    }
}

Here is where the Class moniker comes in: It’s possible to get a class factory directly using a string like so:

CComPtr<IClassFactory> spCF;
BIND_OPTS opts{ sizeof(opts) };
auto hr = ::CoGetObject(
    L"clsid:9BA05972-F6A8-11CF-A442-00A0C90A8F39", 
    &opts, __uuidof(IClassFactory), 
    reinterpret_cast<void**>(&spCF));

Using CoGetObject is the most convenient way in C++ to locate an object based on a moniker. The moniker name is the string provided to CoGetObject. It starts with a ProgID of sorts followed by a colon. The rest of the string is to be interpreted by the moniker behind the scenes. With the class factory in hand, the code can use IClassFactory::CreateInstance just as with the previous example.

How does it work? As is usual with COM, the Registry is involved. If you open RegEdit or TotalRegistry and navigate to HKEY_CLASSES_ROOT, ProgIDs are all there. One of them is “clsid” – yes, it’s a bit weird perhaps, but the entry point to the moniker system is that ProgID. Each ProgID should have a CLSID subkey pointing to the class ID of the moniker. So here, the key is HKCR\CLSID\CLSID!

Class Moniker Registration

Of course, other monikers have different names (not CLSID). If we follow the CLSID on the right to the normal location for COM CLSID registration (HKCR\CLSID), this is what we find:

Class moniker

And the InProcServer32 subkey points to Combase.dll, the DLL implementing the COM infrastructure:

Class Moniker Implementation
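
The discovery step described above can be sketched in a few lines of portable C++. This only illustrates the naming convention (prefix before the colon, then a CLSID subkey under HKCR); it is not how MkParseDisplayName is actually implemented, and any special cases are ignored. MonikerClsidKey is a hypothetical helper name, not a COM API.

```cpp
#include <cassert>
#include <cwchar>
#include <string>

// Hypothetical helper (not part of COM): given a display name such as
// "clsid:9BA0...", builds the Registry key whose default value is expected
// to hold the moniker's CLSID. Returns false if there is no "prefix:" part.
bool MonikerClsidKey(const wchar_t* displayName, std::wstring* key) {
    assert(displayName && key);
    const wchar_t* colon = std::wcschr(displayName, L':');
    if (colon == nullptr || colon == displayName)
        return false;   // no colon, or nothing before it
    // the text before the colon is treated as a ProgID
    *key = L"HKCR\\" + std::wstring(displayName, colon) + L"\\CLSID";
    return true;
}
```

For the class moniker the result is HKCR\clsid\CLSID, which matches the key discussed above (Registry key names are not case sensitive).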

At this point, we know how the class moniker got discovered, but it’s still not clear what that moniker is, and where it is implemented.

As mentioned earlier, CoGetObject is the simplest way to get an object from a moniker, as it hides the details of the moniker itself. CoGetObject is a shortcut for calling MkParseDisplayName – the real entry point to the COM moniker namespace. Here is the full way to get a class factory by going through the moniker:

CComPtr<IMoniker> spClsMoniker;
CComPtr<IBindCtx> spBindCtx;
::CreateBindCtx(0, &spBindCtx);
ULONG eaten;
CComPtr<IClassFactory> spCF;
auto hr = ::MkParseDisplayName(
    spBindCtx,
    L"clsid:9BA05972-F6A8-11CF-A442-00A0C90A8F39",
    &eaten, &spClsMoniker);
if (SUCCEEDED(hr)) {
    // ask the moniker for the object it names (here, a class factory)
    hr = spClsMoniker->BindToObject(spBindCtx, nullptr,
        __uuidof(IClassFactory), reinterpret_cast<void**>(&spCF));
}

MkParseDisplayName takes a “display name” – a string, and attempts to locate the moniker based on the information in the Registry (it actually has some special code for certain OLE stuff which is not interesting in this context). The Bind Context is a helper object that can (in the general case) contain an arbitrary set of properties that can be used by the moniker to customize the way it interprets the display name. The class moniker does not use any property, but it’s still necessary to provide the object even if it has no interesting data in it. If successful, MkParseDisplayName returns the moniker interface pointer, implementing the IMoniker interface that all monikers must implement. IMoniker is a somewhat scary interface, having 20 methods (excluding IUnknown). Fortunately, not all have to be implemented. We’ll get to implementing our own moniker soon.

The primary method in IMoniker is BindToObject, which is tasked with interpreting the display name, if possible, and returning the real object that the client is trying to locate. The client provides the interface it expects the target object to implement – IClassFactory in the case of a class moniker.

You might be wondering what’s the point of the class moniker if you could simply create the required object directly with the normal class factory. One advantage of the moniker is that a string is involved, which allows “late binding” of sorts, and allows other languages, such as scripting languages, to create COM objects indirectly. For example, VBScript provides the GetObject function that calls CoGetObject.

Implementing a Moniker

Some details are still missing, such as how the moniker object itself gets created. To show that, let’s implement our own moniker. We’ll call it the Process Moniker – its purpose is to locate a COM process object we’ll implement that allows working with a Windows Process object.

Here is an example of something a client would do to find a process object based on its PID, and then display its executable path:

BIND_OPTS opts{ sizeof(opts) };
CComPtr<IWinProcess> spProcess;
auto hr = ::CoGetObject(L"process:3284", 
    &opts, __uuidof(IWinProcess), 
    reinterpret_cast<void**>(&spProcess));
if (SUCCEEDED(hr)) {
    CComBSTR path;
    if (S_OK == spProcess->get_ImagePath(&path)) {
        printf("Image path: %ws\n", path.m_str);
    }
}

IWinProcess is the interface our process object implements, but there is no need to know its CLSID (in fact, it has none, and is created privately by the moniker). The display name “process:3284” identifies the string “process” as the moniker name, meaning there must be a subkey under HKCR named “process” for this to have any chance of working. And under the “process” key there must be the CLSID of the moniker. Here is the final result:

process moniker

The CLSID of the process moniker must be registered normally like all COM classes. The text after the colon is passed to the moniker which should interpret it in a way that makes sense for that moniker (or fail trying). In our case, it’s supposed to be a PID of an existing process.

Let’s see the main steps needed to implement the process moniker. From a technical perspective, I created an ATL DLL project in Visual Studio (could be an EXE as well), and then added an “ATL Simple Object” class template to get the boilerplate code the ATL template provides. We just need to implement IMoniker – no need for some custom interface. Here is the layout of the class:

class ATL_NO_VTABLE CProcessMoniker :
	public CComObjectRootEx<CComMultiThreadModel>,
	public CComCoClass<CProcessMoniker, &CLSID_ProcessMoniker>,
	public IMoniker {
public:
	DECLARE_REGISTRY_RESOURCEID(106)
	DECLARE_CLASSFACTORY_EX(CMonikerClassFactory)

	BEGIN_COM_MAP(CProcessMoniker)
		COM_INTERFACE_ENTRY(IMoniker)
	END_COM_MAP()

	DECLARE_PROTECT_FINAL_CONSTRUCT()
	HRESULT FinalConstruct() {
		return S_OK;
	}
	void FinalRelease() {
	}

public:
	// Inherited via IMoniker
	HRESULT __stdcall GetClassID(CLSID* pClassID) override;
	HRESULT __stdcall IsDirty(void) override;
	HRESULT __stdcall Load(IStream* pStm) override;
	HRESULT __stdcall Save(IStream* pStm, BOOL fClearDirty) override;
	HRESULT __stdcall GetSizeMax(ULARGE_INTEGER* pcbSize) override;
	HRESULT __stdcall BindToObject(IBindCtx* pbc, IMoniker* pmkToLeft, REFIID riidResult, void** ppvResult) override;
    // other IMoniker methods...
	std::wstring m_DisplayName;
};

OBJECT_ENTRY_AUTO(__uuidof(ProcessMoniker), CProcessMoniker)

Those familiar with the typical code the ATL wizard generates might notice one important difference from the standard template: the class factory. It turns out that when a client invokes MkParseDisplayName (or its CoGetObject wrapper), monikers are not created through IClassFactory; instead, the moniker’s class object must implement the IParseDisplayName interface, which we’ll tackle in a moment. This is why DECLARE_CLASSFACTORY_EX(CMonikerClassFactory) is used to instruct ATL to use a custom class factory which we must implement.

MkParseDisplayName operation

Before we get to that, let’s implement the “main” method – BindToObject. We have to assume that the m_DisplayName member already has the process ID – it will be provided by our class factory that creates our moniker. First, we’ll convert the display name to a number:

HRESULT __stdcall CProcessMoniker::BindToObject(IBindCtx* pbc, IMoniker* pmkToLeft, REFIID riidResult, void** ppvResult) {
	auto pid = std::stoul(m_DisplayName);

Next, we’ll attempt to open a handle to the process:

auto hProcess = ::OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION, 
    FALSE, pid);
if (!hProcess)
    return HRESULT_FROM_WIN32(::GetLastError());

If we fail, we just return a failed HRESULT and we’re done. If successful, we can create the WinProcess object, pass the handle and return the interface requested by the client (if supported):

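	CComObject<CWinProcess>* pProcess;
	auto hr = CComObject<CWinProcess>::CreateInstance(&pProcess);
	if (FAILED(hr)) {
		// don't leak the process handle if the object could not be created
		::CloseHandle(hProcess);
		return hr;
	}
	pProcess->SetHandle(hProcess);
	pProcess->AddRef();
	
	hr = pProcess->QueryInterface(riidResult, ppvResult);
	pProcess->Release();
	return hr;
}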

The creation of the object is internal via CComObject<>. The WinProcess COM class is not registered, which is just a matter of choice. I decided that a WinProcess object can only be obtained through the Process Moniker.

The calls to AddRef/Release may be puzzling, but there is a good reason for using them. When creating a CComObject<> object, the reference count of the object is zero. Then, the call to AddRef increments it to 1. Next, if the QueryInterface call succeeds, the ref count is incremented to 2. Then, the Release call decrements it to 1, as that is the correct count when the object is returned to the client. If, however, the call to QI fails, the ref count remains at 1, and the Release call will destroy the object! More elegant than calling delete.
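
The reference count walk-through above can be checked with a small portable model. This is a sketch only: FakeComObject is a hypothetical stand-in that mimics the two properties that matter here (the count starts at zero, and the object deletes itself when the count returns to zero). It is not ATL code.

```cpp
#include <cassert>

// Stand-in for a CComObject<>-style object: the count starts at zero and
// the object deletes itself when the count drops back to zero.
class FakeComObject {
public:
    explicit FakeComObject(bool* destroyed) : m_destroyed(destroyed) {}
    unsigned long AddRef() { return ++m_refs; }
    unsigned long Release() {
        assert(m_refs > 0);
        auto refs = --m_refs;
        if (refs == 0)
            delete this;
        return refs;
    }
    // Simulated QueryInterface: adds a reference only on success.
    bool QueryInterface(bool supported, FakeComObject** out) {
        *out = nullptr;
        if (!supported)
            return false;
        AddRef();
        *out = this;
        return true;
    }
    unsigned long RefCount() const { return m_refs; }
private:
    ~FakeComObject() { *m_destroyed = true; }   // records destruction for the caller
    unsigned long m_refs = 0;
    bool* m_destroyed;
};

// Mirrors the tail of BindToObject: AddRef, QueryInterface, Release.
FakeComObject* CreateAndQuery(bool supported, bool* destroyed) {
    auto obj = new FakeComObject(destroyed);    // count is 0
    obj->AddRef();                              // 1
    FakeComObject* result = nullptr;
    obj->QueryInterface(supported, &result);    // 2 on success, still 1 on failure
    obj->Release();                             // 1 on success, 0 (destroyed) on failure
    return result;
}
```

With supported set to true the caller receives an object whose count is 1; with false the object is already gone by the time the function returns, with no explicit delete in the calling code.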

SetHandle is a function in CWinProcess (outside the IWinProcess interface) that passes the handle to the object.
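
One caveat about the first line of BindToObject: std::stoul throws std::invalid_argument or std::out_of_range on bad input, and exceptions should not escape a COM method. A minimal sketch of a non-throwing alternative follows. TryParsePid is a hypothetical helper, not part of the sample project; on failure, BindToObject could return an error HRESULT such as MK_E_SYNTAX.

```cpp
#include <cassert>
#include <cerrno>
#include <cwchar>

// Hypothetical helper: parses a decimal PID without throwing.
// Returns false for empty text, a sign or leading space, trailing
// characters, or a value that does not fit in 32 bits.
bool TryParsePid(const wchar_t* text, unsigned long* pid) {
    assert(text && pid);
    if (*text < L'0' || *text > L'9')   // must start with a digit
        return false;
    wchar_t* end = nullptr;
    errno = 0;
    unsigned long value = std::wcstoul(text, &end, 10);
    if (errno == ERANGE || *end != L'\0')
        return false;                   // overflow, or text left over
    if (value > 0xFFFFFFFFul)           // a Windows PID is a 32-bit DWORD
        return false;
    *pid = value;
    return true;
}
```

Rejecting trailing characters is a design choice that fits a moniker which consumes the whole display name; a moniker that takes part in composition would want to stop at the first character it does not understand instead.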

The WinProcess COM class is the uninteresting part in all of this, so I created a bare minimum class like so:

class ATL_NO_VTABLE CWinProcess :
	public CComObjectRootEx<CComMultiThreadModel>,
	public IDispatchImpl<IWinProcess> {
public:
	DECLARE_NO_REGISTRY()

	BEGIN_COM_MAP(CWinProcess)
		COM_INTERFACE_ENTRY(IWinProcess)
		COM_INTERFACE_ENTRY(IDispatch)
		COM_INTERFACE_ENTRY_AGGREGATE(IID_IMarshal, m_pUnkMarshaler.p)
	END_COM_MAP()

	DECLARE_PROTECT_FINAL_CONSTRUCT()
	DECLARE_GET_CONTROLLING_UNKNOWN()

	HRESULT FinalConstruct() {
		return CoCreateFreeThreadedMarshaler(
			GetControllingUnknown(), &m_pUnkMarshaler.p);
	}

	void FinalRelease() {
		m_pUnkMarshaler.Release();
		if (m_hProcess)
			::CloseHandle(m_hProcess);
	}

	void SetHandle(HANDLE hProcess);

private:
	HANDLE m_hProcess{ nullptr };
	CComPtr<IUnknown> m_pUnkMarshaler;

	// Inherited via IWinProcess
	HRESULT get_Id(DWORD* pId);
	HRESULT get_ImagePath(BSTR* path);
	HRESULT Terminate(DWORD exitCode);
};

The SetHandle helper, the two properties, and the one method look like this:

void CWinProcess::SetHandle(HANDLE hProcess) {
	m_hProcess = hProcess;
}

HRESULT CWinProcess::get_Id(DWORD* pId) {
	ATLASSERT(m_hProcess);
	return *pId = ::GetProcessId(m_hProcess), S_OK;
}

HRESULT CWinProcess::get_ImagePath(BSTR* pPath) {
	WCHAR path[MAX_PATH];
	DWORD size = _countof(path);
	if (::QueryFullProcessImageName(m_hProcess, 0, path, &size))
		return CComBSTR(path).CopyTo(pPath);

	return HRESULT_FROM_WIN32(::GetLastError());
}

HRESULT CWinProcess::Terminate(DWORD exitCode) {
	HANDLE hKill;
	if (::DuplicateHandle(::GetCurrentProcess(), m_hProcess, 
		::GetCurrentProcess(), &hKill, PROCESS_TERMINATE, FALSE, 0)) {
		auto success = ::TerminateProcess(hKill, exitCode);
		auto error = ::GetLastError();
		::CloseHandle(hKill);
		return success ? S_OK : HRESULT_FROM_WIN32(error);
	}
	return HRESULT_FROM_WIN32(::GetLastError());
}

The APIs used above are fairly straightforward and of course fully documented.

The last piece of the puzzle is the moniker’s class factory:

class ATL_NO_VTABLE CMonikerClassFactory : 
	public ATL::CComObjectRootEx<ATL::CComMultiThreadModel>,
	public IParseDisplayName {
public:
	BEGIN_COM_MAP(CMonikerClassFactory)
		COM_INTERFACE_ENTRY(IParseDisplayName)
	END_COM_MAP()

	// Inherited via IParseDisplayName
	HRESULT __stdcall ParseDisplayName(IBindCtx* pbc, LPOLESTR pszDisplayName, ULONG* pchEaten, IMoniker** ppmkOut) override;
};

Just one method to implement:

HRESULT __stdcall CMonikerClassFactory::ParseDisplayName(
    IBindCtx* pbc, LPOLESTR pszDisplayName, 
    ULONG* pchEaten, IMoniker** ppmkOut) {
    auto colon = wcschr(pszDisplayName, L':');
    ATLASSERT(colon);
    if (colon == nullptr)
        return E_INVALIDARG;

    //
    // simplistic, assume all display name consumed
    //
    *pchEaten = (ULONG)wcslen(pszDisplayName);

    CComObject<CProcessMoniker>* pMon;
    auto hr = CComObject<CProcessMoniker>::CreateInstance(&pMon);
    if (FAILED(hr))
        return hr;

    //
    // provide the process ID
    //
    pMon->m_DisplayName = colon + 1;
    pMon->AddRef();
    hr = pMon->QueryInterface(ppmkOut);
    pMon->Release();
    return hr;
}

First, the colon is searched for, as the display name looks like “process:xxxx”. The “xxxx” part is stored in the resulting moniker, created with CComObject<>, similarly to the CWinProcess earlier. The pchEaten value reports back how many characters were consumed – the moniker factory should parse as much as it understands, because moniker composition may be in play. Hopefully, I’ll discuss that in a future post.
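
To illustrate what "parse as much as it understands" could look like, here is a portable sketch that consumes only the prefix, the colon and the digits, and reports that length as the eaten count. ParseProcessDisplayName is a hypothetical function, not the code from the repository, and the trailing text in the usage note is made up for illustration.

```cpp
#include <cassert>
#include <cwchar>

// Hypothetical parser for names of the form "process:<digits>[rest]".
// On success, *eaten receives the number of characters understood
// (prefix, colon and digits); whatever follows is left unparsed.
// Overflow of very long digit runs is not handled in this sketch.
bool ParseProcessDisplayName(const wchar_t* name, unsigned long* eaten,
                             unsigned long* pid) {
    assert(name && eaten && pid);
    *eaten = 0;
    const wchar_t* colon = std::wcschr(name, L':');
    if (colon == nullptr)
        return false;
    const wchar_t* p = colon + 1;
    unsigned long value = 0;
    while (*p >= L'0' && *p <= L'9') {
        value = value * 10 + static_cast<unsigned long>(*p - L'0');
        ++p;
    }
    if (p == colon + 1)     // no digits after the colon
        return false;
    *pid = value;
    *eaten = static_cast<unsigned long>(p - name);
    return true;
}
```

For "process:3284" the eaten count is 12, the full length. For "process:3284!extra" it is still 12, which tells the caller that the remaining characters were not consumed by this parser.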

Finally, registration must be added for the moniker. Here is ProcessMoniker.rgs, where the lower part was added to connect the “process” ProgId/moniker name to the CLSID of the process moniker:

HKCR
{
	NoRemove CLSID
	{
		ForceRemove {6ea3a80e-2936-43be-8725-2e95896da9a4} = s 'ProcessMoniker class'
		{
			InprocServer32 = s '%MODULE%'
			{
				val ThreadingModel = s 'Both'
			}
			TypeLib = s '{97a86fc5-ffef-4e80-88a0-fa3d1b438075}'
			Version = s '1.0'
		}
	}
	process = s 'Process Moniker Class'
	{
		CLSID = s '{6ea3a80e-2936-43be-8725-2e95896da9a4}'
	}
}

And that is it. Here is an example client that terminates a process given its ID:

void Kill(DWORD pid) {
	std::wstring displayName(L"process:");
	displayName += std::to_wstring(pid);
	BIND_OPTS opts{ sizeof(opts) };
	CComPtr<IWinProcess> spProcess;
	auto hr = ::CoGetObject(displayName.c_str(), &opts, 
		__uuidof(IWinProcess), reinterpret_cast<void**>(&spProcess));
	if (SUCCEEDED(hr)) {
		hr = spProcess->Terminate(1);
		if (SUCCEEDED(hr))
			printf("Process %u terminated.\n", pid);
		else
			printf("Error terminating process: hr=0x%X\n", hr);
	}
}

All the code can be found in this Github repo: zodiacon/MonikerFun: Demonstrating a simple moniker. (github.com)

Here is a VBScript example (this works because WinProcess implements IDispatch):

set process = GetObject("process:25520")
MsgBox process.ImagePath

How about .NET or PowerShell? Here is PowerShell:

PS> $p = [System.Runtime.InteropServices.Marshal]::BindToMoniker("process:25520")
PS> $p | Get-Member

   TypeName: System.__ComObject#{3ab0471f-2635-429d-95e9-f2baede2859e}

Name      MemberType Definition
----      ---------- ----------
Terminate Method     void Terminate (uint)
Id        Property   uint Id () {get}
ImagePath Property   string ImagePath () {get}


PS> $p.ImagePath
C:\Windows\System32\notepad.exe

The DisplayWindows function just displays names of Explorer windows obtained by using IShellWindows:

void DisplayWindows(IShellWindows* pShell) {
	long count = 0;
	pShell->get_Count(&count);
	for (long i = 0; i < count; i++) {
		CComPtr<IDispatch> spDisp;
		pShell->Item(CComVariant(i), &spDisp);
		CComQIPtr<IWebBrowserApp> spWin(spDisp);
		if (spWin) {
			CComBSTR name;
			spWin->get_LocationName(&name);
			printf("Name: %ws\n", name.m_str);
		}
	}
}

Happy Moniker day!

Next Windows Kernel Programming Class

I’m happy to announce the next 5-day virtual Windows Kernel Programming class to be held in October. The syllabus for the class can be found here. A notable addition to the class is an introduction to the Kernel Mode Driver Framework (KMDF).

Dates and Times (all in October 2022), times based on London:
11 (full day): 4pm to 12am
12 (full day): 4pm to 12am
13 (half day): 4pm to 8pm
17 (half day): 4pm to 8pm
18 (full day): 4pm to 12am
19 (half day): 4pm to 8pm
20 (half day): 4pm to 8pm

The class will be recorded and provided to the participants.

Cost:
900 USD if paid by an individual
1700 USD if paid by a company
Previous participants of my classes get 10% off. Multiple participants from the same company get a discount as well (talk to me).

Registration
To register, send email to [email protected] and provide the name(s) and email(s) of the participant(s), the company name (if any), and your time zone (for my information, although I cannot change course times).

Feel free to contact me for any questions or comments via email, twitter (@zodiacon) or Linkedin.

Zombie Processes

The term “Zombie Process” in Windows is not an official one, as far as I know. Regardless, I’ll define a zombie process to be a process that has exited (for whatever reason), but at least one reference remains to the kernel process object (EPROCESS), so that the process object cannot be destroyed.

How can we recognize zombie processes? Is this even important? Let’s find out.

All kernel objects are reference counted. The reference count includes the handle count (the number of open handles to the object), and a “pointer count”, the number of kernel clients to the object that have incremented its reference count explicitly so the object is not destroyed prematurely if all handles to it are closed.

Process objects are managed within the kernel by the EPROCESS (undocumented) structure, that contains or points to everything about the process – its handle table, image name, access token, job (if any), threads, address space, etc. When a process is done executing, some aspects of the process get destroyed immediately. For example, all handles in its handle table are closed; its address space is destroyed. General properties of the process remain, however, some of which only have true meaning once a process dies, such as its exit code.
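
The definition can be restated as a toy model. This is only a sketch of the bookkeeping described in the text; ProcessObjectModel and its fields are made up for illustration and do not reflect the real EPROCESS layout or the way the kernel actually stores its counts.

```cpp
#include <cassert>

// Toy model of the bookkeeping described above; not the real EPROCESS.
struct ProcessObjectModel {
    bool exited = false;        // the process can no longer run code
    unsigned handleCount = 0;   // open handles to the object
    unsigned pointerCount = 0;  // explicit references taken by kernel clients
};

// The object can be freed only once the process has exited and nothing
// references it anymore.
bool CanBeDestroyed(const ProcessObjectModel& p) {
    return p.exited && p.handleCount == 0 && p.pointerCount == 0;
}

// A zombie has exited but is still referenced by someone.
bool IsZombie(const ProcessObjectModel& p) {
    return p.exited && !CanBeDestroyed(p);
}

// Models closing one handle to the process object.
void CloseOneHandle(ProcessObjectModel& p) {
    assert(p.handleCount > 0);
    --p.handleCount;
}
```

In this model, a process with one open handle becomes a zombie the moment it exits, and stops being one when that last handle is closed.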

Process enumeration tools such as Task Manager or Process Explorer don’t show zombie processes, simply because the process enumeration APIs (EnumProcesses, Process32First/Process32Next, the native NtQuerySystemInformation, and WTSEnumerateProcesses) don’t return these – they only return processes that can still run code. The kernel debugger, on the other hand, shows all processes, zombie or not, when you type something like !process 0 0. Identifying zombie processes is easy – their handle table and handle count are shown as zero. Here is one example:

kd> !process ffffc986a505a080 0
PROCESS ffffc986a505a080
    SessionId: 1  Cid: 1010    Peb: 37648ff000  ParentCid: 0588
    DirBase: 16484cd000  ObjectTable: 00000000  HandleCount:   0.
    Image: smartscreen.exe

Any kernel object referenced by the process object remains alive as well – such as a job (if the process is part of a job), and the process primary token (access token object). We can get more details about the process by passing the detail level “1” in the !process command:

lkd> !process ffffc986a505a080 1
PROCESS ffffc986a505a080
    SessionId: 1  Cid: 1010    Peb: 37648ff000  ParentCid: 0588
    DirBase: 16495cd000  ObjectTable: 00000000  HandleCount:   0.
    Image: smartscreen.exe
    VadRoot 0000000000000000 Vads 0 Clone 0 Private 16. Modified 7. Locked 0.
    DeviceMap ffffa2013f24aea0
    Token                             ffffa20147ded060
    ElapsedTime                       1 Day 15:11:50.174
    UserTime                          00:00:00.000
    KernelTime                        00:00:00.015
    QuotaPoolUsage[PagedPool]         0
    QuotaPoolUsage[NonPagedPool]      0
    Working Set Sizes (now,min,max)  (17, 50, 345) (68KB, 200KB, 1380KB)
    PeakWorkingSetSize                2325
    VirtualSize                       0 Mb
    PeakVirtualSize                   2101341 Mb
    PageFaultCount                    2500
    MemoryPriority                    BACKGROUND
    BasePriority                      8
    CommitCharge                      20
    Job                               ffffc98672eea060

Notice the address space does not exist anymore (VadRoot is zero). The VAD (Virtual Address Descriptors) tree is a balanced binary search tree that describes the address space of a process – which parts are committed, which parts are reserved, etc. Other details of the process are still there as they are direct members of the EPROCESS structure, such as the kernel and user time the process has used, and its start and exit times (not shown in the debugger’s output above).

We can ask the debugger to show the reference count of any kernel object by using the generic !object command, to be followed by !trueref if there are handles open to the object:

lkd> !object ffffc986a505a080
Object: ffffc986a505a080  Type: (ffffc986478ce380) Process
    ObjectHeader: ffffc986a505a050 (new version)
    HandleCount: 1  PointerCount: 32768
lkd> !trueref ffffc986a505a080
ffffc986a505a080: HandleCount: 1 PointerCount: 32768 RealPointerCount: 1

Clearly, there is a single handle open to the process and that’s the only thing keeping it alive.

One other thing that remains is the unique process ID (shown as Cid in the above output). Process and thread IDs are generated by using a private handle table just for this purpose. This explains why process and thread IDs are always multiples of four, just like handles. In fact, the kernel treats PIDs and TIDs with the HANDLE type, rather than with something like ULONG. Since there is a limit to the number of handles in a process (16711680, the reason is not described here), that’s also the limit for the number of processes and threads that could exist on a system. This is a rather large number, so probably not an issue from a practical perspective, but zombie processes still keep their PIDs “taken”, so they cannot be reused. This means that in theory, some code can create millions of processes, terminate them all, but not close the handles it receives back, and eventually new processes could not be created anymore because PIDs (and TIDs) run out. I don’t know what would happen then 🙂

Here is a simple loop to do something like that by creating and destroying Notepad processes but keeping handles open:

WCHAR name[] = L"notepad";
STARTUPINFO si{ sizeof(si) };
PROCESS_INFORMATION pi;
int i = 0;
for (; i < 1000000; i++) {	// use 1 million as an example
	auto created = ::CreateProcess(nullptr, name, nullptr, nullptr,
        FALSE, 0, nullptr, nullptr, &si, &pi);
	if (!created)
		break;
	::TerminateProcess(pi.hProcess, 100);
	printf("Index: %6d PID: %u\n", i + 1, pi.dwProcessId);
	::CloseHandle(pi.hThread);
}
printf("Total: %d\n", i);

The code closes the handle to the first thread in the process, as keeping it alive would create “Zombie Threads”, much like zombie processes – threads that can no longer run any code, but still exist because at least one handle is keeping them alive.

How can we get a list of zombie processes on a system given that the “normal” tools for process enumeration don’t show them? One way of doing this is to enumerate all the process handles in the system, and check if the process pointed to by that handle is truly alive by calling WaitForSingleObject on the handle (of course the handle must first be duplicated into our process so it’s valid to use) with a timeout of zero – we don’t really want to wait. If the result is WAIT_OBJECT_0, this means the process object is signaled, meaning it exited – it’s no longer capable of running any code. I have incorporated that into my Object Explorer (ObjExp.exe) tool. Here is the basic code to get details for zombie processes (the code for enumerating handles is not shown but is available in the source code):

m_Items.clear();
m_Items.reserve(128);
std::unordered_map<DWORD, size_t> processes;
for (auto const& h : ObjectManager::EnumHandles2(L"Process")) {
	auto hDup = ObjectManager::DupHandle(
        (HANDLE)(ULONG_PTR)h->HandleValue , h->ProcessId, 
        SYNCHRONIZE | PROCESS_QUERY_LIMITED_INFORMATION);
	if (hDup && WAIT_OBJECT_0 == ::WaitForSingleObject(hDup, 0)) {
		//
		// zombie process
		//
		auto pid = ::GetProcessId(hDup);
		if (pid) {
			auto it = processes.find(pid);
			ZombieProcess zp;
			auto& z = it == processes.end() ? zp : m_Items[it->second];
			z.Pid = pid;
			z.Handles.push_back({ h->HandleValue, h->ProcessId });
			WCHAR name[MAX_PATH];
			if (::GetProcessImageFileName(hDup, 
                name, _countof(name))) {
				z.FullPath = 
                    ProcessHelper::GetDosNameFromNtName(name);
				// guard against a name with no backslash
				auto slash = wcsrchr(name, L'\\');
				z.Name = slash ? slash + 1 : name;
			}
			::GetProcessTimes(hDup, 
                (PFILETIME)&z.CreateTime, (PFILETIME)&z.ExitTime, 
                (PFILETIME)&z.KernelTime, (PFILETIME)&z.UserTime);
			::GetExitCodeProcess(hDup, &z.ExitCode);
			if (it == processes.end()) {
				m_Items.push_back(std::move(z));
				processes.insert({ pid, m_Items.size() - 1 });
			}
		}
	}
	if (hDup)
		::CloseHandle(hDup);
}

The data structure built for each process and stored in the m_Items vector is the following:

struct HandleEntry {
	ULONG Handle;
	DWORD Pid;
};
struct ZombieProcess {
	DWORD Pid;
	DWORD ExitCode{ 0 };
	std::wstring Name, FullPath;
	std::vector<HandleEntry> Handles;
	DWORD64 CreateTime, ExitTime, KernelTime, UserTime;
};
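
The four DWORD64 time fields receive raw FILETIME values from GetProcessTimes: CreateTime and ExitTime are absolute times counted in 100-nanosecond ticks since January 1, 1601 (UTC), while KernelTime and UserTime are durations in the same unit. A minimal sketch of how such values could be interpreted follows; the helper names are hypothetical and are not part of Object Explorer.

```cpp
#include <cassert>
#include <cstdint>

// FILETIME values count 100-nanosecond ticks since January 1, 1601 (UTC).
constexpr std::uint64_t TicksPerSecond = 10000000ULL;
// January 1, 1970 (the Unix epoch) expressed as a FILETIME.
constexpr std::uint64_t UnixEpochInTicks = 116444736000000000ULL;

// Absolute times (CreateTime, ExitTime) -> seconds since the Unix epoch.
constexpr std::int64_t FileTimeToUnixSeconds(std::uint64_t ft) {
    return static_cast<std::int64_t>(ft / TicksPerSecond) -
           static_cast<std::int64_t>(UnixEpochInTicks / TicksPerSecond);
}

// Durations (KernelTime, UserTime) -> milliseconds.
constexpr std::uint64_t TicksToMilliseconds(std::uint64_t ticks) {
    return ticks / 10000;
}

// How long the process lived before it exited, in milliseconds.
inline std::uint64_t LifetimeMilliseconds(std::uint64_t createTime,
                                          std::uint64_t exitTime) {
    assert(exitTime >= createTime);
    return TicksToMilliseconds(exitTime - createTime);
}
```

For example, a KernelTime of 150000 ticks corresponds to 15 milliseconds.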

The ObjectManager::DupHandle function is not shown, but it basically calls DuplicateHandle for the process handle identified in some process. If that works, and the returned PID is non-zero, we can go do the work. Getting the process image name is done with GetProcessImageFileName – seems simple enough, but this function gets the NT name format of the executable (something like \Device\HarddiskVolume3\Windows\System32\Notepad.exe), which is good enough if only the “short” final image name component is desired. If the full image path is needed in Win32 format (e.g. “c:\Windows\System32\notepad.exe”), it must be converted (ProcessHelper::GetDosNameFromNtName). You might be thinking that it would be far simpler to call QueryFullProcessImageName and get the Win32 name directly – but this does not work, and the function fails. Internally, the NtQueryInformationProcess native API is called with ProcessImageFileNameWin32 in the latter case, which fails if the process is a zombie one.
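
The NT-to-Win32 name conversion is not shown in the post, so here is a portable sketch of the idea under stated assumptions: the device-to-drive table is passed in by the caller (a real tool could build it by calling QueryDosDevice for each drive letter), and matching is case sensitive for simplicity. NtNameToDosName and ShortName are hypothetical names, not the actual ProcessHelper code.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Device-to-drive pairs, e.g. { L"\\Device\\HarddiskVolume3", L"C:" }.
// Supplied by the caller so the sketch stays portable.
using DeviceMap = std::vector<std::pair<std::wstring, std::wstring>>;

// Returns the Win32-style path, or the input unchanged if no device matches.
std::wstring NtNameToDosName(const std::wstring& ntName, const DeviceMap& devices) {
    for (const auto& entry : devices) {
        const std::wstring& device = entry.first;
        assert(!device.empty());    // an empty device name would match any path
        // Match a whole path component, so that "...Volume3" does not
        // accidentally match "...Volume30".
        if (ntName.size() > device.size() &&
            ntName.compare(0, device.size(), device) == 0 &&
            ntName[device.size()] == L'\\')
            return entry.second + ntName.substr(device.size());
    }
    return ntName;
}

// Final path component; returns the whole string when there is no backslash.
std::wstring ShortName(const std::wstring& path) {
    auto pos = path.rfind(L'\\');
    return pos == std::wstring::npos ? path : path.substr(pos + 1);
}
```

With a table that maps \Device\HarddiskVolume3 to C:, the NT name from the text becomes C:\Windows\System32\Notepad.exe, and ShortName yields Notepad.exe from either form.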

Running Object Explorer and selecting Zombie Processes from the System menu shows a list of all zombie processes (you should run it elevated for best results):

Object Explorer showing zombie processes

The above screenshot shows that many of the zombie processes are kept alive by GameManagerService.exe. This executable is from Razer running on my system. It definitely has a bug that keeps process handles open way longer than needed. I’m not sure it would ever close these handles. Terminating this process will resolve the issue, as the kernel closes all handles in a process’s handle table once the process terminates. This will allow all those processes that are held by that single handle to be freed from memory.

I plan to add Zombie Threads to Object Explorer – I wonder how many threads are being kept “alive” without good reason.

zodiacon

Mysteries of the Registry

The Windows Registry is one of the most recognized aspects of Windows. It’s a hierarchical database, storing information on a machine-wide basis and on a per-user basis… mostly. In this post, I’d like to examine the major parts of the Registry, including the “real” Registry.

Looking at the Registry is typically done by launching the built-in RegEdit.exe tool, which shows the five “hives” that seem to comprise the Registry:

RegEdit showing the main hives

These so-called “hives” provide some abstracted view of the information in the Registry. I’m saying “abstracted”, because not all of these are true hives. A true hive is stored in a file. The full hive list can be found in the Registry itself – at HKLM\SYSTEM\CurrentControlSet\Control\hivelist (I’ll abbreviate HKEY_LOCAL_MACHINE as HKLM), mapping an internal key name to the file where it’s stored (more on these “internal” key names will be discussed soon):

The hive list

Let’s examine the so-called “hives” as seen in the root RegEdit’s view.

  • HKEY_LOCAL_MACHINE is the simplest to understand. It contains machine-wide information, most of it stored in files (persistent). Some details related to hardware are built when the system initializes and are only kept in memory while the system is running. Such keys are volatile, since their contents disappear when the system is shut down.
    There are many interesting keys within HKLM, but my goal is not to go over every key (that would take a full book), but to highlight a few useful pieces. HKLM\System\CurrentControlSet\Services is the key where all services and device drivers are installed. Note that “CurrentControlSet” is not a true key, but in fact is a link key, connecting it to something like HKLM\System\ControlSet001. The reason for this indirection is beyond the scope of this post. RegEdit does not show this fact directly – there is no way to tell whether a key is a true key or just points to a different key. This is one reason I created Total Registry (formerly called Registry Explorer), which shows these kinds of nuances:
TotalRegistry showing HKLM\System\CurrentControlSet

The linked key seems to have a weird name starting with \REGISTRY\MACHINE\. We’ll get to that shortly.

Other subkeys of note under HKLM include SOFTWARE, where installed applications store their system-level information; SAM and SECURITY, where local security policy and local accounts information are managed. The contents of these two subkeys are not visible – even administrators don’t get access – only the SYSTEM account is granted access. One way to see what’s in these keys is to use psexec from Sysinternals to launch RegEdit or TotalRegistry under the SYSTEM account. Here is a command you can run in an elevated command window that will launch RegEdit under the SYSTEM account (if you’re using RegEdit, close it first):

psexec -s -i -d RegEdit

The -s switch indicates the SYSTEM account. The -i switch is critical, as it runs the process in the interactive session (the default would run it in session 0, where no interactive user will ever see it). The -d switch is optional, and simply returns control to the console while the process is running, rather than waiting for the process to terminate.

The other way to gain access to the SAM and SECURITY subkeys is to use the “Take Ownership” privilege (easy to do when the Permissions dialog is open), and transfer the ownership to an admin user – the owner can specify who can do what with an object, and allow itself full access. Obviously, this is not a good idea in general, as it weakens security.

The BCD00000000 subkey contains the Boot Configuration Data (BCD), normally accessed using the bcdedit.exe tool.

  • HKEY_USERS – this is the other hive that truly stores data. Its subkeys contain user profiles for all users that ever logged in locally to this machine. Each subkey’s name is a Security ID (SID), in its string representation:
HKEY_USERS

There are 3 well-known SIDs, representing the SYSTEM (S-1-5-18), LocalService (S-1-5-19), and NetworkService (S-1-5-20) accounts. These are the typical accounts used for running Windows Services. “Normal” users get ugly SIDs, such as the one shown – that’s my user’s local SID. You may be wondering what is that “_Classes” suffix in the second key. We’ll get to that as well.
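To make the naming scheme concrete, here is a minimal sketch in portable C++ that labels a subkey name found directly under HKEY_USERS. DescribeUserHiveKey is a hypothetical helper, not a Windows API; it knows only the three well-known SIDs listed above, and it merely detects the “_Classes” suffix without interpreting it.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper (not a Windows API): labels a subkey name found
// directly under HKEY_USERS, using only the three well-known SIDs above.
std::string DescribeUserHiveKey(std::string_view name) {
	constexpr std::string_view suffix = "_Classes";

	// Detect (and strip) the "_Classes" suffix; its meaning is not
	// interpreted here, only reported back to the caller.
	bool classes = name.size() > suffix.size() &&
		name.substr(name.size() - suffix.size()) == suffix;
	if (classes)
		name.remove_suffix(suffix.size());

	std::string account;
	if (name == "S-1-5-18")
		account = "SYSTEM";
	else if (name == "S-1-5-19")
		account = "LocalService";
	else if (name == "S-1-5-20")
		account = "NetworkService";
	else
		account = std::string(name);	// anything else is returned unchanged

	return classes ? account + " (_Classes)" : account;
}
```

Calling DescribeUserHiveKey("S-1-5-18") yields "SYSTEM", while a name that is not one of the three well-known SIDs is returned as-is.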

  • HKEY_CURRENT_USER is a link key, pointing to the subkey under HKEY_USERS that belongs to the user running the current process. Obviously, the meaning of “current user” depends on the access token of the process looking at the Registry.
  • HKEY_CLASSES_ROOT is the most curious of the keys. It’s not a “real” key in the sense that it’s not a hive – not stored in a file. It’s not a link key, either. This key is a “combination” of two keys: HKLM\Software\Classes and HKCU\Software\Classes. In other words, the information in HKEY_CLASSES_ROOT is coming from the machine hive first, but can be overridden by the current user’s hive.
    What information is there anyway? The first thing is shell-related information, such as file extensions and associations, and all other information normally used by Explorer.exe. The second thing is information related to the Component Object Model (COM). For example, the CLSID subkey holds COM class registration (GUIDs you can pass to CoCreateInstance to (potentially) create a COM object of that class). Looking at the CLSID subkey under HKLM\Software\Classes shows there are 8160 subkeys, or roughly 8160 COM classes registered on my system from HKLM:
HKLM\Software\Classes

Looking at the same key under HKEY_CURRENT_USER tells a different story:

HKCU\Software\Classes

Only 46 COM classes provide extra or overridden registrations. HKEY_CLASSES_ROOT combines both, and uses HKCU in case of a conflict (same key name). This explains the extra “_Classes” subkey within the HKEY_USERS key – it stores the per user stuff (in the file UsrClasses.dat in something like c:\Users\<username>\AppData\Local\Microsoft\Windows).
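The merge rule can be modeled with a few lines of portable C++. This is only a sketch of the semantics described above, not how the Registry APIs implement it: KeyMap and MergeClasses are hypothetical names, each key is reduced to a name and one string of data, and names are compared exactly (real Registry key names are case-insensitive).

```cpp
#include <map>
#include <string>

// Hypothetical model: key name -> some registration data.
using KeyMap = std::map<std::string, std::string>;

// Builds a merged view in the spirit of HKEY_CLASSES_ROOT: start with the
// machine-wide entries, then let per-user entries add to or override them.
KeyMap MergeClasses(const KeyMap& machine, const KeyMap& user) {
	KeyMap merged = machine;
	for (const auto& [name, data] : user)
		merged[name] = data;	// same key name: the per-user entry wins
	return merged;
}
```

With a machine map containing ".txt" and ".png" and a user map containing ".txt" and ".foo", the merged view has three keys, and ".txt" carries the per-user data.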

  • HKEY_CURRENT_CONFIG is a link to HKLM\SYSTEM\CurrentControlSet\Hardware\Profiles\Current

    The list of “standard” hives (the hives accessible by official Windows APIs such as RegOpenKeyEx) contains some more that are not shown by Regedit. They can be viewed in TotalReg if the option “Extra Hives” is selected in the View menu. At this time, however, the tool needs to be restarted for this change to take effect (I just didn’t get around to implementing the change dynamically, as it was low on my priority list). Here are all the hives accessible with the official Windows API:
All hives

I’ll let the interested reader dig further into these “extra” hives. One of these hives deserves special mention – HKEY_PERFORMANCE_DATA – it was used in the pre-Windows 2000 days as a way to access Performance Counters. Registry APIs had to be used at the time. Fortunately, starting with Windows 2000, a dedicated API is provided to access Performance Counters (functions starting with Pdh* in <pdh.h>).

Is this it? Is this the entire Registry? Not quite. As you can see in TotalReg, there is a node called “Registry”, that tells yet another story. Internally, all Registry keys are rooted in a single key called REGISTRY. This is the only named Registry key. You can see it in the root of the Object Manager’s namespace with WinObj from Sysinternals:

WinObj from Sysinternals showing the Registry key object

Here are the object details in a Local Kernel debugger:

lkd> !object \registry
Object: ffffe00c8564c860  Type: (ffff898a519922a0) Key
    ObjectHeader: ffffe00c8564c830 (new version)
    HandleCount: 1  PointerCount: 32770
    Directory Object: 00000000  Name: \REGISTRY
lkd> !trueref ffffe00c8564c860
ffffe00c8564c860: HandleCount: 1 PointerCount: 32770 RealPointerCount: 3

All other Registry keys are based off of that root key; the Configuration Manager (the kernel component in charge of the Registry) parses the remaining path as expected. This is the real Registry. The official Windows APIs cannot use this path format, but native APIs can. For example, using NtOpenKey (documented as ZwOpenKey in the Windows Driver Kit, as this is a system call) allows such access. This is how TotalReg is able to look at the real Registry.

Clearly, the normal user-mode APIs somehow map the “standard” hive path to the real Registry path. The simplest is the mapping of HKEY_LOCAL_MACHINE to \REGISTRY\MACHINE. Another simple one is HKEY_USERS mapped to \REGISTRY\USER. HKEY_CURRENT_USER is a bit more complex, and needs to be mapped to the per-user hive under \REGISTRY\USER. The most complex is our friend HKEY_CLASSES_ROOT – there is no simple mapping – the APIs have to check if there is per-user override or not, etc.
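The simple mappings can be written down as a small, portable C++ function. This is a sketch under stated assumptions, not the logic the Windows APIs actually use: ToNativePath is a hypothetical helper, root names are matched exactly (upper case, long or abbreviated form), and the current user’s SID string is passed in by the caller rather than being read from an access token. HKEY_CLASSES_ROOT is deliberately left unmapped, since it has no single native path.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: translates a "standard" Registry path to its native
// form. userSid is the SID string to use for HKEY_CURRENT_USER.
std::optional<std::string> ToNativePath(std::string_view path, std::string_view userSid) {
	auto sep = path.find('\\');
	std::string_view root = path.substr(0, sep);
	// The remainder keeps its leading backslash (empty if there is none).
	std::string_view rest = sep == std::string_view::npos
		? std::string_view{} : path.substr(sep);

	std::string native;
	if (root == "HKEY_LOCAL_MACHINE" || root == "HKLM")
		native = "\\REGISTRY\\MACHINE";
	else if (root == "HKEY_USERS" || root == "HKU")
		native = "\\REGISTRY\\USER";
	else if (root == "HKEY_CURRENT_USER" || root == "HKCU")
		native = "\\REGISTRY\\USER\\" + std::string(userSid);
	else
		return std::nullopt;	// e.g. HKEY_CLASSES_ROOT: no simple mapping

	native += std::string(rest);
	return native;
}
```

For example, ToNativePath("HKLM\\SYSTEM\\CurrentControlSet", sid) produces \REGISTRY\MACHINE\SYSTEM\CurrentControlSet, while any HKEY_CLASSES_ROOT path returns an empty optional.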

Lastly, it seems there are keys in the real Registry that cannot be reached from the standard Registry at all:

The real Registry

There is a key named “A” which seems inaccessible. This key is used for private keys in processes, very common in Universal Windows Platform (UWP) processes, but can be used in other processes as well. These keys are not generally accessible, not even from kernel code – the Configuration Manager prevents it. You can verify their existence by searching for \Registry\A in tools like Process Explorer or TotalReg itself (by choosing Scan Key Handles from the Tools menu). Here is TotalReg, followed by Process Explorer:

TotalReg key handles
Process Explorer key handles

Finally, the WC key is used for Windows Containers, internally called Silos. A container (like the ones created by Docker) is an isolated instance of a user-mode OS, kind of like a lightweight virtual machine, except that the kernel is not separate (as it would be with a true VM), but is provided by the host. Silos are very interesting, but outside the scope of this post.

Briefly, there are two main Silo types. The first is an Application Silo, which is not a true container, and is mostly used with applications based on the Desktop Bridge technology. A classic example is WinDbg Preview. The second type is a Server Silo, which is a true container. A true container must have its file system, Registry, and Object Manager namespace virtualized. This is exactly the role of the WC subkeys – to provide the private Registry keys for containers. The Configuration Manager (as well as other parts of the kernel) is Silo-aware, and will redirect Registry calls to the correct subkey, having no effect on the Host Registry or the private Registry of other Silos.

You can examine some aspects of silos with the kernel debugger !silo command. Here is an example from a Windows Server 2022 machine running a Server Silo, followed by the Registry keys under WC:

lkd> !silo
		Address          Type       ProcessCount Identifier
		ffff800f2986c2e0 ServerSilo 15           {1d29488c-bccd-11ec-a503-d127529101e4} (0n732)
1 active Silo(s)
lkd> !silo ffff800f2986c2e0

Silo ffff800f2986c2e0:
		Job               : ffff800f2986c2e0
		Type              : ServerSilo
		Identifier        : {1d29488c-bccd-11ec-a503-d127529101e4} (0n732)
		Processes         : 15

Server silo globals ffff800f27e65a40:
		Default Error Port: ffff800f234ee080
		ServiceSessionId  : 217
		Root Directory    : 00007ffcad26b3e1 '\Silos\732'
		State             : Running
A Server Silo’s keys

There you have it. The relatively simple-looking Registry shown in RegEdit is viewed differently by the kernel. Device driver writers find this out relatively early – they cannot use the “abstractions” provided by user mode even if these are sometimes convenient.



zodiacon

Threads, Threads, and More Threads

Looking at a typical Windows system shows thousands of threads, with process numbers in the hundreds, even though the total CPU consumption is low, meaning most of these threads are doing nothing most of the time. I typically rant about it in my Windows Internals classes. Why so many threads?

Here is a snapshot of my Task Manager showing the total number of threads and processes:

Showing processes details and sorting by thread count looks something like this:

The System process clearly has many threads. These are kernel threads created by the kernel itself and by device drivers. These threads are always running in kernel mode. For this post, I’ll disregard the System process and focus on “normal” user-mode processes.

There are other kernel processes that we should ignore, such as Registry and Memory Compression. Registry has few threads, but Memory Compression has many. It’s not shown in Task Manager (by design), but is shown in other tools, such as Process Explorer. While I’m writing this post, it has 78 threads. We should probably skip that process as well, since it’s “out of our control”.

Notice the large number of threads in processes running the images Explorer.exe, SearchIndexer.exe, Nvidia Web helper.exe, Outlook.exe, Powerpnt.exe and MsMpEng.exe. Let’s write some code to calculate the average number of threads in a process and the standard deviation:

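#include <Windows.h>
#include <TlHelp32.h>
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdio>
#include <vector>

float ComputeStdDev(std::vector<int> const& values, float& average) {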
	float total = 0;
	std::for_each(values.begin(), values.end(), 
		[&](int n) { total += n; });
	average = total / values.size();
	total = 0;
	std::for_each(values.begin(), values.end(), 
		[&](int n) { total += (n - average) * (n - average); });
	return std::sqrt(total / values.size());
}

int main() {
	auto hSnapshot = ::CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
	if (hSnapshot == INVALID_HANDLE_VALUE)
		return 1;
	
	PROCESSENTRY32 pe;
	pe.dwSize = sizeof(pe);

	// the first entry is the idle process; read it and skip it
	::Process32First(hSnapshot, &pe);

	int processes = 0, threads = 0;
	std::vector<int> threads_per_process;
	threads_per_process.reserve(500);
	while (::Process32Next(hSnapshot, &pe)) {
		processes++;
		threads += pe.cntThreads;
		threads_per_process.push_back(pe.cntThreads);
	}
	::CloseHandle(hSnapshot);

	assert(processes == threads_per_process.size());

	printf("Process: %d Threads: %d\n", processes, threads);
	float average;
	auto sd = ComputeStdDev(threads_per_process, average);
	printf("Average threads/process: %.2f\n", average);
	printf("Std. Dev.: %.2f\n", sd);

	return 0;
}

The ComputeStdDev function computes the standard deviation and average of a vector of integers. The main function uses the ToolHelp API to enumerate processes in the system, which fortunately also provides the number of threads in each process (stored in the threads_per_process vector). If I run this (no processes removed just yet), this is what I get:

Process: 525 Threads: 7810
Average threads/process: 14.88
Std. Dev.: 23.38

Almost 15 threads per process, with little CPU consumption in my Task Manager. The standard deviation is more telling – it’s big compared to the average, which suggests that many processes are far from the average in their thread consumption. And since a negative thread count is not possible (even zero is almost impossible), the divergence must be towards higher thread numbers.

To be fair, let’s remove the System and Memory Compression processes from our calculations. Here are the changes to the while loop:

while (::Process32Next(hSnapshot, &pe)) {
	if (pe.th32ProcessID == 4 || _wcsicmp(pe.szExeFile, L"memory compression") == 0)
		continue;
//...

Here are the results:

Process: 521 Threads: 7412
Average threads/process: 14.23
Std. Dev.: 14.14

The standard deviation is definitely smaller, but still pretty big (close to the average), which does not invalidate the previous point. Some processes use lots of threads.
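One way to make “big compared to the average” concrete is to look at the ratio of the standard deviation to the average, and to count how many values sit more than one standard deviation above the average. The following is a minimal sketch in portable C++; the Spread structure and the Summarize function are hypothetical additions, not part of the program above, and the sample data in the note below is made up.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Spread {
	double average;
	double stddev;
	double ratio;		// stddev / average
	int aboveOneSigma;	// number of values greater than average + stddev
};

// Summarizes a list of per-process thread counts.
Spread Summarize(const std::vector<int>& values) {
	Spread s{};
	if (values.empty())
		return s;

	double total = 0;
	for (int n : values)
		total += n;
	s.average = total / values.size();

	double squares = 0;
	for (int n : values)
		squares += (n - s.average) * (n - s.average);
	s.stddev = std::sqrt(squares / values.size());

	s.ratio = s.average > 0 ? s.stddev / s.average : 0;
	double limit = s.average + s.stddev;
	s.aboveOneSigma = static_cast<int>(std::count_if(values.begin(), values.end(),
		[limit](int n) { return n > limit; }));
	return s;
}
```

For a made-up sample such as { 2, 2, 2, 2, 92 }, the average is 20 and the standard deviation is 36, so a single heavy process is enough to push the ratio well above 1.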

In an ideal world, the number of threads in a system would be the same as the number of logical processors – any more and threads might fight over processors, any less and you’re not using the full power of the machine. Obviously, each “normal” process must have at least one thread running whatever main function is available in the executable, so on my system 521 threads would be the minimum number of threads. Still – we have over 7000.
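To put the thread total next to the number of logical processors, a small sketch using only the C++ standard library might look like this. LogicalProcessors and ThreadsPerProcessor are hypothetical helper names; std::thread::hardware_concurrency() is only a hint and may return 0, in which case the sketch falls back to 1.

```cpp
#include <thread>

// Number of logical processors as reported by the C++ runtime.
// hardware_concurrency() may return 0 when the value cannot be determined.
unsigned LogicalProcessors() {
	unsigned n = std::thread::hardware_concurrency();
	return n == 0 ? 1 : n;
}

// How many threads exist for every logical processor.
double ThreadsPerProcessor(int totalThreads, unsigned processors) {
	return processors == 0 ? 0.0 : static_cast<double>(totalThreads) / processors;
}
```

Assuming a machine with 16 logical processors (the actual number is not stated here), ThreadsPerProcessor(7412, 16) comes to 463.25 threads per processor, far from the ideal of one.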

What are these threads doing, anyway? Let’s examine some processes. First, an Explorer.exe process. Here is the Threads tab shown in Process Explorer:

Thread list in Explorer.exe instance

93 threads. I’ve sorted the list by Start Address to get a sense of the common functions used. Let’s dig into some of them. One of the most common (in other processes as well) is ntdll!TppWorkerThread – this is a thread pool thread, likely waiting for work. Clicking the Stack button (or double clicking the entry in the list) shows the following call stack:

ntoskrnl.exe!KiSwapContext+0x76
ntoskrnl.exe!KiSwapThread+0x500
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeWaitForSingleObject+0x233
ntoskrnl.exe!KiSchedulerApc+0x3bd
ntoskrnl.exe!KiDeliverApc+0x2e9
ntoskrnl.exe!KiSwapThread+0x827
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeRemoveQueueEx+0x263
ntoskrnl.exe!IoRemoveIoCompletion+0x98
ntoskrnl.exe!NtWaitForWorkViaWorkerFactory+0x38e
ntoskrnl.exe!KiSystemServiceCopyEnd+0x25
ntdll.dll!ZwWaitForWorkViaWorkerFactory+0x14
ntdll.dll!TppWorkerThread+0x2f7
KERNEL32.DLL!BaseThreadInitThunk+0x14
ntdll.dll!RtlUserThreadStart+0x21

The system call NtWaitForWorkViaWorkerFactory is the one waiting for work (the name Worker Factory is the internal name of the thread pool type in the kernel, officially called TpWorkerFactory). The number of such threads is typically dynamic, growing and shrinking based on the amount of work provided to the thread pool(s). The minimum and maximum threads can be tweaked by APIs, but most processes are unlikely to do so.

Another function that appears a lot in the list is shcore.dll!_WrapperThreadProc. It looks like some generic function used by Explorer for its own threads. We can examine some call stacks to get a sense of what’s going on. Here is one:

ntoskrnl.exe!KiSwapContext+0x76
ntoskrnl.exe!KiSwapThread+0x500
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeWaitForSingleObject+0x233
ntoskrnl.exe!KiSchedulerApc+0x3bd
ntoskrnl.exe!KiDeliverApc+0x2e9
ntoskrnl.exe!KiSwapThread+0x827
ntoskrnl.exe!KiCommitThreadWait+0x14f
ntoskrnl.exe!KeWaitForSingleObject+0x233
ntoskrnl.exe!KeWaitForMultipleObjects+0x45b
win32kfull.sys!xxxRealSleepThread+0x362
win32kfull.sys!xxxSleepThread2+0xb5
win32kfull.sys!xxxRealInternalGetMessage+0xcfd
win32kfull.sys!NtUserGetMessage+0x92
win32k.sys!NtUserGetMessage+0x16
ntoskrnl.exe!KiSystemServiceCopyEnd+0x25
win32u.dll!NtUserGetMessage+0x14
USER32.dll!GetMessageW+0x2e
SHELL32.dll!_LocalServerThread+0x66
shcore.dll!_WrapperThreadProc+0xe9
KERNEL32.DLL!BaseThreadInitThunk+0x14
ntdll.dll!RtlUserThreadStart+0x21

This one seems to be waiting for UI messages, probably managing some user interface (GetMessage). We can verify with other tools. Here is my own WinSpy:

Apparently, I was wrong. This thread has the hidden window type used to receive messages targeting COM objects that live in this Single Threaded Apartment (STA).

We can inspect WinSpy some more to see the threads and windows created by Explorer. I’ll leave that to the interested reader.

Other generic call stacks start with ucrtbase.dll!thread_start+0x42. Many of them have the following call stack (kernel part trimmed for brevity):

ntdll.dll!ZwWaitForMultipleObjects+0x14
KERNELBASE.dll!WaitForMultipleObjectsEx+0xf0
KERNELBASE.dll!WaitForMultipleObjects+0xe
cdp.dll!shared::CallbackNotifierListener::ListenerInternal::StartInternal+0x9f
cdp.dll!std::thread::_Invoke<std::tuple<<lambda_10793e1829a048bb2f8cc95974633b56> >,0>+0x2f
ucrtbase.dll!thread_start<unsigned int (__cdecl*)(void *),1>+0x42
KERNEL32.DLL!BaseThreadInitThunk+0x14
ntdll.dll!RtlUserThreadStart+0x21

A function in CDP.dll is waiting for something (WaitForMultipleObjects). I count at least 12 threads doing just that. Perhaps all these waits could be consolidated to a smaller number of threads?

Let’s tackle a different process. Here is an instance of Teams.exe. My Teams is minimized to the tray and I have not interacted with it for a while:

Teams threads

62 threads. Many have the same CRT wrapper for a thread created by Teams. Here are several call stacks I observed:

ntdll.dll!ZwRemoveIoCompletion+0x14
KERNELBASE.dll!GetQueuedCompletionStatus+0x4f
skypert.dll!rtnet::internal::SingleThreadIOCP::iocpLoop+0x116
skypert.dll!SplOpaqueUpperLayerThread::run+0x84
skypert.dll!auf::priv::MRMWTransport::process1+0x6c
skypert.dll!auf::ThreadPoolExecutorImp::workLoop+0x160
skypert.dll!auf::tpImpThreadTrampoline+0x47
skypert.dll!spl::threadWinDispatch+0x19
skypert.dll!spl::threadWinEntry+0x17b
ucrtbase.dll!thread_start<unsigned int (__cdecl*)(void *),1>+0x42
KERNEL32.DLL!BaseThreadInitThunk+0x14
ntdll.dll!RtlUserThreadStart+0x21

ntdll.dll!ZwWaitForAlertByThreadId+0x14
ntdll.dll!RtlSleepConditionVariableCS+0x105
KERNELBASE.dll!SleepConditionVariableCS+0x29
Teams.exe!uv_cond_wait+0x10
Teams.exe!worker+0x8d
Teams.exe!uv__thread_start+0xa2
Teams.exe!thread_start<unsigned int (__cdecl*)(void *),1>+0x50
KERNEL32.DLL!BaseThreadInitThunk+0x14
ntdll.dll!RtlUserThreadStart+0x21

You can check more threads, but you get the idea. Most threads are waiting for something – this is not the ideal activity for a thread. A thread should run (useful) code.

Last example, Word:

57 threads. Word has been minimized for more than an hour now. The clearly common call stack looks like this:

ntdll.dll!ZwWaitForAlertByThreadId+0x14
ntdll.dll!RtlSleepConditionVariableSRW+0x131
KERNELBASE.dll!SleepConditionVariableSRW+0x29
v8jsi.dll!CrashForExceptionInNonABICompliantCodeRange+0x4092f6
v8jsi.dll!CrashForExceptionInNonABICompliantCodeRange+0x11ff2
v8jsi.dll!v8_inspector::V8StackTrace::topScriptIdAsInteger+0x43ad0
ucrtbase.dll!thread_start<unsigned int (__cdecl*)(void *),1>+0x42
KERNEL32.DLL!BaseThreadInitThunk+0x14
ntdll.dll!RtlUserThreadStart+0x21

v8jsi.dll is the React Native v8 engine – it’s creating many threads, most of which are doing nothing. I found it in Outlook and PowerPoint as well.

Many applications today depend on various libraries and frameworks, some of which don’t seem to care too much about using threads economically – examples include Node.js, the Electron framework, even Java and .NET. Threads are not free – there is the ETHREAD and related data structures in the kernel, stack in kernel space, and stack in user space. Context switches and code run by the kernel scheduler when threads change states from Running to Waiting, and from Waiting to Ready are not free, either.

Many desktop/laptop systems today are very powerful and it might seem everything is fine. I don’t think so. Developers use so many layers of abstraction these days, that we sometimes forget there are actual processors that execute the code, and need to use memory and other resources. None of that is free.


zodiacon

Registration is open for the Windows Internals training

My schedule has been a mess in recent months, and continues to be so for the next few months. However, I am opening registration today for the Windows Internals training with some date changes from my initial plan.

Here are the dates and times (all based on London time) – 5 days total:

  • July 6: 4pm to 12am (full day)
  • July 7: 4pm to 8pm
  • July 11: 4pm to 12am (full day)
  • July 12, 13, 14, 18, 19: 4pm to 8pm

Training cost is 800 USD, if paid by an individual, or 1500 USD if paid by a company. Participants from Ukraine (please provide some proof) are welcome with a 90% discount (paying 80 USD, individual payments only).

If you’d like to register, please send me an email to [email protected] with “Windows Internals training” in the title, and provide your full name, company (if any), preferred contact email, and your time zone. The basic syllabus can be found here. If you’ve sent me an email before when I posted about my upcoming classes, you don’t have to do that again – I will send full details soon.

The sessions will be recorded, so you can watch any part you may have missed, or that may have been somewhat overwhelming in “real time”.

As usual, if you have any questions, feel free to send me an email, or DM me on twitter (@zodiacon) or Linkedin (https://www.linkedin.com/in/pavely/).


zodiacon

Next COM Programming Class

Update: the class is cancelled. I guess there weren’t that many people interested in COM this time around.

Today I’m opening registration for the COM Programming class to be held in April. The syllabus for the 3 day class can be found here. The course will be delivered in 6 half-days (4 hours each).

Dates: April (25, 26, 27, 28), May (2, 3).
Times: 2pm to 6pm, London time
Cost: 700 USD (if paid by an individual), 1300 USD (if paid by a company).

The class will be conducted remotely using Microsoft Teams or a similar platform.

What you need to know before the class: You should be comfortable using Windows on a Power User level. Concepts such as processes, threads, DLLs, and virtual memory should be understood fairly well. You should have experience writing code in C and some C++. You don’t have to be an expert, but you must know C and basic C++ to get the most out of this class. In case you have doubts, talk to me.

Participants in my Windows Internals and Windows System Programming classes have the required knowledge for the class.

We’ll start by looking at why COM was created in the first place, and then build clients and servers, digging into various mechanisms COM provides. See the syllabus for more details.

Previous students in my classes get 10% off. Multiple participants from the same company get a discount (email me for the details).

To register, send an email to [email protected] with the title “COM Training”, and write the name(s), email(s) and time zone(s) of the participants.


zodiacon
